Copilot Reveals Private GitHub Pages, Leading to Removals by Microsoft

A screenshot illustrating that Copilot still has access to tools that Microsoft attempted to remove from GitHub.
Credit: Lasso
Understanding the Copilot Data Access Issue
Recent investigations by Lasso uncovered that Microsoft's attempt to secure sensitive information relied on restricting access to a special Bing user interface, previously reachable at cc.bingj.com. Despite this change, the private information persisted in Bing's cache, leaving it accessible to Copilot even though ordinary users could no longer retrieve it.
Details of the Findings
The Lasso research team noted that even after Bing's cached-link feature was disabled, cached pages still appeared in search results. This suggested that Microsoft's solution was a temporary patch that did not purge the underlying data.
“When we revisited our investigation of Microsoft Copilot, our suspicions were confirmed: Copilot still had access to the cached data that was no longer available to human users. In short, the fix was only partial, preventing human retrieval but not stopping Copilot.”
The Importance of Secure Coding Practices
Developers sometimes embed security tokens, API keys, and other confidential data directly in their code, contrary to established best practices. Such oversights can lead to severe security breaches, especially when the code lands in a public repository.
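As a minimal illustration of the best practice mentioned above, the sketch below reads a credential from an environment variable instead of hardcoding it; the variable name `API_TOKEN` and the function are hypothetical examples, not from any specific project.

```python
import os

# Anti-pattern: a hardcoded credential that ends up in version control.
# API_TOKEN = "ghp_exampleSecretValue"  # never do this

def load_api_token() -> str:
    """Read the token from the environment so it never enters the repository."""
    token = os.environ.get("API_TOKEN")
    if token is None:
        raise RuntimeError("API_TOKEN is not set; configure it outside the codebase")
    return token
```

Keeping secrets in the environment (or a dedicated secrets manager) means that even if the repository is ever made public, the code itself discloses nothing.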
Consequences of Exposed Credentials
When developers realize their sensitive information has been exposed, they often switch the repository to private to mitigate the damage. However, Lasso's findings indicate that this offers little protection: once credentials have been exposed, they remain compromised. The only effective remediation is to rotate all affected credentials.
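Detecting exposure in the first place usually involves scanning code for strings that match known credential formats. The sketch below is an illustrative, deliberately simplified scanner; the two patterns are real public token formats, but production tools such as gitleaks or truffleHog use far richer rule sets and also scan git history.

```python
import re

# Illustrative patterns only; real secret scanners maintain large rule libraries.
TOKEN_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),  # GitHub personal access token (classic)
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key ID
]

def find_leaked_tokens(text: str) -> list[str]:
    """Return every substring of `text` that matches a known credential format."""
    hits: list[str] = []
    for pattern in TOKEN_PATTERNS:
        hits.extend(pattern.findall(text))
    return hits
```

Any hit from a scan like this should be treated as compromised and rotated immediately, regardless of whether the repository has since been made private.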
Legal Implications of Data Exposure
Switching a repository from public to private does not fully resolve a leak. Microsoft has taken legal action to remove tools hosted on GitHub that it alleged violated laws such as the Computer Fraud and Abuse Act and the Digital Millennium Copyright Act. Although those removals succeeded, Copilot continues to provide access to the tools, undermining the legal action.
Microsoft’s Stance on the Issue
In response to this situation, Microsoft clarified in a statement: “It is commonly understood that large language models are often trained on publicly available information from the web. If users wish to prevent their content from being utilized for training these models, they are encouraged to keep their repositories private at all times.”