What Data Is Used to Train Microsoft’s AI?

Microsoft and Data Use for AI Training
Recently, there has been significant online discussion about whether Microsoft uses customer data from Word and other Microsoft 365 products to train its artificial intelligence (AI) and large language models (LLMs). Many users have expressed concern about how their data is handled. Microsoft, however, has clarified that it does not use data from Word or any other Microsoft 365 application for this purpose.
Understanding Microsoft’s Data Sources
Types of Data Utilized
According to Microsoft, the company uses various de-identified data sources to train its AI systems. These sources can include:
- Bing search data
- Interactions on MSN
- Conversations with Copilot
- Advertising engagements
De-identified data refers to information from which any personal identifiers—such as names, email addresses, or account numbers—have been removed or hidden. This practice ensures that data cannot be traced back to individual users.
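As a rough illustration of what removing such identifiers can look like, the sketch below redacts email addresses and number sequences that resemble account numbers from text. This is a hypothetical example for explanation only; Microsoft has not published its de-identification pipeline, and the patterns and placeholder tokens here are assumptions.

```python
import re

# Naive patterns for two kinds of personal identifiers mentioned above.
# Real de-identification systems are far more sophisticated than this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
ACCOUNT_RE = re.compile(r"\b\d{8,16}\b")  # assumed account-number shape

def de_identify(text: str) -> str:
    """Replace common personal identifiers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = ACCOUNT_RE.sub("[ACCOUNT]", text)
    return text

print(de_identify("Contact alice@example.com about account 123456789"))
```

Once identifiers are replaced with generic tokens like these, the remaining text can no longer be tied to a specific user by name, address, or account number alone.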
Additional Privacy Measures
To further protect user information, Microsoft implements various measures like removing metadata from images and blurring user faces in visual content. In a statement, a Microsoft spokesperson emphasized that the company relies on “a variety of data sources, including publicly available information,” while adhering to copyright and intellectual property laws. Microsoft is committed to responsibly developing AI technology.
Opting Out of Data Usage
If you wish to prevent your interactions with Microsoft’s Copilot from being used for AI training, you have the option to opt out. Changing your privacy settings is a straightforward process:
On Windows:
- Open the Copilot application.
- Click on your profile name or ‘Account’ in the settings menu.
- Navigate to ‘Privacy’ > ‘Model training’.
- Toggle off the ‘Model training’ option.
On Microsoft Edge:
- Open Microsoft Edge.
- Click on the menu (three dots) and select ‘Settings’.
- Go to ‘Sidebar’ > ‘Copilot’ > ‘Copilot Settings’.
- Disable ‘Model Training on Text’ to opt out.
After opting out, Microsoft says that your past, present, and future conversations with Copilot will not be used to train the AI model. Note that the change may take up to 30 days to fully take effect.
What Data Microsoft Will Not Use
Microsoft has outlined specific cases in which data will not be used to train its AI models:
- Data from commercial customers or individuals using a Microsoft 365 organizational or personal subscription.
- Data from users who are not logged into Copilot with a Microsoft account or a supported third-party authentication method.
- Data from authenticated users who are under 18 years of age.
- Data from users who have opted out.
- Data from users located in nearly 40 countries, including Brazil, China (excluding Hong Kong), Israel, Nigeria, South Korea, and Vietnam, where AI features are available but data is not used for training.
In addition to these exclusions, Microsoft does not use certain types of sensitive data, including:
- Microsoft account profile information.
- Contents of emails.
- Contents from files uploaded to Copilot.
While conversations about these uploaded files may be used, Microsoft says that any associated imagery is de-identified to strengthen privacy protection.
Throughout its operations, Microsoft maintains a strong commitment to responsible AI practices. The company says it prioritizes safeguarding user data and complying with privacy regulations, reinforcing that data protection is integral to its mission. The spokesperson stated, "At Microsoft, we take our commitments to responsible AI seriously."