OpenAI Must Hand Over 20 Million ChatGPT Logs in High-Stakes Copyright Battle With News Outlets
OpenAI faces a major legal test after a Manhattan federal judge ordered the company to turn over 20 million anonymised ChatGPT conversation logs to The New York Times, the Chicago Tribune, and other media organisations.
The chats, drawn from users’ interactions with the AI, are considered key evidence in a lawsuit alleging that OpenAI’s language models reproduced copyrighted news content without permission.
Judge Rejects Privacy Objections and Orders Anonymised Data
U.S. Magistrate Judge Ona Wang overruled OpenAI’s privacy objections, ruling that the company must strip names, email addresses, phone numbers, and other identifying information from the logs before handing them over.
Wang stated:
“There are multiple layers of protection in this case precisely because of the highly sensitive and private nature of much of the discovery.”
Once anonymisation is complete, OpenAI has seven days to submit the records.
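To make the anonymisation requirement concrete, here is a minimal, purely illustrative sketch of rule-based redaction of direct identifiers such as email addresses and phone numbers. This is an assumption for illustration only, not OpenAI’s actual de-identification process; real pipelines also rely on techniques such as named-entity recognition to catch personal names and addresses.

```python
import re

# Illustrative patterns only: production de-identification is far more thorough
# (named-entity recognition for names, address detection, human review, etc.).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace obvious direct identifiers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    sample = "You can reach Jane at jane.doe@example.com or +1 (555) 123-4567."
    print(redact(sample))  # -> "You can reach Jane at [EMAIL] or [PHONE]."
```

Note that the personal name in the sample survives this simple pass, one reason experts caution, as discussed later in this piece, that basic anonymisation does not eliminate privacy risk.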
OpenAI has expressed concern over the precedent this ruling could set, highlighting that the vast majority of user chats—estimated at tens of billions—have no connection to the copyright claims.
Dane Stuckey, OpenAI’s Chief Information Security Officer, warned that such demands “disregard long-standing privacy protections” and could undermine user trust.
Media Outlets Claim Evidence of Content Misuse
The New York Times and newspapers owned by MediaNews Group argue that the logs will show whether ChatGPT has been reproducing passages that closely mirror their articles.
Frank Pine, executive editor of MediaNews Group, criticised OpenAI’s resistance:
“OpenAI’s leadership was hallucinating when they thought they could get away with withholding evidence about how their business model relies on stealing from hardworking journalists.”
The plaintiffs maintain that the case is not intended to block AI development but to ensure fairness and compensation for journalistic work.
They also argue that the model can reproduce copyrighted material even when users do not explicitly ask for it, which they say makes scrutiny of its outputs essential.
Scope of the Lawsuit and Broader Implications
The lawsuit, first filed in 2023, is part of a wave of copyright claims targeting technology companies including Microsoft, Meta, and Google.
The plaintiffs contend that AI developers used copyrighted material without permission to train their models, resulting in outputs that replicate or summarise paywalled content.
Judge Wang found the request proportionate, noting that the 20 million logs represent less than 0.05% of all ChatGPT conversations retained by OpenAI.
She emphasised that the chats are relevant both to the claims of content reproduction and to OpenAI’s defence concerning broader user activity.
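A rough consistency check on the figures cited here (an illustrative calculation, not drawn from the ruling): if 20 million conversations amount to roughly 0.05% of what OpenAI retains, the total is on the order of the tens of billions of chats mentioned above.

```latex
\frac{2 \times 10^{7}}{5 \times 10^{-4}} = 4 \times 10^{10} \quad \text{(about 40 billion retained conversations)}
```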
Legal experts suggest that the ruling signals a willingness by courts to hold AI companies accountable while balancing user privacy.
The decision also raises broader questions about the transparency of AI training processes, intellectual property rights, and the potential need for licensing arrangements between publishers and AI developers.
Could Privacy Concerns Persist Despite Anonymisation?
While Wang confirmed that privacy safeguards are in place, experts warn that anonymised data can sometimes be re-identified, exposing sensitive information.
OpenAI and other AI companies have stated that user data is not permanently stored, but this case underscores the tension between user confidentiality and corporate accountability.
Industry Impact May Reshape AI Data Practices
For publishers, the case represents an opportunity to secure recognition—and potentially compensation—for the use of their content.
Competitors such as Microsoft and Meta are under similar scrutiny, and licensing deals could become a new norm for AI companies relying on journalistic material.
At the same time, AI developers may explore advanced anonymisation techniques or smaller-scale models to reduce infringement risks.
The Ruling Challenges AI’s Role in Media and Data Rights
Coinlive sees this ruling as a turning point for the AI industry.
The decision forces AI companies to confront how they collect, store, and utilise content while navigating user privacy concerns.
It also raises questions about the sustainability of AI models built on unauthorised data and whether future innovations will require clearer ethical and legal frameworks.
The balance between transparency, intellectual property, and user trust is now under unprecedented scrutiny, and how AI firms respond could define the next chapter of digital media and AI regulation.