ChatGPT in Court: GEMA vs. OpenAI – First Landmark AI Ruling!

On November 11, a ruling was handed down that will likely shape legal and technological debates in Europe for years to come: The Munich Regional Court I ruled that OpenAI violates German copyright law with ChatGPT. GEMA sued – and won.

This ruling carries weight. Not only because it’s the first of its kind across Europe, but because the 42nd Civil Chamber of Munich Regional Court I – specialized in copyright law – examined the technical foundations in depth and rejected all three of OpenAI’s main arguments.

A. What Actually Happened?

GEMA represents German songwriters as a collecting society. It discovered that ChatGPT can reproduce the lyrics of well-known works verbatim. “Atemlos” by Kristina Bach? ChatGPT outputs the complete text. “Wie schön, dass du geboren bist”? Also retrievable – word for word.

In total, nine well-known German songs were at issue. GEMA argued: OpenAI used these texts during training, they are stored in the model, and upon request, ChatGPT reproduces them in full. Without a license. Without permission.

OpenAI defended itself with three central arguments that may initially seem plausible – but the court was not convinced.

B. Argument 1: “We Don’t Actually Store Any Data”

OpenAI’s first line of defense was: The language model doesn’t store specific training data, it merely learns statistical patterns – the texts themselves are not contained within it.

The court rejected this argument and made clear: language models can indeed memorize copyrighted content, i.e., represent it permanently within their parameters.

The key term is “memorization”. The court phrased it as follows:

“Such memorization occurs when the unspecific parameters during training do not merely extract information from the training dataset, but rather a complete adoption of the training data is found in the parameters specified after training.”

In other words: Even if storage functions differently than in a traditional database, language models can fully adopt content in a legally relevant manner.

The judges verified this through concrete text comparisons. Result: The song lyrics generated by ChatGPT were almost completely identical to the originals. The matches were so specific that coincidence could be ruled out.
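The ruling does not disclose the court's technical methodology, but the kind of comparison described can be illustrated with a minimal sketch using Python's standard `difflib`. The texts below are hypothetical placeholders (the actual lyrics are, after all, copyrighted):

```python
import difflib

def similarity(original: str, generated: str) -> float:
    """Return a 0..1 ratio measuring how closely the generated text matches the original."""
    return difflib.SequenceMatcher(None, original, generated).ratio()

# Hypothetical stand-in texts, not the actual song lyrics.
original   = "line one of the song\nline two of the song\nline three of the song"
verbatim   = "line one of the song\nline two of the song\nline three of the song"
paraphrase = "a completely different text about other things entirely"

print(similarity(original, verbatim))    # 1.0: near-identical reproduction
print(similarity(original, paraphrase))  # far lower: no reproduction
```

A ratio near 1.0 over a longer text is exactly the kind of match that, in the court's words, rules out coincidence.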

This is relevant under copyright law: If content is stored in the model in such a way that it can be reproduced nearly identically on demand, this constitutes a reproduction within the meaning of § 16 German Copyright Act (UrhG) – and this is inadmissible without permission.

The key point: the court made clear that it is irrelevant that storage occurs in probability values. Copyright law is technology-neutral. If reproduction is technically possible, a copy exists. Period.

C. Argument 2: “But Text and Data Mining is Allowed!”

OpenAI invoked the so-called text and data mining exception (§ 44b UrhG). It permits automated analysis of copyrighted content – for example, to identify statistical patterns.

The legislator’s intention: Purely analytical evaluation of protected content – such as recognizing word frequencies or language patterns – should be permitted, as long as no economically exploitable use occurs.

OpenAI argued: That’s exactly their approach – they analyze texts to learn language patterns. That’s classic text and data mining.

The court disagreed.

The exception allows preparatory actions – such as converting texts to another format or temporary caching. What it does not allow: permanent storage of protected works in the model.

“If, as in the present case, not only information is extracted from training data during phase 2 of training, but works are reproduced, this does not constitute text and data mining.”

Put differently: If you merely determine that German songwriters frequently write about feelings, you’re conducting permissible TDM. However, if you store the complete song lyrics and output them on request, you’re reproducing.
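The distinction the court draws can be made concrete in code. A hypothetical sketch (my illustration, not anything from the ruling): extracting only aggregate statistics, from which the work cannot be reconstructed, versus retaining the work itself:

```python
from collections import Counter

# Hypothetical stand-in for a protected text.
lyrics = "feelings and feelings and more feelings in the night"

# Information extraction: only aggregate statistics survive;
# the original work cannot be reconstructed from them.
word_frequencies = Counter(lyrics.split())

# Memorization: the complete work is retained and can be
# reproduced on demand -- this is what the court objected to.
stored_copy = lyrics

print(word_frequencies.most_common(2))  # [('feelings', 3), ('and', 2)]
print(stored_copy == lyrics)            # True: verbatim reproduction possible
```

The word counts would fall under permissible TDM in the court's reading; the stored copy would not.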

Analogous Application? Ruled Out.

The court expressly rejected an analogous application of the TDM exception – the balance of interests is fundamentally different.

With mere information extraction, no exploitation interests of copyright holders are affected – that’s why the law provides no compensation requirement here.

With memorization, however, there clearly is. Allowing an exception here, without compensation obligation, would leave copyright holders largely unprotected.

Important: The risk of memorization lies with OpenAI. They developed the model, chose the training data – and thus bear the responsibility.

D. Argument 3: “That’s the Users, Not Us!”

OpenAI’s third argument struck me as almost brazen: “We don’t generate the outputs. Users do, with their prompts.” If someone asks ChatGPT for the lyrics to “Atemlos,” then the user is the “producer” of the response. Therefore, the argument goes, the user must be liable, not OpenAI.

The court rightly rejected this.

The outputs were generated through “simply formulated prompts”. In other words: It’s enough if someone writes “Give me the lyrics to Atemlos,” and ChatGPT outputs them completely. There’s no creative contribution from the user. The model does all the work.

“Thus, the models operated by the defendants significantly influenced the outputs issued, the concrete content of the outputs was generated by the models. The mere triggering of reproduction by entering a prompt does not lead to considering the user as the reproducer.”

Translated: OpenAI is liable.

E. Why This Matters (Even If You’re Not an AI Developer)

The good news for entrepreneurs who aren’t AI developers: The ruling makes clear that responsibility lies with the operator, not the user.

The less good news: this doesn’t mean you’re entirely off the hook – especially if you commercially exploit the outputs. If you publish the generated text, you’re responsible for that publication.

The principle corresponds to Google Images: Google is liable for indexing images. But if you take someone else’s photo from Google Images and put it on your website, you are liable for unauthorized use.

F. The International Context (UK Does It Differently)

Brief digression, because it’s illuminating: While Germany convicts OpenAI, the UK has taken exactly the opposite path.

In November – almost simultaneously with the GEMA ruling – the High Court in London ruled that Stability AI (makers of Stable Diffusion) did not violate copyright. Getty Images sued – and lost.

Why? Because the training didn’t take place in the UK. And copyright is territorial. If the infringing act happens outside the UK, British courts lack jurisdiction.

Training Abroad = Legal Arbitrage?

This means: AI companies simply train abroad – in countries with lax copyright laws or in countries where they can’t be sued anyway. And then they roll out the model worldwide.

In the UK, this works (for now). In Germany, it doesn’t. The Munich court said: We don’t care where you trained. ChatGPT is used in Germany, the outputs are delivered to German users, therefore German law applies.

EU vs. UK: The AI Act requires AI providers to demonstrate that they work in compliance with copyright – even if training took place outside the EU. The UK has no AI Act after Brexit. That’s why British rights holders are less protected than German ones.

G. What Comes Next?

OpenAI will likely appeal – so the final word has not yet been spoken. However, the ruling is carefully reasoned: The 42nd Civil Chamber of Munich Regional Court I examined the technical foundations thoroughly, and the argumentation is internally consistent. Complete reversal therefore seems rather unlikely.

The fundamental statement – memorization is reproduction, TDM exceptions don’t apply – is likely to stand.

Parallel Lawsuits in the USA

Similar proceedings are also pending in the USA – such as the New York Times lawsuit against OpenAI or the copyright suits by numerous artists against Stability AI and Midjourney. These cases can be expected to take careful note of the Munich ruling.

The Future: Licensing

Long-term, a licensing model will likely establish itself – analogous to the music industry. Collecting societies like GEMA will negotiate framework agreements with major AI providers. Flat-rate licensing fees could enable legally secure use of copyrighted content in the training process.

The alternative – licensing every single use in advance – is factually unworkable given billions of training data points. Until then, however, the legal situation remains uncertain. And legal uncertainty is a grave risk for startups.

H. My Conclusion

The GEMA ruling is a milestone. Not solely because of its result, but because of the legal precision and technical depth with which the 42nd Civil Chamber argued. The court understood what “memorization” means in the context of generative AI – and clearly established that this is relevant under copyright law.

The days of “Move fast and break things” are over. At least in copyright law.