Adobe is set to defend itself against a class action lawsuit in the United States that alleges the company unlawfully used copyrighted books to train its AI models.
The case was filed by Oregon-based author Elizabeth Lyon, who claims Adobe trained its AI systems on her work and the work of many other authors without permission.
At the center of the lawsuit are Adobe’s SlimLM small language models, which are used for document assistance features on mobile devices.
The dataset at the heart of the dispute
Adobe has denied the allegations, stating that SlimLM was trained on SlimPajama-627B, an open-source dataset released in 2023 by Cerebras.
However, the lawsuit argues that SlimPajama is a derivative dataset built on RedPajama, which allegedly includes Books3. Books3, a controversial dataset containing nearly 200,000 pirated books, has already featured prominently in other AI copyright disputes.
Lyon’s argument is that because SlimPajama incorporates RedPajama and therefore Books3, it necessarily includes copyrighted material that was used without authorization.
The complaint states that Adobe “repeatedly downloaded, copied, and processed” these works during both the preprocessing and pretraining phases of its AI models.
Not the first legal battle over Books3
This is far from the first time that RedPajama and Books3 have been named in copyright litigation.
Both datasets have previously appeared in lawsuits against major technology firms, including Apple and Salesforce, as courts begin grappling with how copyright law applies to large-scale AI training.
The Adobe case adds to the growing list of legal challenges facing AI developers accused of relying on datasets that include copyrighted material scraped from the internet.
What the plaintiff is seeking
Lyon has stated she is “committed to vigorously prosecuting this action on behalf of the other members of the class” and claims she has the financial resources to pursue the case.
The lawsuit seeks:
- Statutory and other monetary damages
- Reimbursement of legal costs and attorneys' fees
- A judicial declaration that Adobe engaged in willful copyright infringement
If the case proceeds, it could further clarify the legal risks associated with training AI models on large third-party datasets, even when those datasets are labeled as open source.
Adobe response pending
Adobe has not yet issued a formal public response to the lawsuit beyond its position that SlimLM was trained on an open-source dataset.
As courts continue to examine how AI training intersects with copyright law, this case could become another landmark moment for the future of generative AI development.

