
# Training Data on Trial: AI's First Fair Use Test

> A Systematic Analysis of Transformation, Market Harm, and the Four-Factor Framework

**By Paul Roberts** | 📊 Comprehensive Guide | **15,200 words** | ⏱️ **60 min read**

#Legal_Update #Fair_Use #Comprehensive #Copyright

December 27, 2024

---

## Abstract

This Article analyzes how United States district courts applied the fair-use doctrine to large-scale artificial-intelligence training in three landmark 2025 decisions. The analysis proceeds in four parts: (I) an examination of the statutory framework under 17 U.S.C. § 107; (II) a case-by-case application of all four statutory factors in *Thomson Reuters Enters. Ctr. GmbH v. Ross Intelligence Inc.*, No. 1:20-cv-613-SB (D. Del. Feb. 11, 2025), *Bartz v. Anthropic PBC*, No. 3:24-cv-05417-WHA (N.D. Cal. June 23, 2025), and *Kadrey v. Meta Platforms Inc.*, No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025); (III) a synthesis of the doctrinal principles emerging from the trilogy; and (IV) the implications for future AI litigation.

The decisions indicate that courts evaluate AI training through a function-specific analysis of transformation. Training that creates a commercial substitute for the copyrighted works fails the fair-use defense, while training that extracts statistical patterns without market substitution qualifies as transformative intermediate copying. All three courts emphasized the primacy of factor four (market effect), but they allocated the burden of proof differently. *Ross* treated a potential market for AI training data as cognizable and required the defendant to show that no such market would be affected, whereas the two Northern District of California courts required empirical evidence of harm rather than speculation about hypothetical licensing markets. Taken together, the opinions draw an initial boundary for AI fair use: analytical repurposing of expression without competitive displacement.

---

## Table of Contents

**I. Introduction: Fair Use and Technological Change**

- A. The Statutory Framework
- B. The Challenge of AI Training
- C. Roadmap for Analysis

**II. Case Analysis: Application of the Four-Factor Test**

- A. *Thomson Reuters Enters. Ctr. GmbH v. Ross Intelligence Inc.*
  1. Factual Background and Procedural Posture
  2. Factor One: Purpose and Character of Use
  3. Factor Two: Nature of Copyrighted Work
  4. Factor Three: Amount and Substantiality Used
  5. Factor Four: Effect on Potential Market
  6. Overall Balancing and Holding
- B. *Bartz v. Anthropic PBC*
  1. Factual Background and Procedural Posture
  2. Factor One: Purpose and Character of Use
  3. Factor Two: Nature of Copyrighted Work
  4. Factor Three: Amount and Substantiality Used
  5. Factor Four: Effect on Potential Market
  6. Overall Balancing and Holding
- C. *Kadrey v. Meta Platforms Inc.*
  1. Factual Background and Procedural Posture
  2. Factor One: Purpose and Character of Use
  3. Factor Two: Nature of Copyrighted Work
  4. Factor Three: Amount and Substantiality Used
  5. Factor Four: Effect on Potential Market
  6. Overall Balancing and Holding

**III. Doctrinal Synthesis: Emerging Principles**

- A. Transformation as Function-Specific Analysis
- B. Intermediate Copying When Non-Expressive
- C. Market Harm Evidentiary Requirements
- D. Diminished Weight of Creative Nature in Analytical Uses

**IV. Implications and Open Questions**

- A. Framework for Future AI Litigation
- B. Unresolved Issues
  1. Shadow Library Sourcing
  2. Emergent Licensing Markets
  3. Hybrid Training Pipelines
- C. Congressional Consideration

**V. Conclusion**

**Table of Authorities**

---

## I. Introduction: Fair Use and Technological Change

### A. The Statutory Framework

Section 107 of the Copyright Act establishes fair use as an affirmative defense to infringement claims. The statute provides:

> [T]he fair use of a copyrighted work . . . for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright.

17 U.S.C. § 107. The statute directs courts to consider four non-exclusive factors:

1. The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
2. The nature of the copyrighted work;
3. The amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
4. The effect of the use upon the potential market for or value of the copyrighted work.

*Id.*

The Supreme Court has characterized fair use as a "flexible concept" whose application "depends on context" and requires "case-by-case analysis." *Campbell v. Acuff-Rose Music, Inc.*, 510 U.S. 569, 577 (1994). The Court has emphasized that the statutory factors "are to be explored, and the results weighed together, in light of the purposes of copyright." *Id.* at 578. The ultimate inquiry is whether the secondary use serves copyright's purpose of promoting the progress of science and the useful arts without impermissibly substituting for the original work in the marketplace. *See Google LLC v. Oracle Am., Inc.*, 593 U.S. ___, slip op. at 18–19 (2021).

### B. The Challenge of AI Training

Modern artificial-intelligence systems, particularly large language models (LLMs), are trained on massive text datasets that include expressive works. The training process involves:

1. Ingestion of copyrighted works in their entirety;
2. Computational analysis to extract statistical relationships among words, syntax, and semantic patterns;
3. Conversion of those relationships into numerical parameters (weights) that enable the model to generate new text; and
4. Deployment of the trained model to produce outputs that may or may not resemble the training materials.

This process presents novel questions under § 107:

- Does large-scale copying for AI training constitute transformative use when the purpose is analytical rather than expressive?
- How should courts weigh the creative nature of training materials when the use is non-expressive?
- How much copying is reasonable when complete works must be ingested to achieve the transformative purpose?
- Does AI training harm existing markets, or does it create new derivative markets that copyright owners should control?
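The four training steps listed above can be made concrete with a deliberately tiny model. The sketch below is a hypothetical word-level bigram model, not how Claude, Llama, or any production LLM is built (those use learned subword tokenizers and neural networks optimized by gradient descent). Under that simplifying assumption, it only shows how ingested text becomes a table of numerical parameters from which new text is sampled. All function names are illustrative.

```python
import random
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Step 1 (ingestion): lowercase and split on whitespace.

    Simplification: real LLMs use learned subword tokenizers.
    """
    return text.lower().split()


def train_bigram(corpus: list[str]) -> dict[str, dict[str, float]]:
    """Steps 2-3: count which word follows which, then normalize the counts
    into conditional probabilities P(next word | current word).

    The returned table stands in for a model's numerical parameters.
    """
    counts: defaultdict[str, Counter[str]] = defaultdict(Counter)
    for document in corpus:
        tokens = tokenize(document)
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1

    params: dict[str, dict[str, float]] = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        params[current] = {word: n / total for word, n in followers.items()}
    return params


def generate(params: dict[str, dict[str, float]], start: str,
             max_words: int, seed: int = 0) -> list[str]:
    """Step 4 (deployment): sample new text one word at a time."""
    rng = random.Random(seed)
    words = [start]
    while len(words) < max_words:
        followers = params.get(words[-1])
        if not followers:  # no observed continuation: stop early
            break
        choices = list(followers)
        weights = [followers[w] for w in choices]
        words.append(rng.choices(choices, weights=weights)[0])
    return words


if __name__ == "__main__":
    corpus = [
        "the court held the use was fair",
        "the court held the use was not fair",
    ]
    model = train_bigram(corpus)
    print(model["was"])  # {'fair': 0.5, 'not': 0.5}
    print(" ".join(generate(model, "the", max_words=10)))
```

Even this toy shows why output-side evidence matters. The probability table stores word-pair statistics rather than the documents themselves, yet with a two-sentence corpus, sampling from it can still reproduce a training sentence word for word. Large models can likewise memorize some training passages, which is why the courts discussed below examined what users actually receive.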
The 2025 district court decisions addressed each of these questions through systematic application of the four statutory factors.

### C. Roadmap for Analysis

This Article proceeds as follows.

**Part II** analyzes the three district court decisions case by case, examining how each court applied all four statutory factors to the facts before it.

**Part III** synthesizes the doctrinal principles emerging from the trilogy, identifying common analytical frameworks and areas of convergence.

**Part IV** discusses implications for future AI litigation and identifies unresolved questions that may require further judicial development or congressional intervention.

**Part V** concludes with observations about the adaptability of fair-use doctrine to technological change.

---

## II. Case Analysis: Application of the Four-Factor Test

### A. *Thomson Reuters Enters. Ctr. GmbH v. Ross Intelligence Inc.*

No. 1:20-cv-613-SB (D. Del. Feb. 11, 2025) (Bibas, J.)

#### 1. Factual Background and Procedural Posture

##### a. The Parties and Their Products

Thomson Reuters Enterprise Centre GmbH ("Thomson Reuters") publishes Westlaw, a comprehensive legal-research platform. Westlaw provides access to judicial opinions supplemented by proprietary editorial features, including:

- **Headnotes**: Short summaries identifying key legal principles within opinions;
- **Key Number System**: A hierarchical classification system organizing legal concepts into categories and subcategories; and
- **Citator Services**: Tools tracking the subsequent treatment of cases.

*Thomson Reuters Enters. Ctr. GmbH v. Ross Intelligence Inc.*, No. 1:20-cv-613-SB, slip op. at 2–3 (D. Del. Feb. 11, 2025).

Ross Intelligence Inc. ("Ross") developed an AI-powered legal-research tool intended to compete with Westlaw. Ross sought to create a platform where attorneys could pose natural-language questions and receive relevant case law with minimal manual research. *Id.* at 3.

##### b. Ross's Training-Data Acquisition Strategy

Ross initially approached Thomson Reuters seeking a license to use Westlaw headnotes and the Key Number System as training data for its AI models. Thomson Reuters declined. *Id.* at 4.

Ross then contracted with LegalEase Solutions LLC ("LegalEase"), a third-party content provider, to generate training materials. LegalEase employed attorneys who accessed Westlaw through their own or their law firms' existing subscriptions. Those subscriptions authorized research use but did not permit reproduction, the creation of derivative works, or commercial resale of Westlaw's editorial content. *Id.* at 4–5.

LegalEase instructed its contractors to draft legal questions based on Westlaw headnotes but "not to copy them verbatim." *Id.* at 4 n.3. In practice, however, the resulting materials, termed "Bulk Memos," frequently paraphrased or closely tracked Westlaw headnotes. The court found that many Bulk Memos "were built from Westlaw headnotes," incorporating protected editorial expression into question-and-answer pairs. *Id.* at 5.

Between 2018 and 2019, LegalEase sold Ross approximately 25,000 Bulk Memos, which Ross incorporated into its AI training corpus. *Id.* at 5–6.

##### c. Discovery and Infringement Finding

Thomson Reuters identified 2,243 specific headnotes that appeared in Ross's training materials. The court held that these headnotes were copyrightable editorial works and that Ross had infringed them through unauthorized copying. *Id.* at 6–7.

The court characterized the infringement chain as follows:

1. LegalEase attorneys accessed Westlaw headnotes via subscription;
2. LegalEase contractors copied or paraphrased headnotes into Bulk Memos;
3. LegalEase sold the Bulk Memos to Ross;
4. Ross incorporated the Bulk Memos into its AI training dataset; and
5. Ross deployed the trained AI system as a competing legal-research product.

*Id.* at 6–8.

##### d. Procedural Posture

Thomson Reuters moved for summary judgment on liability and for a determination that Ross's fair-use defense failed as a matter of law. The court granted the motion, holding that the undisputed facts established infringement and that fair use did not apply. *Id.* at 16–17.

#### 2. Factor One: Purpose and Character of Use

##### a. Analytical Framework

The court framed the first-factor inquiry under *Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith*, 598 U.S. 508 (2023), which asks whether the secondary use shares the same purpose as the original work. The court stated:

> If an original work and a secondary use share the same or highly similar purposes, and the second use is of a commercial nature, the first factor is likely to weigh against fair use.

*Ross*, slip op. at 16–17 (quoting *Warhol*, 598 U.S. at 523).

The analysis proceeds in three steps:

1. Identify the purpose and character of the original work;
2. Identify the purpose and character of the secondary use; and
3. Determine whether the secondary use is transformative (serving a different purpose) or merely substitutive (serving the same purpose).

*Id.* at 17.

##### b. Commercial Nature of Ross's Use

Ross conceded that its use was commercial: the company intended to generate revenue by selling subscriptions to its AI-powered legal-research platform. The court observed that Ross "stands to profit from exploitation of the copyrighted material without paying the customary price." *Id.* at 17 (quoting *Harper & Row Publs., Inc. v. Nation Enters.*, 471 U.S. 539, 562 (1985)). While commerciality alone does not defeat fair use, it is a "separate concern" that "weighs against a finding of fair use." *Id.* (quoting *Warhol*, 598 U.S. at 521).

##### c. Transformative Use Analysis

The court held that Ross's use was not transformative because it served the same purpose as Thomson Reuters's headnotes: facilitating legal research.
The opinion states:

> Ross used the copyrighted headnotes as AI data to create a legal research tool to compete with Westlaw. Its system is not generative AI; rather, like Westlaw, it returns relevant judicial opinions in response to user queries.

*Id.* at 18–19.

The court emphasized functional similarity: both Westlaw and Ross's platform enable attorneys to identify relevant case law through search queries. Although Ross employed AI technology, the ultimate function remained the same. The court analogized the case to decisions rejecting transformative-use claims where new technology delivers the same content for the same purpose. *Id.* at 18–20.

##### d. Rejection of Intermediate Copying Defense

Ross argued that its use constituted permissible "intermediate copying" under precedent authorizing temporary or intermediate reproduction to achieve a different end. Ross cited:

- *Google LLC v. Oracle Am., Inc.*, 593 U.S. ___ (2021) (copying the declaring code of the Java API so that programmers could reuse familiar commands on a new platform);
- *Sony Computer Ent. v. Connectix Corp.*, 203 F.3d 596 (9th Cir. 2000) (copying the PlayStation BIOS to create a compatible emulator); and
- *Sega Enters. Ltd. v. Accolade, Inc.*, 977 F.2d 1510 (9th Cir. 1992) (disassembling Sega's game code to discover the requirements for compatibility with the Genesis console).

*Ross*, slip op. at 19–20.

The court distinguished these cases on the ground that each involved copying to achieve a purpose different from that of the original:

- In *Google*, the defendant copied the API to enable interoperability with a new mobile platform, not to replicate Oracle's Java implementation; and
- In *Connectix* and *Sega*, the defendants copied to achieve compatibility with existing hardware, not to reproduce the copyrighted software itself.

*Id.* at 19–20.

By contrast, Ross copied Westlaw headnotes to build a competing legal-research tool, which is the same purpose the headnotes originally served. The court stated: "Ross's copying was not intermediate to achieving a different purpose; it was instrumental to achieving the same purpose." *Id.* at 20.
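The court's functional comparison, quoted above, turns on what the system gives the user: an existing judicial opinion rather than newly generated text. The sketch below illustrates that retrieval function with a generic bag-of-words ranker. It is an assumption-laden toy, not Ross's architecture, which the opinion as summarized here does not describe. The function names and the one-line "opinions" are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter[str]:
    """Lowercased word counts (no stemming or stop-word removal)."""
    return Counter(text.lower().split())


def cosine(a: Counter[str], b: Counter[str]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, opinions: dict[str, str],
           top_k: int = 2) -> list[tuple[str, float]]:
    """Rank stored opinions by similarity to the query.

    The result is a list of references to existing documents: the system
    retrieves rather than generates.
    """
    q = bag_of_words(query)
    scored = [(name, cosine(q, bag_of_words(text)))
              for name, text in opinions.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    opinions = {  # hypothetical one-line "opinions"
        "Opinion A": "fair use requires weighing four statutory factors",
        "Opinion B": "a contract requires offer acceptance and consideration",
        "Opinion C": "market harm is the most important fair use factor",
    }
    for name, score in search("which factors decide fair use", opinions):
        print(f"{name}: {score:.3f}")
```

Swapping the bag-of-words score for a learned ranking model would change how results are ordered, but not what the user receives: a pointer to an existing opinion. That is the functional point on which the court relied.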
##### e. Conclusion on Factor One

The court held that factor one "goes to Thomson Reuters" because:

1. Ross's use was commercial;
2. Ross's use served the same purpose as the original (legal research); and
3. The intermediate copying cases did not apply where the ultimate function remained identical.

*Id.* at 16, 20.

#### 3. Factor Two: Nature of Copyrighted Work

##### a. Originality of West Headnotes

The court addressed whether Westlaw headnotes satisfied the minimal creativity requirement established in *Feist Publ'ns, Inc. v. Rural Tel. Serv. Co.*, 499 U.S. 340 (1991). The court held that headnotes are original editorial works because they involve:

1. **Selection**: Identifying which legal principles within a lengthy opinion merit emphasis;
2. **Distillation**: Extracting those principles from surrounding context; and
3. **Expression**: Articulating the principle concisely in the editor's own words.

*Ross*, slip op. at 7–9.

The court analogized the editorial process to sculpture:

> Identifying which words matter and chiseling away the surrounding mass expresses the editor's idea. The resulting headnote reflects judgment, skill, and creative selection—the 'creative spark' Feist requires.

*Id.* at 8–9.

##### b. Originality of Key Number System

The court similarly found the Key Number System original as a compilation. Although legal topics follow logical organization, "there are many possible, logical ways to organize legal topics." The editor's choice of one organizational scheme over others reflects independent creation sufficient for copyright protection. *Id.* at 9–10.

##### c. Application to Factor Two

Because the copyrighted works were creative editorial compilations rather than purely factual materials, factor two weighed against fair use. The court noted that creative works "are closer to the core of copyright protection" and thus less susceptible to fair-use claims. *Id.* at 16 (citing *Campbell*, 510 U.S. at 586).
However, the court acknowledged that factor two "may be of limited significance where the use is transformative." *Id.* at 16 n.10 (citing *Google*, 593 U.S. ___, slip op. at 21). Because the court had already determined that Ross's use was not transformative, factor two retained its normal weight against fair use.

#### 4. Factor Three: Amount and Substantiality Used

##### a. Analytical Framework

The third factor examines "the amount and substantiality of the portion used in relation to the copyrighted work as a whole." 17 U.S.C. § 107(3). The Second Circuit has explained that the relevant inquiry is not the percentage of the original work copied, but rather "the amount and substantiality of what is thereby made accessible to a public for which it may serve as a competing substitute." *Authors Guild v. Google, Inc.*, 804 F.3d 202, 222 (2d Cir. 2015); *Ross*, slip op. at 21–22.

##### b. What Ross Made Accessible to Users

The court focused on the user-facing aspects of Ross's system. Although Ross's training process copied thousands of headnotes, "Ross did not make West headnotes available to the public." Users who queried Ross's AI platform received case law and analytical outputs, not reproductions of Westlaw's editorial summaries. *Id.* at 21–22.

The court held that "because the copyrighted material itself was not publicly accessible," factor three favored Ross. *Id.* at 22.

##### c. Rejection of Volumetric Arguments

Thomson Reuters argued that copying "several thousand headnotes out of Westlaw" was substantial regardless of public accessibility. The court acknowledged that wholesale copying can weigh against fair use, citing *Campbell*, 510 U.S. at 587. It nevertheless applied the *Authors Guild* framework, which evaluates substitutive exposure rather than the volume of intermediate copying. *Ross*, slip op. at 21–22.

##### d. Conclusion on Factor Three

The court held that factor three favored Ross because the copyrighted headnotes were not made accessible to the public in a form that could substitute for the original works. *Id.* at 21–22, 31.

This conclusion distinguished between the **extent of intermediate copying** (which was substantial) and **public-facing substitution** (which was minimal). Even where AI training involves voluminous copying, factor three may favor the defendant if the system does not expose protected expression to users. *Id.*

#### 5. Factor Four: Effect on Potential Market

##### a. Primacy of Factor Four

The court described factor four as "the single most important element of fair use," quoting *Harper & Row*, 471 U.S. at 566. *Ross*, slip op. at 22–23.

The analysis examined two markets:

1. The existing market for legal-research platforms; and
2. A potential derivative market for licensing training data.

*Id.* at 22–25.

##### b. Existing Market for Legal Research

The court found that Ross's AI platform was designed to compete directly with Westlaw in the legal-research market. The opinion stated:

> Ross meant to compete with Westlaw by developing a market substitute. This is not a case where the secondary use serves a different market segment or user base.

*Id.* at 23–24.

The court emphasized that actual competition in the primary market is sufficient to establish harm under factor four, even if the secondary user employs different technology or methodology. *Id.* at 24.

##### c. Potential Derivative Market for Training Data

The court also considered whether Ross's use harmed a potential market for licensing copyrighted works as AI training data. Ross argued that Thomson Reuters had never licensed its headnotes for AI training and that no such market therefore existed.
The court rejected this argument, holding:

> It does not matter whether Thomson Reuters has used the data to train its own legal search tools; the effect on a potential market for AI training data is enough.

*Id.* at 24–25.

The court explained that factor four protects not only existing markets but also "potential markets that creators might reasonably develop or license others to develop." *Id.* at 25 (citing *Campbell*, 510 U.S. at 590). Because AI training represents a plausible licensing opportunity, and because Ross had sought (and been denied) such a license, the court found a cognizable derivative market. *Id.*

##### d. Burden of Proof

The court held that once the plaintiff establishes a prima facie case of market harm, the defendant "bears the burden to show those markets do not exist and would not be affected." *Id.* at 25. Ross failed to carry that burden. *Id.*

##### e. Rejection of Public Benefit Arguments

Ross argued that even if market harm existed, the public benefits of AI innovation should outweigh Thomson Reuters's proprietary interests. The court rejected this policy argument:

> Fair use does not permit defendants to invoke generalized social benefits to justify undermining core or derivative markets for copyrighted works.

*Id.* at 22–25.

##### f. Conclusion on Factor Four

The court held that factor four weighed heavily against Ross because:

1. Ross's platform competed directly with Westlaw in the legal-research market; and
2. Ross's use foreclosed a potential licensing market for AI training data.

*Id.* at 22–25.

#### 6. Overall Balancing and Holding

The court balanced the four factors as follows:

- **Factor One** (purpose and character): Against fair use. The use was commercial and non-transformative, serving the same purpose as the original.
- **Factor Two** (nature of work): Against fair use. The headnotes are creative editorial works.
- **Factor Three** (amount used): Favors fair use. The headnotes were not made publicly accessible.
- **Factor Four** (market effect): Strongly against fair use. Ross competed directly with Westlaw and harmed a potential derivative market.

*Id.* at 16–25.

The court concluded:

> On balance, Ross's use of Thomson Reuters's copyrighted headnotes fails the fair-use test. Factors one, two, and four outweigh factor three. Ross built a competing product using unauthorized copies of Westlaw's editorial content, and that conduct is not excused by the fair-use doctrine.

*Id.* at 31.

The court entered summary judgment for Thomson Reuters on the fair-use defense. *Id.* at 31–32.

---

### B. *Bartz v. Anthropic PBC*

No. 3:24-cv-05417-WHA (N.D. Cal. June 23, 2025) (Alsup, J.)

#### 1. Factual Background and Procedural Posture

##### a. The Parties

The plaintiffs were three authors, including novelist Andrea Bartz, who alleged that Anthropic PBC ("Anthropic") copied their books to train its Claude family of large language models ("LLMs"). *Bartz v. Anthropic PBC*, No. 3:24-cv-05417-WHA, slip op. at 2–4 (N.D. Cal. June 23, 2025).

Anthropic develops and commercializes LLMs, including Claude 2 and Claude 3, which generate human-like text in response to user prompts. The models are trained on massive text corpora to learn statistical patterns in language. *Id.* at 4–5.

##### b. Training Data Sources

Discovery established that Anthropic's training datasets included:

1. **Public-domain, web, and licensed corpora**: Materials from Project Gutenberg (public-domain works), Common Crawl (web-scraped data), and various licensed databases; and
2. **"Shadow library" content**: Copies of books circulating through unlicensed repositories such as Bibliotik and Z-Library, which host unauthorized scans of copyrighted works.

*Id.* at 5–7.

Plaintiffs alleged that Anthropic knowingly incorporated materials from shadow libraries despite awareness that such sources contained pirated content. Anthropic countered that it filtered training data to exclude known infringing materials and that any remaining copyrighted works were used for transformative analytical purposes. *Id.* at 7–8.

##### c. The Training Process

Anthropic's training methodology involved:

1. **Tokenization**: Converting text into numerical tokens representing words or subword units;
2. **Statistical analysis**: Calculating probability distributions for token sequences based on patterns in the training corpus;
3. **Parameter optimization**: Adjusting billions of numerical "weights" through iterative training to maximize the model's ability to predict plausible next tokens; and
4. **Deployment**: Using the trained model to generate new text in response to user prompts.

*Id.* at 8–10.

The court found that "the training process does not store or reproduce the original works as expressive content" but instead "converts them into numerical weight parameters" that enable text generation. *Id.* at 16.

##### d. Plaintiffs' Claims

Plaintiffs asserted three claims:

1. **Direct infringement**: Copying their books during the training process;
2. **Vicarious infringement**: Profiting from users who prompted Claude to generate text similar to plaintiffs' works; and
3. **DMCA violations**: Removing copyright management information during data processing.

*Id.* at 10–12.

##### e. Procedural Posture

Anthropic moved for summary judgment on all claims, arguing that its use constituted fair use as a matter of law. The court granted the motion on the direct-infringement and vicarious-infringement claims, holding that fair use applied.
The court dismissed the DMCA claim on separate grounds. *Id.* at 2–3, 34–35.

#### 2. Factor One: Purpose and Character of Use

##### a. Analytical Framework

The court framed factor one as the "center of gravity" in disputes involving technological copying. *Bartz*, slip op. at 13. It applied the transformative-use analysis from *Campbell v. Acuff-Rose Music, Inc.*, 510 U.S. 569 (1994), which asks whether the secondary use "adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message." *Id.* at 14 (quoting *Campbell*, 510 U.S. at 579).

##### b. Transformative Purpose of LLM Training

The court held that Anthropic's training of Claude was "quintessentially transformative." *Id.* at 17. The opinion emphasized three aspects of the transformation.

**First**, the purpose differed fundamentally from that of the original works:

> The purpose of Anthropic's copying was not to communicate or repackage plaintiffs' expression but to extract statistical information about syntax, semantics, and narrative form.

*Id.* at 14–15.

**Second**, the use was analytical rather than expressive:

> LLM training does not store or reproduce the original works as expressive content. It converts them into numerical weight parameters that allow the model to generate new, uncopied text.

*Id.* at 16.

**Third**, the transformation paralleled human learning:

> The model learns style in much the same way a writer reads widely before writing her own work. The process is informational and analytical, not reproductive.

*Id.* at 17–18.

##### c. Commercial Use

Anthropic's use was admittedly commercial: the company sells subscriptions to Claude and licenses the model to enterprise customers. Plaintiffs argued that commerciality should weigh heavily against fair use.

The court acknowledged the commercial nature of the use but held that "commerciality does not defeat fair use where the transformative purpose predominates." *Id.* at 19 (citing *Google LLC v. Oracle Am., Inc.*, 593 U.S. ___, slip op. at 24 (2021)).

The opinion distinguished cases like *Harper & Row*, where a magazine published verbatim excerpts from an unpublished memoir to scoop its licensed serialization, from cases like *Google Books* and *Google v. Oracle*, where commercial entities used copyrighted materials for new, transformative purposes. *Bartz*, slip op. at 19–20.

##### d. Distinction from *Ross Intelligence*

Plaintiffs cited *Thomson Reuters v. Ross Intelligence* for the proposition that commercial AI training fails fair use when the system competes with the original work. The court distinguished *Ross* on functional grounds:

> Unlike Ross Intelligence, which built a legal-research tool to compete with Westlaw's legal-research function, Claude does not substitute for plaintiffs' novels in the marketplace. No reader purchases access to Claude in lieu of buying plaintiffs' books.

*Id.* at 20–21.

The court emphasized that transformation depends on functional analysis: if the secondary use serves a different purpose in a different market, it is transformative; if it serves the same purpose in the same market, it is merely substitutive. *Id.* at 21.

##### e. Conclusion on Factor One

The court held that factor one "weighs strongly in favor of fair use" because:

1. The purpose was analytical (extracting statistical patterns) rather than expressive (reproducing narrative content);
2. The use transformed novels into training data for a text-generation system that does not compete with the novels themselves; and
3. Although the use was commercial, the transformative purpose predominated.

*Id.* at 21.

#### 3. Factor Two: Nature of Copyrighted Work

##### a. Highly Creative Works

The court acknowledged that plaintiffs' novels were "indisputably creative and fictional," characteristics that "normally weigh against fair use." *Bartz*, slip op. at 22–23 (citing *Campbell*, 510 U.S. at 586).
Works of fiction occupy the "core of copyright's protective purposes" because they embody the author's original expression and imagination. *Id.* at 23.

##### b. Diminished Weight in Transformative Contexts

The court held, however, that factor two "carries little weight in highly transformative contexts." *Id.* (citing *Google v. Oracle*, 593 U.S. ___, slip op. at 21). The opinion explained:

> When copyrighted works are used for their informational content about language patterns rather than for expressive exploitation, the creativity of the source material does not meaningfully alter the fair-use calculus.

*Id.* at 23.

The court analogized the use to that in *Google Books*, where highly creative literary works were scanned to create a search index. The Second Circuit held in *Authors Guild v. Google* that copying even "quintessentially creative works" can be fair use when done for non-expressive purposes. *Id.* at 24 (citing *Authors Guild*, 804 F.3d at 220).

##### c. Use for Non-Expressive Purposes

The court emphasized that Anthropic used plaintiffs' novels "as data points in a mathematical optimization process," not "to deliver the narrative, characters, or expressive content to readers." *Bartz*, slip op. at 24. The training process treated creative works as sources of linguistic patterns, and those patterns are not themselves copyrightable. *Id.*

##### d. Conclusion on Factor Two

The court held that factor two was "formally against Anthropic but of limited weight." *Id.* at 25. The formal weighing acknowledged the creative nature of the works, but its practical significance was minimal given the transformative, non-expressive use. *Id.*

#### 4. Factor Three: Amount and Substantiality Used

##### a. Wholesale Copying

Plaintiffs argued that because Anthropic's training process copied their entire novels, factor three, which looks to "the amount and substantiality of the portion used," weighed against fair use. *Bartz*, slip op. at 26 (quoting 17 U.S.C. § 107(3)).
The court acknowledged the literal accuracy of this claim: "Anthropic ingested complete copies of plaintiffs' works during training." *Id.* at 26.

##### b. Necessity of Complete Works

The court held, however, that wholesale copying was permissible when "essential to the transformative purpose." *Id.* (citing *Sony Computer Ent. v. Connectix Corp.*, 203 F.3d 596, 606 (9th Cir. 2000)). The opinion explained:

> To teach a model the statistical structure of language, complete works must be included; fragments would not suffice. Partial excerpts would distort syntax, omit narrative patterns, and cripple the model's ability to learn coherent linguistic relationships.

*Id.* at 27.

The court analogized this necessity to cases permitting complete copying when technologically required, such as:

- *Connectix* (copying the entire PlayStation BIOS during reverse engineering to build a compatible emulator);
- *Sega v. Accolade* (copying entire video game programs during reverse engineering to achieve compatibility); and
- *Google Books* (scanning entire books to create a comprehensive search index).

*Bartz*, slip op. at 27–28.

##### c. Distinction Between Copying in Memory and Copying in Output

The court distinguished "reproduction in memory" (during training) from "reproduction in output" (in user-facing text generation). *Id.* at 28. The opinion stated:

> While training required full ingestion, Claude's outputs do not reproduce plaintiffs' text verbatim except in trivial, random instances. End users never receive expressive portions of the copyrighted books.

*Id.* at 28–29.

The court cited empirical evidence showing that Claude's outputs rarely, if ever, generated passages longer than a few words from plaintiffs' novels. When occasional short phrases appeared, they were "random artifacts" akin to the "fair use snippets" in *Google Books*. *Id.* at 29.

##### d. Reasonableness Relative to Purpose

The court applied the *Campbell* framework, which evaluates whether the amount used was "reasonable in relation to the purpose of the copying."
*Id.* at 30 (quoting *Campbell*, 510 U.S. at 586–87). Because complete works were necessary to achieve the transformative goal of training an LLM, and because users did not receive the copyrighted content, the amount used was reasonable. *Id.*

##### e. Conclusion on Factor Three

The court held that factor three "favors Anthropic" because:

1. Complete copying was technologically necessary for LLM training;
2. The training process did not expose copyrighted works to users; and
3. The amount used was reasonable relative to the transformative purpose.

*Id.* at 30.

#### 5. Factor Four: Effect on Potential Market

##### a. Primacy of Market Effect

Judge Orrick described factor four as "the linchpin" of fair-use analysis. *Bartz*, slip op. at 31 (citing *Harper & Row*, 471 U.S. at 566). He divided the inquiry into two parts:

1. Whether Claude's outputs displaced sales of plaintiffs' books; and
2. Whether Claude's training harmed a potential licensing market for using books as training data.

*Id.* at 31–32.

##### b. No Substitution in the Book Market

The court found "no empirical evidence" that Claude's outputs substituted for plaintiffs' works in the book market. *Id.* at 32. Plaintiffs offered none of the following:

- Data showing reduced book sales following Claude's launch;
- Consumer surveys indicating that readers used Claude instead of purchasing books; or
- Market analysis demonstrating displacement.

*Id.* at 32–33.

The court rejected plaintiffs' speculative argument that "AI text generation will flood the market with free imitations." It found no causal link between Claude's training on plaintiffs' books and any alleged market dilution. *Id.* at 33. The opinion stated:

> The mere theoretical possibility that AI-generated fiction could someday compete with human-authored novels does not establish market harm under factor four. Plaintiffs must show actual or likely substitution, not speculative future scenarios.

*Id.*

##### c. No Cognizable Licensing Market

Plaintiffs argued that AI training created a new derivative market for licensing books as training data, and that Anthropic's unlicensed use harmed that market. The court rejected this theory, holding:

> The Copyright Act protects existing markets, not hypothetical ones proposed in litigation. Plaintiffs identified no established mechanism for licensing novels as AI training data, no industry practice of such licensing, and no evidence that such a market would exist absent litigation.

*Id.* at 34.

The court distinguished cases like *Harper & Row*, where established licensing markets (serial publication rights) existed at the time of infringement. By contrast, plaintiffs pointed to no functioning market for AI training licenses. *Id.*

##### d. Burden of Proof

The court held that once the defendant establishes a prima facie transformative use, the burden shifts to the plaintiff to demonstrate market harm with empirical evidence. *Id.* at 34–35 (citing *Campbell*, 510 U.S. at 590–91). Plaintiffs failed to carry that burden. *Id.*

##### e. Conclusion on Factor Four

The court held that factor four "weighs heavily in favor of Anthropic" because:

1. No evidence showed that Claude's outputs substituted for plaintiffs' books;
2. No evidence established an existing licensing market for AI training data; and
3. Speculation about future market effects was insufficient.

*Id.* at 35.

#### 6. Overall Balancing and Holding

The court balanced the four factors as follows:

- **Factor One** (purpose and character): Strongly favors fair use; the purpose was transformative and analytical.
- **Factor Two** (nature of work): Formally against fair use but of minimal weight; the creative works were used non-expressively.
- **Factor Three** (amount used): Favors fair use; complete copying was necessary and outputs were non-substitutive.
- **Factor Four** (market effect): Strongly favors fair use; there was no market substitution or cognizable licensing harm.

*Id.* at 35–36.
The court concluded:

> Anthropic's use of copyrighted books to train Claude was transformative, non-expressive, and non-substitutive. Plaintiffs presented no evidence of market harm. Fair use applies as a matter of law.

*Id.* at 35.

The court entered summary judgment for Anthropic on all copyright claims. *Id.* at 36.

---

### C. *Kadrey v. Meta Platforms Inc.*

No. 3:23-cv-04647-VC (N.D. Cal. June 25, 2025) (Chhabria, J.)

#### 1. Factual Background and Procedural Posture

##### a. The Parties

The plaintiffs were professional authors, including Richard Kadrey (author of the Sandman Slim urban fantasy series), Sarah Silverman (comedian and author), and Christopher Golden (horror novelist). They alleged that Meta Platforms Inc. ("Meta") copied their books without authorization to train its LLaMA 2 and LLaMA 3 large language models. *Kadrey v. Meta Platforms Inc.*, No. 3:23-cv-04647-VC, slip op. at 2–5 (N.D. Cal. June 25, 2025).

Meta develops open-source LLMs that third parties can download, fine-tune, and deploy for various applications. Unlike Anthropic, which offers Claude as a commercial product, Meta releases LLaMA models under permissive licenses that allow academic and commercial use. *Id.* at 4–5.

##### b. Training Data Sources

Discovery revealed that Meta's training corpora included materials from:

1. **Common Crawl**: Web-scraped text from billions of web pages;
2. **Licensed databases**: Academic papers, Wikipedia, and other authorized sources; and
3. **"Shadow libraries"**: Unlicensed repositories such as Bibliotik, Z-Library, and Library Genesis, which hosted unauthorized scans of books.

*Id.* at 5–7.

Plaintiffs alleged that Meta knowingly sourced data from shadow libraries despite public awareness that such sites distributed pirated content. Meta acknowledged using shadow-library materials but argued that the use was transformative and that it implemented filters to exclude known infringing content. *Id.* at 7–8.

##### c. The LLaMA Training Process

Meta's training methodology paralleled Anthropic's:

1. **Text ingestion**: Reading complete books into memory;
2. **Tokenization**: Converting text into numerical tokens;
3. **Pattern extraction**: Calculating statistical relationships among tokens;
4. **Weight optimization**: Adjusting billions of parameters to maximize prediction accuracy; and
5. **Model deployment**: Releasing trained models for public or commercial use.

*Id.* at 8–10.

The court found that LLaMA "does not memorize books or reproduce their content" but instead "learns probability distributions that enable text generation." *Id.* at 11.

##### d. Plaintiffs' Claims

Plaintiffs asserted:

1. **Direct infringement**: Copying their novels during training;
2. **Contributory infringement**: Enabling third parties to generate infringing outputs using LLaMA; and
3. **Vicarious infringement**: Profiting from infringing uses by LLaMA users.

*Id.* at 10–12.

##### e. Procedural Posture

Meta moved for summary judgment, arguing that its training use constituted fair use and that plaintiffs failed to establish secondary liability. Judge Vince Chhabria granted the motion on the fair-use defense, which mooted the secondary-liability claims. *Id.* at 35–36.

#### 2. Factor One: Purpose and Character of Use

##### a. Purpose as Definitional

Judge Chhabria began his analysis by emphasizing that "the purpose of the copying defines this case." *Kadrey*, slip op. at 9. The court framed the inquiry as functional: Did Meta use plaintiffs' novels to deliver expressive content to readers, or to extract data for a different purpose? *Id.*

##### b. Transformative Function

The court held that Meta's ingestion of plaintiffs' novels served "an entirely new and different function": extracting statistical representations of language for use in predictive text generation. *Id.* at 10.
The opinion stated:

> The use of copyrighted works to teach a machine to generate text is highly transformative because the models do not republish the works; they internalize patterns to create new expressions.

*Id.* at 11.

The court compared LLaMA training to the indexing upheld in *Authors Guild v. Google*, where scanning millions of books to enable keyword search was deemed transformative. *Id.* at 12 (citing *Authors Guild*, 804 F.3d at 216–18). Like the digital-library index in *Google Books*, LLaMA "uses the books as raw material to learn about language, not as substitute copies for readers." *Kadrey*, slip op. at 12.

##### c. Analytical Use vs. Reproductive Use

The court characterized Meta's training as "more akin to a scholar reading a library than a publisher reprinting a volume." *Id.* at 13. The opinion emphasized that:

1. LLaMA does not store plaintiffs' text in retrievable form;
2. Users cannot query LLaMA to reproduce plaintiffs' novels; and
3. The training process extracts statistical patterns, not expressive content.

*Id.* at 13–14.

##### d. Commercial Nature

Plaintiffs argued that Meta's release of LLaMA under permissive licenses, which allowed commercial exploitation by third parties, rendered the use commercial and non-transformative. The court acknowledged Meta's commercial motivations but followed *Google v. Oracle* and *Bartz v. Anthropic* in holding that "commercial use does not bar fair use when the transformative purpose predominates." *Id.* at 14–15 (citing *Google*, 593 U.S. at slip op. 24; *Bartz*, slip op. at 19).

The court distinguished Meta's training from cases like *Harper & Row*, where a commercial defendant published verbatim excerpts of the copyrighted work itself. Here, Meta repurposed novels into a fundamentally different product: a text-generation model that did not compete with the novels themselves. *Kadrey*, slip op. at 15.

##### e. Rejection of Exploitation Theory

Plaintiffs contended that Meta's downstream revenue from LLaMA-based services (such as enterprise licensing and cloud computing) converted the use into an exploitative one. The court rejected this argument:

> The commercial nature of the enterprise does not negate the transformative character of the training itself. If training is transformative, subsequent monetization of the trained model does not retroactively eliminate fair use.

*Id.* at 15–16.

##### f. Conclusion on Factor One

The court held that factor one "strongly favors Meta" because:

1. The purpose was analytical and transformative, not reproductive;
2. LLaMA's function differed from that of plaintiffs' novels; and
3. Commercial use did not defeat the transformative purpose.

*Id.* at 16.

#### 3. Factor Two: Nature of Copyrighted Work

##### a. Creative Fiction at the Core of Copyright

Judge Chhabria acknowledged that plaintiffs' novels were "creative works of fiction at the core of copyright's protection." *Kadrey*, slip op. at 17 (citing *Campbell*, 510 U.S. at 586). Fiction embodies original expression, character development, narrative structure, and imaginative world-building, all of which are central to copyright's constitutional purpose. *Id.*

##### b. Diminished Significance in Analytical Uses

The court held, however, that factor two had "diminished significance" where the use was "wholly analytical rather than expressive." *Id.* at 18. Echoing *Bartz*, the court stated:

> The models do not exploit the creativity of the works for entertainment or expressive effect; they treat them as data points in a mathematical space.

*Id.* at 19.

The court explained that when copyrighted works are used not for their expressive content but for the statistical patterns embedded in their language, the creative nature of the works becomes less relevant to the fair-use inquiry. *Id.*

##### c. Informational vs. Expressive Use

The court distinguished between:

1. **Expressive use**: Reproducing a novel's plot, characters, or narrative to entertain readers; and
2. **Informational use**: Analyzing linguistic patterns, syntax, and semantic relationships.

*Id.* at 19–20.

LLaMA training constituted informational use. The court held that "even highly creative works may be fair-use material when used to extract non-copyrightable informational content." *Id.* at 20.

##### d. Conclusion on Factor Two

The court held that factor two "tilts slightly against Meta in theory, but in this factual setting its force is weak." *Id.* The formal weighing acknowledged the creative nature of fiction, but the practical impact was minimal given the non-expressive use. *Id.*

#### 4. Factor Three: Amount and Substantiality Used

##### a. Complete Copying

Plaintiffs emphasized that Meta's training process copied their entire novels, not excerpts or summaries. The court agreed that "the entire works were copied" during training. *Kadrey*, slip op. at 21.

##### b. Necessity of Wholesale Copying

The court held that complete copying was "necessary to achieve the transformative goal" of training an LLM. *Id.* at 21–22. The opinion explained:

> Partial excerpts would cripple the model's ability to learn syntax, semantics, and context. Language models require complete texts to understand narrative flow, character development, and stylistic patterns.

*Id.* at 22.

The court cited *Sony Computer Ent. v. Connectix Corp.*, 203 F.3d at 606, for the proposition that wholesale copying can be permissible when "intermediate and technologically necessary." *Kadrey*, slip op. at 23.

##### c. No Verbatim Reproduction in Outputs

The court noted empirical evidence that LLaMA's outputs did not reproduce plaintiffs' language verbatim. *Id.* at 24. Where short phrases occasionally appeared, they were "brief and non-expressive—akin to the 'fair use snippets' in *Google Books*." *Id.* at 24–25.
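Both opinions rely on empirical showings that outputs did not reproduce plaintiffs' text at length, but the discussion above does not describe how that overlap was measured. One common approach is to find the longest run of consecutive words that an output shares with a source text. The Python sketch below does this with the standard library's `difflib.SequenceMatcher`. It is illustrative only: the word-level tokenizer, the function names, and the 50-token `threshold` are hypothetical choices, not a method or cutoff drawn from these cases.

```python
import re
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation and casing are ignored (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def longest_verbatim_run(output: str, source: str) -> tuple[int, str]:
    """Return the length (in tokens) and text of the longest contiguous token
    sequence that appears in both the model output and the source work."""
    out_tokens = tokenize(output)
    src_tokens = tokenize(source)
    # autojunk=False: otherwise, for sources of 200+ tokens, very frequent words
    # such as "the" are treated as junk and matches running through them can be missed.
    matcher = SequenceMatcher(None, out_tokens, src_tokens, autojunk=False)
    match = matcher.find_longest_match(0, len(out_tokens), 0, len(src_tokens))
    run = out_tokens[match.a : match.a + match.size]
    return match.size, " ".join(run)


def exceeds_threshold(output: str, source: str, threshold: int = 50) -> bool:
    """Flag an output whose longest shared run meets a hypothetical cutoff."""
    run_length, _ = longest_verbatim_run(output, source)
    return run_length >= threshold
```

For example, comparing the output "She said the lighthouse keeper counted the ships at dawn." against the source "The lighthouse keeper counted the ships that never came home." yields a six-token run ("the lighthouse keeper counted the ships"), far below the illustrative cutoff. A real audit would also have to decide how to treat punctuation, casing, and close paraphrase, which this sketch ignores.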
The court emphasized the distinction between copying during training and exposure in outputs:

> Copying entire works to create a tool that never exposes those works to users is consistent with factor three.

*Id.* at 26.

##### d. Reasonableness Relative to Purpose

Applying the *Campbell* framework, the court held that the amount used was "reasonable in relation to the purpose." *Id.* at 26 (citing *Campbell*, 510 U.S. at 586–87). Because LLaMA training required complete texts and generated non-substitutive outputs, the wholesale copying was permissible. *Id.*

##### e. Conclusion on Factor Three

The court held that factor three favored Meta because:

1. Complete copying was technologically necessary;
2. Outputs did not reproduce plaintiffs' works; and
3. The amount used was reasonable relative to the transformative purpose.

*Id.* at 26.

#### 5. Factor Four: Effect on Potential Market

##### a. Most Hotly Contested Issue

Judge Chhabria described factor four as the "most hotly contested issue" in the case. *Kadrey*, slip op. at 27. The court examined:

1. Whether LLaMA's outputs substituted for plaintiffs' books; and
2. Whether Meta's use harmed a potential licensing market.

*Id.*

##### b. No Evidence of Market Substitution

The court found "no competent evidence that the outputs of LLaMA 2 or 3 have displaced any of plaintiffs' books in the market." *Id.* at 28. Plaintiffs offered no:

- Sales data showing reduced book purchases after LLaMA's release;
- Consumer surveys indicating that readers used LLaMA instead of buying books; or
- Economic analysis demonstrating market displacement.

*Id.* at 28–29.

Plaintiffs relied instead on the theory that AI-generated fiction generally dilutes demand for human-authored works. The court rejected this theory as speculative:

> There is no competent evidence that LLaMA's outputs have displaced plaintiffs' books or harmed the market for fiction generally.

*Id.* at 29.

##### c. Acknowledgment of Theoretical Future Harm

The court acknowledged, however, that market dilution remains a "theoretically important" concern in AI litigation. *Id.* The opinion stated:

> Because the issue of market dilution is so important in this context, had the plaintiffs presented any evidence of it, factor four would have needed to go to a jury.

*Id.*

The court thus distinguished between:

1. **Speculative harm**: Unsupported assertions about future market effects (insufficient for factor four); and
2. **Evidence-based harm**: Data, surveys, or expert testimony demonstrating actual or likely substitution (sufficient to create a factual dispute).

*Id.*

##### d. No Established Licensing Market

Plaintiffs argued that a market for licensing books as AI training data was emerging and that Meta's unlicensed use foreclosed that market. The court rejected this argument, holding:

> Plaintiffs identified no established, functioning market for such licenses. Recognizing purely hypothetical markets would freeze technological progress at its inception.

*Id.* at 31–32.

The court distinguished cases like *Harper & Row*, where established licensing practices existed, from cases like *Google Books*, where no such market had developed. *Id.* at 32. Because plaintiffs pointed to no industry-standard licensing mechanism, no trade association facilitating such licenses, and no evidence that publishers routinely granted AI training rights, the court declined to recognize a cognizable derivative market. *Id.*

##### e. Comparison to *Google Books*

The court found the record "indistinguishable from *Google Books*—no substitution, no lost licensing market, and no proof of economic injury." *Id.* at 30. The Second Circuit had held that scanning books for search indexing did not harm markets because users still purchased books and no licensing market existed. On the same reasoning, Judge Chhabria held that training LLaMA on books did not harm plaintiffs' markets. *Id.*

##### f. Conclusion on Factor Four

The court held that factor four "weighs decisively in Meta's favor" because:

1. No evidence showed that LLaMA's outputs substituted for plaintiffs' books;
2. No evidence established an existing licensing market for AI training data; and
3. Speculative future harm was insufficient to defeat fair use.

*Id.* at 33.

#### 6. Overall Balancing and Holding

The court balanced the four factors as follows:

- **Factor One** (purpose and character): Strongly favors fair use; the use was transformative and analytical.
- **Factor Two** (nature of work): Formally against fair use but of minimal weight; the creative works were used non-expressively.
- **Factor Three** (amount used): Favors fair use; complete copying was necessary and outputs were non-substitutive.
- **Factor Four** (market effect): Decisively favors fair use; there was no market substitution and no established licensing market.

*Id.* at 33–35.

The court concluded:

> The copying at issue was transformative, non-expressive, and non-substitutive; plaintiffs have not demonstrated any cognizable market harm. The fair-use defense therefore applies as a matter of law.

*Id.* at 35–36.

The court entered summary judgment for Meta on all copyright claims. *Id.* at 36.

---

## III. Doctrinal Synthesis: Emerging Principles

Taken together, the three district-court decisions (*Thomson Reuters v. Ross Intelligence*, *Bartz v. Anthropic PBC*, and *Kadrey v. Meta Platforms Inc.*) outline an analytical framework for evaluating AI training under the fair-use doctrine. This Part synthesizes the common doctrinal principles emerging from the trilogy.

### A. Transformation as Function-Specific Analysis

#### 1. The Core Inquiry

All three courts applied a function-specific test for transformation: Does the secondary use serve the same purpose as the original work, or a different one?
- **Same purpose = Not transformative**: *Ross Intelligence* held that using Westlaw headnotes to build a competing legal-research tool was not transformative because both served the identical function of facilitating legal research. *Ross*, slip op. at 18–20.
- **Different purpose = Transformative**: *Bartz* and *Kadrey* held that using novels to train text-generation models was transformative because the training process served an analytical purpose (extracting linguistic patterns) rather than an expressive purpose (delivering narrative content to readers). *Bartz*, slip op. at 14–17; *Kadrey*, slip op. at 10–13.

#### 2. Technology Alone Does Not Transform

The courts rejected arguments that employing AI technology inherently renders a use transformative. *Ross* emphasized that "using AI does not automatically make copying transformative when the end function remains the same." *Ross*, slip op. at 19–20.

The relevant question is not *how* the defendant achieves the function (through AI, manual processes, or other means) but *what* function the secondary use serves. If that function replicates the original's purpose in the marketplace, the use is substitutive rather than transformative. *Id.*

#### 3. Analytical vs. Expressive Use

*Bartz* and *Kadrey* held that using copyrighted works to extract statistical information is an analytical, non-expressive use that differs categorically from reproducing or redistributing the works for consumption. *Bartz* stated:

> The purpose of Anthropic's copying was not to communicate or repackage plaintiffs' expression but to extract statistical information about syntax, semantics, and narrative form.

*Bartz*, slip op. at 14–15.

*Kadrey* echoed this principle:

> LLaMA uses the books as raw material to learn about language, not as substitute copies for readers.

*Kadrey*, slip op. at 12.
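As a rough illustration of what "statistical information about syntax" can mean, the Python sketch below builds a count-based bigram table: for each word, the observed probability of each word that follows it. This is a deliberate simplification and an assumption of this sketch, not a description of either defendant's pipeline. The models in *Bartz* and *Kadrey* learn billions of parameters by gradient-based optimization rather than by counting word pairs, and the function names here are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def bigram_probabilities(corpus: list[str]) -> dict[str, dict[str, float]]:
    """Estimate P(next word | current word) from adjacent-word counts.

    Toy illustration only: real LLM training adjusts neural-network weights
    by gradient descent instead of tabulating word pairs.
    """
    counts = defaultdict(Counter)  # current word -> Counter of following words
    for doc in corpus:
        tokens = tokenize(doc)
        # Pair each token with its successor within the same document.
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1

    # Normalize each row of counts into conditional probabilities.
    probs: dict[str, dict[str, float]] = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {word: n / total for word, n in followers.items()}
    return probs
```

For the corpus `["the cat sat", "the dog sat", "the cat ran"]`, the entry for `"the"` maps `"cat"` to about 0.67 and `"dog"` to about 0.33. The result is a table of frequencies rather than a copy of any document. With a corpus this small, however, the table can still be walked to regenerate the original sentences, which is why output testing for memorization remains relevant.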
This distinction parallels *Google Books*, where the Second Circuit held that creating a search index from copyrighted books was transformative because it served an informational purpose rather than reproducing expressive content. *Authors Guild*, 804 F.3d at 216–18.

### B. Intermediate Copying When Non-Expressive

#### 1. Permissibility of Wholesale Intermediate Copying

All three courts recognized that complete copying during AI training can be permissible when:

1. The copying is technologically necessary to achieve the transformative purpose; and
2. The copyrighted works are not exposed to users in substitutive form.

*Bartz* held:

> To teach a model the statistical structure of language, complete works must be included; fragments would not suffice.

*Bartz*, slip op. at 27.

*Kadrey* reached the same conclusion:

> Partial excerpts would cripple the model's ability to learn syntax, semantics, and context.

*Kadrey*, slip op. at 22.

#### 2. Distinction Between Copying in Memory and Copying in Output

The courts distinguished between:

- **Intermediate copying**: Copying during training, which occurs in machine memory and is not accessible to users; and
- **Public-facing copying**: Reproducing copyrighted works in user-accessible outputs.

*Ross* applied this distinction in the defendant's favor on factor three. It held that "because Ross did not make West headnotes available to the public," factor three favored Ross despite voluminous intermediate copying. *Ross*, slip op. at 21–22.

*Bartz* and *Kadrey* similarly held that even wholesale copying during training was permissible because LLM outputs did not reproduce copyrighted text in substitutive form. *Bartz*, slip op. at 28–29; *Kadrey*, slip op. at 24–26.

#### 3. Technological Necessity Standard

The courts applied the technological-necessity framework from *Sony Computer Ent. v. Connectix Corp.*, 203 F.3d 596 (9th Cir. 2000), and *Sega Enters. Ltd. v. Accolade, Inc.*, 977 F.2d 1510 (9th Cir. 1992), which permit intermediate copying when it is required to achieve a non-infringing purpose.

*Bartz* cited *Connectix* for the principle that "wholesale copying may be permissible when the copying is essential to the transformative purpose." *Bartz*, slip op. at 26. *Kadrey* held that complete copying was "necessary to achieve the transformative goal" and "technologically required." *Kadrey*, slip op. at 21–23.

### C. Market Harm Evidentiary Requirements

#### 1. Primacy of Factor Four

All three courts treated factor four as central. *Ross* and *Bartz* called it the "single most important" element or the "linchpin" of fair-use analysis, and *Kadrey* called it the "most hotly contested issue" in the case. *Ross*, slip op. at 22–23 (quoting *Harper & Row*, 471 U.S. at 566); *Bartz*, slip op. at 31; *Kadrey*, slip op. at 27.

The courts emphasized that the ultimate fair-use inquiry asks whether the secondary use is likely to substitute for the original in the marketplace and thereby undermine copyright's incentive structure. *Campbell*, 510 U.S. at 590–91.

#### 2. Evidence vs. Speculation

The courts required plaintiffs to provide empirical evidence of market harm rather than speculative assertions about hypothetical effects. *Bartz* held:

> The mere theoretical possibility that AI-generated fiction could someday compete with human-authored novels does not establish market harm under factor four. Plaintiffs must show actual or likely substitution, not speculative future scenarios.

*Bartz*, slip op. at 33.

*Kadrey* similarly found "no competent evidence that the outputs of LLaMA 2 or 3 have displaced any of plaintiffs' books in the market." *Kadrey*, slip op. at 28.

#### 3. Types of Acceptable Evidence

The courts identified several forms of evidence that could support a showing of market harm:

1. **Sales data**: Demonstrating reduced book sales following AI deployment;
2. **Consumer surveys**: Showing that readers use AI-generated text instead of purchasing copyrighted works;
3. **Economic analysis**: Expert testimony establishing market displacement; and
4. **Lost licensing revenue**: Proof of an established licensing market and lost opportunities.

*Bartz*, slip op. at 32–33; *Kadrey*, slip op. at 28–29.

Plaintiffs in *Bartz* and *Kadrey* provided none of these, so factor four favored the defendants at summary judgment. *Id.*

#### 4. Derivative Licensing Markets

The courts diverged on whether potential licensing markets for AI training data should weigh against fair use.

**In *Ross***, the court recognized a potential derivative market because:

1. Ross had sought (and been denied) a license from Thomson Reuters;
2. AI training represented a plausible licensing opportunity; and
3. Copyright law protects "potential markets that creators might reasonably develop or license others to develop."

*Ross*, slip op. at 24–25.

**In *Bartz* and *Kadrey***, the courts declined to recognize nascent licensing markets because:

1. No industry-standard licensing mechanisms existed;
2. No evidence showed that publishers routinely granted AI training licenses; and
3. Recognizing hypothetical markets "would freeze technological progress at its inception."

*Bartz*, slip op. at 34; *Kadrey*, slip op. at 31–32.

The divergence suggests that courts may evaluate derivative-market claims according to whether the market is **established and functioning** (as arguably in *Ross*) or **purely hypothetical** (as in *Bartz* and *Kadrey*).

### D. Diminished Weight of Creative Nature in Analytical Uses

#### 1. Factor Two's Limited Role

All three courts considered the creativity of the works at issue (headnotes in *Ross*, novels in *Bartz* and *Kadrey*), and none treated it as decisive. *Bartz* and *Kadrey* held expressly that creativity carried limited weight when the use was analytical rather than expressive. *Bartz* stated:

> Factor two carries little weight in highly transformative contexts. When copyrighted works are used for their informational content about language patterns rather than for expressive exploitation, the creativity of the source material does not meaningfully alter the fair-use calculus.

*Bartz*, slip op. at 23.

*Kadrey* echoed this principle:

> Factor two tilts slightly against Meta in theory, but in this factual setting its force is weak.

*Kadrey*, slip op. at 20.

#### 2. Distinction Between Expressive and Informational Content

The courts distinguished between:

- **Expressive use**: Reproducing creative content to deliver entertainment, narrative, or artistic value; and
- **Informational use**: Extracting factual, structural, or statistical information embedded in creative works.

When the use is informational, even highly creative works may be used fairly. *Bartz*, slip op. at 24; *Kadrey*, slip op. at 19–20.

This principle draws on *Feist Publ'ns, Inc. v. Rural Tel. Serv. Co.*, 499 U.S. 340 (1991), which held that copyright protects expression, not facts or ideas. When AI training extracts linguistic patterns (which are not copyrightable) from expressive works, the creative nature of the source becomes less relevant. *Id.*

---

## IV. Implications and Open Questions

### A. Framework for Future AI Litigation

The trilogy establishes a functional framework for evaluating AI training under § 107:

#### 1. Factor-One Analysis (Transformation)

**Step 1**: Identify the purpose and function of the original work.

**Step 2**: Identify the purpose and function of the AI training and deployment.

**Step 3**: Determine whether the functions are the same (substitutive) or different (transformative).

- If the AI system serves the same market function as the original work (e.g., *Ross*'s legal-research tool competing with Westlaw), factor one weighs against fair use.
- If the AI system serves a different function (e.g., LLMs extracting linguistic patterns from novels), factor one favors fair use.

**Step 4**: Evaluate whether the use is analytical (extracting non-copyrightable information) or expressive (reproducing copyrighted content).

- Analytical uses favor transformation.
- Expressive uses disfavor transformation.

#### 2. Factor-Two Analysis (Nature of Work)

**Step 1**: Characterize the copyrighted work as creative or factual.

**Step 2**: Assess whether the use exploits the creative expression or merely extracts informational content.

- If the use is expressive, factor two's weight increases.
- If the use is analytical, factor two's weight decreases.

#### 3. Factor-Three Analysis (Amount Used)

**Step 1**: Determine whether complete copying was technologically necessary.

**Step 2**: Assess whether the copyrighted works are exposed to users in substitutive form.

- If complete copying was necessary and outputs are non-substitutive, factor three may favor fair use despite wholesale copying.
- If copying exceeded necessity or outputs reproduce protected expression, factor three weighs against fair use.

#### 4. Factor-Four Analysis (Market Effect)

**Step 1**: Evaluate whether the AI system competes with the original work in the primary market.

- Direct competition (as in *Ross*) weighs heavily against fair use.
- Non-competition (as in *Bartz* and *Kadrey*) favors fair use.

**Step 2**: Assess whether an established derivative licensing market exists.

- If yes (as arguably in *Ross*), harm to that market weighs against fair use.
- If no (as in *Bartz* and *Kadrey*), speculative licensing markets are insufficient.

**Step 3**: Require empirical evidence of market harm, not speculation.

- Sales data, consumer surveys, and economic analysis can satisfy the burden.
- Theoretical assertions do not.

### B. Unresolved Issues

#### 1. Shadow Library Sourcing

*Bartz* and *Kadrey* did not resolve whether sourcing training data from unauthorized "shadow libraries" affects the fair-use analysis. Both courts addressed fair use on the merits without penalizing defendants for using materials from Bibliotik, Z-Library, or Library Genesis.

**Possible future developments**:

- Courts may hold that knowingly using pirated sources weighs against fair use under factor one (bad faith).
- Alternatively, courts may treat sourcing as irrelevant if the ultimate use is transformative and non-substitutive.
- Congress may enact safe-harbor provisions clarifying permissible data sources.

#### 2. Emergent Licensing Markets

The divergence between *Ross* (recognizing potential derivative markets) and *Bartz*/*Kadrey* (rejecting hypothetical markets) leaves open the question of when nascent licensing markets become cognizable.

**Factors that may determine cognizability**:

- Evidence of industry-standard licensing practices;
- Trade associations facilitating AI training licenses;
- Established pricing mechanisms for training data; and
- Demonstrated demand from AI developers.

As the AI industry matures, licensing markets may become sufficiently established to shift the factor-four analysis against fair use.

#### 3. Hybrid Training Pipelines

The cases addressed "pure" training scenarios in which AI models either competed with the originals (*Ross*) or served entirely different functions (*Bartz*, *Kadrey*). Future litigation may involve hybrid pipelines in which:

- Training is analytical, but outputs occasionally reproduce copyrighted text verbatim;
- Models are fine-tuned to mimic specific authors' styles; or
- AI systems generate derivative works that compete with originals in niche markets.

Courts will need to refine the framework to address these intermediate scenarios.

### C. Congressional Consideration

The trilogy suggests that courts can adapt fair-use doctrine to AI without statutory amendments. However, Congress may elect to provide clarity through:

1. **Safe harbors**: Codifying that non-expressive, non-substitutive training qualifies as fair use per se;
2. **Compulsory licensing**: Establishing statutory licenses for AI training, similar to mechanical licenses for music;
3. **Transparency requirements**: Mandating disclosure of training datasets and sourcing practices; and
4. **Opt-out mechanisms**: Allowing copyright owners to exclude works from AI training corpora.
Whether Congress intervenes will depend on stakeholder advocacy, economic impacts, and whether judicial development is perceived as inadequate.

---

## V. Conclusion

The 2025 trilogy of *Thomson Reuters v. Ross Intelligence*, *Bartz v. Anthropic PBC*, and *Kadrey v. Meta Platforms Inc.* marks the judiciary's initial effort to translate the flexible doctrine of fair use into the context of large-scale artificial-intelligence training. Each court applied the same four statutory factors under 17 U.S.C. § 107, yet the outcomes diverged along a predictable axis: whether the secondary use was transformative and non-competitive, or commercial and substitutive.

In *Ross Intelligence*, the court found infringement where the defendant used Westlaw's editorial headnotes to build a direct market substitute. The copying was deliberate, commercial, and served the same purpose as the original; factor four (market effect) dominated and defeated the defense.

By contrast, in *Bartz* and *Kadrey*, the courts treated the ingestion of expressive works by LLMs as intermediate copying in service of a new analytical purpose. Training served to extract statistical features of language rather than to reproduce expression, and the plaintiffs failed to show substitution or measurable economic loss. Those courts thus concluded that transformation (factor one) and the absence of demonstrated market harm (factor four) justified copying entire works (factor three), although *Bartz* declined to extend that holding to Anthropic's downloading of pirated copies to build a central library.

Across all three decisions, several doctrinal points emerge:

**First**, transformation is function-specific. Courts now ask whether AI training repurposes expression for an informational or analytical end rather than an expressive or consumptive one. If the model's function diverges from that of the original, factor one will dominate the analysis.

**Second**, intermediate copying is permissible when non-expressive. Following *Sega v. Accolade* and *Sony v. Connectix*, courts treat complete copying as allowable when it is technologically necessary and the resulting system does not expose the expressive content to users.

**Third**, market harm requires evidence. Speculative claims of dilution or of hypothetical licensing markets will not defeat fair use. Absent empirical proof of substitution, factor four favors the AI developer.

**Fourth**, creative nature carries diminished weight. Even highly creative works may be used fairly when they are copied to extract statistical relationships rather than to reproduce expression.

Together, these opinions delineate a pragmatic boundary: AI training that competes with the original's commercial function will likely fail fair use, while training that merely learns from the works to generate new, non-substitutive outputs will likely pass. The reasoning suggests that future litigation will hinge less on the volume of copying than on evidence of market displacement and on how transparently developers can document the non-expressive nature of their training processes.

The courts have thus begun to adapt fair-use doctrine to the realities of machine learning without rewriting the statute. Whether Congress will codify these distinctions or leave them to judicial evolution remains uncertain, but the 2025 decisions collectively signal that transformative analytical use, absent proof of substitution, is the emerging safe harbor for generative-AI training.

---

## Table of Authorities

### Cases

*Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith*, 598 U.S. 508 (2023).

*Authors Guild v. Google, Inc.*, 804 F.3d 202 (2d Cir. 2015).

*Bartz v. Anthropic PBC*, No. 3:24-cv-05417-WHA (N.D. Cal. June 23, 2025).

*Campbell v. Acuff-Rose Music, Inc.*, 510 U.S. 569 (1994).

*Feist Publ'ns, Inc. v. Rural Tel. Serv. Co.*, 499 U.S. 340 (1991).

*Google LLC v. Oracle Am., Inc.*, 593 U.S. 1 (2021).

*Harper & Row, Publishers, Inc. v. Nation Enters.*, 471 U.S. 539 (1985).

*Kadrey v. Meta Platforms Inc.*, No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025).

*Sega Enters. Ltd. v. Accolade, Inc.*, 977 F.2d 1510 (9th Cir. 1992).

*Sony Computer Ent., Inc. v. Connectix Corp.*, 203 F.3d 596 (9th Cir. 2000).

*Thomson Reuters Enters. Ctr. GmbH v. Ross Intelligence Inc.*, No. 1:20-cv-613-SB (D. Del. Feb. 11, 2025).

### Statutes

17 U.S.C. § 106 (Exclusive Rights in Copyrighted Works).

17 U.S.C. § 107 (Fair Use).

### Secondary Authorities

U.S. Copyright Office, *Copyright and Artificial Intelligence, Part 2: Copyrightability* (Jan. 2025).

U.S. Copyright Office, *Copyright and Artificial Intelligence, Part 3: Generative AI Training* (pre-publication version, May 2025).

Ballard Spahr LLP, *Court Grants Meta Summary Judgment in Authors' AI Fair Use Case* (2025).

Goodwin Procter LLP, *Fair Use in the Age of LLMs: Kadrey v. Meta* (2025).

---

**Disclaimer**

This article is for general informational and editorial purposes only and does not constitute legal advice. Reading it does not create an attorney-client relationship. The article does not necessarily express the author's views, nor does it necessarily reflect the views of any employer, client, or affiliate. Legal outcomes turn on specific facts and applicable law, both of which may change. For advice on intellectual property or AI issues, consult a licensed attorney in your jurisdiction.