
# Training Data on Trial: AI's First Fair Use Test

> Function, Evidence, and the Fair Use Boundary

**By Paul Roberts** | ✂️ Just the Essentials

📊 **1,600 words** | ⏱️ **6 min read**

#Legal_Update #Fair_Use #Essential #Copyright

December 27, 2024

---

## The Rule

Courts evaluate AI training through function-specific transformation analysis. Training that competes with the original fails. Training that learns without substituting succeeds. Three 2025 district court decisions define the boundary.

---

## The Cases

### Thomson Reuters v. Ross Intelligence

No. 1:20-cv-613-SB (D. Del. Feb. 11, 2025)

**What happened:** Ross built a legal-research AI to compete with Westlaw. LegalEase contractors copied Westlaw headnotes into training materials. Ross bought those materials. Ross trained its AI. Ross launched a competing product.

**The court held:** Not fair use. Infringement.

**Why:** Same function. Same market. Direct competition. Ross used headnotes to build a legal-research tool. Westlaw uses headnotes for legal research. No transformation. Commercial use. Market substitution.

**Factor one:** Against fair use. Commercial, non-transformative, same purpose.

**Factor two:** Favors fair use. Headnotes are original editorial work, but not especially creative.

**Factor three:** Favors fair use. Ross's outputs did not make the headnotes available to the public.

**Factor four:** Against fair use. Direct market competition. Potential licensing market harmed.

**Result:** Factors one and four outweigh factors two and three. Fair use denied.

---

### Bartz v. Anthropic PBC

No. 3:24-cv-05417-WHA (N.D. Cal. June 23, 2025)

**What happened:** Anthropic trained Claude on books, including print copies it bought and scanned and pirated copies downloaded from shadow libraries. Authors sued. Anthropic moved for summary judgment.

**The court held:** Training was fair use. Scanning purchased print books was fair use. Keeping pirated copies in a central library was not.

**Why:** Different function. No market substitution. Claude extracts statistical patterns. Claude doesn't reproduce the books. No evidence of harm.

**Factor one:** Strongly favors fair use. Transformative. Analytical purpose. Extracts language patterns, doesn't reproduce expression.

**Factor two:** Formally against fair use. Creative works. But minimal weight. Use was analytical, not expressive.

**Factor three:** Favors fair use. Complete copying necessary. Outputs don't reproduce text. Users don't receive the books.

**Factor four:** Strongly favors fair use. No market substitution. No established licensing market. No empirical harm.

**Result:** Training: fair use as a matter of law. Pirated library copies: fair use denied; that claim proceeds to trial.

---

### Kadrey v. Meta Platforms Inc.

No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025)

**What happened:** Meta trained LLaMA on books downloaded from shadow libraries. Authors sued. Meta moved for summary judgment.

**The court held:** Training was fair use.

**Why:** Transformative. Non-expressive. Non-substitutive. No proven harm.

**Factor one:** Strongly favors fair use. Entirely new function. Statistical learning. Not reproduction.

**Factor two:** Formally against fair use. Creative works. But weak force. Analytical use.

**Factor three:** Favors fair use. Complete copying necessary. Outputs don't expose the books. Brief random fragments only.

**Factor four:** Favors fair use. No market displacement shown. No evidence. No cognizable licensing market. Speculation insufficient.

**Result:** Fair use as a matter of law, limited to these plaintiffs and this record.

---

## The Principles

### Transformation Is Function-Specific

Courts ask: Does the AI serve the same purpose as the original?

**Same purpose = Not transformative.** Ross used headnotes for legal research. Westlaw uses headnotes for legal research. Not transformative.

**Different purpose = Transformative.** LLMs extract statistical patterns. Novels deliver narrative. Transformative.

Technology alone doesn't transform. Function determines transformation.

---

### Intermediate Copying When Non-Expressive

Complete copying is permissible when:

1. Technologically necessary
2. Outputs don't expose protected expression

Training requires complete works. Fragments won't work. But if outputs don't reproduce text, copying is intermediate.

Distinction: Copying in memory (during training) vs. copying in output.
Memory copying = permissible if outputs are non-substitutive.

---

### Market Harm Requires Evidence

The most important factor. Requires proof, not speculation.

**Acceptable evidence:**
- Sales data
- Consumer surveys
- Economic analysis
- Lost licensing revenue

**Insufficient:**
- Theoretical harm
- Hypothetical markets
- Speculation about future effects

Bartz and Kadrey plaintiffs provided no evidence of market harm. Factor four favored defendants.

---

### Creative Works, Analytical Uses

Creative nature weighs against fair use. But weight diminishes in analytical contexts. When use extracts statistical information rather than reproducing expression, creativity matters less.

Fiction is creative. But using fiction to learn language patterns is analytical. Factor two carries minimal weight.

---

## The Divergence: Licensing Markets

**The Ross court recognized a potential derivative market.** Ross sought a license. Thomson Reuters refused. Demand for a license existed. Harm cognizable.

**Bartz and Kadrey rejected hypothetical markets.** No established licensing practice. No industry standards. No functioning market. Purely speculative.

The difference: Established vs. hypothetical.

---

## The Framework

**Step 1 (Factor One):** Does the AI serve the same function as the original?
- Same function → Not transformative → Weighs against fair use
- Different function → Transformative → Favors fair use

**Step 2 (Factor Two):** Creative or factual work?
- Creative work + expressive use → Weighs against fair use
- Creative work + analytical use → Minimal weight

**Step 3 (Factor Three):** How much was copied? How much is accessible?
- Complete copying + necessary + outputs non-substitutive → Favors fair use
- Complete copying + unnecessary or substitutive outputs → Weighs against fair use

**Step 4 (Factor Four):** Market harm?
- Direct competition → Weighs against fair use
- No substitution + no evidence → Favors fair use
- Established licensing market harmed → Weighs against fair use
- Hypothetical licensing market → Insufficient

---

## The Boundary

**AI training that competes with the original's commercial function fails.**
- Ross built a legal-research tool to compete with Westlaw.
- Same market. Same function. No transformation.
- Fair use denied.

**AI training that learns from works to generate new, non-substitutive outputs succeeds.**
- LLMs extract patterns from novels.
- Different market. Different function. Transformation.
- Fair use granted.

The line: Competitive substitution vs. analytical repurposing.

---

## Open Questions

**Shadow libraries:** Does sourcing from pirated content affect fair use? Bartz held that keeping pirated copies in a central library was not fair use, even though training was. Kadrey did not treat the sourcing as decisive for training. Question remains open.

**Emergent markets:** When do licensing markets become cognizable? When established practices exist. Not before.

**Hybrid pipelines:** What if training is analytical but outputs occasionally reproduce text? Framework needs refinement.

**Congressional action:** Will Congress codify safe harbors? Create compulsory licenses? Require transparency? Unknown.

---

## The Takeaway

Fair use adapts to AI without statutory changes. Courts apply the four-factor test. Transformation and market effect dominate. Evidence beats speculation.

Training for competitive substitution fails. Training for analytical learning succeeds.

That's the rule.

---

## Citations

17 U.S.C. § 107.

Thomson Reuters Enters. Ctr. GmbH v. Ross Intelligence Inc., No. 1:20-cv-613-SB (D. Del. Feb. 11, 2025).

Bartz v. Anthropic PBC, No. 3:24-cv-05417-WHA (N.D. Cal. June 23, 2025).

Kadrey v. Meta Platforms Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025).

Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569 (1994).

Google LLC v. Oracle Am., Inc., 593 U.S. 1 (2021).

Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015).

Harper & Row Publs., Inc. v. Nation Enters., 471 U.S. 539 (1985).

Sony Computer Ent. v. Connectix Corp., 203 F.3d 596 (9th Cir. 2000).

Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508 (2023).

Feist Publ'ns, Inc. v. Rural Tel. Serv. Co., 499 U.S. 340 (1991).

Sega Enters. Ltd. v. Accolade, Inc., 977 F.2d 1510 (9th Cir. 1992).

---

**Disclaimer**

Informational purposes only. Not legal advice. No attorney-client relationship. Consult a licensed attorney for legal advice.