Culture Heist: Why AI Is the Biggest Theft in Human History

That jaw-dropping Sora dragon on X? Slam the brakes. It’s not innovation; it’s industrialised plagiarism at hyperscale.

Introduction

Scroll through social media and you will find breathtaking clips from OpenAI’s Sora or photorealistic portraits from Midjourney. They look like the future, yet every pixel and syllable sits on top of someone else’s graft. Foundation models vacuum up billions of pictures, songs and scripts without permission, then sell access to the remix. The transaction is hidden, but the pattern is familiar: extract value at the bottom, concentrate profit at the top.

In what follows I trace the legal, economic and psychological contours of this cultural land-grab, using recent data rather than marketing slides. The argument unfolds in a single arc, moving from the origin of the datasets, through the mounting lawsuits, into the macro-economics of compute, and finally towards the question of public policy.

1. The Training-Data Problem

Generative models are not born creative; they are born hungry. They must ingest colossal amounts of data before they can produce a single convincing frame. Scraping is therefore not an incidental step but the beating heart of the business model.

European lawmakers tried to regulate the practice with the Artificial Intelligence Act, yet the Act defers to the 2019 Copyright Directive, whose Article 4 quietly preserves a blanket exemption for commercial “text and data mining” provided the right-holder has not opted out. Axel Voss, the German MEP who helped craft that Directive, now calls the exemption “irresponsible”, warning that it converts private creative effort into free industrial feedstock (theguardian.com).

This is enclosure in reverse. Historically, elites privatised common land; today, platforms communalise private culture while keeping the compute stack proprietary.

2. Hollywood Strikes Back

On 11 June 2025 Disney and NBCUniversal filed a 110-page complaint against Midjourney in a California federal court. The studios allege that the company trained on protected frames, then served paying users fully branded characters such as Elsa, Yoda and Shrek. Intellectual-property scholars note that Midjourney’s own “safe-mode” filters prove the firm can exclude content when it chooses, undermining any fair-use defence. (wired.com, egyptindependent.com)

The case matters because Hollywood employs the most formidable copyright lawyers on earth. If even Midjourney cannot persuade the studios to grant it a licence, smaller creators have no chance of negotiating one. A preliminary injunction is possible before the year ends, placing tangible commercial risk on any model that ignores licences.

3. The Music Lawsuits

Film is not alone. In June 2024 the three major labels sued Suno and Udio for ingesting 2,332 sound recordings. Damages could reach one hundred and fifty thousand dollars per track. Even if the suits settle, they set a price signal that makes unlicensed training materially expensive. (reuters.com)
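The headline figures compound quickly. A back-of-the-envelope sketch, assuming the statutory maximum were applied to every recording named in the complaints (the actual award, if any, would be set by the court):

```python
# Rough ceiling on exposure in the Suno and Udio suits.
# Assumes the US statutory-damages maximum for wilful infringement
# applies to every one of the recordings listed in the complaints.
TRACKS = 2_332            # sound recordings named in the filings
MAX_PER_WORK = 150_000    # statutory ceiling per infringed work, in dollars

exposure = TRACKS * MAX_PER_WORK
print(f"${exposure:,}")   # → $349,800,000
```

Even a settlement at a fraction of that ceiling would dwarf the cost of licensing the catalogue up front, which is precisely the price signal the labels want to send.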

Economic substitution explains the ferocity. A single GPU cluster can already churn out radio-length pop songs every few minutes. At that speed the marginal cost of an additional track is near zero, so the only remaining scarcity is the legal right to publish. Suing is rational self-preservation.

If you want to understand why music rights are worth fighting over and what the economics of catalogue ownership actually look like, this guide to music catalogue investment breaks it down.

4. The Limits of Sora’s Magic

Promotional reels depict Sora as cinema without cinematographers, yet early testers report rubber-limbed characters and physics errors such as biscuits that show no bite marks after being bitten. The Atlantic observes that OpenAI has released astonishing samples but very little technical detail, feeding scepticism about real-world reliability. (theatlantic.com)

Computer-graphics research explains why. Diffusion models optimise each frame against a statistical target, not against Newtonian constraints across time. Engineers can bolt constraints on later, but every patch raises inference cost, so the “cheap video” dream recedes.

5. Medium’s Sludge and the Discovery Crisis

When Wired sampled 274,466 Medium articles it found that forty-seven per cent were likely generated by AI (wired.com). The result is a digital commons littered with derivative prose, challenging human readers to sift novelty from noise.

6. Audiences Smell the Synthetic

In a Nielsen study reported by Forbes, fifty-five per cent of surveyed consumers said they felt uneasy when content was AI-generated, citing authenticity concerns. Brands that hide provenance therefore gamble with trust capital that took decades to build. (forbes.com)

Communication scholars call this the authenticity premium. When origin is ambiguous, audiences discount emotional value and monetary value in tandem. That has direct revenue implications for advertising and subscription models.

7. Winner Take Compute

While creatives debate royalties, Nvidia quietly became the most valuable company in the world in June 2024 at roughly three point three trillion dollars (reuters.com). Classical labour economics predicts surplus flows to the bottleneck factor, in this case high-end silicon and the cloud racks that house it.

The implication is stark. If models can approximate the style of thousands of anonymous mid-tier professionals, their bargaining power collapses. Value migrates to infrastructure owners and to a thin layer of superstar personalities whose uniqueness remains hard to clone.

8. Curation Replaces Creation

A Wondercraft survey of five hundred creators shows that eighty-three per cent already insert AI somewhere in the workflow (digiday.com). The headline sounds like liberation, yet ethnographic follow-up reveals mornings lost to deleting unusable variants and afternoons spent fact-checking hallucinations. Automation has not removed labour; it has shifted it from making to moderating, a form of invisible toil that seldom appears in productivity metrics.

9. Why Human Insight Still Matters

Neuroscientists published a May 2025 Nature Communications paper showing that “aha” moments activate visual cortex, hippocampus and amygdala simultaneously, binding memory and emotion (nature.com, psypost.org). Large language models possess neither autobiographical memory nor feeling states, so they simulate pattern but not lived stakes.

This gap is not poetry, it is physiology. Until machines own bodies that can fail, they will not experience suspense, regret or catharsis. They will only mimic the textual residue of those states.

10. Policy Choices

Policymakers face three clear options: mandate opt-in licensing that pays creators; maintain the current opt-out loophole and invite more lawsuits; or encourage creator-owned domain models trained on consensual data. The European Union claims to strike a balance yet continues to allow commercial text and data mining by default, a position creators label devastating (theguardian.com).

Legislatures move slowly, so the near-term battlefield is the court docket. Disney versus Midjourney and the labels versus Suno will shape de facto norms long before new statutes land.

A Tale of Two Futures

Worst-case dystopia
Picture 2035 after a decade of unlicensed scraping and political dithering. Foundation models have flooded every feed with infinite synthetic media that feels polished yet hollow. Search engines return ten variants of the same AI article before a single human insight. Studio backlots are empty because deepfake franchises dominate streaming. Session musicians, matte painters and mid-tier writers now stack supermarket shelves. Schools teach prompt engineering instead of critical writing because meaning has been pushed aside by speed. Meanwhile the only people paid for creativity are the compute landlords who rent GPUs by the second. Culture has become fast fashion for the mind and it is already in the clearance bin.

Best-case renaissance
Now imagine regulators close the text and data mining loophole in 2026. Training requires licensed datasets and automatic royalty pipes route revenue back to the source. Transparent provenance tags let audiences filter for work that is fully human, AI assisted or machine generated, restoring trust overnight. Small creator-owned models bloom, an orchestral model trained by musicians who share in every sync fee, a graphic novel engine funded by illustrators who receive micro-royalties on each panel. Instead of drowning in sludge, platforms surface fewer but richer stories because the economic incentives reward depth. GPUs still hum, but they hum in service to people who tell them where to look, not in place of them.

Both futures are still on the table. The path we walk will be decided less by algorithms than by the collective impatience or resolve of the creative majority. Choose, and choose loudly.
