The Atlantic's searchable database of AI music training data

Strategic Overview

  • 01. The Atlantic published a searchable database, reported by staff writer Alex Reisner around June 15, 2026, exposing four music datasets that together contain more than 21 million tracks used to train generative AI music models. The tool pulls a previously opaque part of the AI pipeline into public view.
  • 02. The four datasets break down into roughly 12 million tracks in one, 9 million in another, and two smaller sets of about 100,000 songs each. The largest is LAION-DISCO-12M, released in November 2024, and one smaller set draws from the Free Music Archive. The datasets include hits by major artists such as Taylor Swift, Bad Bunny, Nirvana, Billie Eilish, Pearl Jam, and the Beatles.
  • 03. Google and Stability AI are named as confirmed users that drew tracks from the Free Music Archive. The datasets have also been downloaded thousands of times, which suggests a far wider network of developers training on this music. The searchable tool lets artists and rights holders check whether their own work was included without their consent.

Evidence, not allegation: the debate just got a search bar

For years the AI music fight ran on inference: labels argued models could only sound this good if they had ingested copyrighted recordings, and developers stayed quiet about their data. The Atlantic's tool collapses that ambiguity by making a previously opaque part of the pipeline publicly examinable [1]. The most consequential shift is in who gets named. This is not just two startups: Google and Stability AI are identified as confirmed users that drew tracks from the Free Music Archive [4], dragging well-resourced incumbents into the same consent and copyright scrutiny that has dogged Suno and Udio. And the datasets have been downloaded thousands of times, implying a network of developers far wider than the named companies [4].

The licensing gray zone that makes 21 million songs 'available'

Two datasets (LAION-DISCO-12M and a 9M-track set) account for nearly all of the 21 million-plus tracks The Atlantic uncovered across four AI music training datasets.
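The "nearly all" claim can be checked with quick arithmetic on the rounded counts reported in the overview. The sketch below is only illustrative: the figures are the approximate numbers given in this summary, not exact dataset sizes, and the labels for the unnamed sets are placeholders chosen here.

```python
# Approximate track counts as reported in this summary (rounded figures).
# Labels other than LAION-DISCO-12M are placeholders, not official dataset names.
datasets = {
    "LAION-DISCO-12M": 12_000_000,
    "second large set": 9_000_000,
    "smaller set A": 100_000,
    "smaller set B": 100_000,
}

total = sum(datasets.values())

# Share of the total held by the two largest datasets.
top_two = sorted(datasets.values(), reverse=True)[:2]
share = sum(top_two) / total

print(f"total: about {total:,} tracks")
print(f"two largest sets: about {share:.1%} of the total")
```

On these rounded figures, the two largest sets hold about 21 million of roughly 21.2 million tracks, or about 99%.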

The mechanism here is a gap between what is accessible and what is licensed. The Free Music Archive offers tracks free for personal streaming, but those rights do not extend to commercial AI training [2]. Similarly, LAION-DISCO-12M, the largest set at roughly 12 million tracks, was released in November 2024 for stated research purposes yet circulates among commercial developers [2]. That gap is what lets more than 21 million tracks become de facto training fuel without anyone formally clearing rights. The money side is already visible: the indie duo The American Dollar alleged that Suno cut their licensing revenue by nearly 80% [2], while Suno itself acknowledged training on copyrighted music and leaned on a fair-use defense [5].

Provenance is becoming product risk, not just an ethics problem

The sharpest community reading came from Reddit, which reframed the story around supply chains rather than morality: if nobody can prove what went into a model, every launch inherits a legal supply-chain problem, and the datasets that AI companies treated like exhaust are now being treated by courts like inventory. That framing matters because it ties directly to the litigation timeline. UMG settled with Udio in October 2025 and Warner settled with both Suno and Udio in November 2025, while Sony kept fighting [2]. A searchable, song-level record of what is inside these datasets gives the remaining plaintiffs and any new claimants concrete, examinable proof rather than circumstantial argument [2], and it explains why the fresh evidence is being described as fuel for the copyright fight [6].

Why artists are finding out from a journalist

The reception across X, YouTube, and Reddit converged on one thing: musicians are discovering their own catalogs in the data only because a reporter built the tool. Ed Newton-Rex found recordings he sang on with the King's College Choir, and rapper Open Mike Eagle found 184 of his songs on stream. Named artists reacted the same way in the reporting: Tre Mission said he was certain he never consented, and Lunice did not know his work was included until told [3]. The consent problem is concrete and personal, with entries running into the hundreds for individual artists such as Drake (~800), The Weeknd (460), and Justin Bieber (458) [3]. YouTube coverage featuring an entertainment attorney also pointed at distributors like DistroKid and TuneCore as a plausible leak path, sharpening the question of how unconsented work ended up in these sets at all.

Historical Context

2020
OpenAI scraped roughly 1.2 million songs for its Jukebox music-generation platform.
2022
Google trained a music model on about 44 million songs.
2024-06
The RIAA, on behalf of UMG, Sony, and Warner, filed landmark copyright infringement suits against Suno (Massachusetts) and Udio (New York) alleging mass infringement in AI training.
2024-08
Suno acknowledged training on copyrighted music in a court filing but argued the practice is protected fair use.
2024-11
LAION released LAION-DISCO-12M (~12 million tracks) for stated research and academic purposes.
2025-10
UMG settled its litigation with Udio.
2025-11
Warner settled with both Suno and Udio, while Sony continued litigating.
2026-06-15
The Atlantic published the searchable database of 21M+ songs across four AI music training datasets.

Power Map

Key Players

The Atlantic / Alex Reisner

Publisher and reporter who built and published the searchable database, shifting the training-data debate from abstract argument to examinable evidence.

Google

Named as a confirmed user that drew tracks from the Free Music Archive; reportedly trained a music model on 44 million songs in 2022.

Stability AI

Named as a confirmed user of Free Music Archive tracks in research work.

Suno and Udio

Generative AI music platforms at the center of the legal fight; Suno admitted in a 2024 court filing that it trained on copyrighted music, claiming fair use. The two face at least 12 lawsuits.

RIAA (for UMG, Sony, Warner)

Filed landmark copyright suits against Suno and Udio in June 2024 alleging mass infringement of sound recordings used in training.

Affected artists

Musicians including Tre Mission, Lunice, Drake, and The Weeknd discovered their work in the datasets via the searchable tool; several said they never consented.

Fact Check

6 cited
  [1] Investigation by The Atlantic reveals many millions of songs used for AI music training
  [2] Four music datasets holding millions of tracks are being shared among AI developers, The Atlantic reports
  [3] The Atlantic's AI database and Canadian artists
  [4] The Atlantic Exposes 21M Songs in AI Training Data Searchable DB
  [5] AI music startup Suno responds to RIAA lawsuit
  [6] New evidence fuels copyright fight against AI music companies

THE SIGNAL.

Analysts

"Says he never consented to the use and is firmly opposed to AI in music."

Tre Mission
Recording artist (20 songs found in the database)

"Was unaware his work had been included until informed of it."

Lunice
Recording artist and producer (93+ entries found)

"Notes that AI-generated music has become a mainstream, sizeable phenomenon on streaming platforms rather than a marginal one."

Alexis Lanternier
CEO, Deezer

The Crowd

"The Atlantic just released a tool that lets you see if your music has been scraped for AI training. Recordings I sang on in King's College Choir are in there. So is the music of millions of other musicians. Great work by @_alexreisner. Check it here: https://t.co/2oT8tLeCJM"

@ednewtonrex1107

"Late last night I found out over 100+ songs from our catalog were used to train AI models. Thanks to The Atlantic, they leaked a database of millions of songs that have been used by the biggest AI music companies like Udio and Suno. To be honest, until the major labels go https://t.co/3d2cmei0u9"

@VinceValholla195

"The Atlantic built a searchable database of music used to train AI models Reporter Alex Reisner identified four datasets used to train AI music models and made them publicly searchable. Two of the collections are substantial: one contains 12 million tracks, another 9 million."

@cre8or_aihub2

"Investigation by The Atlantic reveals many millions of songs used for AI music training"

@u/Plastic_Ninja_9014575

Broadcast
Atlantic AI Watchdog: Was Your Music Used To Train AI? (ft. @TopMusicAttorney )

184 of my songs (and one random dumb youtube video) were used to train AI

Is AI Ruining Music?