Meta Admits Use of ‘Pirated’ Book Dataset to Train AI * TorrentFreak

In recent months, rightsholders of all ilks have filed lawsuits against companies that develop AI models.

The list includes record labels, individual authors, visual artists, and more recently the New York Times. These rightsholders all object to the presumed use of their work without proper compensation.

Several of the lawsuits filed by book authors include a piracy component as well. The cases allege that tech companies, including Meta and OpenAI, used the controversial Books3 dataset to train their models.

The Books3 dataset has a clear piracy angle. It was created in 2020 by AI researcher Shawn Presser, who scraped the library of ‘pirate’ site Bibliotik. This book archive was publicly hosted by digital archiving collective ‘The Eye’ at the time, alongside various other data sources.

Bibliotik and other sources previously hosted at The Eye

The general vision was that the plaintext collection of more than 195,000 books, which is nearly 37GB in size, could help AI…

The full article is available at TorrentFreak.
