Meta Admits Use of ‘Pirated’ Book Dataset to Train AI * TorrentFreak


In recent months, rightsholders of all ilks have filed lawsuits against companies that develop AI models.

The list includes record labels, individual authors, visual artists and, more recently, The New York Times. These rightsholders all object to the presumed use of their work without proper compensation.

Several of the lawsuits filed by book authors include a piracy component as well. The cases allege that tech companies, including Meta and OpenAI, used the controversial Books3 dataset to train their models.

The Books3 dataset has a clear piracy angle. It was created in 2020 by AI researcher Shawn Presser, who scraped the library of ‘pirate’ site Bibliotik. This book archive was publicly hosted by digital archiving collective ‘The Eye’ at the time, alongside various other data sources.

(Image: Bibliotik and other sources previously hosted at The Eye)

The general vision was that the plaintext collection of more than 195,000 books, which is nearly 37GB in size, could help AI…

You can read the full TorrentFreak article here.
