View Single Post
Staro 02.04.2025., 09:18   #157
tomek@vz
Premium
Moj komp
 
tomek@vz's Avatar
 
Datum registracije: May 2006
Lokacija: München/Varaždin
Postovi: 4,741
I opet...


Citiraj:
A new paper [PDF] from the AI Disclosures Project claims OpenAI likely trained its GPT-4o model on paywalled O'Reilly Media books without a licensing agreement. The nonprofit organization, co-founded by O'Reilly Media CEO Tim O'Reilly himself, used a method called DE-COP to detect copyrighted content in language model training data.

Researchers analyzed 13,962 paragraph excerpts from 34 O'Reilly books, finding that GPT-4o "recognized" significantly more paywalled content than older models like GPT-3.5 Turbo. The technique, also known as a "membership inference attack," tests whether a model can reliably distinguish human-authored texts from paraphrased versions.

"GPT-4o [likely] recognizes, and so has prior knowledge of, many non-public O'Reilly books published prior to its training cutoff date," wrote the co-authors, which include O'Reilly, economist Ilan Strauss, and AI researcher Sruly Rosenblat.
__________________
Lenovo LOQ 15AHP9 83DX || AMD Ryzen 5 8645HS / 16GB DDR5 / Micron M.2 2242 1TB / nVidia Geforce RTX 4050 / Windows 11 Pro
Lenovo Thinkpad L15 Gen 1 || Intel Core i5 10210U / 16GB DDR4 / WD SN730 256GB / Intel UHD / Fedora Workstation 42
tomek@vz je offline   Reply With Quote