Old 19.04.2025., 07:08   #169
tomek@vz
Premium
Join Date: May 2006
Location: München/Varaždin
Posts: 4,735
Quote:
OpenAI's latest reasoning models, o3 and o4-mini, hallucinate more frequently than the company's previous AI systems, according to both internal testing and third-party research. On OpenAI's PersonQA benchmark, o3 hallucinated 33% of the time, roughly double the rate of the older models o1 (16%) and o3-mini (14.8%). o4-mini performed even worse, hallucinating 48% of the time. Nonprofit AI lab Transluce discovered o3 fabricating processes it claimed to use, including running code on a 2021 MacBook Pro "outside of ChatGPT." Stanford adjunct professor Kian Katanforoosh noted his team found o3 frequently generates broken website links.

OpenAI says in its technical report that "more research is needed" to understand why hallucinations worsen as reasoning models scale up.
__________________
Lenovo LOQ 15AHP9 83DX || AMD Ryzen 5 8645HS / 16GB DDR5 / Micron M.2 2242 1TB / nVidia Geforce RTX 4050 / Windows 11 Pro
Lenovo Thinkpad L15 Gen 1 || Intel Core i5 10210U / 16GB DDR4 / WD SN730 256GB / Intel UHD / Fedora Workstation 42