https://www.livescience.com/technolo...ld-we-even-try
"Research conducted by OpenAI found that its latest and most powerful reasoning models, o3 and o4-mini, hallucinated 33% and 48% of the time, respectively, when tested by OpenAI's PersonQA benchmark. That's more than double the rate of the older o1 model. While o3 delivers more accurate information than its predecessor, it appears to come at the cost of more inaccurate hallucinations."
48% lol.
I get that we're living in an advanced stage of clown world with no end in sight, but I'd still like someone to explain to me how this LLM marvel can be "more accurate" than the older model while simultaneously outputting twice as much nonsense as the older model.
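For what it's worth, here's one way the arithmetic can come out that way. This is a sketch with made-up numbers (not OpenAI's actual PersonQA data): if the newer model abstains far less often, it can get more questions right overall *and* have a higher hallucination rate among the answers it does give.

```python
# Hypothetical numbers, for illustration only -- not OpenAI's actual data.
# Accuracy is measured over ALL questions; hallucination rate only over
# the questions the model actually attempted (didn't abstain on).

def rates(total, correct, hallucinated):
    attempted = correct + hallucinated
    accuracy = correct / total              # correct answers / all questions
    halluc_rate = hallucinated / attempted  # hallucinations / attempted answers
    return accuracy, halluc_rate

old = rates(total=100, correct=40, hallucinated=20)  # abstains on 40 questions
new = rates(total=100, correct=50, hallucinated=46)  # abstains on only 4

print(f"old: accuracy={old[0]:.0%}, hallucination rate={old[1]:.0%}")
print(f"new: accuracy={new[0]:.0%}, hallucination rate={new[1]:.0%}")
# old: accuracy=40%, hallucination rate=33%
# new: accuracy=50%, hallucination rate=48%
```

So "more accurate" and "hallucinates twice as often" aren't necessarily contradictory; they can just mean the model guesses where the old one stayed quiet. Whether that's what's actually happening with o3, who knows.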