https://www.livescience.com/technolo...ld-we-even-try
"Research conducted by OpenAI found that its latest and most powerful reasoning models, o3 and o4-mini, hallucinated 33% and 48% of the time, respectively, when tested by OpenAI's PersonQA benchmark. That's more than double the rate of the older o1 model. While o3 delivers more accurate information than its predecessor, it appears to come at the cost of more inaccurate hallucinations."
48% lol.
I get that we're living in an advanced stage of clown world with no end in sight, but I'd still like someone to explain to me how this LLM marvel can be "more accurate" than the older model while simultaneously outputting twice as much nonsense as the older model.
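For what it's worth, here's one way the arithmetic can come out that way. This is a sketch with made-up numbers (not OpenAI's actual PersonQA data): if the newer model abstains far less often, it can get more questions right overall *and* have a higher hallucination rate among the answers it does give.

```python
# Hypothetical numbers, for illustration only -- not OpenAI's actual data.
# Accuracy is measured over ALL questions; hallucination rate only over
# the questions the model actually attempted (didn't abstain on).

def rates(total, correct, hallucinated):
    attempted = correct + hallucinated
    accuracy = correct / total              # correct answers / all questions
    halluc_rate = hallucinated / attempted  # hallucinations / attempted answers
    return accuracy, halluc_rate

old = rates(total=100, correct=40, hallucinated=20)  # abstains on 40 questions
new = rates(total=100, correct=50, hallucinated=46)  # abstains on only 4

print(f"old: accuracy={old[0]:.0%}, hallucination rate={old[1]:.0%}")
print(f"new: accuracy={new[0]:.0%}, hallucination rate={new[1]:.0%}")
# old: accuracy=40%, hallucination rate=33%
# new: accuracy=50%, hallucination rate=48%
```

So "more accurate" and "hallucinates twice as often" aren't necessarily contradictory; they can just mean the model guesses where the old one stayed quiet. Whether that's what's actually happening with o3, who knows.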