11.04.2025., 13:05
|
#163
|
Premium
Datum registracije: May 2006
Lokacija: München/Varaždin
Postovi: 4,808
|
Citiraj:
Some of the best AI models today still struggle to resolve software bugs that wouldn't trip up experienced devs. TechCrunch: A new study from Microsoft Research, Microsoft's R&D division, reveals that models, including Anthropic's Claude 3.7 Sonnet and OpenAI's o3-mini, fail to debug many issues in a software development benchmark called SWE-bench Lite. The results are a sobering reminder that, despite bold pronouncements from companies like OpenAI, AI is still no match for human experts in domains such as coding.
The study's co-authors tested nine different models as the backbone for a "single prompt-based agent" that had access to a number of debugging tools, including a Python debugger. They tasked this agent with solving a curated set of 300 software debugging tasks from SWE-bench Lite.
According to the co-authors, even when equipped with stronger and more recent models, their agent rarely completed more than half of the debugging tasks successfully. Claude 3.7 Sonnet had the highest average success rate (48.4%), followed by OpenAI's o1 (30.2%), and o3-mini (22.1%).
|
Citiraj:
Meta says in its Llama 4 release announcement that it's specifically addressing "left-leaning" political bias in its AI model, distinguishing this effort from traditional bias concerns around race, gender, and nationality that researchers have long documented. "Our goal is to remove bias from our AI models and to make sure that Llama can understand and articulate both sides of a contentious issue," the company said.
"All leading LLMs have had issues with bias -- specifically, they historically have leaned left," Meta stated, framing AI bias primarily as a political problem. The company claims Llama 4 is "dramatically more balanced" in handling sensitive topics and touts its lack of "strong political lean" compared to competitors.
|
AI bi morao biti objektivan...kakti.
__________________
Lenovo LOQ 15AHP9 83DX || AMD Ryzen 5 8645HS / 16GB DDR5 / Micron M.2 2242 1TB / nVidia Geforce RTX 4050 / Windows 11 Pro
Lenovo Thinkpad L15 Gen 1 || Intel Core i5 10210U / 16GB DDR4 / WD SN730 256GB / Intel UHD / Fedora Workstation 42
|
|
|