Old 20.02.2025, 06:35   #125
tomek@vz
Premium
Registration date: May 2006
Location: München/Varaždin
Posts: 4,721
Quote:
Leading AI models can fix broken code, but they're nowhere near ready to replace human software engineers, according to extensive testing [PDF] by OpenAI researchers. The company's latest study put AI models and systems through their paces on real-world programming tasks, with even the most advanced models solving only a quarter of typical engineering challenges.

The research team created a test called SWE-Lancer, drawing from 1,488 actual software fixes made to Expensify's codebase, representing $1 million worth of freelance engineering work. When faced with these everyday programming tasks, the best AI model, Claude 3.5 Sonnet, managed to complete just 26.2% of hands-on coding tasks and 44.9% of technical management decisions.

Though the AI systems proved adept at quickly finding relevant code sections, they stumbled when it came to understanding how different parts of software interact. The models often suggested surface-level fixes without grasping the deeper implications of their changes.

To be sure, the research used a rigorous methodology to test the models' coding abilities. Instead of relying on simplified programming puzzles, OpenAI's benchmark uses complete software engineering tasks, ranging from quick $50 bug fixes to complex $32,000 feature implementations. Each solution was verified through end-to-end testing that simulated real user interactions, the researchers said.
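Not in the article itself, but for intuition: since SWE-Lancer attaches a freelance payout to each task, the headline numbers are naturally read as payout-weighted scores. Below is a minimal Python sketch of how such a metric could be computed. The Task structure, task names, and dollar figures are my own illustrative assumptions; only the $50 to $32,000 payout range and the pass/fail end-to-end verification come from the article.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    payout_usd: int   # freelance price attached to the task (illustrative)
    passed: bool      # did the model's patch pass the end-to-end tests?

def earned_fraction(tasks: list[Task]) -> float:
    """Fraction of the total available dollar value 'earned' by passing tasks."""
    total = sum(t.payout_usd for t in tasks)
    earned = sum(t.payout_usd for t in tasks if t.passed)
    return earned / total if total else 0.0

# Illustrative tasks only; names and values are not from the paper.
tasks = [
    Task("quick bug fix", 50, True),
    Task("mid-size feature", 1_000, False),
    Task("complex feature implementation", 32_000, False),
]
print(f"earned: {earned_fraction(tasks):.1%} of available value")

The point of weighting by payout rather than raw task count is that a single $32,000 feature counts as much as hundreds of $50 bug fixes, which is presumably why the benchmark is framed in dollar terms rather than tasks solved.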
__________________
Lenovo LOQ 15AHP9 83DX || AMD Ryzen 5 8645HS / 16GB DDR5 / 1TB Micron M.2 2242 / NVIDIA GeForce RTX 4050 / Windows 11 Pro
Lenovo Thinkpad L15 Gen 1 || Intel Core i5 10210U / 16GB DDR4 / WD SN730 256GB / Intel UHD / Fedora Workstation 42