OpenAI Says Benchmark Used to Measure AI Coding Skill Is 'Contaminated'—Here's Why

by Decrypt

In brief

  • OpenAI argues that SWE-bench Verified no longer reflects real coding ability because the benchmark is allegedly contaminated.
  • It is now pushing SWE-bench Pro as a tougher replacement.
  • Scores plunged from ~70% to ~23% on the newer benchmark.

The number that every major AI lab has been using to claim coding supremacy was just declared meaningless.

OpenAI published a post this week announcing that SWE-bench Verified, the go-to benchmark for measuring AI coding capabilities, is so riddled with flawed…
