Model Evaluations Symbol

News

Many safety evaluations for AI models have significant limitations

The study’s co-authors first surveyed academic literature to establish an overview of the harms and risks models pose today, and the state of existing AI model evaluations. They then interviewed ...

11d on MSN

China's DeepSeek Upgrades R1 AI Model, Narrowing Gap with Western Counterparts

The latest version of the R1 model reportedly performs just below OpenAI's o3 and o4-mini, based on evaluations by ...

TechCrunch 27d

OpenAI pledges to publish AI safety test results more often

OpenAI is moving to publish the results of its internal AI model safety evaluations more regularly in what the outfit is saying is an effort to increase transparency. On Wednesday, OpenAI launched ...

ZDNet 2mon

OpenAI is pushing for industry-specific AI benchmarks - why that matters

and many others are missing a unified source of truth for model benchmarking." As a result, OpenAI will now work with multiple companies across each industry to develop those evaluations ...
