News
The study’s co-authors first surveyed academic literature to establish an overview of the harms and risks models pose today, and the state of existing AI model evaluations. They then interviewed ...
The latest version of the R1 model reportedly performs just below OpenAI's o3 and o4-mini, based on evaluations by ...
OpenAI is moving to publish the results of its internal AI model safety evaluations more regularly, in what the company says is an effort to increase transparency. On Wednesday, OpenAI launched ...
... and many others are missing a unified source of truth for model benchmarking." As a result, OpenAI will now work with multiple companies across each industry to develop those evaluations ...