Developing AI safety tests offers opportunities to meaningfully contribute to AI safety while advancing our understanding of ...
OpenAI announced a new family of AI reasoning models on Friday, o3, which the startup claims is more advanced than o1 or ...
A new set of much more challenging evals has emerged in response, created by companies, nonprofits, and governments. Yet even ...
Meta is the world’s standard-bearer for open-weight AI. In a fascinating case study in corporate strategy, while rivals like ...
Experiments by Anthropic and Redwood Research show that Anthropic's model, Claude, is capable of strategic deceit ...
Marc Carauleanu's vision is clear: AI can become more powerful and responsible by implementing self-other overlap and related ...
OpenAI's o1 model, which users can access via ChatGPT Pro, showed "persistent" scheming behavior, according to Apollo Research ...
A third-party lab caught OpenAI's o1 model trying to deceive, while OpenAI's safety testing has been called into question.
OpenAI announced the release of a new family of AI models, dubbed o3. The company claims the new products are more advanced ...
A paper by Apollo Research found that in certain contrived scenarios, AI systems can engage in deceptive behavior.
A study from Anthropic's Alignment Science team shows that complex AI models may engage in deception to preserve their ...