Could you pass ‘Humanity’s Last Exam’? Probably not, but neither can AI


Did you know that some of the smartest people on the planet create benchmarks to test how well AI can replicate human intelligence? Scarily enough, AI models easily complete most of these benchmarks, showcasing just how smart the likes of OpenAI’s GPT-4o, Google’s Gemini 1.5, and even the new o3-mini really are.

In the quest to create the hardest benchmark possible, Scale AI and the Center for AI Safety (CAIS) have teamed up to create Humanity’s Last Exam, a test they’re calling a “groundbreaking new AI benchmark that was designed to test the limits of AI knowledge at the frontiers of human expertise.”


