OpenAI recently introduced SimpleQA, a new benchmark for evaluating the factual accuracy of the large language models (LLMs) that underpin generative AI (genAI). Think of it as a kind of SAT for genAI chatbots: it consists of 4,326 questions across diverse domains such as science, politics, pop culture, and art.
Source: ComputerWorld