Large language models are better than humans at answering chemistry questions

Close up of head half human and half robot

Source: © Nick Lowndes/Ikon Images

AI models outperform human chemists in every topic area. But are they really better chemists?

A new framework, ChemBench, provides a way to assess how well text-generating AI models can answer chemistry questions. In a new preprint, which hasn’t yet been peer-reviewed, the researchers who developed the framework show that large language models (LLMs) consistently outperform human chemists across all topics. However, the assessment also highlights the limitations of these systems, including the LLMs’ inability to reliably apply chemical reasoning or accurately evaluate their own performance. The team hopes that ChemBench will be a ‘stepping stone’ to developing improved AI systems and more robust evaluative tools in future.