AI Beats Humans on Unsolved Math Problem

Large language model does better than human mathematicians trying to solve combinatorics problems inspired by the card game Set

By Davide Castelvecchi & Nature magazine

Thumb holding 6 white playing cards with symbols with green table backround. — In the game Set, players must identify combinations of cards based on the shape, colour, shading and number of symbols.

Valery Voennyy/Alamy Stock Photo

The card game Set has long inspired mathematicians to create interesting problems.

Now, a technique based on large language models (LLMs) is showing that artificial intelligence (AI) can help mathematicians to generate new solutions.

The AI system, called FunSearch, made progress on Set-inspired problems in combinatorics, a field of mathematics that studies how to count the possible arrangements of sets containing finitely many objects. But its inventors say that the method, described in Nature on 14 December¹, could be applied to a variety of questions in maths and computer science.

On supporting science journalism

If you're enjoying this article, consider supporting our award-winning journalism by subscribing. By purchasing a subscription you are helping to ensure the future of impactful stories about the discoveries and ideas shaping our world today.

“This is the first time anyone has shown that an LLM-based system can go beyond what was known by mathematicians and computer scientists,” says Pushmeet Kohli, a computer scientist who heads the AI for Science team at Google Deepmind in London. “It’s not just novel, it’s more effective than anything else that exists today.”

This is in contrast to previous experiments, in which researchers have used LLMs to solve maths problems with known solutions, says Kohli.

Mathematical chatbot

FunSearch automatically creates requests for a specially trained LLM, asking it to write short computer programs that can generate solutions to a particular mathematical problem. The system then checks quickly to see whether those solutions are better than known ones. If not, it provides feedback to the LLM so that it can improve at the next round.

“The way we use the LLM is as a creativity engine,” says DeepMind computer scientist Bernardino Romera-Paredes. Not all programs that the LLM generates are useful, and some are so incorrect that they wouldn’t even be able to run, he says. But another program can quickly toss the incorrect ones away and test the output of the correct ones.

The team tested FunSearch on the ‘cap set problem’. This evolved out of the game Set, which was invented in the 1970s by geneticist Marsha Falco. The Set deck contains 81 cards. Each card displays one, two or three symbols that are identical in colour, shape and shading — and, for each of these features, there are three possible options. Together, these possibilities add up to 3 × 3 × 3 × 3 = 81. Players have to turn over the cards and spot special combinations of three cards called sets.

Mathematicians have shown that players are guaranteed to find a set if the number of upturned cards is at least 21. They have also found solutions for more-complex versions of the game, in which abstract versions of the cards have five or more properties. But some mysteries remain. For example, if there are n properties, where n is any whole number, then there are 3ⁿ possible cards — but the minimum number of cards that must be revealed to guarantee a solution is unknown.

This problem can be expressed in terms of discrete geometry. There, it is equivalent to finding certain arrangements of three points in an n-dimensional space. Mathematicians have been able to put bounds on the possible general solution — given n, they have found that the required number of ‘cards on the table’ must be greater than that given by a certain formula, but smaller than that given by another.

Human–machine collaboration

FunSearch was able to improve on the lower bound for n = 8 by generating sets of cards that satisfy all the requirements of the game. “We don’t prove that we cannot improve over that, but we do get a construction that goes beyond what was known before,” says DeepMind computer scientist Alhussein Fawzi.

One important feature of FunSearch is that people can see the successful programs created by the LLM and learn from them, says co-author Jordan Ellenberg, a mathematician at the University of Wisconsin–Madison. This sets the technique apart from other applications, in which the AI is a black box.

“What’s most exciting to me is modelling new modes of human–machine collaboration,” Ellenberg adds. “I don’t look to use these as a replacement for human mathematicians, but as a force multiplier.”

This article is reproduced with permission and was first published on December 14, 2023.