An Alternative to Conventional Neural Networks Could Help Reveal What AI Is Doing behind the Scenes

Despite their performance, current AI models have major weaknesses: they require enormous resources and are indecipherable. Help may be on the way

Neural network layers, conceptual illustration. Credit: Thom Leach/Science Photo Library/Getty Images

ChatGPT has triggered an onslaught of artificial intelligence hype. The arrival of OpenAI’s large-language-model-powered (LLM-powered) chatbot forced leading tech companies to follow suit with similar applications as quickly as possible, and the race to develop ever more powerful AI models continues. Meta came out with an LLM called Llama at the beginning of 2023, and Google presented its Bard model (now called Gemini) that same year. Other providers, such as Anthropic, have also delivered impressive AI applications.

The new LLMs are anything but perfect, however: A lot of time and computing power are needed to train them. And it is usually unclear how they arrive at their results. In fact, current AI models are like a black box. You enter something, and they deliver an output without any accompanying explanation. This makes it difficult to figure out whether a program is making something up (“hallucinating”) or providing a meaningful answer. Most companies focus on achieving reliable results by training the models with even more data or optimizing them for specific tasks, such as solving mathematical problems.

The basic principle of AI models generally remains untouched, however: the algorithms are usually based on neural networks, which are modeled on the visual cortex of our brain. But a team of experts led by physicist Ziming Liu of the Massachusetts Institute of Technology has now developed an approach that surpasses conventional neural networks in many respects. As the researchers reported in late April in a preprint paper that has not yet been peer-reviewed, so-called Kolmogorov-Arnold networks (KANs) can master a wide range of tasks much more efficiently and solve scientific problems better than previous approaches. And probably the biggest advantage is that their results can be interpreted. The experts hope to be able to integrate KANs into LLMs to enhance their performance.


“It is important to look for more efficient, more interpretable and less training-intensive structures for AI in mathematics,” says mathematician Geordie Williamson of the University of Sydney, who was not involved in the work.

There are many different machine-learning algorithms, such as so-called decision trees and linear regression. Since the 2010s, however, most applications have been built on neural networks. These programs’ structure is based on that of the visual cortex of mammals: several computing units (neurons) are arranged in layers, one behind the other, and connected by edges (synapses). A signal propagates from front to back and is processed at each layer. Although the idea for such programs dates back to the 1950s, it was not until the 2010s that computers were powerful enough to run them successfully.

This is because neural networks require extensive training for their inputs (such as pixels in an image) to produce the appropriate output (such as a description of the image). For the training, the input values are transferred to the “neurons” of the first layer. These are then multiplied by so-called weights (numerical values) of the relevant “synapses.” If a product reaches a certain threshold value, it is passed on to the next layer. The neurons of the second layer take on the values delivered by the synapses coming in from the first layer. Then the process continues: the values of the second layer’s neurons are multiplied by the weights of the subsequent synapses, and the results are passed on to the third layer, and so on until the signal reaches the final output layer. During training, the neural network adjusts the weights of the synapses so that an input produces the desired output.

Here, in more detail, is how the process works: The first layer of neurons (say, n1, n2 and n3) corresponds to the input. The program is given values to process, such as the pixels of an image. Each synapse has a weight, which is multiplied by the value of the preceding neuron. If the product reaches a certain threshold value, the result is passed on. The second layer of neurons then receives the corresponding forwarded products. If several synapses lead to a neuron, the corresponding products are added together.

In this way, the input values are processed layer by layer until they produce an output at the last layer. The weights of the synapses must be adjusted so that the neural network can fulfill its task, such as, say, providing a suitable description of an image. The extensive training process uses hundreds of thousands of sample inputs, through which the network selects weights that let it reliably fulfill its task.
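For readers who like to see the mechanism spelled out, here is a minimal sketch in Python of the multiply, add and threshold steps just described. The numbers are invented for illustration; nothing here comes from the study itself.

```python
import numpy as np

# Made-up weights for a tiny network: three input neurons, two hidden neurons,
# one output neuron. Real networks have millions or billions of such weights.
x = np.array([0.2, 0.8, 0.5])             # input values (say, pixel intensities)

W1 = np.array([[0.4, -0.1, 0.7],          # weights of the synapses into hidden neuron 1
               [0.3,  0.9, -0.5]])        # weights of the synapses into hidden neuron 2
W2 = np.array([[0.6, -0.8]])              # weights of the synapses into the output neuron

def threshold(z):
    # Pass a signal on only if it is large enough (a ReLU-style cutoff at zero).
    return np.maximum(z, 0.0)

hidden = threshold(W1 @ x)                # multiply by weights, add the products, apply the threshold
output = threshold(W2 @ hidden)           # repeat for the next layer
print(hidden, output)
```

Training would consist of nudging the entries of W1 and W2, over many example inputs, until the printed output matches the desired one.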

The principle behind neural networks ultimately reduces to a simple mathematical task. You want to generate an output (y)—for example, the image description—for certain input data (x1, x2, x3,...), such as the image pixels. Therefore, you’re looking for the appropriate function: f(x1, x2, x3,...) = y. The aim is to determine a function that provides a corresponding description for each type of image. The function itself is terribly complex, and an exact solution can seem hopeless.

Neural networks, however, offer a way to approximate the function using simple expressions. In principle, a neural network consists of nothing more than a chain of simple operations: the values of neurons are multiplied by the weights of synapses, added together and compared with a threshold. From a mathematical point of view, the question arises of which functions such a network can represent at all, and what happens when a function is so complicated that it defies simple representation. An important result here is the “universal approximation theorem,” which addresses this question. In recent years experts have also been able to prove the minimum number of layers a neural network must have in order to approximate a certain type of function and thus solve a desired task satisfactorily.
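Written out in standard textbook notation, which the article itself does not use, a small network of this kind computes something like

\[
f_{\mathrm{MLP}}(x_1, x_2, x_3, \dots) = \mathbf{W}_2\, \sigma\big(\mathbf{W}_1 \mathbf{x} + \mathbf{b}_1\big) + \mathbf{b}_2,
\]

where \(\mathbf{x}\) collects the inputs, \(\mathbf{W}_1\) and \(\mathbf{W}_2\) are the matrices of synapse weights, \(\mathbf{b}_1\) and \(\mathbf{b}_2\) are offsets, and \(\sigma\) is the threshold-like function applied at each neuron. The universal approximation theorem guarantees that, for a continuous target function and any desired accuracy, suitable weights exist that bring such an expression within that accuracy on a bounded region; it says nothing about how hard those weights are to find.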

In fact, there is a mathematical result that allows complicated functions of the type f(x1, x2, x3,...) to be expressed precisely in simpler terms, not merely approximated, as is the case with conventional neural networks. The basis for this is a theorem proved by mathematicians Andrey Kolmogorov and Vladimir Arnold in the late 1950s. According to this theorem, a function that depends on numerous inputs (x1, x2, x3,...) can be expressed exactly by adding and combining functions such as g1(x1), g2(x2), g3(x3), ..., each of which depends on only one variable. This may still seem complex at first glance, but from a mathematical perspective it represents a drastic simplification, because it is extremely difficult to work with functions that depend directly on countless variables such as x1, x2, x3,...
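In its usual textbook form, which the article paraphrases rather than states, the theorem says that any continuous function of n variables on the unit cube can be written exactly as

\[
f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right),
\]

where every \(\Phi_q\) and every \(\varphi_{q,p}\) is a continuous function of a single variable. All of the interaction between the many inputs is funneled through simple additions; the only truly multivariate operation left is the sum.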

Liu’s team has now used Arnold and Kolmogorov’s theorem to develop KANs, which deliver more accurate and comprehensible results. “The Kolmogorov-Arnold representation theorem is not unknown in the neural network community,” says computer scientist Kristian Kersting of the Technical University of Darmstadt in Germany, who was not involved in Liu and his colleagues’ latest research. In the 1980s and 1990s experts assumed that this approach could not be used for neural networks. Although that view has changed in recent years, a direct implementation of the principle had not succeeded until now.

The structure of KANs is similar to that of conventional neural networks. The weights do not have a fixed numerical value, however. Instead they correspond to a function: w(x). This means that the weight (w) of the synapse depends on the value (x) of the preceding neuron. During training, the neural network therefore does not learn to adapt the weights as pure numerical values but rather as the associated functions of the synapses. In this way, it is at least theoretically possible to represent a highly complex function, f(x1, x2, x3,...), by a finite network—and thus solve a task using AI with a high degree of precision.
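To make the contrast with the earlier sketch concrete, here is an equally minimal, hypothetical illustration of that idea in Python. The actual paper parameterizes its edge functions with splines; a short polynomial is used here purely as a stand-in, and the class and coefficients are invented.

```python
import numpy as np

# A minimal sketch of the core idea: each "weight" of a KAN is not a number but a
# learnable function w(x) of the preceding neuron's value. A small polynomial
# stands in here for the splines used in the actual paper.

class LearnableEdge:
    def __init__(self, n_basis=4, seed=0):
        rng = np.random.default_rng(seed)
        self.coeffs = rng.normal(size=n_basis)   # these coefficients are what training adjusts

    def __call__(self, x):
        # w(x) = c0 + c1*x + c2*x^2 + c3*x^3, a stand-in for a learned spline
        return sum(c * x**k for k, c in enumerate(self.coeffs))

# A KAN "neuron" simply adds up the outputs of the edge functions feeding into it.
edges = [LearnableEdge(seed=s) for s in range(3)]
inputs = [0.2, 0.8, 0.5]
neuron_value = sum(edge(x) for edge, x in zip(edges, inputs))
print(neuron_value)
```

Instead of adjusting single numbers, training adjusts the coefficients that shape each edge’s curve.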

Representing the synapses as functions offers a further advantage: it makes it easier to understand how a KAN works. While simple numerical values such as weights are not very meaningful, this is not the case with functions. For example, you can visually recognize how the output depends on the input by looking at the corresponding graphs of the functions.
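As a hypothetical example of what that inspection might look like, the following short script plots two invented edge functions; the coefficients are made up and merely stand in for whatever a trained KAN would actually learn.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical illustration of the interpretability point: once an edge's function
# has been learned (here, stand-in cubics with invented coefficients), plotting it
# shows directly how that part of the network transforms its input.
learned_coeffs = [np.array([0.1, 0.8, -0.3, 0.05]),
                  np.array([-0.2, 0.1, 0.6, -0.4])]

xs = np.linspace(-1.0, 1.0, 200)
for i, c in enumerate(learned_coeffs, start=1):
    ys = sum(ck * xs**k for k, ck in enumerate(c))
    plt.plot(xs, ys, label=f"edge function {i}")
plt.xlabel("value of the preceding neuron, x")
plt.ylabel("w(x)")
plt.legend()
plt.show()
```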

The new networks also have one significant disadvantage, however: during the learning process, KANs have to learn entire functions instead of just numerical values, so the training phase becomes much more complex and takes more time.

In their work, Liu and his colleagues compared the KANs they developed with conventional neural networks, known as multilayer perceptrons (MLPs). In an initial test, they used various known functions, f(x1, x2, x3,...) = y, along with the corresponding data for x1, x2, x3,... and y. The task was to find out how quickly ordinary MLPs and KANs could deduce the underlying function from the data. As it turned out, the KANs were able to approximate the functions much faster than MLPs of comparable size.

The experts then tested the KANs on real problems, such as solving partial differential equations, which play an important role in physics. The majority of such equations do not have known exact solutions, and computers are needed to obtain results. Liu and his colleagues discovered that the KANs also delivered more accurate results for solving these equations than the MLPs.

In addition, the researchers applied the new networks to current scientific problems, including those in the mathematical field of knot theory. One of the main questions in the field has to do with how to find out whether different two-dimensional representations of knots actually correspond to the same knot. In 2021 Williamson and his colleagues used neural networks to tackle this question and revealed previously unsuspected connections. As Liu’s team has now shown, KANs can produce exactly the same result but with less effort. While Williamson’s team had to train a neural network with around 300,000 parameters, the KANs used in Liu and his colleagues’ study achieved better results with just 200 parameters.

Liu and his colleagues are optimistic that they will be able to apply their new methods to a wide range of problems, from mathematics and physics to improving LLMs. And the AI community is also enthusiastic on social media: “A new era of ML [machine learning] has started!” wrote one poster on X (formerly Twitter). “The Kolmogorov-Arnold Networks (KAN) looks more and more like it's going to change EVERYTHING,” noted software developer Rohan Paul on the same platform.

Whether the hype surrounding KANs is really justified, however, will only become clear in practice. “KANs should also be assessed in the areas in which MLPs work well,” Kersting says. “Without such a comparison, it is unclear whether KANs are a promising new alternative.” At the same time, however, the computer scientist emphasizes the value of the new work. “Bringing the theorem back to the attention of the community is something I think is very good. The applications are exciting, even if they are not exactly the main focus of the deep-learning community.”

The biggest limitation of the new method so far is the slow training: for the same number of parameters, a KAN takes around 10 times as long as a conventional MLP. This becomes a particular problem if you want to use the approach for LLMs, which already require very long training times because of their sheer size. KANs’ learning speed could be improved, however, according to Liu: “The physicist in my body would suppress my coder personality so I didn’t try (know) optimizing efficiency,” he wrote in a post on X. Thanks to the enormous amount of attention the approach is currently receiving, this weakness may soon be addressed.

This article originally appeared in Spektrum der Wissenschaft and was reproduced with permission.