When we talk about machine learning, we often talk about neural networks. But as with many of the new analytical tools used in industry today, we often talk about and use neural networks without understanding what they do. Today we want to pull back the curtain a bit and give you a basic introduction to what neural networks are, what they do, and how they accomplish it.
Neural networks belong to a branch of mathematics known as machine learning. Machine learning, in its most basic definition, is a way of representing empirical data in an efficient form, so that the data can be reproduced, generalized, or recognized, or so that patterns in it can be discovered. The goal of machine learning is, then, to summarize data in the form of (a set of) equations.
A very simple example of machine learning would be measuring a series of points that lie on a straight line. The straight line is a model that represents the data we have measured. We use a simple equation to represent the straight line: "y", the vertical axis, is equal to "m", the slope of that straight line, multiplied by "x", the horizontal axis, plus a constant "b" that is the intercept of the line with the vertical axis. So y = mx + b is a summary of the data, and it includes two parameters: m and b. As the measurements of both x and y are imprecise, the model parameters m and b also carry uncertainties. This allows the model to cope with the imprecise nature of real-life data sets.
Finding the values of m and b from a particular set of empirical data is, as we said, a simple example of machine learning. Thousands of data points have been reduced to two parameters of an equation.
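This fit can be sketched in a few lines of Python. Everything here is illustrative: the synthetic data, the noise level, and the choice of NumPy's least-squares polynomial fit are our own, not the only way to do it.

```python
import numpy as np

# Synthetic measurements: points near the line y = 2x + 1, with noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Ordinary least squares: choose m and b that minimize the squared errors
m, b = np.polyfit(x, y, deg=1)

print(f"m = {m:.2f}, b = {b:.2f}")  # close to the true values 2 and 1
```

A hundred data points have been reduced to the two numbers m and b, which is exactly the summarization the text describes.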
The method for computing the parameters of a model from data is called an algorithm. Beyond the algorithm needed to compute the parameters of a model from data, machine learning often needs a second algorithm to update the parameters whenever additional data becomes available. This is not always possible, depending on the model, but when it is, it represents a serious advantage, as the initial learning is usually time consuming.
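A minimal sketch of such an update algorithm, assuming the straight-line model from before: each new data point nudges m and b a little in the direction that reduces the error, a technique known as stochastic gradient descent. The function name and learning rate are illustrative choices.

```python
import numpy as np

def sgd_update(m, b, x_new, y_new, lr=0.01):
    """One online update of the line y = m*x + b from a single new point.

    Takes the gradient of the squared error (m*x_new + b - y_new)**2
    with respect to m and b, then steps a small amount downhill.
    """
    error = (m * x_new + b) - y_new
    m -= lr * error * x_new
    b -= lr * error
    return m, b

# Start from a blank model and refine it as data streams in,
# instead of refitting from scratch each time
rng = np.random.default_rng(1)
m, b = 0.0, 0.0
for _ in range(5000):
    x_new = rng.uniform(0.0, 10.0)
    y_new = 2.0 * x_new + 1.0 + rng.normal(scale=0.5)
    m, b = sgd_update(m, b, x_new, y_new)

print(m, b)  # drifts toward the true values 2 and 1
```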
The way a machine approaches learning is not the same as the way a human does, but in the case of neural networks, considering the analogy can be quite useful.
The brain is essentially a network of neurons that are connected by synapses. Each neuron and each synapse is, individually, quite a simple object, but in a network they are able to carry out some astonishingly complex actions. Consider putting a name to a person's face, for example, and connecting it to memories of that person. All of this happens unconsciously, within fractions of a second. That is a neural network at work.
As a person is not born knowing all the people they will meet in their life, the brain's neural network gets trained as it experiences things, learns them, and can then draw on this knowledge quickly.
That is similar to how an artificial neural network learns. You start with a prototypical neural network and begin feeding it experiences. The more experiences you show the network, the more capable it becomes of correctly representing those experiences and recalling them in the future.
The learning of a straight line we discussed is a simple example of machine learning. Neural networks, on the other hand, are capable of representing extremely complex data sets.
A neural network consists of nodes that are connected to each other (not unlike the neurons, connected by synapses). The input data enters the network on special nodes and then begins to travel through the network. Every time a piece of data travels over a connection or encounters a node, it is modified according to mathematical rules. The modified data then exits the network on other special nodes. Each connection and each node carry parameters that specify exactly what is done there. In this way, the network represents the relationship between the output and the input data and thus represents a mathematical function just like the straight line. So a neural network is nothing more than an equation with parameters to be determined from the data that this network is supposed to represent.
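The description above can be sketched as code. This is an assumed, minimal architecture (two inputs, three hidden nodes, one output) with made-up parameter values; in practice, these parameters would be determined from data.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Pass input data through a tiny two-layer network.

    Connections carry weights (W1, W2); nodes add a bias and apply a
    simple nonlinear rule (tanh). The whole network is just an equation:
    output = W2 @ tanh(W1 @ x + b1) + b2.
    """
    hidden = np.tanh(W1 @ x + b1)  # data modified at the hidden nodes
    return W2 @ hidden + b2        # data exits at the output nodes

# Illustrative parameters: 2 inputs -> 3 hidden nodes -> 1 output
W1 = np.array([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]])
b1 = np.zeros(3)
W2 = np.array([[1.0, -1.0, 0.5]])
b2 = np.array([0.1])

y = forward(np.array([1.0, 2.0]), W1, b1, W2, b2)
print(y.shape)  # a single output value
```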
All the fuss about neural networks is based on the fact that there exists a mathematical proof, which we won't go into in this article, showing that if the network has enough nodes in it, then it is capable of accurately and precisely representing any data set, as long as the data set is internally consistent. This holds true even if the relationships in that data set are highly non-linear or time-dependent.
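As an illustration of that claim (not the proof), here is a sketch of a small one-hidden-layer network learning a highly non-linear data set, points on a sine curve, by plain gradient descent. The layer size, learning rate, and step count are illustrative choices.

```python
import numpy as np

# Non-linear target: points on y = sin(x); no straight line can fit this
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
y = np.sin(x)

# One hidden layer of 20 nodes; parameters start out random
W1 = rng.normal(scale=0.5, size=(1, 20))
b1 = np.zeros(20)
W2 = rng.normal(scale=0.5, size=(20, 1))
b2 = np.zeros(1)

lr = 0.1
for step in range(5000):
    # Forward pass: the network evaluated as an equation
    h = np.tanh(x @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y
    # Backward pass: gradients of the mean squared error
    dW2 = h.T @ err / len(x)
    db2 = err.mean(axis=0)
    dh = err @ W2.T * (1 - h ** 2)
    dW1 = x.T @ dh / len(x)
    db1 = dh.mean(axis=0)
    # Gradient descent step on every parameter
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

mse = float(((np.tanh(x @ W1 + b1) @ W2 + b2 - y) ** 2).mean())
print(mse)  # small: the network now represents the non-linear data
```

With enough hidden nodes, the same recipe can, in principle, represent far more complex internally consistent data sets.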
What does internally consistent mean? It means that the source of that data should always obey the same laws. In the case of industrial production, the data is produced by the laws of nature. As they do not change, the data set is internally consistent. The driving forces underlying the stock market, for example, are not constant over time and that is why neural networks cannot represent these data sets well.
In summary, a neural network is a complex mathematical formula that is capable of representing accurately any set of internally consistent data. The general approach to doing this is to take a formula with parameters and to determine the values of the parameters using a computational method. That is what we call machine learning.