Neural networks have risen sharply in popularity in recent years. Unfortunately, no single model works best for every problem. This is the essence of the "No Free Lunch" theorem, which posits that "if algorithm A outperforms algorithm B on some cost functions, then loosely speaking there must exist exactly as many other functions where B outperforms A". It is therefore common in machine learning to try multiple models and pick the one that works best for a particular problem.
Therefore, while the accomplishments of neural networks should not be understated, it is worth remembering that other algorithms receive far less attention in the mainstream media. That said, here we will take a look at these popular algorithms and how they can help us build smarter models in the future.
Artificial neural networks (ANNs) are modeled on the architecture of the human brain. The brain consists of interconnected neurons, each of which can be triggered by other neurons and can, in turn, trigger others. Although the concept of an ANN has been around for a long time (first proposed in 1943 by Warren McCulloch and Walter Pitts), it is only recently that they have come into widespread use.
After their conception, ANNs were a major area of research until around 1969. There was a partial resurgence in the 1980s, and by the early 1990s ANNs were starting to achieve useful results in tasks such as recognizing handwritten digits. However, despite a number of impressive successes, attempts to get ANNs to perform more complex tasks ran into trouble. In particular, the standard training technique did not scale to larger networks with more layers.
Artificial Neural Networks
An ANN consists of three types of interconnected layers made up of artificial neurons, or processing elements. The structure starts with an input layer, followed by one or more hidden layers, and finally an output layer. The number of neurons in each layer, as well as the number of hidden layers, depends on the application and the data being used.
In the popular feedforward network, information flows from the input layer to the hidden layer(s), and from there to the output layer. In each layer, neurons have weighted inputs (synapses), an activation function and an output. The synaptic weights are the adjustable parameters that are optimized when training the model.
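To make this concrete, a single feedforward pass can be sketched in a few lines of Python. The layer sizes, random weights, and sigmoid activation below are illustrative assumptions, not a fixed prescription:

```python
import numpy as np

def sigmoid(z):
    # One common choice of activation function; tanh or ReLU work similarly.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """One feedforward pass: each layer multiplies by its weights (synapses),
    adds a bias, and pushes the result through the activation function."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Toy network: 3 inputs -> 4 hidden neurons -> 2 outputs.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros(4), np.zeros(2)]
output = forward(np.array([0.5, -0.2, 0.1]), weights, biases)
print(output.shape)  # (2,)
```

Training adjusts the entries of `weights` and `biases`; the forward pass itself stays the same regardless of how many hidden layers are stacked in between.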
There are a number of popular structures that are commonly used in different applications. These include, but are not limited to:
- Multilayer Perceptron
- Convolutional Neural Network
- Recursive Neural Network
- Long Short-Term Memory
There are many great posts on the differences between these architectures that we won't cover here.
Recently there have been a number of important advances that have allowed these methods to become much more widely adopted, moving from shallow to deep architectures. The rise of the internet means that we now live with an abundance of data. ANNs learn by example, so the widespread availability of billions of text documents, images, videos and audio recordings makes it possible to train these algorithms on far larger and more varied datasets than before.
The ability to ingest this growing cache of data is largely due to advances in hardware. A research group at Stanford University led by Andrew Ng found that training a neural network on graphics processing units (GPUs) could be up to 100 times faster than on traditional CPUs. GPUs allow the mathematically intensive operations at the core of training to exploit the massive parallelism of the GPU architecture, something CPUs are not designed to handle as efficiently.
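The operation that dominates this cost is dense matrix multiplication, shown below with NumPy on the CPU. As a point of reference (and only as an assumption about tooling, not something the original study used), GPU libraries such as CuPy expose a nearly identical array API, so the same `@` operation can be dispatched to thousands of parallel GPU cores with minimal code changes:

```python
import numpy as np

# A mini-batch of 256 examples with 1024 features each, passing through
# one layer of 512 neurons. The batched matrix multiply below is the
# mathematically intensive operation that GPUs parallelize so well.
batch = np.random.rand(256, 1024)    # inputs
weights = np.random.rand(1024, 512)  # one layer's synapses

activations = batch @ weights        # (256, 1024) @ (1024, 512)
print(activations.shape)  # (256, 512)
```

Every example in the batch and every neuron in the layer can be computed independently, which is exactly the kind of workload that maps onto a GPU's parallel architecture.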
Finally, there have been important advances in the techniques used to train ANNs. Traditional gradient-based optimization tends to get stuck in poor solutions when deep architectures are initialized randomly. Strategies such as greedy layer-wise unsupervised training helped the field progress from traditional shallow architectures to the deep-learning frameworks that tend to work well for these big-data applications.
In 2012, the annual ImageNet image-recognition contest (ILSVRC) was won by a deep-learning model that vastly outperformed the other entries, pushing ANNs into prominence in both academic and industry communities.
It is hard not to mention Google's work in the field when talking about future directions. The company has open-sourced TensorFlow, the codebase used in much of its machine learning work. A couple of days ago, it also opened up the training materials it uses for non-ML engineers within the company. Google has generally shown a desire to lead progress in the field.
More specifically, Google has also been building its own custom chip architecture for some time now. These Tensor Processing Units (TPUs) have gone through several iterations and have recently been made more widely available. While a number of users around Silicon Valley report very promising results, it will be interesting to see what happens once it becomes financially feasible for a larger number of users to deploy them in widespread applications.
At RecruitSumo Inc, we share your passion for machine learning and artificial intelligence. We specialize in predictive analytics for human capital, adding value by helping build the right organization, culture, team, and talent to succeed.