Neural networks are remarkably efficient tools to solve a number of complicated problems. The first applications of neural networks usually revolved around classification problems. Classification means that we have an image as an input, and the output is, let’s say a simple decision whether it depicts a cat or a dog. The input will have as many nodes as there are pixels in the input image, and the output will have 2 units, and we look at the one of these two that fires the most to decide whether it thinks it is a dog or a cat. Between these two, there are hidden layers where the neural network is asked to build an inner representation of the problem that is efficient at recognizing these animals.
So what is an autoencoder? An autoencoder is an interesting variant with two important changes: first, the number of neurons is the same in the input and the output, therefore we can expect that the output is an image that is not only the same size as the input, but actually is the same image. Now, this normally wouldn’t make any sense, why would we want to invent a neural network to do the job of the copying machine? So here goes the second part: we have a bottleneck in one of these layers. This means that the number of neurons in that layer is much less than we would normally see, therefore it has to find a way to represent this kind of data the best it can with a much smaller number of neurons. If you have a smaller budget, you have to let go of all the fluff and concentrate on the bare essentials, therefore we can’t expect the image to be the same, but they are hopefully quite close. These autoencoders are capable of creating sparse representations of the input data and can therefore be used for image compression. I consciously avoid saying “they are useful for image compression”.
Autoencoders, offer no tangible advantage over classical image compression algorithms like JPEG. However, as a crumb of comfort, many different variants exist that are useful for different tasks other than compression.
There are denoising autoencoders that after learning these sparse representations, can be presented with noisy images. As they more or less know how this kind of data should look like, they can help in denoising these images. That’s pretty cool for starters! What is even better is a variant that is called the variational autoencoder that not only learns these sparse representations but can also draw new images as well. We can, for instance, ask it to create new Monalisa image and we can actually make her smile 🙂