Getting to AI – Generative Models

Learn about generative models in this article by Patrick D. Smith, the Data Science Lead at Excella in Arlington, Virginia.

Generative models are a class of neural networks that try to predict features, given a certain label. They do this by having a parameter set that is much smaller than the amount of data they are learning from, which forces them to capture the general essence of the data efficiently.

There are two main types of generative models: variational autoencoders (VAEs) and generative adversarial networks (GANs). You’ll start with the motivations for generative models, then look at the architecture and inner workings of each type, and work through a practical example for each.


Autoencoders and their encoder/decoder frameworks are the inspiration behind generative models. They are a self-supervised technique for representation learning, in which the network learns about its input so that it can generate new data that resembles that input. In this article, you’ll learn about their architecture and uses as an introduction to the generative networks that they inspire.

Network Architecture

Autoencoders work by taking an input, compressing it into a smaller vector representation, and then reconstructing the input from that representation. They do this by using an encoder to impose an information bottleneck on incoming data and a decoder to recreate the input from the resulting representation. This is based on the idea that data contains structure (correlations, and so on) that exists but is not readily apparent. Autoencoders are a means of learning these relationships automatically, without being told explicitly what they are.

Structurally, autoencoders consist of an input layer, a hidden layer, and an output layer, as demonstrated in the following diagram:

The encoder learns to preserve as much of the relevant information as possible in the limited encoding and to discard the irrelevant parts. This forces the network to retain only the information required to recreate the input. Because the task of an autoencoder is to recreate its input, it is trained with a reconstruction loss, usually combined with a regularization term to prevent overfitting. Reconstruction losses are typically mean squared error or cross-entropy functions that penalize the network for producing an output that is markedly different from the input.
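To make the two loss functions concrete, here is a minimal NumPy sketch. The function names `mse_loss` and `bce_loss` and the toy arrays are illustrative assumptions, not part of the article's TensorFlow code; the cross-entropy version assumes inputs scaled to the range [0, 1].

```python
import numpy as np


def mse_loss(x, x_hat):
    """Mean squared error between inputs and their reconstructions."""
    return float(np.mean((x - x_hat) ** 2))


def bce_loss(x, x_hat, eps=1e-7):
    """Binary cross-entropy; assumes both arrays hold values in [0, 1]."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat)))


# Hypothetical 4-pixel "image" with a close and a poor reconstruction.
x = np.array([[0.0, 1.0, 1.0, 0.0]])
good = np.array([[0.1, 0.9, 0.8, 0.2]])
bad = np.array([[0.9, 0.1, 0.2, 0.8]])

print("MSE:", mse_loss(x, good), mse_loss(x, bad))
print("BCE:", bce_loss(x, good), bce_loss(x, bad))
```

Both functions assign the poor reconstruction a larger penalty, which is the signal the network is trained to reduce.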

The information bottleneck is what makes minimizing this reconstruction loss meaningful: if there were no bottleneck, information could flow too easily from the input to the output, and the network would likely overfit by memorizing its inputs rather than learning general representations. The ideal autoencoder is both of the following:

* Sensitive enough to its input data that it can accurately reconstruct it

* Insensitive enough to its input data that the model doesn’t suffer from overfitting that data

Going from a high-dimensional input to a low-dimensional representation in the encoder is a dimensionality reduction method closely related to principal component analysis (PCA). The difference is that PCA is restricted to linear manifolds, while autoencoders can handle nonlinear manifolds (a manifold is a continuous, non-intersecting surface).

A sphere is a manifold; really, any surface in space that doesn’t intersect with itself could be a manifold. For the purposes of neural networks, learning, and loss functions, it helps to think of a manifold as a topological map.
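The PCA comparison can be illustrated by treating PCA as a purely linear encoder/decoder pair. The following NumPy sketch uses hypothetical toy data (points lying close to a plane in three dimensions) and computes the principal directions with a singular value decomposition; the variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 points in 3-D that lie close to a 2-D plane.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 3))
data = latent @ mixing + 0.01 * rng.normal(size=(200, 3))

mean = data.mean(axis=0)
centered = data - mean

# Rows of vt are the principal directions, ordered by explained variance.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:2]  # keep a 2-D bottleneck

codes = centered @ components.T            # linear "encoder": 3-D -> 2-D
reconstructed = codes @ components + mean  # linear "decoder": 2-D -> 3-D

pca_error = float(np.mean((data - reconstructed) ** 2))
print(codes.shape, pca_error)
```

Because the data here really is close to a flat plane, the linear projection reconstructs it almost perfectly. On curved structure a linear projection loses information, which is where the nonlinear layers of an autoencoder can help.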

Building an Autoencoder

If you’re thinking that the task of reconstructing an input doesn’t appear that useful, you’re not alone. The value of autoencoders is that they help extract features when no labeled features are at hand. To illustrate how this works, here’s a walkthrough of an example using TensorFlow. You’ll reconstruct the MNIST dataset here and compare the performance of the standard autoencoder against the variational autoencoder on the same task.

Now, get started with your imports and data. MNIST is contained natively within TensorFlow, so you can easily import it:
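As a minimal sketch of the data format the rest of the walkthrough assumes, the snippet below flattens 28 x 28 grayscale images into 784-element vectors with pixel values scaled to [0, 1]. The random array is a stand-in for real MNIST images (an assumption so the snippet runs without downloading anything); it is not the TensorFlow import described in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a batch of raw MNIST images: 8-bit grayscale, 28 x 28 pixels.
raw_images = rng.integers(0, 256, size=(32, 28, 28), dtype=np.uint8)

# Flatten each image to a 784-element vector and scale pixels into [0, 1].
flat_images = raw_images.reshape(len(raw_images), -1).astype(np.float32) / 255.0

print(flat_images.shape, flat_images.min(), flat_images.max())
```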

For ease, you can build the autoencoder with the tf.layers library. You’ll want your autoencoder architecture to follow the convolutional/de-convolutional pattern, where the input layer of the encoder matches the size of the input and the subsequent layers squash the data into a smaller and smaller representation. The decoder is the same architecture reversed, starting with the small representation and working larger.

It should look something like the following:

Start with the encoder; define an initializer for the weight and bias factors first, and then define the encoder as a function that takes an input, x. Then use the tf.layers.dense function to create standard, fully connected neural network layers. The encoder will have three layers, with the first layer size matching the input dimensions of the input data (784), with the subsequent layers getting continually smaller:
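The encoder described above can be sketched without any framework. This NumPy version is a stand-in for the tf.layers.dense code, not a reproduction of it: the Glorot-style uniform initializer, the sigmoid activation, and the layer widths of 256 and 128 are all assumptions, since the text only fixes the first layer at 784 units.

```python
import numpy as np

rng = np.random.default_rng(0)


def glorot_uniform(fan_in, fan_out):
    """Glorot/Xavier-style uniform initializer (an assumed choice)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Input dimension followed by three layer widths; 256 and 128 are hypothetical.
ENCODER_SIZES = [784, 784, 256, 128]

encoder_params = [
    (glorot_uniform(n_in, n_out), np.zeros(n_out))
    for n_in, n_out in zip(ENCODER_SIZES[:-1], ENCODER_SIZES[1:])
]


def encoder(x):
    """Pass x through three fully connected layers of shrinking width."""
    h = x
    for weights, bias in encoder_params:
        h = sigmoid(h @ weights + bias)
    return h


batch = rng.random((5, 784))  # stand-in for five flattened images
code = encoder(batch)
print(code.shape)
```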

Next, build your decoder; it will use the same layer type and initializer as the encoder, only now invert the layers so that the first layer of the decoder is the smallest and the last is the largest.
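A matching decoder can be sketched the same way, again in NumPy rather than TensorFlow. The layer widths (128, 256, 784), the initializer, and the sigmoid activation are assumptions chosen to mirror a hypothetical three-layer encoder with a 128-dimensional code.

```python
import numpy as np

rng = np.random.default_rng(0)


def glorot_uniform(fan_in, fan_out):
    """Glorot/Xavier-style uniform initializer (an assumed choice)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Code dimension followed by three layer widths, smallest first (hypothetical).
DECODER_SIZES = [128, 128, 256, 784]

decoder_params = [
    (glorot_uniform(n_in, n_out), np.zeros(n_out))
    for n_in, n_out in zip(DECODER_SIZES[:-1], DECODER_SIZES[1:])
]


def decoder(code):
    """Expand a compact code back to the 784-dimensional input space."""
    h = code
    for weights, bias in decoder_params:
        h = sigmoid(h @ weights + bias)
    return h


code = rng.random((5, 128))  # stand-in for encoder output
reconstruction = decoder(code)
print(reconstruction.shape)
```

The final sigmoid keeps every output between 0 and 1, which matches pixel intensities scaled to that range.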

Before you get to training, define some hyper-parameters that will be needed during the training cycle. Define the size of your input, the learning rate, number of training steps, the batch size for the training cycle, as well as how often you want to display information about your training progress.
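One way to keep these settings together is a small configuration object. The field names and every default value below are illustrative assumptions, apart from the input size of 784 mentioned earlier in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    input_dim: int = 784          # 28 x 28 pixels, flattened
    learning_rate: float = 0.001  # hypothetical value
    num_steps: int = 1000         # hypothetical value
    batch_size: int = 256         # hypothetical value
    display_step: int = 100       # report progress every N steps (hypothetical)


config = TrainingConfig()
print(config)
```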

Now define the placeholder for your input data so that you can compile the model:

And subsequently, compile the model and the optimizer:

Lastly, code up the training cycle. Start a TensorFlow session and iterate over the epochs/batches, computing the loss and accuracy at each point:
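The shape of such a training cycle can be sketched in plain NumPy. This is a stand-in under several assumptions, not the TensorFlow session code: it uses synthetic data instead of MNIST, a single hidden layer on each side, mean squared error, plain mini-batch gradient descent with hand-written gradients, and hypothetical hyper-parameter values.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Synthetic stand-in for MNIST: 64 features driven by 8 hidden factors.
n_samples, input_dim, code_dim = 512, 64, 8
factors = rng.normal(size=(n_samples, code_dim))
data = sigmoid(factors @ rng.normal(size=(code_dim, input_dim)))

# Hypothetical hyper-parameters.
learning_rate, num_steps, batch_size, display_step = 0.5, 2000, 64, 500

w_enc = rng.normal(scale=0.1, size=(input_dim, code_dim))
b_enc = np.zeros(code_dim)
w_dec = rng.normal(scale=0.1, size=(code_dim, input_dim))
b_dec = np.zeros(input_dim)


def forward(x):
    code = np.tanh(x @ w_enc + b_enc)          # encoder
    return code, sigmoid(code @ w_dec + b_dec)  # decoder


def reconstruction_loss(x):
    return float(np.mean((forward(x)[1] - x) ** 2))


initial_loss = reconstruction_loss(data)

for step in range(1, num_steps + 1):
    batch = data[rng.integers(0, n_samples, size=batch_size)]
    code, recon = forward(batch)

    # Backpropagate the mean squared error through the decoder, then the encoder.
    grad_recon = 2.0 * (recon - batch) / batch.size
    grad_dec_pre = grad_recon * recon * (1.0 - recon)
    grad_code_pre = (grad_dec_pre @ w_dec.T) * (1.0 - code ** 2)

    # Gradient descent update on all four parameter arrays.
    w_dec -= learning_rate * (code.T @ grad_dec_pre)
    b_dec -= learning_rate * grad_dec_pre.sum(axis=0)
    w_enc -= learning_rate * (batch.T @ grad_code_pre)
    b_enc -= learning_rate * grad_code_pre.sum(axis=0)

    if step % display_step == 0:
        print(f"step {step}: loss {reconstruction_loss(data):.4f}")

final_loss = reconstruction_loss(data)
```

In a framework such as TensorFlow, the gradient computation and parameter updates are handled by the optimizer, so the loop body reduces to fetching a batch and running one optimization step.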

For this particular example, add a little something more to the process: a way to plot the reconstructed images alongside their original versions. Keep in mind that this code still sits within the training session, just outside the training loop:
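A side-by-side comparison like this can be assembled by tiling the images into a single array. In the NumPy sketch below, `tile_images` is a hypothetical helper, and both image arrays are random stand-ins for real test digits and decoder output.

```python
import numpy as np


def tile_images(flat_images, grid, side=28):
    """Arrange grid * grid flattened side x side images into one 2-D canvas."""
    canvas = np.empty((side * grid, side * grid))
    for index in range(grid * grid):
        row, col = divmod(index, grid)
        canvas[row * side:(row + 1) * side, col * side:(col + 1) * side] = (
            flat_images[index].reshape(side, side)
        )
    return canvas


rng = np.random.default_rng(0)
originals = rng.random((16, 784))  # stand-in for 16 test digits
noise = 0.05 * rng.normal(size=originals.shape)
reconstructions = np.clip(originals + noise, 0.0, 1.0)  # stand-in for decoder output

# Originals on the left half, reconstructions on the right half.
comparison = np.hstack([tile_images(originals, 4), tile_images(reconstructions, 4)])
print(comparison.shape)
```

The resulting array can then be handed to a plotting library, for example matplotlib's `imshow` with a grayscale colormap.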

After training, you should end up with a result along the lines of the following, with the actual digits on the left and the reconstructed digits on the right:

By training the autoencoder on unlabeled digits, you’ve done the following:

* Learned the latent features of the dataset without having explicit labels

* Successfully learned the distribution of the data and reconstructed the image from scratch, from that distribution

Now, say that you wanted to take this further and generate or classify new digits that you haven’t seen yet. To do this, remove the decoder and attach a classifier or generator network:

The encoder, therefore, becomes a means of initializing a supervised training model.
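As a rough sketch of this idea, the snippet below keeps an encoder and places a softmax classification layer on top of its output. Everything here is an assumption made for illustration: the randomly initialized arrays stand in for trained encoder weights, and the single-layer encoder, the 128-dimensional code, and the names `encoder` and `classify` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for trained encoder weights (in practice, reuse the learned values).
w_enc = rng.normal(scale=0.05, size=(784, 128))
b_enc = np.zeros(128)


def encoder(x):
    return np.tanh(x @ w_enc + b_enc)


# New classification head: 128 encoded features -> 10 digit classes.
w_head = rng.normal(scale=0.05, size=(128, 10))
b_head = np.zeros(10)


def classify(x):
    """Encode the input, then apply a softmax layer over the ten classes."""
    logits = encoder(x) @ w_head + b_head
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


probs = classify(rng.random((4, 784)))
print(probs.shape, probs.sum(axis=1))
```

Only the head would need to be trained from scratch on labeled digits; the encoder starts from the features it already learned without labels.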

If you found this article interesting, you can explore Hands-On Artificial Intelligence for Beginners to grasp the fundamentals of Artificial Intelligence and build your own intelligent systems with ease. Hands-On Artificial Intelligence for Beginners will teach you what artificial intelligence is and how to design and build intelligent applications.
