Creating a clothing-identification algorithm with Deep Learning (Tensorflow)

Hammaad Memon
Mar 14, 2021

Deep Learning is at the cutting edge of today’s technology. While neural nets, optimizers, and loss functions may rely on complex math and technology, we don’t need to be mathematicians in order to make use of them.

Today we’re going to create an image identifier using python along with the library Keras. Keras is a wrapper library which is used to build and train models, using Tensorflow under the hood to run matrix calculations. We’ll get back to these in a second.

Prerequisites

Before you can get started, you’ll need to install python version 3.8 (64-bit). Tensorflow is only compatible up to python 3.8 so unfortunately we can’t use the latest version of python. Tensorflow also requires the 64-bit version of python — so make sure you aren’t using the 32-bit version. I also recommend using a virtual environment for this project as it keeps the older version of python and the Tensorflow library separate from your other projects. If you are new to creating virtual environments, you can read a quick tutorial here. Navigate to the root directory of your project or virtual environment and run the following command in the terminal or command prompt to install tensorflow.

pip install tensorflow

If you run into an error and are unable to install tensorflow, make sure you have the right python version. You can run the following script to examine version details:

import sys
print(sys.version)

Make sure you are using a python version between 3.5 and 3.8 and double-check against the prerequisites mentioned above. If you see a later version of python being used despite having downloaded the older version, read through the article here to learn how to set up a virtual environment with a specific version.

Once tensorflow is installed, go ahead and run your python file with the following statements:

import tensorflow as tf
from tensorflow import keras
import numpy as np
print("success")

If your file runs and “success” is printed, you’re all set. If you run into a ModuleNotFoundError, go ahead and separately install any missing packages.

Okay, we are now all set up. Before we get coding though, we must design our neural network.

The Neural Network

The heart of every deep learning algorithm, the neural network, consists of multiple layers. Our neural net will consist of 3 layers: the input layer, the hidden layer, and the output layer. You can think of the data passing through as beads of varying sizes. The paths connecting the hidden layer to the output layer serve as filters, routing only like-shaped beads to specific output nodes. The node with the highest distribution of beads (pixel data in our case) is the clothing matching the image. For this to work, however, our model needs to know what size beads should go to which output node. We’ll come back to this when we train our model. For now, let’s take a closer look at the layers themselves.

source: https://databricks.com/glossary/neural-network

My explanation of a neural network is a bit oversimplified; here’s a more detailed look at neural networks if you’re interested.

The Input Layer

We will be using 28x28 grayscale images from the Fashion MNIST dataset that ships with Keras. Every image in the dataset is labeled as one of ten possible clothing categories. Here are some examples of the images we will be using:

source: https://storage.googleapis.com/tfds-data/visualization/fig/fashion_mnist-3.0.1.png

From each image we will store the intensity of every pixel (0–255) in an array and pass that data through our neural net. Thus our input layer will be a 784-element array of pixel-intensity data for each image (28 × 28 = 784).
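If you want to verify those shapes yourself, here’s a quick sketch (it assumes the Fashion MNIST dataset we’ll load properly in the next section):

import numpy as np
from tensorflow import keras

# Peek at the data: 60k training images, each a 28x28 grid of 0-255 intensities
(images, labels), _ = keras.datasets.fashion_mnist.load_data()
print(images.shape)               # (60000, 28, 28)
print(images[0].flatten().shape)  # (784,) - one input vector per image
print(images.min(), images.max()) # 0 255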

The Hidden Layer

This is where the magic happens. Our neural net will evolve this layer as we train it to find patterns across images (which are, to the network, just numbers coming through the input layer). Data coming through this layer will activate certain nodes and be routed to the respective output node. Neural nets can have multiple hidden layers, but we are using one, as it is enough for the scale of our input. It is generally a good idea to minimize the number of hidden layers, as each additional layer significantly increases the time required to train the model. The second choice we have to make is the number of nodes in this layer. This is a matter of trial and error, but for now we will stick with 128 nodes.

The Output Layer

We are aiming to identify each image as one of 10 possible categories. Thus, our output layer will consist of 10 nodes. The node with the highest activation determines what clothing the algorithm thinks it is looking at. Here are the possible labels for each image:

0: T-shirt/top
1: Trouser
2: Pullover
3: Dress
4: Coat
5: Sandal
6: Shirt
7: Sneaker
8: Bag
9: Ankle boot

Building the neural network

Enough talk. Let’s write some code. The first thing we need to do is load the data from the dataset.
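Something along these lines will do the trick (a minimal sketch; the variable names match the ones used below):

import tensorflow as tf
from tensorflow import keras
import numpy as np

# Fashion MNIST ships with Keras: 70k labeled 28x28 grayscale images
(training_images, training_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()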

The load_data() function returns the pixel data and labels for 70k images. Of those, we are separating 60k into training_images and training_labels. We are storing the remaining 10k in test_images and test_labels for testing our model once we are done training.
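Next, the network itself. Here’s a sketch of the architecture, which we’ll walk through step by step below:

# Three layers: flatten the 28x28 input, one hidden layer, 10 output nodes
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])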

Here we use Keras’s Sequential method to create our neural network architecture. Sequential simply means that data must pass through every layer of the neural network in order.

Step 1: The Input layer. We create the input layer using Flatten, which converts our 28x28 array into a 784x1 array.

Step 2: The Output layer. We create our output layer using the Dense method, which connects every node to every node in the adjacent layer. We also specify 10 units (or nodes) and softmax as our activation function. Softmax converts the output nodes’ raw values into probabilities that sum to 1, so each of the 10 nodes represents the likelihood of one possible outcome.

Step 3: The Hidden layer. Like the output layer, we use the Dense method, but this time we specify 128 units and ReLU as our activation function. ReLU replaces any negative numbers traveling through the net with 0 and leaves positive values unchanged.
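To see both activation functions in isolation, here’s a tiny illustration on a toy vector:

import tensorflow as tf

# ReLU zeroes out negatives; softmax turns raw scores into probabilities
x = tf.constant([-2.0, 0.0, 3.0])
print(tf.nn.relu(x).numpy())     # [0. 0. 3.]
print(tf.nn.softmax(x).numpy())  # sums to 1.0; the largest score gets the highest probability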

And with that, we have created our very own neural network!

Compiling our Model

Before we can shoot some numbers through our model, we need to do one last thing: compile it. In order to compile our model, we need to specify two key functions for training.

Loss functions

When training our model, it’ll guess the label for each image, and the loss function will measure how far off each guess is from the correct label. The specific loss and optimizer functions to use differ across projects with different needs; choosing the right ones is simply a matter of research.

Optimizer functions

After the loss function finds an error in the model’s guess, the ‘optimizer’ is run to reconcile the difference between the actual label and the model’s guess. It adjusts the weights of the model to nudge future predictions toward the desired output.
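For this project, one common pairing (treat it as a reasonable choice rather than the only one) is the Adam optimizer with sparse categorical cross-entropy, which fits integer labels like ours:

# Optimizer and loss here are common choices for integer-labeled classification
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])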

Training

We are now ready to put our neural net to work. Let’s give it a few hundred thousand images’ worth of data to chew through!

Yes, a few hundred thousand. Unfortunately, machines can’t tell an ankle boot from a shirt after looking at 60k data samples just once. Instead, they need to be optimized over multiple passes through the data to get it right. In our case, 5 rounds (or epochs) of optimization should be enough: 5 passes over 60k images works out to 300k samples seen.
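Training is a single call; each epoch is one full pass over the 60k training images:

# 5 epochs x 60k images = ~300k samples seen during training
model.fit(training_images, training_labels, epochs=5)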

After running your code, you should see output similar to the one below. There is a lot of information here, but what we are looking for is the loss reported after each epoch. The loss decreases dramatically as the model is optimized over multiple epochs, using the optimizer and loss function discussed earlier.

A key aspect to note here is the negligible decrease in loss between epochs 4 and 5. It really isn’t worth running this model past 5 epochs, as the further decrease in loss isn’t worth the time or computation. It also becomes hazardous to train a model on the same dataset too many times: the algorithm becomes too specialized on the sample data to generalize to other samples (this is known as overfitting). There are, of course, cases where training for longer would be useful.

Testing

Our model is complete. Now it’s time to see how it does with the test dataset we saved earlier. Let’s evaluate our model with the test images and their respective labels.
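A sketch of the evaluation step (predicted_labels is a helper name introduced here for clarity):

# Overall accuracy on the 10k held-out images
test_loss, test_accuracy = model.evaluate(test_images, test_labels)

# Each prediction is 10 softmax probabilities; argmax picks the winning node
predictions = model.predict(test_images)
predicted_labels = np.argmax(predictions, axis=1)
print("test accuracy:", test_accuracy)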

Predictions stores the model’s guesses for each image, which we then compare with the respective labels to determine the model’s results.

Results

After running our code, we find ourselves with a stunning ~80 percent accuracy!

Conclusion

There’s still much more to do with Keras and Tensorflow, even with this particular project. You can try experimenting with different numbers of hidden layers and nodes, and with other hyperparameters such as the learning rate.

Keep an eye out for more projects in the future! Meanwhile, here’s the complete code if you missed something:
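Here’s the full script, start to finish (the optimizer and loss are the common choices assumed in the compile section, not the only valid ones):

import tensorflow as tf
from tensorflow import keras
import numpy as np

# Load the Fashion MNIST dataset: 60k training images, 10k test images
(training_images, training_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()

# Build the three-layer network
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

# Compile (optimizer and loss are assumed common choices, see above)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train for 5 epochs
model.fit(training_images, training_labels, epochs=5)

# Evaluate on the held-out test set
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
predictions = model.predict(test_images)
print("test accuracy:", test_accuracy)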
