
Quantum-Inspired MNIST

Posted on December 17, 2021, by Brian N. Siegelwax



It doesn’t get easier than this…


What if you could do Machine Learning without building and training a model? What if you could forget about weights and activation functions and optimizers? What if you could exchange normalization and derivatives for addition and subtraction?


You can.


I’ll use classification as an example, using the MNIST dataset. This should work for clustering, as well, since the quantum algorithm that inspired this experiment works for both.


MNIST, for those who don’t know, is a popular dataset of handwritten digits, the numbers 0 through 9. The goal of the MNIST exercise is to take a handwritten single-digit number and accurately identify it as the number it’s intended to be. And, it’s probably impossible to get into Machine Learning without encountering it, because it’s a clean dataset. You can focus on building and training your model, not on cleaning your data.


quantum classification


How does classical classification work?

At its most fundamental level, classification starts with known inputs and outputs, and we determine “weights” that, when multiplied by the inputs, produce the known outputs. We then multiply those weights by test inputs and obtain outputs that suggest the most likely classification for each input.

inputs * weights = outputs
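
To make that concrete, here’s a minimal sketch in NumPy. The numbers are made up purely for illustration; this isn’t code from the article.

import numpy as np

inputs = np.array([0.2, 0.5, 0.1])         # one example with three features
weights = np.array([[0.4, 0.9],            # weights mapping three features
                    [0.7, 0.1],            # to two possible classes
                    [0.3, 0.6]])

outputs = inputs @ weights                 # inputs * weights = outputs
print(outputs)                             # the larger value marks the likelier class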


Well, that doesn’t seem so hard, does it? What’s so challenging about that?


The challenge is efficiently finding the weights. We start off multiplying the inputs by random weights, determine how far away we are from the known outputs, and adjust the weights in small increments. We do this many, many times, until the small weight adjustments reduce the distances between the weighted inputs and the outputs to their lowest possible values.
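
Here’s a rough sketch of that loop, my own illustration with a made-up learning rate and iteration count rather than the article’s code:

import numpy as np

rng = np.random.default_rng(0)
inputs = rng.random((100, 3))                  # 100 training examples, 3 features
true_weights = np.array([1.5, -2.0, 0.5])
outputs = inputs @ true_weights                # the known outputs

weights = rng.random(3)                        # start with random weights
learning_rate = 0.1                            # how big each adjustment is

for _ in range(1000):                          # many, many times
    predictions = inputs @ weights
    errors = predictions - outputs             # how far away we are
    gradient = inputs.T @ errors / len(inputs)
    weights -= learning_rate * gradient        # adjust in small increments

print(weights)                                 # converges toward true_weights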


But, how big should those adjustments be? What if we’re too liberal and we overshoot our target? Or, what if we’re too conservative and training takes forever? Will we then need yet another model just to tune our hyperparameters?


So, basic classification is conceptually simple, but it can get complicated. We’re on a quest to optimize both runtime and accuracy.


Quantum MNIST


How does quantum classification work?


With classical classification, we use weighting to reduce the “distances” between the inputs and the outputs. The word “distances” is important, because there is a quantum algorithm called the SWAP Test, which also goes by the names inner product, kernel method, and, most importantly for this discussion, distance measure. If we map our classical data to quantum states, the SWAP Test determines how close or how distant two quantum states are.


The difference between classical classification and quantum classification is that we don’t need weighting, and we don’t need to do any adjusting. We just measure. If our test data measures closest to classification A, then that’s the likely classification for our test data. If, on the other hand, our data point measures closest to classification B, then that’s the likely classification instead. It’s that simple.
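
The SWAP Test itself runs on a quantum circuit, but the quantity it estimates, the overlap between two states, is easy to sketch classically. This is my own illustration of what “closeness” means here, not code from the article:

import numpy as np

def overlap(a, b):
    # Squared inner product |<a|b>|^2 between two normalized state vectors:
    # 1 means identical states, 0 means orthogonal (maximally distant) states.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return abs(np.vdot(a, b)) ** 2

state_a = np.array([1.0, 0.0])             # hypothetical amplitude-encoded data points
state_b = np.array([1.0, 1.0])
print(overlap(state_a, state_b))           # 0.5: partly close, partly distant

# On an actual SWAP Test circuit, the ancilla qubit measures 0 with
# probability (1 + overlap) / 2, which is how the overlap is estimated.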



canonical SWAP Test


What is quantum-inspired classification?


What if we do the same thing classically that we do quantumly, and just determine the distance between values?

distance += absolute_value(MNIST_value - test_data_value)


The MNIST dataset contains images that are 28x28 pixels, or 784 total pixels. What if we just loop through, pixel by pixel, and determine the total distance between a test digit and zero, the same test digit and one, the same test digit and two, and so forth?
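
In Python, that per-pixel loop might look like the following sketch, assuming each image arrives as a flat sequence of 784 intensity values:

def total_distance(test_digit, reference_digit):
    # Sum of per-pixel absolute differences between two 784-pixel images.
    # The smaller the sum, the "closer" the two digits are.
    distance = 0
    for pixel in range(784):
        distance += abs(float(reference_digit[pixel]) - float(test_digit[pixel]))
    return distance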


…
Target: 9
Actual: 9
Correct: 72%


The train dataset contains roughly 6,000 records for each digit, so I like to use mean digits. What are the average intensity values per pixel for all the zeroes, all the ones, and so forth? Then I took the first 100 test digits and did exactly what I just described. I went pixel by pixel and summed the distances between a test digit and the mean zero, then the same digit and the mean one, and so on. The lowest sum represented the shortest distance between the test data and one particular classification, and that’s the likely classification.
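
Putting it all together, the whole experiment can be sketched roughly as follows. I’m assuming the data is loaded through tensorflow.keras.datasets.mnist, which is just one convenient source and not necessarily what was used here.

import numpy as np
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Mean digits: the average intensity value per pixel for each of the ten classes.
mean_digits = np.array([x_train[y_train == d].mean(axis=0) for d in range(10)])

correct = 0
for i in range(100):                                   # first 100 test digits
    test_digit = x_test[i].astype(float)
    # Total distance (sum of per-pixel absolute differences) to each mean digit.
    distances = [np.abs(mean - test_digit).sum() for mean in mean_digits]
    guess = int(np.argmin(distances))                  # lowest sum = likely classification
    correct += int(guess == y_test[i])

print(f"Correct: {correct}%")                          # out of 100 test digits

This is just the L1 (Manhattan) distance over raw pixel intensities; other distance measures could be swapped in without changing the shape of the approach.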



comparing multi-dimensional data


How do the results compare?


With a small sample of 100 test digits, the accuracy was 72%. Random guessing would be only 10%, so 72% isn’t terrible, and, considering we’re only doing addition and subtraction, it compares reasonably to traditional approaches that execute on the order of 230,000 operations. On the low end, I found someone struggling with MNIST accuracy as low as 11%, and I found Kaggle competitions with accuracy above 99%. The reason for both extremes, actually, is the level of complexity we can add. This method beats students who are struggling to learn image classification, on the one hand, while simultaneously retaining quite a bit of room for improvement, on the other.


I must also say that quantum-inspired classification is fast. I’ve trained very simple classification models, and even the simplest ones generally take at least a minute. In contrast, 7,840 rounds of addition and subtraction is virtually instantaneous. It actually takes about a second, but that’s probably due to running this in a Jupyter notebook, which isn’t optimal.


What’s next?


My starting point is 72% accuracy. That’s with nothing but addition and subtraction. It couldn’t be simpler. However, I have ideas that would keep this method conceptually simple, while hopefully improving accuracy.


First, I have historically hesitated to stray from mean digits, because I’ve been concerned about the train digits overlapping each other. I’m referring to quantum classification here, of course. And, I have had additional concerns regarding dimensionality reduction and limited qubit availability. However, these really aren’t concerns when done classically. Therefore, I do have another “encoding” to try, and we’ll see how that affects accuracy.


Second, classical MNIST typically uses Convolutional Neural Networks (CNNs), not the simple approach I described here. A CNN doesn’t compare single pixel to single pixel, and it doesn’t do only one comparison. Therefore, I could expand this to compare small groups of pixels to small groups of pixels, like a CNN, and see what happens.
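
One way to prototype that, purely my own sketch rather than anything from the article, is to compare small blocks of pixels instead of single pixels, for example by summing 4x4 blocks before taking distances:

import numpy as np

def block_sums(image, block=4):
    # Reduce a 28x28 image to a 7x7 grid of block sums: a crude,
    # convolution-like pooling over small groups of pixels.
    grid = np.asarray(image, dtype=float).reshape(28 // block, block, 28 // block, block)
    return grid.sum(axis=(1, 3))

def block_distance(test_digit, reference_digit, block=4):
    # Same distance idea as before, but block to block instead of pixel to pixel.
    return np.abs(block_sums(reference_digit, block) - block_sums(test_digit, block)).sum()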


Third, I could analyse Kaggle submissions. I’ve never actually done that because I’ve only worked with MNIST quantumly until now. But, maybe there are ideas to be found that can be implemented without adding too much complexity. The point of quantum-inspired MNIST is simplicity, because of quantum MNIST’s simplicity.


Brian N. Siegelwax
