

Google’s AlphaGo demonstrates the impressive performance of deep learning, and the backbone of deep learning is the highly versatile neural network. Each network is made up of layers of interconnected neurons, and the nonlinear activation function inside each neuron is one of the key factors behind the unprecedented achievements of deep learning. Many researchers have pursued quantum neural networks since the 1990s, unfortunately without much success. The main challenge is designing a nonlinear activation function inside the quantum neuron, because the laws of quantum mechanics require the operations on quantum neurons to be unitary and linear. A recent discovery uses a special quantum circuit technique called repeat-until-success to build a nonlinear activation function inside a quantum neuron, which is the hard part of creating such a neuron. However, the activation function used in that work is based on the periodic tangent function. Because of this periodicity, the input to this function has to be restricted to the range [0, π/2), which is a serious constraint for real-world applications. The periodicity also makes these neurons ill-suited to gradient-descent training, because the derivatives of the function oscillate. The purpose of our study is to propose a new nonlinear activation function that is not periodic, so that it can take any real number as input and its neurons can be trained efficiently with gradient descent. Our quantum neuron offers the full benefits of a quantum entity, supporting superposition, entanglement, and interference, while also enjoying the full benefits of a classical entity: it takes any real number as input and can be trained with gradient descent. The performance of quantum neurons with our new activation function is analyzed on IBM’s 5Q quantum computer and IBM’s quantum simulator.

History was made in 2016 when Google’s AlphaGo computer program defeated a Go world champion. One of the key components of this program is a deep neural network, and the remarkable performance of AlphaGo has motivated extensive research in deep learning. One technique responsible for the success of deep learning is the use of many layers of neural networks, in which the output of one layer serves as the input of the next and each layer is made up of a number of interconnected neurons. More importantly, there is a nonlinear activation function inside each neuron; otherwise a deep neural network would be equivalent to a single-layer network. This is because the composition of linear transformations is again a linear transformation, which can serve only as a linear regression technique. It is the nonlinearity of neural networks that gives them the ability to approximate any continuous function and to solve complicated tasks such as language translation, image classification, or even playing Go. The main source of this nonlinearity is the activation function inside each neuron of the network.
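The claim that stacked linear layers collapse into a single linear layer can be checked numerically. The sketch below uses plain Python with arbitrary illustrative matrices `W1`, `W2` and input `x` (hypothetical values, not taken from this work); it applies the two layers one after the other and compares the result with the single equivalent layer `W2·W1`.

```python
# Minimal sketch: two linear layers without an activation equal one linear layer.
# W1, W2 and x are arbitrary illustrative values.

def matmul(a, b):
    """Multiply matrix a (m x n) by matrix b (n x p), both as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(a, x):
    """Apply matrix a to vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

W1 = [[1.0, 2.0], [0.5, -1.0]]   # first linear layer
W2 = [[3.0, 0.0], [-2.0, 1.0]]   # second linear layer
x = [0.7, -1.3]

two_layers = matvec(W2, matvec(W1, x))   # layer by layer
one_layer = matvec(matmul(W2, W1), x)    # single collapsed layer W2 @ W1
print(two_layers, one_layer)
```

Inserting any nonlinear function between the two `matvec` calls breaks this equivalence, which is exactly the role of the activation function.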

Mathematically, neural networks can be viewed as weighted directed graphs in which neurons are nodes and the connections among them are directed edges with weights. The weights represent the strength of the interconnections between neurons. Neural networks learn by iteratively adjusting their weights and biases during training to produce the desired output; each neuron makes its decision by summing the weighted evidence (its inputs). The rules for changing these weights during training are called the learning algorithm. Neural networks have found a wide array of applications in supervised learning, unsupervised learning, and reinforcement learning.

Quantum computers are developing rapidly, and publicly accessible quantum computers are now a reality. This trend makes it possible to study quantum machine learning algorithms on real quantum hardware. With unique quantum mechanical features such as superposition, entanglement, and interference, quantum computing offers a new paradigm for computation. Research has shown that artificial intelligence, and machine learning in particular, can benefit from quantum computing. It is reasonable to hope that the next historic breakthrough in artificial intelligence, like AlphaGo, may well be realized on a quantum computer.

We have recently completed two studies using IBM’s 5Q quantum computer [1, 2]. The first analyzes a distance-based quantum classifier: we show the prediction probability distributions of this classifier on several well-designed datasets to reveal its inner workings, and we extend the original binary classifier circuit to a multi-class classifier. The second compares quantum hardware performance on the decision making of an AI agent between an ion trap quantum system and a superconducting quantum system. Our investigation suggests that the superconducting quantum system tends to be more accurate and to underestimate the values used by the learning agent, whereas previous research shows that their system tends to overestimate [

Since the early days of quantum computing, using it in machine learning to gain a quantum speedup has been a long-standing endeavor [4-7]. Neural networks can perform versatile learning tasks such as clustering, classification, regression, and pattern recognition. As such, classical neural networks are among the top targets for researchers seeking quantum counterparts. Numerous efforts have been made and claimed, but unfortunately without much success [

Using a new technique called repeat-until-success, the work in [

One work after [

Neural networks are typically organized in layers, each of which is made of neurons. One commonly used kind of neuron has binary states, as proposed by McCulloch and Pitts [

A neuron has several inputs, each with its own weight: the weight of that specific connection. When the neuron receives inputs, it sums each input multiplied by its connection weight and adds a bias. The bias ensures that the neuron can still produce an output even when all of its inputs are 0. For mathematical convenience, we usually treat the bias as an ordinary weight corresponding to a constant input of 1. After computing the weighted sum of its inputs, the neuron passes it through its activation function, which maps the result to the desired output depending on the learning task, such as classification or regression. The key feature of neurons is that they can learn by adjusting these weights. The behavior of a neural network depends on both the weights and the activation function. Simple examples of activation functions are the step function, which returns 1 if the input is positive and 0 otherwise, and the sigmoid function, which is a smooth version of the step function.
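The weighted-sum-plus-bias computation described above can be sketched as follows. This is a minimal classical illustration, not the quantum neuron studied here; the function names and the example weights are hypothetical, and the bias is stored as the first weight on a constant input of 1, as described in the text.

```python
import math

def step(z):
    """Step activation: 1 if the input is positive, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    """Smooth version of the step function."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, activation):
    """Weighted sum with the bias as weights[0] on a constant input of 1."""
    augmented = [1.0] + list(inputs)   # prepend the constant input for the bias
    z = sum(w * x for w, x in zip(weights, augmented))
    return activation(z)

# Hypothetical weights: bias -1.5, input weights 1.0 and 1.0 (an AND-like unit).
w = [-1.5, 1.0, 1.0]
print(neuron([1, 1], w, step))     # z = 0.5  -> 1.0
print(neuron([1, 0], w, step))     # z = -0.5 -> 0.0
print(neuron([1, 1], w, sigmoid))  # sigmoid(0.5), about 0.6225
```

With all inputs equal to 0 the weighted sum is just the bias, so the neuron still produces an output, which is the purpose of the bias noted above.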

The output of one layer becomes the input to the next. The first layer (the input layer) receives the inputs, and its output feeds the next layer; this relay of information is repeated until the final layer (the output layer) is reached. Networks with this kind of layout are called feedforward neural networks; other layouts are also possible. A neural network learns from data and stores its learned knowledge in the weights of the connections among its neurons.
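A feedforward pass simply relays each layer's output to the next layer. The sketch below assumes fully connected layers with sigmoid activations; the 2-2-1 layout and the weight values are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weight_rows):
    """One fully connected layer; each row holds [bias, w1, w2, ...] for one neuron."""
    return [sigmoid(row[0] + sum(w * x for w, x in zip(row[1:], inputs)))
            for row in weight_rows]

def feedforward(inputs, layers):
    """Relay the signal: the output of each layer is the input of the next."""
    activations = list(inputs)
    for weight_rows in layers:
        activations = layer_forward(activations, weight_rows)
    return activations

# Hypothetical 2-2-1 network with arbitrary weights.
network = [
    [[0.0, 1.0, -1.0], [0.5, -0.5, 2.0]],   # hidden layer: 2 neurons, 2 inputs each
    [[-1.0, 1.5, 1.5]],                     # output layer: 1 neuron, 2 inputs
]
out = feedforward([1.0, 0.0], network)
print(out)
```

Everything the network "knows" lives in the nested `network` lists; training would change those numbers and nothing else.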

Biologically inspired, a classical artificial neuron is a mathematical function serving as a model of a biological neuron. The hard part of creating a quantum neuron is designing a nonlinear activation function, because the laws of quantum mechanics require operations on quantum states to be linear.

Implementing a quantum algorithm on a quantum computer requires translating the high-level description of the algorithm into a low-level physical quantum circuit. This task is usually accomplished in two steps: the first is to select a universal gate set, and the second is to apply a decomposition algorithm that builds a quantum circuit from a sequence of gates in this set. The Solovay-Kitaev theorem [12, 13] is the first result guaranteeing that a single-qubit unitary operation can be efficiently approximated by a sequence of gates from a universal gate set. Since then, many advances have been made, but the circuits designed so far are all deterministic. In [

Different versions of neurons have been proposed, but all of them fall short as true quantum neurons [

The idea to create a nonlinear activation function inside a quantum neuron [

$$R_y\left(\frac{a\pi}{2} + \frac{\pi}{2}\right)|0\rangle = \cos\left(\frac{a\pi}{4} + \frac{\pi}{4}\right)|0\rangle + \sin\left(\frac{a\pi}{4} + \frac{\pi}{4}\right)|1\rangle,$$

where $a \in [-1, 1]$ and $R_y(t) = \exp(-itY/2)$ is a quantum operator defined by the Pauli $Y$ operator. When $a = -1$ or $a = 1$, the qubit is $|0\rangle$ or $|1\rangle$, respectively, and it works like a classical neuron; but when $a \in (-1, 1)$, the qubit is in a superposition of $|0\rangle$ and $|1\rangle$, which no classical neuron can represent. The novelty of the work in [

to move any point $\theta \in [0, \pi/2]$ closer to one of the ends of the interval $[0, \pi/2]$. So when $\theta > \pi/4$, this can move the output of the circuit closer to $|1\rangle$, and otherwise closer to $|0\rangle$. It takes $k$ extra ancilla qubits to carry out $k$ repetitions.
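The following sketch evaluates the single-qubit state $R_y(a\pi/2 + \pi/2)|0\rangle$ classically and illustrates how repeated applications of a nonlinearity push an angle toward an end of $[0, \pi/2]$. The RUS map here is an assumption: we take the rotation produced by a successful RUS round to be $q(\theta) = \arctan(\tan^2\theta)$, the tangent-based form used in the repeat-until-success neuron literature, and we treat $\theta$ as the half-angle $a\pi/4 + \pi/4$. The code only simulates angles and amplitudes; it does not model the circuit, its ancillas, or its success probability.

```python
import math

def neuron_state(a):
    """Amplitudes of R_y(a*pi/2 + pi/2)|0> for an input a in [-1, 1].

    R_y(t) = exp(-i t Y / 2) = [[cos(t/2), -sin(t/2)], [sin(t/2), cos(t/2)]],
    so R_y(t)|0> = cos(t/2)|0> + sin(t/2)|1>.
    """
    t = a * math.pi / 2 + math.pi / 2
    return math.cos(t / 2), math.sin(t / 2)

def rus_map(theta):
    """Assumed RUS nonlinearity q(theta) = arctan(tan(theta)**2), theta in [0, pi/2)."""
    return math.atan(math.tan(theta) ** 2)

def rus_iterate(theta, k):
    """k repetitions of the assumed map; equals arctan(tan(theta)**(2**k))."""
    for _ in range(k):
        theta = rus_map(theta)
    return theta

# Endpoints behave classically; a = 0 gives an equal superposition.
for a in (-1.0, 0.0, 1.0):
    alpha, beta = neuron_state(a)
    print(f"a={a:+.1f}  P(|0>)={alpha**2:.3f}  P(|1>)={beta**2:.3f}")

# Angles below pi/4 are pushed toward 0, angles above pi/4 toward pi/2.
for theta in (0.6, 0.9):
    print(theta, [round(rus_iterate(theta, k), 4) for k in range(4)])
```

Because $\tan$ is periodic and diverges at $\pi/2$, this kind of map is only meaningful for $\theta \in [0, \pi/2)$, which is the input restriction discussed next.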

The activation function used in [

their neuron. Our approach to resolving this difficulty is to use the function $f(x) = \arcsin(\mathrm{sigmoid}(x))$, based on the sigmoid function, which is not periodic (
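A short numerical check of the proposed activation $f(x) = \arcsin(\mathrm{sigmoid}(x))$ is sketched below. It is a classical evaluation only, with hypothetical helper names; it illustrates that $f$ accepts any real input, stays inside $(0, \pi/2)$, and has a strictly positive, smooth derivative $f'(x) = s\sqrt{(1-s)/(1+s)}$ with $s = \mathrm{sigmoid}(x)$, obtained from the chain rule and $s' = s(1-s)$. The comment on $P(|1\rangle)$ assumes the angle is encoded as $R_y(2f(x))|0\rangle$, using the $R_y$ convention given earlier.

```python
import math

def sigmoid(x):
    """Numerically stable logistic sigmoid 1 / (1 + e^{-x})."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def f(x):
    """Activation f(x) = arcsin(sigmoid(x)); in exact arithmetic it maps every real x into (0, pi/2)."""
    return math.asin(sigmoid(x))

def f_prime(x):
    """Chain rule: sigmoid'(x) / sqrt(1 - sigmoid(x)^2) = s * sqrt((1 - s) / (1 + s))."""
    s = sigmoid(x)
    return s * math.sqrt((1.0 - s) / (1.0 + s))

for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    # With R_y(2 f(x))|0>, the probability of |1> is sin^2(f(x)) = sigmoid(x)^2.
    print(f"x={x:+5.1f}  f={f(x):.4f}  f'={f_prime(x):.4f}  P(|1>)={math.sin(f(x))**2:.4f}")
```

Unlike a tangent-based activation, nothing here needs the input to be restricted, and the derivative never changes sign, which is what makes plain gradient descent applicable.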

Note that the sigmoid function $f(x) = \frac{1}{1 + e^{-x}}$ is a widely used activation function in classical neural networks. Because its range lies between 0 and 1, it is a good choice when the network needs to output a probability. It has a convenient mathematical property, $f'(x) = f(x)(1 - f(x))$, so its derivative is easy to compute, and the function itself can be considered a smoothed-out version of the step function. However, the maximum value of its derivative is 0.25, so during backpropagation the error signal is multiplied by at most a quarter at each layer (ignoring the weights). Near the two ends of the sigmoid function its values flatten out, so the gradient there is almost zero. As a result, it gives rise to the vanishing-gradient problem, and it may not be a good choice for deep neural networks.
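The derivative identity and the 0.25 bound can be verified directly. In this sketch, `sigmoid_prime` reuses the forward value as the identity suggests, and the loop prints the upper bound $0.25^n$ on the product of sigmoid derivatives across $n$ layers; ignoring the weights is a simplification made only for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Derivative via the identity f'(x) = f(x)(1 - f(x)); no extra exponential needed."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_prime(0.0))   # 0.25, the maximum
print(sigmoid_prime(10.0))  # about 4.5e-05: the flat tail behind vanishing gradients

# Ignoring the weights, the chained sigmoid factors after n layers are at most 0.25 ** n.
for n in (1, 5, 10):
    print(n, 0.25 ** n)
```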

To demonstrate the ability of the quantum neurons, the Nelder-Mead algorithm is employed to train them to solve the XOR problem in [

We use the neuron training circuit in

| Dataset 1: data point | Class label | Dataset 2: data point | Class label |
|---|---|---|---|
| $x_0 = (0, 0)$ | 0 | $x_{00} = (0, 0)$ | 0 |
| $x_1 = (1, 1)$ | 1 | $x_{01} = (0, 1)$ | 1 |
| | | $x_{10} = (1, 0)$ | 1 |
| | | $x_{11} = (1, 1)$ | 1 |

Another common activation function is the Rectified Linear Unit (ReLU), $f(x) = \max(0, x)$. This is a simple function, but it is nonlinear. It has become popular in recent years because it is efficient to compute and can speed up the training of large networks compared with more complicated functions such as the sigmoid. Its derivative is 1 for all positive $x$, so it mitigates the vanishing-gradient problem introduced by the sigmoid function: the gradient of ReLU does not vanish as $x$ increases, in sharp contrast with the sigmoid. However, its derivative is zero when $x$ is negative, which can result in “dead” neurons that are never activated during the whole training period. Nonetheless, it is used in most convolutional neural networks and in deep learning generally.
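A “dead” ReLU neuron can be illustrated with a single hypothetical unit whose pre-activation is negative for every training input. The weight, bias, and inputs below are made up for illustration; because the derivative is zero on all of them, gradient descent would never change the weight or the bias.

```python
def relu(x):
    return max(0.0, x)

def relu_prime(x):
    """Derivative of ReLU: 1 for x > 0, else 0 (the value at exactly 0 is a convention)."""
    return 1.0 if x > 0 else 0.0

# Hypothetical ReLU neuron whose bias has been pushed strongly negative.
w, b = 0.5, -10.0
inputs = [0.0, 1.0, 2.0, 3.0]
grads = [relu_prime(w * x + b) for x in inputs]
print(grads)  # [0.0, 0.0, 0.0, 0.0]: no gradient flows, so w and b never update
```

Here the derivative at exactly 0 is taken as 0, which is one common convention; the text does not specify one.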

The work in [

$\mathrm{RUS}(\theta - \pi/4)$, 1 from $R_y(2f(\theta))$, and 1 from $R_y(2\,\mathrm{ReLU}(\theta))$. These three measurements are taken in sequence to generate the ReLU curves in
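The sketch below computes, classically and without any hardware noise, the probability of measuring $|1\rangle$ after $R_y(2g(\theta))|0\rangle = \cos g(\theta)\,|0\rangle + \sin g(\theta)\,|1\rangle$ for $g = f$ and $g = \mathrm{ReLU}$, plus a finite-shot estimate. It does not reproduce the RUS measurement, and the 1024-shot count, the seed, and the helper names are assumptions for illustration, not the settings used in the experiments.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def f(x):
    """Proposed activation arcsin(sigmoid(x))."""
    return math.asin(sigmoid(x))

def relu(x):
    return max(0.0, x)

def p_one(angle):
    """Ideal probability of |1> after R_y(2*angle)|0> = cos(angle)|0> + sin(angle)|1>."""
    return math.sin(angle) ** 2

def sampled_p_one(angle, shots, rng):
    """Estimate p_one from a finite number of simulated shots (no hardware noise)."""
    hits = sum(rng.random() < p_one(angle) for _ in range(shots))
    return hits / shots

rng = random.Random(0)
for theta in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"theta={theta:+.1f}  f: {p_one(f(theta)):.3f}  ReLU: {p_one(relu(theta)):.3f}"
          f"  ReLU (1024 shots): {sampled_p_one(relu(theta), 1024, rng):.3f}")
```

Because $\sin^2$ is periodic, the ReLU curve obtained this way increases monotonically only while $\mathrm{ReLU}(\theta) \le \pi/2$.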

To summarize our findings, our nonlinear activation function offers the following advantages that the original one in [

As demonstrated by Google’s AlphaGo, deep learning has gained its reputation from its unprecedented success in many areas, including computer vision, speech recognition, and natural language processing. The backbone of this achievement is the neural network, a computational model inspired by biological neural networks. Mathematically, neural networks are nonlinear statistical techniques for estimating unknown functions by training on input data. The most important element of a neural network is the neuron, and the key feature of a neuron is that it can learn.

As quantum computing moves further into machine learning, how to create quantum neural networks has become an attractive topic [

One recent breakthrough [

Inspired by the work in [9, 10], we think the activation function in [

After finishing our study reported here, we find one recent paper [

We thank the IBM Quantum Experience for the use of their quantum computers and the IBM researchers Dr. Andrew W. Cross, Dr. Douglas T. McClure, and Dr. Ali Javadi, who helped me learn how to use their quantum computers.