Continuous optimization, stemming from the mathematical modeling of realworld systems, is ubiquitous in applied mathematics, operations research, and computer science. These problems often come with high dimensionality and nonconvexity, posing great challenges for the design and implementation of optimization algorithms. With the advance in the fabrication of quantum computers in the past decades, extensive progress has prevailed in the pursuit of quantum speedups for continuous optimization. A conventional approach toward this end is to quantize existing classical algorithms by replacing their components with quantum subroutines, while carefully balancing the potential speedup and possible overheads. However, proposals following this approach usually achieve at most polynomial quantum speedups, and more importantly, they do not improve the quality of solutions because they essentially follow the same solution trajectories in the original classical algorithms.
Gradient descent (and its variant) is arguably the most fundamental optimization algorithm in continuous optimization, both in theory and in practice, due to its simplicity and efficiency in converging to critical points. However, many realworld problems have spurious local optima, for which gradient descent is subject to slow convergence since it only leverages firstorder information. On the other hand, quantum algorithms have the potential to escape from local minima and find nearoptimal solutions by leveraging the quantum tunneling effect. Therefore, it is desirable to identify a quantum counterpart of gradient descent that is simple and efficient on quantum computers while leveraging the quantum tunneling effect to escape from spurious local optima. With such features, the quality of the solutions is improved. Prior attempts (e.g., [1]) to quantize gradient descent, which followed the conventional approach, unfortunately fail to achieve the aforementioned goal, which seems to require a completely new approach to quantization.
A genuine quantum gradient descent
Our main observation is a seemingly unrelated connection between gradient descent and dynamical systems satisfying classical physical laws. Precisely, it is known that the continuoustime limit of many gradientbased algorithms can be understood as classical physical dynamical systems, e.g., the BregmanLagrangian framework derived in [2] to model accelerated gradient descent algorithms. Conversely, variants of gradientbased algorithms could be emerged through the time discretization of these continuoustime dynamical systems. This twoway correspondence inspired us a second approach to quantization: instead of quantizing a subroutine in gradient descent, we can quantize the continuoustime limit of gradient descent as a whole, and the resulting quantum dynamical systems lead to quantum algorithms as seen in Figure 1A. Using the path integral formulation of quantum mechanics (Figure 1B), we quantize the BregmanLagrangian framework to a quantummechanical system described by the Schrodinger equation \(i \frac{d}{d t} \Psi(t) = \hat{H}(t) \Psi(t)\), where \(\Psi(t)\) is the quantum wave function, and the quantum Hamiltonian reads:
\[\hat{H}(t) =e^{\varphi_t}\left(\frac{1}{2}\Delta\right) + e^{\chi_t}f(x),~~~(1)\]
where \(e^{\varphi_t}\), \(e^{\chi_t}\) are damping parameters that control the energy flow in the system. We require \(e^{\varphi_t/\chi_t}\to 0\) for large \(t\) so the kinetic energy is gradually drained out from the system, which is crucial for the longterm convergence of the evolution. \(\Delta\) is the Laplacian operator over Euclidean space, and \(f(x)\) is the objective function. The Schrodinger dynamics in Eq. (1) generate a family of quantum gradient descent algorithms that we will refer to as Quantum Hamiltonian Descent, or simply QHD.
As desired, QHD inherits simplicity and efficiency from classical gradient descent. QHD takes in an easily prepared initial wave function \(\Psi(0)\) and evolves the quantum system described by Eq. (1). The solution to the optimization problem is obtained by measuring the position observable \(\hat{x}\) at the end of the algorithm (i.e., at time \(t=T\)). In other words, QHD is no different from a basic Hamiltonian simulation task, which can be done on digital quantum computers using standard techniques with provable efficiency. The simplicity and efficiency of QHD on quantum machines potentially make it as widely applicable as classical gradient descent.
For convex problems, we prove that QHD is guaranteed to find the global solution. In this case, the solution trajectory of QHD is analogous to that of a classical algorithm. Nonconvex problems, known to be NPhard in general, are much harder to solve. Under mild assumptions on a nonconvex \(f\), we show the global convergence of QHD given appropriate damping parameters and sufficiently long evolution time. Figure 1C shows a conceptual picture of QHD’s quantum speedup: intuitively, QHD can be regarded as a path integral of solution trajectories, some of which are prohibited in classical gradient descent. Interference among all solution trajectories gives rise to a unique quantum phenomenon called quantum tunneling, which helps QHD overcome highenergy barriers and locate the global minimum.
Performance of QHD on hard optimization problems
To visualize the difference between QHD and other classical/quantum algorithms, we test four algorithms (QHD, Quantum Adiabatic Algorithm (QAA), Nesterov’s accelerated gradient descent (NAGD), and stochastic gradient descent (SGD)) via classical simulation for 22 optimization instances with diversified landscape features selected from benchmark functions for global optimization problems.^{1} QAA solves an optimization problem by simulating a quantum adiabatic evolution [3], and has been mostly applied to discrete optimization in the literature. To solve continuous optimization with QAA, a common approach is to represent each continuous variable with a finitelength bitstring so the original problem is converted to a combinatorial optimization defined on the hypercube \(\{0,1\}^N\), where \(N\) is the total number of bits. In our experiment, we adopt the radix2 representation and use 7 bits for each continuous variable – this allows QAA to handle the optimization instances as discrete problems over \(\{0,1\}^{14}\).^{2}
In Figure 2A, we plot the landscape of Levy function, and the solutions from the four algorithms are shown for different evolution times \(t\): for QHD and QAA, \(t\) is the evolution time of the quantum dynamics; for the two classical algorithms, the effective evolution time \(t\) is computed by multiplying the product of learning rate and the iteration number so that it is comparable to the one used in QHD and QAA. Compared with QHD, QAA converges at a much slower rate and no apparent convergence is observed within the time window. Although the two classical algorithms seem to converge faster than quantum algorithms, they have lower success probability because many solutions have been trapped in spurious local minima. Our observation made with Levy function is consistent with the results of other functions: as shown in Figure 2B, QHD has a higher success probability in most optimization instances within the same choice of evolution time.
Zooming in the QHD dynamics, we find rich dynamical properties in different stages of evolution. Figure 3B shows the quantum probability densities of QHD at different evolution times: the wave function is highly oscillatory in the beginning (t = 0.1, 0.5); then, it starts moving towards the global minimum (t = 1, 2); finally, it is clustered around the global minimum and converges as like the classical gradient descent (t = 5, 10). This threestage evolution is not only seen in Levy function, but also observed in many other instances. We thus propose to divide QHD’s evolution in solving optimization problems into three consecutive phases called the kinetic phase, the global search phase, and the descent phase according to the above observation.
The threephase picture of QHD could be supported by a few quantitative characterizations of the QHD evolution. One such characterization is the probability spectrum of QHD, which shows the decomposition of the wave function to different energy levels (Figure 3C). QHD begins with a major groundenergy component and a minor lowenergy component.^{3} During the global search phase, the lowenergy component is absorbed into the groundenergy component, indicating that QHD finds the global minimum (Figure 3D). The energy ratio \(E_1/E_0\) is another characterization of the three phases in QHD (Figure 3E), where \(E_0\) (or \(E_1\)) is the ground (or first excited) energy of the QHD Hamiltonian \(\hat{H}(t)\). In the kinetic phase, the kinetic energy \(\frac{1}{2}\Delta\) dominates in the system Hamiltonian so we have \(E_1/E_0\approx 2.5\), which is the same as in a freeparticle system. In the descent phase, the QHD Hamiltonian enters the ‘‘semiclassical regime’’ and the energy ratio can be theoretically computed based on the objective function.^{4}
The threephase picture of QHD sheds light on why QAA has slower convergence. Compared to QHD, QAA has neither kinetic phase nor descent phase. In the kinetic phase, QHD averages the initial wave function over the whole search space to reduce to risk of poor initialization; while QAA remains in the ground state so it never gains as much kinetic energy. In the descent phase, QHD depicts similar convergence as classical gradient descent and its convergence is insensitive to spatial resolution; such fast convergence is not seen in QAA.
From the perspective of QAA, the use of the radix2 representation scrambles the Euclidean topology so that the resulting discrete problem is even harder than the original problem. Failed to incorporate the continuity structure, QAA is hence sensitive to the resolution of spatial discretization – we observe that higher resolutions often cause worse QAA performance.Of course, radix2 representation is not the only way to discretize a continuous problem. One can lift QAA to the continuous domain by choosing its Hamiltonian over a continuous space in a general way. From this perspective, QHD could be possibly interpreted as a special version of the general QAA with a particular choice of the Hamiltonian. However, some existing results [4] suggest that QHD may have fast convergence that the general theory of QAA fails to explain.
Largescale empirical study based on analog implementation
The great promise of QHD could lie in solving highdimensional nonconvex problems in real world, however, a largescale empirical study is infeasible with classical simulation due to the curse of dimensionality. Even though theoretically efficient, an implementation of QHD instances of reasonable sizes on digital quantum computers would cost a gigantic number of faulttolerant quantum gates,^{5}, which renders any empirical study based on digital implementation a dead end for the near term.
Analog quantum computers (or quantum simulators) are alternative devices that directly emulate certain quantum Hamiltonian evolution without quantum gates, and usually have limited programmability. However, recent experimental results suggest a great advantage of continuoustime analog quantum devices over the digital ones for quantum simulation in the NISQ era due to their scalability and low overhead in simulation tasks. Compared with normal quantum algorithms that are typically described by quantum circuits, a unique feature of QHD is that its description is already a Hamiltonian simulation task by itself, which make it possible to leverage nearterm analog devices for its implementation.
A conceptually simple analog implementation of QHD would be building a quantum simulator whose Hamiltonian exactly matches the QHD Hamiltonian, which is, however, not very feasible in practice. A more pragmatic strategy is to embed the QHD Hamiltonian into existing analog simulators so we can emulate QHD as part of the full dynamics. To this end, we introduce the Quantum Ising Machine (or simply QIM), as an abstract model for some of the most powerful analog quantum simulators nowadays, which is described by the following quantum Ising Hamiltonian:
\[H(t) =  \frac{A(t)}{2} \left(\sum_j \sigma^{(j)}_x\right) + \frac{B(t)}{2} \left(\sum_j h_j \sigma^{(j)}_z + \sum_{j>k} J_{j,k} \sigma^{(j)}_z \sigma^{(k)}_z\right),~~~(2)\]
where \(\sigma^{(j)}_x\) and \(\sigma^{(j)}_z\) are the PauliX and PauliZ operator acting on the jth qubit, \(A(t)\) and \(B(t)\) are timedependent control functions. The controllability of \(A(t), B(t), h_j, J_{j,k}\) represents the programmability of QIMs, which would depend on the specific instantiation of QIM such as DWave systems, QuEra neutralatom system, and so on.
At a high level, our Hamiltonian embedding technique works as follows: (i) discretize the QHD Hamiltonian Eq. (1) to a finitedimensional matrix; (ii) identify an invariant subspace \(\mathcal{S}\) of the simulator Hamiltonian for the evolution; (iii) the simulator Hamiltonian Eq. (2) is properly programmed so its restriction to the invariant subspace \(\mathcal{S}\) matches the discretized QHD Hamiltonian. In this way, we effectively simulate the QHD Hamiltonian in the subspace \(\mathcal{S}\) (called the encoding subspace) of the full simulator Hilbert space. By measuring the encoding subspace at the end of the analog emulation, we obtain solutions to an optimization problem.
Precisely, consider the onedimensional case of QHD Hamiltonian that is \(\hat{H}(t) = e^{\varphi_t}(\frac{1}{2}\frac{\partial^2}{\partial x^2})+e^{\chi_t}f\). Following a standard discretization by the finite difference method, QHD Hamiltonian becomes \(\hat{H}(t)= \frac{1}{2}e^{\varphi_t}\hat{L}+e^{\chi_t}\hat{F}\) where the secondorder derivative \(\frac{\partial^2}{\partial x^2}\) becomes a tridiagonal matrix (denoted by \(\hat{L}\)), and the potential operator \(f\) is reduced to a diagonal matrix (denoted by \(\hat{F}\)).
We identify the socalled Hamming encoding subspace \(\mathcal{S}_H\) which is spanned by (n+1) Hamming states \(\{\ket{H_j}:j=0,1,\dots,n\}\) for any \(n\)qubit QIM. The \(j\)th Hamming state \(\ket{H_j}\) is the uniform superposition of bitstring states with Hamming weight (i.e., the number of ones in a bitstring) \(j\):
\[\ket{H_j} = \frac{1}{\sqrt{C_j}}\sum_{b=j}\ket{b},\]
where \(C_j\) is the number of states with Hamming weight \(j\). For example, there are \(n\) bitstring states with Hamming weight 1: \(\ket{0\dots001}\),\(\ket{0\dots010}\),…,\(\ket{1\dots000}\), and the Hamming1 state \(\ket{H_1}\) is the uniform superposition of all the \(n\) states. By choosing appropriate parameters \(h_j\), \(J_{j,k}\) in Eq. (2), the subspace \(\mathcal{S}_H\) is invariant under the QIM Hamiltonian. Moreover, the restriction of the first term \(\sum^r_{j=1} \sigma^{j}_x\) onto \(\mathcal{S}_H\) resembles the tridiagonal matrix \(\hat{L}\), and the restriction of the second term in the QIM Hamiltonian (with PauliZ and ZZ operators) represents a discretized quadratic function \(\hat{F}\). A measurement on \(\mathcal{S}_H\) can be effectively conducted by measuring the full simulator Hilbert space in the computational basis with a simple postprocessing. The Hamming encoding construction is readily generalizable to higherdimensional Laplacian operator \(\Delta\) and quadratic polynomial functions \(f\).
Our Hamming encoding enables an empirical study of a selfinteresting optimization problem called quadratic programming (QP) on quantum simulators. Specifically, we consider QP with box constraints:
\[\text{minimize}\qquad f(x)=\frac{1}{2} x^\top \mathbf{Q} x + \mathbf{b}^\top x,~~\text{subject to}~~\mathbf{0} \preccurlyeq x \preccurlyeq \mathbf{1},\]
where \(\mathbf{0}\) and \(\mathbf{1}\) are \(n\)dimensional vectors of all zeros and all ones, respectively. QP problems are the simplest case of nonlinear programming and they appear in almost all major fields in computational sciences. Despite of their simplicity and ubiquity, nonconvex QP problems (i.e., the Hessian matrix \(\mathbf{Q}\) is indefinite) are known to be NPhard in general.
We implement QHD on the DWave system^{6}, which instantiates QIM and allows the control of thousands of physical qubits with decent connectivity. While existing libraries of QP benchmark instances are natural candidates for our empirical study, most of them can not be mapped to the DWave system because of its limited connectivity. We instead create a new test benchmark with 160 randomly generated QP instances in various dimensions (5, 50, 60, 75) whose Hessian matrices are indefinite and sparse^{7}, an analog implementation of which are possible on the DWave machine (referred as DWQHD).
We compare DWQHD with 6 other stateoftheart solvers in our empirical study: DWQAA (baseline QAA implemented on DWave), IPOPT, SNOPT, MATLAB’s \(\texttt{fmincon}\) (with “SQP” solver), QCQP, and a basic SciPy \(\texttt{minimize}\) function (with “TNC” solver). In the two quantum methods (DWQHD, DWQAA), we discretize the search space \([0,1]^d\) into a regular mesh grid with 8 cells per edge due to the limited number of qubits on the DWave machine. To compensate the loss of the resolution, we postprocess the coarsegrained DWave results by the SciPy \(\texttt{minimize}\) function, which is a local gradient solver mimicking the descent phase of a higherresolution QHD and only has mediocre performance by itself. The choice of classical solvers covers a variety of stateoftheart optimization methods, including gradientbased local search (SciPy \(\texttt{minimize}\), interiorpoint method (IPOPT), sequential quadratic programming (SNOPT, MATLAB), and heuristic convex relaxation (QCQP). Finally, to investigate the quality of the DWave machine in implementing QHD and QAA, we also classically simulate QHD and QAA for the 5dimensional instances (SimQHD, SimQAA).^{8}
We use the timetosolution (TTS) metric [5] to compare the performance of solvers. TTS is the number of trials (i.e., initialization for classical solvers or shot for quantum solvers) required to obtain the correct global solution^{9} up to 0.99 success probability:
\[\text{TTS} = t_f \times \Big\lceil\frac{\ln(10.99)}{\ln(1p_s)}\Big\rceil,\]
where \(t_f\) is the average runtime per trial, and \(p_s\) is the success probability of finding the global solution in a given trial. We run 1000 trials per instance and compute the TTS for each solver.
In Figure 4, we show the distribution of TTS for different solvers. We also provide spreadsheets that summarize the test results for the QP benchmark. In the 5dimensional case (Figure 4A), SimQHD has the lowest TTS, and the quantum methods are generally more efficient than classical solvers. Note that, with a much shorter annealing time (\(t_f=1\mu s\) for SimQHD and \(t_f=800\mu s\) for DWQHD), SimQHD still does better than DWQHD, indicating the DWave system is subject to a significant load of noises and decoherence. Interestingly, SimQAA (\(t_f=1\mu s\)) is worse than DWQAA (\(t_f=800\mu s\)), which shows QAA indeed has much slower convergence. In the higher dimensional cases (Figure 4B,C,D), DWQHD has the lowest median TTS among all tested solvers. Despite of the infeasibility of running SimQHD in high dimensions, our observation in the 5dimensional case suggests that an ideal implementation of QHD could perform much better than DWQHD, and therefore all other tested solvers in high dimensions.
It is worth noting that DWQHD does not outperform industriallevel nonlinear programming solvers such as Gurobi and CPLEX. In our experiment, Gurobi usually solves the highdimensional QP problems with TTS no more than 0.01s. These solvers approximate the nonlinear problem by potentially exponentially many linear programming subroutines and use a branchandbound strategy for a smart but exhaustive search of the solution.^{10} However, the restriction of the DWave machine (e.g., programmability and decoherence) forces us to test on very sparse QP instances, which can be efficiently solved by highlyoptimized industriallevel branchandbound solvers. On the other side, we believe that QHD should be more appropriately deemed as a quantum upgrade of classical GD, which would more conceivably replace the role of GD rather than the entire branchandbound framework in classical optimizers.
Conclusions
In this work, we propose Quantum Hamiltonian Descent as a genuine quantum counterpart of classical gradient descent through a pathintegral quantization of classical algorithms. Similar to classical GD, QHD is simple and efficient to implement on quantum computers; meanwhile, it leverages the quantum tunneling effect to escape from spurious local minima, which we believe could replace the role of classical GD in many optimization algorithms. Moreover, with the newly developed Hamiltonian embedding technique, we conduct a largescale empirical study of QHD on nonconvex quadratic programming instances up to 75 dimensions, via an analog implementation of QHD on the DWave instantiation of a quantum Ising Hamiltonian simulator. We believe that QHD could be readily used as a benchmark algorithm for other quantum or semiquantum analog devices, for testing the quality of the devices and conducting more empirical study of QHD.
References
[1] Patrick Rebentrost, Maria Schuld, Leonard Wossnig, Francesco Petruccione, and Seth Lloyd, Quantum gradient descent and newton’s method for constrained polynomial optimization, New Journal of Physics 21 (2019), no. 7, 073023.
[2] Andre Wibisono, Ashia C Wilson, and Michael I Jordan, A variational perspective on accelerated methods in optimization, Proceedings of the National Academy of Sciences 113 (2016), no. 47, E7351–E7358.
[3] Edward Farhi, Jeffrey Goldstone, Sam Gutmann, Joshua Lapan, Andrew Lundgren, and Daniel Preda, A quantum adiabatic evolution algorithm applied to random instances of an NPcomplete problem, Science 292 (2001), no. 5516, 472–475.
[4] Gheorghe Nenciu, Linear adiabatic theory: exponential estimates, Communications in Mathematical Physics 152 (1993), no. 3, 479–496.
[5] Troels F Rønnow, Zhihui Wang, Joshua Job, Sergio Boixo, Sergei V Isakov, David Wecker, John M Martinis, Daniel A Lidar, and Matthias Troyer, Defining and detecting quantum speedup, Science 345 (2014), no. 6195, 420–424.
Footnotes

More details of the 22 optimization instances can be found in the ‘‘Nonconvex 2D’’ section. ↩

Effectively, this means we discretize the continuous domain \([0,1]^2\) into a \(128 \times 128\) mesh grid. ↩

No highenergy component with energy level $\ge 10$ is found. ↩

For Levy function, the predicted semiclassical energy ratio reads \(E_1/E_0\approx1.38\), which matches our numerical data. ↩

We compute the count of T gates in the digital implementation of QHD. It turns out that solving 50dimensional problems with low resolution will cost hundreds of millions of faulttolerant T gates. ↩

We access the DWave advantage_system6.1 through Amazon Braket. ↩

See the “Quadratic Programming” section for more details. ↩

Note that we numerically compute SimQHD and SimQAA for \(t_f = 1 \mu s\), which is much shorter than the time we set in the DWave experiment (in DWQHD and DWQAA, we choose \(t_f = 800 \mu s\). ↩

For each test instance, the global solution is obtained by Gurobi. ↩

We show that the runtime of Gurobi scales exponentially with respect to the problem dimension for QP. ↩