A quantum-enhanced support vector machine for galaxy classification

Galaxy morphology, a key tracer of the evolution of a galaxy's physical structure, has motivated extensive research on machine learning techniques for efficient and accurate galaxy classification. The emergence of quantum computers has generated optimism that the large dimensionality of quantum Hilbert space can be leveraged to significantly improve the accuracy of such classifications. This paper presents a quantum-enhanced support vector machine algorithm for classifying galaxies based on their morphology. The algorithm requires the computation of a kernel matrix, a task that is performed on a simulated quantum computer using a quantum circuit conjectured to be intractable on classical computers. The results show similar performance between the classical and quantum-enhanced support vector machine algorithms. For a training size of $40$k, the receiver operating characteristic curve for differentiating ellipticals and spirals has an area under the curve (ROC AUC) of $0.946\pm 0.005$ for both the classical and quantum-enhanced algorithms. This investigation is among the very first applications of quantum machine learning in astronomy and highlights its potential for further application in this field.


INTRODUCTION
Studying the morphology of galaxies is essential for understanding their evolution and formation. Therefore, classifying galaxies based on their morphology is crucial in observational cosmology. Such classification was initially performed by Hubble (Hubble 1926), in which bulge-dominant and disk-dominant galaxies were differentiated. With the increase of galaxy images collected by sky surveys, automated algorithms, such as machine learning (ML), were developed to achieve high-speed and accurate galaxy classification. One of the first examples of using ML to classify galaxies was presented by Lahav et al. (1995), while recent examples include Barchi et al. (2020); Walmsley et al. (2022); Bhambra et al. (2022).
With the emergence of quantum computers and their remarkable progress in recent years, there is growing confidence that these machines can boost the performance of machine learning techniques, thanks to their distinct characteristics compared to classical computers. In this new computing paradigm, classical bits are replaced with quantum bits (qubits), which can interfere and entangle with each other. A classical computer must keep track of 2^n parameters to execute a quantum computing algorithm with n well-entangled (non-separable) qubits. On the other hand, such an algorithm is executed naturally, and hence more efficiently, on a quantum computer. Therefore, an ML algorithm enhanced by quantum computing could potentially outperform fully-classical ML algorithms. With the help of quantum computers, such algorithms can be executed efficiently, while not all of these algorithms are guaranteed to be efficiently executable by classical computers alone.

Support vector machine (SVM) (Cristianini et al. 2000) is a supervised classification algorithm that finds a hyperplane separating data into two classes. The remarkable feature of SVM is the use of the kernel method, which facilitates non-linear classification by implicitly mapping data to a (typically higher-dimensional) space where a linear classification can be performed. With the kernel method, data can be mapped to the exponentially large Hilbert space of qubits using quantum circuits in a way which is intractable to classical computers (Havlíček et al. 2019). Such quantum circuits are made of a set of unitary quantum gates which rotate single or multiple qubits by an amount controlled by the values of the datapoints. An advantage of the kernel method is that an explicit mapping of all datapoints to the feature space is not needed. Instead, it is sufficient to compute their inner products and construct the corresponding kernel matrix. This implicit mapping can significantly reduce computation costs.
A necessary condition for a possible quantum advantage is to use a quantum circuit that is difficult to simulate on classical computers. Although there is a long list of such circuits, they typically include many layers of quantum gates (a.k.a. deep circuits) such that they are not implementable on currently-available quantum computers. These quantum computers, named near-term or noisy intermediate-scale quantum (NISQ) devices, are prone to noise which can significantly deteriorate the performance of deep quantum circuits. Recently, Havlíček et al. (2019) proposed a family of quantum circuits with an intermediate depth which is not only implementable on near-term devices but is also conjectured to be difficult to simulate on classical computers.
In this study, we developed an SVM algorithm to classify elliptical and spiral galaxies based on their shapes. The kernel matrix was computed in two different ways: (i) using (simulated) quantum computers and (ii) using classical computers. The classifiers produced with these kernels are called quantum and classical kernel classifiers hereafter. The quantum kernel was computed using a quantum circuit from the circuit family (Havlíček et al. 2019) which was conjectured to achieve a quantum advantage. The kernel was then fed into the classical SVM optimiser for finding the classifier hyperplane. In contrast, the conventional SVM algorithm was performed for the classical kernels. Given the random nature of quantum processes, the quantum kernel can only be estimated by measuring the qubits R times on a real quantum computer. Since the wave function amplitudes are accessible in simulation, we computed the exact kernel matrix (or, equivalently, estimated it for R = ∞ on a noiseless quantum computer).
As mentioned earlier, noise is a major problem in near-term devices. One of the main advantages of quantum-enhanced SVM algorithms is that when the quantum circuit used for mapping the features in the kernel method is not too deep, error mitigation techniques can be applied (Temme et al. 2017; Li & Benjamin 2017; Kandala et al. 2019; Liu et al. 2021). Such techniques allow the algorithm to be executed on near-term devices without significant loss of computational power. The effectiveness of error-mitigation techniques and the robustness of kernel entries to noise have been demonstrated on near-term devices (Havlíček et al. 2019; Kusumoto et al. 2021; Bartkiewicz et al. 2020; Peters et al. 2021; Liu et al. 2021). These advantages make quantum-enhanced SVM algorithms leading candidates for achieving quantum advantage on near-term devices (Liu et al. 2021).
Quantum machine learning is new in the field of astronomy. Caldeira et al. (2019) have used Restricted Boltzmann Machines (RBMs) for a morphology classification of galaxies using a quantum annealer. They found that for small datasets, a quantum annealer-based RBM outperforms certain classical algorithms. In another study, Peters et al. (2021) have employed the SVM algorithm with quantum kernel estimation to classify two supernova types. They designed a quantum circuit which is robust for execution on near-term devices, although the quantum circuit used in the study is not difficult to simulate on classical computers. The paper demonstrated that the classification performance on a near-term device is comparable to the noiseless simulation.
Section 2 illustrates the procedure for the collection and preprocessing of the input data, Section 3 describes the SVM algorithm and quantum kernel estimation, and Section 4 explains the results of this study.

THE GALAXY DATA
The data used in this analysis are collected from the publicly available Galaxy Zoo 1 (GZ1) dataset (Lintott et al. 2008, 2011). This dataset includes morphological classifications of galaxy images drawn from the Sloan Digital Sky Survey (SDSS). A few examples of these images, which are 2D projections of 3D bodies, are shown in Fig. 1. A large number of volunteers contributed to this classification by labelling galaxies visually based on their shapes. After performing bias corrections, galaxies are classified as 'spiral' or 'elliptical' if more than 80% of the debiased votes are in these categories, while all other galaxies are labelled 'uncertain'. These labels are considered true labels throughout this study.
The features of the galaxies in this dataset are collected from the morphological metrics provided in a catalogue in Barchi et al. (2020). A total of five distance-independent features are included in this analysis for model training: (i) concentration C = log10(R_75/R_25) (Conselice 2003; Lotz et al. 2004; Ferrari et al. 2015), where R_75 and R_25 are the radii enclosing (in this measurement) 75% and 25% of the total galaxy flux; (ii) asymmetry A = 1 − s(I_0, I_π), defined using Spearman's rank correlation coefficient s of the fluxes of the galaxy image I_0 and its π-rotated version I_π; (iii) smoothness S = 1 − s(I_0, I_s), defined similarly for a comparison with the flux of the smoothed image I_s; (iv) the second gradient moment G2, extracted with the Gradient Pattern Analysis (GPA) method (Rosa et al. 2018); and (v) the Shannon information entropy H of the galaxy image pixels (Ferrari et al. 2015), which is expected to be low for smooth galaxies. More information about these features can be found in Barchi et al. (2020). All features were extracted using CyMorph (Rosa et al. 2018; Barchi et al. 2020), written in Cython, a language for writing C extensions for Python.
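As an illustration of feature (v), a histogram-based Shannon entropy can be sketched as follows. The binning and normalisation here are assumptions for illustration; CyMorph's actual implementation may differ:

```python
import numpy as np

def shannon_entropy(pixels, bins=10):
    # Shannon information entropy H of the pixel-intensity distribution,
    # estimated from a normalised histogram; a smooth galaxy concentrates
    # its intensities in few bins and therefore yields a low H
    counts, _ = np.histogram(pixels, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]  # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

smooth = np.full(100, 0.5)           # constant "image": one occupied bin, H = 0
clumpy = np.linspace(0.0, 1.0, 100)  # intensities spread over all bins
H_smooth = shannon_entropy(smooth)
H_clumpy = shannon_entropy(clumpy)
```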
The features dataset was merged with the GZ1 dataset by matching the IDs of the galaxies. The merged dataset was trimmed by removing galaxies for which non-physical values were assigned or the CyMorph algorithm failed. Galaxies labelled 'uncertain' were also removed from the dataset, leaving only 'spiral' and 'elliptical' labels. Features were normalised by linearly scaling them to the [0,1] interval.
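The merging, trimming, and normalisation steps above can be sketched as follows. The column names (`objid`, `label`, `C`, `A`, `S`, `G2`, `H`) and the non-physical sentinel value of −999 are assumptions for illustration:

```python
import pandas as pd

def preprocess(gz1: pd.DataFrame, morph: pd.DataFrame) -> pd.DataFrame:
    # merge GZ1 labels with the morphological features by galaxy ID
    df = morph.merge(gz1, on="objid", how="inner")
    feature_cols = ["C", "A", "S", "G2", "H"]
    # drop galaxies with non-physical feature values (assumed flagged as -999)
    df = df[(df[feature_cols] != -999).all(axis=1)]
    # keep only confident labels, discarding 'uncertain' galaxies
    df = df[df["label"].isin(["spiral", "elliptical"])].copy()
    # linearly scale every feature to the [0, 1] interval
    for col in feature_cols:
        lo, hi = df[col].min(), df[col].max()
        df[col] = (df[col] - lo) / (hi - lo)
    return df
```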
A property defined in Barchi et al. (2020) for each galaxy image is the area of the galaxy, derived from its Petrosian radius R_p, divided by the area of the point spread function (PSF), estimated from its full width at half maximum (FWHM):

K_gal = A_galaxy / A_PSF ,     (1)

where A_galaxy = π R_p² and A_PSF is the area of the PSF. (More information on R_p can be found in Petrosian (1976); Eisenstein et al. (2011).) Samples with large K_gal generally include larger objects. The dataset used in this analysis has a spatial resolution of 0.396 arcsec/pixel and a PSF FWHM of ≈1.5 arcsec (Barchi et al. 2020). This mediocre resolution led us to divide the dataset into K_gal ≥ 5, K_gal ≥ 10, and K_gal ≥ 20 subsets and train a classifier for each, following what is done in Barchi et al. (2020). Since K_gal has a similar distribution for the ellipticals and spirals, and to avoid unnecessary divergence from Barchi et al. (2020), it was not used in the classifiers as an input feature.
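The selection based on Eq. (1) can be sketched as follows, assuming the PSF area is that of a circle of diameter FWHM (the exact convention may differ):

```python
import numpy as np

def k_gal(petrosian_radius, psf_fwhm=1.5):
    # ratio of the galaxy area (pi * R_p^2) to the PSF area, taking the PSF
    # as a circle of diameter FWHM (assumed convention); pi cancels out
    return (petrosian_radius / (psf_fwhm / 2.0)) ** 2

r_p = np.array([1.0, 2.0, 4.0])  # toy Petrosian radii in arcsec
k = k_gal(r_p)
# boolean masks for the three training subsets used in the text
subsets = {t: k >= t for t in (5, 10, 20)}
```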

Support Vector Machine
In order to classify galaxies, we utilised the support vector machine (SVM) algorithm. For a dataset (x_1, y_1), ..., (x_m, y_m), this algorithm finds a hyperplane which maximises the separation (margin) between two classes, where x_i ∈ ℝ^d and y_i ∈ {−1, 1} are the feature vector and the corresponding class label of the i-th datapoint, respectively. The hyperplane can be formulated as w⊺x + b = 0, where w is the normal vector of the hyperplane and |b|/∥w∥ is the distance of the origin to the hyperplane. The datapoints x_i nearest to the hyperplane from either class are called support vectors (SVs), and the margin boundaries are the hyperplanes passing through the SVs, w⊺x + b = ±1 (see Fig. 2).
When the two classes are linearly separable, all datapoints x_i with y_i = 1 (y_i = −1) satisfy w⊺x_i + b ≥ 1 (w⊺x_i + b ≤ −1), meaning that they lie on the correct side of the margin (a.k.a. hard margin). However, a more relaxed margin condition (a.k.a. soft margin) can be used in SVM, which allows some datapoints of each class to cross their corresponding margin boundary in exchange for a penalty term in the loss function. Taking this into account, the SVM optimisation problem can be formulated as

min_{w,b,ξ} (1/2)∥w∥² + C Σ_i ξ_i   subject to   y_i (w⊺x_i + b) ≥ 1 − ξ_i ,  ξ_i ≥ 0 ,     (2)

where ξ_i is the distance of x_i to its corresponding margin boundary if it has crossed the boundary and ξ_i = 0 otherwise. C is a hyperparameter of the optimisation problem, which controls the strength of the penalty for the crossed-boundary datapoints. This optimisation problem is the primal representation of a dual problem,

min_α (1/2) α⊺Qα − e⊺α   subject to   y⊺α = 0 ,  0 ≤ α_i ≤ C ,     (3)

where e is a vector of ones and Q is an m × m matrix with Q_ij = y_i y_j K(x_i, x_j), where the kernel matrix K is constructed from the inner products of datapoints, K(x_i, x_j) = ⟨x_i, x_j⟩. After the optimisation is complete, the class of a new datapoint x is predicted using the sign of the decision function

f(x) = Σ_i y_i α_i K(x_i, x) + b .     (4)

The kernel matrix can be utilised for efficient non-linear data classification with the kernel method. In this method, before computing the inner products, datapoints are mapped with a feature map Φ into a high-dimensional space, called the feature space, where a linear classification is performed (see Fig. 3). The kernel matrix therefore becomes K(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩.
An essential property of the kernel method is that it only requires computing inner products, thereby avoiding an explicit mapping of the datapoints to the feature space that may be computationally expensive.
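This property can be demonstrated with a degree-2 polynomial kernel (not one of the kernels used in this work): its implicit evaluation reproduces the inner product of an explicit quadratic feature map without ever constructing that map.

```python
import numpy as np

def poly_kernel(x, y):
    # implicit evaluation: k(x, y) = (<x, y> + 1)^2
    return (x @ y + 1.0) ** 2

def explicit_map(x):
    # explicit degree-2 feature map whose inner product reproduces poly_kernel:
    # [1, sqrt(2) x_i, x_i^2, sqrt(2) x_i x_j for i < j]
    n = len(x)
    feats = [1.0]
    feats += list(np.sqrt(2.0) * x)
    feats += list(x ** 2)
    feats += [np.sqrt(2.0) * x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return np.array(feats)

x = np.array([0.2, 0.5, 0.9])
y = np.array([0.4, 0.1, 0.7])
implicit = poly_kernel(x, y)                    # one dot product and a square
explicit = explicit_map(x) @ explicit_map(y)    # builds the 10-dim feature space
# both give (<x, y> + 1)^2
```

Here the explicit space is only 10-dimensional, but for higher degrees (or the quantum feature maps discussed later) the explicit construction grows rapidly while the implicit evaluation stays cheap.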

Quantum Kernel Estimation
Quantum computers can be used to estimate the kernel matrix in the SVM algorithm. The idea is that instead of conventional classical feature spaces, one can exploit the exponentially large Hilbert space of qubits by leveraging controllable entanglement and superposition. A datapoint x_i is non-linearly mapped to the quantum state ρ(x_i) = |Φ(x_i)⟩⟨Φ(x_i)| in the Hilbert space, which serves as our feature space. In this space, an inner product between two quantum states ρ(x_i) and ρ(x_j) is defined by tracing over their product, and each kernel matrix entry is subsequently calculated to be

K(x_i, x_j) = Tr[ρ(x_i) ρ(x_j)] = |⟨Φ(x_i)|Φ(x_j)⟩|² .     (5)

This matrix entry can be calculated using a unitary U_Φ:

K(x_i, x_j) = |⟨0^⊗N| U_Φ†(x_i) U_Φ(x_j) |0^⊗N⟩|² ,     (6)

where |0^⊗N⟩ is the initial state with all qubits in the |0⟩ state. The kernel matrix is then estimated by preparing and measuring the state U_Φ†(x_i) U_Φ(x_j)|0^⊗N⟩ R times (shots) and calculating the fraction of times where all qubits are measured to be 0. In our study, following the design of the circuit family proposed in Havlíček et al. (2019), the number of qubits N was chosen to be the same as the number of features. We computed the asymptotic case (R = ∞) by accessing the output of the simulated quantum circuit.

Depending on the choice of feature map, and hence U_Φ, the complexity of estimating the quantum kernel and the performance of the resulting classifier change. To achieve quantum advantage, we are particularly interested in feature maps based on quantum circuits that cannot be simulated efficiently on classical computers, while maintaining high performance for the classifier. The number of parameters a classical computer needs to track grows exponentially with the number of qubits if the qubits are well entangled. This can make the simulation of quantum circuits difficult, especially when the circuit is sufficiently deep, including a large number of intermediate gates, and this is where quantum advantage is achievable. Current quantum computers are not large enough to implement a deep quantum circuit for this purpose. However, a recent study (Havlíček et al. 2019) proposed a quantum circuit conjectured to provide quantum advantage while, more importantly, being implementable on current quantum computers. The authors showed that estimating the kernel with the quantum circuit described below is directly related to a 3-fold forrelation ('Fourier correlation') (Aaronson & Ambainis 2018) problem and could lead to quantum advantage.
A more generalised version of the unitary U_Φ proposed in Havlíček et al. (2019) is available as the PauliFeatureMap class in the Qiskit software development kit. We used a subset of the generalised unitaries, which are of the form U_Φ(x) = U_φ(x) H^⊗N U_φ(x) H^⊗N, where H is the conventional Hadamard gate, which puts the computational basis states into an equal superposition, while U_φ(x) is an entangling unitary parametrised by the datapoint x.
The unitary U_φ(x) is formulated as

U_φ(x) = exp( iα Σ_S φ_S(x) Π_{k∈S} P_k ) ,     (7)

where α controls the rotations and interactions, P_k ∈ {I, X, Y, Z} denotes the identity and Pauli matrices, S runs over the subsets of qubits (or features) that interact, and φ_S(x) is a user-defined function of the features which adjusts the amount of rotation. In this study, we only considered |S| ≤ 2, meaning that interactions among three or more qubits are excluded. Fig. 4 presents an example of this unitary with a single-qubit rotation P_k and a two-qubit interaction P_k P_l each induced by a particular choice of Pauli matrices (e.g. Z and Z ⊗ Z), while no ≥3-qubit interactions exist. The full circuit used for our kernel estimation is shown in Fig. 5.
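For concreteness, this kernel estimation can be sketched with a small statevector simulation. This is an illustrative numpy implementation of the depth-2 circuit U_Φ(x) = U_φ(x) H^⊗N U_φ(x) H^⊗N, assuming Z and ZZ Pauli terms and the Qiskit PauliFeatureMap default data maps φ_{k}(x) = x_k and φ_{k,l}(x) = (π − x_k)(π − x_l); it is not the code used in the study:

```python
import numpy as np
from itertools import combinations

def hadamard_all(state):
    # apply a Hadamard gate to every qubit of the statevector
    n = int(np.log2(state.size))
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    psi = state.reshape([2] * n)
    for q in range(n):
        psi = np.moveaxis(np.tensordot(h, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)

def entangling_phases(x, alpha):
    # diagonal of U_phi(x) = exp(i alpha sum_S phi_S(x) prod_{k in S} Z_k)
    # with phi_{k}(x) = x_k and phi_{k,l}(x) = (pi - x_k)(pi - x_l)
    n = len(x)
    signs = np.array([[1 - 2 * ((z >> k) & 1) for k in range(n)]
                      for z in range(2 ** n)])          # (-1)^{z_k}
    theta = signs @ x                                   # single-qubit Z terms
    for k, l in combinations(range(n), 2):              # two-qubit ZZ terms
        theta += (np.pi - x[k]) * (np.pi - x[l]) * signs[:, k] * signs[:, l]
    return np.exp(1j * alpha * theta)

def feature_state(x, alpha=1.0, depth=2):
    # |Phi(x)> = [U_phi(x) H^(tensor N)]^depth |0...0>
    psi = np.zeros(2 ** len(x), dtype=complex)
    psi[0] = 1.0
    for _ in range(depth):
        psi = entangling_phases(x, alpha) * hadamard_all(psi)
    return psi

def quantum_kernel(xi, xj, alpha=1.0):
    # K(x_i, x_j) = |<Phi(x_i)|Phi(x_j)>|^2, i.e. the R -> infinity limit
    return abs(np.vdot(feature_state(xi, alpha), feature_state(xj, alpha))) ** 2
```

On real hardware the same entry would instead be estimated as the fraction of all-zero measurement outcomes over R shots of the circuit in Eq. (6).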
After the kernel is estimated, it is fed into a conventional SVM to find the optimised hyperplane. Once the hyperplane is found, datapoints from the test set are classified using the decision function in Eq. 4.

CLASSIFICATION RESULTS
This section describes the results of using our classical and quantum kernel classifiers to predict the morphological types of galaxies. An important step for minimising the loss function of a classifier is hyperparameter optimisation. For each kernel, the hyperparameters which maximise the area under the receiver operating characteristic curve (ROC AUC) for separating ellipticals and spirals were found using a grid search.
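The model-selection score itself can be sketched as a rank-based (Mann-Whitney) ROC AUC. This minimal version assumes untied scores; library implementations handle ties with average ranks:

```python
import numpy as np

def roc_auc(labels, scores):
    # rank-based ROC AUC: the probability that a randomly chosen positive
    # is scored above a randomly chosen negative (assumes untied scores)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = np.asarray(labels) == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

labels = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
# one of the four positive-negative pairs is mis-ordered (0.35 < 0.4),
# so the AUC is 3/4 = 0.75
auc = roc_auc(labels, scores)
```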
For the classical kernel, the commonly used radial basis function (RBF) kernel, K(x_i, x_j) = exp(−γ ∥x_i − x_j∥²), was used. The regularisation term C was searched over 9 orders of magnitude, 10^a for −1 ≤ a ≤ 8, while for the kernel coefficient γ, the search included values between 0.0001 and 100, with most values centred around 1. During the search, hyperparameter configurations that significantly underperformed compared to other configurations were removed. A full grid search was then performed on the remaining parameters. The best hyperparameters for the classical kernel were γ = 0.01 and C = 10^7. A similar approach was taken for the quantum kernel. The hyperparameters in this kernel are the different unitaries U_φ(x) defined in Eq. 7. Circuits with single-qubit rotations followed by two-qubit interactions between all pairs of qubits are considered in our study, where the rotations and interactions are induced by Pauli matrices (an example is shown in Fig. 4). The rotation factor α was varied from 0.005 to 1.4, while the regularisation term C varied between 10 and 10^8. We used the default data-mapping function φ_S(x) of the PauliFeatureMap class, which is φ_S(x) = x_k for single-qubit rotations (i.e. when |S| = 1) and φ_S(x) = (π − x_k)(π − x_l) for two-qubit interactions (i.e. when |S| = 2). The configuration which yielded the best ROC AUC score was α = 0.03, C = 10^7, and the unitary U_φ(x) with the best-scoring choice of single- and two-qubit Pauli terms.
The ROC AUC scores of the quantum and classical kernel classifiers are compared as a function of training size in Fig. 6. The scores derived for the K_gal ≥ 5 dataset show that the two classifiers have comparable performance regardless of the training size. We utilised 40k datapoints to train the classifiers, a number constrained by the computational resources available for running the quantum kernel classifier.

Since the dataset used in this study is derived from Barchi et al. (2020), it is worth comparing the results. The authors of that study used a boosted decision tree (BDT) and a deep learning (DL) method for performing spiral-elliptical binary classification of galaxies. While this comparison provides a general understanding of the relative performance of the classifiers, it might not be a fair comparison due to subtle differences. For example, (i) the training size in our study was limited by the computing resources required for the quantum kernel classifier, (ii) we applied 5-fold cross-validation while they used an 80-10-10 splitting of the data, and (iii) we optimised the hyperparameters by maximising the ROC AUC score while they potentially chose another score. The comparison is summarised in Table 1. When galaxies with low K_gal values are included in the dataset (K_gal ≥ 5), the SVM outperforms the BDT, while its performance is worse than the DL. For large values of K_gal (K_gal ≥ 20), the performances of the BDT and SVM are comparable, but both are worse than the DL. Given the above-mentioned differences between the two studies, the main point of this comparison is that the results are compatible with each other.
The ROC curves for the quantum and classical kernel classifiers are compared in Fig. 7. The curves are for the K_gal ≥ 5 dataset for different training sizes. The two kernels exhibit comparable performance.

CONCLUSIONS
In this study, we applied a quantum-enhanced SVM algorithm for classifying galaxies into spirals or ellipticals based on their morphology. The Galaxy Zoo 1 dataset was used to collect volunteer-labelled galaxies, and five features per galaxy were extracted from the catalogue provided in Barchi et al. (2020). The SVM algorithm utilises the kernel method by implicitly mapping datapoints into a feature space (with a feature map), where they are classified with a hyperplane. In this algorithm, quantum computers can be used to estimate the kernel matrix, which is then passed to a standard SVM optimiser to find the optimal hyperplane using classical computers. For this galaxy classification problem, we employed a feature map that is feasible for implementation on near-term quantum computers and is conjectured to be intractable on classical ones. Using simulations, we showed that the performance of this algorithm is comparable to a fully-classical SVM for training sizes ranging from 400 to 40k. Following Barchi et al. (2020), we used a parameter K_gal, which is proportional to the size of galaxies, to compare the performance of our classical and quantum SVM algorithms for different ranges of this parameter. Both algorithms exhibit a similar improvement in performance with increasing K_gal. For the dataset with K_gal ≥ 5 and a training size of 40k, the ROC AUC score was found to be 0.946 ± 0.005 for both the classical and quantum kernel classifiers, where the uncertainty is the standard deviation of the scores derived from 5-fold cross-validation.
Our findings show that, despite the limited number of qubits provided by current devices, quantum models can provide similar performance to classical ones across a wide range of training sizes. This is in agreement with previous studies (Peters et al. 2021; Belis et al. 2021; Wu et al. 2021; Fadol et al. 2022; Duckett et al. 2022; Schuhmacher et al. 2023). It has recently been shown (Park et al. 2020) that a quantum SVM is capable of achieving higher performance for datasets with complex boundaries between the two classes. Future studies could explore whether incorporating a larger number (or a different set) of features provides a different class boundary, leading to an improvement in the performance of the quantum classifier compared to the classical one. Given that our quantum circuit requires an equal number of qubits and features, adding features requires additional qubits. However, the rapid proliferation of available quantum devices makes the issue of qubit availability less concerning for future work. Another possibility to improve the performance of the quantum classifier is to encode the relationships (if present) between galaxy features in the quantum circuit (e.g. see Heredge et al. (2021)). Future studies could also investigate how to perform multi-class classification with a reasonable training dataset.
In conclusion, our result motivates the further application of quantum machine learning techniques to problems in astronomy.

Figure 1. Examples of spiral (top row) and elliptical (bottom row) galaxies. The images were taken from Schawinski et al. (2010).

Figure 2. Illustration of how the SVM algorithm separates the elliptical (yellow) and spiral (blue) galaxies in the space spanned by two features x_1 and x_2. The decision boundary hyperplane is w⊺x + b = 0, and the closest datapoints to the hyperplane, which are located on w⊺x + b = ±1, are called support vectors.

Figure 3. Illustration of the kernel method. When data are not linearly separable in the input space, they are mapped to the feature space and linearly classified.

Figure 4. Quantum circuit for an example of the unitary U_φ(x) with three qubits, each taking one of the features x_k of datapoint x as input. Single-qubit and two-qubit gates are shown in pink and blue, respectively.

Figure 5. The full quantum circuit used for estimating the inner product |⟨Φ(x_i)|Φ(x_j)⟩|². Unitaries U_φ(x) (blue) are interleaved with Hadamard gates (yellow), and the quantum state is measured (orange) at the end of the circuit.

Figure 6. ROC AUC score as a function of training size for the quantum and classical kernel classifiers. The ROC AUC scores are computed by applying the models to the test sets using 5-fold cross-validation. The data for this plot satisfy the K_gal ≥ 5 condition.

Figure 8. ROC AUC as a function of K_gal. These ROC AUC scores and their uncertainties are derived from performing 5-fold cross-validation on 50000 galaxy images. The models trained on K_gal ≥ 5, K_gal ≥ 10, and K_gal ≥ 20 are applied to the 5 ≤ K_gal ≤ 10, 10 ≤ K_gal ≤ 20, and K_gal ≥ 20 regions, respectively.

Table 1. ROC AUC score for different models and K_gal conditions.