Scalable supervised online hashing for image retrieval

Online hashing methods aim to learn compact binary codes for newly arriving data streams and to update the hash function so that the codes of the existing data are renewed accordingly. However, the arrival of new data streams has a critical impact on the retrieval performance of the whole system; in particular, the similarity measurement between new data streams and existing data has long been a focus of online retrieval research. In this paper, we present a novel scalable supervised online hashing method that addresses these problems within a unified framework. Specifically, a similarity matrix is built from the label matrices of the existing data and the new data stream. The projection of the existing data's label matrix is then used as an intermediate term to approximate the binary codes of the existing data, which not only injects semantic information into the learned hash codes but also effectively alleviates the problem of data imbalance. In addition, an alternating optimization algorithm is proposed to solve the model efficiently. Extensive experiments on three widely used datasets validate its superior performance over several state-of-the-art methods in terms of both accuracy and scalability for online retrieval tasks.


Introduction
With the continuous development of visual information technology, similarity search has become one of the basic requirements of large-scale image retrieval applications (Tang et al., 2015; Shu et al., 2018; Tang et al., 2019; Cai et al., 2021). Hashing-based nearest neighbor search has been widely used in the field of data search owing to its fast computation and small storage footprint (Slaney & Casey, 2008; Weiss et al., 2008; Wang et al., 2010; Heo et al., 2012; Liu et al., 2012, 2016; Lin et al., 2015; Shen et al., 2015; Shi et al., 2016; Tang et al., 2016; Yang et al., 2016; Gui et al., 2018; Jiang & Li, 2018). At present, most existing work processes data in batch mode, i.e. training and test data are selected from a dataset given all at once, and these fixed data are used to learn the hash function and evaluate retrieval performance. Clearly, such processing falls far short of the data retrieval tasks posed by realistic, dynamically growing data. Therefore, many studies have turned to online hashing retrieval, which can handle new data streams and offers fast retrieval speed, low storage cost, and timely processing of dynamic data (Huang et al., 2013; Çakir & Sclaroff, 2015; Leng et al., 2015; Çakir et al., 2017a, b; Lin et al., 2018, 2019). Unlike general hashing, online hashing studies how to incrementally update the hash model with new data without retraining on historical data. However, if the model is updated only with newly arriving data while the historical data are neglected, a certain amount of information is inevitably lost and search accuracy decreases. Therefore, how to update the hash model efficiently while preserving search accuracy is the central problem that online hashing needs to solve.
In online hashing, the data evolve over time, and the model should be updated continuously to accommodate the new data. However, the hash codes of the historical data stored in the training set were not produced by the latest hash function, but by the hash function at some earlier time. Since the data are thus encoded by inconsistent hash functions, search accuracy may decrease. Therefore, research on online hashing must balance the influence of historical data and new data on model updates to ensure the accuracy of hash search. Briefly, online hashing work mainly focuses on three problems: first, establishing an effective similarity measure between existing data and new data streams, which is crucial to retrieval performance; second, improving the robustness and retrieval performance of the hash model against the disturbance caused by newly arriving data streams; and finally, combining the characteristics of the new data stream with limited training data to learn the hash function efficiently.
Regarding the first problem, establishing a similarity relationship between existing data and new data streams is essential for efficient retrieval; most existing online retrieval methods use sample labels to build such relationships (Lin et al., 2019, 2020b; Weng & Zhu, 2020). Regarding the second, the advent of a new data stream is a severe test of the stability of the whole optimization model, especially because the categories of the new stream may be incompatible with those of the existing data (Çakir & Sclaroff, 2015; Leng et al., 2015; Chen et al., 2017). Regarding the third, to the best of our knowledge, only a few studies have investigated how to effectively accelerate the processing of training data and learn hash functions efficiently. In this paper, we present a novel supervised online hashing method, termed scalable supervised online hashing for image retrieval (SSOH for short), which extracts the core information contained in the new data stream in a timely and fast manner and learns hash codes efficiently. The overall structure of our method is shown in Fig. 1.
The main contributions of the proposed SSOH are summarized as follows:
1. To construct a similarity measure between the new data stream and the existing data, a new loss function is introduced into the SSOH model. During optimization, large-scale pairwise similarity calculations are replaced by precalculated intermediate variables, which effectively saves computation.
2. Moreover, the introduction of intermediate variables also effectively alleviates data imbalance, which is generally caused by the arrival of dynamic new data streams that differ considerably from the existing data.
3. To reduce the quantization error in the Hamming space, we design a discrete optimization algorithm that makes online discrete optimization possible. The algorithm supports fast retrieval tasks, as confirmed by the subsequent experiments.
The rest of this paper is organized as follows. Section 2 briefly reviews the related literature. The SSOH method is presented in Section 3. Section 4 describes the experimental settings and results on three benchmark datasets followed by the conclusion and future work in Section 5.

Related Work
In view of the broad application prospects of hashing in real life, researchers worldwide have made considerable efforts and contributions. Below, we divide hashing methods into offline hashing and online hashing according to whether the hash model is incrementally updated as the data change dynamically.

Offline hashing
At present, most existing hashing-based search methods adopt a batch mode for hash learning; that is, all data must be given in advance, and hash learning is performed on the whole dataset to obtain the hash function and the hash codes of the data. To generate effective and compact binary codes, the data-independent Locality Sensitive Hashing creatively uses random projections to map sample data to binary codes. Subsequently, many variants of this method were proposed (Mu & Yan, 2010; Wang et al., 2016; Qian et al., 2017), but longer codes are often required to obtain efficient binary codes. Benefiting from semantic labels in the training phase, supervised hashing methods surpass unsupervised ones in retrieval performance. For example, Iterative Quantization (Gong & Lazebnik, 2011) minimizes the quantization error of mapping sample data to the vertices of a binary hypercube, and uses a rotation transformation to preserve semantic similarity between samples. Multiview Discrete Hashing (Shen et al., 2018) learns compact hash codes from unsupervised large-scale multimedia data; it uses matrix factorization to extract the core information of the samples and combines spectral clustering to capture the underlying semantics, effectively improving the discrimination and scalability of the hash codes. Semipaired Discrete Hashing realizes cross-view graph exploration of the similarity of semipaired data and completes the pairing between samples in a common subspace, so it can be used for multiview retrieval tasks with semipaired data. Self-supervised Adversarial Hashing (Li et al., 2018a) creatively uses two adversarial networks to express the similarities between modalities efficiently and mines deep semantics from the multilabel annotations of the samples. Multi-Task Consistency Preserving Adversarial Hashing (Xie et al., 2020) uses end-to-end learning to generate cross-modal hash codes, with a multitask adversarial learning model designed to maintain semantic consistency and improve cross-modal retrieval. Deep Ordinal Hashing (Jin et al., 2019) uses a convolutional network to extract global feature information while accounting for local spatial information, and designs an end-to-end hash learning framework that generates efficient hash codes. Instance-Aware Hashing (Lai et al., 2016), based on a deep network framework, organizes the sample data into multiple groups, each containing one category of features, and then expresses each sample as a multisegment hash code to accomplish multilabel image retrieval.

Online hashing
Online Kernel Hashing (OKH; Huang et al., 2013), the pioneering work of online hashing retrieval, was first proposed by Long-Kai Huang et al. at IJCAI 2013. The online hashing model builds on the observation that kernel functions in traditional hashing can improve retrieval performance (Kulis & Grauman, 2009; Jiang et al., 2015); that is, as data arrive, a kernel function maps linearly inseparable data into linearly separable data, thereby enabling online retrieval. However, since only one pair of data can be processed per round, OKH is unsuited to large-scale data processing. To meet the urgent needs of large-scale data processing, Online Sketching Hashing (SketchHash) was proposed (Leng et al., 2015). Its core idea is to extract a sketch of each data block and update the hash model according to the sketching information; with the substantial reduction in data volume, the computational complexity is reduced. The method transforms data block processing into a matrix eigenvalue problem and realizes online hash model updates for large-scale data blocks. However, when extracting the data sketch matrix, the hash model may not instantly learn the characteristics of new data as they accumulate, resulting in a decrease in the search accuracy of the online hashing method. To reduce the dependence on sample labels, Xing Tang et al. (Xing & Ng, 2016) proposed a semisupervised online hashing model, which accounts for the fact that samples cannot be fully labeled and uses a small number of labeled samples together with mostly unlabeled samples for semisupervised learning.
In 2015, Fatih Cakir et al. (Çakir & Sclaroff, 2015) proposed AdaptHash, an online hashing method based on stochastic gradient descent, which significantly reduces storage space and substantially increases computing speed. In 2017, they further proposed online supervised hashing (OSH; Çakir et al., 2017a) to extend the online hash search mechanism to large-scale datasets. The method for the first time overcomes the incompatibility between the labels of new data streams and existing data, making large-scale applications feasible. The addition of new data streams causes data imbalance in the optimization model; to address this, Lin et al. (2019) proposed BSODH, which effectively alleviates the imbalance. However, in that work the similarity parameters are set manually, so the similarity between sample pairs takes only two values, which clearly does not reflect the true degrees of similarity between samples; this is also one of the problems this paper attempts to solve. In 2020, Mingbao Lin, Rongrong Ji et al. (Lin et al., 2020a) proposed SPLH, which links the label space and the Hamming space so that the semantic distance between samples is measured accurately, improving online retrieval performance. Fast Class-wise Updating for Online Hashing (Lin et al., 2020c) adopts a divide-and-conquer idea, uses the label information of the data to classify, and learns hash codes via class-wise updates. Incremental Hashing (Ng et al., 2020) uses dominant sets for sample selection and handles dynamic image collections for retrieval. Other representative single-modal online hashing studies include Chen et al. (2016), Zhu et al. (2017), and Li et al. (2018b).
The unsupervised online cross-modal hashing method was proposed in 2016. This method uses a projection matrix to map the original data matrix and realizes cross-modal retrieval without data-dependent classification information. Label Embedding Online Hashing (LEMON) builds an embedding framework that maintains label similarity, aiming to use label embedding and the similarity information between data to alleviate the system's sensitivity to new data streams and to generate effective binary codes. OLSH (Yao et al., 2019) maps the label information of new multimodal data streams into a common semantic space, where the semantic distances of the sample data are measured more accurately to achieve efficient cross-modal retrieval.

Table 1: Summary of notations.
Symbol    Meaning
d         Dimension of features
k         Length of hash code
c         Number of categories

Notations and problem formulation
Suppose we have $n$ training samples $X^t \in \mathbb{R}^{d \times n}$ at the $t$-th round, where $d$ is the dimension of the sample feature vector and $n$ is the number of samples. The goal of hashing is to learn hash codes that preserve the similarities of the original space, with $k$ being the binary code length. Without loss of generality, we assume that all the training samples $X$ have labels $L = [l_1, l_2, \cdots, l_n] \in \mathbb{R}^{c \times n}$, where $c$ is the total number of categories and $l_i \in \{1, 0\}^{c \times 1}$ is the label vector of $x_i$; specifically, $l_{ij} = 1$ if $x_i$ belongs to the $j$-th category and $l_{ij} = 0$ otherwise. In online hashing, the dataset $X^t$ is not available all at once. We write the $n$ training samples as $X^t = [X^t_e, X^t_s] \in \mathbb{R}^{d \times n}$, where $X^t_s \in \mathbb{R}^{d \times n_t}$ is the new data stream at the $t$-th round and $X^t_e \in \mathbb{R}^{d \times (n - n_t)}$ is the existing dataset, with corresponding label matrices $L^t_s$ and $L^t_e$. We denote by $B^t_s \in \{-1, 1\}^{k \times n_t}$ and $B^t_e \in \{-1, 1\}^{k \times (n - n_t)}$ the hash codes of the new stream $X^t_s$ and the existing dataset $X^t_e$, respectively. The update of the corresponding hash function at the $t$-th round can be written as

$B^t = \mathrm{sgn}\big((W^t)^T X^t\big), \quad (1)$

where $W^t = [w^t_1, w^t_2, \cdots, w^t_k] \in \mathbb{R}^{d \times k}$ is the linear projection matrix transforming sample data into the $k$-dimensional binary space at the $t$-th round, and $\mathrm{sgn}(\cdot)$ returns $1$ for nonnegative inputs and $-1$ otherwise. For the convenience of later description, the symbols used in the paper are summarized in Table 1 and explained in detail where they are encountered.
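To make equation (1) concrete, the following minimal sketch (ours, in NumPy rather than the paper's Matlab implementation; all variable names are illustrative) maps data to binary codes through a linear projection, sending zeros to +1 under the sign convention above:

import numpy as np

def hash_codes(W, X):
    # B = sgn(W^T X): W is d x k, X is d x n, B is k x n in {-1, +1}.
    B = np.sign(W.T @ X)
    B[B == 0] = 1          # sgn(0) = 1 by the convention in equation (1)
    return B

# toy usage: d = 512 features, k = 32 bits, n = 1000 samples
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 32))
X = rng.standard_normal((512, 1000))
B = hash_codes(W, X)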

Proposed method
(i) Similarity carving: For supervised hashing, using the label information of the samples for similarity learning is a very effective strategy for generating highly discriminative hash codes. Naturally, this goal can be achieved through several optimization functions; the common approach is to approximate the similarity matrix by the inner product of the binary codes under the Frobenius norm (Luo et al., 2018; Mandal et al., 2019). Specifically, it is defined as

$\min_{B} \; \| B^T B - k S \|_F^2. \quad (2)$

To better characterize the similarities in the Hamming space between the existing data $X^t_e$ and the new data stream $X^t_s$ at the $t$-th round, the following objective function is proposed (Lin et al., 2019):

$\min_{B^t_s, B^t_e} \; \| (B^t_s)^T B^t_e - k S^t \|_F^2, \quad (3)$

where $S^t = (L^t_s)^T L^t_e$ is obtained from the inner product of the label matrices corresponding to the new data stream $X^t_s$ and the existing data $X^t_e$.

(ii) Label matrix embedding: Equation (3) measures the quantization error between the similarity matrix and the inner product of the binary codes; it states that if samples $x_i$ and $x_j$ are similar in the original space, they should have similar binary codes, and vice versa. Limited by the streaming manner in which online hashing carries out retrieval tasks, this error measurement gives rise to the phenomenon of "data imbalance": the similarity matrix $S^t$ is highly sparse in the online setting and thus tends to generate consistent binary codes, which are indiscriminative and uninformative; more details can be found in Lin et al. (2019). Data imbalance makes the error quantification in equation (3) insufficient, which greatly reduces the performance of subsequent retrieval tasks. To address this, the authors introduced balance parameters $\eta_s, \eta_d$, which effectively alleviate the adverse effect of this phenomenon on the optimization model. However, although the pairwise similarities in $S$ can be obtained from label vectors, they do not effectively describe the degree of pairwise similarity. Therefore, we introduce a coefficient matrix that embeds the label matrix into the binary code space of the existing data (Gui et al., 2018; Luo et al., 2018):

$\min_{G} \; \| B^t_e - G L^t_e \|_F^2, \quad (4)$

where $G$ is a projection from $L^t_e$ to $B^t_e$. Combining equations (3) and (4), we obtain

$\min_{B^t_s, B^t_e, G} \; \| (B^t_s)^T G L^t_e - k S^t \|_F^2 + \theta^t \| B^t_e - G L^t_e \|_F^2, \quad (5)$

where $\theta^t$ is a balance parameter at the $t$-th round.
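As an illustration, a minimal NumPy sketch of the similarity construction and the first term of equation (5) might look as follows (our reconstruction under the orientation $S^t = (L^t_s)^T L^t_e$; function names are hypothetical):

def label_similarity(L_s, L_e):
    # S^t = L_s^T L_e: entry (i, j) counts shared labels between the
    # i-th stream sample and the j-th existing sample.
    return L_s.T @ L_e                      # n_t x (n - n_t)

def similarity_term(B_s, G, L_e, S, k):
    # || B_s^T (G L_e) - k S ||_F^2, with G L_e standing in for B_e.
    R = B_s.T @ (G @ L_e) - k * S
    return np.linalg.norm(R, 'fro') ** 2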
Our model has two advantages. First, the label matrix is projected into the binary code space, which better alleviates the data imbalance phenomenon and avoids manually setting similarity parameters (Da et al., 2017). In addition, we use the projection matrix $G$ to project the label matrix $L^t_e$ of the existing data to its binary code matrix $B^t_e$, which not only avoids optimizing the binary codes of the existing data directly, but also uses the real-valued matrix $G L^t_e$ to capture more semantic information than $B^t_e$, ensuring acceptable information loss during similarity preservation. Therefore, the occurrence of data imbalance can be alleviated.
It is worth noting that Luo et al. (2018) also replace binary codes with an intermediate variable, but they target offline image retrieval on a fixed dataset. We are committed to solving the unaffordable retrieval time caused by the accumulation of existing data in large-scale online retrieval tasks.
(iii) Hash function learning: The hash function maps samples from the original space to the binary code space, as defined by equation (1). The goal of this section is to learn the hash function by minimizing the quantization error between the new data stream $X^t_s$ and its corresponding binary codes $B^t_s$; the same applies to the existing data samples.
We obtain

$\min_{W^t} \; \sigma^t \| B^t_s - \mathrm{sgn}((W^t)^T X^t_s) \|_F^2 + \lambda^t \| B^t_e - \mathrm{sgn}((W^t)^T X^t_e) \|_F^2, \quad (6)$

where $\sigma^t$ and $\lambda^t$ are balance parameters at the $t$-th round. However, owing to the sign function in equation (1), learning the hash function becomes an NP-hard problem. To solve this effectively, we use the linear function $(W^t)^T X$ to approximate the sign function in equation (1), and the modified model is as follows:

$\min_{W^t} \; \sigma^t \| B^t_s - (W^t)^T X^t_s \|_F^2 + \lambda^t \| B^t_e - (W^t)^T X^t_e \|_F^2. \quad (7)$

(iv) Overall objective function: Putting equations (5) and (7) together, the overall objective function can be rewritten as

$\min_{B^t_s, B^t_e, G, W^t} \; \| (B^t_s)^T G L^t_e - k S^t \|_F^2 + \theta^t \| B^t_e - G L^t_e \|_F^2 + \sigma^t \| B^t_s - (W^t)^T X^t_s \|_F^2 + \lambda^t \| B^t_e - (W^t)^T X^t_e \|_F^2 + \eta^t \| W^t \|_F^2,$
$\text{s.t. } B^t_s \in \{-1,1\}^{k \times n_t}, \; B^t_e \in \{-1,1\}^{k \times (n-n_t)}, \quad (8)$

where $\eta^t$ is a regularization parameter for the $t$-th round.
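For reference, the overall objective of equation (8) can be evaluated with a short function; this is a sketch of our reconstructed form, not the authors' code:

def ssoh_objective(W, G, B_s, B_e, X_s, X_e, L_e, S, k,
                   theta, sigma, lam, eta):
    fro2 = lambda M: np.linalg.norm(M, 'fro') ** 2
    A = G @ L_e                              # real-valued surrogate for B_e
    return (fro2(B_s.T @ A - k * S)          # similarity preserving, eq. (5)
            + theta * fro2(B_e - A)          # label-matrix embedding
            + sigma * fro2(B_s - W.T @ X_s)  # quantization, new stream
            + lam * fro2(B_e - W.T @ X_e)    # quantization, existing data
            + eta * fro2(W))                 # regularization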
We note that the proposed SSOH differs from FSSH (Luo et al., 2018) as follows: (i) SSOH is an online image retrieval method, while FSSH targets offline image retrieval; there is an essential difference between them. (ii) In achieving the similarity measurement, SSOH uses the inner product between the binary codes of the new data stream and the label projection, which better preserves the similarity information between the data, while FSSH uses the inner product between anchor information and the label projection.

Alternating optimization
Due to the discrete constraints in equation (8), it is difficult to optimize the objective function directly. It is worth mentioning that the objective is nonconvex in all variables jointly. Fortunately, it is convex in each variable with the other variables fixed; thus, we can use an alternating iterative algorithm that splits the objective into several subproblems, each optimizing a single variable at the $t$-th round. The optimization process is as follows.

$W^t$ step: With $B^t_s$, $B^t_e$, $G$ fixed, the objective function (8) degenerates to

$\min_{W^t} \; \sigma^t \| B^t_s - (W^t)^T X^t_s \|_F^2 + \lambda^t \| B^t_e - (W^t)^T X^t_e \|_F^2 + \eta^t \| W^t \|_F^2. \quad (9)$

Setting the derivative of equation (9) w.r.t. $W^t$ to zero, we get the closed-form solution

$W^t = \big( \sigma^t X^t_s (X^t_s)^T + \lambda^t X^t_e (X^t_e)^T + \eta^t I \big)^{-1} \big( \sigma^t X^t_s (B^t_s)^T + \lambda^t X^t_e (B^t_e)^T \big), \quad (10)$

where $I$ is a $d \times d$ identity matrix.

$G$ step: With $B^t_s$, $B^t_e$, $W^t$ fixed, we rewrite equation (8) as

$\min_{G} \; \| (B^t_s)^T G L^t_e - k S^t \|_F^2 + \theta^t \| B^t_e - G L^t_e \|_F^2. \quad (11)$

Similarly, we update $G$ with a closed-form solution by setting the derivative of equation (11) w.r.t. $G$ to zero:

$G = \big( B^t_s (B^t_s)^T + \theta^t I \big)^{-1} \big( k B^t_s S^t + \theta^t B^t_e \big) (L^t_e)^T \big( L^t_e (L^t_e)^T \big)^{-1}, \quad (12)$

where $I \in \mathbb{R}^{k \times k}$ is the identity matrix.

$B^t_e$ step: With $B^t_s$, $W^t$, $G$ fixed, the optimization of equation (8) is formulated as

$\min_{B^t_e} \; \theta^t \| B^t_e - G L^t_e \|_F^2 + \lambda^t \| B^t_e - (W^t)^T X^t_e \|_F^2, \quad \text{s.t. } B^t_e \in \{-1,1\}^{k \times (n - n_t)}. \quad (13)$

Then, we rewrite equation (13) as

$\min_{B^t_e} \; -2\theta^t \mathrm{Tr}\big((B^t_e)^T G L^t_e\big) - 2\lambda^t \mathrm{Tr}\big((B^t_e)^T (W^t)^T X^t_e\big) + \text{const}, \quad (14)$

where $\mathrm{Tr}(\cdot)$ is the trace norm and $\mathrm{Tr}((B^t_e)^T B^t_e)$ is constant for binary codes. By removing the constant terms, the objective in equation (14) degenerates to

$\max_{B^t_e} \; \mathrm{Tr}\big( (B^t_e)^T \big( \theta^t G L^t_e + \lambda^t (W^t)^T X^t_e \big) \big). \quad (15)$

Thus, $B^t_e$ can be solved with the closed-form solution

$B^t_e = \mathrm{sgn}\big( \theta^t G L^t_e + \lambda^t (W^t)^T X^t_e \big). \quad (16)$

$B^t_s$ step: With $B^t_e$, $W^t$, $G$ fixed, equation (8) can be formulated as

$\min_{B^t_s} \; \| (B^t_s)^T G L^t_e - k S^t \|_F^2 + \sigma^t \| B^t_s - (W^t)^T X^t_s \|_F^2, \quad \text{s.t. } B^t_s \in \{-1,1\}^{k \times n_t}. \quad (17)$

By expanding each term in equation (17), the above optimization problem degenerates to

$\min_{B^t_s} \; \| (G L^t_e)^T B^t_s \|_F^2 - 2k\,\mathrm{Tr}\big( (B^t_s)^T G L^t_e (S^t)^T \big) - 2\sigma^t \mathrm{Tr}\big( (B^t_s)^T (W^t)^T X^t_s \big) + \text{const}. \quad (18)$

By omitting the constant term, the optimization problem of equation (18) becomes

$\min_{B^t_s} \; \| (G L^t_e)^T B^t_s \|_F^2 - 2\,\mathrm{Tr}\big( (B^t_s)^T Q \big), \quad \text{s.t. } B^t_s \in \{-1,1\}^{k \times n_t}, \quad (19)$

where $Q = \sigma^t (W^t)^T X^t_s + k G L^t_e (S^t)^T$. This is an NP-hard optimization problem, and many existing methods solve it by relaxing the binary constraint, resulting in larger quantization errors. To solve this problem, we adopt a scheme that learns $B^t_s$ directly. Specifically, we observe that when the other rows are fixed, each row has a closed-form solution. In addition, as shown in Shen et al. (2015), a discrete cyclic coordinate descent method can handle the binary constraint. Inspired by this, we also adopt the scheme of updating the hash code bit by bit; before deriving it, the three closed-form updates above are sketched below.
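The following NumPy sketch implements our reconstructed equations (10), (12), and (16); the small ridge term in update_G is our own addition to guard against a singular $L^t_e (L^t_e)^T$:

def update_W(X_s, X_e, B_s, B_e, sigma, lam, eta):
    # Equation (10): ridge-regression-style closed form for W^t.
    d = X_s.shape[0]
    lhs = sigma * X_s @ X_s.T + lam * X_e @ X_e.T + eta * np.eye(d)
    rhs = sigma * X_s @ B_s.T + lam * X_e @ B_e.T
    return np.linalg.solve(lhs, rhs)

def update_G(B_s, B_e, L_e, S, k, theta, eps=1e-6):
    # Equation (12); eps regularizes L_e L_e^T (assumption, not in the paper).
    kb = B_s.shape[0]
    M = (k * B_s @ S + theta * B_e) @ L_e.T
    G = np.linalg.solve(B_s @ B_s.T + theta * np.eye(kb), M)
    C = L_e @ L_e.T + eps * np.eye(L_e.shape[0])
    return np.linalg.solve(C, G.T).T         # right-multiply by C^{-1}

def update_Be(G, L_e, W, X_e, theta, lam):
    # Equation (16): elementwise sign of a weighted combination.
    B = np.sign(theta * G @ L_e + lam * W.T @ X_e)
    B[B == 0] = 1
    return B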
Concretely, let $A^t = G L^t_e$, and denote by $\bar{a}^t_r$, $\bar{b}^t_{sr}$, and $\bar{q}_r$ the $r$-th rows of $A^t$, $B^t_s$, and $Q$, respectively; suppose that $\bar{A}^t_r$, $\bar{B}^t_{sr}$, and $\bar{Q}_r$ are the matrices $A^t$, $B^t_s$, and $Q$ excluding $\bar{a}^t_r$, $\bar{b}^t_{sr}$, and $\bar{q}_r$, respectively. We then have

$\| (A^t)^T B^t_s \|_F^2 = \text{const} + 2\, \bar{a}^t_r (\bar{A}^t_r)^T \bar{B}^t_{sr} (\bar{b}^t_{sr})^T. \quad (20)$

Similarly, the second term in equation (19) can be rewritten as

$\mathrm{Tr}\big( (B^t_s)^T Q \big) = \text{const} + \bar{q}_r (\bar{b}^t_{sr})^T. \quad (21)$

Substituting equations (20) and (21) into equation (19), we obtain

$\min_{\bar{b}^t_{sr}} \; \big( \bar{a}^t_r (\bar{A}^t_r)^T \bar{B}^t_{sr} - \bar{q}_r \big) (\bar{b}^t_{sr})^T, \quad \text{s.t. } \bar{b}^t_{sr} \in \{-1,1\}^{1 \times n_t}. \quad (22)$

Thus, $\bar{b}^t_{sr}$ can be solved as

$\bar{b}^t_{sr} = \mathrm{sgn}\big( \bar{q}_r - \bar{a}^t_r (\bar{A}^t_r)^T \bar{B}^t_{sr} \big). \quad (23)$

Through the four steps above, we update $B^t_s$, $B^t_e$, $G$, and $W^t$ one by one, and the iterative process is repeated until the objective function converges. The overall procedure of SSOH is summarized in Algorithm 1.
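The bit-by-bit update of equations (20)-(23) can be sketched as follows (our reconstruction; the inner iteration count is an assumption):

def update_Bs(G, L_e, W, X_s, S, B_s, k, sigma, inner_iters=3):
    # Discrete cyclic coordinate descent for equation (19):
    # each row b_r of B_s has the closed form of equation (23).
    A = G @ L_e                              # A^t, k x n_e
    Q = sigma * (W.T @ X_s) + k * A @ S.T    # k x n_t
    B = B_s.copy()
    for _ in range(inner_iters):
        for r in range(k):
            rest = np.arange(k) != r         # all rows except r
            coupling = A[r] @ A[rest].T @ B[rest]
            b_r = np.sign(Q[r] - coupling)
            b_r[b_r == 0] = 1
            B[r] = b_r
    return B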

Algorithm 1 Scalable Supervised Online Hashing (SSOH)
Input: feature matrices X with labels L, the hash code length k, the balance parameters θ^t, σ^t, λ^t, η^t, and the total number T of streaming data batches.
Output: the binary codes B and the hash projection matrix W.
For each round t = 1, ..., T: receive the new data stream X^t_s with labels L^t_s and compute S^t; then repeat the updates of W^t (equation 10), G (equation 12), B^t_e (equation 16), and B^t_s (equation 23, bit by bit) until the objective of equation (8) converges.
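Putting the pieces together, one round of Algorithm 1 could be sketched as follows (reusing the update functions given above; the random initialization and fixed iteration count are our assumptions):

def ssoh_round(X_s, L_s, X_e, L_e, B_e, k,
               theta, sigma, lam, eta, max_iters=5):
    # One round t: build S^t, then alternate the four updates.
    S = L_s.T @ L_e                          # similarity, new vs. existing
    rng = np.random.default_rng(0)
    B_s = np.sign(rng.standard_normal((k, X_s.shape[1])))
    W = rng.standard_normal((X_s.shape[0], k))
    G = rng.standard_normal((k, L_e.shape[0]))
    for _ in range(max_iters):
        W = update_W(X_s, X_e, B_s, B_e, sigma, lam, eta)
        G = update_G(B_s, B_e, L_e, S, k, theta)
        B_e = update_Be(G, L_e, W, X_e, theta, lam)
        B_s = update_Bs(G, L_e, W, X_s, S, B_s, k, sigma)
    return W, G, B_s, B_e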

Computation complexity
In this section, we analyse the computational complexity to verify the efficiency of the proposed method. The time cost of SSOH in Algorithm 1 consists of three parts: (i) optimizing the hash projection $W^t$, (ii) optimizing the projection matrix $G$, and (iii) optimizing the binary codes $B^t_e$ and $B^t_s$. We iteratively update each variable in sequence as shown in Algorithm 1 until the preset conditions are met; the time complexities of updating $W^t$, $G$, $B^t_e$, and $B^t_s$ are $O(d^3 + d^2 n_t + d k n_t)$, $O(k n_t c + c^2 n + k n c + c^3)$, $O(kcn)$, and $O(k^2 c n_t)$, respectively. Therefore, with $K$ iterations for convergence, the total computational complexity of SSOH is $O(K(d^3 + d^2 n_t + dkn + c^2 n + k^2 c n_t))$. Given that $c$, $k$, and $n_t$ are small compared with $n$, the complexity of Algorithm 1 is dominated by the dataset size; it is therefore scalable to large-scale data.

Experiments
To verify the effectiveness and rationality of the proposed method, we apply it to three widely used datasets, namely CIFAR-10 (Krizhevsky et al., 2009), MNIST (LeCun et al., 1998), and NUS WIDE (Chua et al., 2009), and compare it with several state-of-the-art methods. We first introduce the details of the three datasets, the evaluation criteria, the compared methods, and the parameter settings of the experiments. Next, we give a comparison of the experimental results and a discussion. Finally, the convergence, computational efficiency, and parameter sensitivity of SSOH are further discussed.

Datasets
First, we give the statistics of these datasets and a brief description of each: CIFAR-10: The dataset has a total of 60K color images, divided into 10 categories with 6K images per category. Each sample is represented by a 4096-dimensional vector. We randomly select 1K instances as the test set and use the remaining 59K samples as the training set, following Lin et al. (2019). In addition, we take 2K image samples from the training set as the new data stream to train the hash function.
MNIST: The dataset contains 60K samples. Each image in MNIST is a 28 × 28 handwritten digit (0-9) with 8 bits per pixel, and we flatten the 28 × 28 pixels into a one-dimensional column vector. We then randomly select 100 instances from each category as the test set and use the rest as the training set. In addition, we extract 20K instances from the training set as the new data stream to train the hash function.
NUS WIDE: This is a large dataset with a total of 81 categories and 269 648 instances; following Lin et al. (2020b), we select the 10 most common categories and obtain a subset of 186 577 instances. Each image is represented by a 500-dimensional SIFT visual-word histogram feature vector. We randomly select 2K images from this subset as the test set and use the remaining images as the training set; from the training set, we randomly select 4K image samples to update the hash function.

Evaluation criteria
To evaluate the methods fairly, we use the following common indicators: mAP, Precision@H2, curves of mAP under different training set sizes, and precision-recall curves. We evaluate all criteria on hash codes of varying length: 8, 16, 32, 48, 64, and 128 bits. The specific experimental parameter settings are shown in Table 2.
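For completeness, mAP and Precision@H2 over a Hamming ranking can be computed as below (our sketch for the single-label case of CIFAR-10 and MNIST, where labels_q and labels_db are integer class arrays; for a multilabel dataset such as NUS WIDE, relevance would instead count any shared label):

def mean_average_precision(B_q, B_db, labels_q, labels_db):
    # Codes are k x n in {-1, +1}; Hamming distance = (k - dot) / 2.
    k = B_q.shape[0]
    aps = []
    for i in range(B_q.shape[1]):
        ham = (k - B_db.T @ B_q[:, i]) / 2
        order = np.argsort(ham)
        rel = (labels_db[order] == labels_q[i]).astype(float)
        if rel.sum() == 0:
            continue
        prec = np.cumsum(rel) / np.arange(1, rel.size + 1)
        aps.append((prec * rel).sum() / rel.sum())
    return float(np.mean(aps))

def precision_at_h2(B_q, B_db, labels_q, labels_db):
    # Precision of retrieved items within Hamming radius 2.
    k = B_q.shape[0]
    precs = []
    for i in range(B_q.shape[1]):
        hits = (k - B_db.T @ B_q[:, i]) / 2 <= 2
        precs.append(float((labels_db[hits] == labels_q[i]).mean())
                     if hits.any() else 0.0)
    return float(np.mean(precs))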

Baseline methods and settings
To verify the efficiency of our proposed method SSOH, we compare it with several state-of-the-art online hashing methods, summarized as follows: OKH: Online Kernel Hashing (Huang et al., 2013) introduces a penalty function for model updates, thereby retaining the vital information obtained in previous learning.
AdaptHash: Adaptive Hashing (Çakir & Sclaroff, 2015) uses a stochastic-gradient-descent online learning algorithm to update the hash function iteratively with streaming data.
OSH: To adapt to changes in the size and structure of existing datasets, OSH (Çakir et al., 2017a) allows the categories of the new data stream to be inconsistent with the existing data categories.
MIHash: Online Hashing with Mutual Information (Çakir et al., 2017b) proposes synchronized updates of binary codes and hash functions, and accelerates hash table updates based on mutual information.
HCOH: Supervised Online Hashing via Hadamard Codebook Learning (Lin et al., 2018) proposes using LSH to process randomly generated binary codes to ensure semantic similarity.
BSODH: Towards Optimal Discrete Online Hashing with Balanced Similarity (Lin et al., 2019) uses asymmetric graph regularization to maintain the similarity between streaming data and the existing dataset.
SPLH: Similarity Preserving Linkage Hashing (Lin et al., 2020a) establishes a common space linking binary hash codes and sample labels, and effectively measures the semantic distance between binary hash codes and sample label vectors.
Our model is implemented in Matlab. All experiments are conducted on a standard workstation with a 3.4 GHz Intel Core i7-6700 CPU and 16 GB RAM.

Results and discussions
(i) mAP results: SSOH is competitive and clearly outperforms existing online retrieval methods. Specifically, it surpasses BSODH and HCOH, which are among the state-of-the-art online retrieval methods. It is worth mentioning that, as Tables 3-5 show, SSOH still achieves better retrieval performance when the hash code is short. Specifically, we apply all methods to CIFAR-10 and record the performance of each method under different hash code lengths, as shown in Table 3 (the optimal results are shown in boldface, and the suboptimal results are marked as well). Similarly, Tables 4 and 5 list the mAP results on MNIST and NUS WIDE with various code lengths, respectively. As shown in Table 3, SSOH significantly outperforms the other baselines in most cases: on CIFAR-10, the mAP of SSOH improves over the suboptimal method by 3.19%, 3.85%, 2.30%, 3.36%, and 5.09% for hash code lengths of 16, 32, 48, 64, and 128 bits, respectively. Identically, our method shows satisfactory results on the other datasets under different code lengths. For example, on NUS WIDE, for hash code lengths of 8, 16, 32, 48, 64, and 128 bits, the improvements of SSOH over the suboptimal method are 2.61%, 3.23%, 3.90%, 8.51%, 7.06%, and 6.91%, respectively.
(ii) Precision@H2: The Precision@H2 results on the three benchmark datasets are also recorded in Tables 3-5. Similar to the mAP results discussed above, SSOH is optimal on almost all datasets. For example, on MNIST, compared with the suboptimal methods, our method achieves 4.56%, 4.49%, and 1.60% improvements at 16, 32, and 48 bits, respectively. The results on CIFAR-10 and NUS WIDE again confirm the improvement achieved by SSOH on this evaluation index. Tables 3-5 also show a ubiquitous phenomenon: as the hash code length increases, the Precision@H2 of almost all retrieval methods first increases and then decreases. Short hash codes are limited by the code length and can only encode limited semantic information; relatively speaking, in a long hash code space the number of hash buckets grows rapidly, widening the search range, so highly discriminative codes cannot be learned in a short time.
(iii) Impact of training data scale: The salient feature of online hash retrieval is the retrieval of newly arriving data streams, and the update of the hash function depends on the update of the projection matrix $W^t$. Therefore, we focus on the performance of the entire model at different update stages. We test the relationship between mAP and training data size on the three datasets and plot the curves in Figs 5, 6, and 7. On CIFAR-10 and MNIST, the training set size varies from 2K to 20K with a step of 2K. Considering the large volume of NUS WIDE, its training set size increases from 5K to 100K with a step of 5K.
As can be seen in Figs 5-7, the performance of all retrieval methods changes with the size of the training data, and the mAP values oscillate to some extent. Only SSOH exhibits a relatively stable trend and robust performance, and its retrieval performance is the best among the compared methods. Similarly, although all baseline methods perform well on MNIST and NUS WIDE, the performance of SSOH is more prominent.
In addition, the results on MNIST show that SSOH has good generalization ability; that is, good retrieval performance can be obtained from a small training set. To illustrate this, take 32 bits as an example: when the training set contains 2K samples, the retrieval mAP of SSOH already reaches 0.575, whereas the suboptimal method MIHash achieves only 0.211. As the training samples increase to 20K, BSODH reaches 0.586, while SSOH reaches 0.724. This shows that, compared with other methods, our method achieves comparable retrieval results with a smaller sample size. To further verify the effectiveness of SSOH, we plot the precision-recall curves on CIFAR-10 in Fig. 8. Consistent with the results in Table 3, our proposed method is superior to the other compared methods regardless of the hash code length.

Parameter sensitivity
In this part, we analyse the sensitivity of the parameters involved in the model with a 32-bit hash code on the CIFAR-10 dataset, focusing on the balance parameters $\sigma^t$, $\theta^t$, and $\lambda^t$ and the overfitting-prevention parameter $\eta^t$. To study the influence of each parameter, we fix the other parameters and allow only one parameter to change at a time, and we mainly explore the effect of parameter changes on mAP at 32 bits.
(i) Influence of $\theta^t$, $\sigma^t$, and $\lambda^t$: The influence of the balance parameters $\theta^t$, $\sigma^t$, and $\lambda^t$ on the system is shown in Fig. 9a, b, and c. The best result is achieved within [0.6, 0.8] for $\theta^t$. $\sigma^t$ and $\lambda^t$ control the quantization errors between the binary codes and the hash function for the new data stream and the existing data in equation (8), respectively; the suitable intervals are [0.6, 0.8] for $\sigma^t$ and [5, 6] for $\lambda^t$.
(ii) Influence of $\eta^t$: $\eta^t$ is the coefficient of the regularization term, mainly used to prevent overfitting. Fig. 9d shows the mAP of the model under various $\eta^t$, confirming that SSOH is insensitive to $\eta^t$. We set $\eta^t$ to 0.7 in the experiments.

Training efficiency
To verify that the proposed method can handle large-scale image retrieval tasks, we further evaluate the model in terms of training efficiency. For convenience, we conduct experiments with a 32-bit hash code; similar conclusions hold for other code lengths. Table 6 shows the time consumption of the baseline methods and SSOH. It is easy to see that OKH, HCOH, SPLH, and SSOH are markedly more efficient than MIHash and OSH. Combining Tables 3, 4, and 5, we can see that SSOH strikes a good compromise between effectiveness and efficiency, taking both retrieval efficiency and retrieval performance into account. This also shows that SSOH is applicable to large-scale image retrieval.

Convergence of B t s
At the $t$-th round, when new streaming data arrive, $B^t_s$ is updated by an iterative process, as shown in Algorithm 1. Fig. 10 shows the convergence on CIFAR-10 for streaming input data at the $t$-th round. It can be seen that when $t = 1$, SSOH needs only four iterations to converge; when $t > 1$, at most three iterations are needed to update $B^t_s$ before convergence, which demonstrates the efficiency and effectiveness of the variable updates in our method.

Visualization results
In this part, we randomly select an image (a car) from the CIFAR-10 dataset as the query and show the top 10 images retrieved by our method and the baseline methods. As Fig. 11 shows, our method retrieves only correct images for the query, while every other method has at least one failure case. It is not hard to see that these failure cases usually share similar contours or color schemes with the query image, which indicates that those methods cannot effectively preserve the core information of the training images. Given the randomness of the test, these results reflect the differences in the retrieval quality of each method to a certain extent.

Conclusion
In this paper, we propose a scalable supervised online hashing method for image retrieval. The method aims to solve the data imbalance caused by the arrival of new data streams as well as the large-scale dataset retrieval problem. To this end, we introduce an intermediate variable that projects the existing data's label vectors to approximate their hash codes. This not only achieves high efficiency for large-scale data retrieval but also effectively alleviates the poor retrieval performance caused by data imbalance. In addition, to solve the optimization problem of the model, we propose an alternating optimization algorithm. Extensive experiments demonstrate the superiority of the method in terms of retrieval efficiency and model stability.