CSI acquisition in RIS-assisted mobile communication systems

ABSTRACT Reconfigurable intelligent surface- (RIS) assisted mobile communication is a promising technological paradigm thanks to the attractive advantages of low cost and flexible control of electromagnetic waves. However, the low-cost features of RISs entail some fundamental challenges to the acquisition of channel state information (CSI), which is essential for the optimal RIS design. To tackle this problem, there have been extensive studies on CSI acquisition in RIS-assisted mobile communication systems, from the perspective of architectural improvement as well as specific mathematical solutions. This article aims to overview the existing works on CSI acquisition in RIS-assisted mobile communication systems.


INTRODUCTION
The great success of multiantenna techniques in the last three generations of mobile communications makes us more clearly recognize the importance of exploring resources in the spatial domain. Future mobile systems will be equipped with even more antennas to further exploit the spatial degrees of freedom. However, large-scale antenna arrays will create formidable challenges from a cost and power consumption perspective. During recent years, reconfigurable intelligent surfaces (RISs), also known as programmable metasurfaces, have experienced a rapid development [1][2][3][4]. A RIS is a programmable surface composed of massive units whose electromagnetic responses can be artificially controlled in a real-time manner. It can be regarded as a reduced version of a large antenna array without signal transmission or reception modules, but with the advantages of low cost and low power consumption. Therefore, for the purposes of exploiting the spatial degrees of freedom and contributing to green communications, RISs have been introduced in mobile systems to produce a customized electromagnetic propagation environment and then improve the communication service quality.
In the early study of RIS-assisted mobile communications, a RIS is generally deployed in areas where users suffer from severe signal degradation in order to provide a complementary link to reflect an incident signal towards a desirable direction and to sustain the communication service. Compared with traditional relays, a RIS can greatly reduce the hardware cost and power consumption and simultaneously enable full duplex operation. Apart from communication recovery, a RIS can also improve the wireless channel by increasing the channel rank, and even enable emerging applications, such as localization and sensing [5][6][7][8]. The precondition of harvesting the gain provided by RIS is the acquisition of channel state information (CSI), especially in the RIS link. There have been extensive studies on CSI acquisition in RIS-assisted mobile systems. This article makes a review of the state of the art.
Our review is based on the system model shown in Fig. 1. The mobile system works in the time division duplexing (TDD) mode. The base station (BS) is equipped with $M$ antennas and serves $K$ single-antenna users in the cell. Suppose that user $k$ is at the cell edge or in the shadow of a building, suffering from severe signal degradation. Denote the channel between the BS and user $k$ as $\mathbf{d}_k \in \mathbb{C}^{M\times 1}$. Then, $\mathbf{d}_k$ is of poor quality and has weak power. In order to enhance the received signal strength at user $k$, a RIS is deployed between the BS and user $k$ to provide an assistance link. The RIS is composed of $N$ units, each with the ability to reflect the incident signal towards a desired direction by adjusting the phase of the signal. Denote the channel between the BS and the RIS and the channel between the RIS and user $k$ as $\mathbf{H}_2 \in \mathbb{C}^{M\times N}$ and $\mathbf{h}_{1,k} \in \mathbb{C}^{N\times 1}$, respectively. In the data transmission stage, the signal received at user $k$ can be expressed as
$$y_k = \sqrt{P}\left(\mathbf{d}_k^T + \mathbf{h}_{1,k}^T \mathbf{\Phi} \mathbf{H}_2^T\right)\mathbf{f}_k x_k + n_k, \qquad (1)$$
where $P$ is the transmission power at the BS, $\mathbf{\Phi} = \mathrm{diag}\{\mathbf{v}\}$ is the phase shift matrix at the RIS, $\mathbf{v} = [e^{j\theta_1}, \ldots, e^{j\theta_N}]^T$ satisfies $\theta_n \in [0, 2\pi]$, $\mathbf{f}_k \in \mathbb{C}^{M\times 1}$ is the precoding vector to user $k$, $x_k$ is the signal sent to user $k$ and $n_k$ is the additive complex Gaussian noise. In addition to the direct link, $\mathbf{d}_k^T$, which is of poor quality, the RIS provides a controllable supplementary link $\mathbf{h}_{1,k}^T \mathbf{\Phi} \mathbf{H}_2^T$. The power of $\mathbf{h}_{1,k}^T \mathbf{\Phi} \mathbf{H}_2^T$ can be significantly enhanced by properly adjusting $\mathbf{\Phi}$, which can be realized when CSI is available.
In the context of CSI acquisition, this article starts from an introduction of the current categories of RISs, including passive RISs, hybrid RISs and active RISs. Based on the type of CSI to be acquired, implicit and explicit CSI acquisition methods are summarized, respectively. For explicit CSI especially, the methods to separate the direct link and the RIS link channels are first reviewed, involving high-overhead linear cascaded channel estimation in the RIS link. The channel features that can help reduce the cascaded channel estimation overhead are then articulated. Afterwards, the methods to estimate individual channels in the RIS link for different categories of RISs are reviewed. Then, the widely used channel estimation algorithms for explicit CSI acquisition are summarized. Finally, we present two open problems that will arise with the occurrence of new architectures and application scenarios.
Notation: Letters in normal, lowercase bold and uppercase bold fonts are used for scalars, vectors and matrices, respectively. The superscripts $(\cdot)^T$, $(\cdot)^H$ and $(\cdot)^\dagger$ return the transpose, conjugate transpose and pseudo-inverse, respectively; $|\cdot|$ and $\|\cdot\|$ return the absolute value and the modulus, respectively; $E\{\cdot\}$ denotes taking the expectation. The notation $\lceil\cdot\rceil$ represents rounding up the value; $[\cdot]_{m,:}$, $[\cdot]_{:,n}$ and $[\cdot]_{m,n}$ extract the $m$th row, the $n$th column and the $(m, n)$th entry of a matrix, respectively. Finally, '•' returns the column-wise Kronecker product.

CATEGORIES OF RISs
During the last few years, a variety of RISs have appeared, each with its unique structure and functionalities. In this section, we introduce the existing categories of RISs and make a comparison among them. It should be clarified that the definitions of passive, hybrid and active RISs in this article are determined on the basis of existing RISs that have been published in the literature. With the emergence of new RIS architectures, the definitions and categories of RISs will vary.

Passive RIS
We say that a RIS is passive when it is equipped with neither a power amplifier (PA), low noise amplifier nor signal transmission or reception radio-frequency (RF) chains. Here, passive does not mean that no input power is required by the RIS. A low-level external voltage is still needed to control the phase shift for signal reflection.
The most widely used passive RIS is the reflective-only passive RIS shown in Fig. 2(a) [9][10][11][12][13][14][15][16][17]. It is usually located on the roof or facade of a building, reflecting signals in front of it. Thus, the reflective-only passive RIS is also named the intelligent reflecting surface (IRS) [11,12]. When an IRS is introduced to assist the mobile communication, the individual channels $\mathbf{H}_2$ and $\mathbf{h}_{1,k}$ are cascaded together through $\mathbf{\Phi}$. The effective channel at the IRS link is $\mathbf{H}_2 \mathbf{\Phi} \mathbf{h}_{1,k}$. Since signal processing modules are absent at the IRS, we can only estimate the channels at the BS or user side.
Apart from the IRS, some passive RISs have the ability to refract the incident signal, including the refractive-only passive RISs and the simultaneous refractive and reflective passive RISs. A refractive-only passive RIS is also known as a reconfigurable refraction surface (RRS) [18,19]. It is generally located at the BS, acting as a phased antenna array. A RRS refracts the signal transmitted from the feed at the BS and controls the refraction direction of the signal by adjusting its phase. Different from the IRS-assisted system, where the IRS provides an additional link apart from the direct link, in a RRS-assisted system, only the BS-RRS-user link exists.
A passive RIS offering both refraction and reflection [20][21][22] is referred to as an intelligent omnidirection surface (IOS) or a simultaneously transmitting and reflecting RIS. In an ideal scenario, an IOS can replace window glass to enable signal reflection in front of the window and signal refraction across the window. Then, two users that are at different sides of the IOS can be served at the same time. Channel estimation in RRS and IOS-assisted systems is still in its infancy, and is thus beyond the scope of this review paper.

Hybrid RIS
To address the challenge of channel estimation in passive RIS-assisted mobile systems caused by the absence of signal processing capabilities, a semipassive RIS, also known as a hybrid active and passive RIS, has been proposed in [23][24][25][26][27][28]. In this paper, we use the term 'hybrid RIS' for brevity. Hybrid RISs are equipped with signal reception or even transmission RF chains, which can be connected with active sensors.
Each active sensor has two modes: the antenna mode and the reflection mode. In the antenna mode, active sensors can receive pilots sent from the BS and users or even transmit pilots to them. Thus, channel estimation is supported at the RIS side. Denote by ρ n ∈ [0, 1] the power ratio of the received signal over the incident signal on the nth unit; ρ n = 0 and ρ n = 1 represent the scenarios of the incident signal being completely reflected and completely received, respectively. When 0 < ρ n < 1, signal reflection and reception happen simultaneously. The practical value range of ρ n is determined by the hardware structure of the hybrid RIS.
In a hybrid RIS, a few or even all passive RIS units can be replaced by active sensors. Assume that N a out of N units are selected and replaced by active sensors, and that the number of signal reception RF chains is N RF . Generally, we have N RF ≤ N a ≤ N. Thus, a hybrid RIS can perform channel estimation and also maintains the RIS's advantages of low hardware cost and low power consumption.
Based on the connection type between RF chains and active sensors, we further divide hybrid RISs into two sub-categories. One is the selection-type hybrid RIS, in which at one time instance, each RF chain is connected with only one active sensor, as shown in Fig. 2(b). If N RF < N a then a switch is further needed to make a selection between a RF chain and its possibly connected active sensor. The other is the beamforming-type hybrid RIS, in which each RF chain is connected with multiple or even all active sensors simultaneously. Analog beamforming is performed before signals received by the active sensors are combined at the RF chains, as illustrated in Fig. 2(c).

Active RIS
Figure 2(d) illustrates an active RIS, whose units are all active. Note that an active unit is different from an active sensor. An active unit has only one mode, i.e. the reflection mode. No RF chain is connected with an active unit. Instead, it is connected with a PA to enhance the incident signal power [29][30][31][32]. The motivation to introduce active RISs is to overcome the multiplicative pathloss in the BS-RIS and RIS-user channels. However, an active RIS still works solely in the reflection mode and it is not equipped with signal reception RF chains. It cannot transmit or receive signals, and thus does not support channel estimation at the RIS side. Channel estimation in an active RIS-assisted system is the same as that in an IRS-assisted system.
No matter which structure the RIS has, in the data transmission stage, the RIS is switched to the reflection and/or refraction mode and improves the mobile communication quality by adjusting the signal phase. RIS phase shifting operates like analog beamforming in millimeter wave hybrid beamforming systems. In order to optimize the phase shift, acquisition of implicit or explicit CSI at the RIS is indispensable.

IMPLICIT CSI ACQUISITION
If the design of the RIS phase shift in the data transmission stage does not rely on the explicit CSI, i.e. the full channel h 1, k or H 2 or their cascaded version, then we can simply acquire the implicit CSI. Implicit CSI reflects the channel condition. It can be a beam index, a discrete phase index, etc.

Beam training
The original motivation to introduce RISs is to enhance the total channel power. Recalling Equation (1), the optimal $\mathbf{\Phi}$, or, equivalently, the optimal $\mathbf{v}$, should maximize $\|\mathbf{d}_k^T + \mathbf{h}_{1,k}^T \mathbf{\Phi} \mathbf{H}_2^T\|^2$. Following the beam training approach in hybrid beamforming systems, a predefined RIS phase shift codebook can be utilized, and the optimal RIS beam can be selected after beam sweeping [33][34][35][36][37].
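To be specific, define $\mathcal{C} = \{\mathbf{c}_{\bar n}\}_{\bar n = 1, \ldots, \bar N}$ as the codebook for $\mathbf{v}$, where $\bar N$ is the number of RIS beams and $\mathbf{c}_{\bar n} \in \mathbb{C}^{N\times 1}$ is the $\bar n$th RIS beam. Take the uplink as an example. Within the channel coherence time, the pilot model corresponding to the $\bar n$th RIS beam can be expressed as
$$\mathbf{Y}_{\bar n} = \sum_{k=1}^{K} \sqrt{P_k}\left(\mathbf{d}_k + \mathbf{H}_2\,\mathrm{diag}\{\mathbf{c}_{\bar n}\}\,\mathbf{h}_{1,k}\right)\mathbf{s}_k^H + \mathbf{Z}_{\bar n}, \qquad (2)$$
where $\mathbf{Y}_{\bar n} \in \mathbb{C}^{M\times K}$ is the pilot matrix received by the BS antennas, $K$ is the number of users as well as the length of the pilot sequence, $P_k$ is the transmit power of user $k$, $\mathbf{s}_k \in \mathbb{C}^{K\times 1}$ is the pilot sequence from user $k$, satisfying $\|\mathbf{s}_k\|^2 = 1$ and $\mathbf{s}_k^H \mathbf{s}_j = 0$ for $k \neq j$, and $\mathbf{Z}_{\bar n} \in \mathbb{C}^{M\times K}$ is the complex Gaussian noise matrix with independent and identically distributed entries. By multiplying Equation (2) with $\mathbf{s}_k$, we can extract the pilot component from user $k$ as
$$\mathbf{Y}_{\bar n}\mathbf{s}_k = \sqrt{P_k}\left(\mathbf{d}_k + \mathbf{H}_2\,\mathrm{diag}\{\mathbf{c}_{\bar n}\}\,\mathbf{h}_{1,k}\right) + \mathbf{Z}_{\bar n}\mathbf{s}_k. \qquad (3)$$
The optimal RIS beam for user $k$, with index $\bar n_k^*$, is the one that maximizes the pilot reception power:
$$\bar n_k^* = \arg\max_{\bar n \in \{1, \ldots, \bar N\}} \left\|\mathbf{Y}_{\bar n}\mathbf{s}_k\right\|^2. \qquad (4)$$
Then, $\mathbf{c}_{\bar n_k^*}$ is selected as the RIS phase shift vector for user $k$ in the data transmission phase and $\bar n_k^*$ is the implicit CSI we need to acquire.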
Beam training is an efficient way to find a proper RIS phase shift design that can enhance the total channel quality. The computational complexity of beam training is $O(KM\bar N)$, much lower than that of explicit CSI estimation. Most importantly, we do not need to separate the direct link channel and the RIS link channel, and this method is not sensitive to hardware impairments, such as imperfect phase shifts and uncontrollable amplitude variations. However, the performance of beam training relies heavily on the codebook. On the one hand, a fine codebook that can seamlessly cover the whole service region of the RIS is extremely powerful. On the other hand, a codebook that satisfies this requirement usually has a large size $\bar N$, resulting in a high beam training overhead of $K\bar N$ for $K$ users.
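The beam sweeping procedure can be illustrated with a minimal sketch. It assumes a DFT codebook, a single user whose pilot has already been despread, and noiseless reception by default; the function names `dft_codebook` and `beam_training` and the codebook choice are illustrative assumptions and are not taken from the cited works.

```python
import numpy as np

def dft_codebook(n_units: int, n_beams: int) -> np.ndarray:
    """Return an (n_units, n_beams) codebook of unit-modulus DFT beams (assumed design)."""
    n = np.arange(n_units)[:, None]
    b = np.arange(n_beams)[None, :]
    return np.exp(-2j * np.pi * n * b / n_beams)

def beam_training(d, H2, h1, codebook, pilot_power=1.0, noise_std=0.0, rng=None):
    """Sweep all beams and return (best_index, received_powers) for one user."""
    rng = np.random.default_rng() if rng is None else rng
    n_bs = H2.shape[0]
    powers = np.empty(codebook.shape[1])
    for idx in range(codebook.shape[1]):
        c = codebook[:, idx]
        noise = noise_std * (rng.standard_normal(n_bs)
                             + 1j * rng.standard_normal(n_bs)) / np.sqrt(2)
        # Despread pilot of the user: sqrt(P_k) (d_k + H_2 diag(c) h_{1,k}) + noise
        y = np.sqrt(pilot_power) * (d + H2 @ (c * h1)) + noise
        powers[idx] = np.linalg.norm(y) ** 2
    # The beam index with the largest pilot reception power is the implicit CSI.
    return int(np.argmax(powers)), powers
```

Only the beam index is returned, so neither the direct link nor the RIS link has to be estimated explicitly; the cost is one pilot slot per beam and per user.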
The RIS codebook contains information of the spatial directions. If the user is equipped with multiple antennas as well, and only the line-of-sight (LoS) component exists in the RIS link, then the position of the user can be found with the beam training result [38,39].

Blind beamforming
Apart from a beam index, when the RIS phase shift has limited resolution, a discrete phase shift index can be the implicit CSI we aim to acquire [40][41][42]. Denote the number of quantization bits of each RIS unit as $B$. A total of $2^B$ discrete phase shift values, denoted $\bar\Theta = \{\bar\theta_1, \ldots, \bar\theta_{2^B}\}$, can be constructed. When $\mathbf{h}_{1,k}$ and $\mathbf{H}_2$ are unknown to the RIS, a blind beamforming method is proposed to determine $\theta_1, \ldots, \theta_N$ in Equation (1) from $\bar\Theta$ one by one. Similar to beam training above, a set of $\bar N$ RIS beams, also denoted by $\mathcal{C}$ here, is predefined. The difference is that random beams are utilized here, which means that during the beam sweeping phase, $\theta_1, \ldots, \theta_N$ are independently and randomly generated from $\bar\Theta$. If $\bar N$ is large then, for the $n$th RIS unit, $\bar\theta_1, \ldots, \bar\theta_{2^B}$ are selected approximately equally often. Take the selection of $\theta_n$ as an example. There are nearly $\bar N/2^B$ beams whose $n$th phase shift is $\bar\theta_1$, and similarly for $\bar\theta_2, \ldots, \bar\theta_{2^B}$. Collect all the RIS beams whose $n$th phase shift is $\bar\theta_b$ into a subset $\mathcal{C}_{n,b} \subset \mathcal{C}$ and calculate the average pilot reception power at the BS as
$$\bar P_{n,b} = \frac{1}{|\mathcal{C}_{n,b}|}\sum_{\mathbf{c}_{\bar n} \in \mathcal{C}_{n,b}} \left\|\mathbf{Y}_{\bar n}\mathbf{s}_k\right\|^2. \qquad (5)$$
Search for the index $b_n^*$ that achieves the highest pilot reception power at the BS:
$$b_n^* = \arg\max_{b \in \{1, \ldots, 2^B\}} \bar P_{n,b}. \qquad (6)$$
Then, the discrete phase shift $\bar\theta_{b_n^*}$ is chosen as the phase shift of the $n$th RIS unit. This is the conditional sample mean-based blind beamforming proposed in [41].
The blind beamforming method has an even lower requirement on the codebook, because a randomly generated codebook is acceptable. Similar to beam training, blind beamforming requires a training overhead of $K\bar N$, and is not sensitive to hardware impairments. Its computational complexity is $O(KMN\bar N 2^B)$, and is thus higher than that of beam training. Blind beamforming has been experimentally verified to work well when $N$ and $B$ are small [41,42]. However, in order to guarantee that a proper $b_n^*$ can be selected for $\theta_n$, the number of generated random samples, i.e. the size of the codebook, should still be large. With the user's movement, blind beamforming may return outdated results, and the whole blind beamforming procedure should run again. Therefore, this method also suffers from the burden of high training overhead.
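A minimal sketch of the conditional sample mean selection follows. It assumes uniformly quantized phases and a hypothetical callback `measure_power(v)` that returns the pilot power observed at the BS for the RIS phase-shift vector `v`; neither the callback nor the function name comes from [41].

```python
import numpy as np

def csm_blind_beamforming(measure_power, n_units, n_bits, n_samples, rng):
    """Pick one discrete phase per RIS unit from random-beam power measurements.

    measure_power(v) is a hypothetical callback: it returns the pilot reception
    power at the BS when the RIS applies the unit-modulus vector v.
    """
    levels = 2 ** n_bits
    phases = 2 * np.pi * np.arange(levels) / levels      # assumed uniform quantization
    # Random training beams: one phase index per sample and per unit.
    idx = rng.integers(levels, size=(n_samples, n_units))
    power = np.array([measure_power(np.exp(1j * phases[row])) for row in idx])
    best = np.empty(n_units, dtype=int)
    for n in range(n_units):
        # Average the measured power over all beams whose n-th entry uses level b.
        cond_mean = [power[idx[:, n] == b].mean() if np.any(idx[:, n] == b) else -np.inf
                     for b in range(levels)]
        best[n] = int(np.argmax(cond_mean))
    return phases[best]
```

The sketch makes the trade-off visible: no channel is estimated, but every unit needs enough random samples per phase level for the conditional means to separate.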
Implicit CSI acquisition is applicable to all categories of RISs. Among them, it is particularly suitable for IRS and active RIS-assisted mobile systems, since it does not require the RIS to have signal reception or processing capabilities. Furthermore, implicit CSI acquisition does not rely heavily on the RIS hardware characteristics. When the RIS has low cost and the hardware impairments cannot be ignored, it is preferable to directly search for a proper RIS phase shift design without estimating the explicit CSI.

DOUBLE LINK CHANNEL SEPARATION
With the development of RIS manufacturing technology, the latest RISs have greatly improved hardware profile, resulting in the enhancement of explicit CSI acquisition robustness. Therefore, more attention has been paid to the estimation of explicit CSI, which refers to h 1, k , H 2 and their cascaded version. However, pilots in the direct link and in the RIS link are combined together at the receiver, making it difficult to discriminate them. In this section, we focus on the separation of channels in the two links. Commonly used RISs have been verified to offer reciprocity between uplink and downlink channels in TDD systems through experiments [43]. Considering that uplink training requires less pilot overhead than downlink training, we focus on the uplink when we study double link channel separation and the following channel estimation methods.

ON/OFF RIS
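An ON/OFF RIS is capable of reflecting and absorbing the incident wave, corresponding to the ON and OFF states, respectively [44,45]. When all RIS units are turned OFF, signals arriving at the RIS will be absorbed instead of being reflected, and, equivalently, we have $\mathbf{\Phi} = \mathbf{0}$. Only the direct link then remains in the received pilot, so that $\mathbf{d}_k$ can be estimated on its own. When the RIS units are turned ON, since $\mathbf{\Phi}\mathbf{h}_{1,k} = \mathrm{diag}\{\mathbf{h}_{1,k}\}\mathbf{v}$, the pilot component from user $k$ in Equation (3) can be rewritten as
$$\mathbf{y}_k = \sqrt{P_k}\left(\mathbf{d}_k + \mathbf{H}_2\,\mathrm{diag}\{\mathbf{h}_{1,k}\}\,\mathbf{v}\right) + \mathbf{z}_k, \qquad (7)$$
where $\mathbf{z}_k$ is the noise, and the RIS phase shift vector $\mathbf{v}$ can be separated from the channel $\mathbf{H}_2\,\mathrm{diag}\{\mathbf{h}_{1,k}\}$. We define
$$\mathbf{G}_k = \mathbf{H}_2\,\mathrm{diag}\{\mathbf{h}_{1,k}\} \in \mathbb{C}^{M\times N} \qquad (8)$$
as the cascaded channel of user $k$. In Equation (7), the RIS phase shift vector $\mathbf{v}$ has been separated from $\mathbf{G}_k$. Thus, with the estimate of $\mathbf{G}_k$, a proper $\mathbf{v}$ for data transmission can be designed. The ON/OFF channel separation method requires $N + 1$ sets of RIS phase shift configurations. Note that a low-cost RIS usually has a large size, resulting in a large $N$. Furthermore, when $K$ users exist, the total overhead, involving the length of the pilot sequence $\mathbf{s}_k$ as well, should be $K(N + 1)$. In other words, a long training period is required.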
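As a sketch of how the N + 1 configurations can be used, the example below turns all units OFF once to obtain the direct link and then turns the units ON one at a time to obtain the cascaded channel column by column. The one-unit-at-a-time schedule and the callback `observe(v)` are assumptions made for illustration; other ON/OFF schedules with N + 1 configurations are possible.

```python
import numpy as np

def on_off_estimate(observe, n_units, pilot_power=1.0):
    """Estimate d_k and G_k with N + 1 ON/OFF RIS configurations.

    observe(v) is a hypothetical callback returning the despread uplink pilot
    sqrt(P_k) (d_k + G_k v) + noise, where a zero entry of v models an OFF unit.
    """
    scale = np.sqrt(pilot_power)
    d_hat = observe(np.zeros(n_units, dtype=complex)) / scale   # all units OFF
    G_hat = np.empty((d_hat.size, n_units), dtype=complex)
    for n in range(n_units):
        v = np.zeros(n_units, dtype=complex)
        v[n] = 1.0                                   # only the n-th unit is ON
        G_hat[:, n] = observe(v) / scale - d_hat     # subtract the direct link
    return d_hat, G_hat
```

Each call to `observe` corresponds to one training slot per user, which is where the K(N + 1) overhead comes from.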

ON-only RIS
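When the absorption state is not supported by the RIS, i.e. the RIS only has the ON state, the RIS link pilot component always exists in $\mathbf{y}_k$. In order to separate the channels in the two links, special RIS phase shift designs should be applied [46][47][48][49][50]. In the training phase, assume that the $t$th RIS phase shift vector is $\mathbf{v}_t \in \mathbb{C}^{N\times 1}$. Then, by rewriting Equation (3), we get the uplink received pilot model from user $k$ corresponding to $\mathbf{v}_t$ as
$$\mathbf{y}_{k,t} = \sqrt{P_k}\left(\mathbf{d}_k + \mathbf{G}_k\mathbf{v}_t\right) + \mathbf{z}_{k,t}. \qquad (9)$$
Since the RIS phase shift vector $\mathbf{v}_t$ has been separated from $\mathbf{G}_k$ and is known at the BS, we regard $\mathbf{v}_t$ as the pilot signal too. By stacking the pilot signals over $T$ RIS phase shift vectors together into a matrix, we obtain
$$\mathbf{Y}_k = \sqrt{P_k}\left(\mathbf{d}_k\mathbf{1}_T^T + \mathbf{G}_k\mathbf{V}\right) + \mathbf{Z}_k, \qquad (10)$$
where $\mathbf{Y}_k = [\mathbf{y}_{k,1}, \ldots, \mathbf{y}_{k,T}] \in \mathbb{C}^{M\times T}$, $\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_T] \in \mathbb{C}^{N\times T}$, $\mathbf{1}_T$ is the $T$-dimensional all-one vector and $\mathbf{Z}_k$ is the noise matrix. Given Equation (10), we have two approaches to separate $\mathbf{d}_k$ and $\mathbf{G}_k$. One is to mathematically turn OFF the direct link. We seek a matrix $\mathbf{V}$ that satisfies
$$\mathbf{1}_T^T\mathbf{V}^\dagger = \mathbf{0}^T, \qquad \mathbf{V}\mathbf{V}^\dagger = \mathbf{I}_N.$$
Then, $\mathbf{G}_k$ can be obtained, up to the known factor $\sqrt{P_k}$, by calculating $\mathbf{Y}_k\mathbf{V}^\dagger$. The other approach is to directly estimate the channels in the two links. Define
$$\bar{\mathbf{V}} = [\mathbf{1}_T, \mathbf{V}^T]^T \in \mathbb{C}^{(N+1)\times T}, \qquad \bar{\mathbf{G}}_k = [\mathbf{d}_k, \mathbf{G}_k] \in \mathbb{C}^{M\times (N+1)},$$
so that $\mathbf{Y}_k = \sqrt{P_k}\,\bar{\mathbf{G}}_k\bar{\mathbf{V}} + \mathbf{Z}_k$. Then, $\bar{\mathbf{G}}_k$ can be estimated through the widely used linear channel estimation methods, including least-square (LS) and linear minimum mean square error (LMMSE) schemes, if $\bar{\mathbf{V}}$ has full rank. A typical instance of $\bar{\mathbf{V}}$ that can be used in both approaches is an $(N + 1)$-dimensional discrete Fourier transformation (DFT) matrix, whose entries in the first row are all one. However, $T = N + 1$ is still required, indicating that the same amount of training overhead as using the ON/OFF RIS is required. The computational complexity when $\bar{\mathbf{V}}$ is a DFT matrix is $O(KMN)$.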
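The direct estimation approach with a DFT training matrix can be sketched as follows. The sketch assumes noiseless or lightly noisy pilots, T = N + 1 training slots and a plain LS estimator; `dft_training_matrix` and `ls_double_link` are illustrative names.

```python
import numpy as np

def dft_training_matrix(n_units):
    """(N+1) x (N+1) DFT matrix; the all-one first row multiplies the direct link."""
    T = n_units + 1
    k = np.arange(T)
    return np.exp(-2j * np.pi * np.outer(k, k) / T)

def ls_double_link(Y, V_bar, pilot_power=1.0):
    """LS estimate of [d_k, G_k] from Y = sqrt(P_k) [d_k, G_k] V_bar + Z."""
    G_bar = Y @ np.linalg.pinv(V_bar) / np.sqrt(pilot_power)
    # First column is the direct link, the remaining N columns the cascaded channel.
    return G_bar[:, 0], G_bar[:, 1:]
```

Rows 2 to N + 1 of the DFT matrix play the role of the RIS phase shifts over time. They are unit-modulus, so the RIS never has to be switched OFF.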
The double link channel separation methods illustrated above are applicable to all RISs, especially IRSs and active RISs, but are not necessary for hybrid RISs. This is because a hybrid RIS has the ability to estimate the RIS-link channel and then separate the channels in the two links with a greatly reduced pilot overhead, which is discussed in the subsequent section entitled 'Individual channel estimation'.

CASCADED CHANNEL ESTIMATION OVERHEAD REDUCTION
Because of the huge training overhead resulting from the linear estimation of the large-dimensional channel G k , we need to seek channel features for overhead reduction. Since d k and G k have already been separated, in this section, we only exploit the channel features in the RIS link that can help reduce the overhead for estimating the cascaded channel G k .

Scaling law
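According to the definition of $\mathbf{G}_k$ in Equation (8), for two distinct users $k$ and $j$, we have
$$\mathbf{G}_k = \mathbf{H}_2\,\mathrm{diag}\{\mathbf{h}_{1,k}\}, \qquad \mathbf{G}_j = \mathbf{H}_2\,\mathrm{diag}\{\mathbf{h}_{1,j}\}.$$
The scaling law between $\mathbf{G}_k$ and $\mathbf{G}_j$ can be found as follows [44,51,52]:
$$\mathbf{G}_k = \mathbf{G}_j\,\mathrm{diag}\{\dot{\mathbf{g}}_{kj}\}, \qquad [\dot{\mathbf{g}}_{kj}]_n = \frac{[\mathbf{h}_{1,k}]_n}{[\mathbf{h}_{1,j}]_n}, \quad n = 1, \ldots, N. \qquad (12)$$
That is, given $\mathbf{G}_j$, $\mathbf{G}_k$ can be obtained with the $N$-dimensional scaling channel $\dot{\mathbf{g}}_{kj}$. This is because $\mathbf{G}_k$ and $\mathbf{G}_j$ share the common BS-RIS channel $\mathbf{H}_2$. Recalling Equation (9) and omitting the direct link component, we have
$$\mathbf{y}_{k,t} = \sqrt{P_k}\,\mathbf{G}_j\,\mathrm{diag}\{\dot{\mathbf{g}}_{kj}\}\,\mathbf{v}_t + \mathbf{z}_{k,t} = \sqrt{P_k}\,\mathbf{G}_j\,\mathrm{diag}\{\mathbf{v}_t\}\,\dot{\mathbf{g}}_{kj} + \mathbf{z}_{k,t}.$$
Then, $\dot{\mathbf{g}}_{kj}$ can be linearly estimated given $\mathbf{y}_{k,t}$, $\mathbf{G}_j$ and the known $\mathbf{v}_t$. By exploiting the scaling law, the cascaded channel of the principal user, $\mathbf{G}_1$, can first be estimated with a pilot overhead of $N$. Afterwards, the cascaded channels of the other users, $\mathbf{G}_k$, are obtained by simply estimating $\dot{\mathbf{g}}_{k1}$, $k = 2, \ldots, K$. Notably, in practical systems, we usually have $M < N$, causing a rank-deficient pseudo-inversion of $\mathbf{G}_j$. Under this condition, according to [44], a total of $\lceil (K - 1)N/M \rceil$ instead of $(K - 1)N$ pilots are needed to obtain $\dot{\mathbf{g}}_{k1}$, $k = 2, \ldots, K$. Therefore, the total pilot overhead of multiuser double link channel estimation is greatly reduced from $K(N + 1)$ to N +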
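The linear estimation of the scaling vector can be sketched as a stacked LS problem. The sketch assumes that the direct link has already been removed from the pilots and that the reference cascaded channel is known; when M < N, at least ⌈N/M⌉ training slots with different RIS phase shifts are needed for the stacked system to be solvable. The function name and argument layout are illustrative.

```python
import numpy as np

def estimate_scaling_vector(Y, G_ref, V, pilot_power=1.0):
    """LS estimate of the N-dimensional scaling vector of one user.

    Y     : (M, T) despread pilots of the user, direct link already removed
    G_ref : (M, N) cascaded channel of the reference user
    V     : (N, T) RIS phase-shift vectors used in the T training slots
    """
    # Slot t obeys y_t = sqrt(P_k) G_ref diag(v_t) g + noise.
    A = np.vstack([G_ref * V[:, t] for t in range(V.shape[1])])   # (M*T, N)
    y = Y.T.reshape(-1)                                            # stack y_1, ..., y_T
    g_hat, *_ = np.linalg.lstsq(np.sqrt(pilot_power) * A, y, rcond=None)
    return g_hat
```

The cascaded channel of the user then follows as `G_ref * g_hat`, which scales the n-th column of the reference channel by the n-th entry of the estimated vector.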

Channel sparsity
When the mobile system works in high frequencies, such as in the millimeter wave or even terahertz frequency bands, the channel experiences significant sparsity if the number of BS antennas or that of the RIS units grows large. Each individual channel is composed of a limited number of propagation paths, and each path can be described using a small number of spatial parameters. Take the uniform linear array (ULA) as an example. Suppose that the RIS units are arranged in a ULA, and so are the BS antennas. When user $k$ is in the far field of the RIS, $\mathbf{h}_{1,k}$ can be expressed as [53]
$$\mathbf{h}_{1,k} = \sum_{l=1}^{L_{1,k}} \alpha_{l,k}\,\mathbf{a}_N(\varphi_{l,k}), \qquad (15)$$
where $L_{1,k}$ is the number of paths in $\mathbf{h}_{1,k}$, satisfying $L_{1,k} \ll N$, $\alpha_{l,k}$ is the complex path gain of the $l$th path, $\varphi_{l,k}$ is the angle of arrival (AoA) of the $l$th path at the RIS, while
$$\mathbf{a}_N(\varphi) = \left[1, e^{j2\pi\frac{d}{\lambda}\sin\varphi}, \ldots, e^{j2\pi\frac{d}{\lambda}(N-1)\sin\varphi}\right]^T \qquad (16)$$
is the steering vector of an $N$-element ULA, $d$ is the distance between two adjacent ULA elements and $\lambda$ is the carrier wavelength. Similarly, in the far-field condition, $\mathbf{H}_2$ can be modeled as
$$\mathbf{H}_2 = \sum_{l=1}^{L_2} \beta_l\,\mathbf{a}_M(\psi_l)\,\mathbf{a}_N^T(\phi_l), \qquad (17)$$
where $L_2$ is the number of paths in $\mathbf{H}_2$, satisfying $L_2 \ll \min\{M, N\}$, $\beta_l$ is the complex path gain of the $l$th path, $\psi_l$ is the AoA of the $l$th path at the BS and $\phi_l$ is the angle of departure (AoD) of the $l$th path at the RIS. By applying Equations (15) and (17) in Equation (8), the cascaded channel has the structure
$$\mathbf{G}_k = \sum_{l_1=1}^{L_{1,k}}\sum_{l_2=1}^{L_2} \alpha_{l_1,k}\beta_{l_2}\,\mathbf{a}_M(\psi_{l_2})\,\mathbf{a}_N^T(\phi_{l_2})\,\mathrm{diag}\{\mathbf{a}_N(\varphi_{l_1,k})\} = \mathbf{U}_M\tilde{\mathbf{G}}_k\mathbf{U}_N, \qquad (18)$$
where the $n$th entry of the row vector $\mathbf{a}_N^T(\phi_{l_2})\,\mathrm{diag}\{\mathbf{a}_N(\varphi_{l_1,k})\}$ is $e^{j2\pi\frac{d}{\lambda}(n-1)(\sin\varphi_{l_1,k} + \sin\phi_{l_2})}$, and $\mathbf{U}_M$ and $\mathbf{U}_N$ are $M$- and $N$-dimensional DFT matrices. Given that $L_{1,k}L_2 \ll MN$, $\tilde{\mathbf{G}}_k$ is a sparse matrix with only a few entries having non-vanishing amplitude. Ignoring the direct link component, Equation (10) can be rewritten as
$$\mathbf{Y}_k^H = \sqrt{P_k}\,\mathbf{V}^H\mathbf{U}_N^H\tilde{\mathbf{G}}_k^H\mathbf{U}_M^H + \mathbf{Z}_k^H.$$
By regarding $\mathbf{V}^H\mathbf{U}_N^H$ as a measurement matrix, the sparse channel matrix $\tilde{\mathbf{G}}_k$ can be estimated through compressed sensing or deep learning, which requires greatly reduced training overhead [23,51,54,55]. Alternatively, given the structured channel model in Equation (18), $\mathbf{G}_k$ can be reconstructed using the spatial parameters, including $\alpha_{l_1,k}\beta_{l_2}$, $\sin\psi_{l_2}$ and $\sin\varphi_{l_1,k} + \sin\phi_{l_2}$ [56][57][58][59][60][61][62].
Then, the channel reconstruction problem is translated to a parameter estimation problem, which also requires only a small amount of training overhead. Denote the average sparsity of $\tilde{\mathbf{G}}_k$, $k = 1, \ldots, K$, as $\bar L$, satisfying $\bar L \ll N$. The total overhead approximates $K(1 + \bar L)$, and the computational complexity is generally
Furthermore, because the $L_{1,k}L_2$ paths in $\mathbf{G}_k$ stem from $L_{1,k} + L_2$ practical paths in $\mathbf{h}_{1,k}$ and $\mathbf{H}_2$, the multiuser cascaded channels $\tilde{\mathbf{G}}_k$, $k = 1, \ldots, K$, hold common row sparsity [51]. Figure 3 illustrates the moduli of the cascaded channel matrices of two users, i.e. $|\tilde{\mathbf{G}}_1|$ and $|\tilde{\mathbf{G}}_2|$. The biggest proportion of the power of $|\tilde{\mathbf{G}}_1|$ is captured by the same rows that capture the major proportion of the power of $|\tilde{\mathbf{G}}_2|$, demonstrating the common row sparsity between $\tilde{\mathbf{G}}_1$ and $\tilde{\mathbf{G}}_2$. The common sparsity and the channel scaling both result from the common channel $\mathbf{H}_2$. Exploring the common sparsity among multiple users can further reduce the pilot overhead and the computational complexity.
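The sparsity and the common row sparsity of the cascaded channel in the DFT domain can be checked numerically with a small sketch. It assumes far-field ULA paths whose spatial frequencies (d/λ times the sine of the angle) lie exactly on the DFT grid, so that no power leakage occurs; off-grid angles would spread power over neighbouring entries. The helper names and the sign convention of the steering vector are assumptions.

```python
import numpy as np

def unitary_dft(n):
    """Symmetric unitary DFT matrix whose columns are on-grid ULA steering vectors."""
    k = np.arange(n)
    return np.exp(2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

def angular_domain(G):
    """Return G_tilde such that G = U_M @ G_tilde @ U_N."""
    M, N = G.shape
    return unitary_dft(M).conj().T @ G @ unitary_dft(N).conj().T

def cascaded_channel(alpha, u1, beta, w, u2, M, N):
    """Cascaded channel H2 diag(h1) built from far-field ULA paths.

    u1, u2 and w are spatial frequencies (d / lambda) * sin(angle) of the
    user-RIS AoA, the RIS-side angle of H2 and the BS-side AoA, respectively.
    """
    def steer(n, u):
        return np.exp(2j * np.pi * np.arange(n) * u)

    h1 = sum(a * steer(N, u) for a, u in zip(alpha, u1))
    H2 = sum(b * np.outer(steer(M, wi), steer(N, ui))
             for b, wi, ui in zip(beta, w, u2))
    return H2 * h1          # broadcasting applies diag(h1) on the right
```

For two users that share the same BS-RIS paths, the non-zero rows of the two transformed matrices coincide, since the row positions depend only on the AoAs at the BS.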
The cascaded channel is usually estimated in IRS and active RIS-assisted mobile systems. Nevertheless, for a hybrid RIS, since individual channels can be estimated, it is not necessary to estimate the cascaded channel.

INDIVIDUAL CHANNEL ESTIMATION
Under some special channel conditions in an IRS or active RIS-assisted system, as well as in a hybrid RIS-assisted system, the individual channels H 2 and h 1, k can be separated.

IRS and active RIS
Since the IRS and the active RIS do not have signal processing capabilities, and channel estimation can be performed only at the BS or user side, it is not easy to obtain the individual channels or separate them from the cascaded one. To be specific, recalling the scaling law in Equation (12), we may mistake $\mathbf{H}_2\,\mathrm{diag}\{\mathbf{h}_{1,j}\}$ and $\dot{\mathbf{g}}_{kj}$ for $\mathbf{H}_2$ and $\mathbf{h}_{1,k}$, respectively, causing severe estimation errors. Moreover, from the multipath cascaded channel model in Equation (18), we see that the spatial angles that can be estimated by the BS are $\sin\psi_{l_2}$ and $\sin\varphi_{l_1,k} + \sin\phi_{l_2}$. The AoA at the BS, $\sin\psi_{l_2}$, can be clearly distinguished. The AoA and AoD at the RIS are added together, forming $\sin\varphi_{l_1,k} + \sin\phi_{l_2}$. We cannot separate $\sin\varphi_{l_1,k}$ and $\sin\phi_{l_2}$ from their superposition without any prior information. Therefore, it is not easy to separate $\mathbf{H}_2$ and $\mathbf{h}_{1,k}$ from $\mathbf{G}_k$.
However, if partial CSI is known in advance then it becomes possible to correctly separate $\mathbf{H}_2$ and $\mathbf{h}_{1,k}$. A LoS path usually exists between the BS and the RIS to guarantee a good performance of the latter. Recalling the channel model of $\mathbf{H}_2$ in Equation (17), let the first path be the LoS path. The LoS path has the strongest power, satisfying $|\beta_1| \geq |\beta_l|$ for $l > 1$. The expression of $\beta_1$ can be further derived from the free-space propagation path loss model [63]. The angles of the LoS path, i.e. $\psi_1$ and $\phi_1$, are determined by the locations of the BS and the RIS and are known in advance when the BS and the RIS are fixed [58,64,65]. Recall the example in Fig. 3. If $\sin\psi_1$ and $\sin\phi_1$ are fixed then $\sin\varphi_{l_1,k}$ can be determined from $\sin\varphi_{l_1,k} + \sin\phi_1$ for $l_1 = 1, \ldots, L_{1,k}$. Moreover, given $\beta_1$, it is easy to derive $\alpha_{l_1,k}$ from $\alpha_{l_1,k}\beta_1$. That is, all the spatial parameters in $\mathbf{h}_{1,k}$ have been successfully obtained, and $\mathbf{h}_{1,k}$ can be reconstructed by applying $\alpha_{l_1,k}$, $\sin\varphi_{l_1,k}$, $l_1 = 1, \ldots, L_{1,k}$, in Equation (15). Then, $\mathbf{H}_2$ can be extracted from $\mathbf{H}_2\,\mathrm{diag}\{\mathbf{h}_{1,k}\}$ given $\mathbf{h}_{1,k}$. Following the parameter estimation approach, in this condition, the total pilot overhead of estimating the double link channels approximates $K(1 + \bar L_1) + L_2$, where $\bar L_1$ is the average sparsity of $\mathbf{h}_{1,k}$, $k = 1, \ldots, K$, satisfying $\bar L_1 \ll N$. Hence, the computational complexity is about $O(KMN^2\bar L_1 + M^2N^2L_2)$.
Another way to separate the individual channels in the RIS link is to perform the parallel factor (PARAFAC) decomposition [66,67], which belongs to multiway analysis. By stacking $\mathbf{h}_{1,k}$, $k = 1, \ldots, K$, together into a matrix, we have $\mathbf{H}_1 = [\mathbf{h}_{1,1}, \ldots, \mathbf{h}_{1,K}] \in \mathbb{C}^{N\times K}$. Given $\mathbf{v}_t$, the effective channel is defined as
$$\mathbf{H}_{\mathrm{eff},t} = \mathbf{H}_2\,\mathrm{diag}\{\mathbf{v}_t\}\,\mathbf{H}_1 \in \mathbb{C}^{M\times K}.$$
By collecting $\mathbf{H}_{\mathrm{eff},t}$ for $t = 1, \ldots, T$, a three-way matrix $\mathbf{H}_{\mathrm{eff}} \in \mathbb{C}^{M\times K\times T}$ can be formulated, which has the following unfolded forms:
$$\mathbf{H}_{\mathrm{eff}}^1 = \left[\mathbf{H}_{\mathrm{eff},1}, \ldots, \mathbf{H}_{\mathrm{eff},T}\right]^T = \left(\mathbf{V}^T \bullet \mathbf{H}_1^T\right)\mathbf{H}_2^T \in \mathbb{C}^{KT\times M},$$
$$\mathbf{H}_{\mathrm{eff}}^2 = \left[\mathbf{H}_{\mathrm{eff},1}^T, \ldots, \mathbf{H}_{\mathrm{eff},T}^T\right]^T = \left(\mathbf{V}^T \bullet \mathbf{H}_2\right)\mathbf{H}_1 \in \mathbb{C}^{MT\times K},$$
where $\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_T] \in \mathbb{C}^{N\times T}$. Given a random initial value, $\mathbf{H}_2$ and $\mathbf{H}_1$ can be iteratively and alternately estimated from $\mathbf{H}_{\mathrm{eff}}^1$ and $\mathbf{H}_{\mathrm{eff}}^2$, respectively, through linear estimation algorithms, such as LS. However, the PARAFAC decomposition-based individual channel separation method requires $\min(M, K) \geq N$, which is hard to achieve in practice. According to [66], the training overhead is in the range of $[K + 2, K + N]$, and the computational complexity is $O(KMN^2)$.

Selection-type hybrid RIS
Hybrid RISs can make individual channel estimation easier to implement. For the selection-type hybrid RIS, we denote the channel between user $k$ and the $N_a$ active sensors and the channel between user $k$ and the $N - N_a$ passive units as $\mathbf{h}_{a,1,k} \in \mathbb{C}^{N_a\times 1}$ and $\mathbf{h}_{p,1,k} \in \mathbb{C}^{(N - N_a)\times 1}$, respectively. When the active sensors choose the antenna mode, the signal received by the active sensors from user $k$ can be expressed as
$$\mathbf{y}_{a,1,k} = \sqrt{P_k}\,\mathbf{h}_{a,1,k} + \mathbf{z}_{a,1,k}.$$
Given $\mathbf{y}_{a,1,k}$, we can directly estimate $\mathbf{h}_{a,1,k}$. However, if the active sensors are fixed and satisfy $N_a \ll N$ then $\mathbf{h}_{p,1,k}$ can be obtained only through extrapolation from $\mathbf{h}_{a,1,k}$, which requires that $\mathbf{h}_{p,1,k}$ is correlated with $\mathbf{h}_{a,1,k}$. When $\mathbf{h}_{1,k}$ satisfies the model in Equation (15), $\mathbf{h}_{a,1,k}$ and $\mathbf{h}_{p,1,k}$ share the same set of spatial parameters $\alpha_{l,k}$, $\sin\varphi_{l,k}$, $l = 1, \ldots, L_{1,k}$, and, thus, are highly correlated [23,25,68]. By applying these parameters in Equation (15), the full channel $\mathbf{h}_{1,k}$ is reconstructed. Alternatively, by exploring the sparse structure or directly employing neural networks, $\mathbf{h}_{1,k}$ can be obtained from $\mathbf{h}_{a,1,k}$ as well.
The training overhead of acquiring h 1, k is K.
Similarly, $\mathbf{H}_2$ can be reconstructed by the BS sending downlink pilots to the RIS. If the active sensors are further equipped with signal transmission RF chains then $\mathbf{H}_2$ can also be reconstructed by the RIS sending uplink pilots to the BS. Notably, pilot sequences from different BS antennas or RIS active sensors should be orthogonal to each other, causing a training overhead of $M$ or $N_a$. The second way to obtain $\mathbf{H}_2$ is to conduct estimation at the BS. Having estimated the cascaded channel $\mathbf{G}_k$, $\mathbf{H}_2$ can be acquired given $\mathbf{h}_{1,k}$. However, the estimation of $\mathbf{G}_k$ needs a training overhead of $\sum_{k=1}^{K}\bar L_k$, because the RIS should alter the phase shift profile. Take the estimation of both-side individual channels at the RIS as an instance. The total pilot overhead of acquiring $\mathbf{H}_2$, $\mathbf{h}_{1,k}$ and $\mathbf{d}_k$ for $k = 1, \ldots, K$ is $2K + M$, and the computational complexity is

Beamforming-type hybrid RIS
For the beamforming-type hybrid RIS architecture, each RIS unit has the opportunity to be connected with a RF chain. Therefore, it is possible to make full channel estimation instead of channel extrapolation. However, since $N_{\mathrm{RF}} \ll N$, an analog beam sweeping stage is required, similar to hybrid beamforming systems [27,28]. We still take the estimation of $\mathbf{h}_{1,k}$ as an example. At time instance $t$, when the uplink pilot from user $k$ arrives at the RIS, we have
$$\mathbf{y}_{b,1,k,t} = \sqrt{P_k}\,\mathbf{W}_t\mathbf{h}_{1,k} + \mathbf{z}_{b,1,k,t},$$
where $\mathbf{y}_{b,1,k,t} \in \mathbb{C}^{N_{\mathrm{RF}}\times 1}$ is the received pilot across the $N_{\mathrm{RF}}$ RF chains, $\mathbf{z}_{b,1,k,t}$ is the noise, $\mathbf{W}_t = [\mathbf{w}_{1,t}, \ldots, \mathbf{w}_{N_{\mathrm{RF}},t}]^T \in \mathbb{C}^{N_{\mathrm{RF}}\times N}$ contains $N_{\mathrm{RF}}$ RIS reception beams and $\mathbf{w}_{m,t} \in \mathbb{C}^{N\times 1}$ is the RIS reception beam in the $m$th RF chain in time instance $t$. We have $[\mathbf{w}_{m,t}]_n = \sqrt{\rho_{n,t}}\,e^{j\vartheta_{m,n,t}}$ if the $n$th RIS unit is connected with the $m$th RF chain, where $\rho_{n,t} \in [0, 1]$ is the power ratio of the received signal over the incident signal and $\vartheta_{m,n,t} \in [0, 2\pi]$ is the reception phase shift, and $[\mathbf{w}_{m,t}]_n = 0$ otherwise. After collecting pilots in $\lceil N/N_{\mathrm{RF}} \rceil$ time instances, the $N$-dimensional channel $\mathbf{h}_{1,k}$ can be linearly estimated.
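The beam sweeping and linear estimation at a beamforming-type hybrid RIS can be sketched as follows. The sketch assumes a fully connected architecture (every RF chain sees all units), fully received pilots during training (power ratio equal to one) and DFT reception beams; these choices and the function names are illustrative assumptions.

```python
import numpy as np

def sweep_beams(n_units, n_rf):
    """Split DFT reception beams into per-slot matrices W_t of shape (n_rf, n_units)."""
    n_slots = -(-n_units // n_rf)                       # ceil(N / N_RF)
    k = np.arange(n_units)
    dft = np.exp(-2j * np.pi * np.outer(k, k) / n_units)
    rows = np.arange(n_slots * n_rf) % n_units          # reuse beams if N_RF does not divide N
    return dft[rows].reshape(n_slots, n_rf, n_units)

def estimate_user_ris_channel(Y, W, pilot_power=1.0):
    """LS estimate of the user-RIS channel from Y[t] = sqrt(P_k) W[t] h + noise."""
    A = np.sqrt(pilot_power) * W.reshape(-1, W.shape[-1])   # stack W_1, ..., W_T
    h_hat, *_ = np.linalg.lstsq(A, Y.reshape(-1), rcond=None)
    return h_hat
```

With `n_rf` RF chains, `sweep_beams` produces ⌈N/N_RF⌉ slots, so the number of training slots per user shrinks as more RF chains are installed.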
If $h_{1,k}$ is sparse and satisfies the multipath model in Equation (15), then the training overhead can be further reduced through compressed sensing or parameter estimation methods. Channel $H_2$ can be obtained at the RIS in a similar way by using downlink pilots sent from the BS, where the pilot sequences from different BS antennas should be orthogonal to each other. Alternatively, we can utilize the uplink pilot components that are reflected by the RIS and received by the BS in time instance $t$, where the $n$th entry of the reflection phase shift vector $v_t \in \mathbb{C}^{N \times 1}$ satisfies $[v_t]_n = \sqrt{1 - \rho_{n,t}}\, e^{j\theta_{n,t}}$. Then, $H_2$ can be estimated following the same approaches as those for the selection-type hybrid RISs. Similarly, take the linear estimation of both-side individual channels at the RIS as an example, and suppose that $W_t$ is in the DFT format. The total pilot overhead of acquiring $H_2$, $h_{1,k}$ and $d_k$ for $k = 1, \ldots, K$ is

Double time scales
In a typical RIS-assisted mobile communication system, the locations of the BS and the RIS are fixed. The BS is generally mounted at the top of a building, and the RIS is then deployed in the LoS region of the BS to guarantee a strong link between them. The environment between the BS and the RIS is relatively stable, and thus the coherence time of $H_2$, denoted as $T_2$, is long. On the other hand, the mobile user is likely to keep moving, so the coherence time of $h_{1,k}$, denoted as $T_{1,k}$, is much shorter than $T_2$. Therefore, channel estimation can be performed in double time scales [69][70][71]. Within each long period $T_2$, the estimation of $H_2$ only needs to be performed once. Within this period, the estimation of $h_{1,k}$ is performed in every shorter period of $T_{1,k}$. Since the training overhead required by the estimation of $h_{1,k}$ is limited when $H_2$ is given, the overall overhead of double time scale training can be very low.
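The saving can be made concrete with a small arithmetic sketch. All numbers below are hypothetical and chosen only for illustration: `slow_pilots` stands for the pilots spent on one estimation of $H_2$, `fast_pilots` for the pilots spent on one refresh of the user channels, and both coherence times are measured in symbols.

```python
def pilot_overhead(t2, t1, slow_pilots, fast_pilots):
    """Pilots spent within one coherence interval T2 of the BS-RIS channel."""
    refreshes = t2 // t1                      # user-channel refreshes within T2
    # Double time scales: H2 once, user channels at every refresh.
    two_scale = slow_pilots + refreshes * fast_pilots
    # Baseline: re-estimate both links at every refresh.
    single_scale = refreshes * (slow_pilots + fast_pilots)
    return two_scale, single_scale


# Hypothetical setting: T2 = 10000 symbols, T1 = 100 symbols,
# 64 pilots for H2 and 4 pilots for the user channels.
two_scale, single_scale = pilot_overhead(10000, 100, 64, 4)
print("double time scales:", two_scale, "pilots per T2")
print("single time scale :", single_scale, "pilots per T2")
```

In this setting the slow-link pilots are paid once instead of at each of the 100 refreshes, which is where the reduction comes from.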

CHANNEL ESTIMATION ALGORITHMS
A comprehensive comparison of the overhead and complexity of the above-mentioned CSI acquisition methods is provided in Table 1. Moreover, Table 2 summarizes the applicability of these methods to different categories of RISs.
In previous sections, we briefly mentioned the algorithms that are used to estimate the channels or their parameters, including linear estimation, compressed sensing, parameter estimation and deep learning. This section will provide more details about them in the context of RIS-assisted mobile communication systems.

Linear estimation
Linear estimation is the most classic and widely used estimation method in practical systems [72]. Its computational complexity is fixed and its estimation accuracy is stable. Linear estimation methods do not rely on channel features or structures like Equation (15) or (17). Referring back to Equation (10), both the LS estimate and the LMMSE estimate of $\bar{G}_k$ are linear functions of the received pilots. The LMMSE estimate additionally uses $R_{\bar{G}_k}$, the covariance matrix of the channel, and $\sigma_n^2$, the variance of the noise. Notably, $\bar{V}$ should have full column rank, which requires a sufficient amount of pilot overhead [28]. Because LMMSE exploits these statistics and achieves higher estimation accuracy than LS, it is more widely adopted for channel estimation in RIS-assisted systems [44,73,74].
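The two estimators can be sketched in plain Python for a generic linear pilot model $y = Vg + n$, which is used here as a stand-in for Equation (10) and is an assumption rather than the article's exact expression. In this sketch $g$ plays the role of one column of the channel to be estimated, $V$ collects the training phase-shift profiles, the prior covariance `R` is set to the identity, and all helper names are hypothetical.

```python
import cmath
import math
import random


def herm(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def solve(a, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ls_estimate(v, y):
    """LS: g = (V^H V)^{-1} V^H y, which needs V to have full column rank."""
    vh = herm(v)
    return solve(matmul(vh, v), matvec(vh, y))


def lmmse_estimate(v, y, r, sigma2):
    """LMMSE: g = R V^H (V R V^H + sigma2 * I)^{-1} y for zero-mean g and noise."""
    vh = herm(v)
    cov_y = matmul(matmul(v, r), vh)
    for t in range(len(y)):
        cov_y[t][t] += sigma2
    return matvec(matmul(r, vh), solve(cov_y, y))


# Illustrative run: T = 6 pilots, P = 4 unknowns, DFT-type training profiles.
random.seed(0)
T, P, sigma2 = 6, 4, 0.5
V = [[cmath.exp(-2j * math.pi * t * p / T) for p in range(P)] for t in range(T)]
g_true = [1 + 1j, -0.5j, 0.25 + 0j, 2 - 1j]
std = math.sqrt(sigma2 / 2)
noise = [complex(random.gauss(0, std), random.gauss(0, std)) for _ in range(T)]
y = [s + w for s, w in zip(matvec(V, g_true), noise)]
R = [[1.0 if i == j else 0.0 for j in range(P)] for i in range(P)]
print("LS   :", ls_estimate(V, y))
print("LMMSE:", lmmse_estimate(V, y, R, sigma2))
```

With an identity prior and orthogonal training profiles, the LMMSE estimate is simply the LS estimate shrunk by the factor $T/(T + \sigma_n^2)$. The gain over LS therefore grows as the noise variance increases.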

Compressed sensing
When the channels are sparse, compressed sensing can be applied for low-overhead channel estimation. Recall the signal model for compressed sensing-based cascaded channel estimation in Equation (19), where $\tilde{G}_k$ is the sparse cascaded channel in the angular domain to be estimated. We say that $\tilde{G}_k$ is in the angular domain because it is built on the dictionaries $U_M$ and $U_N$, which uniformly sample the angles and thereby draw grids on them. Each entry of $\tilde{G}_k$ represents the channel component on the corresponding sampled angular pair on the two-dimensional (2D) grid. Given Equation (19), the general problem to be solved by compressed sensing is to find the sparsest $\tilde{G}_k$ whose reconstruction residual does not exceed a predefined threshold $\epsilon$. In other words, the problem requires using the smallest number of entries in $\tilde{G}_k$ to capture the majority of information in the channel. Sparse channel representations and compressed sensing problems for hybrid RISs in Equations (24) and (25) can be expressed in a similar way. Since the sparsity, i.e. the number of non-zero entries in $\tilde{G}_k$, is unknown in advance, and in order to avoid the huge computational complexity caused by exhaustive search, a typical solution is a greedy algorithm such as orthogonal matching pursuit (OMP) [23,75,76]. By inspecting the problem in Equation (30), other algorithms such as the alternating direction method of multipliers [25,27,77] can also recover the sparse channel.
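The greedy strategy can be illustrated with a minimal OMP sketch in plain Python. It works on a vectorized problem $y = Ax$ with a generic sensing matrix $A$, which is an assumption made for brevity; the 2D angular-domain problem of the article would first have to be vectorized. The stopping rule uses the residual threshold `eps` because the sparsity is unknown in advance. The oversampled DFT dictionary and all names are hypothetical.

```python
import cmath
import math


def solve(a, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def inner(u, v):
    """Inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def omp(a_mat, y, eps, max_atoms):
    """Greedy sparse recovery; returns {atom index: coefficient}."""
    m, n = len(a_mat), len(a_mat[0])
    atoms = [[a_mat[i][j] for i in range(m)] for j in range(n)]   # columns of A
    support, coef = [], []
    residual = list(y)
    while math.sqrt(sum(abs(r) ** 2 for r in residual)) > eps and len(support) < max_atoms:
        # Matching step: pick the unused atom most correlated with the residual.
        candidates = [j for j in range(n) if j not in support]
        support.append(max(candidates, key=lambda j: abs(inner(atoms[j], residual))))
        # Orthogonalization step: least squares restricted to the support.
        gram = [[inner(atoms[p], atoms[q]) for q in support] for p in support]
        rhs = [inner(atoms[p], y) for p in support]
        coef = solve(gram, rhs)
        residual = [y[i] - sum(c * atoms[j][i] for c, j in zip(coef, support))
                    for i in range(m)]
    return dict(zip(support, coef))


# Illustrative problem: 8 measurements, 16 grid points, 2 active paths.
m, n = 8, 16
A = [[cmath.exp(2j * math.pi * i * j / n) / math.sqrt(m) for j in range(n)]
     for i in range(m)]
x_true = {2: 1 + 0j, 9: 0.6j}
y = [sum(A[i][j] * c for j, c in x_true.items()) for i in range(m)]
print(omp(A, y, eps=1e-6, max_atoms=4))
```

Each iteration adds one grid point and re-fits all selected coefficients, so the cost grows with the number of recovered paths rather than with the number of all possible supports.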
In a multiuser scenario, the scaling law as well as the common sparsity between $G_k$ and $G_j$ can be exploited in compressed sensing-based channel estimation [51,75,76]. Rewriting $G_j$ accordingly shows that users share a common sparse base $\tilde{G}_1$, which can be recovered through the multiple measurement vector-based compressed sensing estimator for accuracy enhancement.

Super-resolution parameter estimation
Unlike compressed sensing, which describes the angles using a grid defined by the dictionary, super-resolution parameter estimation aims to obtain more precise angular estimates that are no longer limited by the grids. We still take the cascaded channel as an example. Let $G_k$ be a function of the parameter vector $[\sin\psi_1, \ldots, \sin\psi_{L_2}, \sin\phi_{1,k} + \sin\theta_1, \ldots, \sin\phi_{L_1,k} + \sin\theta_{L_2}]$, which contains the angles to be estimated. We now rewrite Equation (10) by ignoring the direct link, and seek to estimate this parameter vector from $Y_k^H$. For example, maximum likelihood (ML) estimation finds the angles that maximize the likelihood of the received pilots [56,57]. Apart from ML, classic and widely used super-resolution parameter estimation algorithms include multiple signal classification (MUSIC), estimation of signal parameters via rotational invariance techniques (ESPRIT), etc. [78][79][80]. Even though these algorithms have different objective functions, a super-resolution grid is generally required. This grid can be much denser than the grid in compressed sensing methods. By searching over the grid, the grid point that best satisfies the objective function is chosen as the estimate.
If the grid density is not high, then a further step is conducted to refine the on-grid estimate towards the real angle value. In this case, the parameter estimation accuracy can be enhanced on the basis of the on-grid estimates obtained by compressed sensing. For example, a gradient descent step can be added after the grid-based matching in each iteration of OMP [59,61]. Alternatively, a Newton refinement step can also refine the on-grid angle estimates towards their real values [58]. Newton refinement integrated with OMP forms the Newtonized orthogonal matching pursuit algorithm [81], which has been widely applied in sparse channel reconstruction [82][83][84]. Another compressed sensing-based approach that can overcome the on-grid effect is atomic norm minimization [39,60,68,85], where the grid has infinite cardinality.
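The interplay between the grid search and the refinement step can be sketched for the simplest case of a single path. The sketch assumes a half-wavelength uniform linear array with noise-free observations $y_n = \beta e^{j\pi n u}$, where $u$ is the sine of the angle, and maximizes the objective $f(u) = |\sum_n y_n e^{-j\pi n u}|^2$. This is a simplified single-path stand-in and not the cascaded-channel estimator of the cited works; the function names are hypothetical.

```python
import cmath
import math


def objective_and_derivatives(y, u):
    """Return f(u) = |c(u)|^2 with c(u) = sum_n y_n exp(-j*pi*n*u), plus f' and f''."""
    c = c1 = c2 = 0j
    for n, yn in enumerate(y):
        term = yn * cmath.exp(-1j * math.pi * n * u)
        k = -1j * math.pi * n                 # factor produced by d/du
        c += term
        c1 += k * term
        c2 += k * k * term
    f = abs(c) ** 2
    f1 = 2 * (c.conjugate() * c1).real
    f2 = 2 * (abs(c1) ** 2 + (c.conjugate() * c2).real)
    return f, f1, f2


def estimate_sine(y, grid_size=64, newton_steps=5):
    """Coarse on-grid search followed by Newton refinement of the off-grid value."""
    grid = [-1 + 2 * g / grid_size for g in range(grid_size)]
    u_grid = max(grid, key=lambda u: objective_and_derivatives(y, u)[0])
    u = u_grid
    for _ in range(newton_steps):
        _, f1, f2 = objective_and_derivatives(y, u)
        if f2 >= 0:                           # not in the concave region of the peak
            break
        u -= f1 / f2                          # Newton step towards the maximum
    return u_grid, u


# Illustrative path: 16 array units, true sine 0.3 lies between two grid points.
u_true, beta = 0.3, 0.8 * cmath.exp(0.4j)
y = [beta * cmath.exp(1j * math.pi * n * u_true) for n in range(16)]
u_on_grid, u_refined = estimate_sine(y)
print("on-grid estimate:", u_on_grid)
print("refined estimate:", u_refined)
```

The Newton step is only applied while the second derivative is negative, i.e. inside the main lobe around the on-grid estimate. This is why the grid must be dense enough for the on-grid point to land there.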
Acquisition of the spatial parameters not only supports low-overhead sparse channel reconstruction, but also enables further applications such as user or target localization [38,39,56,57].

Deep learning
Deep learning has experienced rapid development in recent years. When applied to channel estimation, deep learning does not heavily rely on a precise channel model like Equation (15), (17) or (18). In RIS-assisted systems, deep learning can be used to estimate the cascaded channel or the individual channels for both the IRS and hybrid RIS.
In an IRS-assisted system, deep learning is usually adopted to estimate the cascaded channel $G_k$. By directly taking the received pilots at the BS or user side as the input of a convolutional neural network, the estimate of $G_k$ can be obtained as the network output [86,87]. Alternatively, deep learning can be introduced to further enhance the accuracy of LS or LMMSE cascaded channel estimation results. By regarding this enhancement as a denoising problem, a deep residual network can be applied to solve it [88,89]. The denoising concept was also applied in [90]. Different from Liu et al. [88,89], Jin et al. [90] regarded the channel matrix as an image. Three practical residual neural networks were adopted to improve the cascaded channel estimation accuracy, while the network input was the received pilot matrix.
Deep learning-based channel estimation is also preferable in hybrid RIS-assisted systems. For the selection-type hybrid RIS especially, since pilots can be received by only a small number of active units, only the sampled channel on these units can be directly estimated. Then, the full channel can be extrapolated through a residual network. The input of the network is the sampled channel, and the output is the full channel [24]. The denoising approach can be further applied here. Considering that the accuracy of the full channel reconstruction from a few observations on the active sensors is limited, a denoising neural network can be employed to refine the channel reconstruction result [55].

OPEN PROBLEMS
From the review above, we see that extensive research has been carried out on CSI acquisition in traditional RIS-assisted mobile communication scenarios. In future mobile systems, the emergence of new architectures and new application scenarios of RISs entails new challenges. In this section, we discuss two open problems that have to be addressed in the context of CSI acquisition in future RIS-assisted systems.

Near-field effect
A RIS has the attractive advantages of low cost and low power consumption. Thus, we can increase the size of a RIS to further improve wireless communication services. However, as the RIS grows large, the user or the BS falls into the near field of the large RIS, and the near-field effect can no longer be ignored [64,91,92,93,94]. Under this condition, in the multipath channel model, each path is described by the 2D or 3D position of the user, BS or scatterer, instead of their direction or angle as in the far-field channel. The near-field steering vector $a_N(r, \theta) \in \mathbb{C}^{N \times 1}$ differs from the far-field one. Its $n$th entry can be expressed as $[a_N(r, \theta)]_n = \frac{\lambda}{4\pi D_n(r, \theta)}\, e^{j 2\pi D_n(r, \theta)/\lambda}$, where $(r, \theta)$ is the polar position of the user, BS or scatterer, and $D_n(r, \theta)$ is the distance between the user, BS or scatterer and the $n$th RIS unit. Take $h_{1,k}$ as an example. The near-field channel $h_{1,k}$ can be modeled as

$$h_{1,k} = \sum_{l=1}^{L_{1,k}} \beta_l\, a_N(r_l, \theta_l). \qquad (35)$$

Note that the sparsity of a near-field channel is not always apparent under the DFT transformation. When the user is quite close to the RIS, a large angular spread can be observed even if only a single LoS path exists in $h_{1,k}$. Compressed sensing and super-resolution parameter estimation should be adjusted to cater for the near-field effect. In particular, the channel parameters to be estimated become $(r_l, \theta_l)$, $l = 1, \ldots, L_{1,k}$. Moreover, by exploring these parameters, user localization can be achieved if the LoS path exists.
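The loss of DFT-domain sparsity can be checked numerically with a minimal sketch. The geometry is an assumption made for illustration: a uniform linear array centred at the origin with half-wavelength spacing, the angle measured from the array broadside, a wavelength of 0.01 m, and a single LoS path. The sketch builds the near-field steering vector from the entry-wise expression above and counts how many DFT bins are needed to hold 90% of its energy; the function names are hypothetical.

```python
import cmath
import math


def near_field_steering(n_units, r, theta, wavelength):
    """Entries lambda/(4*pi*D_n) * exp(j*2*pi*D_n/lambda) for a centred linear array."""
    spacing = wavelength / 2
    vec = []
    for n in range(n_units):
        p = (n - (n_units - 1) / 2) * spacing      # position of unit n on the array axis
        # Distance from the source at polar position (r, theta) to unit n.
        d = math.sqrt(r * r + p * p - 2 * r * p * math.sin(theta))
        vec.append(wavelength / (4 * math.pi * d)
                   * cmath.exp(2j * math.pi * d / wavelength))
    return vec


def bins_for_energy(vec, fraction=0.9):
    """Smallest number of DFT bins that together hold the given energy fraction."""
    n = len(vec)
    spectrum = [abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                        for i, v in enumerate(vec))) ** 2 for k in range(n)]
    total, acc, count = sum(spectrum), 0.0, 0
    for power in sorted(spectrum, reverse=True):
        acc += power
        count += 1
        if acc >= fraction * total:
            break
    return count


# Single LoS path at broadside, seen by 64 units at two illustrative distances.
far = bins_for_energy(near_field_steering(64, 1000.0, 0.0, 0.01))
near = bins_for_energy(near_field_steering(64, 1.0, 0.0, 0.01))
print("DFT bins holding 90% of the energy, r = 1000 m:", far)
print("DFT bins holding 90% of the energy, r = 1 m   :", near)
```

At the larger distance the wavefront is almost planar and the energy falls into a single bin. At the shorter distance the spherical wavefront acts like a spatial chirp and spreads the single path over several bins, which is the angular spread described above.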

High-mobility scenarios
High-mobility wireless communications, including high-speed train and unmanned aerial vehicle (UAV) scenarios, have gained increasing attention over the past years. For the purpose of enhancing the coverage and providing seamless wireless services with low cost, the introduction of RISs has been considered in these high-mobility systems [95][96][97][98]. However, since trains and UAVs move fast, the environment and the channel vary rapidly. Wireless communication in high-mobility scenarios suffers from a severe Doppler effect and the channel aging problem [98]. In IRS-assisted systems especially, channel estimation in the RIS link becomes even more challenging. Hence, how to efficiently obtain CSI with limited pilot overhead is of great importance. Fortunately, the property of double time scales can also be exploited in high-mobility scenarios. Take the high-speed train scenario as an example. The RIS can be deployed on one side of the railway, coated on the window of the train, or on the wall or roof of the train [95,99]. If the location of the RIS is fixed, then $H_2$ varies slowly and has a much longer channel coherence time than $h_{1,k}$ and $d_k$. On the other hand, if the RIS moves with a high-speed train, then the coherence time of $h_{1,k}$ becomes longer than that of $H_2$ and $d_k$, which is different from the double time scales in the former case. Equally important, temporal correlation exists among channels in consecutive time slots [98]. CSI obtained in previous time slots can be utilized to track the fast-varying channel over time with reduced pilot overhead. Furthermore, the Doppler effect can be mitigated by leveraging the time-frequency space framework [100]. However, the fast handover in high-mobility scenarios is an issue that cannot be ignored, introducing challenges to CSI acquisition in RIS-assisted systems.

CONCLUSIONS
This article provided a comprehensive review of the state of the art on CSI acquisition in RIS-assisted mobile communication systems. Emerging categories of RISs, including selection-type and beamforming-type hybrid RISs as well as active RISs, were considered. By classifying CSI into implicit CSI and explicit CSI, we surveyed the acquisition of these two types of CSI. Acquisition of explicit CSI was studied via a step-by-step approach, from double link channel separation, low-overhead cascaded channel estimation and individual channel estimation to the estimation algorithms. The review ended with two open problems in future RIS-assisted systems. We anticipate that, with the emergence of more advanced RISs and algorithms, CSI acquisition in RIS-assisted mobile systems will experience new breakthroughs.