MmWave V2V Localization in MU-MIMO Hybrid Beamforming

Recent trends for vehicular localization in millimetre-wave (mmWave) channels include employing a combination of parameters such as angle of arrival (AOA), angle of departure (AOD), and time of arrival (TOA) of the transmitted/received signals. These parameters are challenging to estimate, which, along with the scattering and random nature of mmWave channels and vehicle mobility, leads to errors in localization. To circumvent these challenges, this paper proposes mmWave vehicular localization employing difference of arrival for time and frequency, with multiuser (MU) multiple-input multiple-output (MIMO) hybrid beamforming, rather than relying on AOD/AOA/TOA estimates. The vehicular localization can exploit the number of vehicles present, as an increase in the number of vehicles reduces the Cramér-Rao bound (CRB) of error estimation. At 10 dB signal-to-noise ratio (SNR), both spatial multiplexing and beamforming result in comparable localization errors. At lower SNR values, spatial multiplexing leads to larger errors than beamforming due to the formation of spurious peaks in the cross ambiguity function. The accuracy of the estimated parameters is improved by employing an extended Kalman filter, leading to a root mean square (RMS) localization error of approximately 6.3 m.


This paper aims to apply the joint TDOA/FDOA approach with multiuser (MU) multiple-input multiple-output (MIMO) hybrid beamforming (HB) for vehicle localization at mmWave frequencies. The Fisher information matrix (FIM) is used to assess the data quality among multiple vehicles in order to manage the mmWave channel and optimize the emitter vehicle location subject to communication constraints. Channel sounding is assumed for HB to determine the channel state information (CSI), and joint spatial division multiplexing is employed to determine the precoding and coding weights for the selected system configuration.
The main contributions of this paper are as follows:
• We derive a closed-form expression of the Cramér-Rao bound (CRB) of the parameter estimation and analyze the accuracy of the cross ambiguity function (CAF) for TDOA/FDOA localization in beamforming (BF) and spatial multiplexing (SM) modes. The CRB estimation indicates that increasing the number of vehicles reduces the estimation error. Therefore, localization with TDOA/FDOA estimation can exploit the number of vehicles present in V2V channels. Results show that higher accuracy is achieved in BF than in SM.
• We propose a TDOA/FDOA estimation approach with MU-MIMO HB considering dual mobility of the Tx and Rx and achieving a root mean square (RMS) localization error of 6.30 m.
The rest of the paper is organized as follows. Section II undertakes a literature review to detail the existing mmWave and TDOA/FDOA localization techniques and their limitations. Section III details the localization with MU-MIMO HB and investigates the performances for BF and SM. In Section IV, the CRB and CAF are estimated, following which emitter localization is undertaken; accuracy is improved by employing extended Kalman filtering. Concluding remarks are drawn in Section V.
A. Demosthenous is with the Department of Electronic and Electrical Engineering, University College London, London WC1E 7JE, U.K. (e-mail: a.demosthenous@ucl.ac.uk).

A. Current mmWave Localization Techniques
The early work to obtain position and orientation in the context of mmWave technologies involves estimation and tracking of the AOA through beam-switching, user localization through hypothesis testing, and measurement of the received signal strength [4]-[7]. Techniques employed for massive MIMO mmWave localization include estimating parameters such as joint delay, AOA, and AOD, including hybrid techniques based on linearization and nonparametric kernel-based probabilistic models [8]-[12].
The large bandwidths in mmWave lead to much better temporal resolution, thus potentially improving the position estimates. More antenna elements in antenna arrays lead to smaller beamwidth with higher accuracy and resolution for the angular estimation. To leverage these characteristics, recent trends have focused on estimating position and orientation with a combination of AOA, AOD, and time of arrival (TOA). The CRBs of position and orientation for uniform linear arrays are derived in [13] by employing signals from a single transmitter (Tx), in line-of-sight (LOS), non-line-of-sight (NLOS), and obstructed-line-of-sight conditions for downlink localization. The closed form of the FIM is derived by employing geometric relationships for the channel, position, and orientation. For non-uniform arrays, the CRBs of position and orientation are given in [14]. For an indoor channel employing BF and SM, the CRBs for TOA and AOA are derived in [15] using the CAF. The CRBs are compared for BF and SM for the single-user (SU)-MIMO case. In [13]-[15], it is further shown that the position and orientation estimates can benefit from NLOS components. However, the effects of Tx location, mobile terminal, and points of incidence of NLOS components are not analyzed for the presented results. The effects of NLOS components on position and orientation are given in [16]. It is shown that for sufficiently high temporal and spatial resolution, NLOS components provide position and orientation information which can increase the estimation accuracy. However, the accuracy depends on the number of NLOS paths, which is not guaranteed in outdoor mmWave channels.
The aforementioned techniques either apply to static channels, do not quantify the Tx-Rx mobility, or depend on robust beam switching/control strategies for accurate AOD/AOA estimation. Most of the techniques apply to SU-MIMO, which does not leverage the multiple vehicles present in a given area of interest. On the contrary, beam tracking becomes onerous as the number of vehicles grows, because more vehicles require robust, fast beam-switching techniques. As a result, accurate AOD/AOA estimation becomes challenging and prone to errors when the Tx/Rx is mobile.
For the dynamic scenarios present in V2V channels, the Doppler shift measurements can provide additional Fisher information for localization [17]. One such technique is localization of the Tx with TDOA and FDOA estimation, which is employed in a wide range of applications for military, security and civilian use [18].

B. Comparison of Existing TDOA/FDOA Techniques
This section examines some of the main localization techniques based on TDOA/FDOA estimation. Localization based on TDOA estimation is suitable for high-bandwidth applications such as radars and mmWave V2V communication, while FDOA estimation can exploit the Doppler shifts in V2V communication [19], [20]. Localization with joint TDOA/FDOA estimation is a two-stage process. The first stage employs the CAF to simultaneously estimate the TDOA/FDOA from an emitter using maximum-likelihood methods [21]-[24]. Multiple TDOA/FDOA measurements are employed in the second stage to estimate the emitter location [25]. The approach in [20], targeted at UAV applications, aims to bring the localization accuracy close to the CRB limits by fusing measurements when only one TDOA measurement is likely to be present, a likely scenario in a highly dynamic 3D UAV channel environment. Further, the Gaussian mixture presentation of measurements-integrated track splitting [26] filter is extended to adapt to the UAV channel for adequate tracking of the mobile emitter. Other techniques include reducing the computational requirements with algebraic [27], [28] and numerical solutions [29]. Techniques in the second stage include localization using satellites [30] and fixed sensor networks [31]. The two-stage localization accuracy depends on the signal-to-noise ratio (SNR), resulting in higher localization errors at low SNRs. Single-stage techniques, known as direct position determination, may be employed to reduce such errors, as they enable emitter localization directly from the CAF [32], [33].
The mentioned TDOA/FDOA based techniques either improve localization accuracy or reduce the computational requirements. However, the techniques are applied in simplistic or other scenarios not applicable to mmWave V2V channels.

A. MU-MIMO HB Localization
A 2-D geometrical scenario is given in Fig. 1. The aim is to determine the location of the emitter vehicle E, using signals received by multiple user vehicles, addressed as users. Henceforth each user is denoted $U_i$, $i \in [1, N]$, where $N$ is the total number of users. Since the CAF operates on only two received signals at a time, the users need to be paired. It is assumed that the two users in a pair are time and frequency synchronized with each other, although not synchronized in any way with E.
Consider the $m$th pair of users $U_i$ and $U_j$, where $i, j \in [1, N]$, $i \neq j$ and $m \in [1, M]$. The TDOA and FDOA between the signals received by these users can be jointly estimated by the CAF. If the down-converted complex baseband signals received by $U_i$ and $U_j$ are $r_i(t)$ and $r_j(t)$ respectively, then the CAF for the $m$th pair is [21]:

$$\mathrm{CAF}_m(\tau, \nu) = \int_0^T r_i(t)\, r_j^{*}(t + \tau)\, e^{-j 2 \pi \nu t}\, dt \qquad (1)$$

where $T$ is the integration time and '*' denotes the complex conjugate. The parameters $\tau$ (TDOA) and $\nu$ (FDOA) are searched for the values that simultaneously cause $|\mathrm{CAF}_m(\tau, \nu)|$ to peak. Because the geometry between E and the various users differs, and the users have data of different quality, selecting a pair of users can be crucial in determining the location accuracy. Various strategies exist for user pairing. These can be based on whether the pairs share information or not, or on how to optimally choose the pairing for a given set of users [34]. Ideally, the complete set of users could be employed; although this results in excessive data volume, the burden can be alleviated because mmWave enables high data rates.
The emitter vehicle E employs an HB Tx where precoding is applied to both the digital baseband and analog RF domains, as in Fig. 2. This can be a fully connected or a partially connected architecture. A fully connected architecture is given in Fig. 2, where each RF chain is connected to all antenna elements in the Tx array. This fully connected scheme provides full beamforming gain per RF chain but with a high complexity of $N_{RF} \times N_T$ RF paths, where $N_{RF}$ is the number of RF chains and $N_T$ is the number of Tx antenna elements. For the partially connected architecture, each of the $N_{RF}$ RF chains is connected to $N_T / N_{RF}$ antenna elements in the Tx array. This partially connected architecture leads to a lower hardware complexity of $N_T$ RF paths at the cost of a $1/N_{RF}$ BF gain. In both cases, the number of data streams transmitted for a user is given by $N_s$. Joint spatial division multiplexing can be employed to determine the precoding weights for the selected system configuration [41]-[43].
Each user Rx array is composed of $N_R$ antenna elements, an analog coder, a number of RF chains, and a digital coder, resulting back in $N_s$ signal streams.
In high or sufficient SNR conditions, SM can be employed, wherein multiple data streams are transmitted to each user. However, low-SNR/cell-edge conditions only allow a single data stream using the BF transmission mode. The down-converted complex baseband signal received by $U_i$ at time $t$, after propagating through the SM-MIMO channel, is given by:

$$r_i(t) = \sum_{k=1}^{N_s} \sqrt{P}\, s_k(t)\, e^{j \phi_k} + n(t) \qquad (2)$$

where the symbols are:
$\sqrt{P}$: signal amplitude, where $P$ is the signal power received,
$s_k(t)$: signal envelope of a randomly modulated symbol transmitted in a single data stream,
$\phi_k$: random phase offset in one data stream, assumed to be uniformly distributed over $[0, 2\pi]$,
$n(t)$: white, zero-mean complex Gaussian noise,
$k$: data stream number.
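The CAF peak search described for (1) can be illustrated with a small discrete-time sketch. This is only an illustration under stated assumptions, not the simulation code used in the paper: the sampling rate, the search grids, the circular shift used for the delay, and the function names `caf` and `estimate_tdoa_fdoa` are all hypothetical choices, and a single data stream (BF mode) with a 10 dB SNR is assumed.

```python
import numpy as np

def caf(r1, r2, fs, max_lag, dopplers):
    """Discrete CAF: sum_n r1[n] * conj(r2[n + lag]) * exp(-j*2*pi*nu*n/fs)."""
    n = np.arange(len(r1))
    lags = np.arange(-max_lag, max_lag + 1)
    surface = np.zeros((len(lags), len(dopplers)), dtype=complex)
    for a, lag in enumerate(lags):
        # Circular shift so that element n holds r2[n + lag] (assumption: edge effects ignored).
        prod = r1 * np.conj(np.roll(r2, -lag))
        for b, nu in enumerate(dopplers):
            surface[a, b] = np.sum(prod * np.exp(-2j * np.pi * nu * n / fs))
    return lags, surface

def estimate_tdoa_fdoa(r1, r2, fs, max_lag, dopplers):
    """Return the (delay in seconds, Doppler in Hz) that maximizes |CAF|."""
    lags, surface = caf(r1, r2, fs, max_lag, dopplers)
    a, b = np.unravel_index(np.argmax(np.abs(surface)), surface.shape)
    return lags[a] / fs, dopplers[b]

# Synthetic example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
fs, num = 1.0e6, 2048
true_lag, true_nu = 5, 2000.0            # samples, Hz
n = np.arange(num)
s = (rng.standard_normal(num) + 1j * rng.standard_normal(num)) / np.sqrt(2)

def noise(power):
    return np.sqrt(power / 2) * (rng.standard_normal(num) + 1j * rng.standard_normal(num))

r1 = s + noise(0.1)                       # 10 dB SNR at the first user
r2 = np.roll(s, true_lag) * np.exp(-2j * np.pi * true_nu * n / fs) + noise(0.1)

dopplers = np.arange(-5000.0, 5001.0, 500.0)
tau_hat, nu_hat = estimate_tdoa_fdoa(r1, r2, fs, max_lag=10, dopplers=dopplers)
print(f"estimated delay = {tau_hat * 1e6:.1f} us, estimated Doppler = {nu_hat:.0f} Hz")
```

The grid search is brute force for clarity; a practical implementation would typically compute each delay row with an FFT over the Doppler axis.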

B. CRB for Multiple User Pairs
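For a total of $M$ pairs, the parameter vector to be estimated by the $m$th pair of users is:

$$\boldsymbol{\theta}_m = [\tau_m, \nu_m]^{\top} \qquad (3)$$

The maximum likelihood estimator $\hat{\boldsymbol{\theta}}_m$ of the TDOA and FDOA is determined by solving the following optimization problem [21]:

$$\hat{\boldsymbol{\theta}}_m = \arg\max_{\tau, \nu} \left| \mathrm{CAF}_m(\tau, \nu) \right| \qquad (4)$$

The FIM for $\boldsymbol{\theta}_m$ is given by [21]:

$$\mathbf{J}(\boldsymbol{\theta}_m) = \frac{\pi^2}{3}\, B\, T\, \gamma_m \begin{bmatrix} B^2 & 0 \\ 0 & T^2 \end{bmatrix} \qquad (5)$$

where $\gamma_m$ is the effective SNR given by:

$$\frac{1}{\gamma_m} = \frac{1}{2} \left[ \frac{1}{\gamma_i} + \frac{1}{\gamma_j} + \frac{1}{\gamma_i \gamma_j} \right] \qquad (6)$$

In (6), $\gamma_i$ and $\gamma_j$ are the SNRs in the respective user Rxs with the noise bandwidth $B$. The TDOA accuracy improves for larger signal bandwidths and the FDOA accuracy improves for larger integration periods.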
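The dependence of the TDOA/FDOA accuracy on bandwidth, integration time, and the effective SNR of (6) can be made concrete with a short numerical sketch. It assumes the classical CAF bounds from the literature cited as [21] for a signal with a flat spectrum whose bandwidth equals the noise bandwidth, which gives the factor $\sqrt{3}/\pi \approx 0.55$. The function names and the example values are hypothetical.

```python
import math

def effective_snr(snr_i, snr_j):
    """Effective SNR of a user pair (linear scale): 1/g = 0.5 * (1/gi + 1/gj + 1/(gi*gj))."""
    return 2.0 / (1.0 / snr_i + 1.0 / snr_j + 1.0 / (snr_i * snr_j))

def tdoa_fdoa_std(bandwidth_hz, integration_s, snr_i_db, snr_j_db):
    """Lower bounds on the TDOA (s) and FDOA (Hz) standard deviations.

    Assumption: flat signal spectrum over the noise bandwidth, so the RMS
    bandwidth and RMS duration factors both reduce to pi / sqrt(3).
    """
    gamma = effective_snr(10 ** (snr_i_db / 10), 10 ** (snr_j_db / 10))
    root = math.sqrt(bandwidth_hz * integration_s * gamma)
    k = math.sqrt(3) / math.pi          # approximately 0.55
    return k / (bandwidth_hz * root), k / (integration_s * root)

# Illustrative values only (not taken from the paper's tables).
sigma_tau, sigma_nu = tdoa_fdoa_std(100e6, 1e-3, snr_i_db=10, snr_j_db=10)
print(f"TDOA bound: {sigma_tau * 1e12:.2f} ps, FDOA bound: {sigma_nu:.3f} Hz")
```

Doubling the bandwidth tightens the TDOA bound by more than a factor of two, because the bandwidth enters both directly and through the time-bandwidth product, and the same holds for the integration time and the FDOA bound.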
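The FIM $\mathbf{J}(\boldsymbol{\theta})$ for all the combined $M$ user pairs, where $\boldsymbol{\theta} = [\boldsymbol{\theta}_1^{\top}, \boldsymbol{\theta}_2^{\top}, \ldots, \boldsymbol{\theta}_M^{\top}]^{\top}$, has a block structure and is given by [34]:

$$\mathbf{J}(\boldsymbol{\theta}) = \begin{bmatrix} \mathbf{J}(\boldsymbol{\theta}_1) & \mathbf{J}(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2) & \cdots & \mathbf{J}(\boldsymbol{\theta}_1, \boldsymbol{\theta}_M) \\ \mathbf{J}(\boldsymbol{\theta}_2, \boldsymbol{\theta}_1) & \mathbf{J}(\boldsymbol{\theta}_2) & \cdots & \mathbf{J}(\boldsymbol{\theta}_2, \boldsymbol{\theta}_M) \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{J}(\boldsymbol{\theta}_M, \boldsymbol{\theta}_1) & \mathbf{J}(\boldsymbol{\theta}_M, \boldsymbol{\theta}_2) & \cdots & \mathbf{J}(\boldsymbol{\theta}_M) \end{bmatrix} \qquad (7)$$

where $\mathbf{J}(\boldsymbol{\theta}_m, \boldsymbol{\theta}_n)$ is the cross-term FIM between the $m$th and $n$th user pairs. The cross-terms increase the computational requirement in the network. Since some users may belong to more than one pair, their communication needs careful consideration to avoid collisions. When no user information is shared among pairs, the cross-term FIMs $\mathbf{J}(\boldsymbol{\theta}_m, \boldsymbol{\theta}_n)$ are zero and the FIM reduces to:

$$\mathbf{J}(\boldsymbol{\theta}) = \mathrm{diag}\left( \mathbf{J}(\boldsymbol{\theta}_1), \mathbf{J}(\boldsymbol{\theta}_2), \ldots, \mathbf{J}(\boldsymbol{\theta}_M) \right) \qquad (8)$$

Although the FIM in (8) requires fewer computations than (7), it may yield higher localization errors due to fewer entries in the FIM. The benefit of sharing information between pairs needs careful consideration due to the increase in computation, network capacity, and latency. In this paper, it is assumed that no information is shared between any user pairs.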

C. Spatial Multiplexing and Beamforming
SM is likely to be employed in high SNR conditions to improve the network capacity, wherein the channel can support multiple data streams to each user. SM increases the number of $s_k(t)$ that are demodulated for each user since $k \in \{1, \ldots, N_s\}$, where $N_s > 1$ for SM. Accordingly, for SM the resulting CAF in (1) depends on more than one $s_k(t)$ in (2) for each user. Accurate estimation of $\hat{\boldsymbol{\theta}}_m$ for $N_s > 1$ therefore depends on the random nature of $s_k(t)$ and $\phi_k$ in (2) and on how coherently all the $s_k(t)$ present in the data stream sum. This can result in spurious peaks rather than a single peak for a given CAF, thereby reducing the accuracy of $\hat{\boldsymbol{\theta}}_m$. Therefore, even in the absence of noise and in high SNR conditions, the peak of the CAF may not correspond to the true TDOA/FDOA, thereby reducing the accuracy of localization. If the $s_k(t)$ are assumed to be unit vectors, then the amplitude $A$ of the squared envelope of a sum of $N_s$ such unit vectors with random phases has a probability density function $p(A)$ given by (9) [44], where $I_0(\cdot)$ is the modified Bessel function of the first kind of order zero and $N_s > 2$. For $N_s = 2$, $p(A)$ is very high ($p(A) \to \infty$) [44]. Fig. 3 indicates the probability of all $s_k(t)$ in the data stream being summed coherently. The probability of three $s_k(t)$ being coherently summed is given by $N_s = 3$ and $A = 9$, which is $p(A) = 0.0164$. This reduces to 0.004 for four $s_k(t)$, i.e. at $N_s = 4$ and $A = 16$. Likewise, the probability of more than four $s_k(t)$ summing coherently is even lower. Therefore, the likelihood of the CAF having spurious peaks increases for SM, which has more than one $s_k(t)$ in one data stream. In comparison, in BF the CAF output estimates are based on a single $s_k(t)$ for each user. Therefore, even in the presence of noise and in low SNR conditions, the peak of the CAF is more likely to correspond to the true TDOA/FDOA.
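The argument that several randomly phased streams rarely add coherently can be checked empirically with a small Monte Carlo sketch. This is not the closed-form density of [44]; it simply draws unit phasors with independent uniform phases and counts how often the squared envelope comes close to its maximum possible value. The function names, the trial count, and the 90 % threshold are hypothetical choices made for illustration.

```python
import cmath
import math
import random

def squared_envelopes(n_streams, trials, seed=0):
    """Squared envelope |sum_k exp(j*phi_k)|^2 for independent uniform phases."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        total = sum(cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
                    for _ in range(n_streams))
        samples.append(abs(total) ** 2)
    return samples

def fraction_near_coherent(n_streams, trials=20000, threshold=0.9, seed=0):
    """Fraction of trials whose squared envelope is at least threshold * n_streams**2."""
    samples = squared_envelopes(n_streams, trials, seed)
    limit = threshold * n_streams ** 2
    return sum(1 for a in samples if a >= limit) / trials

for n_streams in (2, 3, 4):
    frac = fraction_near_coherent(n_streams)
    print(f"{n_streams} streams: fraction near fully coherent sum = {frac:.4f}")
```

In such a run the near-coherent fraction is expected to shrink as the number of streams grows, which is the qualitative trend the text attributes to SM.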

A. MmWave 3D Statistical Spatial Channel Model
The mmWave channel simulator employed in this paper is based on a 3D statistical spatial channel model for urban LOS and NLOS channels developed from extensive 28 GHz, 60 GHz, 73 GHz, and 140 GHz ultra-wideband propagation measurements in the cities of New York City and Austin, USA [45], [46]. The model generates channel impulse responses that match measured field data at a wide range of distances from 10-10,000 m and over local areas based on the time cluster-spatial lobe modeling framework. The approach extends the 3GPP model through the directional RMS lobe angular spreads and is consistent with the 3GPP modeling framework. Based on the 3D statistical channel model in [45], [46], a MATLAB-based statistical simulator, NYUSIM, has been developed by New York University [47] that can generate 3D AOD and AOA power spectra along with omnidirectional and directional power delay profiles that match measured field results [48]. 3GPP assumes an unrealistically large number of strong eigenvalues of the channel matrix, which are not found in mmWave channels [49]. Accordingly, NYUSIM is employed in this paper to simulate the MIMO channel for more realistic results [50].
NYUSIM employs spatial consistency to simulate the time-variant channel along the user trajectory. Due to the high correlation of a wireless channel over a distance of 10-15 m, incorporating spatial consistency is required to accurately represent the consecutive and spatially correlated channel evolution along the user movement in a local area. The channel update has two parts: large-scale parameters, such as shadow fading and the LOS/NLOS condition, and small-scale parameters, such as the power, delay, phase and angles of each multipath component. The large-scale parameters are updated by using a spatially correlated map, and the small-scale parameters are updated by a geometry-based reflection surface [48].

B. Estimating CRB
From (8), for a constant bandwidth $B$ and integration time $T$, the effective SNRs were obtained for 100 repetitive runs with NYUSIM to simulate the CRB of $\boldsymbol{\theta}$ for BF and SM. The simulation specifications are given in Table I. BF corresponds to one data stream and SM corresponds to two or four data streams. The user positions were obtained from NYUSIM randomly in the distance interval [10, 500] m from E for each simulation run, as given in Fig. 4 for 10 user pairs.
A vehicle-mounted base station (VMB) was assumed for E. VMBs offer advantages such as real-time communication, the use of massive MIMO technology, and dynamic caching, and are therefore proposed as a suitable option for mmWave V2V communication [51]. The extremely high frequencies (mmWave/THz) in the bands of interest motivate the design of VMBs as compact arrays with very fine pencil beams. At E, $N_T = 256$ and for users $N_R = 4$, meaning that at most 64 users can be simultaneously supported by E, which can form at most 32 user pairs. In addition, a user can receive at most four different data streams from E. Thus, 10, 20, and 30 user pairs and 1, 2, and 4 data streams were chosen in the channel simulations.
The normalized CRBs of $\boldsymbol{\theta}$ for various numbers of user pairs and data streams are plotted in Fig. 5, indicating that BF has a lower CRB than SM. This agrees with the earlier analysis for BF and SM provided by (9) and Fig. 3. SM with four data streams and 10 user pairs produces the worst performance. The yellow bar for SM, $N_s = 4$ with 10 user pairs is 0 dB and therefore not visible. Furthermore, Fig. 5 shows that increasing the number of user pairs lowers the CRB for both BF and SM. Therefore, the high number of users likely to be present in V2V communication with MU-MIMO HB can be leveraged.

C. Estimating $\hat{\boldsymbol{\theta}}$
The TDOA and FDOA, i.e. $\tau$ and $\nu$, are estimated by integrating the received signals of a pair of users from the same E and calculating the CAF given in (1). The estimated $\tau$ and $\nu$ give the largest absolute value of the CAF in (4). The channel specifications are listed in Table I. The parameters of the transmitted signals are given in Table II. The normalized absolute CAF values with $N_s$ = 1, 2 and 4 at 10 dB and -20 dB SNR are shown in Fig. 6 and Fig. 7, respectively. It can be observed that the estimated $\tau$ and $\nu$ for BF and SM are almost identical at 10 dB SNR. As the SNR decreases, the estimation performance degrades, especially for the SM case. At -20 dB SNR, twin peaks of the CAF are visible for $N_s \geq 2$, indicating the formation of spurious peaks, which reduces the accuracy of TDOA/FDOA estimation.

D. Localization Performance
For the two-stage localization, once $\tau$ and $\nu$ have been estimated from the CAF matrix, the location of E can be calculated by solving the system of non-linear equations in [52], where $(x_{m,1}, y_{m,1})$ and $(x_{m,2}, y_{m,2})$ are the locations of the two users in the $m$th pair, $(x_E, y_E)$ is the location of E to be estimated, and $f_c$ and $v$ are the carrier frequency and user velocity, respectively. Ten user pairs with $N_s$ = 1, 2 and 4 data streams were used to estimate the location of E. With the received SNR set to 10 dB, the estimated $\tau$ and $\nu$ are almost identical for the three different numbers of data streams, thus yielding an identical estimated location of E. The mmWave channel simulator NYUSIM provides time-variant channel conditions in spatial consistency mode to update the user location and channel condition. The spatial consistency parameters and user velocity settings for NYUSIM are given in Table III. Channel snapshots were generated every one meter, and a 15 m moving trajectory of the emitter was simulated for 16 users (corresponding to 8 user pairs). The median of the estimated locations obtained from the 8 user pairs was used as the final estimated location of E at each time instance. The 15 estimated locations along the 15 m trajectory are plotted in Fig. 8, where most localization errors are within 15 m. In addition, there is no spatial correlation between two consecutive estimates. To further improve the localization performance and take spatial correlation into account, an extended Kalman filter is applied in the following section.
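The second localization stage can be sketched as a non-linear least-squares problem. The sketch below is a generic illustration rather than the exact formulation of [52]: it assumes the standard TDOA/FDOA geometry (range difference divided by the speed of light, and Doppler difference from the user velocities projected onto the line of sight), known user positions and velocities, an emitter that is stationary within one snapshot, noise-free measurements, and a hypothetical 28 GHz carrier. All function names are illustrative.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def predict(p, pos1, vel1, pos2, vel2, fc):
    """TDOA (s) and FDOA (Hz) of each user pair for an emitter at position p."""
    d1, d2 = p - pos1, p - pos2                       # user-to-emitter vectors, shape (M, 2)
    r1, r2 = np.linalg.norm(d1, axis=1), np.linalg.norm(d2, axis=1)
    tdoa = (r1 - r2) / C
    fdoa = (fc / C) * (np.sum(vel1 * d1, axis=1) / r1 - np.sum(vel2 * d2, axis=1) / r2)
    return tdoa, fdoa

def residuals(p, tdoa_meas, fdoa_meas, geom, fc):
    """Residuals scaled to metres and metres per second so both parts are comparable."""
    tdoa, fdoa = predict(p, *geom, fc)
    return np.concatenate([(tdoa - tdoa_meas) * C, (fdoa - fdoa_meas) * C / fc])

def locate(tdoa_meas, fdoa_meas, geom, fc, p0, iters=20, eps=1e-3):
    """Gauss-Newton iteration with a forward-difference Jacobian."""
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        r = residuals(p, tdoa_meas, fdoa_meas, geom, fc)
        jac = np.empty((r.size, 2))
        for k in range(2):
            dp = np.zeros(2)
            dp[k] = eps
            jac[:, k] = (residuals(p + dp, tdoa_meas, fdoa_meas, geom, fc) - r) / eps
        step, *_ = np.linalg.lstsq(jac, -r, rcond=None)
        p = p + step
        if np.linalg.norm(step) < 1e-6:
            break
    return p

# Synthetic scenario (illustrative assumptions only).
rng = np.random.default_rng(0)
num_pairs, fc = 8, 28e9

def random_users(m):
    ang, rad = rng.uniform(0, 2 * np.pi, m), rng.uniform(100, 300, m)
    head = rng.uniform(0, 2 * np.pi, m)
    pos = np.column_stack([rad * np.cos(ang), rad * np.sin(ang)])
    vel = 10.0 * np.column_stack([np.cos(head), np.sin(head)])
    return pos, vel

geom = (*random_users(num_pairs), *random_users(num_pairs))
p_true = np.array([0.0, 0.0])
tdoa_meas, fdoa_meas = predict(p_true, *geom, fc)
p_hat = locate(tdoa_meas, fdoa_meas, geom, fc, p0=[10.0, -10.0])
print("estimated emitter position:", p_hat)
```

With noisy TDOA/FDOA estimates the same routine returns a least-squares fit, and per-pair solutions could instead be combined with a median, as the text describes.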

E. Reducing Localization Errors With Kalman Filtering
In addition to $\tau$, $\nu$ and the number of $s_k(t)$, another factor on which the CAF depends is the received signal power $P$ in (2). This can vary significantly due to the scattering behavior of the mmWave channel, leading to an increase in localization errors. The effect can be observed in the received signal power even for a slow-moving Tx. The power received for a Tx with a velocity of 5 m/s is shown in Fig. 9, wherein the received power varies by about 4.6 dB between 2 m and 3 m.
Kalman filters employ a series of measurements observed over time, containing statistical noise and other inaccuracies, and produce estimates of unknown variables that tend to be more accurate than those based on a single measurement alone [53]. The extended Kalman filter (EKF) [54] can be employed for non-linear state-space models. In addition, the EKF offers high estimation accuracy at low computational complexity compared to, for example, particle filters and the unscented Kalman filter, and the performance of the EKF and the particle filter was found to be similar in [55]. The accuracy of localization in mmWave channels can be improved by applying an EKF to the estimated channel parameters, such as signal strength, DOA, and TOA [7], [56]. The state of the system is the position $(x, y)$ of E (ideally, it is (0, 0)), and the measurements are the TDOA and FDOA of each user pair. The state dynamics can be written as [57]:

$$\mathbf{x}_{k} = \mathbf{A}\, \mathbf{x}_{k-1} + \mathbf{w}_{k-1}$$

where $\mathbf{x}$ is the state vector of the position and velocity of the emitter (i.e., the position and velocity components along x and y) and $\mathbf{w}$ is the process noise. To model the change in the emitter state vector caused by the constant velocity of the emitter, the transition matrix $\mathbf{A}$ is taken from the constant-velocity model in [57], where $w_{a,x}$ and $w_{a,y}$ are the random Gaussian accelerations in the x and y directions, respectively. In this study, it is assumed that the random accelerations in the x and y directions have a mean of 0 and a standard deviation of $\sigma_a = 9$ m/s$^2$. An extended Kalman filter includes two processing steps: prediction and update. The prediction step predicts the next state and covariance matrix based on the current state and current covariance matrix. It uses the defined transition matrix and process noise covariance matrix, given by [57]:

$$\hat{\mathbf{x}}_{k|k-1} = \mathbf{A}\, \hat{\mathbf{x}}_{k-1|k-1}, \qquad \mathbf{P}_{k|k-1} = \mathbf{A}\, \mathbf{P}_{k-1|k-1}\, \mathbf{A}^{\top} + \mathbf{Q}$$

where $\mathbf{Q}$ is the process noise covariance matrix, i.e. $E[\mathbf{w} \mathbf{w}^{\top}]$. The $\mathbf{Q}$ matrix is given in [57], assuming that the random accelerations in the x and y directions ($w_{a,x}$ and $w_{a,y}$) are uncorrelated.
The $\mathbf{P}$ matrix was initialized as the identity matrix since no prior knowledge of the accuracy of the initial estimate of the emitter position and velocity was assumed. The second step updates (or refines) the predicted state based on the measurements conducted at each time instance. The EKF observation model describes how the measured TDOA and FDOA are related to the position and velocity of the user. The relationship is provided in (18), (19). The user state is refined using the observation model (and the measured TDOA and FDOA). The Kalman gain is expressed as:

$$\mathbf{K} = \mathbf{P}_{k|k-1}\, \mathbf{H}^{\top} \left( \mathbf{H}\, \mathbf{P}_{k|k-1}\, \mathbf{H}^{\top} + \mathbf{R} \right)^{-1}$$

where $\mathbf{R}$ is the uncertainty matrix of the TDOA and FDOA measurements. $\mathbf{R}$ is calculated from the variance of the measurement error between the ideal TDOA, FDOA and the measured TDOA, FDOA. $\mathbf{H}$ is the Jacobian matrix of the measurements with respect to the predicted state $\hat{\mathbf{x}}_{k|k-1}$. The updated $\hat{\mathbf{x}}_{k|k}$ can be estimated by:

$$\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}\, \mathbf{y}$$

where $\mathbf{y}$ is the difference between the measured TDOA and FDOA and the values of TDOA and FDOA predicted using the updated $\hat{\mathbf{x}}$ in (13), (14).
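The prediction and update steps described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the filter configuration of [57]: the state is assumed to be $[x, y, v_x, v_y]$ with a constant-velocity transition, the process noise is built from uncorrelated random accelerations, the measurements are the range difference (TDOA scaled by the speed of light) and range-rate difference (proportional to the FDOA) of each pair, and the Jacobian is approximated numerically. The names `cv_matrices`, `h`, `ekf_step`, and all numerical settings other than $\sigma_a = 9$ m/s$^2$ and the identity initial covariance are hypothetical.

```python
import numpy as np

def cv_matrices(dt, sigma_a):
    """Constant-velocity transition matrix A and process noise covariance Q."""
    A = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
    G = np.array([[0.5 * dt**2, 0], [0, 0.5 * dt**2], [dt, 0], [0, dt]])
    return A, sigma_a**2 * G @ G.T        # uncorrelated accelerations in x and y

def h(x, pos1, vel1, pos2, vel2):
    """Range difference (m) and range-rate difference (m/s) for every user pair."""
    p, v = x[:2], x[2:]
    d1, d2 = p - pos1, p - pos2
    r1, r2 = np.linalg.norm(d1, axis=1), np.linalg.norm(d2, axis=1)
    rate1 = np.sum((v - vel1) * d1, axis=1) / r1
    rate2 = np.sum((v - vel2) * d2, axis=1) / r2
    return np.concatenate([r1 - r2, rate1 - rate2])

def numerical_jacobian(f, x, eps=1e-4):
    y0 = f(x)
    H = np.empty((y0.size, x.size))
    for k in range(x.size):
        dx = np.zeros_like(x)
        dx[k] = eps
        H[:, k] = (f(x + dx) - y0) / eps
    return H

def ekf_step(x, P, z, f_meas, A, Q, R):
    """One EKF cycle: predict with (A, Q), then update with measurement z."""
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    H = numerical_jacobian(f_meas, x_pred)
    y = z - f_meas(x_pred)                             # innovation
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    return x_pred + K @ y, (np.eye(x.size) - K @ H) @ P_pred

# Synthetic run (illustrative assumptions only).
rng = np.random.default_rng(1)
M, dt, steps = 8, 0.2, 15
A, Q = cv_matrices(dt, sigma_a=9.0)
R = np.diag([3.0**2] * M + [1.0**2] * M)

def random_users(m):
    ang, rad = rng.uniform(0, 2 * np.pi, m), rng.uniform(10, 50, m)
    head = rng.uniform(0, 2 * np.pi, m)
    pos = np.column_stack([rad * np.cos(ang), rad * np.sin(ang)])
    return pos, 10.0 * np.column_stack([np.cos(head), np.sin(head)])

(pos1, vel1), (pos2, vel2) = random_users(M), random_users(M)
truth = np.array([0.0, 0.0, 5.0, 0.0])                 # emitter moving east at 5 m/s
x_est, P_est = np.array([5.0, -5.0, 0.0, 0.0]), np.eye(4)
errors = []
for _ in range(steps):
    truth = A @ truth
    pos1, pos2 = pos1 + vel1 * dt, pos2 + vel2 * dt
    f_meas = lambda s: h(s, pos1, vel1, pos2, vel2)
    z = f_meas(truth) + rng.multivariate_normal(np.zeros(2 * M), R)
    x_est, P_est = ekf_step(x_est, P_est, z, f_meas, A, Q, R)
    errors.append(np.linalg.norm(x_est[:2] - truth[:2]))
print("RMS position error over the run (m):", np.sqrt(np.mean(np.square(errors))))
```

A numerical Jacobian keeps the sketch short; an analytic Jacobian of the TDOA/FDOA model would normally be preferred for speed and accuracy.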
For Tx mobility, a geometry-based channel evolution for the LOS path was employed. The emitter moved towards the east at 5 m/s from [0, 0]. Each Rx was located from 10 to 50 m away from E and moved at 10 m/s in an arbitrary direction. Sixteen Rxs were simulated, forming eight user pairs. The updated Kalman filter with emitter mobility was applied, resulting in an RMS error of 6.3 m, as shown in Fig. 10. The corresponding close-up plot is depicted in Fig. 11. Note that the channel measurements used to characterize the localization performance with the Kalman filter and obtain Fig. 10 and Fig. 11 are identical to the channel measurements used for emitter localization without the Kalman filter, used to create Fig. 8.

F. Discussion
Joint TDOA/FDOA estimation is a potential candidate for localization under the challenging conditions in V2V channels. The effects of various precoders or other non-ideal conditions, such as imperfect CSI, on accurate TDOA/FDOA estimation and localization could be an area for future work, along with strategies that could enable integration of the AOA/AOD estimates. Since all users have to transmit their received signals to a site, which could be one of the users or a dedicated base station in the network, this site handles a large amount of computation. Strategies need to be in place for reducing and balancing the computational load on the network. In case of data loss, the localization processing will suffer from poor stability and increased latency. To circumvent these issues, distributed data compression could be employed to reduce the amount of data transmission, and distributed computation could be employed to reduce the computational load on the site [58].
The existing MIMO localization methods together with this work are listed in Table IV. The localization errors are indicated as mean, RMS, position error bound (PEB) and probability of sub-meter accuracy.

V. CONCLUSION
This paper has proposed a joint TDOA/FDOA estimation approach with MU-MIMO HB for mmWave V2V localization. At 10 dB SNR both SM and BF result in comparable localization errors. At lower SNR values, SM leads to larger errors compared to BF due to spurious peaks in the CAF. Due to the non-linear nature of the involved state-space models, the accuracy of estimation and tracking can be improved by employing an extended Kalman filter, resulting in a localization RMS error of ~6.3 m. The proposed technique resulted in a smaller user range error than the broadcasting GPS signal standard of ≤7.8 m with a 95% probability given by the US government [59].