Thermophysical modelling and parameter estimation of small solar system bodies via data assimilation

Deriving thermophysical properties such as thermal inertia from thermal infrared observations provides useful insights into the structure of the surface material on planetary bodies. The estimation of these properties is usually done by fitting temperature variations calculated by thermophysical models to infrared observations. For multiple free model parameters, traditional methods such as Least-Squares fitting or Markov-Chain Monte-Carlo methods become computationally too expensive. Consequently, the simultaneous estimation of several thermophysical parameters together with their corresponding uncertainties and correlations is often not computationally feasible and the analysis is usually reduced to fitting one or two parameters. Data assimilation methods have been shown to be robust while sufficiently accurate and computationally affordable even for a large number of parameters. This paper will introduce a standard sequential data assimilation method, the Ensemble Square Root Filter, to thermophysical modelling of asteroid surfaces. This method is used to re-analyse infrared observations of the MARA instrument, which measured the diurnal temperature variation of a single boulder on the surface of near-Earth asteroid (162173) Ryugu. The thermal inertia is estimated to be $295 \pm 18$ $\mathrm{J\,m^{-2}\,K^{-1}\,s^{-1/2}}$, while all five free parameters of the initial analysis are varied and estimated simultaneously. Based on this thermal inertia estimate the thermal conductivity of the boulder is estimated to be between 0.07 and 0.12 $\mathrm{W\,m^{-1}\,K^{-1}}$ and the porosity to be between 0.30 and 0.52. For the first time in thermophysical parameter derivation, correlations and uncertainties of all free model parameters are incorporated in the estimation procedure and thus, results are more accurate than previously derived parameters.


Introduction
Thermal conditions on atmosphereless, small solar system bodies are governed by the thermophysical properties of the surface material, e.g. thermal conductivity, heat capacity, and emissivity. The thermal conductivity is coupled to structural properties of the surface material such as grain size and porosity [44]. Observing the surface in the thermal infrared wavelength range, typically 5-25 µm, provides direct insight into the thermal conditions on the surface. Thus, thermophysical and structural material properties can be derived from thermal infrared data.
Infrared data is usually analysed by comparing the observed flux to the results of thermophysical models [18,41,40], to fit the observation in a weighted least-squares approach [35,47,50,7,16]. Typically, only a few parameters are varied in these works while most parameters of the model are assumed to be some constant value, or varied in coarse steps. This is due to the extensive computation time necessary to compute a solution of the standard thermophysical models. Often, look-up tables are computed prior to the fitting, and the temperature variation of the surface is interpolated from these tables [35].
Recently, [1] published an approach where a surrogate model, in form of a neural network, returns the temperature variation of an asteroid surface given insolation data and some thermal parameters, i.e., thermal inertia of a rock component of the surface regolith, thermal inertia of the fine components, a surface roughness parameter, and the rock component's area coverage. While significantly increasing the speed of the temperature calculation and thus allowing to use Monte-Carlo approaches to approximate unknown parameters, the trained network is merely a surrogate model of the true physical system and is thus limited in its predictive power. Furthermore, the computational complexity to fit a neural network significantly increases for more detailed thermal models with a higher number of free parameters.
The commonly used least-squares approach does not require to generate as many model evaluations as is necessary for a Monte-Carlo estimation but demands some form of linearisation. Consequently, unlike the Monte-Carlo ansatz, the least-squares technique can only provide a Gaussian approximate of the true uncertainty of the parameter estimate. The approach presented in this paper addresses the issues associated with existing fitting algorithms such as the least-squares approach and MCMC estimation. More precisely the proposed method is computationally feasible, i.e., only a relatively small number of samples compared to the MCMC approach are necessary for the algorithm to give robust results. Further the method does not require a linearisation and thus is able to capture the highly nonlinear relationship between surface temperature and observable infrared emission while providing a good representation of the uncertainty of the approximation.
It is important to mention that the method presented in this paper is a standard approach that has been developed in the field of data assimilation (DA) [9,28]. Here it is adapted to thermophysical modelling for the purpose of retrieving thermophysical properties from infrared observations. The proposed method, the so called Ensemble Square Root filter (ESRF) [49,43,34], combines the key strength of the least squares method, more specifically the "best linear unbiased estimator" [43], with the ones of the Monte Carlo approach and it has been successfully applied to highly nonlinear problems with large number of free parameters of order 10 7 and its accuracy and stability has been rigorously investigated in recent years [6,5,26].
Data assimilation techniques are widely employed in the Earth sciences, in particular in meteorology, atmospheric physics and oceanography. For other solar system bodies, data assimilation has been applied to atmospheric data sets from orbital Mars missions [31,54]. However, so far data assimilation has not been applied to the thermal infrared data sets gathered from small solar system bodies.

Methods
The method described in this section is one of the standard approaches for nonlinear high dimensional state estimation. Here, "state" denotes the variables that describe the time-dependent condition of the system, i.e. the surface and sub-surface temperature, as opposed to "parameters" that govern the state, i.e. the thermophysical properties of the surface material.
The DA technique is designed to infer states and parameters of a dynamical system of interest on the basis of two sources of information: a model (typically given by an evolution equation of a state of interest dependent on partially unknown parameters) and partial and noisy observations of the system. At first we will discuss the considered model and the associated observations followed by an introduction of the Ensemble Square Root filter [49,43].

Model
The thermophysical model used in this study is similar to the one used in [41,18,16] and assumes the surface to be a semi-infinite and homogeneous half-space. The 1D-heat conduction equation is solved where Ω is the rotation period, T (x, t) is the time-and depth-dependent temperature with x being the depth variable in the direction of the local surface normal and x = 0 at the surface. The depth is normalised to the diurnal skin depth d which is defined as: where ρ is the density, c p the specific heat capacity and k is the thermal conductivity. This normalisation requires to assume k, c p , ρ to be constants. At the lower boundary the flux is set to zero. The upper boundary condition is given by the energy balance at the surface: where A is the surface bond albedo, I(t) is the solar illumination, is the thermal emissivity and σ B is the Stefan-Boltzman constant. Further Q th (t) denotes the thermal radiation received from the surrounding terrain.
The thermal inertia is defined as Γ = √ ρc p κ in units of J m −2 K −1 s −1/2 . This parameter is commonly used to describe the amplitude of the diurnal surface temperature variation and its phase shift with respect to maximum insolation. The higher a surface's thermal inertia the later it will reach its maximum temperature in the afternoon, and the smaller is the difference between day and night temperatures. This thermal model calculates the temperatures on a spatial grid. We chose this grid to consist of 41 points spread over eight diurnal skin depth with increasing distance, as described in [18,16].
The aim now is to determine unknown parameters of interest of equation (3), e.g. the thermal inertia Γ, emissivity ε, etc., which result in a specific temperature profile. In this paper we use a sequential data assimilation algorithm to simultaneously estimate the temperatures on the 41 grid points, the state, as well as the model parameters. This is achieved by defining an "augmented" state vector, which consists of the temperatures and model parameters and will be denoted z(t) ∈ R Nz in the following.
where the temperatures evolve according to the thermal model eq. (1). Note that besides Γ, any parameter of the thermal model can be included in z.
While the model parameters are time independent, the data assimilation requires some sort of evolution in time for a sequential improvement of the parameter's estimate. Here, a Brownian motion is chosen to ensure that the parameter space is traversed sufficiently to converge to the true parameter value. Here, the forward model of the thermal inertia is defined as where W t is a Wiener process, i.e. the mathematical description of the Brownian motion. This process is realised by adding, in each forecast step, a random number to the previous estimate: where τ i is the time at which the parameter is updated, τ i−1 is the time of the previous update, and ζ Γ (τ ) is a random number drawn from a normal distribution N centred on zero and with a standard deviation of α Γ (τ ). The update time τ in this study is equivalent to the time of the observations. This formalism can be applied to other parameters of the thermal model, each with their own choice of α as provided in table (1).

Data
In order to employ DA techniques, observations that can be linked to the state of interest are required. The relationship between observations y obs and augmented state z(τ ) can be written as where ν(t) ∈ R Ny is the observational noise which is assumed to be Gaussian distributed with zero mean and covariance matrix R ∈ R Ny×Ny and H ∈ R Ny×Nz is the observation operator. Further note that the number of observed components N y is often significantly smaller than the dimension of the augmented state space N z . In this study only one component of z(τ ) is observable. In the first part of the study, this is the surface temperature T (0, τ ). In the second part, it is the radiance emitted by the surface and observed by the MASCOT radiometer in the second. This means that N y = 1, R is a scalar corresponding to the measurement uncertainty, and H is given by

Ensemble Square Root Filter
In the following we will introduce the Ensemble Square Root Filter (ESRF) which is a prominent member of the family of Ensemble Kalman Filters (EnKFs) [10]. The class of EnKFs comprises established sequential DA techniques applicable to models with non-linear evolution of high-dimensional states and parameters. Ψ is the operator evolving the augmented state from time τ i−1 to τ i . For the temperature evolution, Ψ is the solution to the PDE given in (1) evolving the temperatures T (x, τ n−1 ) to the temperature forecast T (x, τ n ). For the evolution of the model parameters, Ψ is described in eq. (6). Note that it is possible to add some noise in eq. (9) to express uncertainties stemming from model or numerical errors. The numerical success of the family of EnKFs has been documented for various applications [10] and there are rigorous accuracy and stability analyses available for the considered ESRF [6,5]. Further the proposed deterministic branch of this family (class of Ensemble square root filters) is preferable [49,34] over the stochastic alternative (also know as perturbed EnKF) which is also very popular in the literature [8].
The joint basis of the family of EnKFs is that the associated algorithms are Monte Carlo approximations of the classic Kalman Filter (KF) [23] which computes the best linear unbiased estimator (BLUE) for the given linear model (e.g., a linearisation of eq. (9)) and the observations (eq. 7). Note that the classic KF is only applicable in a linear setting. More specifically the KF has the underlying assumption that the posterior distribution p(z(τ n )|y obs (τ 1 : τ n )) that describes the probability of the augmented state z(τ n ) given all the data y obs (τ 1 : τ n ) from time τ 1 up to time τ n is a Gaussian N (m a (τ n ), P a (τ n )). Bayes Theorem connects this posterior distribution with the prior distribution which describes the probability of the augmented state z(τ n ) given all the data y obs (τ 1 : τ n−1 ) from time τ 1 up to time τ n−1 . The prior distribution is assumed to be Gaussian N (m f (τ n ), P f (τ n )) as well. This is the case when the model operator is linear,i.e., Ψ = A with A ∈ R Nx×Nx (as well as the observation operator H) and the initial value, the model and observational noise are independent identical Gaussian distributed. The link between prior and posterior is achieved via the likelihood l(y obs (τ n−1 )|m f ) which describes the probability of the observations conditioned on the current state estimate,i.e., where m ∈ R Nz is the first and P ∈ R Nz×Nz the second moment of the associated normal distribution, the superscript a and f are abbreviations of analysis and forecast used to distinguish between posterior and the prior distribution respectively. This notation is in accordance with the classical DA notation in the main application areas such as numerical weather prediction and oil recovery. Algorithmically the KF filter iterates over updates of the prior and the posterior mean and covariance for each time step τ . Figure 1: The graph visualizes the classic Kalman update. The blue (left) normal distribution is the prior with mean m f , while the green (centre) represents the likelihood function of the observation (red dot y obs ) and the yellow (right) distribution describes the so called posterior distribution with mean m a .

Kalman update
The classic KF can be extended to a nonlinear model setting via an ensemble approach where an ensemble of M augmented state vectors z f i (τ n ) and z a i (τ n ) with i ∈ {1, . . . , M } are generated in each time step τ n to approximate the Gaussian prior and posterior distribution via the empirical posterior mean and covarianceP for each τ n and analogously for the empirical prior distribution. The iterative update procedure of the samples is described in Algorithm 1. At first the initial ensemble of M augmented state vectors z i (0) with i ∈ {1, . . . , M } are generated by sampling from Gaussian distributions N (m(0), P(0)) which are then, individually, sequentially updated by iterating over the forecast and analysis step.
Analysis step: with update D il given in eq. (15) end for The corresponding MATLAB ® code will be provided upon request, please contact us if you are interested. The entries of the update matrix D ∈ R M ×M depend on the the components w i of the weight vector where 1 is a column vector filled with ones and in R M . A f is a matrix in R Nz×M that provides the distance between each ensemble member to the mean of the ensembles: The matrix S and its entries S ij that enter eq. (15) are defined by: where the matrix square root is defined as B 1/2 B 1/2 = B for a matrix B. The name "Ensemble Square Root Filter" refers to this matrix square root computation.
Note, that the update occurs at the times τ , i.e., the observation times. The update matrix D is constructed so that empirical mean and covariance are equal to the true mean and covariance of the posterior, m a =m a and P a =P a , for a linear Ψ. In other words the algorithm is designed to produce the same mean and covariance as the classic KF for a linear setting even for finite number of ensemble members M , whereas other EnKF versions only produce the KF mean and covariance in the ensemble limit M → ∞, e.g., the perturbed EnKF [9]. Further note that the update of each ensemble member depends on all other ensemble members, coupled through the empirical covariance matrixP a given in equation (12). For a more detailed derivation of the ESRF and its properties see chapter 7 in [43].
This form of update does not require the model to be linear which is one of the key benefits of the ESRF compared to the classic KF. Furthermore, despite the underlying Gaussianity assumption, the nonlinear evolution of the particles allows to capture the more complex behaviour of the system and thus leads to more realistic estimates.

Numerical simulation
The ESRF is tested for two cases. The first case is a proof-of-concept where the thermal inertia is derived in a controlled and simplified set-up with an artificial dataset based on a reference solution of the thermophysical model. In the second case it is employed to revisit the analysis of the radiometric data set retrieved by the MASCOT lander from the surface of Near-Earth asteroid (162173) Ryugu [16].

Estimation of Thermal Inertia in a Simplified Model
The aim of this numerical example is to show how the technique performs in a controlled setting. This is achieved by generating an artificial reference temperature profile computed by means of a set of fixed reference parameters. In order to validate the performance of the proposed technique the estimates obtained via the ESRF are compared to the reference temperature variation and reference thermal inertia.

Reference Solution
The reference temperature is simulated using the model given in eq. (1) above with thermal inertia Γ ref = 300 J m −2 K −1 s −1/2 , an albedo of 0.015, emissivity of 1, and Q th of 0. The illumination is calculated by the simple assumption of a spherical asteroid, with equal length of day and night: where I(t) = 0 if cos 2π Ω t < 0 and I max = 800 W/m 2 similar to the illumination conditions on Ryugu. The rotation period Ω can be chosen arbitrarily and was set to the rotation period of Ryugu of 7.63262 h [53].

Initial ensemble
For each ESRF simulation an initial ensemble of M = 50 members is generated and the thermal inertia values of these ensemble members are then drawn from a Gaussian distribution, Γ i = Γ start + ζ i , with ζ i ∼ N (0, α 2 ) and α = 20 J m −2 K −1 s −1/2 . We repeat this procedure for 20 ESRF simulations, sampling Γ start ∼ N (250, 100 2 ) rather than running a single simulation of 1000 Members with a standard deviation of 100 J m −2 K −1 s −1/2 . We found that by doing so we gain a more homogeneous sampling of the initial distribution in parameter space. Furthermore, we save computation time as each of the 20 ESRF simulations converges quicker than a single simulation with a larger ensemble and the individual runs can be evaluated in parallel.
In order to save more computation time, a number of temperature profiles are pre-calculated assuming the parameters given above and varying the thermal inertia between 100 and 500 J m −2 K −1 s −1/2 in steps of 50 J m −2 K −1 s −1/2 . The ensemble member's temperature profiles are than initialised by interpolating from the pre-calculated temperature profiles. These provided a more realistic initial guess for the temperature solution, accelerating the convergence of the PDE-solver.
For each ensemble member the temperature profile is sampled from a Gaussian distribution centred on the interpolated temperature profile with a standard deviation of 1 K. This method ensures that the ensemble is spread sufficiently to evolve through the parameter space while at the same time keeping the temperature profiles close enough to a physical solution to ensure convergence of the differential equation solver.

DA settings
The partial differential equation is solved using the MATLAB ® "pdepe"-solver for a total of 30, 000 equidistant time steps per simulated rotation, i.e. diurnal cycle. This corresponds to a time step ∆t = 0.91 s.
For the Kalman update, 15 observation points are placed equidistantly from noon (τ = 0) to noon. The thermal model is run between these observations for 2000 ∆t to forecast the temperature profile at the next observation, using the thermal inertia from the last Kalman update. The observation error is set to 1 K, which corresponds to setting the associated covariance matrix to R = 1.
The augmented state vector is then given by The thermal inertia evolves as described in eq. (5) and (6), where the parameter α(τ ) is reduced from one simulated rotation to the next. The width is varied from α = 10 J m −2 K −1 s −1/2 in the first rotation to α = 5 J m −2 K −1 s −1/2 in the second, α = 1 J m −2 K −1 s −1/2 in the third, α = 0.5 J m −2 K −1 s −1/2 in the fourth, and α = 0.2 J m −2 K −1 s −1/2 for the remaining rotations. This gradual decrease of α is in line with classic simulated annealing schedules often employed in the context of Monte Carlo methods. The key idea however is very intuitive, i.e., big α help to traverse the parameter space more quickly while they also prevent convergence of the ensemble members. Thus lower α values are chosen as the estimation procedure progresses in order to allow the posterior distribution to converge.

Results
The 20 ESRF simulations were run for 20 asteroid rotations starting with a randomly chosen thermal inertia each. Fig. 2 shows the histogram of the thermal inertia after 10 rotations. During the first few simulated cycles, the parameters spread out wide before converging. The last thermal inertia estimates of all ensemble members over all 20 simulations were combined into the histogram, i.e. 1000 thermal inertia estimates make up the final result of Γ = 299 ± 4 J m −2 K −1 s −1/2 with the uncertainty given by the 2σ bound, where σ is the standard deviation over the thermal inertia set. The thermal inertia of the reference temperature was 300 J m −2 K −1 s −1/2 and could thus be successfully retrieved. Furthermore, the reference temperature could be retrieved well within the assumed 1 K uncertainty. The bottom panel of Fig. 2 shows the temperature estimates at the 15 observation points, where at each point the mean and standard deviation was taken over the last diurnal cycle. The error bar indicates the 2σ uncertainty.
This study demonstrates the working principle of the considered data assimilation algorithm for the retrieval of thermophysical parameters from temperature observations. In the next step the ESRF will be used to revisit the radiometric data obtained on the surface of asteroid (162173) Ryugu by the MARA instrument [15,16].

Thermal Inertia Estimation of Ryugu
The MARA instrument onboard the MASCOT lander observed the infrared flux emitted by the surface of a single, irregularly shaped boulder on Ryugu for a full diurnal cycle. The instrument consists of six infrared bolometers, that are placed behind different infrared filters. The 8 -12 µm (W10) filter was the instrument channel with the highest fidelity and was used for the initial analysis reported in [16]. In that work, the nighttime data was fitted by minimising the χ 2 value that measured the misfit between observed flux and the one predicted by a thermal model. This thermal model included as free parameters the thermal inertia Γ, emissivity ε, the orientation of the surface observed by MARA in terms of azimuth θ and elevation φ of the surface normal, and the view factor to the surrounding terrain f which parametrizes Q th as follows: where the temperature of the surrounding is assumed to be equal to the brightness temperatures observed by MARA, T obs , as described in [16]. The surface orientation had to be included as a free parameter as the observed spot on the irregular boulder showed a rugged texture with various parts of unknown orientation, and thus unknown illumination condition. The parameters θ and φ therefore represent an averaged surface orientation within the field of view of MARA.
The illumination is determined by the scalar product of surface orientation n and (time-dependent) solar vector s: where I 0 is the solar constant and r h is the heliocentric distance.
In [16], the parameter space was sampled by a grid search, where the thermal inertia was varied in steps of 1 J m −2 K −1 s −1/2 . However, due to the high computational cost of the thermal model, the other parameters were varied in significantly coarser steps, i.e. only three emissivity values were considered (0.9, 0.95, 1) along with only two values for f (0 and 0.08). To test the efficiency of our new approach this analysis was repeated using the ESRF.

Forecast settings
The forecasts of parameters and temperatures are again calculated using eq.(1) and a forward model of the free parameters as in eq. (5). The free parameters of the model were chosen analogous to the analysis of [16]: Γ, ε, φ, θ, f . Also, the same grid settings were applied, i.e. we calculate the temperature profiles for a 1D grid with 41 grid points spread over eight diurnal skin depths. The augmented state is then given by: Note that unlike in eq. (20) the surface temperature T (0, t) is not directly observable but connected to the observed surface radiance F W 10 via the instrument function.
where λ is wavelength, B is the Planck function, and q it the MARA filter throughput [15]. The radiance observed by MARA is calculated from the reported brightness temperatures [16], i.e. T in eq. (24) is set to the brightness temperature and ε = 1. Note, that the MARA signal depends linearly on the net flux between MARA and the surface. The calculation of the brightness temperature from the signal incorporates the temperature of the MARA sensor, the instrument field of view, sensitivity, etc., which does not need to be repeated in this study. As in the simplified case, the forecast of the parameters is performed according to eq. (5) and (6). The value is again sampled from a Gaussian distribution where the standard deviation α(τ ) is stepwise reduced. An overview of α for the respective model parameters is provided in table 1.

Data settings
The Kalman updates are performed at nine equidistant points of the nighttime data starting from 17:45 local time, which corresponds to the first data point considered in [16]. The distance between the points is similar to the one in the first study. The temperature is forecast by the thermal model, and converted into radiance received by the MARA W10 filter (F W 10 ), based on the instrument calibration [15,16] and the ensemble member emissivity. The illumination is calculated for each ensemble member based on the angle between the sun vector and the surface normal according to eq. (22), while azimuth θ and elevation φ of the surface normal are updated from observation to observation. Likewise, surface emissivity, the view factor f , and the surface thermal inertia are updated.

Initialisation
The ensemble states are initialised similar to the first case. In total 20 ESRF simulations are performed and for each simulation a thermal inertia is randomly picked from Γ start i ∼ N (300, 100 2 ). In each simulation an ensemble with 50 members is initialised, and for each ensemble member a thermal inertia is sampled from Γ i ∼ N (Γ start , 100 2 ). The thermal inertia is confined to an interval of 150 to 450 J m −2 K −1 s −1/2 , a range that is larger than given by conservative estimates for Ryugu's thermal inertia [51]. For a sample of Γ i > 450, the thermal inertia is set to 450, if Γ i < 150 it is set to 150. The emissivity of the ensemble members is sampled from a Gaussian distribution ε ∼ N (1, 0.02 2 ) and confined to the interval of 0 and 1. Thereby, emissivity values larger one are folded back into the interval, i.e. an ε = 1.05 is set to 0.95 etc. The view factor to the surrounding terrain f is sampled from f ∼ N (0.048, 0.007 2 ) based on the topography of the landing site as described in the methods section of [16].
Azimuth and elevation of the surface normal in the best fitting case of the initial MARA data analysis were found to be 20 •  The temperatures of the ensemble members are initialised by interpolating the temperature profile from precalculated simulations. These pre-calculated simulations were performed for ε = 1, θ = 20, φ = 80, and f = 0, while the thermal inertia was varied between 150 and 450 J m −2 K −1 s −1/2 in steps of 50 J m −2 K −1 s −1/2 . It should be noted here that the resulting initial temperature curves are not consistent with the initial parameter sets of the ensemble members. However, this is not problematic as the ensembles are given enough time to produce consistent solutions. Rather, this initial temperature profile serves as a better first guess for a solution of the 1D-heat conduction equation, e.g. compared to assuming a constant temperature as in [16], and results in a quicker convergence of the temperature solution.

Results
The data assimilation method allowed for a much finer variation of the free parameters, resulting in a more thorough estimate of the thermal inertia. The thermal inertia was found to be 295 ± 18 J m −2 K −1 s −1/2 where the uncertainty corresponds to two standard deviations. This result lies within the range of the former estimate with lower uncertainty. Fig. 3 shows parameter estimates of the ensembles, averaged over time, combining 20 simulations with randomly chosen starting thermal inertia. The histograms displaying the posterior distributions of the various, simultaneously estimated model parameters show a major advantage of this method, which can account for non-Gaussian distributions. The thermal inertia estimate is roughly Gaussian distributed, with a slight tilt towards lower values, accounting for the fact that the effect of thermal inertia on the temperature variation decreases with increasing thermal inertia.
The other parameter distributions contain important information about the observed boulder. The estimated emissivity is very high, between 0.92 and 0.98. This is in line with the extremely dark appearance of Ryugu and the fact that roughness, as observed on the boulder in front of MASCOT, tends to increase the emissivity even further. The estimates for f show that this parameter might have been underestimated in [16], i.e. that the view factor of the observed spot towards the surrounding terrain is up to 10%.
The elevation of the average surface orientation within the MARA field of view is estimated to be between 80 • and 85 • , which is consistent with the camera images of the boulder surface in the field of view [22,46]. The azimuth distribution shows that the most likely values lie within 275 • and 350 • , i.e. oriented towards south-east. This is also the direction of the MARA boresight, which is consistent with the fact that those surface parts oriented towards MARA will contribute most to the signal. Also, due to the roughness effect, such an orientation would result in a systematically lower noon temperature as reported by [16]. The former best-fit azimuth of 20 • is less likely as the very flat peak in the posterior distribution indicates. However, many of the fitting models reported in [16] showed an azimuth similar to the one retrieved in this work.
The figure also shows, that the estimated surface radiance matches the observed one very well. The major improvement of this analysis over the initial one is the full correlation among the parameter estimates and a smooth, simultaneous, and statistically thorough estimation rather than a rough parameter sweep. Despite the fact that 47 parameters are estimated simultaneously, including the 40 sub-surface temperatures, one simulation run requires only 30 minutes on a Laptop with 4 cores, drastically decreasing the computational resources needed.

Convergence of Ensemble Distribution
Since one of the major advantages of using an EnKF variant for the parameter estimation is the increased computational speed, it is important to investigate the convergence of the estimate. Fig. 4 shows the evolution To obtain a stable result, the number of simulations starting with different initial parameter combinations is more important than the length of each simulation. The combined results of 20 simulations, each with 50 ensemble members, converged quicker than the result of a single simulation (not shown in figure). Since the different simulation are independent of each other and can be run in parallel, this saves substantial computation time.

Conclusions
This study introduced data assimilation as a method to derive thermophysical model parameters along with their associated uncertainties from thermal infrared observations. The considered ESRF allows for a simultaneous estimation of the state, i.e., surface and subsurface temperatures, as well as model parameters, i.e. thermal inertia, emissivity, surface orientation etc., based on observed thermal infrared flux. Ensembles generated by the ESRF form a distribution that represents the uncertainties of state and parameters, while automatically including their respective correlations. As the performed forecast step is done on the basis of the thermal model without a linearisation the ensemble is able to better capture the nonlinear relations better than commonly employed techniques such as the Least squares method.
The observations of the MARA instrument onboard the MASCOT lander were revisited in this work applying the ESRF. The results are consistent with the initial analysis of [16] but narrow the range of the thermal inertia estimate to 295 ± 18 J m −2 K −1 s −1/2 . At the same time the emissivity could be constrained to 0.92 − 0.98. The average surface orientation of around 80 • elevation and 290 • azimuth indicate that a significant fraction of the boulder in the MARA field of view is orientated towards the instrument. As these parts of the boulder face away from the sun during day, this result is consistent with reduced daytime temperature and the roughness effect reported in [16].
An advantage of the ESRF scheme is its computational design to cope with large dimensions of state and parameter spaces, while being robust for nonlinear systems. The possibility of estimating many parameters simultaneously enhances the scientific output of the remote sensing data. The parameters retrieved from the MARA observations are more accurate than previous estimates, as the ESRF discards unlikely parameter combinations and incorporates their correlations. At the same time, the parameters were sampled from a wide section of the parameter space and allowed to vary freely, limited only by basic physical limits, which should ensure that the parameter space was sufficiently covered.
The features of the family of EnKFs, i.e., coping with large dimensions of state and parameter spaces, provide great flexibility. The efficiency in handling many model parameters simultaneously sets the ESRF method apart from other Bayesian Monte-Carlo methods such as the Markov-Chain Monte Carlo [1] or Particle filters [36]. Further applications may include other thermal models and sets of parameters, e.g. such as modelling multiple ground layers with different thermal conductivity, or including temperature dependent models of thermal conductivity and heat capacity [50]. This will allow for an improved estimation of thermal properties on many other objects in the solar system such as Mars, the Moon, or Comets.