New-generation Maximum Entropy Method (ngMEM): a Lagrangian-based algorithm for dynamic reconstruction of interferometric data

Imaging interferometric data in radio astronomy requires non-linear algorithms that rely on different assumptions about the source structure and may produce non-unique results. This is especially true for Very Long Baseline Interferometry (VLBI) observations, where the sampling of Fourier space is very sparse. A basic tenet of standard VLBI imaging techniques is the assumption that the observed source structure does not evolve during the observation. However, the recent VLBI results on the supermassive black hole (SMBH) at our Galactic Center (Sagittarius A$^*$, Sgr A*), reported by the Event Horizon Telescope Collaboration (EHTC), call for the development of dynamic imaging algorithms, since this source exhibits variability on minute timescales. In this paper, we introduce a new non-convex optimization problem that extends the standard Maximum Entropy Method (MEM) to reconstruct intra-observation dynamical images from interferometric data that evolve at every integration time. We present a rigorous mathematical formalism to solve the problem via the primal-dual approach, build a Newton strategy, and give its numerical complexity. We also give a strategy to iteratively improve the obtained solution and, finally, we define a novel figure of merit to evaluate the quality of the recovered solution. We then test the algorithm, called ngMEM, on different synthetic datasets of increasing difficulty, and compare it with another well-established dynamical imaging method. Within this comparison, we identified a significant improvement in the ngMEM reconstructions. Moreover, the evaluation of the integration-time evolution scheme and the time contribution were shown to play a crucial role in obtaining good dynamic reconstructions.


INTRODUCTION
Very Long Baseline Interferometry (VLBI) is a technique that combines telescopes, separated by large distances, to synthesize the aperture of a single telescope with a size similar to the maximum distance among the VLBI elements (Thompson et al. 2017). With such large apertures, it is possible to obtain images of radio sources with resolutions as high as a few tens of µas (e.g. EHTC 2019a, 2022a; Gómez et al. 2022; Lu et al. 2023). However, VLBI only sparsely samples the Fourier components of an image, which may introduce severe degeneracies in the ill-posed inverse problem of image deconvolution (Kellermann & Moran 2001; Ryle 1962; Thompson et al. 2017). In addition to this, Earth rotation synthesis assumes that the source being imaged is static during the experiment, which typically lasts hours. But there are cases of sources showing fast structural changes even within intra-hour observations, as happens with the supermassive black hole (SMBH) Sagittarius A* (Sgr A*) in the Galactic Center (GC), recently imaged by the Event Horizon Telescope Collaboration (EHTC) (EHTC 2022c; Knollmüller et al. 2023). This source has an estimated mass of M = 4 × 10^6 M⊙ (Ghez et al. 2008; Gillessen et al. 2009), and its gravitational timescale is only a few seconds. Moreover, its innermost stable circular orbit has periods of only 4 to 30 minutes, depending on the black hole spin (Bardeen et al. 1972). Even its polarization presents fast variations on similar timescales (e.g. Gravity Collaboration et al. 2023; Eckart et al. 2006; Johnson et al. 2015; Marrone et al. 2006; Wielgus et al. 2022; Zamaninasab et al. 2010).
⋆ Contact e-mail: alejandro.mus@uv.es
Reconstructing images via deconvolution from sparse arrays, such as the Event Horizon Telescope (EHT), is an ill-posed inverse problem based on some hypotheses about the underlying image, for example, that the source is not evolving during the observation. For static sources, the traditional techniques, such as those based on CLEAN (Clark 1980; Högbom 1974; Müller & Lobanov 2023a) and those based on Maximum Entropy Methods (MEM) (Cornwell & Evans 1985; Holdaway & Wardle 1988), have proven to work very well for decades. However, if the source is changing in time, the measurements no longer satisfy the imaging assumptions, and these reconstruction algorithms may fail.
In the context of VLBI, there has been growing development of different algorithms to image time-variable sources. Conventional snapshot imaging methods have been used to reconstruct micro-quasars (for instance, Miller-Jones et al. 2019) and many other stellar objects. Lu et al. (2016) presented a more complex strategy, a pioneering work on recovering a (2+1)-dimensional image based on a time-averaged morphology of Sgr A* at EHT scales. That is, they take into account the time dependence of the source during the imaging process. Recently, the EHTC (2022c) applied a similar strategy to obtain a time-averaged image of this source over long observations. However, these techniques present some limitations, as the variability of the source injects noise into the data that is hard to model. To address this challenge, a new generation of imaging algorithms has emerged, aiming to overcome the rapid variability by reconstructing time-resolved brightness distributions, often referred to as "movies" (e.g. Arras et al. 2022; Bouman et al. 2018; Farah et al. 2022; Johnson et al. 2017; Leong et al. 2023; Levis et al. 2021, 2022; Müller & Lobanov 2023b).
In the near future, with denser arrays such as the next-generation EHT (ngEHT; Doeleman et al. 2019; Johnson et al. 2023a,b), together with the increase in receiver sensitivity and computing capabilities, it will become more standard to reconstruct not only static images but movies of the sources. In this work, we present a new imaging algorithm, the new-generation Maximum Entropy Method (ngMEM), rooted in the MEM principle. It extends the optimization framework proposed by Cornwell & Evans (1985) to encompass both static and dynamic reconstructions of evolving sources.
In addition, this paper surveys diverse methods for dynamic VLBI imaging, extensively discusses the minimization of the ngMEM functional using a primal-dual concept, and explores often-overlooked convergence criteria, strategies for initiating the minimization, and ways of ensuring its convergence.
The organization is as follows: First, in Sect. 2, we recall the VLBI basics. Then, in Sect. 3, we present the main families of static imaging methods and, in more detail, the MEM appearing in Cornwell & Evans (1985). In Sect. 4, we give a brief overview of some current dynamic imaging algorithms. In Sect. 5, we present the ngMEM mathematical formulation; we give a numerical method to solve it, state its convergence, and discuss the dependence on the initial point. Then, we compare it with current similar algorithms. In Sect. 6, we apply the algorithm to two complex synthetic data cases: a ring with an orbiting hotspot with a 24-hour period and a ring with a transient hotspot, emulating the behavior of Sgr A* inferred from the results presented in Wielgus et al. (2022). To conclude the evaluation of the ngMEM, in Sect. 8 we conduct a fiducial benchmark by comparing its results with those obtained using a well-established method. Finally, Sect. 9 presents the conclusions and outlook of this work.

VLBI MEASUREMENTS AND IMAGING
An interferometer consists of T telescopes observing the same source at the same time. Any pair of telescopes (which constitutes a two-element interferometer) is called a baseline. Each baseline, at each integration time, measures a complex visibility, V_ab, where a and b denote the telescope indices. Such visibilities are the time-averaged cross-correlation of the recorded scalar electric field at the two telescopes, and they are related to the brightness distribution on the sky, I(x), by the van Cittert-Zernike theorem (Thompson et al. 2017),
$$V(u, v) := \int I(\vec{x})\, e^{-2\pi i\, (xu + yv)}\, d\vec{x},$$
where (x, y) are the direction cosines centered on the source, in radians, and (u, v) is the baseline vector, in wavelengths, projected on the sky plane.
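The van Cittert-Zernike relation above can be sketched numerically on a pixel grid. The following is a minimal illustration (function and variable names are ours, not from the paper): each visibility is the Fourier transform of the discretized brightness distribution evaluated at the (u, v) point sampled by one baseline.

```python
import numpy as np

def visibility(image, x, y, u, v):
    """Complex visibility V(u, v) for pixel intensities `image` located at
    direction cosines (x, y) [rad], for baseline coordinates (u, v)
    [wavelengths]. A direct (slow) evaluation of the Fourier sum."""
    phase = -2j * np.pi * (x * u + y * v)
    return np.sum(image * np.exp(phase))

# Sanity check: a 1 Jy point source at the phase center gives |V| = 1
# on every baseline, since the phase term vanishes at (x, y) = (0, 0).
x = y = np.array([0.0])
image = np.array([1.0])
V = visibility(image, x, y, u=1e9, v=-2e8)
```

In practice, imaging codes evaluate this sum for all sampled (u, v) points at once (or use gridded FFTs), but the pixel-wise form above is the one that appears in the χ² expressions later in the paper.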
In the case where the image presents structural time-dependent changes during the observation, the sampled visibilities V_t at a fixed time t only represent the corresponding instantaneous image or "snapshot". Therefore, a set of snapshots can be reconstructed along the experiment. If T is small, as in the case of the EHT in the 2017 campaign, where 7 and 8 sites observed the SMBH Messier 87* (M87*) and Sgr A*, respectively (see for instance EHTC 2019a, EHTC 2022a), there may be strong degeneracies that prevent a robust imaging of a time-variable source.

STATIC IMAGING RECONSTRUCTION
In this section, we give a brief overview of the main methods used in static imaging reconstruction to contextualize and motivate dynamic imaging methods. As in EHTC (2019c), we distinguish two main families of static imaging methods: backward modeling and forward modeling.

Backward modeling
CLEAN (Högbom 1974) and its variants (for instance, Clark 1980) are among the most widely used algorithms for imaging. These iterative deconvolution methods heavily rely on algorithmic parameters such as the CLEANing regions. In addition, they lack a closed mathematical formulation. Another significant limitation is their assumption that the source can be decomposed into a finite set of delta components. This may be insufficient to accurately describe the extended image features observed in real astronomical images (Arras et al. 2021; Müller & Lobanov 2023a).
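The delta-component decomposition at the heart of CLEAN can be illustrated with a minimal sketch of the Högbom minor cycle (this is a toy version for illustration, not a production implementation): repeatedly locate the residual peak, subtract a gain-scaled, shifted copy of the dirty beam, and accumulate the subtracted flux as a delta component.

```python
import numpy as np

def hogbom_clean(dirty, beam, gain=0.1, niter=100, threshold=1e-3):
    """Toy Hogbom CLEAN minor cycle. `beam` is the dirty beam with its
    peak assumed at the central pixel. Returns (components, residual)."""
    residual = dirty.copy()
    components = np.zeros_like(dirty)
    pc = np.array(beam.shape) // 2  # beam peak position
    for _ in range(niter):
        peak = np.unravel_index(np.argmax(np.abs(residual)), residual.shape)
        if np.abs(residual[peak]) < threshold:
            break
        flux = gain * residual[peak]
        components[peak] += flux
        # subtract the shifted, scaled beam over the overlapping region
        for (i, j), val in np.ndenumerate(beam):
            y, x = peak[0] + i - pc[0], peak[1] + j - pc[1]
            if 0 <= y < residual.shape[0] and 0 <= x < residual.shape[1]:
                residual[y, x] -= flux * val
    return components, residual

# With an ideal (delta) beam, CLEAN recovers a single point source exactly.
beam = np.zeros((3, 3)); beam[1, 1] = 1.0
dirty = np.zeros((5, 5)); dirty[2, 3] = 1.0
comps, res = hogbom_clean(dirty, beam, gain=0.2, niter=200, threshold=1e-4)
```

The "CLEANing regions" mentioned above would enter this sketch as a mask restricting where the peak search is allowed, which is exactly the kind of user-chosen parameter the text criticizes.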
There have been efforts to overcome this limitation by using multiscale algorithms that model the image as a sum of extended basis functions of different scales (Bhatnagar et al. 2004; Cornwell 2008; Rau & Cornwell 2011; Starck et al. 1994; Wakker & Schwarz 1988). Although these methods, called MS-CLEAN for multiscale CLEAN, represent a significant advancement in imaging, they have not been extensively utilized in frontline VLBI. This is primarily due to the challenges associated with selecting appropriate basis functions, as different scales are sensitive to different regions of the uv-coverage. Additionally, MS-CLEAN methods do not inherently address the issue of missing regularization in CLEAN, specifically the occurrence of unphysical fits in the gaps of the uv-coverage. Recently, Müller & Lobanov (2023a) developed a new MS-CLEAN algorithm to overcome this issue.

Forward modeling
We focus first on the non-convex optimization methods. These algorithms offer a high degree of flexibility by allowing the integration of complex data relations. Within this family, two main types can be identified: constrained optimization methods and unconstrained (penalty) methods, also called Regularized Maximum Likelihood (RML) methods.

Non-linear constrained optimization
On the one hand, constrained optimization methods, such as the Maximum Entropy Method (e.g. Cornwell & Evans 1985; Gull & Skilling 1985; Narayan & Nityananda 1986) and its variants, may include non-linear constraints through the use of Lagrange multipliers. In general, they seek to maximize some metric, S(p), subject to a set of constraints g_i(p), and their typical formulation is

Problem 1. (Constrained formulation)
$$\max_{p} \; S(p) \quad \text{subject to} \quad g_i(p) = 0, \quad i = 1, \ldots, m.$$

Non-linear unconstrained optimization
On the other hand, RML methods can represent complex data relations via regularizers such as the ℓ1 norm (Honma et al. 2014) or the total squared variation (Kuramochi et al. 2018), or even combinations thereof (as in Akiyama et al. 2017a; Chael et al. 2016). These functionals may not be easily modeled as constraints. The general formulation of the RML methods is

Problem 2. (RML formulation)
$$\min_{p} \; f(p) + \sum_{k} \alpha_k R_k(p),$$
where f is the data fidelity term (in general, the χ² statistic based on visibilities, closure phases, closure amplitudes, etc.), α_k are fixed penalty parameters (or regularizer weights) and R_k(p) are conditions (such as entropy, smoothness, or sparsity) on the pixel distribution p.
A fine tuning of the penalty terms α_k is performed in order to find the best combination, which may depend on the particularities of the Fourier coverage and the source structure.
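An RML objective of the kind in Prob. (2) can be sketched as follows (a hedged illustration: the Fourier matrix `A`, the total-variation regularizer, and the weight `alpha_tv` are our illustrative choices, not the paper's specific setup):

```python
import numpy as np

def chi2(p, A, vis, w):
    """Data fidelity: weighted chi^2 between model visibilities A @ p
    and observed visibilities `vis`, with statistical weights `w`."""
    r = A @ p - vis
    return np.sum(w * np.abs(r) ** 2)

def total_variation(p2d):
    """One common regularizer R(p): sum of absolute pixel differences."""
    return (np.sum(np.abs(np.diff(p2d, axis=0)))
            + np.sum(np.abs(np.diff(p2d, axis=1))))

def rml_objective(p, A, vis, w, alpha_tv, shape):
    """f(p) + alpha * R(p), the generic RML cost of Prob. (2)."""
    return chi2(p, A, vis, w) + alpha_tv * total_variation(p.reshape(shape))

# A perfectly fitting flat image has zero chi^2 and zero TV, so the cost is 0.
p = np.full(16, 0.5)
A = np.eye(16)          # toy "Fourier" operator for the sanity check
vis = A @ p
w = np.ones(16)
cost = rml_objective(p, A, vis, w, alpha_tv=0.1, shape=(4, 4))
```

The fine tuning of α_k described above amounts to re-running the minimization of this cost over a grid of `alpha_tv`-like weights and selecting the best combination.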

MEM: Cornwell version
Cornwell & Evans (1985) presented a pioneering use of Maximum Entropy in radio imaging. In their work, the MEM is formulated in the form of Prob. (1). If we define a set of visibilities, V := {V_i}, with associated statistical weights, {ω_i}, and we consider the vector of pixel intensities p := {p_k}, located at the position offsets {(x_k, y_k)} on the sky (with respect to the phase center of the correlation), we fix the objective function S(p) to be a statistical measure. A typical choice is the relative entropy (or Kullback-Leibler, KL, divergence; Kullback & Leibler 1951),
$$S(p) = -\sum_{k} p_k \ln\left(\frac{p_k}{M_k}\right),$$
where M_k is the k-th pixel of the a priori model, M. Alternative versions of S(p) and a deeper discussion can be found in Narayan & Nityananda (1986); Thompson et al. (2017).
To adjust the entropy to the data, two equality constraints are established: the expected total flux, F_mod, should be equal to the recovered flux, F_obs, and the χ² of the observed visibilities V with respect to a model V_mod should be close to its expected value Ω. Recall that the χ² of the visibilities is defined as
$$\chi^2 = \sum_{j} \omega_j \left| V_j - \sum_{k} p_k\, e^{-2\pi i\, (x_k u_j + y_k v_j)} \right|^2.$$
In this equation, u_j and v_j are the (u, v) coordinates (in units of wavelength) of the j-th visibility, (x_k, y_k) are given in radians and i is the imaginary unit. With this setting, Cornwell & Evans (1985) state the following non-convex constrained optimization primal problem:

Problem 3. (MEM Problem)
$$\max_{p} \; S(p) \quad \text{subject to} \quad \sum_{k} p_k = F_{obs}, \quad \chi^2(p) = \Omega.$$

To solve Prob. (3), the Lagrangian dual approach is used, whose formulation reads
$$L(p, \lambda, \beta) = S(p) - \lambda \left( \chi^2(p) - \Omega \right) - \beta \left( \sum_{k} p_k - F_{obs} \right),$$
where λ and β are Lagrange multipliers. Cornwell & Evans (1985) solve the optimization problem using a quasi-Newton method, updating the Lagrange multipliers in each iteration such that the Lagrangian is minimized with respect to λ and β; thanks to this, the cost function is maximized with respect to the p vector. Because of technical limitations, they work with a Hessian matrix with entries only on the upper, main and lower diagonals. For a sparse array like the EHT, which produces a poor dirty image, an important amount of information may be lost with this Hessian simplification. Our work extends this problem.
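The pieces of the Cornwell-Evans-style Lagrangian can be sketched numerically as follows (an illustrative evaluation only, assuming the relative-entropy objective and the two equality constraints described above; the function names are ours):

```python
import numpy as np

def entropy(p, M):
    """Relative (KL) entropy of pixel vector p against prior model M."""
    return -np.sum(p * np.log(p / M))

def lagrangian(p, M, chi2_val, Omega, F_obs, lam, beta):
    """Entropy minus the multiplier terms for the chi^2 and flux constraints."""
    return (entropy(p, M)
            - lam * (chi2_val - Omega)
            - beta * (np.sum(p) - F_obs))

# When p equals the prior and both constraints are satisfied, every term
# of the Lagrangian vanishes, regardless of the multiplier values.
p = M = np.full(4, 0.25)
L = lagrangian(p, M, chi2_val=10.0, Omega=10.0, F_obs=1.0, lam=2.0, beta=3.0)
```

Maximizing this expression over p while adjusting (λ, β) to enforce the constraints is the quasi-Newton scheme the text attributes to Cornwell & Evans (1985).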
Note that this is a non-convex constrained optimization problem, since the χ² constraint is not affine (Boyd & Vandenberghe 2004). This may produce, among other consequences, a set of local minima instead of a unique global minimum. Therefore, special caution must be taken during the process of solving it, in particular when using local search methods, i.e., algorithms that use the information of the gradient or the Hessian. For those cases, a starting point near a locally convex neighborhood is needed to guarantee convergence to a local minimum. In interferometric observations at optical wavelengths, image reconstruction algorithms that overcome this limitation (by recovering a set of local minima) are already available (e.g. Ozon et al. 2016). More recent results have also been published for VLBI (Müller et al. 2023b).
Extensions of the formulation by Cornwell & Evans (1985) have not only been used in the radio regime. Buscher (1994) developed the MEM in the optical regime to take into account the bispectrum, which later inspired the inclusion of closure quantities and MEM-based priors in the VLBI imaging community (Akiyama et al. 2017a; Chael et al. 2018b). Holdaway & Wardle (1988) and Holdaway (PhD Thesis, 1990) extended for the first time the method proposed by Cornwell & Evans (1985) to include the polarized entropy defined in Gull & Skilling (1985) for the reconstruction of polarization images from VLBI data. More recent works have been done in this area, for example Akiyama et al. (2017b), Chael et al. (2016), Coughlan & Gabuzda (2013) and Mus et al. (Submitted). Polarimetry is out of the scope of this paper, and we refer the interested reader to the bibliography.

Bayesian and machine learning algorithms
The third family of interferometric deconvolution algorithms is based on Bayesian exploration of the posterior distribution of the source's structural parameters, using techniques such as Markov Chain Monte Carlo (MCMC). They aim to find the reconstruction that best represents the signal and have the capability of providing uncertainty quantification. Some recent examples are Arras et al. (2021), Broderick et al. (2020) and Tiede (2022).
Further works have been done on forward-modeling algorithms (e.g. Aghabiglou et al. 2023; Leong et al. 2023; Terris et al. 2023), but it is out of the scope of this paper to give a full account of them.

SOME PREVIOUS WORKS ON DYNAMIC IMAGING
The development of novel dynamic imaging algorithms has been an active area driven by EHT science. The possibility of observing Sgr A* with the resolution of the EHT has been a strong motivation for the development of such algorithms. Roelofs et al. (2023) designed a challenge of synthetic datasets based on realistic observations and performed a benchmark comparison of different static and dynamic reconstructions.
In this section, we give a brief overview of previous works on dynamic imaging following the classification of Sect. 3. It is out of the scope of this paper to give a full review of all the methods. Instead, the presented summary aims to contextualize the algorithm we introduce in this work.

Backward modeling
Recently, there has been an effort to extend the traditional CLEAN algorithm to the dynamic regime, in order to reconstruct movies. The common technique implies that, after the imaging process reaches convergence based on a specified stopping criterion and produces an average static image, a CLEAN dynamic imaging step is employed. This involves dividing the observation into smaller portions, referred to as "snapshots", with a time duration similar to the expected variability timescale of the source. Assuming minimal structural changes over time, the model corresponding to the static image serves as the initial model. The goal is to identify structural changes by CLEANing the residual map associated with each data snapshot (for an example of a similar method, see Mus et al. 2022; Stewart et al. 2011). However, this technique may be very limited in cases of sparse uv-coverage, as is the case for the EHT. Farah et al. (2022) introduce a new metric to overcome this limitation, ranking array configurations by their ability to produce accurate CLEAN dynamical reconstructions.

Forward modeling
In the context of non-linear optimization methods, efforts have been devoted to extending the modeling of the problem to take into account the kinematics of the source. These efforts have mainly targeted the RML methods. Their flexibility allows adding new regularizers (or penalty terms) to the cost functional to deal with the kinematics of the source (see for example Johnson et al. 2017). In these methods, the observation can be thought of as a set of static snapshots (or frames) that compose the "movie". These methods usually define a metric over the set of frames, and the regularizer terms impose similarity between adjacent frames and penalize distant ones. The inclusion of temporal regularization in a model can result in the suppression of inherent source variability, but the extent to which this occurs depends on the specific source structure and timescale of variability. Determining the threshold at which the regularization becomes excessive is subjective. In Sect. 8, we give two examples of time regularizers appearing in Johnson et al. (2017), and we refer to that section for more details.
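The frame-similarity idea described above can be sketched with one simple (illustrative) temporal regularizer: penalizing the squared differences between adjacent frames, so that nearby snapshots stay similar while distant ones are unconstrained. This is only one of many possible forms, not the specific regularizers of Johnson et al. (2017).

```python
import numpy as np

def r_time(frames):
    """Toy temporal regularizer: sum of squared pixel differences between
    adjacent frames. `frames` has shape (n_frames, n_pixels)."""
    return np.sum((frames[1:] - frames[:-1]) ** 2)

# A static movie is not penalized; a single one-pixel jump of amplitude 1
# between the last two frames contributes exactly 1 to the penalty.
movie = np.array([[1.0, 1.0],
                  [1.0, 1.0],
                  [2.0, 1.0]])
penalty = r_time(movie)
```

Adding `alpha * r_time(frames)` to a snapshot-wise χ² gives the generic dynamic-RML cost the text describes; too large an `alpha` suppresses real variability, which is the trade-off discussed above.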
A slightly different approach to modeling the optimization problem is presented in Bouman et al. (2018), where StarWarps is introduced. This method utilizes a probabilistic graphical model to generate the snapshots of a movie by solving the posterior probability distribution. This distribution is defined as a combination of three factors: the data likelihood, multivariate Gaussian distributions for each snapshot, and transitional probabilities between adjacent snapshots. These transitional probabilities effectively serve as spatial and temporal regularization. By leveraging a Gaussian approximation, StarWarps allows for the exact inference of the movie by computing the mean and covariance of the image over pixels that are nearby in brightness and in time. In contrast, RML dynamic reconstruction only derives a maximum a-posteriori estimate.
Other RML-based techniques have been developed by Müller & Lobanov (2022) (an RML approach based on compressed sensing), Mus et al. (Submitted) (a multiobjective optimization formulation extending the one presented in Müller et al. (2023b) by including dynamic and polarimetric terms) and Leong et al. (2023) (neural networks). In all cases, dynamic non-linear optimization methods solve instances of Prob. (2). In this work, we present a novel algorithm that deals with the kinematics by generalizing Prob. (3).
In the context of Bayesian algorithms, Arras et al. (2022) extended resolve, the algorithm they presented in Arras et al. (2021), to infer the time variability of the source's brightness distribution by exploiting its correlation structure in time, and hence assuming correlation between close-by frames. Using this algorithm, Knollmüller et al. (2023) recently presented the first dynamic reconstructions of Sgr A*.

ngMEM FORMALISM
In this section, we present the mathematical formalism of our novel algorithm for the dynamic reconstruction of VLBI interferometric data, the so-called new-generation Maximum Entropy Method (or ngMEM). First, we introduce the main idea of the problem. Then, we develop the mathematical formalism of the ngMEM by extending the cost functional to take into account the kinematics of the source. Finally, we give the main differences between current dynamical deconvolution algorithms and ours.
The algorithm we propose, as a generalization of Prob. (3), addresses a non-convex optimization problem. The inclusion of the χ² equality constraint results in the loss of convexity in the problem formulation, as happens in Cornwell & Evans (1985). Non-convexity is a characteristic that makes it especially difficult to determine and obtain, if it exists, the global optimal value of the cost functional. However, we can still benefit from the differentiability of the objective function and the constraints. It is beyond the scope of this paper to go into detail on the optimality conditions for non-linear optimization problems and to prove constraint qualifications (we refer to Bonnans et al. 2006 for a detailed discussion). Suffice it to say that our problem satisfies the sufficient and necessary optimality conditions, i.e., every local solution has its paired Lagrange multipliers. Therefore, if we find a solution of the dual problem, we have its paired primal solution.

Time-dependent (dynamic) ngMEM
To extend the traditional MEM to reconstruct movies, we have included a new term in the cost function that constrains the evolution of the source brightness distribution, penalized by a "time regularizer". When the regularizer is zero, the method might produce an unrealistic variability (since we would be applying the MEM to each snapshot independently). However, if the regularizer is too large, we would recover a quasi-static movie (since all pixels would be forced to remain as constant as possible). Later in this section, we explain in detail how we handle this new term.
Before presenting the functional to be optimized, let us divide the set of visibilities into r subsets, defined as those where the visibilities were observed within a given time window. In particular, if t is the observing time,
$$V_l := \left\{ V(t) \; : \; t^l \leq t < t^l + \Delta t_l \right\}, \quad l = 1, \ldots, r.$$
In this equation, Δt_l is a frame-dependent scalar that determines its duration and V_l are the visibilities belonging to such a frame. Let us model these data as a set of "image keyframes", where there is one image keyframe for each data frame. The model will have a total of r × N² parameters (i.e., r images of N² pixels each). We can solve this problem of dynamic imaging (i.e., find the set of image keyframes that optimally fit the data) by using the following formalism. If p^k_i is the i-th pixel of the k-th image keyframe, we can arrange all the model parameters into a "supervector" ⃗Q, defined as
$$\vec{Q} := \left( p^1_1, \ldots, p^1_{N^2},\; p^2_1, \ldots, p^2_{N^2},\; \ldots,\; p^r_1, \ldots, p^r_{N^2} \right).$$
In essence, we have ordered the pixels keyframe by keyframe into ⃗Q. With this, we can define what we call the Shannon (Shannon & Weaver 1949) "time entropy", T : R^{rN²}_+ → R_+, which promotes the most constant evolution of the pixel brightness distribution during the whole experiment, via
$$T(\vec{Q}) = -\sum_{l=1}^{r} \sum_{m=l+1}^{r} W_{lm} \sum_{i=1}^{N^2} \log\left( \left( p^m_i - p^l_i \right)^2 + C \right), \qquad W_{lm} = e^{-\left( t^m - t^l \right)^2 / \left( 2\tau^2 \right)},$$
where τ, which we call the "image memory", will be discussed later, and C is a small positive constant to ensure finite values of the logarithm. We notice that, for clarity, the superindices refer to time keyframes, while the subindices refer to the pixel index in the image. On the one hand, T preserves the same properties as J with respect to the p component. This is a great advantage. Since the maximization of the original entropy (Prob. (3)) forces the image to have the minimum contrast that is compatible with the data (i.e., a higher entropy corresponds to a smaller difference in the intensities among pixels, e.g. Cornwell & Evans 1985), adding the time entropy to the functional will similarly force the pixels to have the minimum time variation that is still compatible with the data.
On the other hand, the exponential injects correlations between the intensity values that any given pixel may take at different times. For keyframes with time separations of the order of (or shorter than) τ, the time entropy will force any pixel to have an intensity with a minimum time variation. However, for keyframes separated by a time lag longer than τ, the time differences of the pixels will have little impact on the time entropy. Increasing the value of τ allows the time entropy to force smaller time variability of the pixel intensities at longer timescales. Gaussian weighting is just one possible way to give more leverage to closer keyframes, but it has some advantages: 1) it is conceptually simple; 2) it is C^∞ (infinitely differentiable) and easy to differentiate; 3) it is not computationally expensive; and 4) the Gaussian distribution maximizes the entropy for a given mean and τ (therefore, we maximize entropy not only in the pixel time-dependent domain, but also in the frame-distribution time domain). Thus, we can construct the cost function as the sum of the two Shannon entropies,
$$J(\vec{Q}) = S(\vec{Q}) + \mu\, T(\vec{Q}),$$
with µ ∈ R+.
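The Gaussian weighting of keyframe pairs can be sketched as follows (an illustrative implementation assuming the Gaussian-weighted logarithmic form discussed above; the constants `tau` and `C` are placeholder values):

```python
import numpy as np

def time_entropy(Q, times, tau=1.0, C=1e-8):
    """Toy Gaussian-weighted time entropy. Q has shape (r, npix): one row
    per keyframe; `times` are the r keyframe times. Pairs of keyframes
    separated by much more than tau receive negligible weight."""
    r = len(times)
    T = 0.0
    for l in range(r):
        for m in range(l + 1, r):
            w = np.exp(-((times[m] - times[l]) ** 2) / (2.0 * tau ** 2))
            T -= w * np.sum(np.log((Q[m] - Q[l]) ** 2 + C))
    return T

# A static movie (identical keyframes) has higher time entropy than a
# strongly varying one, so maximizing T favors minimal time variation.
t = np.array([0.0, 1.0, 2.0])
Q_static = np.ones((3, 4))
Q_var = np.ones((3, 4)); Q_var[1] = 2.0; Q_var[2] = 3.0
```

Note how the small constant `C` keeps the logarithm finite when two keyframes have identical pixel values, exactly the role the text assigns to it.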
Henceforth, the optimization problem now reads

Problem 5. (ngMEM Problem, time-dependent)
$$\max_{\vec{Q}} \; J(\vec{Q}) \quad \text{subject to} \quad \sum_{i} p^l_i = F^l_{mod}, \; l = 1, \ldots, r, \quad \chi^2(\vec{Q}) = \Omega,$$

where F^l_mod is the total flux density of the source at keyframe l, and the parameter µ is the "weight" of the time entropy, meaning that larger values will force a smaller time variability in the model. This number is fixed before solving the optimization problem (it is a constant for J). The source model is computed at each integration time (or "frame", which shall not be confused with the "keyframes" in Eq. (4)) by using a pixel-wise time interpolation of the keyframes. For instance, if we use a linear time interpolation, the model image at time t, with t ∈ (t^l, t^{l+1}), is
$$p_i(t) = p^l_i + \frac{t - t^l}{t^{l+1} - t^l} \left( p^{l+1}_i - p^l_i \right).$$
This way, the χ² is computed using the image interpolation at each exact visibility observing time (i.e., at each frame), even though one keyframe may include several integration times.
The final reconstruction will be affected by how the keyframes are interpolated onto each integration time (frame). Currently, two interpolation modes are implemented (nearest and piece-wise linear), although the algorithm can be easily adapted to other interpolation modes, like splines.
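The two implemented interpolation modes can be sketched as follows (an illustrative reading of the scheme above; the function signature is ours):

```python
import numpy as np

def interp_keyframes(t, key_times, keyframes, mode="linear"):
    """Evaluate the model image at frame time t from the bracketing
    keyframes. `keyframes` has shape (r, npix); `key_times` is sorted."""
    key_times = np.asarray(key_times)
    l = np.searchsorted(key_times, t, side="right") - 1
    l = int(np.clip(l, 0, len(key_times) - 2))
    t0, t1 = key_times[l], key_times[l + 1]
    if mode == "nearest":
        return keyframes[l] if (t - t0) <= (t1 - t) else keyframes[l + 1]
    # piece-wise linear: pixel-wise blend of the two bracketing keyframes
    w = (t - t0) / (t1 - t0)
    return (1.0 - w) * keyframes[l] + w * keyframes[l + 1]

# Two keyframes at t = 0 and t = 2; the linear mode at t = 1 returns
# the pixel-wise midpoint, while nearest at t = 0.5 returns keyframe 0.
key_times = [0.0, 2.0]
keyframes = np.array([[0.0, 0.0], [2.0, 4.0]])
mid = interp_keyframes(1.0, key_times, keyframes, mode="linear")
near = interp_keyframes(0.5, key_times, keyframes, mode="nearest")
```

In the ngMEM cost, this evaluation happens once per visibility integration time, so the χ² at a frame inside (t^l, t^{l+1}) depends on both bracketing keyframes.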
Moreover, we have a set of equality constraints on the flux density, which means that ngMEM is able to use lightcurve information as a constraint on the model. This feature may be especially important for EHT observations in which ALMA participates, since precise lightcurves can be derived from the ALMA-only observations (Mus et al. 2022; Wielgus et al. 2022).

Numerical method for solving the ngMEM problem
It can be shown that the Lagrangian dual of Prob. (5) is

Problem 6. (Dual of the time-dependent ngMEM Problem)
$$\min_{\lambda, \vec{\beta}} \; \max_{\vec{Q}} \; L(\vec{Q}, \lambda, \vec{\beta}) = J(\vec{Q}) - \lambda \left( \chi^2(\vec{Q}) - \Omega \right) - \sum_{l=1}^{r} \beta_l \left( \sum_{i} p^l_i - F^l_{mod} \right).$$

Details on the derivation of the dual problem and the zero-gap proof are available in Mus (PhD Thesis, 2023). That means that the solution of Prob. (5) can be recovered by solving Prob. (6). We have denoted by ⃗β the vector of Lagrange multipliers acting on the lightcurve constraints and by β_l the specific multiplier acting at keyframe l.
We solve Prob. (6) via Newton's method on a locally convex neighborhood. Thus, we need to compute the gradient and the Hessian of the Lagrangian. Since the entropy and χ² derivatives are well known (for instance Cornwell & Evans 1985; Narayan & Nityananda 1986), we only present here the ones corresponding to the new term, T.
The gradient is
$$\frac{\partial T}{\partial p^n_u} = -2 \sum_{m \neq n} W_{nm}\, \frac{p^n_u - p^m_u}{\left( p^n_u - p^m_u \right)^2 + C},$$
and the second derivative is
$$\frac{\partial^2 T}{\partial p^n_u\, \partial p^m_v} = \delta_{uv}\, \frac{\partial}{\partial p^m_u} \left( \frac{\partial T}{\partial p^n_u} \right),$$
where δ_uv is the Kronecker delta. Notice that, according to the Kronecker delta, the second derivative of the time entropy is non-zero only if u = v (i.e., the time entropy only relates the change in time of each pixel, but not among different pixels). The time entropy introduces non-zero terms in H that fall outside the diagonal boxes related to the χ²_l. Actually, H can be written as a matrix of matrices, in the following way:
$$H = \begin{pmatrix} H_1 & T_{12} & \cdots & T_{1r} \\ T_{21} & H_2 & \cdots & T_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ T_{r1} & T_{r2} & \cdots & H_r \end{pmatrix},$$
where H_l, l = 1, ..., r, is the Hessian of the Lagrangian in the l-th image keyframe. The matrices T_jk are diagonal and can be computed from Eq. (11), their entries being the mixed second derivatives of T between keyframes j and k. In a similar fashion to ⃗Q, we create the "super-gradient", ⃗K, by arranging the gradient of the Lagrangian with respect to all the model parameters, with χ²_l and E_l being the χ² and the KL divergence, respectively, in the l-th keyframe.
With all these definitions, the time-dependent Newton equation, for a given iteration k, becomes the linear system
$$\begin{pmatrix} H^k & \left( \nabla c^k \right)^{T} \\ \nabla c^k & 0 \end{pmatrix} \begin{pmatrix} \Delta \vec{Q} \\ \Delta \vec{\eta} \end{pmatrix} = -\begin{pmatrix} \vec{K}^k \\ c^k \end{pmatrix},$$
where ⃗η := (λ, ⃗β) collects the Lagrange multipliers and c^k denotes the vector of the equality constraints. The set of equations in Eq. (15) are the first-order optimality (Karush-Kuhn-Tucker) conditions, which ensure the existence of an optimal solution associated to the so-called osculating quadratic problem of Prob. (6) at (p^k, λ^k, β^k) (see Bonnans et al. 2006, for more details).
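One primal-dual Newton step of this kind can be sketched as follows (an illustrative solver for a generic equality-constrained KKT system; the assembly of `H`, `K`, `Jc` and `c` from the ngMEM terms is assumed done elsewhere):

```python
import numpy as np

def newton_kkt_step(H, K, Jc, c):
    """Solve one KKT Newton system for an equality-constrained problem.
    H: Hessian (n x n); K: gradient (n,); Jc: constraint Jacobian (m x n);
    c: constraint residuals (m,). Returns the primal step and the
    multiplier step."""
    n, m = H.shape[0], Jc.shape[0]
    KKT = np.block([[H, Jc.T],
                    [Jc, np.zeros((m, m))]])
    rhs = -np.concatenate([K, c])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:n], sol[n:]

# Sanity check on a tiny quadratic: minimize 0.5*||p||^2 subject to
# p_0 + p_1 = 1, starting from p = 0. One Newton step lands on the
# constrained optimum p = (0.5, 0.5).
H = np.eye(2)
K = np.zeros(2)                 # gradient of 0.5*||p||^2 at p = 0
Jc = np.array([[1.0, 1.0]])
c = np.array([-1.0])            # constraint residual p_0 + p_1 - 1 at p = 0
dp, dmult = newton_kkt_step(H, K, Jc, c)
```

For the ngMEM, `H` is the block matrix of keyframe Hessians plus the diagonal T_jk couplings, and `K` is the super-gradient ⃗K.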
The above system of equations generates a sequence of solutions defining the next iterate (p^{k+1}, λ^{k+1}, ⃗β^{k+1}). Methods that simultaneously solve the primal and dual problems are called primal-dual methods. Of course, a solution of Eq. (15) depends on the choice of several factors, such as the starting iterate, µ and, to a lesser extent, τ.
In this framework, the ngMEM algorithm is summarized in Algorithm T1 (movie reconstruction algorithm solving the ngMEM).
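Algorithm T1 itself is not reproduced in this excerpt; the following is a hedged, high-level reading of the iteration described in the text (the `step_fn` interface is hypothetical): starting from an initial movie and multipliers, repeat primal-dual Newton steps until the iteration budget or an optimality tolerance is met.

```python
def ngmem_loop(Q, lam, beta, step_fn, niter=50, tol=1e-6):
    """Illustrative outer loop of a primal-dual scheme.
    step_fn(Q, lam, beta) -> (Q_new, lam_new, beta_new, grad_norm) is a
    user-supplied update (e.g. one KKT Newton solve plus a line search).
    Stops when the gradient norm drops below tol or niter is exhausted."""
    for _ in range(niter):
        Q, lam, beta, gnorm = step_fn(Q, lam, beta)
        if gnorm < tol:
            break
    return Q, lam, beta

# Sanity check with a contracting toy update: the loop drives the
# "gradient norm" below the tolerance well within the iteration budget.
def toy_step(Q, lam, beta):
    Qn = Q / 2.0
    return Qn, lam, beta, abs(Qn)

Q_fin, _, _ = ngmem_loop(1.0, 0.0, 0.0, toy_step, niter=100, tol=1e-6)
```

The `niter` cap plays the same role as in the text: it acts as the stopping criterion whenever the optimality conditions are not reached.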
A particular case: ngMEM with µ = 0

It is instructive to have a closer look at the special case µ = 0. Our optimization problem is an extension of the problem defined in Cornwell & Evans (1985), or Prob. (3), but using a flat prior in the Shannon entropy. Indeed, if we take the model M appearing in their formulation to be constant, the ngMEM can be seen as a perturbation of their problem of the form ϕ(p, µ), which vanishes as µ → 0. It is noteworthy that this case is not equivalent to standard snapshot MEM imaging. The computation of the χ² involves interpolation between adjacent keyframes. Consequently, even when performing the deconvolution without the time entropy regularizer, there is still a small amount of information interchange among neighboring keyframes.
In Appendix D, we present the "super-Hessian" for this particular case and we compare a standard RML method and the ngMEM applied to a real EHT observation taken during the 2017 campaign.

Algorithm considerations
The inputs of the algorithm are: a set of visibilities corresponding to an observation; the hyper-parameter µ; initial values for the Lagrange multipliers, λ_0, ⃗β_0; the number of pixels of each image, npix; the total number of iterations, niter, used as a stopping-criterion control; and the observation times, times.

Global philosophy and convergence
The global philosophy is as follows. First, we compute the estimated noise value for the χ², Ω, and we determine the keyframes (in Sect. 6 we explain in more detail how we estimate it and how we create the keyframes) and, for each of the keyframes, we compute the flux density centered on such a keyframe (for instance, from the intra-ALMA visibilities in ALMA-EHT observations), F^l_mod. Second, we construct the Lagrange dual problem associated to Prob. (5). Third, since the objective and the constraints are at least twice continuously differentiable in their domain, we can find, for example using a (quasi-)Newton method, values λ* and ⃗β* such that L(λ*, ⃗β*, p*, t) is minimized for some vector p* associated to the multipliers (λ*, ⃗β*), by constructing a convergent sequence (p^k, λ^k, ⃗β^k) that deals simultaneously with the objective minimization and the constraint satisfaction. We construct such a sequence by solving the osculating quadratic problem (Eq. (16) and Eq. (17)). Quasi-Newton algorithms have the advantage of converging quadratically in a convex neighborhood (see, for instance, Bonnans et al. 2006). Finally, the Second-order Sufficient Conditions (SSC) for non-linear problems (Bonnans et al. 2006) ensure that the vector p* paired with (λ*, ⃗β*) is a local solution of the optimization problem, Prob. (5), in a certain neighborhood. If the sequence does not reach a point where the SSC are satisfied, niter acts as the stopping criterion.

Solving the osculating problem
To solve Eqs. (16), (17) in each iteration until convergence, we have used the Cholesky decomposition, which requires the Hessian to be positive definite. This is not a problem in our case, since the Lagrangian is convex with respect to p. Other strategies for solving the system could have stronger requirements (for example, diagonal dominance), and the value of µ (which may affect the value of the Hessian elements far from the diagonal) could be limited by such a requirement.
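A Newton step via Cholesky factorization can be sketched as follows (a minimal example on a small symmetric positive definite matrix; the paper's actual system is the much larger Hessian of the Lagrangian):

```python
import numpy as np

def newton_step_cholesky(H, g):
    # Solve H @ dp = -g using the Cholesky factor H = L @ L.T,
    # which exists because H is symmetric positive definite.
    L = np.linalg.cholesky(H)
    y = np.linalg.solve(L, -g)        # solve L y = -g
    return np.linalg.solve(L.T, y)    # solve L.T dp = y

H = np.array([[4.0, 1.0], [1.0, 3.0]])   # SPD toy Hessian
g = np.array([1.0, 2.0])                 # toy gradient
dp = newton_step_cholesky(H, g)
```

For large dense Hessians a triangular solver (e.g. `scipy.linalg.solve_triangular`) would be preferable; `np.linalg.solve` is used here only to keep the sketch NumPy-only.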

Globalization
However, Prob. (5) is non-convex. As discussed before, this implies that gradient-based strategies do not guarantee convergence to a global minimum. This property imposes, among others, the condition of finding a good starting point: (quasi-)Newton methods exploit the geometry of the function (gradient and Hessian). Such algorithms generate sequences converging to an optimal primal-dual solution if the initial point is close enough to a stationary point of the Lagrangian; otherwise, they may get stuck in a neighborhood of a saddle point, or may not converge at all. For this reason, it is important to have techniques that force convergence even when the first iterate is far from a solution. This process is called globalization. We have implemented a globalization technique to control the χ².

Starting point
In this case, the dirty image is a natural choice for the initial point, or even the result of the MEM associated with Prob. (3). A common strategy in penalty methods is to iteratively compute the solution to the problem using the previous solution as a starting point while increasing the value of the penalty multiplier in each iteration. This method allows an increasing accuracy in the solution. Nevertheless, Polyak (1971) computed the rate of accuracy and convergence of this kind of strategy and showed that the accuracy is not ever-increasing: there is a limit beyond which the gained accuracy is not worth the computational cost used to attain it. Since µ can be seen as a penalty term in Prob. (3) and, thus, we are in a similar setting to Polyak (1971), we have followed this iterative strategy to compute our initial point p0.
We start with the dirty image and we solve the ngMEM for µ = 0. The obtained solution is then used as the starting point for the ngMEM with µ > 0. Note that this process could be repeated iteratively, but we have found good solutions after just one iteration. In Fig. C1 of Appendix C, we show a comparison of the solutions obtained by the ngMEM starting from different initial points (the dirty image and the result of our iterative strategy) in one of the synthetic data cases presented in this work (see Sect. 6 for further details).
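The two-stage warm start reads, schematically, as follows. Here `solve_ngmem` is a hypothetical placeholder for one full ngMEM optimization (the paper does not expose such an API), so this is only a sketch of the control flow:

```python
def warm_started_ngmem(solve_ngmem, data, mu, dirty_image):
    # Stage 1: static MEM (mu = 0), started from the dirty image.
    # Stage 2: dynamic ngMEM (mu > 0), started from the static solution.
    # `solve_ngmem(data, mu, init)` is an assumed interface, not the
    # paper's actual one.
    p_static = solve_ngmem(data, 0.0, dirty_image)
    return solve_ngmem(data, mu, p_static)

# toy stand-in solver that records which initial point it was given
trace = []
def fake_solver(data, mu, init):
    trace.append((mu, init))
    return init + 1

result = warm_started_ngmem(fake_solver, None, 5.0, 0)
```

The loop could be repeated with increasing µ, but, as noted above, one iteration already yields good solutions in practice.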
It is important to stress that, regardless of the starting point, if the algorithm finds a local optimum (λ*, β*) that minimizes the Lagrangian with respect to the multipliers, the associated primal solution p* is a valid movie reconstruction. However, it could be a local solution with substantial room for improvement (in terms of the value of the cost function).

ngMEM algorithm details
ngMEM depends on a number of free parameters and considerations, among which the selection of the keyframes and the relative weight of the time entropy, µ, stand out. In the next paragraphs we discuss some algorithmic considerations regarding the selection of the keyframes and the free parameters.

Keyframe distribution
Following Alg. T1, we first need to split the experiment into keyframes. Our algorithm allows either distributing the keyframes uniformly over the whole set of observations or imposing the keyframes at particular observing times. The first option is the simpler approach. It has some limitations, like creating "jumps" between different keyframes and not ensuring a linear and smooth transition in time. The flexibility to select particular observing times for creating a keyframe allows, for example, creating keyframes using the scans as reference (for instance, keyframes positioned either in the middle of the scans or at the beginning and end of specific scans). Moreover, if the source presents kinematics during a particular interval of the observation, this second option allows concentrating more keyframes in that period to recover a better evolution of the source (for example, Farah et al. 2022 gave specific time windows that proved to be more effective for dynamic imaging reconstruction of Sgr A* during the 2017 EHT campaign).
In general, not all keyframes will have the same number of visibilities, and some keyframes may be quite limited.
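The two keyframe-placement options can be captured in a small helper (the function name and signature are illustrative, not the paper's code):

```python
import numpy as np

def keyframe_times(t_start, t_end, n_key=None, explicit=None):
    # Option 1: distribute n_key keyframes uniformly over the observation.
    # Option 2: place keyframes at explicitly chosen observing times,
    # e.g. at scan edges or mid-scan.
    if explicit is not None:
        return np.asarray(sorted(explicit), dtype=float)
    return np.linspace(t_start, t_end, n_key)

uniform = keyframe_times(9.0, 13.0, n_key=5)                    # uniform split
scan_based = keyframe_times(9.0, 13.0,
                            explicit=[9.33, 9.72, 12.8, 13.15])  # scan edges
```

The explicit mode is what allows concentrating keyframes where the source is known (or suspected) to vary fastest.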

Free parameters
The free parameters, µ and τ, weight the kinematics of the source during the reconstruction. In essence, they determine the cost associated with obtaining the most constant solution that still fits the data. Note that µ affects the whole time-dependent term of the cost function, while τ is a measure of the timescale of the correlation between the values that any given pixel takes at different times. For example, larger values of µ and τ in a short observation will map into smaller time variability. τ can be computed based on observational data or on theoretical results, such as the innermost stable circular orbit (ISCO) in the case of a black hole. The parameter C regulates the extent to which the brightness of a pixel pn can vary between two keyframes l and m. When the brightness of a pixel remains relatively constant, the corresponding time-entropy factor, given by log |p^l_n − p^m_n|, diverges. Therefore, C is used to prevent the emergence of meaningless values by bounding the difference between pixel intensities.
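The role of C can be sketched as follows. This is only an illustrative reduction of the time-entropy term: the actual functional (the paper's Eq. (6)) also carries the µ weight and a τ-dependent time-correlation factor, both omitted here:

```python
import numpy as np

def time_entropy(frames, C=1e-3):
    # Illustrative time-entropy contribution: sum of log|p_l - p_m| over
    # keyframe pairs, with C bounding the difference away from zero so
    # the logarithm cannot diverge for static pixels.
    frames = np.asarray(frames, dtype=float)
    total = 0.0
    n = len(frames)
    for l in range(n):
        for m in range(l + 1, n):
            diff = np.maximum(np.abs(frames[l] - frames[m]), C)
            total += np.sum(np.log(diff))
    return total

static_movie = [[1.0, 1.0], [1.0, 1.0]]     # two identical keyframes
varying_movie = [[0.0, 0.0], [1.0, 1.0]]    # brightness changes in time
```

With the cutoff in place, a static movie yields the smallest (most negative) value, so minimizing the term favors the least time-variable solution compatible with the data.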
The second quantity we need to carefully determine is the estimate of the expected noise level (the expected value of the χ², which we call Ω in Eqs. (5), (6)). A too low value makes the algorithm fit the noise, while a too large value may leave a fraction of the true source signal in the residuals.
The way we have designed to find the best µ, number of keyframes and their locations is by running a grid search. We choose a prior parameter space with a range for each of the parameters. In some cases, it will be possible to reduce the dimension of this space. Ideally, we want to reduce the parameter space as much as possible, since the algorithm will be called for each combination of free parameters.
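The grid search amounts to an exhaustive survey of a small parameter space. In the sketch below, `score` is a stand-in for running ngMEM on one parameter combination and evaluating the reconstruction (the peak location and values are purely illustrative):

```python
import itertools

def grid_search(score, mus, n_keys):
    # Evaluate every (mu, n_keyframes) combination and keep the best.
    # `score` abstracts one full ngMEM run plus quality evaluation.
    return max(itertools.product(mus, n_keys), key=lambda c: score(*c))

# toy score peaking at mu = 200 with 15 keyframes (illustrative only)
score = lambda mu, nk: -abs(mu - 200) - 10 * abs(nk - 15)
best_mu, best_nk = grid_search(score, [0, 100, 200, 700], [10, 15])
```

Because each grid point triggers a full optimization, pruning the ranges beforehand directly reduces the total runtime.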

Non-negativity intrinsic constraint
Finally, there is a last intrinsic constraint: we impose that, for all keyframes l and in any iteration of the algorithm, all the components of the brightness vector p^l_k have to be greater than a cutoff value, called the minimum flux. There are several options to fix this constant (for example, setting it to the rms). Nevertheless, having sparse arrays like the EHT, or long baselines, significantly increases this statistic. If, in addition, the source intensity is low, such a cutoff could be too restrictive. Therefore, depending on the case, this constant can be chosen more or less conservatively.
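The constraint itself is a simple elementwise floor applied at every iteration (a minimal sketch; how ngMEM enforces it inside the Newton iteration is not spelled out here, so a plain projection is assumed):

```python
import numpy as np

def enforce_minimum_flux(p, min_flux):
    # Intrinsic constraint: every pixel of every keyframe must stay
    # above a small positive cutoff (e.g. an rms-based value).
    return np.maximum(p, min_flux)

frames = np.array([[0.5, -0.1],
                   [1e-9, 2.0]])              # 2 keyframes x 2 pixels
clipped = enforce_minimum_flux(frames, 1e-6)  # all pixels >= 1e-6
```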

Differences to previous strategies
Our algorithm is philosophically and mathematically different from the RML and multiobjective optimization methods, since we use the dual approach. We present the major points of difference between the ngMEM and the other algorithms.
• ngMEM is the first MEM-based algorithm including a time entropy. As shown in Sect. 5.3, it is an extension of the traditional maximum entropy problem to the time domain.
• Previous (RML and non-RML) methods involving the entropy, such as the ones cited in Sect. 3 and Sect. 4, work with the Kullback-Leibler divergence of the pixel brightness, i.e., they use the relative entropy with respect to a model. In contrast, our algorithm computes it with respect to a flat prior. Therefore, any contrast in the image and/or any time change in the pixel intensity contributes to the total entropy, increasing the value of the cost function. We consider this approach a conservative one, in the sense that it minimizes the model time evolution that is still compatible with the data.
• ngMEM and RML differ in philosophy: ngMEM minimizes the entropy among all solutions that fit the data (direct implementation of Occam's razor) while RML methods try to balance data and regularization terms.
• Typically, RML methods are more flexible with respect to the addition of new terms. However, they approximate the solution, in contrast with the more rigid primal-dual approach.
• We act on the hyper-parameters in a different way than RML methods. While RMLs need to survey the hyper-parameter space to find the best combinations, we only consider one hyper-parameter in the cost functional for the time evolution, µ (and, in a second instance, τ). The flux and χ² regularizers are Lagrange multipliers and, thus, their updating is a natural consequence of solving the optimization problem. This leads to a lower-dimensional hyper-parameter space to be surveyed.
• Cornwell & Evans (1985) found the quasi-Newton step by using an approximation of the Hessian of the Lagrangian composed of its tri-diagonal terms. This approximation is limiting in the case of a sparse array (as in VLBI). Instead, in our implementation we work with the whole Hessian. As a consequence, we take into account the correct correlation among all pixels and thus we can better overcome the limitations imposed by point spread functions (PSFs) with high and distant sidelobes, corresponding to sparse snapshot uv-coverages.
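The contrast between the flat-prior entropy and the model-relative (KL) entropy in the list above can be made concrete (the function names are illustrative; signs and normalizations in actual RML codes vary):

```python
import numpy as np

def entropy_flat_prior(p):
    # ngMEM-style entropy against a flat prior: any contrast in the
    # image raises this value relative to a uniform image.
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * np.log(p)))

def kl_relative_entropy(p, model):
    # RML-style relative entropy: the cost is measured against a
    # user-supplied model image.
    p = np.asarray(p, dtype=float)
    model = np.asarray(model, dtype=float)
    return float(np.sum(p * np.log(p / model)))

flat = np.full(4, 0.25)                  # uniform image, unit flux
peaked = np.array([0.7, 0.1, 0.1, 0.1])  # contrasted image, unit flux
```

With these conventions, the flat image minimizes the flat-prior entropy, while the KL divergence vanishes only when the image matches its model, which is why the two approaches favor different solutions.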

SYNTHETIC DATA I: GAUSSIAN SOURCES
In this section, we present the results obtained by applying the ngMEM algorithm to various synthetic data sets having different uv-coverages. It is worth recalling that we understand by dynamic any kind of evolution/movement, either in flux density, in relative position, or in both.
Working on synthetic data, for which we know the behavior of the sources in time, we can test the impact of several algorithmic choices and free parameters. In particular, we solve the problem for different numbers of keyframes and study the effect of µ on the movie reconstruction, as well as the roles played by the kind of keyframe interpolation used and by the visibility weighting scheme.
In a first instance, we have simulated four test sources replicating the array distribution and observing times from April 11 of the 2017 EHT campaign (EHTC 2022a to EHTC 2022d), labeled as: (i) DOUBLE_1S: a double source changing its brightness during scan 2 (from 09:20:00.0 UT to 09:43:00.0 UT). During this scan, the uv-coverage was particularly bad (see Fig. 1). In this case, we want to test the algorithm acting on short and weakly constrained problems.
(ii) FOUR_2S: four compact components, with two of them changing their brightness in time. The scans are 2 and 7 (the latter from 12:48:00.0 UT to 13:09:00.0 UT). Scan 7 had the best uv-coverage (Fig. 1). This fact, together with the big jump in time between scan 2 and scan 7, makes this source a challenging test of the sensitivity of the algorithm to more complex flux-density evolution.
(iii) FOUR_FULL: same as the point above, but with scans 2 to 7.
(iv) DOUBLE_2S: same as the first point, but for scans 2 and 7.
The cases are selected based on their increasing level of difficulty. The first case represents a straightforward scenario. In the second case, there is a significant temporal gap, leading to a substantial loss of uv-coverage.
By examining this case, we demonstrate the beneficial impact of keyframes that contain more visibilities, as they aid in constraining the source brightness distribution during the reconstruction process for keyframes with fewer visibilities.
Finally, the last case involves an observation spanning approximately 10 hours, without any significantly large data gaps. We show how smoother keyframe divisions help the reconstruction. In all cases, the visibilities are contaminated by independent Gaussian noise in their real and imaginary parts.
In this section, we have focused on two types of time interpolation methods for keyframes: nearest and linear interpolation. Nearest interpolation, also known as nearest-neighbor interpolation, employs a piecewise-constant function to interpolate pixel values between keyframes. On the other hand, linear interpolation provides a more accurate result by interpolating values linearly. However, both methods may result in a lack of differentiability between adjacent keyframes, leading to a non-smooth evolution of the source. Future developments will explore advanced interpolation techniques, such as those based on second-order polynomials or splines. Nevertheless, for the purpose of this section, the existing interpolation methods prove to be sufficient. Moreover, for the sake of readability, the results presented in this section are obtained by using the static (µ = 0) ngMEM reconstruction as the initial model for the entire experiment. In Appendix C, a comparison is provided using either the dirty image or the snapshot image obtained with the static ngMEM as initial points for each of the four cases.
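The two interpolation modes can be sketched per pixel as follows (a minimal NumPy version; the function name and signature are illustrative):

```python
import numpy as np

def interpolate_frames(key_times, key_frames, t, kind="linear"):
    # Interpolate per-pixel brightness between keyframes at time t.
    # "nearest": piecewise-constant, returns the closest keyframe.
    # "linear":  interpolate each pixel independently along time.
    key_frames = np.asarray(key_frames, dtype=float)
    key_times = np.asarray(key_times, dtype=float)
    if kind == "nearest":
        return key_frames[np.argmin(np.abs(key_times - t))]
    return np.array([np.interp(t, key_times, key_frames[:, i])
                     for i in range(key_frames.shape[1])])

times = [0.0, 1.0]
frames = [[0.0, 2.0], [1.0, 4.0]]   # two keyframes of two pixels each
mid_lin = interpolate_frames(times, frames, 0.5, "linear")    # [0.5, 3.0]
mid_near = interpolate_frames(times, frames, 0.4, "nearest")  # [0.0, 2.0]
```

Both modes are continuous but not differentiable at the keyframe times, which is exactly the limitation noted above.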

Uniformly distributed keyframes
Choosing the appropriate (time) location of the keyframes, as discussed in Sect. 5, is a crucial step. The time distribution of the visibilities, and how the times of the keyframes are set, conditions the quality of the reconstructed movie. First, we show the results obtained with an equally spaced keyframe distribution (over the observing time). The total number of keyframes has been set to 3 and 4.
The figure of merit used to quantify the quality of the reconstruction is the normalized cross-correlation, or nxcorr (see, for instance, Farah et al. 2022), between the true (time-dependent) source brightness distribution and the movie obtained using the ngMEM. This test measures the similarity of two (normalized) signals.
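A zero-lag version of this metric can be written in a few lines (one common definition; EHT pipelines may additionally search over image shifts, which is omitted here):

```python
import numpy as np

def nxcorr(a, b):
    # Normalized cross-correlation at zero lag: flatten, standardize,
    # and average the product. 1.0 means identical up to scale/offset.
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

img = np.random.default_rng(0).random((16, 16))
# invariant to affine rescaling of the brightness
same = nxcorr(img, 2.0 * img + 3.0)
```

Applied keyframe by keyframe against the groundtruth, this is the score summarized by the error bars in the figures of this section.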
Figure 2 shows the effect of increasing µ on the nxcorr. The left four panels, namely (a) to (d), represent the observations of the double sources obtained through one scan and two scans, using three and four keyframes. The right four panels, (e) to (h), illustrate the corresponding results for the four Gaussian sources.
Different markers represent different combinations of interpolation type (linear/nearest) and weighting scheme (uniform/natural).The asymmetric error bars are the distance between the best keyframe (the one with highest nxcorr) and the worst with respect to the mean.
In this simple dataset, several key trends can be identified. First, observe that, for all cases, µ = 0 has a high nxcorr (with a higher variation for the case FOUR_FULL with nearest interpolation and natural weighting). This is because these sources are morphologically simple. As we will see in the next sections, this is not the case for more complex source structures. Therefore, in these cases the algorithm can capture the variability just with the interpolation of the keyframes, if a proper keyframe division has been imposed. However, there is less variability between keyframes (in terms of error bars) when having more scans (see, for instance, Fig. 2). This is because the scan with better uv-coverage helps to recover the evolution of the source.
Second, in general and as expected, nearest interpolation is always worse than linear interpolation, although they are comparable within the error bars. This observation, among others, has motivated us to develop the more accurate metric discussed later.

Keyframes chosen by observing times
In a more general case, a uniform distribution of keyframes may not effectively capture the inherent variability of the source. To address this limitation, our algorithm provides the flexibility to strategically place keyframes tailored to the desired observing times. As a specific example, we have chosen to place a keyframe at the start and the end of each scan. This approach enables us to capture the evolution of the source within each scan. However, it should be noted that this method results in a larger number of keyframes and, therefore, an increased computational cost associated with the (much larger) Hessian of Eq. (12) during the optimization process.
Figure 3 shows the nxcorr versus µ for each simulation when the keyframes are set to the start and end of each scan. It is worth remarking that the interpolation and weighting start to play a role, for example, in the cases DOUBLE_2S and FOUR_2S.

Summary of the results
In this section, we have applied the ngMEM algorithm to Gaussian-like synthetic sources using different scans of the EHT array on April 7. We have seen the effect of µ on movie reconstructions of different complexities.
The analysis highlighted the advantageous impact of keyframes with greater visibility counts in constraining the source brightness distribution during reconstruction, especially for keyframes with fewer visibilities. We have verified the importance of the time interpolation (nearest and linear) between keyframes and how it helps to reconstruct a better movie.
We have given two keyframe distribution strategies: uniformly across the observation, and placing keyframes at the start and end of each scan. The keyframe distribution is crucial to capture the dynamics of the source.
Nevertheless, the kinematics of the source are simple enough to be easily captured. In the following section, we investigate different keyframe distributions in a more complex setting. In addition, we have noticed that the nxcorr metric may not be accurate enough to evaluate the fidelity of a reconstruction, and we will present a better figure of merit.

completing two full orbits before dissipating. For the purpose of these simulations, we have disregarded any potential scattering effects (see EHTC 2022c,d, for more details on the noise modeling of Sgr A*), but we have added to the complex visibilities the thermal noise expected from the receivers, enhanced by the atmospheric opacity, based on Roelofs et al. (2023). Figure 4 shows the coverage of the uv-plane. This array consists of 20 antennas, with a shortest baseline of ∼10^5 λ and a longest baseline of ∼13.5 Gλ.
The distinct time evolution of the source is crucial in determining the number of keyframes and their temporal distribution.This section highlights the major role these two factors play in successfully recovering the dynamics of the simulated hotspots.
Moreover, as seen in the previous section, the nxcorr metric is not enough to evaluate a movie. In this regard, we have developed the K − front, a figure of merit that allows evaluating the solution with respect to the model movie at every integration time. This metric measures the distance between the obtained reconstruction and an "ideal" solution at every integration time. A smaller distance signifies a more accurate reconstruction (with respect to the "ideal"). We refer to Appendix A for more details.
Henceforth, we have fixed the weighting to be uniform and the time interpolation between two consecutive keyframes to be linear. Uniform weighting has been used to avoid possible over-weighting of the baselines associated with ALMA. The exceptional sensitivity of this telescope can result in the dominance of its visibilities within the image if a natural weighting scheme is applied. However, in the case of Sgr A*, this dominance can be used to derive a good estimate of F^l_mod (from intra-field dynamic amplitude calibration; see Mus et al. 2022; Wielgus et al. 2022) to be used by the ngMEM. Hence, during the optimization process, the "lightcurve constraint" will force Σ_k p^l_k to be as close as possible to F^l_mod, by updating the ⃗β Lagrange multiplier, just as the λ multiplier will also be updated to enforce the χ² condition. To achieve this, we use Eq. (17). In addition, the χ² constraint plays a role in defining the feasible set, depending on the noise estimation. A too conservative estimate can result in an image with an incomplete deconvolution of the source structure, while a too low value of the imposed rms can introduce artifacts associated with noise. In this work, we estimate the visibility noise by computing the rms of the visibilities, divided into chunks of a given duration.
We have tested different chunk periods for the two cases (constant and transient hotspot) to find the best noise estimate (1.875 × 10 −1 and 2.41 × 10 −1 respectively).
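The chunked rms estimate can be sketched as follows (the function name, the per-chunk standard deviation, and the final averaging are assumptions; the paper does not spell out the exact estimator, and real visibilities are complex rather than real-valued):

```python
import numpy as np

def chunked_noise_rms(vis, times, chunk):
    # Split the observation into chunks of duration `chunk`, compute the
    # rms of the mean-subtracted visibilities in each chunk (removing the
    # slowly varying signal), and average the per-chunk rms values.
    times = np.asarray(times, dtype=float)
    vis = np.asarray(vis, dtype=float)
    edges = np.arange(times.min(), times.max() + chunk, chunk)
    rms = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (times >= lo) & (times < hi)
        if sel.sum() > 1:
            rms.append(np.std(vis[sel]))   # np.std subtracts the chunk mean
    return float(np.mean(rms))

# toy data: constant 1 Jy signal plus 0.1 Jy Gaussian noise
rng = np.random.default_rng(1)
times = np.linspace(0.0, 10.0, 2000)
vis = 1.0 + rng.normal(0.0, 0.1, times.size)
sigma = chunked_noise_rms(vis, times, chunk=1.0)
```

Short chunks remove the source contribution better but make each rms noisier, which is the trade-off behind testing different chunk periods.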
In the next two sections, we show the obtained results for these two simulated cases.

Persistent hotspot
For the case where the hotspot persists during the whole observation, the kinematics of the hotspot can be captured with a uniform distribution of keyframes. We have performed a grid-search exploration, based on the nxcorr metric, to find the number of keyframes and the value of µ that best capture the movement. In particular, we show the results obtained using 10 and 15 keyframes. In the case of 10 keyframes, the optimal value found for the parameter µ is 700, whereas for 15 keyframes, µ is determined to be 200.
Figure 5 illustrates the azimuth angle (vertical axis) and time (horizontal axis) corresponding to these optimal values.The figure showcases the reconstructed temporal variation of the azimuthal brightness distribution.The hotspot brightness dominates over the noise and the ring.Since it is orbiting with constant angular velocity, the hotspot signal manifests as a diagonal feature in the figure.We observe that as the number of keyframes increases, the definition of the hotspot's orbit becomes clearer.

Transient hotspot
In contrast to the previous scenario, where the keyframes were evenly distributed, we have manually selected a more strategic keyframe distribution in this case. Considering that the hotspot exists within a specific timespan of the observation, we have placed a higher density of keyframes during the presence of the hotspot and a lower density of keyframes over the extended timeframe when the hotspot is not present. This approach enables us to capture the movement of the hotspot more efficiently. Moreover, in this case, we show how keyframe interpolation is useful when certain keyframes are placed at times without associated visibilities.
We have conducted a grid search using 14, 15, 16 and 17 keyframes. The results indicate that 15 keyframes produce the best movie based on the K − front figure of merit (see Appendix A). We note, again, that an increasing number of keyframes tends to recover a better orbit definition for the hotspot.
In Fig. 6, we present profiles for four different numbers of keyframes: 14, 15, 16, and 17, displayed from right to left. Each profile is generated using the movie with a fixed value of τ, representing the "image memory" as defined in Eq. (6). Specifically, we set τ = 0.08 for 14 keyframes, τ = 0.07 for 15 and 16 keyframes, and τ = 0.06 for 17 keyframes (as indicated in Tab. B1), and µ is set to the value that produces the best movie reconstruction according to the K − front. Before ∼10:30 UT, a homogeneous brightness distribution, corresponding to the ring structure, can be seen. From ∼10:30 UT to ∼12:30 UT, i.e., for ∼2 h, we start to see the orbiting hotspot as a bright feature whose azimuthal angle changes with time.
This highlights the importance of keyframe interpolation when certain keyframes lack associated visibilities. Fig. 7 shows the 60 s averaged visibility amplitudes of the source as black dots along the duration of the experiment (rescaled from 0 to 1) for the four cases (from left to right) of 14, 15, 16 and 17 keyframes. Vertical lines denote the positions of the keyframes. If a vertical line intersects visibilities, the keyframe will have usable data at its central time. Otherwise, the model will time-interpolate, and the τ parameter (i.e., the way information at different times is propagated to that keyframe) will have a major impact on the keyframe image reconstruction. The bottom set of panels is a zoom into the time range from 10 UT to 11 UT, where some keyframes have no data points. This translates into the "discontinuity" appearing in Fig. 6 at around 80 to 100 degrees. Because this keyframe only has visibilities relatively distant in time, the algorithm does not recover the source brightness with high accuracy around that time.

Final remarks on synthetic data
Through Sec. 6 and Sec. 7, we have seen how the ngMEM algorithm is able to recover different types of kinematics for both simple sources, such as Gaussian profiles, and more complex structures like rings with an orbiting hotspot.Notably, in the case of ring structures with a hotspot, our algorithm effectively captures the evolution of the hotspot, regardless of whether it persists over time or undergoes rapid transitions.
To achieve optimal recovery of these kinematics, it is crucial to appropriately set the free parameters, which directly impacts the quality of the results. By selecting the optimal values of the free parameters, we can attain high scores in the figure of merit, indicating a successful reconstruction of the source's kinematics. In contrast with Sec. 6, µ = 0 performs the worst in all cases (see Appendix A). This shows that the time-entropy regularizer helps to better capture the evolution of the source.
Additionally, our algorithm demonstrates its capability to reconstruct movies even when isolated keyframes lack associated visibilities. In such cases, the "image memory" parameter, τ, significantly influences the reconstruction process. The value of τ determines how the algorithm uses information from neighboring keyframes to fill in gaps and ensure a coherent reconstruction, thereby mitigating the impact of missing visibilities.

COMPARISON WITH ANOTHER DYNAMICAL RECONSTRUCTION METHOD
As a conclusion of the results, we qualitatively compare ngMEM with the RML method described in Johnson et al. (2017). Johnson et al. (2017) consider a particular instance of Prob. (2) which incorporates multiple dynamical regularizers. In particular, they define R∆t, which aims to enforce that the brightness of the reconstructed image remains piecewise smooth between two adjacent snapshots, and R∆p, which aims to minimize the deviations of each snapshot from the mean image.
In order to replicate the experiment conducted in their Fig. 4, we used the same dataset, which was shared through private communication. This dataset was originally presented in Shiokawa (2013), based on a 3D general relativistic magnetohydrodynamics (GRMHD) simulation. In their reconstruction, Johnson et al. (2017) employed the dynamical regularizer R∆I (defined in their Eq. 8), suitable for cases in which the frames present small variations with respect to the time-averaged image.
Figure 8 shows image reconstructions using ngMEM of the same model at the same times shown in the cited figure of Johnson et al. (2017), with µ = 4. The first column represents the model at three different moments (with three different instantaneous uv-coverages). The second column shows the model convolved with a 20 µas Gaussian. The third column is the ngMEM reconstruction. Because the early and late frames have few data constraints, the dynamical imaging reconstructions at those times do not recover the kinematics as well as for the middle keyframe.
Since Johnson et al. (2017) do not report any quantitative comparison of their recovered keyframes with the true brightness distribution, it is not possible to compare their results to ours quantitatively. In any case, a visual inspection of Fig. 8 against Fig. 4 of Johnson et al. (2017) seems to indicate that ngMEM performs better for this dataset.
The nxcorr for every frame in our reconstruction can be seen in Fig. 8.

SUMMARY AND CONCLUSIONS
Imaging in radio astronomy has long been a vibrant area of research, particularly in recent years, with VLBI achieving unprecedented resolution through projects like the EHT. Furthermore, the future ngEHT promises significantly higher resolution capabilities, potentially enabling the temporal resolution of event-horizon scales of SMBHs with a high intra-epoch time cadence.
Time variability has appeared as a game-changing factor in the traditional conception of interferometric imaging, and it proves to be critical for the fast-changing SMBH Sgr A*. This paper introduces a new non-convex, constrained optimization problem extending the traditional MEM formulated by Cornwell & Evans (1985), known as ngMEM. This extension involves integrating a new regularizer term into the cost functional to address time evolution. It is the first MEM-type dynamic imaging algorithm proposed for VLBI.
Our algorithm's main advantages are: 1) utilization of the full Hessian of the Lagrangian, leveraging information encoded in correlations among adjacent and distant pixels/keyframes; 2) integration of a time-entropy regularizer enforcing minimal time variability in the source image while remaining consistent with the data; and 3) a flexible framework for time interpolation and keyframe selection.
Throughout this paper, we have developed the primal-dual optimization formulation of the ngMEM. We have given a numerical method to solve it and we have justified its quadratic convergence. We have also listed the differences and advantages of this method with respect to some well-known alternatives, and we have proposed an iterative method to improve the starting point of the algorithm and, consequently, to attain a better fidelity in the final movie. We have tested this algorithm on different synthetic data cases, including the emulation of a transient hotspot compatible with the results on Sgr A* reported by Wielgus et al. (2022). To compare movie reconstructions and to quantify their fidelity, we have defined a novel figure of merit, the K − front. Lastly, we have conducted a performance comparison between our algorithm and the algorithm proposed by Johnson et al. (2017), by applying ngMEM to the same dataset they presented in Figure 4 of their study.
This algorithm can easily be adapted to perform dynamic studies, not only of Sgr A* and other active galactic nuclei, but also of X-ray binaries, inter-epoch variability in long-term observations, and many other cases concerning kinematics. Currently, we are in the process of extending this algorithm to include antenna gains and polarimetry, with the aim of recovering robust polarimetric movies of fast-changing radio sources.

APPENDIX A: A STRONGER CRITERION TO COMPARE MOVIES
The figure of merit based on asymmetric error bars gives a good overall idea of the behavior of the reconstruction with different parameter combinations, such as µ, the visibility weighting scheme or the time interpolation. However, it is not a very accurate metric for selecting the best movie since, among other problems, large error bars can in general be obtained, and therefore a big subset of solutions could be congruent within the error bars. For this reason, we have developed the new metric explained in this appendix.

A1 Formalism
Remark 1. Throughout this section we suppose we have a groundtruth that we use to compare the nxcorr of the different movies.
In this appendix, we denote by κ a movie reconstruction, i.e., a solution of the ngMEM. Any κ is composed of an ordered set of keyframes. So, given a movie reconstructed with the ngMEM, κ, of n keyframes, we can write κ := {f^κ_1, ..., f^κ_n}. We denote by nxcorr(f^κ_k), 1 ≤ k ≤ n, the nxcorr of the k-th keyframe. In case of having a set of p movies composed of n keyframes each, we denote by f^{κ_l}_j the keyframe j corresponding to the movie κ_l, 1 ≤ l ≤ p.

Definition 1. (Set of movies) Given a fixed set of visibilities corresponding to a specific observation, and a fixed number of keyframes, we denote by K the set of all local minima (obtained by setting different parameters, such as µ) of the ngMEM. Every element κ ∈ K is a movie.
Definition 3. (Super-dominant keyframe) Given the set of movies K and a keyframe position k, we say a keyframe is super-dominant if it attains the highest nxcorr among the k-th keyframes of all movies in K.
Further elaborating on this concept, suppose that the set K consists of l movies. For all movies in K, we can fix one keyframe position. Then, since the corresponding set of keyframes, one per movie, is totally ordered with respect to the nxcorr metric, we can pick the best keyframe (in terms of nxcorr) among all movies. This chosen keyframe is a super-dominant keyframe. The set of super-dominant keyframes, one per position, is the K-front, and it determines an upper bound for the keyframes: none of the movies in K has a frame better than the corresponding one of the K-front. Of course, the movie generated by joining such keyframes is, in general, not a valid movie, meaning it is not necessarily a solution of the ngMEM, but it would be the best, the desired, solution to get. Only if the ngMEM has a global solution that belongs to K, or if K is a singleton, is the K-front a feasible movie.
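The construction of the K-front from a set of reconstructions can be sketched in a few lines. The following minimal NumPy illustration (with made-up nxcorr values; the array names are ours, not part of any library) takes, for every keyframe position, the best nxcorr achieved by any movie in K:

```python
import numpy as np

# Hypothetical nxcorr curves: each row is one movie kappa_l in K,
# each column one keyframe position (all values are illustrative).
nxcorr = np.array([
    [0.80, 0.75, 0.90],   # movie kappa_1
    [0.85, 0.70, 0.95],   # movie kappa_2
    [0.78, 0.82, 0.88],   # movie kappa_3
])

# Super-dominant keyframes: the best nxcorr per position over all movies.
# Joining them yields the K-front, a frame-wise upper bound on every movie.
k_front = nxcorr.max(axis=0)
print(k_front)  # [0.85 0.82 0.95]
```

Note that the K-front assembled this way is, in general, not itself a solution of the ngMEM; it only bounds every reconstruction frame by frame.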
Definition 6. (Ideal movie) The movie (with an abuse of language, since it is not a movie understood as a solution of the ngMEM) generated by the keyframes of the K-front is called the ideal movie and is denoted by κ.
The name ideal is well-motivated: in the best-case scenario, we would recover the K-front movie. However, because of data and algorithm limitations, we never recover the groundtruth model; we will never get a movie where all keyframes have nxcorr 1. But, in order to define the metric, we need this groundtruth.
Definition 7. (Utopian movie) The groundtruth movie is called the Utopian movie and is denoted by m.
We are now in a position to establish a criterion to compare two movie reconstructions, by defining their distance to the front.
Definition 8. (Distance between two movies) Given two movies κ1, κ2 from K, each composed of n keyframes, we define their distance as

d(κ1, κ2) := Σ_{k=1}^{n} | nxcorr(f^{κ1}_k) − nxcorr(f^{κ2}_k) | .    (A1)

Observe that d(κ1, κ2) defines a mathematical distance: it is positive definite, symmetric, and satisfies the triangle inequality. This distance tells us how far apart κ1 and κ2 are: the larger the distance, the less similar the movies. Then, in order to obtain a criterion to choose the best solution among the possible reconstructions, we just need to compute the distance to the K-front.
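Assuming Eq. (A1) is the keyframe-wise sum of absolute nxcorr differences, the distance and its metric properties can be checked numerically. A minimal sketch with illustrative values (`movie_distance` is our own helper, not part of any library):

```python
import numpy as np

def movie_distance(nx1, nx2):
    """Distance of Eq. (A1) between two movies with the same number of
    keyframes, each represented by its per-keyframe nxcorr values."""
    nx1, nx2 = np.asarray(nx1), np.asarray(nx2)
    return float(np.abs(nx1 - nx2).sum())

kappa1 = [0.80, 0.75, 0.90]
kappa2 = [0.85, 0.70, 0.95]
utopian = [1.0, 1.0, 1.0]   # groundtruth: nxcorr one in every keyframe

d12 = movie_distance(kappa1, kappa2)            # ~0.15
assert d12 == movie_distance(kappa2, kappa1)    # symmetry
# triangle inequality through the Utopian movie:
assert movie_distance(kappa1, utopian) <= d12 + movie_distance(kappa2, utopian)
```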
Definition 9. (Distance to the front) The best possible movie in K is the one whose distance to the front (or to the Utopian movie) is minimal. Such a movie is denoted by κ.
Remark 3. For every movie κ ∈ K, the distance to the ideal movie never exceeds the distance to the Utopian movie, since every keyframe of the Utopian movie has nxcorr one, while the K-front keyframes bound the nxcorr of κ from above. Since the Utopian movie is independent of the number of keyframes (each of its keyframes has nxcorr one), movies with different keyframe distributions can be compared by normalizing the distance to the Utopian movie, i.e., by dividing each term in the summation of Eq. (A1) by the respective keyframe length. The distance to the Utopian movie serves for comparing movies "inter-keyframe", while the distance to the K-front provides a relative comparison to the other solutions obtained in the search of the parameter space ("intra-keyframe"). Utilizing the K-front as a guide can aid in determining whether further exploration of the parameter space is necessary.
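The normalization just described can be sketched as follows (again with illustrative values; dividing the Utopian distance by the number of keyframes makes reconstructions with different keyframe counts comparable):

```python
import numpy as np

def normalized_utopian_distance(nxcorr_curve):
    """Distance to the Utopian movie (nxcorr one everywhere), divided by
    the number of keyframes, giving a per-keyframe quality measure."""
    nx = np.asarray(nxcorr_curve)
    return float(np.abs(1.0 - nx).sum() / nx.size)

# Two hypothetical reconstructions with different keyframe counts but the
# same per-keyframe quality obtain the same normalized score:
short_movie = [0.9, 0.8]             # 2 keyframes
long_movie = [0.9, 0.8, 0.9, 0.8]    # 4 keyframes
assert abs(normalized_utopian_distance(short_movie)
           - normalized_utopian_distance(long_movie)) < 1e-12
```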
With this framework, we can find the best reconstruction (the one with the smallest distance to the K-front) for the more complex sources: the ring plus a hotspot, and the ring with a hotspot simulating the one of Sgr A*.

A2 K-front applied to the Synthetic data II

In Fig. A1, we show the space K for the case of the ring with a constant-period hotspot. The x-axis of the figure shows the keyframes (labeled with their associated number) and the y-axis the nxcorr. The movie with constant nxcorr equal to 1 is the Utopian movie (the perfect/groundtruth movie). The ideal movie is also plotted; it is the frame-wise upper bound of all the movies. In Tab. A1 we give the distance of every reconstruction to the ideal and Utopian movies for 10 and 15 keyframes. Among all solutions, the closest (the best) is the one with µ = 700 and µ = 200, respectively.
In Fig. A2, we show the set of movies K for the case of Sgr A* with the spurious hotspot, for 14 to 17 keyframes (top left, top right, bottom left, and bottom right, respectively). Table A2 gives the distance to the front for all these cases. The profiles presented in Fig. 6 are computed from the movies with the smallest distance.
With these examples, we can see the potential of the K-front: visually, we can get an idea of the better movies and easily find them without any uncertainty or ambiguity. Again, we can observe that µ = 0 is the movie performing the worst; the time entropy always helps to recover the variability of the source. The reconstruction started from the dirty image, even if it recovers the hotspot in most frames, is much noisier.

APPENDIX D: TIME-INDEPENDENT ngMEM
In this appendix, we focus on the performance of the time-independent ngMEM and provide a comparison with an RML method. When reconstructing static images using ngMEM, it is sufficient to set µ = 0; consequently, the time-entropy term vanishes and the super-Hessian described in Eq. (12) simplifies accordingly. Using DoG-HIT (Müller & Lobanov 2022, 2023a), we have implemented Prob. (2) with active regularizers for the χ² of the visibilities, the entropy, and the total flux (see, for instance, Chael et al. 2018a). Then, we have imaged the SMBH Messier 87*, M87*, observed with the uv-coverage from April 11, 2017 (EHTC 2019a to EHTC 2019f). The results are depicted in Fig. D1. The left panel showcases the M87* image released to the public by the EHTC. In the center and right panels, we present the reconstructions obtained using the unconstrained problem and the static ngMEM, respectively. Both reconstructed images have been convolved with a 20 µas beam.

Figure 1 .
Figure 1. uv-coverage of the EHT observations of Sgr A*, taken on April 7, 2017. The data are arranged in scans, shown in different colors. The number of visibilities in each scan is given in the figure legend (in parentheses).

Figure 2. Figure 3.
Figure 2. nxcorr value for different µ for the four sources, using three keyframes ((a) to (d)) and four keyframes ((e) to (h)). Different colors and markers represent different combinations of weighting and interpolation. In each plot, the µ producing the best movie reconstruction is indicated (bottom right corner). The dashed horizontal line indicates nxcorr = 1. For visual clarity, and to avoid overlapping of the different markers, we have introduced a small offset in the µ axis.

Figure 4 .
Figure 4. The ngEHT uv-coverage in a 24 h observation of Sgr A*. Different colors represent different baselines. The 20 ngEHT antennas can be found in Roelofs et al. (2023).

Figure 5 .
Figure 5. Azimuthal distribution of the ring brightness versus time for the persistent hotspot. Images between keyframes have been computed using linear interpolation. Black stripes are placed where there are no visibilities, and the vertical (blue) lines indicate the keyframe times. The movie is composed of 1024 frames and 10 keyframes. The color scale indicates the brightness of the source. Left panel: µ = 700. Right panel: µ = 200.

Figure 6 .
Figure 6. Azimuthal distribution of the ring brightness versus time for the transient hotspot model, during the existence of the hotspot. Images between keyframes have been computed using linear interpolation. Black stripes are placed where there are no visibilities, and the vertical (blue) lines indicate the keyframe times. The color scale indicates the brightness of the source.

Figure 7. Figure 8.
Figure 7. 60 s time-averaged visibility amplitudes (black dots) corresponding to the transient hotspot simulation during the whole experiment. Keyframe times are shown as vertical blue lines. There are cases where the keyframes contain visibilities and cases where they do not. Bottom row: zoom on a time interval with a big gap in the visibilities. This gap corresponds to the discontinuity of the profile for the angles in the range ∼80 to 100 degrees.

Figure A1 .
Figure A1. Values of the nxcorr for every keyframe, for different µ. The left panel is for the case of 10 keyframes and the right panel for 15 keyframes.

Figure A2 .
Figure A2. Comparison between movies for different values of µ, for the ring with a hotspot appearing for 2 h. Top left panel: 14 keyframes. Top right panel: 15 keyframes. Bottom left panel: 16 keyframes. Bottom right panel: 17 keyframes.

Figure B2 .
Figure B2. Groundtruth flux (continuous line) vs. flux recovered by the ngMEM (bullets) for the case of Sgr A* with a spurious hotspot. The x-axis shows the UT and the y-axis the amplitude (Jy).

Figure C1.
Figure C1. K-front comparison for the 24-hour orbiting hotspot, using the dirty image (right column) and the result of the static ngMEM (left column) as the starting point of the algorithm.

Figure D1 .
Figure D1. The reconstructed image of M87* using an RML method, as the one described in Chael et al. (2018a), is displayed in the center panel. On the right panel, the reconstruction obtained using the ngMEM with µ = 0 is shown. Both reconstructed images have been convolved with the same 20 µas beam, represented by a white circle. The FoV is indicated in the center image. To assess the similarity between the reconstructed images and the publicly available EHT image (shown in the left column), the nxcorr values are displayed in the top corner of each image.

Table A1 .
Distance to the K-front for each movie reconstruction of the ring with a persistent hotspot. Left table: 10 keyframes. Right table: 15 keyframes.

Table A2. Distance to the K-front of the ring with a hotspot appearing only for 2 h. Top left table: 14 keyframes. Top right table: 15 keyframes. Bottom left table: 16 keyframes. Bottom right table: 17 keyframes. The smaller the distance, the better the reconstruction.

Table B1 .
Free parameters. Notes: values of the free parameters: 1) the number of keyframes, 2) the time-entropy regularizer µ, and 3) the image memory τ, used for the movies presented in this Appendix.