A generalized curvilinear solver for spherical shell Rayleigh-Bénard convection

A three-dimensional finite-difference solver has been developed and implemented for Boussinesq convection in a spherical shell. The solver transforms any complex curvilinear domain into an equivalent Cartesian domain using a Jacobi transformation and solves the governing equations in the latter. This feature enables the solver to account for the effects of the non-spherical shape of the convective regions of planets and stars. Apart from parallelization using MPI, implicit treatment of the viscous terms using a pipelined alternating direction implicit scheme and a HYPRE multigrid accelerator for the pressure correction make the solver efficient for high-fidelity direct numerical simulations. We have performed simulations of Rayleigh-Bénard convection at three Rayleigh numbers, Ra = 10^5, 10^7, and 10^8, while keeping the Prandtl number fixed at unity (Pr = 1). The average radial temperature profile and the Nusselt number match very well, both qualitatively and quantitatively, with the existing literature. Closure of the turbulent kinetic energy budget, together with the smallness of the grid spacing relative to the local Kolmogorov scales, confirms sufficient spatial resolution.


Introduction
Turbulent thermal convection is ubiquitous in nature as a primary driving mechanism for atmospheric and oceanic circulations [1]. Such convective motions in Earth's outer core or in the solar convective zone, for example, provide energy to sustain global-scale magnetic fields in planets and stars [2,3]. Such flow phenomena are further enriched by the presence of global rotation, external or self-generated magnetic fields, chemical reactions, phase change, the porosity of the medium, and particle suspension [4]. Furthermore, the design of heat exchangers, cooling systems for electronics, and indoor air circulation systems requires a fundamental understanding of thermal convection [5,6]. Rayleigh-Bénard convection (RBC) is a simple model of thermal convection, in which a fluid layer between two parallel plates is heated from below and cooled from above. Such a plane-layer geometry can be considered, for example, a local approximation of the tangent cylinder region of Earth's outer core, which is situated above and below the solid inner core and extends towards the north and south poles, respectively, up to the core-mantle boundary. Simulations in the plane-layer geometry can reproduce the basic force balance and heat transfer behavior, which can be validated against well-designed laboratory experiments. This flow configuration has therefore been studied extensively, with the individual or combined effects of global rotation and magnetic fields [7,8], to model various geophysical and astrophysical turbulent flows [9].
In the geophysical and astrophysical context, however, a spherical shell geometry is more pertinent for modeling planetary cores or stellar convective zones. The most extensive body of literature in this geometry focuses on "geodynamo" simulations that attempt to model convection in Earth's outer core and the geomagnetic field originating from it [10]. Mantle convection [11], rapidly rotating convection [12,13], RBC without rotation and magnetic fields [14,15,16], deep convection in gas giants [17,18], and solar convection [19] are among the other prolific areas of research in which spherical shell models are employed. The strength of these models lies in their capability to capture many essential dynamical features of planetary atmospheres, such as thermal winds, strong shear layers, magnetic buoyancy, meridional circulations, and large-scale flows. They can also incorporate important geometric constraints, such as tangent cylinders and curvature effects near the boundaries, whose combined or individual influence cannot be accounted for in a local Cartesian plane-layer configuration [20].
The local plane-layer and global spherical shell simulations differ primarily in the direction of gravity, which is generally vertically downwards in local Cartesian models and radially inwards in global spherical shell models. Additionally, rotating convection in spherical shells exhibits distinct scales in the radial, axial, and azimuthal directions [21], whereas for the local Cartesian model we need to consider only two spatial scales: the horizontal scale of convection and the vertical scale over which convection occurs. For both geometries, the governing non-dimensional parameters are the Rayleigh number (Ra), a non-dimensional measure of the thermal forcing, and the Prandtl number (Pr), the ratio of viscous to thermal diffusivity. Apart from these, the flow properties may also depend on the aspect ratio Γ = W/H (where W and H are the horizontal and vertical extents of the domain) in the plane-layer geometry and on the radius ratio Γ = r_i/r_o in the spherical shell geometry. The important global diagnostic quantities are the Nusselt number Nu and the Reynolds number Re, representing the non-dimensional heat transfer and flow speed.
An intriguing question in this research direction is the scaling relation between such a diagnostic quantity and a governing input parameter, for example Nu as a function of Ra. Thermal convection in planets and stars occurs at parameter values that are several orders of magnitude beyond the reach of state-of-the-art numerical simulations and experiments; these scaling relations are therefore valuable tools for extrapolating the results of experiments and simulations to planetary and stellar convective regimes. For the plane-layer geometry, a Nusselt number scaling of Nu ∼ Ra^{2/7} is found for moderate thermal forcing (Ra ⩽ 10^{10}), whereas for higher thermal forcing a scaling relation of Nu ∼ Ra^{1/3} has been widely reported [22]. A systematic investigation was reported by [14], who found the same scaling laws for the Nusselt number in the spherical geometry. It should be noted that although the global diagnostic quantities exhibit similar behaviour, local properties, such as the thicknesses of the viscous and thermal boundary layers, are markedly different in the two geometries. For example, the effects of curvature and of a radially varying gravitational acceleration (as appropriate for Earth's core) result in asymmetric boundary layers in the spherical geometry, in contrast to the symmetric boundary layers of the plane-layer geometry.
Experimental difficulties related to the radial direction of gravity make advances in spherical shell convection almost entirely dependent on massively parallel numerical simulations. Existing solvers [23,15] use spherical harmonic decomposition of the flow variables in the azimuthal and latitudinal directions, while Chebyshev polynomials are used in the radial direction for proper resolution of the boundary layers. In this paper, we report the development, implementation, and validation of a new finite-difference solver for studying spherical shell convection. The solver can map any three-dimensional curvilinear geometry to a computational Cartesian domain using the Jacobi transformation. This enables us to solve the conservation equations in Cartesian coordinates, which are much simpler than their spherical-coordinate counterparts, even after their modification by the Jacobi, elongation, and stiffness matrix coefficients. Furthermore, the effects of the ellipticity of the core-mantle boundary [24] and of the anisotropic shape of the inner core [25] on the azimuthal and latitudinal variation of the radial heat flux can be accounted for. The capability to account for the effects of non-spherical boundaries is the primary motivation for developing the present solver. The solver uses second-order central spatial discretization, while temporal discretization is achieved with the fractional-step method [26]. To avoid the stiffness induced by the fine resolution near the boundary layers, the viscous terms are treated implicitly, while the other terms are marched explicitly. The fractional step marches the velocity field to an intermediate field by a combination of the Alternating Direction Implicit method (ADI), the Crank-Nicolson method (CN), and the third-order low-storage Runge-Kutta method (RKW3) [26]. The remaining procedure in the fractional-step method is to remove the divergence residual from the velocity field at the end of each RKW3 step, which in turn is achieved by pressure correction. We use the multigrid HYPRE module to accelerate the pressure correction. The rest of the article is structured as follows. Section 2 discusses the governing equations. The numerical scheme is described in Section 3. Results are presented in Section 4 and summarized in Section 5.

Governing Equations
We aim to investigate Rayleigh-Bénard convection of an incompressible, Newtonian, Boussinesq fluid in a spherical shell geometry, as illustrated in figure 1. The spherical shell has an inner radius r_i and an outer radius r_o kept at constant temperatures T_i and T_o, respectively. The shell gap d = r_o − r_i, the temperature difference ∆T = T_i − T_o, and the free-fall velocity u_f = {g_0 α(T_i − T_o)d}^{1/2} are used as the characteristic scales for length, temperature, and velocity, respectively, to nondimensionalize the governing equations. Here, g_0 is the gravitational acceleration at the outer radius. The relevant fluid properties are the kinematic viscosity (ν), thermal diffusivity (κ), and thermal expansion coefficient (α). The non-dimensional governing equations are expressed below using a Cartesian coordinate system.
Here, θ and ϕ are the colatitude and longitude, as shown in figure 1. The non-dimensional temperature is defined as T = (T_f − T_o)/(T_i − T_o), where T_f is the temperature of the fluid. The non-dimensional parameters in these equations are the Rayleigh number and the Prandtl number, defined below.
In the subsequent section, we will use a coordinate transformation to convert the spherical domain to a Cartesian domain.

Coordinate Transformation
To solve the governing equations 1-3 in a generalized curvilinear coordinate system, we perform a coordinate transformation. The basic idea behind a coordinate transformation is to transform a set of physical laws written in Cartesian coordinates x_1, x_2, x_3 into an alternative form based on generalized curvilinear coordinates ζ, η, ξ [26].
(Schematic: a physical law written in the Cartesian system on the physical curvilinear grid is re-expressed as a physical law written in the generalized system on the computational Cartesian grid, via the Jacobi terms.) Such a transformation results in additional coefficients in the space derivatives of the governing equations, and the relation between the Cartesian and the generalized curvilinear coordinate systems is stored in a Jacobi matrix (J). The continuity, momentum, and energy equations after the transformation are expressed below.
Here, x_i denotes coordinate i of the Cartesian system, and ζ_i denotes coordinate i of the generalized system. The notations x_i = (x_1, x_2, x_3) = (x, y, z) and ζ_i = (ζ_1, ζ_2, ζ_3) = (ζ, η, ξ) are used interchangeably. After the transformation, the transformed governing equations are solved as if in a Cartesian system. In this context, grid transformation is often used synonymously with coordinate transformation, as the curvilinear domain (a spherical shell in our case) is transformed into a new computational Cartesian domain. Here, J^{-1}, C_ij, and G_ij are the transformation coefficients discussed below.
Fig. 3: A transformed computational cell associated with the grid point (i, j, k), where i, j, k are the integer indices used to identify discrete space in the ζ, η, and ξ directions, respectively.
The determinant |J^{-1}| is the volume ratio of the original cell to the transformed cell, whereas C_ij and G_ij are the grid elongation and skewness coefficients, respectively. The side length, and consequently the side area and the total volume, of a transformed cell are chosen to be unity.

Jacobi terms
Figure 3 shows a cell (i, j, k) in the transformed computational domain. The Jacobi terms J^{-1}, C_pq, and G_pq, as expressed in equations 8 and 9, are stored at the cell's faces. They are calculated as follows.
1. J^{-1} is computed at every cell face, denoted J^{-1,fc}, where fc indicates the cell face (1-3), by calculating all nine components of J^{-1}. For instance, the components of J^{-1,2} of a cell (i, j, k) are computed as follows.
4. Compute J^{fc} = ∂ζ_i/∂x_j by straightforward inversion of J^{-1,fc}.
5. Compute C^{fc}_pq and G^{fc}_pq from J^{fc} using equation 9.
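As an illustration of this procedure, the sketch below (function names and array layout are ours, not the solver's implementation) computes the nine components of J^{-1} = ∂x_p/∂ζ_q at a grid node by central differencing of the stored physical coordinates, assuming unit computational spacing, and then obtains J and the determinant |J^{-1}| by direct inversion:

```python
import numpy as np

def inverse_jacobian(x, i, j, k):
    """Central-difference approximation of J^{-1}_{pq} = dx_p/dzeta_q at
    node (i, j, k) of a structured grid. `x` has shape (3, ni, nj, nk)
    and stores the physical coordinates; computational spacing is unity."""
    Jinv = np.empty((3, 3))
    for p in range(3):
        Jinv[p, 0] = 0.5 * (x[p, i + 1, j, k] - x[p, i - 1, j, k])
        Jinv[p, 1] = 0.5 * (x[p, i, j + 1, k] - x[p, i, j - 1, k])
        Jinv[p, 2] = 0.5 * (x[p, i, j, k + 1] - x[p, i, j, k - 1])
    return Jinv

def metrics(Jinv):
    """|J^{-1}| is the volume ratio of the physical cell to the unit
    computational cell; J = dzeta_i/dx_j follows by direct inversion."""
    return np.linalg.det(Jinv), np.linalg.inv(Jinv)
```

For a uniform Cartesian grid of spacing h, this recovers J^{-1} = h I and |J^{-1}| = h^3, as expected.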

Spatial discretization
The spatial derivatives in 6 and 7 are discretized using a second-order central finite-difference scheme. Figure 4 illustrates the stencils used to discretize the term in 10 using this scheme.

Temporal discretization
For temporal discretization, a fractional-step method is used, in which the velocity field is advanced sequentially in multiple substeps. We use a combination of the Alternating Direction Implicit method (ADI), the Crank-Nicolson method (CN), and the third-order low-storage Runge-Kutta method (RKW3) to march to an intermediate field, as described below [26].

Alternating Direction Implicit method
The Alternating Direction Implicit (ADI) method is used to treat the viscous terms implicitly while marching in one direction at a time. We demonstrate the method with the two-dimensional diffusion equation shown in equation 14. To solve 14 with the Euler method, the procedure is to perform implicit Euler in the x direction with explicit Euler in the y direction for the first half step (∆t/2), and vice versa for the second half step, as shown in 15 and 16.
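A minimal sketch of this splitting for the 2D diffusion equation, using dense operators for brevity (the solver itself stores the systems in tridiagonal form and solves them with the Thomas algorithm), might look like:

```python
import numpy as np

def adi_step(u, nu, dt, h):
    """One Peaceman-Rachford-style ADI step for u_t = nu*(u_xx + u_yy) on
    a uniform grid (spacing h) with homogeneous Dirichlet boundaries.
    First half step: implicit in x, explicit in y; second half: vice versa."""
    n = u.shape[0] - 2                              # interior points per direction
    r = nu * dt / (2.0 * h * h)
    # 1D second-difference operator on the interior (dense for brevity)
    D = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1))
    A = np.eye(n) - r * D                           # implicit half-step operator
    B = np.eye(n) + r * D                           # explicit half-step operator
    ui = u[1:-1, 1:-1]
    ustar = np.linalg.solve(A, ui @ B)              # implicit in x (axis 0), explicit in y
    unew = np.linalg.solve(A, (B @ ustar).T).T      # implicit in y (axis 1), explicit in x
    out = u.copy()
    out[1:-1, 1:-1] = unew
    return out
```

For a smooth sin(πx)sin(πy) mode, one step reproduces the analytical decay factor exp(−2π²ν∆t) to high accuracy.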

Crank-Nicolson method
The Crank-Nicolson (CN) method splits the right-hand side into two equal parts, one implicit and one explicit, as demonstrated in 17 and 18.
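For a one-dimensional heat equation, this equal implicit/explicit splitting can be sketched as follows (illustrative only; dense linear algebra is used for brevity):

```python
import numpy as np

def cn_step(u, nu, dt, h):
    """One Crank-Nicolson step for u_t = nu*u_xx with homogeneous Dirichlet
    boundaries: the right-hand side is split into equal implicit and
    explicit halves, (I - r/2 D) u^{n+1} = (I + r/2 D) u^n."""
    n = u.size - 2                                  # interior unknowns
    r = nu * dt / (h * h)
    D = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1))
    rhs = (np.eye(n) + 0.5 * r * D) @ u[1:-1]       # explicit half
    out = u.copy()
    out[1:-1] = np.linalg.solve(np.eye(n) - 0.5 * r * D, rhs)  # implicit half
    return out
```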

Third order Runge-Kutta method
The third-order low-storage Runge-Kutta method (RKW3) uses only two storage variables. Marching is accomplished in three substeps, briefly summarized here. Given an equation for ϕ, RKW3 is implemented in the following manner. Here, rk runs from substep 1 to substep 3, and the values of h, β, and Π are given in table 1.
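The coefficients in table 1 are not reproduced in this text; the sketch below uses the commonly quoted Wray coefficients for a two-register, third-order low-storage scheme, which should be read as an assumption rather than the solver's exact values:

```python
def rkw3_step(phi, f, dt):
    """One full step of a two-register, third-order low-storage Runge-Kutta
    scheme for d(phi)/dt = f(phi). Coefficients are Wray's (assumed here;
    the paper's table 1 may differ). Only phi and the previous stage's
    f-evaluation need to be stored."""
    gamma = (8.0 / 15.0, 5.0 / 12.0, 3.0 / 4.0)
    zeta = (0.0, -17.0 / 60.0, -5.0 / 12.0)
    f_old = 0.0
    for g, z in zip(gamma, zeta):
        f_new = f(phi)                       # one RHS evaluation per substep
        phi = phi + dt * (g * f_new + z * f_old)
        f_old = f_new
    return phi
```

For the linear test problem dϕ/dt = −ϕ these coefficients reproduce exp(−∆t) to third order.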

The ADI-CN-RKW3 combined marching scheme
The above-mentioned algorithms (ADI, CN, and RKW3) are combined to march the governing equations temporally to an intermediate state. The right-hand side of equation 6 is split into explicit and implicit terms, as indicated by the subscripts ex and im in equation 21. Depending on the grid skewness G_ij, the diagonal components of the viscous terms can make the discretized system stiff and are therefore marched implicitly. The ADI scheme is used since there are three viscous terms, containing G_11, G_22, and G_33. At a given time, they are split into two parts using CN. These steps are shown in 22, 23, and 24 for substep 1 of the RKW3 marching scheme.
Here, □ represents all the terms to be marched explicitly.
The intermediate velocity fields u*_i, u**_i, and u***_i are obtained by solving the sets of tridiagonal matrices that result from the spatial discretization of equations 22, 23, and 24 in the ζ, η, and ξ directions, respectively. The intermediate velocity u***_i completes the first stage of the fractional-step scheme. We employ the Thomas algorithm with pipelining, described in the next section, to solve the tridiagonal systems 22-24 for u*, u**, and u***.

Thomas Algorithm
Let us consider solving Aψ = g for ψ, which is the outcome of the spatial discretization of, for instance, 22. Here A is the tridiagonal matrix given in 25. The first two relations in 25 are 26 and 27; substituting ψ_0 from 26 into 27 eliminates the sub-diagonal entry and yields the modified coefficients b′ and g′. The algorithm involves two stages, forward sweeping and backward substitution. The sub-diagonal elements a_1 to a_n are removed by Gaussian elimination during the forward sweep, after which equation 25 takes the form in 29. At the end of the forward sweep, we can solve for ψ_n directly from the last row of 29. Subsequently, we solve for ψ_{n−1} down to ψ_0 (equations n−1 until 0) via ψ_i = (g′_i − c_i ψ_{i+1})/b′_i. For the grid in Section 3.7, periodic boundary conditions are enforced in the ξ direction, and the above Thomas algorithm is modified as follows. Consider the discretized system Aψ = g with the periodic couplings shown in 30. The first step separates equation 30 into a tridiagonal system 31 together with an additional equation 32.
The tridiagonal matrix on the left-hand side is defined as [A_c], and [g] is the g-column matrix on the right-hand side.
Let [ψ1] and [ψ2] be the solutions of the system 31 for the two right-hand sides defined in 34 and 35. Substituting ψ from 33 into ψ_1 and ψ_{n−1} in 32 gives 36, which can be rearranged for ψ_n as in 37. In summary, to solve the system 30, we employ the steps in 38: during the forward sweep, data passes from CPU_i to CPU_{i+1}, then to CPU_{i+2}, and so on. After the forward sweeping is finalized, backward substitution starts and a reverse sweep is performed, as shown in 39; the only information sent from CPU_{i+1} to CPU_i is ψ_{i+1}.
For a periodic system, we use the following steps:
1. Construct 34 and 35.
2. Use the parallel Thomas subroutine to solve 34 and 35 for [ψ1] and [ψ2].
3. CPU_0, which owns the first block (containing node 1), sends ψ1_1 and ψ2_1 to CPU_N, which owns the last block (containing node n).
4. CPU_N calculates ψ_n and broadcasts ψ_n to every CPU that owns a subsystem of 30.
5. Every CPU calculates [ψ] from 33.

Notice that by splitting 30 into 31 and 32, CPU_N solves a tridiagonal system 33 whose size is one element less than the others.
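In serial form, the two algorithms above can be sketched as follows (illustrative Python with function names of our choosing; the solver distributes these sweeps across CPUs):

```python
import numpy as np

def thomas(a, b, c, g):
    """Tridiagonal solve of b[i]*psi[i] + a[i]*psi[i-1] + c[i]*psi[i+1] = g[i]
    (a[0] and c[-1] are unused). The forward sweep eliminates the
    sub-diagonal; backward substitution recovers psi."""
    n = len(b)
    bp = np.asarray(b, float).copy()
    gp = np.asarray(g, float).copy()
    for i in range(1, n):                           # forward sweep
        m = a[i] / bp[i - 1]
        bp[i] -= m * c[i - 1]
        gp[i] -= m * gp[i - 1]
    psi = np.empty(n)
    psi[-1] = gp[-1] / bp[-1]
    for i in range(n - 2, -1, -1):                  # backward substitution
        psi[i] = (gp[i] - c[i] * psi[i + 1]) / bp[i]
    return psi

def cyclic_thomas(a, b, c, g):
    """Periodic variant via the splitting described in the text: peel off
    the last unknown psi_{n-1}, solve the reduced tridiagonal system for
    two right-hand sides ([psi1], [psi2]), close for psi_{n-1}, and
    superpose psi = psi1 + psi_{n-1} * psi2."""
    n = len(b)
    u = np.zeros(n - 1)
    u[0] = -a[0]                                    # wrap-around couplings moved
    u[-1] = -c[n - 2]                               # into an auxiliary RHS
    psi1 = thomas(a[:n - 1], b[:n - 1], c[:n - 1], g[:n - 1])
    psi2 = thomas(a[:n - 1], b[:n - 1], c[:n - 1], u)
    psin = (g[n - 1] - a[n - 1] * psi1[-1] - c[n - 1] * psi1[0]) / \
           (b[n - 1] + a[n - 1] * psi2[-1] + c[n - 1] * psi2[0])
    return np.append(psi1 + psin * psi2, psin)
```

Note that the reduced system solved twice here is one element smaller than the full periodic system, mirroring the remark above.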

Pipelining
In the previous subsection, we summarized how to solve a tridiagonal system in parallel: each CPU completes its forward/backward sweep and sends data to the proper neighbour so that marching can continue; CPU_i, on finishing its forward sweep, sends data to CPU_{i+1}, and so on until the last block is reached. In practice, each CPU can be responsible for thousands of tridiagonal subsystems contained in a single subdomain (or 'block'). This subsection summarizes how to solve such a large set of systems efficiently.
Consider a computational domain containing (n_ζ, n_η, n_ξ) grid points. For simplicity, the domain is decomposed equally in the η direction only, into N_η blocks, so that CPU_0 occupies block 0, CPU_1 occupies block 1, and so on. Thus, each CPU owns a block of size (n_ζ, n_η/N_η, n_ξ), given that n_η/N_η is, by design, an integer. Supposing we choose to perform implicit marching in the η direction, the resulting tridiagonal matrix is subdivided into N_η sections. The easiest, though least efficient, way to solve these systems is to let CPU_0 solve all of its subsystems across the (n_ζ, n_η/N_η, n_ξ) grid before sending data to CPU_1. That is, CPU_0 performs the forward sweep, packs the interface coefficients (b′, c, and g′, as in the previous subsection) at (1 : n_ζ, n_η/N_η, 1 : n_ξ), and sends them to CPU_1. Following the same process for the subsequent CPUs until CPU_{N_η−1} is reached, the backward substitution is then carried out in the same way from CPU_{N_η−1} back to CPU_0. The obvious drawback is that only one CPU operates at a given time, and the whole process is even slower than the serial version owing to the additional communication overhead. Pipelining is employed to minimize the number of idle CPUs while optimizing the communication overhead. In essence, rather than sweeping across the (n_ζ, n_η/N_η, n_ξ) grid all at once, each CPU performs the sweeps only for a portion of the grid and shares data with its neighbouring CPU in the sweep direction: downstream in a forward sweep and upstream in a backward sweep. A portion of the grid can be chosen for the first CPU, with the others adopting the same portion. We give an example of pencil-type pipelining. Consider figure 7 and the following steps:
1. CPU_0 (process rank 0 in the figure) performs the forward sweep in the η direction at cell (i, k) = (1, 1), i.e. from (i, j, k) = (1, 1, 1) to (i, j, k) = (1, n_η/N_η, 1); here i and k are dummy indices pointing to grid locations in the ζ and ξ directions, respectively. CPU_0 then repeats the forward sweep until (i, k) = (1, n_ξ). Notice that the forward sweep is in the η direction, but the 'pencil' aligns in the ξ direction. At this point, CPU_0 packs and passes the data to CPU_1.
…
5. CPU_{N_η−1} exchanges data for each k ∈ [1, n_ξ] with CPU_{N_η−2} and starts the backward sweep at the sliding index i = n_ζ − 1.
6. The backward sweeping process is carried out in the same way as the forward sweep.
7. Solving the system of tridiagonal matrices is finalized after CPU_0 finishes the backward sweep at the sliding index i = 1.

Handling shell cut
Using the parallel Thomas algorithm with pipelining, we are able to solve 22, 23, and 24 in parallel for u*, u**, and u***. The grid used by the solver, before rotation about the x_1-axis, is shown in figure 2a. The directions parallel and perpendicular to the body surface are denoted by ζ and η, respectively. Figure 2b represents the transformed coordinates obtained using the Jacobi transformation. The top and bottom edges of the domain, parallel to the x_1-axis in figure 2a, are indicated by the phrase 'branch cut' (AB and CD). They correspond to the left and right sides of the transformed domain, parallel to the η direction, indicated by the phrase 'shell cut'. The body surface seen in the curvilinear domain in figure 2a is transformed to the top and bottom surfaces of the transformed domain in figure 2b. The terms 'shell cut' and 'branch cut' represent a shared interface among CPUs that cuts through the centerline. A tridiagonal system in the ζ direction created by discretizing 22 is interrupted by the shell cut on both the left and right sides. To handle the shell cut, the tridiagonal system from one side of the cut is merged with the system on the opposite side; the resulting system is twice as large as the system without the cut.
For example, consider solving a system in the ζ direction in figure 8. In the forward sweep, CPU_1, which owns point A, starts solving from this point and passes the information towards CPU_17, until the forward sweep reaches CPU_49. Then, CPU_49, which owns point B, passes the information to CPU_51, which owns point B′ on the other side of the cut. CPU_51 continues the forward sweep by sending data to CPU_35, and so on, until the sweep reaches A′, owned by CPU_3. The backward substitution follows the same procedure, starting from point A′ and marching until the substitution reaches back to point A. Forward and backward sweeping are done using the pencil-type pipelined Thomas algorithm explained previously.

Pressure Correction
The generalized curvilinear solver uses a combination of the ADI-CN-RKW3 methods to obtain the intermediate velocity u***. The remaining procedure in the fractional-step method is to remove the divergence residual from the projected velocity u*** at the end of each RKW3 substep (denoted PC in figure 5). This step requires correcting the pressure to render the field divergence-free. Equation 6 is rewritten as 40, where the right-hand-side term collects the advection, diffusion, and baroclinic terms; 40 is temporally discretized into 41 and 42. Here, u***_i denotes the velocity at the third step of the ADI, and h is a sub-time step of RKW3.
Subtracting 41 from 42 gives 43. Taking the divergence of 43 gives 44. Note that the corrected velocity field is required to be divergence-free. This yields the Poisson equation 45 for the pressure correction δP^h. Removing the divergence from the u*** field is done by solving 45 for δP^h and computing u^{n+h}_i using 46.
The Poisson equation for pressure correction is solved using the semi-coarsening multigrid routine in the HYPRE library [28]. The divergence-free field u^{n+h(1)} marks the end of the first RKW3 substep. We follow the same procedure until u^{n+h(1)+h(2)+h(3)} = u^{n+1} is obtained. Figure 5 illustrates the entire process. HYPRE is a library of scalable linear solvers and multigrid methods [29,30]. The generalized curvilinear solver utilizes two solvers provided by HYPRE: 1) SMG, a parallel semi-coarsening multigrid solver for linear systems [28], and 2) BoomerAMG, a parallel implementation of the algebraic multigrid method [31].
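As a self-contained illustration of the divergence-removal step, the sketch below performs the projection on a doubly periodic 2D grid with a spectral Poisson solve; this stands in for the HYPRE multigrid solve used by the actual solver and is only meant to show the correction u^{n+h} = u*** − h ∇δP^h:

```python
import numpy as np

def project(u, v, h, dx):
    """Divergence removal on a doubly periodic 2D grid (illustrative
    sketch; not the solver's multigrid-based implementation). Solves
    lap(dP) = div(u***)/h spectrally, then corrects u_new = u - h*grad(dP)."""
    n = u.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)       # angular wavenumbers
    kx, ky = np.meshgrid(k, k, indexing='ij')
    uh, vh = np.fft.fft2(u), np.fft.fft2(v)
    div_h = 1j * kx * uh + 1j * ky * vh             # divergence of u***
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                                  # avoid divide-by-zero (mean mode)
    dp_h = -div_h / (h * k2)                        # Poisson solve for dP
    dp_h[0, 0] = 0.0                                # pressure defined up to a constant
    u_new = np.real(np.fft.ifft2(uh - h * 1j * kx * dp_h))
    v_new = np.real(np.fft.ifft2(vh - h * 1j * ky * dp_h))
    return u_new, v_new
```

After the correction, the (spectral) divergence of the velocity field vanishes to machine precision.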

Simulation details
We have performed simulations for three Rayleigh numbers, Ra = 10^5, 10^7, and 10^8, keeping the Prandtl number constant at Pr = 1. The radius ratio is also kept constant at Γ = 0.6, and an inverse-square-law gravity profile g = (r_o/r)^2 is assumed. These parameter choices facilitate comparison with Gastine et al. [14]. The number of grid points used in each direction for different Ra is given in table 2. The physical curvilinear grid is clustered in the radial direction near the boundaries, before the Jacobi transformation is performed, to resolve the boundary layers near the solid surfaces. The clustering function is given below.
Here, rx_2 is the stretching factor, and nx_2 is the number of grid divisions in the radial direction.
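Purely as an illustration of this kind of wall clustering, a common tanh-type mapping (an assumed form with a stretching factor rx_2, not necessarily the paper's exact function) can be written as:

```python
import numpy as np

def clustered_radii(r_i, r_o, nx2, rx2):
    """Illustrative tanh-type clustering: nx2 radial divisions between
    r_i and r_o, with points packed toward both walls. rx2 controls the
    stretching strength (larger rx2 = stronger wall clustering)."""
    s = np.linspace(-1.0, 1.0, nx2 + 1)          # uniform computational coordinate
    w = np.tanh(rx2 * s) / np.tanh(rx2)          # flattens near s = +/-1: wall clustering
    return r_i + 0.5 * (r_o - r_i) * (1.0 + w)
```

The mapping hits both walls exactly, is monotone, and produces radial spacings that are smallest at the walls and largest at mid-gap.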
After the transformation, all the grid spacings are unity, and the information about grid stretching is effectively provided through the elongation matrix. At the bottom and top surfaces, a no-slip boundary condition is used (u_1 = u_2 = u_3 = 0), while the temperatures are fixed at the bottom (T = 1) and top (T = 0) surfaces to impose an unstable gradient for maintaining thermal convection. Periodic boundary conditions are used for all variables in the ξ direction. The 'shell-cut' boundary conditions are used in the ζ direction, as explained in section 3.5.3. All simulations are started with u_i = 0 and small random perturbations in the temperature field.
We use the following notation for the surface-, volume-, and time-averaged quantities (47-50). All statistical quantities are averaged in time over at least 100 free-fall time units (t_f = d/u_f) after the simulation reaches a statistically steady state. Heat transport in the spherical shell is quantified by the Nusselt number Nu, defined in 51, where ρ(r) = ⟨T⟩_s, ⟨·⟩_s represents the average over a spherical surface (48), and the overbar represents the time average (50). Here, T_c is the conductive temperature profile for a spherical shell with isothermal boundaries, given by 53; it is obtained by solving the thermal conduction equation for a spherical shell with isothermal boundary conditions (52).
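The conductive profile for isothermal boundaries T(r_i) = 1, T(r_o) = 0 has the closed form T_c(r) = A/r + B with A = r_i r_o/(r_o − r_i) and B = −r_i/(r_o − r_i), which follows from solving the spherical conduction equation; a short check:

```python
def t_cond(r, r_i, r_o):
    """Conductive temperature profile for a spherical shell with
    isothermal boundaries T(r_i)=1, T(r_o)=0. Solving
    (1/r^2) d/dr (r^2 dT/dr) = 0 gives T_c = A/r + B with
    A = r_i*r_o/(r_o - r_i) and B = -r_i/(r_o - r_i)."""
    A = r_i * r_o / (r_o - r_i)
    return A / r - r_i / (r_o - r_i)
```

By construction the profile equals 1 at the inner wall and 0 at the outer wall, and its purely 1/r shape is what produces the boundary-layer asymmetry discussed later.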

Results
This section summarizes the results of the simulations listed in table 2. We validate our results against those of [14] and further demonstrate the closure of the turbulent kinetic energy budget. We compare the variation of the non-dimensionalized temperature profile in the radial direction in figure 9; the radial temperature variation matches that of Gastine et al. [14]. Additionally, we compare the Nu obtained from our solver with the values reported by Gastine et al. [14] at the same Ra. For the calculation of Nu, we use equation 51 with r_i = 3 and r_o = 5. From table 2, it can be observed that Nu matches very well with the values of Gastine et al. [14].

Flow visualization
A comparison of the qualitative features of the instantaneous flow and thermal fields as Ra increases from 10^5 to 10^8 is presented in figure 10. For the lower thermal forcing at Ra = 10^5, the plumes generated at the boundaries span the radial extent of the domain, as seen in figure 10a. At Ra = 10^8, however, as shown in figure 10b, the plumes are much smaller and the interior is well mixed. With the increase in Ra, turbulence intensifies, accompanied by stronger mixing and the generation of smaller scales; the higher-Ra case therefore has a negligible temperature gradient in the bulk. In figure 10c, the alternating regions of positive and negative radial velocity indicate the presence of structures similar to convective rolls, while figure 10d exhibits small-scale plumes near the boundaries at higher Ra.

Boundary layer asymmetry
The thermal boundary layer thicknesses (inner δ^T_i, outer δ^T_o) are defined as the distances of the local maxima in the T_rms profile (54) from the inner and outer walls, respectively. The velocity boundary layer thicknesses (inner δ^u_i, outer δ^u_o) are evaluated similarly from the locations of the local maxima in the horizontal velocity profile u_h (55) [32]. From the boundary layer thicknesses at the inner and outer boundaries in table 2, it is evident that the temperature profile is asymmetric between the inner and outer radii. For thermal equilibrium, the total heat flowing in through the inner surface must flow out through the outer surface; since the inner spherical surface area is smaller than the outer one, the temperature drop is larger at the inner boundary than at the outer boundary [14]. It is also visible that, with an increase in Ra, the temperature profile steepens near the boundaries, as shown in the insets of figure 11.

Turbulent kinetic energy budget
We discuss the turbulent kinetic energy (t.k.e.) budget in RBC in this section to review not only the kinetic energy balance but also the adequacy of the spatial resolution of the present simulations. The budget can be expressed as in equation 56, with the individual terms defined below. On the right-hand side of equation 56, the buoyancy flux ⟨B⟩ is the source term that converts the available potential energy into turbulent kinetic energy to drive the convective motions. This t.k.e. is converted into internal energy by the viscous dissipation term ⟨ϵ_ν⟩ [33], which acts as a sink. The buoyancy flux averaged over the whole spherical volume can be expressed as in 58; after substituting Nu from 51 and T_c from 53, we obtain 59. The evolution of the t.k.e. budget for the case Ra = 10^7 is shown in figure 12a. We evaluate the volume-averaged t.k.e. budget terms once the simulation becomes statistically stationary (t ⩾ t_0, where t_0 denotes the time after which statistics are collected). The balance term signifies the difference between the left- and right-hand sides of equation 56. This quantity remains smaller than 5% of ⟨B⟩, indicating that the resolution is sufficient to dissipate all the kinetic energy. To further quantify the spatial resolution of the numerical model, the viscous dissipation ratio χ_{ϵν}, defined by 60, is also tested for its closeness to unity. Table 3 shows that χ_{ϵν} is near unity for all cases, signifying good spatial resolution. As an additional test of the adequacy of the resolution, the radial grid spacing is compared against the Kolmogorov scale l_η = (Pr/Ra)^{3/8} (1/ϵ_ν)^{1/4}. As seen in figure 12b, the radial grid spacing normalized by l_η is near unity for all Ra near the walls, indicating appropriate wall resolution, and stays below 4 everywhere, which is sufficient for the accurate calculation of second-order correlations [34].
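In free-fall units the non-dimensional kinematic viscosity is ν* = (Pr/Ra)^{1/2}, so the dimensional definition l_η = (ν³/ϵ)^{1/4} reduces to the expression above; a short sketch of this resolution check (function names are ours):

```python
def kolmogorov_scale(Ra, Pr, eps_nu):
    """Kolmogorov length in free-fall units: with nu* = sqrt(Pr/Ra),
    l_eta = (nu*^3 / eps_nu)^(1/4) = (Pr/Ra)^(3/8) * eps_nu^(-1/4)."""
    return (Pr / Ra) ** 0.375 * eps_nu ** (-0.25)

def wall_resolution_ok(dr, Ra, Pr, eps_nu, limit=4.0):
    """Resolution criterion used in the text: grid spacing normalized by
    l_eta should stay below about 4 for second-order statistics [34]."""
    return dr / kolmogorov_scale(Ra, Pr, eps_nu) < limit
```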

Conclusion
This present investigation discusses the development of a generalized curvilinear solver for spherical Rayleigh-Bénard convection.Using the Jacobi transformation, the solver transforms a curvilinear domain into a Cartesian domain, and a set of modified governing equations are solved in the Cartesian domain.The solver uses a secondorder central differencing scheme for spatial discretization, while for temporal discretization, a combined marching scheme of ADI-CN-RKW3 is used.A parallel Thomas algorithm with pipelining is used for solving the tridiagonal system, which is more efficient and faster as it reduces the idle time for CPUs.In order to remove the divergence residual from the projected velocity in the intermediate field of the fractional step method, Semi-coarsening Multigrid(SMG) routine from the HYPRE library for the pressure correction is used.The solver simulates three Rayleigh number (Ra) cases, namely, 10 5 ,10 7 , and 10 8 .The primary emphasis is given to the ability of the solver to predict the heat transfer, quantified by the Nusselt number, Nu.Comparing Nu obtained from the numerical simulation with the expected value is not a reliable criterion to assess its validity because even the under-resolved schemes show good closeness with the Nu while producing temperature fields with strong nonphysical oscillations [35].Due to this fact, the solver is not only validated for its Nu but also with the radial temperature profiles from Gastine et al. [14].The radial temperature profile and Nu obtained from our solver demonstrate a good match with the results from Gastine et al. 
[14]. To further test the spatial resolution, we check the closeness of the viscous dissipation ratio to unity and the closure of the turbulent kinetic energy budget. The t.k.e. budget reveals good closure for all the Ra cases considered. With increasing Ra, the temperature profile near the boundaries becomes steeper: the fluid near the boundaries is subjected to strong thermal gradients, generating a large buoyancy force that drives the flow. The steepening of the temperature profile near the boundaries is therefore evidence of the buoyancy-induced strong convective flow at increased Ra. For a given Ra, the temperature profile is asymmetric owing to the difference in area between the inner and outer spherical shells.
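This area-induced asymmetry can be illustrated with the purely conductive profile, a standard result obtained by solving $\nabla^2 T = 0$ in the shell (non-dimensionalized with gap width $r_o - r_i = 1$ and radius ratio $\Gamma = r_i/r_o$; this is not the turbulent profile, but it exhibits the same bias):

```latex
\nabla^2 T = 0,\qquad T(r_i) = 1,\quad T(r_o) = 0
\;\;\Longrightarrow\;\;
T_{\mathrm{cond}}(r) \;=\; \frac{\Gamma}{(1-\Gamma)^2}\,\frac{1}{r} \;-\; \frac{\Gamma}{1-\Gamma}.
```

For $\Gamma = 0.6$ ($r_i = 1.5$, $r_o = 2.5$), the mid-gap value is $T_{\mathrm{cond}}(2) = 3.75/2 - 1.5 = 0.375 < 1/2$; the mean temperature is biased toward the cold outer boundary because the outer surface has the larger area.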
The majority of the computational methods developed for spherical shell Rayleigh-Bénard convection employ spherical harmonic decomposition of the solution variables in the angular coordinates (θ, ϕ) while using finite differences or Chebyshev polynomials in the radial direction [36,37,38,39,40,41,42]. A notable exception to this rule is reported in the work by Kageyama et al. [43]. However, all these methods were developed for perfectly spherical geometries. The novelty of the present solver lies in its capability to account for the effects of non-spherical geometries in planetary core convection.
Our ongoing work is focused primarily on extending the present solver to include the effects of rotation and magnetic fields. Future extensions, with further model improvements, should reveal the possible effects of a non-spherical geometry, not only on the convective patterns but also on the self-generated magnetic field in global numerical dynamo simulations.

Fig. 6: Example of a 4×4×4 CPU topology in a single computational domain. Numbers in the figure represent an individual CPU's rank (id).

Figure 6 demonstrates an example 4×4×4 central processing unit (CPU) topology in a single computational domain. The spatial discretization of equations 22, 23, or 24 over several CPUs results in tridiagonal matrices and vectors. The tridiagonal matrices constructed from the discretization of equation 24 span the entire ξ space index, e.g. over CPUs 0−3, as illustrated by the solid red line in the figure. Similarly, the tridiagonal matrices constructed from the discretization of equation 23 span the entire η space index (e.g. over CPUs 19−31), as indicated by the solid blue line. Considering equation 38, the forward sweep starts at CPU 0. Once the sweep reaches the interface between CPU_i and CPU_{i+1}, CPU_i sends b′_{i_e}, c_{i_e}, and g′_{i_e} to CPU_{i+1}. CPU_{i+1} then continues the sweep, in turn sending on b′_{(i+1)_e}, c_{(i+1)_e}, and g′_{(i+1)_e}; the sweep proceeds at cell (1, 1 → n_η/N_η, 1), then at cell (1, 1 → n_η/N_η, 2), and so on until cell (n_ζ, 1 → n_η/N_η, n_ξ). Next, CPU 0 packs the plane data with n_ζ × n_ξ × 3 elements (recall b′_{i_e}, c_{i_e}, and g′_{i_e}).
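The forward sweep and back-substitution described above are the two phases of the Thomas algorithm. A minimal serial sketch follows (illustrative only; the paper's pipelined version splits exactly these loops across CPUs, handing off the last modified coefficients b′, c, and g′ at each interface):

```python
import numpy as np

def thomas(a, b, c, g):
    """Solve a tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = g[i].

    a[0] and c[-1] are unused. In the pipelined parallel version each CPU
    runs these same loops over its local index range, passing the interface
    coefficients (b', c, g') to the neighbouring CPU.
    """
    n = len(b)
    bp = np.asarray(b, dtype=float).copy()   # modified diagonal b'
    gp = np.asarray(g, dtype=float).copy()   # modified RHS g'
    # forward sweep: eliminate the sub-diagonal
    for i in range(1, n):
        m = a[i] / bp[i - 1]
        bp[i] -= m * c[i - 1]
        gp[i] -= m * gp[i - 1]
    # back-substitution
    x = np.empty(n)
    x[-1] = gp[-1] / bp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (gp[i] - c[i] * x[i + 1]) / bp[i]
    return x
```

In the pipelined arrangement of figure 7, only the interface values of `bp`, `c`, and `gp` cross CPU boundaries, which is what keeps the communication volume per sweep small.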

Fig. 7: Illustration of pipelining with the parallel Thomas algorithm, following Figure VI.16 of Taylor [27]. The forward sweep is achieved in steps (a-d), while steps (e-g) depict back-substitution.

Fig. 8: Solving a tridiagonal system across a shell cut. Here (A, A′) and (B, B′) are the two pairs of surfaces across the left and right shell cuts, respectively, as per the arrangement of processors shown in figure 6.

Fig. 9: Comparison of surface- and time-averaged non-dimensional radial temperature profiles for the cases at (a) Ra = 10^7 and (b) Ra = 10^8.

Fig. 11: Temperature variations observed across different Ra. The inset shows an enlarged view to demonstrate the increase in gradient with increasing Ra.

Table 2: Summary of the different Ra cases along with the grid used for each case. Here, Prandtl number Pr = 1, gravity profile g = (r_o/r)^2, and radius ratio Γ = 0.6 have been used for all the runs.

Table 3: The turbulent kinetic energy budget terms and the viscous dissipation ratio χ_{ϵ_ν} at different Ra with Pr = 1.