EPOCHS VI: The Size and Shape Evolution of Galaxies since 𝑧 ∼ 8 with JWST Observations

We present the results of a size and structural analysis of 1395 galaxies at 0 . 5 ≤ 𝑧 ≲ 8 with stellar masses log ( 𝑀 ∗ / 𝑀 ⊙ ) > 9.5 within the JWST Public CEERS field that overlaps with the HST CANDELS EGS observations. We use GALFIT to fit single Sérsic models to the rest-frame optical profile of our galaxies, which is a mass-selected sample complete to our redshift and mass limit. Our primary result is that at fixed rest-frame wavelength and stellar mass, galaxies get progressively smaller, evolving as ∼ ( 1 + 𝑧 ) − 0 . 71 ± 0 . 19 up to 𝑧 ∼ 8. We discover that the vast majority of massive galaxies at high redshifts have low Sérsic indices, thus do not contain steep, concentrated light profiles. Additionally, we explore the evolution of the size-stellar mass relationship, finding a correlation such that more massive systems are larger up to 𝑧 ∼ 3. This relationship breaks down at 𝑧 > 3, where we find that galaxies are of similar sizes, regardless of their star formation rates and Sérsic index, varying little with mass. We show that galaxies are more compact at redder wavelengths, independent of sSFR or stellar mass up to 𝑧 ∼ 3. We demonstrate the size evolution of galaxies continues up to 𝑧 ∼ 8, showing that the process or causes for this evolution is active at early times. We discuss these results in terms of ideas behind galaxy formation and evolution at early epochs, such as their importance in tracing processes driving size evolution, including minor mergers and AGN activity.


INTRODUCTION
Ever since the discovery of galaxies, their extended nature has been a clear indicator of their differing properties from unresolved sources such as stars.These resolved properties have been of interest for many years, and in many ways are the oldest studied properties of galaxies (e.g., Hubble 1926;Buitrago et al. 2008;Conselice 2014;Ferreira et al. 2022b;van der Wel et al. 2014;Suess et al. 2022;Ferreira et al. 2022a;Kartaltepe et al. 2022).As far back as the 18th century, William Herschel, with his sister Caroline and, later, his son John, catalogued what we now know to be extragalactic objects, commenting on their appearance (Herschel 1786).Following up on this, Lord Rosse at his castle in Ireland discovered spiral structures in galaxies, through his observations of M51 and other nearby galaxies (Rosse 1850).After the inferring of distances to these objects, photography allowed Hubble and his immediate successors to develop the dominant morphological classification scheme we use today -the Hubble Sequence -which classifies extragalactic objects as spiral, elliptical, or irregular (Hubble 1926).This has ★ E-mail: katherineeormerod@gmail.com continued until this day, with James Webb Space Telescope (JWST) observations setting us on a new pathway towards understanding the structures and morphologies of the very first galaxies (e.g., Whitney et al. 2021;Ferreira et al. 2022a,b;Suess et al. 2022;Kartaltepe et al. 2022;Huertas-Company et al. 2023;Ono et al. 2023;Jacobs et al. 2023;Tacchella et al. 2023;Morishita et al. 2023).
Galaxy structure and morphology remain the oldest studied and are still a set of crucial properties for us to understand the evolution of galaxies through cosmic time, although we know very little about these features at  > 3. Tracking the changes in the structural properties of galaxies from the era of early galaxy formation until the present day provides valuable insights into the processes of galaxy evolution.There have been major efforts over the past 30 years to study morphology and structure of distant galaxies with the Hubble Space Telescope (HST; Buitrago et al. 2008;Conselice 2014;Delgado-Serrano et al. 2010), where the rest-frame optical properties of galaxies up to  ∼ 3 can be studied and examined.These HST observations have shown us that galaxies appear to become progressively more irregular and peculiar at higher redshifts (e.g., Conselice 2003;Lotz et al. 2004;Mortlock et al. 2013;Delgado-Serrano et al. 2010;Schawinski et al. 2014;Conselice 2014;Whitney et al. 2021).
The red Hubble filters are limited, in that they do not allow us to measure or observe the rest-frame optical light of galaxies back to within the first few Gyr of the universe; in fact the reddest HST filter, F160W on Hubble's Wide Field Camera 3 (WFC3), only probes the rest-frame visible light of galaxies up to  ∼ 2.8, but we know that galaxies exist at much higher redshifts (e.g., Adams et al. 2023b;Austin et al. 2023;Curtis-Lake et al. 2023;Castellano et al. 2022;Naidu et al. 2022;Finkelstein et al. 2023;Atek et al. 2022;Yan et al. 2022;Donnan et al. 2022;Harikane et al. 2023).There has also been a large effort to study the structural and size evolution of galaxies theoretically, through the use of cosmological simulations, at a range of redshifts.These simulations predict that at higher redshift, galaxies become more compact, although that the dust distributions in galaxies attenuates bright cores, and increases the observed half-light radius.Simulations, in agreement with observations, have also shown that galaxy sizes typically increase with increasing stellar mass, and show that compact galaxies may grow in size due to mergers or renewed star formation, with high-redshift stars moving outwards (e.g., Furlong et al. 2017;Marshall et al. 2022;Roper et al. 2022).
The relatively recently launched James Webb Space Telescope allows us to obtain the same type of data with the Near Infrared Camera (NIRCam) probing rest frame optical light as far out as  ∼ 9, with filters as red as ∼ 4.4m.The superior resolution of JWST and the long wavelengths of its filters allow us to examine galaxy structure with much better fidelity than with HST (Ferreira et al. 2022b).Early JWST observations reveal morphological and structural features of galaxies at 1.5 <  < 8 that were not possible to discern fully with HST, thus resulting in the re-classification of many galaxies previously believed to have peculiar morphologies (e.g., Ferreira et al. 2020).A major discovery has been that galaxies appear morphologically much more disc-like than previously thought (e.g.,.Ferreira et al. 2022a).While these early JWST papers show that galaxies are indeed different at  > 2 than we thought on the basis of HST imaging, a significant amount of quantitative analysis is still needed.
As such, in this paper we present an analysis of the sizes and Sérsic indices of massive galaxies with stellar masses log ( * / ⊙ )> 9.5, for which we can now quantitatively measure rest-frame structure up to  ∼ 8 with JWST.Whilst previously we could also measure galaxy sizes and morphologies with HST, these are often unreliable due to image fidelity and the range of wavelengths we were able to probe (Ferreira et al. 2022b).Our results at 0.5 <  < 8 allow us to probe deeper and at higher redshifts than previous studies.
We present a quantitative analysis of measured galaxy shapes, based on the Sérsic index, , obtained from Sérsic profile fitting, and size measurements, based on half-light radii measurements, to determine the evolution of galaxy structure over most of cosmic time.In this paper we answer how galaxy sizes and their overall shapes change for a mass-complete sample which we divide into samples based upon two properties: specific star formation rate, and Sérsic indices.This differs from previous work which has focused on either just the highest redshift galaxies (e.g., Ono et al. 2023) or those which are passive (e.g., Ito et al. 2023).The evolution of these properties within a stellar mass selection is a key observable for galaxy evolution as well as an important way to trace processes that drive galaxy size evolution in massive galaxies, including galaxy minor mergers and AGN activity (e.g., Bluck et al. 2012).
The structure of this paper is as follows: Section 2 discusses the data used and the data reduction process, Section 3 discusses properties of the galaxies within our sample.The fitting process is explained in Section 4, and we discuss how we select a sample of robust morphological fits in Section 5. We present our results, along with a comparison to simulations to confirm that our findings are not the result of redshift effects in Section 6.We assume a standard Λ CDM cosmology throughout of Ω  = 0.3, Ω Λ = 0.7, and  0 = 70 km s −1 Mpc −1 .Where we reference galaxy 'sizes', we are referring to the half-light radii of our objects.All magnitudes are given in the AB system (Oke 1974;Oke & Gunn 1983).

DATA
We use JWST NIRCam imaging (Rieke et al. 2022) to analyse the light profiles of a large sample of high-redshift galaxies in the F115W, F150W, and F200W short-wavelength (SW) bands, and F277W, F356W, F410M, and F444W long-wavelength (LW) bands.The Cosmic Evolution Early Release Science (CEERS, PID: 1345, PI: S. Finkelstein) Survey (Finkelstein et al. 2017(Finkelstein et al. , 2022;;Bagley et al. 2023) is one of 13 JWST ERS programmes, with the goal of examining galaxy formation at 0.5 < z < 10 and perhaps beyond.The galaxies analysed in this work are within the CEERS NIRCam footprint, and within the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) observations (Koekemoer et al. 2011).This is important as it allows us to use both the JWST data as well as the deep data from HST's WFC3 and Advanced Camera for Surveys (ACS), which also aids the determination of photometric redshifts.
As such, the photometric redshifts, stellar masses, and star formation rates used in this paper are based on the original CANDELS+GOODS WFC3/ACS imaging and data, Spitzer/IRAC S-CANDELS observations (Ashby et al. 2015), and Canada-France-Hawaii Telescope (CFHT) ground-based observations (Stefanon et al. 2017).A summary of HST filters used and their 5 depths is shown in Table 1.
The CANDELS survey was designed to investigate galaxy evolution and the birth of black holes at 1.5 <  < 8, and consists of multi-wavelength observations in five fields.The CANDELS/DEEP survey covers 125 arcmin 2 within GOODS-N and GOODS-S, and the remainder consists of the CANDELS/Wide survey, covering three additional fields (Extended Groth Strip, COSMOS, Ultra-Deep Survey), covering a total of 800 arcmin 2 across all fields (Grogin et al. 2011).The primary sample we select from originates from these deep observations within the EGS, where there is overlap with the CEERS JWST NIRCam data, and whose analysis is described in detail in Duncan et al. (2019).

CEERS Data Reduction
We process the JWST data products on this field using a modified version of the official JWST pipeline, explained in depth in Adams et al. (2023b); Adams et al. (2023a).We use the standard JWST pipeline (pipeline version 1.8.2 and Calibration Reference Data System v0995), with some minor modifications.and F200W data1 .After Stage 2 of the pipeline we apply a correction for 1/F noise, derived by Chris Willot. 2 We extract the sky subtraction step from Stage 3 of the pipeline and run this on each NIRCam frame independently.We then align calibrated imaging for each exposure to GAIA Data Release 3 (Gaia Collaboration et al. 2021), using tweakreg from the drizzlepac python3 package.We finally pixel-match the final mosaics using astropy reproject. 4he final resolution of our drizzled images is 0.03 arcsec per pixel.The total unmasked area of JWST images used in this paper is 64.15 square arcminutes, and the average depths of each filter are listed in Table 2.

Photometric Redshifts, Stellar Masses and Star Formation Rates
We begin our analysis with a catalogue of 1649 massive objects with log ( * / ⊙ )> 9.5, which have photometric redshifts and physical properties calculated in previous works.The photometric redshifts used in this paper are calculated in Duncan et al. (2019), using the EAZY (Brammer et al. 2008) photometric redshift code, with three separate template sets fitted to the observed photometry.These templates include zero-point offsets, which alter the input fluxes, and fix additional wavelength-dependent errors (see Duncan et al. (2018a,b) for full details).A Gaussian process code (GPz; Almosallam et al. 2016) is used to measure further empirical estimates, using a subset of the photometric bands.Individual redshift posteriors are calculated, and all measurements are combined in a statistical framework via a Bayesian combination to give a final estimate of redshift.From a comparison with spectroscopic redshifts these photometric redshifts are seen to have a high degree of accuracy; for full details, see Section 2.4 of Duncan et al. (2019).
The stellar masses we use are those measured in Duncan et al. (2014); Duncan et al. (2019), using a custom template fitting code (see Section 4 of Duncan et al. (2014)).With this custom spectral energy distribution (SED) fitting code, the stellar mass is measured at all redshifts within the photometric redshift fitting range.The masses also have a 'template error function', described in Brammer et al. (2008), accounting for uncertainties driven by the template set and wavelength effects.These stellar mass measurements assume a BC03 stellar population synthesis (SPS) model (Bruzual & Charlot 2003), with a wide range of stellar population parameters, and a Chabrier (2003) initial mass function (IMF).The star formation histories used within these fits follow the form SFR ∝ e −/ , with timescales of ||=0.25,0.5, 1, 2.5, 5, 10, where negative values of  represent exponentially increasing histories.A short burst model is also used ( = 0.05), as well as continuous star formation models ( = 1/ 0 ).In order to ensure that our stellar masses do not suffer from systematic biases, they are compared to stellar masses calculated independently within the CANDELS collaboration (Santini et al. 2015).While there is some scatter between the two mass estimates, there is no significant bias (see Section 2.5 of Duncan et al. (2019) for a detailed discussion).We aim to calculate masses using JWST data for galaxies at  > 4.5 in Harvey et al., in prep.Star formation rates for our sample of galaxies are calculated using the UV slope () of the spectral energy distribution, which gives a measure of the dust attenuation within the galaxy.We aim to measure this with JWST within this EPOCHS paper series (Austin et al., in prep).From this, we correct for dust and obtain the total star formation rates for our galaxies, which agree well with star formation rates derived directly from SED fitting (Duncan et al. 2014;Duncan et al. 2019).

Visual Classifications
For 470 objects within our sample, we use visual classifications from Ferreira et al. (2022a) as part of our analysis.The categories we use are as follows: (i) Discs: Sources with a resolved disc with an outer area of lower surface brightness, that regularly increases towards the centre of the galaxy.
(ii) Spheroids: Resolved sources that are symmetrical, with a centrally concentrated, smooth light profile, that are round or elliptical.
(iii) Peculiar: Resolved sources with a disturbed morphology, which dominates any smooth components.
(iv) Other: Mainly made up of sources classified as 'ambiguous', due to the classifiers not reaching a majority agreement on the classification of a source.This category also contains sources that are classified as point sources due to an angular size smaller than the full width half maximum (FWHM) of the point spread function (PSF), or clear spikes that are consistent with point sources, and any sources that were unable to be classified due faintness or image issues.

MORPHOLOGICAL FITTING
We use GALFIT version 3.0.5 (Peng et al. 2002(Peng et al. , 2010))to fit a single Sérsic light profile to each galaxy.Ultimately, the overall goal with JWST is to measure the light profiles in more detail, such as obtaining bulge to disk ratios and other features (Margalef-Bentabol et al. 2016).However, it is important to first determine structural properties using single profile fitting, and assess how they characterise the data (e.g., van der Wel et al. 2012van der Wel et al. , 2014;;Suess et al. 2022).
GALFIT is a least-squares-fitting algorithm which finds the optimum solution to the surface brightness profiles for galaxies through using a Levenberg-Marquardt algorithm.GALFIT uses the reduced chi-squared,  2  , to determine goodness-of-fit and finds the best fit model through  2  minimisation.The  2  is given by by: summed over  and  pixels, and where   is the number of degrees of freedom.As seen in Equation 1, GALFIT requires a data image from which the galaxy surface brightness is measured,  data (, ) and a sigma image, (, ), giving the relative error at each position within the image, which are then used to calculate the model image,  model (, ).
We run GALFIT for all available filters, but only report results here for the filters that best match the rest-frame optical wavelength of the source.This minimises, or even eliminates, the effect of morphological k-correction, as the qualitative and quantitative structure of galaxies changes as a function of wavelength (Taylor- Mager et al. 2007), which can result in significant structural changes between rest-frame UV and rest-frame optical images.The band selected at a given redshift is shown in Figure 1.The figure shows which filter we use within different redshift ranges, and what rest-frame wavelength we probe within that filter at that redshift.As can be seen, we are always probing the rest-frame optical at wavelengths redder than the Balmer break at all epochs in which we view our galaxy sample.
The Sérsic profile we use has the form where  () is the intensity at a distance  from the centre of the galaxy,   is the half-light radius of galaxy (the radius where 50% of the total luminosity is enclosed),   is the intensity at the half-light radius,  is the Sérsic index, which controls the shape of the light profile of the galaxy (Sérsic 1963;Ciotti 1991;Caon et al. 1993), and   can be approximated as () (Ciotti & Bertin 1999).GALFIT gives us a best fitting value for each of these terms.The errors on these values are also calculated through this method, and a full description of the GALFIT error calculation can be found in Peng et al. (2002).

GALFIT Pipeline
We use a custom pipeline for single component Sérsic fits with GALFIT.The process is as follows: (i) Source Detection: We use SExtractor (Bertin & Arnouts 1996) to detect sources within the F444W images for each CEERS pointing, following the parameters and method in Adams et al. (2023b).These catalogues are then cross-matched within 1 arcsecond to the catalogues created from the analysis completed in Duncan et al. (2019) to create our final catalogues for each pointing.The average separation between both catalogues is ∼ 0.15 ′′ (ii) Cutout and Mask Creation: We create 200 x 200 pixel (6 ′′ × 6 ′′ ) cutouts of each source, in order to ensure the entire surface brightness profile of the galaxy is enclosed within the cutout, along with that of any neighbours that may need to be modelled simultaneously.We use the SExtractor segmentation maps to make masks, creating the same 200 x 200 pixel cutout of the segmentation map, and then masking the necessary objects.
In order to create masks for each object, and select which neighbouring objects must be masked, we use the Kron ellipses (Kron 1980), as defined by SExtractor and plot circular apertures with a radius equal to the semi-major axis of the Kron ellipse.We do this to select galaxies that are sufficiently close enough for their surface brightness profile to interfere with that of the target, 'primary', object and must be fit simultaneously.If any neighbouring galaxy has an overlapping Kron aperture with that of the primary object, the neighbouring galaxy is deemed to be sufficiently close that it must be modelled alongside the primary galaxy, to account for both light profiles.We do this for as many neighbouring objects as necessary.Objects that do not have an overlapping Kron aperture are far enough away that they do not also need to be fit, and therefore are masked instead, primarily to save computational time.Through visual inspection of fits and residuals of fits from the data, we conclude that this criterion is good for determining when to fit neighbouring objects.The pixels that are masked are those that the SExtractor segmentation maps assigns to each source.An example of these selection criteria is shown in Figure 2.
GALFIT also requires a sigma image to give relative weight to the pixels in the image during the fit.As the input sigma image, we use the 'ERR' extension of the images, which is a measure of the noise of the image, and this is created using the same method as the object cutout images.
(iii) Input Parameters: In order to fit a single Sérsic profile, initial estimates of parameters must be provided to GALFIT.The input parameters that GALFIT requires estimates for are  and  image coordinates, total magnitude, half-light radius, axis ratio, position angle, and Sérsic index.Similarly to Kartaltepe et al. (2022), we use the SExtractor catalogue for our initial parameter estimates, except for the initial Sérsic index which we estimate as  = 1, although other values of  have virtually no effect on the output parameters, and only apply constraints to the image position of the sources within ± 2 pixels to ensure the correct source is being fit.
(iv) Point Spread Function: GALFIT requires the appropriate PSF for each filter, which is obtained using WebbPSF (Perrin et al. 2014), and resampled to our pixel scale.We experimented with different PSFs created through this method and find that the results do not significantly change.
Although the central position of the source is constrained, all other parameters are allowed to vary freely, and a selection process is used to select good fits with physical parameters after fitting is complete, as the overuse of constraints can lead to GALFIT converging on unphysical results.Example fits can be seen in Figure 3.

Comparison with IMFIT
IMFIT is an alternative light profile-fitting program which uses where the red radius in each image is that of the primary source, and the blue radii are those of the neighbouring sources.We give several scenarios for how these systems would be found.Left: No other sources would be fit simultaneously to the primary, and the pixels belonging to all other objects according to the SExtractor segmentation maps are masked.Right: The source where the Kron radius overlaps that of the primary source are simultaneously fit in this instance, and all other sources would be masked.Cutout sizes shown are 6" × 6".Levenberg-Marquardt, Nelder-Mead, and Differential Evolution algorithms to find the best fit parameters (Erwin 2015).In order to test the robustness of our method, we present a comparison of best fit half-light radii and Sérsic indices for a representative sample of 146 objects in CEERS Pointing 1, which have GALFIT fits that meet the selection criteria explained in Section 5 and have a stellar mass of log ( * / ⊙ )> 9.5.We run IMFIT using the same input param-eters and initial guesses as those used for GALFIT and again use the Levenberg-Marquardt algorithm for consistency.The results of this comparison are shown in Figure 4a and Figure 4b which plot the half-light radii and Sérsic indices measured for these 146 objects using GALFIT and IMFIT, showing a good agreement between the measured values.
We use two numerical values to provide a further indicator of reliability, and measure these for the sample of 146 objects used across the comparison.Firstly, we use the outlier rate, defined as the fraction of radii/Sérsic indices obtained by IMFIT that disagrees with the radii/Sérsic indices obtained by GALFIT by more than 15% in (1 + ), where  is the measured quantity.Secondly, we use the Normalised Median Absolute Deviation (NMAD) (Hoaglin et al. 1983), which is defined as 1.48 × median[|Δ|/(1 + )], where Δ =     −    .The NMAD is a measure of the spread of the IMFIT measurements around the GALFIT measurements, maintaining its reliability when outliers are present.
For the half-light radius we find an outlier rate of 6.2% and NMAD of 0.012, and for Sérsic index we find an outlier rate of 9.7% and NMAD of 0.021, showing a slightly better agreement between codes for half-light radius than Sérsic index.We also find a better agreement with Sérsic index at lower values of Sérsic index.We see greater disagreements for a few objects at higher Sérsic index  due to larger contrast which exists at higher values of .A slight change in the fitting will provide a larger change in  when  is larger.Overall, however, we find a good agreement between these different codes and use the GALFIT results throughout the rest of this paper.

SAMPLE SELECTION
We select massive galaxies with stellar masses log ( * / ⊙ )> 9.5, and then make further selections based on the goodness of fit achieved by GALFIT for each object.We do not make selections based on the  2  obtained, which is only used by GALFIT to determine when it has reached the best fit.Instead, we make our own selections based on the output parameters and the residual flux fraction (RFF).

GALFIT Parameters
In order to remove extreme cases, a fit must meet all of the following criteria: (i) The half-light radius must be within 0.01 <   (pixels) < 100.This ensures that fits where GALFIT has reached the minimum size possible or where the model would be larger than the cutout size are excluded.
(iii) The fit axis ratio must be (/) > 0.01, removing unphysical models.This is particularly prevalent in faint sources, where GALFIT sometimes converges upon a 'bad' fit with a small axis ratio (van der Wel et al. 2012).We do not make selections based upon GALFIT magnitude, and all models are fit by GALFIT with a rest-frame optical magnitude < 28.
Where neighbouring objects are being simultaneously modelled, these criteria are only applied to the best fit parameters of the central object.Where GALFIT does not converge, and gives no best fit parameters, or the best fit parameters do not meet the above criteria, we reject the fit.We start the fitting process with a sample of 1649 galaxies with log ( * / ⊙ )> 9.5, and through our selection criteria, we reject 192 galaxies, for which we repeat the fitting process with Sérsic index held at a value of  = 1.The 'Fixed Sérsic' fits must then  meet the above criteria for all other parameters, and out of these 192 galaxies, we still reject 93 galaxy fits, thus our sample contains 99 galaxies with a Sérsic index fixed at  = 1, and 1457 objects with a free value of  at this stage, with a total sample size of 1556 galaxies.

Residual Flux Fraction
We calculate the residual flux fraction for our fits that met the previous criteria in subsection 5.1.The residual flux fraction (RFF) is a measure of the signal in the residual image that cannot be explained by background fluctuations (Hoyos et al. 2012).As in Margalef-Bentabol et al. (2016), we define this as where I is the NIRCam image of the galaxy, I GALFIT is the model image created by GALFIT,   is the background RMS image, and FLUX_AUTO is the flux of the galaxy calculated by SExtractor, all of which are in the rest-frame optical filter of the object.The factor of 0.8 in the numerator ensures that the expected value of the RFF is 0 for a Gaussian noise error image (Hoyos et al. 2011).We calculate the RFF within the Kron radius of the galaxy, where we define the Kron radius as the semi-major axis of the Kron ellipse.
Calculating the RFF over a large radius leads to the RFF decaying to zero, where the outer areas can dominate the calculation, even if there is a complex residual at the centre of the image.We calculate the background term following the method used in Margalef-Bentabol et al. ( 2016), where we assume that : where ⟨  ⟩ is the mean value of the background sigma for the whole image.We calculate this by placing apertures on blank areas of sky in the image 'ERR' extension that we use for creating the GALFIT sigma images (see subsection 4.1), and calculating the mean value of these regions.The value of  is the number of pixels within the radius that we are using for the RFF calculation.
In order to remove any remaining objects that are poorly fit with large residuals, we complete visual checks to select an appropriate RFF cutoff value.As a result of this, we select objects with an RFF value below 0.5.This enables us to remove objects where the light is either very over-or under-accounted for, yet also allows for features that are not modelled precisely (due to features such as spiral arms and bars not being accounted for in single Sérsic fits) but where the measured properties are otherwise reasonable.Through RFF measurements, we reject a further 161 fits due to large residuals.This results in a final sample of 1395 robust galaxy fits, of which 1313 (94.1%) were fit with a free value of , and 82 (5.9%) were fit with a fixed value of  = 1, which we further analyse in section 6.The rejected fits are mostly comprised of lower mass galaxies, although there are fits rejected at all masses within our sample.

Final Sample
We begin the fitting process with a sample of 1649 galaxies, and after our quality cuts we recover 1395 galaxies for our final sample.This recovery rate of 84.6% is higher than comparable analyses, such as Suess et al. ( 2022) (∼ 60%), although we repeat the fitting process with a fixed Sérsic index where the first fitting procedure has failed.However, fits with a fixed Sérsic index only account for 5.9% of our sample.We note that we have not made any selections based upon the magnitude of the GALFIT model, as the input and output magnitudes are in good agreement.In the rest-frame optical, 3.66% of the GALFIT best-fitting models have a half-light radius in pixels that is smaller than the FWHM of the PSF, thus almost all of our sources are resolved.This sample is used throughout this paper to carry out our analyses of the evolution of galaxy size and structure.

RESULTS
In the following sections we describe the results of our analysis.This includes examining the sizes of our systems, as well as the overall shapes of these galaxies based on their Sérsic indices and how these evolve with time.Where we state results for passive and star forming galaxies, whereby these are defined by having a specific star formation rate (sSFR) below or above the midpoint of the distribution of values within the redshift bin.We calculate the specific star formation rate using the star formation rates and stellar masses from Duncan et al. ( 2014); Duncan et al. ( 2019) (see subsection 3.1 for full details).We then define a galaxy with a specific star formation EPOCHS VI 7 rate greater (lower) than the median sSFR within the redshift bin, to be a star-forming (quiescent) galaxy.

Half-light radii and Sérsic indices
We measure the half-light radii of our objects using GALFIT, and convert the values from pixels to their physical half-light radii in kpc. Figure 5 shows the size evolution with redshift for our sample of 1395 galaxies.We use the radius from the filter nearest to the rest-frame optical for each object, and find that the sizes are well fit by the power-law relation: (5) This is such that the average sizes of our sample, in terms of effective radii (⟨R e ⟩), become progressively smaller at increasing redshifts.This trend for galaxies at a given mass selection to become smaller at higher redshifts had been known to exist at  < 3 for many years (e.g., Trujillo et al. 2007;Buitrago et al. 2008;van der Wel et al. 2012), yet this is the first time this has been shown using JWST observations for similar types of studies.We also compare our power law function to those derived in comparable studies.We note that the curves presented in Buitrago et al. (2008) are normalised with respect to SDSS data, thus we perform an arbitrary normalisation of these curves to align them with the scale employed.We also extrapolate all curves to cover our entire redshift range.We compare to a range of individual points and power-law curves.
It is important to note that the evolution of galaxy size and Sersic index are highly dependent on the redshift ranges studied and the stellar mass and/or magnitude ranges which are included in the analysis.For example, if we study just the super massive galaxies at log ( * / ⊙ )> 11, we would find that systems have a stronger evolution than at lower masses (e.g., Buitrago et al. 2008;Buitrago et al. 2013).As such it is important to be clear with what we are comparing with in this figure, as no previous study has measured galaxies in exactly the same way that we have here.The points from van der Wel et al. ( 2014 What we can see from Figure 5 is that there is a steep evolution for galaxies selected with the mass range that we have within this paper.We find, as quoted before a power-law decrease in size that scales as ∼ (1 + ) −0.71 , which is less steep than previous results when comparing the evolution up to  ∼ 3.This is partially due to the fact that we are observing galaxies at higher redshifts where the size evolution tapers off, and does not continue as steeply at higher-z, although it is important to keep in mind that redshift (z) values do not scale linearly with time, and there is much more time at a given  at low redshift than at the higher redshifts.Another reason for the difference, can be seen at the lowest redshifts, where our galaxies are on average smaller than the previous work.This is likely due to us using a lower mass cut to define our sample of galaxies, resulting in on average smaller systems.This is consistent with findings from simulations, where it has been shown that galaxy size correlates with stellar mass, thus resulting in lower mass samples having smaller sizes on average (Furlong et al. 2017;Ma et al. 2018).
We also investigate the difference in the size-redshift relation for populations of galaxies with high and low Sérsic indices, defined as  > 2 and  < 2 respectively, although using  = 2.5 as the limit produces effectively the same results.We do this to determine how the size evolution depends on the shape of the profile, with those at  > 2 possibly more like the massive galaxies we see in the local universe and those with  < 2 possibly progenitors of disc galaxies or those undergoing mergers.
As can be seen in Figure 6, at low redshift, the galaxy populations have a clearly different size-redshift relation, but at redshifts higher than  ∼ 3, the relations show a greater similarity, suggesting that effective radius is less dependent upon the Sérsic index at high redshift compared to lower redshifts.What this means is that galaxies do not differentiate between overall morphology, as measured by the Sérsic index, until around  ∼ 3, consistent with findings at  < 5, where star forming galaxies exhibit inside out growth (Roper et al. 2023).This suggests that this aspect of the 'Hubble Sequence' was in place by at least  ∼ 3, with a disc-like ( < 2) and elliptical-like ( > 2) population clearly defined.
We also find that galaxies at higher redshift are almost all compact objects, regardless of their Sérsic index, and are smaller than their low-redshift counterparts of similar mass, in agreement with previous studies and simulations (e.g, Trujillo et al. 2007;Buitrago et al. 2008;van der Wel et al. 2014;Costantin et al. 2023).Furthermore, we find that the sizes of the high and low Sérsic index populations evolve differently at  < 3, with the objects with lower Sérsic indices following a steeper size -redshift relation, suggesting a difference in structural growth mechanisms in each population, likely due to the onset of inside out growth in these systems with lower Sérsic indices.We also show that the compact systems of the early universe are not representative of the galaxy population today, suggesting a strong size evolution must occur, continuing until the present day.This growth is such that we find an increase of a factor of three from  ∼ 7 to  ∼ 1, with roughly a doubling of size from  ∼ 7 to  ∼ 3, a time period of ∼ 1.4 Gyr.The relation is also still evolving at the highest redshifts, confirming that galaxy evolution as well as evolution in structure were already taking place in the first Gyr since the Big Bang.
For those objects fit with a free Sérsic index, we also investigate the evolution of Sérsic index with redshift, as shown in Figure 7. Figure 7a shows that for all free-fit  galaxies within our sample, Sérsic index decreases with increasing redshift, suggesting a higher proportion of disc-type galaxies in the early universe.Whilst these objects have Sérsic indices similar to modern pure disc galaxies, this does not necessarily imply that these are rotating disks (e.g., Buitrago et al. 2014).This is in contrast to previous findings using HST that there were fewer disc galaxies at  > 1.5, although the lower resolution of HST lead to misclassification of galaxies (Ferreira et al. 2022b).Exploring this complicated subject is beyond the scope of this paper.The slight increase in Sérsic index at  ∼ 7.5 could be due to an increase of spheroid galaxies reported in Ferreira et al. (2022b), particularly in the star forming population (see Figure 7b), as spheroid galaxies have been found to account for a higher proportion of the sSFR budget (Ferreira et al. 2022a) at high redshifts, but could also simply be due to random chance and increased errors at higher redshifts.We further discuss size and Sérsic index changes as a function of morphology in subsection 6.4.
In Figure 7b, we investigate Sérsic indices for passive and star forming populations, and find that the star forming galaxies have Sérsic indices around  ∼ 1, suggesting that most star formation takes place within disc galaxies, which is also the case when identifying these systems through visual means (Ferreira et al. 2022b).The slight deviation from this trend at the highest redshifts could be due to stars forming in young, compact sources, before evolving into disc-like galaxies.The populations start to show differing trends at  2023) which uses the TNG50 simulation to produce mock CEERS observations.We note that we only plot redshift errors for Bridge et al. (2019), as radius errors are not provided.
∼ 3, again hinting at the establishment of a bifurcation of the galaxy population in structure around this time.This shows that the star forming and passive galaxy populations possibly evolve through different mechanisms to create differing physical properties at later times, such as inside out star formation in star forming galaxies, and stellar migration in passive systems.

Galaxy Size-Mass Distribution
We plot the galaxy size-mass relation in different redshift bins, as shown in Figure 8.For each redshift bin, we separate galaxies into either a star-forming or passive population, using the median specific star formation rate (sSFR) of each bin as the separating value.This midpoint is determined by plotting a histogram of all the sSFR values for galaxies within a given redshift bin.The median value is then measured, and galaxies which are lower than this we call 'passive', and those above this 'active'.This is the same method we used previously for separating star forming from non-star forming galaxies when investigating trends with size and Sersic index.
In general, we find that galaxy size increases with stellar mass, in both the quiescent and star forming populations up to  ∼ 6, although this tends to be more obvious for the star-forming galaxies.We also see that the star forming galaxies at the lower redshifts are nearly always larger at a given mass than the passive galaxies, although this tends to break down at the lower masses.
We also find that at redshifts higher than  > 3, the sizes of the quiescent and star forming populations are statistically the same, suggesting that at a fixed mass at high redshift, quiescent galaxies may not be smaller than star-forming galaxies, as predicted in Suess et al. (2022), likely due to transient quiescence driven by high redshift active galactic nuclei (AGN) activity (Lovell et al. 2023).This means that whatever is differentiating the sizes of galaxies as a function of  shape or star formation rate does not come into play until past redshift  ∼ 3.This implies that the physical processes of growing galaxies is truncated for the passive galaxies, even if these systems still acquire stellar mass throughout their history.We discuss possible reasons for this later in the discussion section.

Changes in Effective Radii as a Function of Wavelength
The changes in a galaxy's appearance and size as a function of wavelength can provide many clues to the formation history and stellar populations of modern galaxies (e.g., Windhorst et al. 2002;Taylor-Mager et al. 2007;Mager et al. 2018;Suess et al. 2022).An example of this is that if a galaxy forms inside-out, with the older stellar populations in the centre of the galaxy, then most likely these systems would appear larger at shorter wavelengths, and vice-versa if formation occurred via outside-in.
As such, we measure half-light radii in all available wavelengths for our sources, and use these sizes to probe the size evolution of our galaxy sample at differing observed wavelengths.As seen in Figure 9, average galaxy sizes become increasingly more compact and smaller when observed at longer wavelengths, for objects with both high and low Sérsic indices at  < 3. We also see again, that those galaxies with higher Sersic indices are smaller on average at all wavelengths to those systems with lower indices.This is an indication that galaxy sizes are larger at shorter wavelengths where bluer and young light is probed, due to the formation mechanisms for these galaxies.This would be such that the outer parts of these systems consist of younger stars, compared to their inner portions made of older stars.More detailed analysis of the colour gradients and star formation gradients of these galaxies would answer this question.We also note that dust attenuation can increase the observed half-light radius (e.g., Marshall et al. 2022;Roper et al. 2022;Popping et al. 2022), although this would require a more in depth observational analysis.
At  < 3, we also find that galaxies with higher Sérsic indices have smaller radii in both mass bins, with this effect being more noticeable in the highest mass bin.However, we do not see the same effect past  ∼ 3, where we see a much flatter relation, suggesting that galaxies at high redshift are forming stars throughout the entire galaxy, with blue and red light emitted throughout.However, when we divide the higher redshift galaxies into different bins, we obtain a noisier trend, and thus we hesitate to draw any further conclusions from this observed trend.For objects with Sérsic indices of  > 2, there is much more scatter within the relation, although there are larger errors, due to the smaller number of galaxies with high Sérsic index at  > 3.
To compare with previous JWST work on similar questions, we also compare the sizes of our galaxies measured in the F444W (4.4) band to the F150W (1.5) band sizes, as shown in Figure 10.This comparison is quoted in arcseconds, as a comparison of the on-sky sizes.We find that for galaxies at cosmic noon (1 ≤  ≤ 2.5), the 4.4 sizes are 11.4±1.28%smaller than the 1.5 sizes on average, in agreement with the ∼ 9% difference found in Suess et al. (2022).Taking sizes measured in the near-infrared (observed with 4.4 at this redshift range) to be a reasonable proxy for stellar mass distributions, this shows that the stellar mass profiles of galaxies are smaller and more compact than their star forming 'light' profiles.We do not find that the galaxies outside of cosmic noon show the same effect, as the F150W and F444W filters no longer correspond to the rest-frame optical and rest-frame infrared of these galaxies.As we have shown that the smaller appearance in F444W is indeed due to the rest-frame infrared profile being smaller, thus showing the mass profile of the galaxy is smaller than the light profile, we investigate how this varies with the stellar masses of galaxies.This effect is dependent on the mass of the galaxy, as shown in Figure 11, where we examine the size difference at cosmic noon in two mass bins.We find that for galaxies with 9.5 ≤log ( * / ⊙ )< 10, the F444W sizes are 5.82 ± 1.76% smaller than their F150W sizes.This increases with increasing mass, with sizes being 17.9 ± 1.80% smaller for objects with 10≤log ( * / ⊙ ).This implies that on aver-age we see a greater difference in size between different wavelengths for the highest mass galaxies.

Correlation with Visual Morphology
For 470 galaxies within our sample, we use visual morphological classifications from Ferreira et al. (2022b), as defined in subsec-Figure 9.The half-light radius as a function of wavelength.These plots show the radius evolution for  < 3 (left panels), and  > 3 (right panels) galaxies.In each redshift bin, we show the evolution for low-mass and high-mass galaxies, with the galaxies separated into high-Sérsic (red) and low-Sérsic (blue) index populations.These figures show that up to  ∼ 3 (left panels), galaxies are more compact when measured in redder filters, regardless of selection method.Past  = 3 (right panels), the relation is flatter, with the galaxies having  < 2 being smaller at this stage.tion 3.2 to determine how the properties and relations we have found are determined by overall morphology.This is a small sample of galaxies, but represents one of the first times that we can examine these measured properties with visual morphologies.Using these classifications, we present an analysis of size and Sérsic index as a function of morphology, as shown in Figure 12.
Figure 12a shows that for all galaxy types, radius decreases with increasing redshift, but at all redshifts, spheroid-type galaxies are the smallest.Figure 12b shows how Sérsic index varies with redshift for all galaxy types.Disc type galaxies have  ∼ 1 as expected, with 'peculiar' and 'other' galaxies showing a slight decrease with redshift.Spheroid galaxies have the highest Sérsic index at all redshifts, significantly above other galaxy types.The small sizes of these galaxies showing that F444W sizes are smaller than those in F150W.This implies that galaxies measured in the near-infrared with JWST are more compact than rest-frame optical sizes previously measured with HST.Galaxies at cosmic noon (1 ≤  ≤ 2.5) are shown as red diamonds whilst the grey hexagons represent all galaxies within our sample that do not fall within cosmic noon.These latter galaxies do not show the same effect.The black, dashed line is the one-to-one relation.The error bar represents a typical error of 0.2 dex on these measurements.combined with their high Sérsic index is a clear indicator of their compact, concentrated nature.

Comparison with Image Simulations
One major issue with a study such as this, which deals with imaging and the analysis of structure at galaxies at vastly different redshifts, is the fact that the surface brightness measurements of galaxies declines as (1 + ) 4 , and this can produce significant changes in the way that structure for distant galaxies would be imaged by a telescope.In fact, it is clear that galaxy structure can in principle change substantially and that many galaxies are potentially being missed at the highest redshifts (Conselice 2003;Whitney et al. 2020;Whitney et al. 2021).Therefore it is very important that we carry out simulations to determine if the trend we see in this paper, namely that galaxies get progressively smaller up to  ∼ 8, is due to a real evolution or simply due to the fact that the surface brightness of the galaxies just makes our galaxies appear smaller.Previous work using HST shows that whilst we are likely missing galaxies at the highest redshifts, we can still measure accurately their structural parameters (Whitney et al. 2020;Whitney et al. 2021).
To understand this issue in depth for JWST data, we take a sample of 186 low-redshift galaxies at redshifts 0.5 <  < 1, and create simulated images of these galaxies at higher redshifts, in intervals of Δ = 0.5, up to  = 7.5, including all known cosmological effects.This is done in order to separate real evolution effects from redshift effects.The galaxies selected are low-redshift galaxies within our final sample of galaxies where a good fit was obtained, ensuring the initial low- measurements are reliable.To do this simulation we use the redshifting code AREIA5 .We give here a brief overview of the steps taken are described below, for a more detailed discussion we refer the reader to Tohill et al. (2021) and Whitney et al. (2021).
First, the source is extracted from the original stamp by measuring a segmentation map with GalClean6 .Then, the image is geometrically re-binned from the source redshift to the target redshift based on the standard cosmology, preserving its flux.This is done to ensure that higher redshift sources have the appropriate geometric scaling due to the adopted cosmology.The resulting image from the rebinning has its flux scaled due to the cosmological dimming effect.Furthermore, shot noise is sampled from the source new light distribution, which is then convolved with the target JWST PSF of the rest-frame filter in the target redshift.Then, the final redshifted source is placed on a random real CEERS background to mimic a real observation.This results in a sample of 2418 simulated images.
Although AREIA allows the user to include size corrections and brightness corrections to mimic redshift evolution, we keep all intrinsic properties of the galaxies constant at each redshift, simulating only observational effects.Ferreira et al. (2022b).At all redshifts, we find that spheroid galaxies have the smallest radius, and the highest Sérsic index, displaying the compact, concentrated nature of these objects.

EPOCHS VI
Following the same method described in subsection 4.1, we measure the sizes and Sérsic indices of our new simulated galaxies.We follow the same selection method, described in section 5 to ensure robust results.For our Sérsic index analysis, we further select objects with a GALFIT measured magnitude < 30.
We also analyse the trends for objects with values above and below the overall median, indicated by the 'high' and 'low' radius and Sérsic groups.
We use the emcee package to obtain the lines of best fit (Foreman-Mackey et al. 2013).As shown in Figure 13, we find that our sizes and Sérsic indices are best fit by linear fits, with gradients and errors stated in Table 3.We find our results are consistent with "flat slopes" -that is we find no change with redshift for the measured sizes and Sersic indices within these simulations.This shows that the simulated galaxies continue to have very similar size measurements at different redshifts, meaning that in the absence of evolution we would expect the same galaxies to have the same effective radii and Sersic indices measured at all redshifts.This confirms that our method recovers the same result regardless of the redshift in which the galaxy is observed.This is vastly important for this work, and shows that the trends obtained in section 6, are due to a change in galaxy properties and (a) Median simulated galaxy sizes at each redshift.The gray points and line show this change for the simulated galaxies for the total simulated sample, whilst the red and blue are for those galaxies that are larger and smaller, respectively, than the median radius.
(b) Median Sérsic index of simulated galaxies at each redshift.The lines are the same as explained in the plot of effective radius with redshift (Figure 13a).populations with increasing redshift, not because the galaxies appear smaller at higher redshifts due to cosmological or redshift effects.

DISCUSSION
The results in this paper reveal a myriad of new observational facts about the sizes and shape evolution of galaxies up to  ∼ 8. Our core result is that galaxies become progressively smaller at fixed stellar mass at higher redshifts.Whilst this was known for some time up to  ∼ 3 (Trujillo et al. 2007;Buitrago et al. 2008;van der Wel et al. 2012;van Dokkum et al. 2010), JWST is now allowing an analysis of this in the rest-frame optical light at higher redshifts than previously possible.In fact, we find that from  ∼ 7 to  ∼ 3, galaxies with larger masses roughly double in size.This evolution is not as dramatic as is seen for the highest mass galaxies at  < 3 (e.g., Buitrago et  2008) and we will require that more area be covered before we have statistics to probe these very high mass galaxies at such high redshift, as very few are imaged due to the limited numbers and sky coverage with existing and reliable JWST data.
Another major result is that we find very little variation in the size distribution and size evolution for our sample at  > 3 irrespective of how we divide the sample.This is true for different Sérsic cuts, as well as for different cuts in the sSFR for these galaxies.One of the main signatures of the formation of the Hubble sequence is a bifurcation in galaxies into morphologically distinct populations of star-forming and relatively passive systems (often simply divided into discs and ellipticals).Whilst we are not seeing the entire formation of this Hubble sequence at  ∼ 3, it is clear that the major bifurcation begins at this epoch and increasingly differentiates itself.This is likely due to different formation mechanisms coming into play at  < 3 that were not present at the higher redshifts.This is likely something to do with mergers and feedback from either AGN or star formation (e.g., Bluck et al. 2012), particularly inside out star formation, where star forming galaxies begin to grow more rapidly than their passive counterparts.
While we do not see much difference between galaxies at  > 3, we do find that galaxies have a well established difference in size and Sérsic index as a function of the wavelength of observation.This is such that galaxies are more compact in redder wavelengths, showing that the outer parts are made up of more recent stars and star formation events.This could be a sign that galaxies are forming inside-out and that we are witnessing the formation of bulges and the cores of giant galaxies at these early times that are growing outward from minor mergers and/or accreted gas in star formation (e.g., Tacchella et al. 2015aTacchella et al. ,b, 2018;;Nelson et al. 2016;Nelson et al. 2019;Wilman et al. 2020;Matharu et al. 2022;Roper et al. 2023).This is consistent with the formation of bulges and disks occurring gradually at about  ∼ 3 and at lower redshifts (e.g., Margalef-Bentabol et al. 2016).
We also find that there is a correlation between stellar mass and size for galaxies -those that are star forming and those that are more passive -up to  ∼ 3. Our JWST results allow us to accurately probe these mass ranges, even down to the lower redshifts where HST has had a difficult time resolving these systems.This is another indication that something is regulating the sizes and masses of galaxies, and that this appears to be more present at  < 3 than at earlier times.When we compare with simulations, such as the TNG50 simulation (see Figure 5) we find that there is a good agreement.What remains to be determined is the causes of this change and the physics behind the mergers that produce these size increases within these simulations.

SUMMARY AND CONCLUSIONS
We present an analysis of 1395 carefully selected massive galaxies with stellar masseslog ( * / ⊙ )>9.5, within the CEERS and CAN-DELS fields between z = 0.5 and z = 8, using light profile fitting.Our galaxy sample is taken from the CANDELS field to enable the use of robust masses, redshifts, and star formation rates from optical to NIR data, which is necessary to obtain accurate measurements for galaxies over our large redshift range.In this paper we fit single Sérsic profiles to our galaxies and analyse their effective radius and Sérsic index to probe the evolution of these properties through cosmic time.We also probe the variation in size and shape as a dependence on stellar mass and wavelength.The trends we find are robust to redshift effects as we show through simulations of placing low redshift galaxies at high redshift that parameters would not change simply due to being at higher redshifts as imaged with JWST.
To carry out our analysis we use a custom built GALFIT pipeline, fitting a single Sérsic fit, and fitting neighbouring galaxies where appropriate.We verify this method via a comparison with another galaxy profile fitting code, IMFIT and find that our results are generally in good agreement between these two codes, and thus reliably measured.We then analyse the evolution of effective radius and Sérsic index, for our sample as a whole, and for quiescent/star-forming population, and high/low Sérsic index populations.We verify that our results are due to evolutionary effects rather than redshifting effects, by fitting the same galaxies redshifted to higher redshifts using the exact same process we apply on the real galaxies.We find "flat" relations between measured parameters and redshift, thus confirming the robustness of our method and its ability to recover the correct measurements regardless of the distance to the objects.
Our main findings are as follows: • Galaxies at the mass ranges we probe become smaller with increasing redshift as R  ∼ (1 + ) −0.71±0.19, with high and low Sérsic index populations showing differing evolution only at  < 3, showing that this aspect of the 'Hubble Sequence' and galaxy bifurcation starting to be established at  ∼ 3.At higher redshifts we find no significant differences in the pattern of sizes with various properties.
• Sérsic indices decrease on average with increasing redshift, suggesting a higher proportion of "disc-like" galaxies in the early universe.We find that passive and star-forming populations of galaxies show a different evolution of Sérsic index with redshift, with starforming galaxies hovering around value of  = 1, suggesting that most star formation occurs within disc-like galaxies, at least in terms of structure.We cannot however rule out that some of these galaxies are involved in mergers.In principle this confirms what has been found when classifying galaxies visually (Ferreira et al. 2022b).
• In general, more massive galaxies have a larger effective radius up to at least  ∼ 3 compared to lower mass galaxies.At redshifts higher than  ∼ 3, at a fixed stellar mass, star-forming and quiescent galaxies have very similar sizes, suggesting that quiescent galaxies may not be smaller than star-forming galaxies at fixed stellar mass at high redshift, which has been a finding for almost 20 years at  < 3 (e.g., [ Buitrago et al. 2008).
• We find that galaxies appear more compact and smaller when observed in redder filters, and demonstrate the mass dependence of this effect, where more massive galaxies are more compact in redder filters than their lower mass counterparts.
• We find that visually classified spheroid galaxies are smaller than other galaxy types at all redshifts, and have a higher Sérsic index at all redshifts than other galaxy types, showing their small, compact nature.
• We verify that our results are due to real evolutionary effects only, shown by fitting simulated high redshift galaxies with the intrinsic properties preserved, and we recover results that do not vary based on redshift effects.
Overall, this paper, we show that the evolution of galaxy size and structure continues to the highest redshifts, with disc-like galaxies forming most stars within the universe at all epochs.High redshift morphology studies are revealing a new picture of the structural evolution of galaxies, which will continue further with increasing numbers of high-redshift galaxies being discovered with JWST.
This study is just the start of this type of analysis.The benefit of the CEERS fields is that we have very accurate photometric redshifts at lower redshifts, due to overlap with existing HST observations.In the near future this will be available for many other and larger fields where more subtle changes in the size and structural features of galaxies will be studied, and this will lead to a more complete understanding of galaxy evolution over nearly the universe's entire history.

Figure 1 .
Figure 1.Plot showing the rest-frame wavelengths at given redshifts, for all filters used within the JWST CEERS NIRCam observations, and used within this paper.The shaded regions show the selected filter we use to observe sources in the rest-frame optical at the given redshift.The grey dashed lines show where the filter we use to provide a rest-frame optical view of galaxies at different redshift changes.

Figure 2 .
Figure 2. Plots showing the Kron radii (semi-major axis of the Kron ellipse),where the red radius in each image is that of the primary source, and the blue radii are those of the neighbouring sources.We give several scenarios for how these systems would be found.Left: No other sources would be fit simultaneously to the primary, and the pixels belonging to all other objects according to the SExtractor segmentation maps are masked.Right: The source where the Kron radius overlaps that of the primary source are simultaneously fit in this instance, and all other sources would be masked.Cutout sizes shown are 6" × 6".

Figure 3 .
Figure 3. Example fits showing the data image, model image, and the residual image (data -model).Each cutout is 6" × 6".The redshift of each galaxy is shown to the left of the images.

Figure 4 .
Figure 4.A comparison of two measures -size (top) and Sérsic index (bottom) -as obtained with GALFIT and IMFIT.The black line shows the one-to-one relation.Size and Sérsic index are generally in good agreement, with more variation at larger values, in particular with the Sérsic index.We show the outlier rate ( ) and NMAD in the bottom right corner of each figure.

Figure 5 .
Figure 5. Power law fit showing the size evolution of all galaxies within our sample, compared to results from previous work.The black dashed line is of the form   ( ) = 4.50 ± 1.32(1 + ) −0.71±0.19, with the grey, shaded area showing the error on the power-law fit.The grey diamond points are the median galaxy sizes in each redshift bin, and error bars are 1 in length.The previous work shown in the Figure uses HST data, except for Yang et al. (2022), which uses JWST data, van Dokkum et al. (2010) which uses NOAO/YaleNEWFIRM Medium Band Survey Data, and Costantin et al. (2023) which uses the TNG50 simulation to produce mock CEERS observations.We note that we only plot redshift errors forBridge et al. (2019), as radius errors are not provided.

Figure 6 .
Figure 6.Power law fit for galaxies separated into two groups by Sérsic index as defined at  = 2.The sizes of these objects mostly diverge at the lowest redshifts.The diamond points are the median galaxy sizes in each redshift bin, and the error bars are 1 in length.The shaded region around each line represents the error on the power-law fit.

Figure 7 .
Figure 7.The Sérsic index -redshift relations for all galaxies (top), and passive and star forming populations (bottom).Large diamond and circle points are median values of each redshift bin, with error bars 1 in length.The grey dashed line at  = 1 represents the special case of the exponential disc profile.

Figure 8 .
Figure8.The size-stellar mass distribution of 'passive' and 'star forming' galaxies, separated by the midpoint of the specific star formation rate in each bin, as explained in subsection 6.2.The larger points represent the median values in mass bins.We use a 50% error floor, with error bars one standard deviation in length, or representing a 50% error on the median, in cases where the error floor is larger.

Figure 10 .
Figure 10.A comparison of sizes measured in the F444W and F150W filters,showing that F444W sizes are smaller than those in F150W.This implies that galaxies measured in the near-infrared with JWST are more compact than rest-frame optical sizes previously measured with HST.Galaxies at cosmic noon (1 ≤  ≤ 2.5) are shown as red diamonds whilst the grey hexagons represent all galaxies within our sample that do not fall within cosmic noon.These latter galaxies do not show the same effect.The black, dashed line is the one-to-one relation.The error bar represents a typical error of 0.2 dex on these measurements.

Figure 11 .
Figure 11.Size comparison for different mass bins at cosmic noon.We find an increasing size difference with increased stellar mass.The black dashed line is the one-to-one relation.The percentages in the upper left corner are the average size difference for each redshift bin.The error bar represents a typical error of 0.2 dex.
Sérsic index (n).The grey dashed line represents the  = 1 profile of an exponential disc.

Figure 12 .
Figure 12.Size and Sérsic index evolution as a function of visually determined morphology fromFerreira et al. (2022b).At all redshifts, we find that spheroid galaxies have the smallest radius, and the highest Sérsic index, displaying the compact, concentrated nature of these objects.

Figure 13 .
Figure 13.Sizes and Sérsic indices for our simulated galaxies in redshift bins.The error bars represent the standard error of the median for each bin.We divide these samples into different sub-types to determine how different selections would evolve differently within these simulations.

Table 2 .
Average 5 depths in our reduced CEERS images, for point sources in 0.32" diameter apertures.Depths are calculated by placing random apertures in regions of the image that are empty based on the final segmentation maps.
Between Stage 1 and Stage 2, we subtract templates of 'wisp' artefacts from the F150W

Table 3 .
al. Best fit results for linear fits to all galaxies, and high and low subsets for both radius and Sérsic index.The slopes are consistent with being flatwith little change with redshift, showing that our main findings are due to evolutionary effects, not redshift effects.