GLU: a software package for analysing continuously measured glucose levels in epidemiology

Abstract Continuous glucose monitors (CGM) record interstitial glucose levels ‘continuously’, producing a sequence of measurements for each participant (e.g. the average glucose level every 5 min over several days, both day and night). To analyse these data, researchers tend to derive summary variables such as the area under the curve (AUC), to then use in subsequent analyses. To date, a lack of consistency and transparency of precise definitions used for these summary variables has hindered interpretation, replication and comparison of results across studies. We present GLU, an open-source software package for deriving a consistent set of summary variables from CGM data. GLU performs quality control of each CGM sample (e.g. addressing missing data), derives a diverse set of summary variables (e.g. AUC and proportion of time spent in hypo-, normo- and hyper- glycaemic levels) covering six broad domains, and outputs these (with quality control information) to the user. GLU is implemented in R and is available on GitHub at https://github.com/MRCIEU/GLU. Git tag v0.2 corresponds to the version presented here.


Introduction
Epidemiological and clinical studies interested in circulating glucose as a risk factor or outcome typically measure levels in the blood (fasting, non-fasting and/or post-oral glucose) at a single or widely spaced time-points (e.g. every few years). [1][2][3][4] Although these are important health indicators, there has been an increasing appreciation that glucose levels and variability in free-living conditions during both the day and night, may also provide important health measures in clinical (e.g. diabetic or obese) and 'healthy' populations. [5][6][7][8][9][10][11] Continuous glucose monitoring (CGM) systems measure interstitial glucose levels by implanting a sensor subcutaneously. 12 Typically, finger prick blood glucose measurements are needed to calibrate the interstitial glucose levels to capillary blood glucose levels, although devices that do not need this calibration step are now becoming increasingly available. 12,13 Throughout this paper we refer to the sensor predicted capillary glucose levels as 'sensor glucose'.
CGM systems were initially used in research evaluating their potential value in patients with diabetes. 8,9,[11][12][13][14][15][16][17][18][19] For instance, some studies have assessed the accuracy of CGM as a proxy measure of blood glucose, 13,14,18 whereas others have assessed the effectiveness of CGM in the management of type I or type II diabetes. 11,15,16,19 As a result CGM is now increasingly used in the management of type I and type II diabetes. 20,21 More recently, CGM has been used in a wider range of epidemiological studies, where the aim is to understand the relationship of characteristics in these CGM data with other health traits and disease. For instance, CGM has been used to measure glucose levels 'continuously' over a number of days to identify hypoglycaemia in those receiving intensive care, and in 'healthy' populations to explore whether it can be used to identify groups at increased risk of diabetes, including gestational diabetes. [22][23][24][25][26] Unlike the glucose level at a single time-point providing only a 'snap-shot' of glycaemic control, or glycated haemoglobin that gives a single measure indicating mean glucose levels over a period of weeks, researchers can use these CGM data to assess how interstitial glucose levels vary across the day and night for several days or weeks and identify determinants of this variation and its health impact. [22][23][24][25][26] Researchers using CGM data tend to first derive summary variables that are then used in their subsequent analyses (e.g. exploring the association of these summary variables with later health outcomes). Summary variables might include area under the curve (AUC) (i.e. the average glucose level over time) or time spent in low, medium or high levels. Although there are a set of variables that may be commonly derived in CGM studies there are increasing examples of studies addressing broadly similar research questions but deriving different summary variables. For example, we found two papers assessing glycaemic variability in non-diabetic people, one that included morbidly obese participants 22 and the other that included healthy people. 27 Whilst both of these studies used standard deviation (SD), coefficient of variation (CV) and mean amplitude of glycaemic excursions (MAGE) as measures of variability, the one in morbidly obese people also used mean of daily differences (MODD) 22 and the other used mean absolute rate of change (MARC). 27 These two studies illustrate that (i) several measures of variability can be derived from CGM data and it is important to justify which are used and differences between them, which neither of these papers did, and (ii) we would want consistent measures to be used across studies. Even when different studies derive a variable representing the same fundamental property it may be defined differently, e.g. using different thresholds to define hypo-, normo-and hyperglycaemia. 5,22 This lack of consistency across studies, together with insufficient reporting of study methods, means that it is difficult to interpret results. It is also difficult to seek replication or pool study results in meta-analyses when varied measures are derived. [5][6][7][8]11,[28][29][30] For example, a recent review that compared studies according to the proportion of time in hypo-normo-and hyper-glycaemia was limited because researchers used different thresholds or did not include these measures at all. 22 It is also unclear whether researchers derive many summary variables but only present those for which analysis supports their hypothesis, such that the evidence published in the literature and on which clinical decisions are based may be biased. 31 The American Diabetes Association recently suggested some summary statistics [such as the CV to assess variability and proportion of time in ranges (hypo-, normo-and hyper-glycaemia)] to assess glucose control in patients with diabetes, but acknowledged further research was needed to establish which summary measures are most useful even in diabetes patients. 8 Outside this guidance we are unaware of any that has been suggested for the broader use of CGM in epidemiology; nor are we aware of any general epidemiology research tools to systematize analyses of CGM data.
In this paper, we present GLU, a general opensource software package for processing CGM data, for use by researchers wishing to assess the relationship of characteristics in CGM data with other traits and disease, using data from any study design, including prospective cohort studies or randomized trials, of general or clinical populations. The widespread use of this software across different research studies will help to identify the key measurements from CGM that have most clinical relevance in different contexts and groups of patients, and in time potentially result in the most efficient and effective use of CGM in clinical practice. GLU performs quality control and derives a set of glucose characteristics (illustrated in Figure 1), that can be used in subsequent analyses. Use of a common tool will help to standardize methods across research studies. Hence, in the future it will be easier to compare and metaanalyse results across studies, and perform replication analyses. An open source tool also improves transparency of methods as all code is freely available, aiding interpretation of results. Furthermore, we intend to update GLU as methods advance. The presentation of this tool is timely as CGM is beginning to be widely adopted in epidemiological research, including both observational studies and randomized controlled trials. [22][23][24][25][26] Implementation GLU is implemented in R and requires the following R packages: optparse, ggplot2, stringr [see GitHub repository (https://github.com/MRCIEU/GLU) for package versions]. GLU supports CGM data from the Medtronic iPro2, 32 Abbott Freestyle Libre 33 and Dexcom G6 34 CGM devices, specified using the 'device' argument. GLU can be used with CGM data from other devices by converting to a general format and specifying the device as 'other'. GLU is run by specifying two directories; the location of the CGM data files and the location where derived data (e.g. summary variables and plots) should be stored. The CGM data is processed in two main stages: (ii) quality control, and (ii) deriving summary variables (illustrated in Figure 1). GLU allows the user to specify optional arguments, and these include: • nightstart and daystart: specifies the start time of the day-time and night-time periods within each day period to accommodate different populations (e.g. an early bedtime may be more appropriate for studies of children). By default, night-time is between 11.00 pm and 6.30 am. If other times are used then this should be reported. • firstvalid and dayPeriodStartTime: specifies the start time of each day period, either the time corresponding to a participants first sensor glucose value (hence specific to each participant), or the time specified in the dayPeriodStartTime argument (hence the same across participants). By default this is set to the night-time period start time. • pregnancy and diabetes: indicates that the data pertains to pregnant women or diabetic patients, respectively, such that summary variables specific to these populations are derived (i.e. the thresholds used to determine the time spent in hypo-, normo-and hyper-glycaemia levels, described in the 'Deriving glucose summary variables' section below). If neither of these options is selected summary variables are produced that assume participants are from a 'general population' without selection for pregnancy or diabetes. • impute: specifies that GLU should perform 'approximal' or 'other day' imputation, rather than restricting to 'complete days', as described in the 'CGM data quality control' section below.
GLU generates a comma-separated value (CSV) file of derived summary variables, which can be imported into statistical software for analysis.
CGM data quality control GLU performs quality control to help researchers ensure the integrity of the data, consisting of three automated steps: resampling, outlier identification and dealing with missing data (illustrated in Supplementary Figure 1, available as Supplementary data at IJE online). GLU also provides plots for manual review of the CGM data after these automated steps.

Resampling
We resample the sensor glucose values across each participant's CGM sequence to 1-min intervals using linear interpolation (i.e. assuming a straight line between values at adjacent time-points), to facilitate computation of summary variables. Given two adjacent time-points t 1 and t 2 , with sensor glucose values SG 1 and SG 2 , respectively, linear interpolation estimates the glucose value of time-point t 0 where t 1 t 0 t 2 as:

Outlier detection
Previous work has suggested that outliers can be detected by identifying time-points that are more than two standard deviations (SD) from the sensor glucose values at both the previous and subsequent time-points. 5 However, as noted previously, 6,24,35 glucose levels may not be normally distributed, so SD may not be an appropriate measure of variability. Furthermore, this approach is sensitive to the resolution of the glucose trace such that changes in resolution would affect which regions of a glucose trace are marked as outliers. This is because SD is invariant to changes in sampling frequency of a glucose trace, whereas the difference in glucose levels between adjacent timepoints is not. For example, if sensor glucose is recorded every 1 min rather than every 5 min then the difference in glucose levels between adjacent time-points will be smaller but the overall distribution of sensor glucose values, and hence the outlier detection threshold (based on the SD of this distribution), will not change.
Using data described in our usage example (see Usage section below), we visually assessed the distribution of sensor glucose values for each participant and found these distributions to be very variable-some were normally distributed whereas others were skewed. We therefore base our outlier detection on the distribution of the differences of adjacent sensor glucose values rather than the distribution of sensor glucose values. We found that the distributions of the difference of adjacent sensor glucose values were more consistently normally distributed compared with the distributions of sensor glucose values. Also, using the differences of adjacent values means that this approach is invariant to changes in the resolution of a glucose trace. We use a threshold d, of kÂSD of a participant's distribution of differences between adjacent values. 36 Time-points with a glucose value that deviate more than d from the value at both the previous and subsequent time-points, are marked as outliers for further consideration by the researcher. We chose a threshold of 5ÂSD based on experimentation with our example data (see Supplementary Section S1, available as Supplementary data at IJE online for further details). Users can also change the value of k using GLU's outlierthreshold argument (see GLU GitHub repository for details), to make the outlier detection more conservative or lenient. Should outliers be detected and confirmed by visual inspection of the glucose trace then researchers may wish to: (i) use other data such as diet diaries to determine whether detected outliers may be due to some underlying cause such as food intake (rather than erroneous), and (ii) perform sensitivity analyses to see the effect that removing identified outliers has on their results. Our outlier detection method uses a threshold determined using artificial outliers because we have no CGM data containing clear (erroneous) outliers on which to base our approach (Supplementary Section S1, available as Supplementary data at IJE online). As CGM becomes more widely used, it will be possible to improve detection of outliers using outlier examples, and we plan to update GLU outlier detection as the field matures.
Assessing the impact of missing data assumptions CGM data may have missing time periods when the device is unable to record an interstitial glucose value, for example, if the device becomes displaced. When missing periods do exist, there may be systematic differences between the missing and observed values in the CGM data, such that the derived GLU summary variables may be biased. For instance, if sensor displacement (or removal) occurs during swimming and swimming is associated with low glucose values, then a swimmer's average glucose levels estimated using the observed data may be higher compared with the true underlying value. Under those circumstances associations of the GLU summary variables with a potential outcome or a risk factor may be biased. Alternatively, the CGM missing time periods may be missing completely at random-for instance, some technological failures of CGM devices may be due to chance. We note that there are two related but distinct biases when using GLU-derived summary variables: (i) bias of the derived values of participants GLU summary variables, and (ii) bias in subsequent analyses using these summary variables. Bias from the former does not necessarily cause bias for the latter as this depends on the specific analyses performed.
GLU provides three approaches to help address missing data called 'complete days', 'approximal imputed' and 'other day imputed', that make different missingness assumptions. GLU's complete days approach uses only days with complete sensor glucose values to derive glucose characteristics [e.g. 24 (h) x 60/5 ¼ 288 values when using CGM data with 5 min intervals]. If the days of CGM data are missing completely at random (MCAR days ) such that there are no systematic differences between the days with and without missing CGM data, then the derived CGM statistics will be unbiased, hence this missingness will not bias results of subsequent analyses. 37 The MCAR days assumption of the 'complete days' approach may be violated. For example, characteristics of the participants such as their age or employment status may influence whether or not they complete the required number of capillary blood tests or the likelihood of the CGM device being displaced. However, even when MCAR days does not hold, analyses using GLU's complete days statistics may still be unbiased depending on the specific further analysis in which they are used. 37 In general, imputation may help to reduce the amount of excluded data and relax the missing data assumptions, such that missing at random (MAR) [or sometimes missing not at random (MNAR)] may be assumed rather than MCAR. 37 However, glycaemic control is influenced by several characteristics such that imputing portions of a glucose trace is non-trivial.
GLU includes two simple imputation approaches that fill in the missing periods using non-missing regions of a participant's data. We refer to these approaches as 'approximal imputation' and 'other day imputation', and both require that a day has at most 6 h missing data to be considered for imputation. The approximal imputation approach fills in the missing periods using non-missing regions near to the missing region, within the same day. This approach splits the missing period in half, and uses the sensor glucose data on the left to fill in the left half, and the sensor glucose data on the right to fill in the right half, as illustrated in Supplementary Figure 2, available as Supplementary data at IJE online. Formally, given a missing period of sensor glucose values fSG i , SG iþ1 . . . SG jÀ1 , SG j g, fSG i . . . SG k g is replaced with fSG 2iÀkÀ1 . . . SG iÀ1 g and fSG k' . . . SG j g is replaced with fSG jþ1 . . . SG 2jþ1Àk 0 g, where k¼iþfloor((jÀiþ1)/2)-1 and k 0 ¼k þ 1 are the end start indexes of the first and second halves of this missing period, respectively. Each missing sequence must be less than 2 h long to be considered for imputation.
The 'other day imputation' approach fills in the missing periods using data from the same time period on a different day of the same participant's data. For each missing period, this approach first identifies all days for this participant where the same time period has complete data. GLU then randomly selects one of these days and then replaces the missing period with this day's time-matched data. For both the approximal and other day imputation approaches the imputed data is labelled such that the transitions between non-imputed and imputed sections (and transitions between the left and right halves of imputed sections for the approximal imputation approach), are not incorporated into summary variables-i.e. only transitions within sections are incorporated (see Supplementary Figure 2).
Approximal and other day imputation may help to reduce bias in the derived CGM statistics and hence bias in subsequent analyses that use these statistics. Under the assumption that the region used to impute each missing period is representative of that particular missing period, then CGM statistics derived from imputed data may be less biased. In particular, the approximal approach assumes nearby time periods on the same day are representative of the missing period, whereas the other day approach assumes that glucose patterns over a 24 h period are broadly similar such that regions at the same time on alternate days are representative of the missing period. It may however be more likely that missing regions are systematically different to the non-missing regions used for the imputation. For example, if a device is unable to record very high glucose values then the non-missing glucose values used to impute the missing region will be systematically lower. In this case approximal or other day imputation may still help to reduce bias in the derived CGM statistics. This is because, if days with missing data are systematically different to days without missing data then approximal imputation will enable information from (the non-missing time periods on) these systematically different days to be incorporated into the derived summary variables. Similarly, if the CGM data are MCAR day (i.e. the days of CGM data are MCAR as described above) then the summary variables derived using approximal or other day imputed data will be unbiased and more precise than the complete days version.
By default, GLU uses the complete days approach. Users can use the approximal or other day imputation approaches by running GLU with the imputeApproximal or imputeOtherDay arguments, respectively. A researcher wishing to apply another imputation approach to their data (e.g. mean imputation, if appropriate) can do this prior to running GLU. In the rest of this paper we refer to days with complete CGM sequences (after imputation if this option is used) as the set of included days. We would suggest that researchers run their analyses using both complete days and imputed data (both approximal and other day versions) and present all results from further analyses so that over time we can learn more about the nature of CGM missing data and its impact on different research questions. It is important to note that these are 'simple' imputation approaches that fill in missing data prior to any analyses, such that standard errors in subsequent analyses using these imputed data may be underestimated.

Manual review
In the Data visualization section we describe two plots generated by GLU; these can be used to further check data validity (see Usage section for a description of how we do this in our example).

Deriving glucose summary variables
After quality control, GLU derives a set of summary variables illustrated in Figure 1. A full list of summary variables computed by GLU is given in Supplementary  Table 2). 8 GLU also provides the average of each summary variable across all days for each participant to give a single overall value for each summary variable (for each participant). For example, GLU returns the following AUC statistics: (i) AUC for each included day (24 h), (ii) AUC for each included day for the night-time period, (iii) AUC for each included day for the day-time period, (iv) mean AUC over all included days (based on a 24-h day), (v) mean AUC of night-time periods over all included days, and (vi) mean AUC of day-time periods over all included days. The daily statistics provided by GLU allow variability both between and within days to be assessed.
Glucose summary variables output by GLU were chosen to represent broad categories of glucose characteristics that reflect a set of six broad domains that might, independently of each other, relate to outcomes or be influenced by exposures (including interventions in randomized controlled trials). Supplementary Table 1, available as Supplementary data at IJE online, lists these variables, together with other variables that have been included in some publications but are not included here (together with our reasons for not including them). The six broad domains are: overall glucose levels, overall variability (dispersion), excursions (deviations from 'normal'), variability from one moment to the next, fasting levels and post-event levels. GLU includes one variable from each domain and brief explanations for these choices are given in Supplementary Table 1. For example, we considered three measures of dispersion that have been used in previous publications-SD, CV and median absolute deviation (MAD); GLU includes only MAD. This is because sensor glucose values may not be normally distributed and the number of sensor glucose values across which GLU will calculate dispersion will be low (e.g. 1 day contains 288 values for 5 min epochs), meaning that SD and CV are unlikely to be credible measures of dispersion. All GLU summary variables are independent of the length of the time period for which they are calculated.

Overall glucose levels
Overall glucose levels are characterized by the AUC, and specifically GLU derives the mean AUC per minute so that these levels are comparable across time periods of different lengths (e.g. night-time vs day-time). 8 For each day, the AUC is calculated using the trapezoid method, 5 as the sum of the area of the trapezoids created using linear interpolation between sensor glucose values at adjacent time-points (as described above). We divide by the number of minutes in the time period (e.g. 1440 for whole days) to give the average glucose level (mmol/L) per min.

Proportion of time in hypo-, normo-and hyper-glycaemia
We calculate the proportion of time spent in hypo-, normoand hyper-glycaemia. 8,25,38 In patients with diabetes GLUs default for hypo-glycaemia is <3.9 mmol/L and for hyperglycaemia is !10.0 mmol/L (with normo-glycaemia defined as !3.9 mmol/L to <10 mmol/L). 20,39 In a 'healthy' (non-diabetic) and non-pregnant population hypo-glycaemia is defined as <3.3 mmol/L 40 and we use the diabetes threshold (!10 mmol/L) to define hyper-glycaemia, such that normoglycaemia defined as !3.3 to <10 mmol/L. For 'healthy' (non-diabetic) pregnant women we use the recommended targets of glucose control during pregnancy for both type 1 and type 2 diabetes of 3.5-7.8 mmol/L, such that hypoglycaemia is defined as <3.5 mmol/L and hyper-glycaemia is defined as !7.8 mmol/L. 20 The 7.8 mmol/L threshold is also consistent with other guidelines such as the UK National Institute for Health and Care Excellence (NICE) 2-h postprandial threshold for diagnosing gestational diabetes. 41 As already described, these diabetic and pregnancy specific thresholds can be specified using GLU's diabetic and pregnancy arguments, respectively. Because thresholds for defining hypo-and hyper-glycaemia (in 'healthy', diabetic and pregnant populations) vary geographically and overtime, [42][43][44] and differ for other groups (e.g. patients in intensive care units 25 ), GLU also allows users to specify other thresholds. For instance, a study in a diabetic population may wish to use the <3.0 mmol/L hypo-glycaemia threshold recommended by the International Hypoglycaemia Study Group for clinically significant biochemical hypo-glycaemia. 44 However, since GLU is intended to provide standard measures that can be compared (and as appropriate pooled) across studies, where researchers do this a clear justification should be given.

Overall variability
Although SD and CV are widely used measures of glucose variability, 9,30 as discussed above, the distribution of sensor glucose values for a given participant may not be normally distributed. For this reason we use the MAD as a measure of overall variability of sensor glucose levels, defined as: Thus, after calculating the distance of each sensor glucose value from the median value, MAD is the median of these distances.

Variability from one moment to the next
We capture variability in a person's glucose levels across time using a measure based on the length of the line of a glucose trace (i.e. as if the peaks and troughs were stretched out into a line). This idea was recently suggested for CGM data 45 and previously proposed as a measure of complexity for time-series analyses in general. 46 Intuitively, if you stretch out a glucose trace then the resultant straight line will tend to be longer when a trace has a larger overall variability (represented by MAD) and is more complex (a higher number of peaks, valleys and values 46 ) see Supplementary Figure 3, available as Supplementary data at IJE online and 46 for examples. This is distinct from MAD because, unlike MAD, the length of the line is affected by the order of the sensor glucose values, i.e. how the sensor glucose values change from one moment to the next (see Supplementary Figure 4, available as Supplementary data at IJE online). Glycaemic variability percentage (GVP) 45 is a rescaling of the average length of the line per minute such that a trace with no variability (i.e. a constant trace) has a GVP of zero. A trace with a GVP of 100% would imply that the length of the trace is double the length of a straight glucose trace. We adapt this measure to capture complexity but not overall variability (in line with 46 ), as overall variability is captured by MAD. We standardize each glucose trace prior to deriving GVP by subtracting the median and dividing by the MAD. We refer to the GVP calculated using the standardized glucose levels as standardized GVP (sGVP). Formally, sGVP is defined as: and T and SG s are vectors of timestamps and standardized sensor glucose values, respectively, of length N. 46 For example, given a day period of CGM data after resampling to 1 min epochs then T(10) and SG s (10) are the tenth minute in this day and the standardized sensor glucose value on this tenth minute, respectively.
This measure satisfies three useful properties: (i) invariance to the intervals between time-points, (ii) invariance to differences in overall variability and (iii) invariance to differences in the duration of the CGM trace. The first property is satisfied by using a measure based on the length of the line (see Supplementary Figure 5a, available as Supplementary data at IJE online) and means that results of work using different intervals can be compared or metaanalysed. The second property is satisfied by standardizing the CGM trace before calculating GVP (see Supplementary  Figure 5b, available as Supplementary data at IJE online) and means that associations with sGVP are not due to a relationship with overall variability (i.e. MAD). The third property is satisfied by dividing by the total duration in the above equation (see Supplementary Figure 5c, available as Supplementary data at IJE online) and means that variability from one moment to the next can be compared across time periods of different length.

Fasting glucose proxy
Although fasting glucose levels have previously been approximated using CGM data, the methods used to derive this measure can be unclear. 47,48 In studies where meal times are known, fasting glucose levels may be inferred using CGM data recorded before breakfast or after at least 7 h fasting, 5,26,49 e.g. using the mean of the six consecutive values (with 5 min intervals) before breakfast. 26 Others have used glucose levels during particular periods of the night-time as fasting levels, when meal times are not known. 50 This can be problematic if participants eat during the night-time period, 5 which occurs in an important minority who may be different in terms of their health and health-related behaviours to those who do not eat during the night. 51 GLU derives a general proxy measure of fasting glucose that does not require knowledge of meal times, calculated as the mean of the 30 lowest consecutive minutes (equating to 6 CGM values at 5 min intervals) during the night-time.

Event statistics
Studies may ask participants to report their meal times and where this is the case GLU will generate three statistics describing subsequent glucose levels: time to peak, and glucose levels 1-and 2-h post-prandial. 5 Time to peak is calculated as the number of minutes from the meal to the next peak in sensor glucose values-i.e. the nearest subsequent sensor glucose value SG t at time t where SG t1 < SG t > SG t2 , and t1 and t2 are the nearest previous and subsequent time-points to t, respectively, where SG t 6 ¼ SG t1 and SG t 6 ¼ SG t2 . We cannot simply find the time-point with a higher glucose value than the time-points directly before and after, as the peak may consist of a plateau where multiple time-points have the same value.
The 1-h and 2-h post-prandial glucose measures are calculated as the AUC during the 15-min period 1-and 2-h, respectively, after the meal was recorded. We also calculate the 1-and 2-h AUC for exercise and medication events, when this information is available. In addition to deriving the average of these summary variables on each included day, and across all included days, the saveevents argument can be used to output the summary variables for each event. This can be useful where the number of events across days is highly variable such that averaging within and across days may not be appropriate.

Data visualization
The following plots are produced by GLU.
• Sensor glucose trace plots for all participants that can be visually inspected. This plot also includes indicators of events (where these are provided) including the timing of a meal, exercise, use of relevant medications and capillary blood glucose measurement levels. Identified outlier values and imputed time periods (as described above) are also shown on these plots. • Poincare plots to illustrate the stability of each participants blood glucose levels. 10,35,38 Each point on a Poincare plot is the sensor glucose level at time-point t (on the x-axis) against the sensor glucose level at timepoint t þ 1 on the y-axis. Thus, where a participant's sensor glucose levels change slowly their Poincare plot will be aligned along the ascending diagonal, but those with erratic (and potentially erroneous) sensor glucose levels will have a spread further from the ascending diagonal.
Example sensor glucose trace plots and Poincare plots are shown in Supplementary Figure 6, available as Supplementary data at IJE online, and in the GLU GitHub repository.

Usage
In this example, we demonstrate GLU by deriving GLU summary variables from CGM data measured during pregnancy and postnatally, and exploring associations of body mass index (BMI) with these variables during pregnancy.

Study sample
We used data from the Avon Longitudinal Study of Parents and Children-Generation 2 (ALSPAC-G2). 52 The ALSPAC study website contains details of the data that are available through a fully searchable data dictionary: http://www. bris.ac.uk/alspac/researchers/data-access/data-dictionary/. . We refer to the CGM data collected at a particular time-point for a particular participant as a CGM instance. While wearing the device, participants were asked to measure their capillary blood glucose levels by finger prick four times daily, for CGM calibration, and record mealtimes in a hand-written diary.
In this pilot a total of 96 CGM instances had been collected, in 63 women. Using GLU's complete days approach, nine of the 96 instances were excluded due to missing data (one recorded no sensor glucose data and eight had no complete days). One participant has two early pregnancy time-points corresponding to two different pregnancies; we excluded the time-point for the later pregnancy. We also excluded one participant (with 1 time-point) who did not have a measure of BMI. Thus, our pilot includes 85 CGM instances (including a total of 321 included days)-29 in early pregnancy, 25 in late pregnancy, 15 at 6 months postnatal and 16 at 12 months postnatal. These 85 instances were measured in 61 women. Imputing the sensor glucose data using the approximal and other day imputation approaches both resulted in an additional 2 CGM instances with at least 1 included day, for 2 additional participants. Our approximal imputed dataset includes 87 CGM instances, with a total of 333 included days, in 63 women. Our other day imputed dataset includes 88 CGM instances, with a total of 357 included days, in 63 women. Supplementary Table 3, available as Supplementary data at IJE online, shows the patterns of repeat instances within our sample, using the complete days, approximal imputation and other day imputation approaches.
Women's weight and height were measured at the clinic visit when the CGM device was inserted and used to calculate BMI (kg/m 2 ). We considered age, parity and gestational age at CGM measurement as potential confounding factors. Age and parity were reported by the woman; gestational age was calculated from the dates for which the CGM was worn and the woman's expected date of delivery based on her antenatal records (for the vast majority this would be based on a dating scan).

Analyses
Since GLU uses different thresholds for defining hyponormo-and hyper-glycaemia in pregnant compared with non-pregnant women, we divided our CGM instances into pregnancy and postnatal subsets. For the pregnancy subset, we ran GLU with the pregnancy argument. For the postnatal subset, we used the default GLU settings (i.e. we did not specify any optional parameters). For both, we ran GLU with the complete days approach (which is used by default), and the approximal and other day imputation approaches. We manually reviewed the trace and Poincare plots to determine whether there may be any anomalies. Poincare plots show how a person's glucose levels vary across moments in time (specifically one minute to the next, because GLU resamples CGM data to 1-min intervals as a pre-processing step). A deviation from the trend along the ascending diagonal on this plot may reflect an erroneous sensor glucose value in the original CGM data, rather than true variation of glucose levels. Sensor glucose values will tend to vary smoothly on CGM trace plots so erratic changes shown on these plots may also indicate erroneous data.
We summarized our derived GLU summary variables at each of the two pregnancy and two postnatal time-points using median and IQR. We then examined the association between early pregnancy BMI (exposure) and GLU CGMderived variables during pregnancy, using the 43 women with a measure during pregnancy. Of these 43 women 32 had just one set of CGM data during pregnancy (18 earlyand 14 late-pregnancy) and 11 had data for both early and late pregnancy. For the main analyses we used late pregnancy data for the 11 participants with data at both pregnancy time-points. We also undertook a sensitivity analysis in which we instead used early pregnancy measures for these 11 participants. We used linear regression to estimate the association of BMI with the following glucose trace summary variables: overall mean glucose level, MAD, sGVP, fasting glucose proxy, post-prandial time-to-peak and post-prandial 1-and 2-h AUC. MAD, sGVP and postprandial time-to-peak were right skewed and hence log transformed to achieve approximately normally distributions of residuals from the regression model. We converted the proportion of time spent in hypo-, normo-and hyperglycaemia to the number of minutes, by multiplying these by the number of minutes in the defined period (e.g. 1440 in whole days). We then estimated the association of BMI with these outcomes using negative binomial regression. Our analyses were performed using Stata version 15, and code is available at https://github.com/MRCIEU/GLU-UsageExample/. Git tag v0.2 corresponds to the version presented here.

Results
In this pilot we did not identify any outlier time-points having a glucose value with a large deviation from the glucose values at previous or subsequent time-points (Supplementary Figure 6 shows some representative trace plots illustrating their smooth nature). Correlations between summary variables are given in Supplementary  Table 4, available as Supplementary data at IJE online. The smallest correlation was between MAD and postprandial time to peak [Pearson's r ¼ À0.005 (P ¼ 0.98)], whereas the largest was between proportion of time spent in hypo-and normo-glycaemia [Pearson's r ¼ À0.958 (P < 0.01)]. Our sample rarely reached hyper-glycaemic glucose levels [e.g. median 0.000 (interquartile range: 0.000, 0.003) in early pregnancy; Supplementary Table 5, available as Supplementary data at IJE online], such that correlations between time spent in hypo-glycaemia and normo-glycaemia were close to À1 (Supplementary Table  4). While GLU gives the sGVP summary variable, Supplementary Table 4 also includes correlations with GVP (that uses the unstandardized glucose trace) for comparison. MAD was positively correlated with GVP [Pearson's r ¼ 0.86 (P < 0.01)] and negatively correlated with sGVP [Pearson's r ¼ À0.54 (P < 0.01)]. We hypothesized that this is because a glucose trace with a larger overall variability (characterized by MAD) will on average have a lower frequency (intuitively the bigger the deviation the longer it will take to return from this deviation) resulting in a shorter 'length of the line' of the glucose trace after standardization. To check this, we derived a simple measure of the number of peaks in each trace and found this had a negative correlation with MAD [Pearson's r ¼ À0.24 (P ¼ 0. 13)]. Overall, sGVP was less correlated with other GLU summary variables, compared with both MAD and (unstandardized) GVP.
Overall mean glucose levels were very similar ($5 mmol/L) across the four time-points (Supplementary Table 5). However, these similar overall glucose levels conceal very different patterns of variation in glucose across the four time-points and in the day-vs the night-time. MAD values were higher during pregnancy (both early and late) than postnatally, and higher during the day-time compared with the night-time. Fasting glucose levels were, on average, higher 12 months postnatally compared with early pregnancy. Whereas most time was spent normo-glycaemic both during pregnancy and in the postnatal period, the amount of time spent with levels that fulfilled the criteria for hypo-glycaemia was higher during pregnancy compared with postnatally. In interpreting the proportion of time spent in different glycaemic states across these four time periods it is important to remember that we used the pregnancy argument for the early and late pregnancy measures but the default (non-pregnancy, 'healthy') option for the postnatal measures. Hence different thresholds were used to define hypo-, normo-and hyper-glycaemic ranges in pregnancy vs postnatally. It is possible that in some women pregnancy-related changes in glucose levels might persist postnatally, so we repeated the analyses with the pregnancy function applied to the postnatal time-points. The proportion of time spent in normo-glycaemia postnatally was lower when using the pregnancy argument, because the pregnancy target normo-glycaemia range is narrower. Results were broadly similar when missing data at some time-points were imputed using both the approximal and other day approaches (Figure 2 and Supplementary Table 6, available as Supplementary data at IJE online).
In analyses using complete days, approximally imputed and other day imputed data a higher BMI during pregnancy was associated with higher overall mean glucose levels during both the day-and night-time (as measured by AUC), higher time spent in hyper-glycaemia during the night-time and shorter post-prandial time to peak, with similar magnitude of association across the three different approaches to missing data ( Figure 2). For example, during the night-time a 1 kg/m 2 higher BMI was associated with a 0.024 mmol/L higher glucose level per min (95% confidence interval: 0.004, 0.044), after adjusting for covariates (age, parity and gestational age). A higher BMI during pregnancy was associated with higher overall variability of glucose levels during the night-time (as measured by MAD), but we found little evidence of an association with the trace complexity (as measured by sGVP) although this may be due to insufficient statistical power. Results were broadly consistent when we used the early pregnancy measures for the 11 women with both early and late pregnancy results (Supplementary Figure 7, available as Supplementary data at IJE online).

Conclusions
In this paper, we have presented GLU, an open-source tool for researchers working with CGM data. GLU automatically performs quality control and derives a set of summary variables capturing key characteristics in these data. Widespread use of this tool across different research populations will help to identify the key measurements from CGM that have most clinical relevance in different contexts and groups of patients, which in turn will inform the most efficient and effective use of CGM in clinical practice.
There are other previously published tools for analysing CGM data, and Supplementary Table 7, available as Supplementary data at IJE online, provides a comparison of these with GLU. Compared with each of these, GLU is the only one to implement outlier detection. Also, GLU's quality control ensures that only whole days (either before or after imputation) are included in analyses to minimize bias. Our imputation methods seek to maintain the integrity of the data where the imputed regions are realistic CGM sequences (e.g. in contrast to linear interpolation). While the recently published CGManalyzer tool has a diabetes focus, 53 GLU is a general tool for researchers analysing CGM data in any population. We have developed this software so that it can be used to produce a standard set of CGM summary variables in any population and study type, including 'healthy' populations, whether pregnant or not, as well as in studies of people with diabetes, in cohort, case-control or randomized trials. Using a single tool for research in these different populations will aid comparison of summary variables and results across them. The Glycemic Variability Analyzer Program (GVAP) tool 54 is implemented in MATLAB and, hence, requires a licence to use, whereas GLU can be freely used by anyone. In contrast to GVAP, GLU can be used directly with CGM data from several devices. Another tool called EasyGV is implemented as a Microsoft Excel workbook; 55 it is not open-source (i.e. it is not possible to view the macro code), hindering research reproducibility, and the 'point and click' interface means it is difficult to integrate programmatically within a research pipeline. Systems provided by CGM companies, such as the Medtronics Carelink iPro website, also provide summary variables but using these for research would hinder consistency across studies since the summary variables and derivations used would depend on the CGM device, and these are also not open-source, hindering transparency.
In addition to the quality control steps implemented in GLU, another key strength is our choice of summary variables, where these characteristics each represent one of six broad domains. For example, overall glucose levels and variability from one moment to the next are represented by the AUC and sGVP, respectively. Our choice of summary variable from each domain was informed by previous work, interpretability of each variable, and the statistical properties of CGM data. For example, given the skewed nature of glucose data, GLU uses MAD as a measure of spread, rather than the SD or CV (see Supplementary  Table 1).
Previously, there has been a lack of consensus in relation to the methods used to derive variables from CGM data. Furthermore, methodological details have often been missing from research articles, making it difficult to replicate studies and compare results across studies. 5,28 Therefore, our main aim was to improve research practice by providing an open-source software package for CGM research, to improve transparency and consistency across studies. Using GLU to perform CGM processing and persuading researchers to present findings using all of its summary measures (even if some are presented as supplementary material) should improve the consistency across studies and hence the opportunity for replication and pooling of results, which is important for improving the robustness of research in this field. Furthermore, over time this would allow insights to emerge related to which glucose trace properties are important for different populations and in relation to different exposures and outcomes. For example, our pilot data suggest that, during pregnancy, BMI is positively associated with mean glucose levels, including during the day and night, as well as time spent hyper-glycaemic and overall variability of glucose levels during the night-time, but has little association with complexity of the glucose trace. We acknowledge that these are pilot data and for some of the outcomes estimates are imprecise (with wide confidence intervals). As larger CGM datasets become available it will be possible to estimate associations with greater Figure 2. Associations of BMI with GLU summary variables. Estimates using 'complete days', 'approximal imputed' and 'other day' imputed data, after adjustment for covariates (age, parity and gestational age at CGM measurement). Estimates use the mean of the respective summary variable across all included days. All AUC measures are computed as the average AUC per minute. Parts (a) and (b) have different scales, and hence are interpreted as: (a) difference in means of outcome, for a 1 kg/m 2 higher BMI; (b) percentage difference of outcome, for a 1 kg/m 2 higher BMI. n complete data: 43 (except postprandial n: 33; time to peak n: 32); n approximal imputed: 44 (except postprandial and time to peak n: 33); n other day imputed: 44 (except postprandial and time to peak n: 32). Meal event measures could not be calculated for some participants (e.g. because they have no recorded meals on included days or no peak after a recorded meal event) such that these summaries are based on a subset of our sample. Analyses included one summary value per participant. Where a participant had measures at both pregnancy time-points this analysis used the later pregnancy time-point. See Supplementary Figure 7 for results of our sensitivity analysis including instead the early pregnancy time-point for these participants. Number of participants with both time-points was 11 in complete days and approximal imputed data, and 12 in other day imputed data.
precision . As this field matures we plan to update GLU with any additional summary variables or options (e.g. revised thresholds for hypo-, normo-and hyperglycaemia) that emerge and we encourage researchers to send feedback on the tool and suggest additions (via the corresponding author email).

Supplementary Data
Supplementary data are available at IJE online. The funders had no role in the development of GLU or the analyses and interpretation of results presented here. The views expressed in this paper are those of the authors and not necessarily any funders.