Abstract

Summary

The R package CGManalyzer contains functions for analyzing data from a continuous glucose monitoring (CGM) study. It covers a broad range of data analysis methods, including reading a series of datasets, obtaining summary statistics of glucose levels, plotting data, transforming the time stamp format, fixing missing values, evaluating the mean of daily difference and continuous overlapping net glycemic action, calculating multiscale sample entropy, conducting pairwise comparison and displaying results using various plots, including a new type of plot called an antenna plot. This package has been developed from our work in directly analyzing data from various CGM devices such as the FreeStyle Libre, Glutalor, Dexcom and Medtronic CGM. Thus, this package should greatly facilitate the analysis of various CGM studies.

Availability and implementation

The package for Windows is available from CRAN: http://cran.r-project.org/mirrors.html. The source file CGManalyzer_1.0.tar.gz is available in the Supplementary Material and at the website of Zhang’s lab https://quantitativelab.fhs.umac.mo/analytic-tool/.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

The core concept of diabetes management is controlling blood glucose levels (Probstfield et al., 2016). Controlling glucose levels effectively depends in part on the ability to monitor them properly. Monitoring methods must be not only accurate but also convenient. This has led companies such as Apple, Samsung and Google to take steps to innovate wearable technology for monitoring glucose levels. A continuous glucose monitoring (CGM) system is part of the new ‘cutting edge’ technology in diabetes care (Elenko et al., 2015). However, it is a challenge to process and analyze the data from CGM studies due to the large volume and the inherent non-linearity of the data. Currently, there is no R package specifically designed for analyzing CGM studies. To fill this gap, we developed an R package, CGManalyzer.

This package can be used to analyze a CGM study from the very beginning to the end, including reading and displaying data, calculating regular statistics (e.g. mean, median, SD, confidence interval) and non-linear statistics (e.g. multiscale sample entropy (MSE)), evaluating the mean of daily difference (MODD) and intraday glycemic variation as represented by continuous overlapping net glycemic action (CONGA; McDonnell et al., 2005; Rodbard et al., 2009), conducting group comparison and displaying results. On top of providing a complete workflow for CGM analysis, this package includes two features that are new to CGM analysis: one is the implementation of the strictly standardized mean difference (SSMD; Zhang, 2007) and the class of effect size (Zhang, 2011), and the other is the development of a new type of plot called an antenna plot. This package can be applied directly to analyze data from various CGM devices such as FreeStyle Libre, Glutalor, Dexcom and Medtronic CGM.

2 Features

The main functions of CGManalyzer are createFolder.fn, summaryCGM.fn, boxplotCGM.fn, timeSeqConversion.fn, equalInterval.fn, fixMissing.fn, MODD.fn, CONGA.fn, plotTseries.fn, compileC.fn, MSEbyC.fn, pairwiseComparison.fn, MSEplot.fn, antennaPlot.fn and ssmdEffect.fn.

2.1 Read data

CGM data are usually generated sensor by sensor and are therefore usually stored by sensor, with one file holding the data from each sensor. It is natural to take this feature into account when we read CGM data in R. To do so, createFolder.fn can create a folder to hold the data files, and SPEC.R describes how to create a .bat file, 00fetchFileNameInDirectory.bat, that automatically collects the names of all data files in the Data folder and stores them in 00filelist.csv. This avoids the tedious work of manually collecting the names of data files one by one. Since there exists a variety of CGM devices such as FreeStyle Libre, Glutalor, Dexcom and Medtronic CGM, the file SPEC.R contains parameter settings that can be altered to fit each type of device. The package provides three SPEC.R files, for FreeStyle Libre, Glutalor and Medtronic CGM, respectively, so that users may directly choose the SPEC file that corresponds to their CGM device.
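The file-listing step can be illustrated with a short sketch. It is written in Python rather than R and is not part of CGManalyzer; the function name write_file_list and the one-name-per-row layout of the list file are assumptions made only for this example.

```python
import csv
from pathlib import Path


def write_file_list(data_dir, list_path):
    """Collect the names of all files in data_dir and write them to list_path.

    Hypothetical helper: the one-name-per-row CSV layout is an assumption,
    not the documented layout of 00filelist.csv.
    """
    data_dir = Path(data_dir)
    list_path = Path(list_path)
    names = sorted(
        p.name
        for p in data_dir.iterdir()
        # Skip sub-folders and the list file itself if it sits in data_dir.
        if p.is_file() and p.resolve() != list_path.resolve()
    )
    with open(list_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for name in names:
            writer.writerow([name])
    return names
```

Calling write_file_list("Data", "00filelist.csv") would then list one sensor file per row, which is the kind of index a per-sensor reading loop can iterate over.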

2.2 Time stamp conversion and fixing missing values

Since CGM measures glucose levels in a continuous time series, time stamp information is critical. The difficulty is that various CGM devices use different formats for their time stamps, which presents a challenge when we process CGM data. To overcome this challenge, we use timeSeqConversion.fn to convert various time stamps into a sequence of time values once the format of the time stamp is specified. The function can convert any format such as ‘2016:08:11:09:14:00’, ‘11/08/2016 09:14:00’ and others. The only requirements are that the positions of year, month, day, hour, minute and second are fixed and consistent in all data files, and that a leading ‘0’ before a non-zero digit (such as the ‘0’ before ‘8’ in ‘08’) is not omitted. When non-linear statistics such as sample entropy are calculated, the interval between any two consecutive time points must be equal. CGManalyzer has a function equalInterval.fn to adjust the data so that consecutive time points are equally spaced. CGManalyzer also has a function fixMissing.fn to fix missing values when necessary.
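The fixed-position requirement and the equal-interval adjustment can be sketched as follows. This is a Python illustration, not the code of timeSeqConversion.fn or equalInterval.fn. The position tables, the day/month/year reading of the second format and the use of linear interpolation are all assumptions made for the example.

```python
from datetime import datetime

# Character positions (start, end) of each field; hypothetical layouts.
YMD_COLON = {"year": (0, 4), "month": (5, 7), "day": (8, 10),
             "hour": (11, 13), "minute": (14, 16), "second": (17, 19)}
DMY_SLASH = {"day": (0, 2), "month": (3, 5), "year": (6, 10),
             "hour": (11, 13), "minute": (14, 16), "second": (17, 19)}


def parse_fixed_timestamp(stamp, positions):
    """Read each field from its fixed position; leading zeros must be kept."""
    f = {name: int(stamp[a:b]) for name, (a, b) in positions.items()}
    return datetime(f["year"], f["month"], f["day"],
                    f["hour"], f["minute"], f["second"])


def minutes_since_start(stamps):
    """Convert datetimes into minutes elapsed since the first one."""
    return [(s - stamps[0]).total_seconds() / 60 for s in stamps]


def resample_equal_interval(minutes, glucose, step):
    """Linearly interpolate onto a grid with a constant step (in minutes).

    Assumes minutes is strictly increasing; gaps are bridged by interpolation,
    which is one possible way to obtain equal spacing.
    """
    if len(minutes) < 2:
        raise ValueError("need at least two time points")
    grid, resampled = [], []
    j, k = 0, 0
    while minutes[0] + k * step <= minutes[-1]:
        t = minutes[0] + k * step
        # Move to the pair of observations that brackets t.
        while j < len(minutes) - 2 and minutes[j + 1] < t:
            j += 1
        t0, t1 = minutes[j], minutes[j + 1]
        weight = (t - t0) / (t1 - t0)
        resampled.append(glucose[j] + weight * (glucose[j + 1] - glucose[j]))
        grid.append(t)
        k += 1
    return grid, resampled
```

Because every field is read from a fixed character range, a time stamp written as ‘8’ instead of ‘08’ would shift all later fields, which is why the leading zero cannot be skipped.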

2.3 Summary statistics and displaying data

For CGM data, it is common to want to see summary statistics such as the number of data points and the mean, median, SD, minimum and maximum of the glucose levels measured by a sensor. CGManalyzer has a function summaryCGM.fn to calculate those values, MODD.fn to calculate MODD, CONGA.fn to calculate CONGA and boxplotCGM.fn to display them (e.g. Fig. 1A). In addition, the function plotTseries.fn can be used to display the glucose levels in a time series (e.g. Fig. 1B). When the main code in CGManalyzer is run, summaryStatistics.sensor.csv will be generated automatically to hold the summary statistics and a PDF file timeSeriesPlot.Glucose.pdf will be generated to show the glucose time series for each sensor.
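For readers unfamiliar with MODD and CONGA, the sketch below follows their commonly used definitions: MODD as the mean absolute difference between glucose values taken 24 h apart, and CONGA as the SD of the differences between each value and the value a fixed number of hours earlier. It is written in Python, assumes an equally spaced series with no missing values, and is not the implementation of MODD.fn or CONGA.fn; all function names are hypothetical.

```python
from statistics import mean, stdev


def lagged_differences(values, lag):
    """Differences between each value and the value `lag` steps earlier."""
    if lag <= 0 or lag >= len(values):
        raise ValueError("lag must be positive and shorter than the series")
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


def steps_in(minutes, step_minutes):
    """Number of sampling steps covering `minutes`; must divide evenly."""
    if minutes % step_minutes:
        raise ValueError("sampling step must divide the lag evenly")
    return minutes // step_minutes


def modd(values, step_minutes):
    """Mean of daily differences: mean |x(t) - x(t - 24 h)|."""
    lag = steps_in(24 * 60, step_minutes)
    return mean([abs(d) for d in lagged_differences(values, lag)])


def conga(values, step_minutes, hours=1):
    """CONGA-n: SD of x(t) - x(t - n hours)."""
    lag = steps_in(hours * 60, step_minutes)
    return stdev(lagged_differences(values, lag))
```

Both measures rely on a constant sampling interval, so in practice they would be computed after the equal-interval adjustment described in Section 2.2.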

Fig. 1.

(A) Boxplot for glucose levels in several subjects, (B) time series plot for glucose levels of a subject over a 3-day span, (C) individual MSE in a group, (D) average MSE in a group, (E) antenna plot for glucose levels and (F) antenna plot for MSE at a scale of 3 min. In Panels (C)–(F), dI, dII, dPRE and H denote people with type I diabetes, people with type II diabetes, people with pre-diabetes and healthy people, respectively.


2.4 Non-linear statistics and multiscale entropy plot

MSEbyC.fn calculates MSE. The calculation is fast because the function calls a C program (i.e. ‘mse.c’ from physionet.org) to obtain the major results. The calculated MSE can be displayed by individuals (Fig. 1C) or by groups (Fig. 1D) through MSEplot.fn. When the main code is run, a PDF file MSEplot.pdf is automatically generated to show the calculated MSE by individuals and by groups. When MSE is shown by groups, the error bar can be chosen to represent standard error or SD for each group at each scale.
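The idea behind MSE can be sketched in a few lines: the series is averaged over non-overlapping windows of increasing length (coarse-graining), and sample entropy is computed at each scale. The Python code below is a plain illustration, not the ‘mse.c’ program called by MSEbyC.fn. The defaults m = 2 and a tolerance of 0.15 times the SD of the original series are common conventions assumed here, and the quadratic-time loop is only suitable for short series.

```python
from math import log
from statistics import pstdev


def coarse_grain(series, scale):
    """Average consecutive, non-overlapping windows of length `scale`."""
    n = len(series) // scale
    return [sum(series[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def sample_entropy(series, m, tolerance):
    """SampEn = -ln(A / B), with B and A the numbers of template pairs that
    match for m and m + 1 points. Returns None when it is undefined."""
    n = len(series)

    def count_matches(length):
        count = 0
        # The same n - m starting points are used for both template lengths.
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(series[i + k] - series[j + k]) <= tolerance
                       for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return None
    return -log(a / b)


def multiscale_entropy(series, max_scale, m=2, r_fraction=0.15):
    """Sample entropy of the coarse-grained series at scales 1..max_scale."""
    # Assumed convention: the tolerance is fixed from the original series.
    tolerance = r_fraction * pstdev(series)
    return [sample_entropy(coarse_grain(series, scale), m, tolerance)
            for scale in range(1, max_scale + 1)]
```

A perfectly regular signal yields an entropy of zero at every scale, whereas more irregular signals yield larger values; the pairwise loop is also the reason a compiled C routine is much faster for long CGM series.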

2.5 Group comparison and antenna plot

There are four major groups of subjects related to diabetes: type I diabetes, type II diabetes, pre-diabetes and healthy individuals. Researchers and doctors are interested in the pairwise comparison between any pair of these groups in either glucose levels or MSE. The key statistics for each comparison include the mean difference, confidence interval, SSMD and the P-value of a t-test. pairwiseComparison.fn can calculate those statistics in addition to calculating the mean, SD and number of subjects in each group. The calculated results can be displayed using a forest plot. SSMD is the mean of the difference between two groups divided by the SD of that difference. Thus, SSMD measures effect size for group comparison effectively (Zhang, 2011). Based on SSMD, here we propose a new plot in which the x-axis is similar to that in a forest plot but the y-axis is SSMD. Because the shape of this plot looks like an antenna, it is termed an antenna plot. The function antennaPlot.fn can generate antenna plots for glucose levels (Fig. 1E) and for MSE at each scale (Fig. 1F). When the main code is run, groupCompSSMDpvalue.MSE.csv and groupComp.mean.csv will be automatically generated and contain the calculated MSE results for each pairwise comparison and for each group, respectively, groupEffect.csv will contain the strength of difference, and antennaPot.pdf will contain a series of antenna plots for glucose levels and MSE at each scale.
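The SSMD definition above can be made concrete with a small sketch. It is written in Python and uses a simple plug-in estimate for two independent groups, in which the variance of the difference is the sum of the two sample variances. This estimator is an assumption for illustration; pairwiseComparison.fn may use a different one, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def ssmd(group_a, group_b):
    """Mean difference divided by the SD of the difference.

    Assumes independent groups, so Var(A - B) = Var(A) + Var(B).
    """
    mean_diff = mean(group_a) - mean(group_b)
    sd_diff = sqrt(variance(group_a) + variance(group_b))
    return mean_diff / sd_diff


def pairwise_ssmd(groups):
    """Mean difference and SSMD for every pair of named groups."""
    results = {}
    for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
        results[(name_a, name_b)] = {
            "mean_difference": mean(a) - mean(b),
            "ssmd": ssmd(a, b),
        }
    return results
```

With the four groups dI, dII, dPRE and H, pairwise_ssmd returns six comparisons. Pairing each mean difference with its SSMD gives the two coordinates that an antenna plot displays for each comparison.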

The antenna plot in Figure 1F illustrates that for sample entropy at a scale of 3 min, there is no significant difference between three pairs, dI and dII, dPRE and dII, dPRE and H, but significant differences between three other pairs, dI and dPRE, dI and H, dII and H. The strengths of differences are very weak between dI and dII and between dPRE and H, fairly weak between dII and dPRE, fairly moderate between dI and dPRE, and moderate between dI and H and between dII and H, based on the SSMD criteria in Zhang (2011). Similarly, we can interpret the results in Figure 1E for average glucose levels.

3 Discussion and conclusion

Here, we develop an analytic tool, CGManalyzer, for CGM studies. This tool has multiple useful features. First, it can be applied to data measured by various existing CGM devices such as FreeStyle Libre, Glutalor, Dexcom and Medtronic CGM. Second, it can analyze a CGM study from the beginning to the end. Third, it reads a series of data files, each representing a sensor or subject. Fourth, it converts various formats of time stamps as long as the time stamp format is fixed in all data files. Fifth, it handles missing values. Finally, it calculates regular and non-linear statistics. Moreover, parts of it have already been applied successfully to analyze CGM experiments (Zhang et al., 2017). Therefore, this package should greatly facilitate the analysis of data generated by a fast-growing technology: wearable CGM devices.

MSE measures the irregularity and complexity of dynamic physiological signals and may have novel utility in the diagnosis and prognosis of various diseases (Chen et al., 2017; Jin et al., 2017). This package calculates MSE by essentially calling a C program developed by Costa et al. (2014). Thus, it is much faster and more efficient than existing R packages such as ‘pracma’ for calculating MSE and could be used to analyze high-density continuous monitoring data such as continuously monitored respiratory sound data (e.g. Niu et al., 2017). Furthermore, we introduce a new plot called an antenna plot for displaying the analytic results in CGM experiments, which may show the CGM changing pattern better than a forest plot (He et al., 2016) and a dual-flashlight plot (Zhang and Zhang, 2013).

Funding

This work was supported by the Start-up Research Grant (SRG2016-00083-FHS) at University of Macau.

Conflict of Interest: none declared.

References

Chen,C. et al. (2017) Complexity change in cardiovascular diseases. Int. J. Biol. Sci., 13, 1320–1328.

Costa,M.D. et al. (2014) Dynamical glucometry: use of multiscale entropy analysis in diabetes. Chaos, 24, 033139.

Elenko,E. et al. (2015) Defining digital medicine. Nat. Biotechnol., 33, 456–461.

He,L. et al. (2016) Demonstrating placebo effect in clinical trials of DPP-4 inhibiters conducted in China: meta-analysis. BMC Pharmacol. Toxicology, 17, 40.

Jin,Y. et al. (2017) Entropy change of biological dynamics in human chronic obstructive pulmonary disease. Int. J. Chronic Obstructive Pulmonary Dis., 12, 2997–3005.

McDonnell,C.M. et al. (2005) A novel approach to continuous glucose analysis utilizing glycemic variation. Diabetes Technol. Therapeutics, 7, 253–263.

Niu,J. et al. (2017) Detection of sputum by interpreting the time-frequency distribution of respiratory sound signal using image processing techniques. Bioinformatics, doi: 10.1093/bioinformatics/btx652.

Probstfield,J.L. et al. (2016) Glucose variability in a 26-week randomized comparison of mealtime treatment with rapid-acting insulin versus GLP-1 agonist in participants with type 2 diabetes at high cardiovascular risk. Diabetes Care, 39, 973–981.

Rodbard,D. et al. (2009) Improved quality of glycemic control and reduced glycemic variability with use of continuous glucose monitoring. Diabetes Technol. Therapeut., 11, 717–723.

Zhang,X.D. (2007) A pair of new statistical parameters for quality control in RNA interference high-throughput screening assays. Genomics, 89, 552–561.

Zhang,X.D. (2011) Optimal High-Throughput Screening: Practical Experimental Design and Data Analysis for Genome-Scale RNAi Research. Cambridge University Press, New York.

Zhang,X.D. and Zhang,Z.Z. (2013) displayHTS: an R package for displaying data and results from high-throughput screening experiments. Bioinformatics, 29, 794–796.

Zhang,X.D. et al. (2017) Decreased complexity of glucose dynamics preceding the onset of diabetes in preclinical species. PLoS One, 12, e0182810.

Author notes

The authors wish it to be known that, in their opinion, Xiaohua Douglas Zhang and Zhaozhi Zhang should be regarded as Joint First Authors.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)
Associate Editor: Jonathan Wren
