Xiaohua Douglas Zhang, Zhaozhi Zhang, Dandan Wang, CGManalyzer: an R package for analyzing continuous glucose monitoring studies, Bioinformatics, Volume 34, Issue 9, 01 May 2018, Pages 1609–1611, https://doi.org/10.1093/bioinformatics/btx826
Abstract
The R package CGManalyzer contains functions for analyzing data from a continuous glucose monitoring (CGM) study. It covers a wide and comprehensive range of data analysis methods including reading a series of datasets, obtaining summary statistics of glucose levels, plotting data, transforming the time stamp format, fixing missing values, evaluating the mean of daily difference and continuous overlapping net glycemic action, calculating multiscale sample entropy, conducting pairwise comparison, displaying results using various plots including a new type of plot called an antenna plot, etc. This package has been developed from our work in directly analyzing data from various CGM devices such as the FreeStyle Libre, Glutalor, Dexcom and Medtronic CGM. Thus, this package should greatly facilitate the analysis of various CGM studies.
The package for Windows is available from CRAN: http://cran.r-project.org/mirrors.html. The source file CGManalyzer_1.0.tar.gz is available in the Supplementary Material and at the website of Zhang’s lab https://quantitativelab.fhs.umac.mo/analytic-tool/.
Supplementary data are available at Bioinformatics online.
1 Introduction
The core concept of diabetes management is controlling blood glucose levels (Probstfield et al., 2016). Controlling glucose levels effectively depends on the ability to monitor them properly, and monitoring methods must be not only accurate but also convenient. This need has led companies such as Apple, Samsung and Google to take steps to innovate wearable technology for monitoring glucose levels. A continuous glucose monitoring (CGM) system is part of the new ‘cutting edge’ technology in diabetes care (Elenko et al., 2015). However, processing and analyzing data from CGM studies is challenging because of the large volume and inherent non-linearity of the data. Currently, there is no R package specifically designed for analyzing CGM studies. To fill this gap, we developed the R package CGManalyzer.
This package can be used to analyze a CGM study from beginning to end, including reading and displaying data, calculating regular statistics (e.g. mean, median, SD, confidence interval) and non-linear statistics [e.g. multiscale sample entropy (MSE)], evaluating the mean of daily difference (MODD) and intraday glycemic variation as represented by continuous overlapping net glycemic action (CONGA; McDonnell et al., 2005; Rodbard et al., 2009), conducting group comparison and displaying results. In addition to providing a complete workflow for CGM analysis, this package includes two features that are new to CGM analysis: one is the implementation of the strictly standardized mean difference (SSMD; Zhang, 2007) and the associated classes of effect size (Zhang, 2011), and the other is a new type of plot called an antenna plot. The package can be applied directly to data from various CGM devices such as FreeStyle Libre, Glutalor, Dexcom and Medtronic CGM.
2 Features
CGManalyzer has main functions createFolder.fn, summaryCGM.fn, boxplotCGM.fn, timeSeqConversion.fn, equalInterval.fn, fixMissing.fn, MODD.fn, CONGA.fn, plotTseries.fn, compileC.fn, MSEbyC.fn, pairwiseComparison.fn, MSEplot.fn, antennaPlot.fn and ssmdEffect.fn.
2.1 Read data
CGM data are usually generated sensor by sensor and are therefore usually stored by sensor: one file holds the data from each sensor. It is natural to take this feature into account when reading CGM data in R. To do so, createFolder.fn creates a folder to hold the data files, and SPEC.R describes how to create a .bat file, 00fetchFileNameInDirectory.bat, that automatically collects the names of all data files in the Data folder and stores them in 00filelist.csv. This avoids the tedious work of collecting the names of the data files manually one by one. Because there is a variety of CGM devices, such as FreeStyle Libre, Glutalor, Dexcom and Medtronic CGM, the file SPEC.R contains parameter settings that can be altered to fit each type of device. The package provides three SPEC.R files, for FreeStyle Libre, Glutalor and Medtronic CGM, respectively, so that users may directly choose the SPEC file that corresponds to their CGM device.
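The file-list step described above can be illustrated outside R. The following is a minimal Python sketch, not the package's own code: the function names write_file_list and read_file_list and the sensor file names are hypothetical, and only the Data folder and 00filelist.csv names come from the text.

```python
import csv
import tempfile
from pathlib import Path


def write_file_list(data_dir, list_path):
    """Write the name of every regular file in data_dir to a one-column CSV."""
    names = sorted(p.name for p in Path(data_dir).iterdir() if p.is_file())
    with open(list_path, "w", newline="") as fh:
        csv.writer(fh).writerows([name] for name in names)
    return names


def read_file_list(list_path):
    """Read the file names back, one per row."""
    with open(list_path, newline="") as fh:
        return [row[0] for row in csv.reader(fh) if row]


# Demo in a throwaway directory; the sensor file names are made up.
with tempfile.TemporaryDirectory() as root:
    data_dir = Path(root) / "Data"
    data_dir.mkdir()
    for name in ("sensor_B.csv", "sensor_A.csv"):
        (data_dir / name).write_text("time,glucose\n")
    list_path = Path(root) / "00filelist.csv"
    write_file_list(data_dir, list_path)
    listed = read_file_list(list_path)

print(listed)
```

Keeping the list file outside the Data folder, as in this sketch, prevents the list from naming itself as a data file.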
2.2 Time stamp conversion and fixing missing values
Since CGM measures glucose levels in a continuous time series, time stamp information is critical. The difficulty is that different CGM devices use different formats for their time stamps, which presents a challenge when processing CGM data. To overcome this challenge, we use timeSeqConversion.fn to convert time stamps into a sequence of time values once the time stamp format has been specified. The function can convert any format, such as ‘2016:08:11:09:14:00’, ‘11/08/2016 09:14:00’ and others. The only requirements are that the positions of year, month, day, hour, minute and second are fixed and consistent in all data files, and that a leading ‘0’ (such as the ‘0’ before ‘8’ in ‘08’) is never omitted. Non-linear statistics such as sample entropy require that the interval between any two consecutive time points is equal. CGManalyzer has a function equalInterval.fn that adjusts the data so that consecutive time points are equally spaced. CGManalyzer also has a function fixMissing.fn to fix missing values when necessary.
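The two ideas in this section, parsing a time stamp from fixed character positions and placing readings on an equally spaced grid, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the package's implementation: the names parse_fixed_position and to_equal_interval are hypothetical, the second example format is assumed to be day-first, and linear interpolation is only one possible way to equalize intervals (the text does not say which method equalInterval.fn uses).

```python
from datetime import datetime

# Character slices (start, end) for each field; these depend on the device format.
COLON_FORMAT = {"year": (0, 4), "month": (5, 7), "day": (8, 10),
                "hour": (11, 13), "minute": (14, 16), "second": (17, 19)}
SLASH_FORMAT = {"day": (0, 2), "month": (3, 5), "year": (6, 10),
                "hour": (11, 13), "minute": (14, 16), "second": (17, 19)}


def parse_fixed_position(stamp, positions):
    """Build a datetime from fields found at fixed character positions."""
    return datetime(**{name: int(stamp[a:b]) for name, (a, b) in positions.items()})


def to_equal_interval(times, values, step):
    """Linearly interpolate readings onto a grid with spacing `step`.

    `times` must be strictly increasing and contain at least two points.
    """
    grid = []
    t, j = times[0], 0
    while t <= times[-1]:
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        v0, v1 = values[j], values[j + 1]
        grid.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        t += step
    return grid


a = parse_fixed_position("2016:08:11:09:14:00", COLON_FORMAT)
b = parse_fixed_position("11/08/2016 09:14:00", SLASH_FORMAT)
print(a == b)  # both strings describe the same moment

# Irregular readings (minutes since start, made-up glucose values) on a 15-min grid.
grid = to_equal_interval([0, 14, 31, 45], [100, 114, 131, 145], 15)
print(grid)
```

The fixed-position approach explains the requirement that leading zeros are never omitted: dropping one would shift every later field out of its slice.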
2.3 Summary statistics and displaying data
For CGM data, it is common to want summary statistics such as the number of data points, mean, median, SD, and minimal and maximal values of the glucose levels measured by a sensor. CGManalyzer has a function summaryCGM.fn to calculate those values, MODD.fn to calculate MODD, CONGA.fn to calculate CONGA, and boxplotCGM.fn to display the glucose levels (e.g. Fig. 1A). In addition, the function plotTseries.fn can be used to display the glucose levels as a time series (e.g. Fig. 1B). When the main code in CGManalyzer is run, summaryStatistics.sensor.csv is generated automatically to hold the summary statistics, and a PDF file timeSeriesPlot.Glucose.pdf is generated to show the glucose time series for each sensor.
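As an illustration of the two variability measures named above, the following Python sketch uses their commonly cited definitions: MODD as the mean absolute difference between readings taken 24 h apart, and CONGA as the SD of differences between readings n hours apart. It assumes an equally spaced series with no missing values; the function names and the toy data are hypothetical, and the details of MODD.fn and CONGA.fn may differ.

```python
from statistics import mean, stdev


def modd(values, per_day):
    """Mean absolute difference between readings exactly one day apart."""
    diffs = [abs(values[i] - values[i - per_day])
             for i in range(per_day, len(values))]
    return mean(diffs)


def conga(values, per_hour, n_hours):
    """SD of differences between each reading and the one n_hours earlier."""
    lag = per_hour * n_hours
    diffs = [values[i] - values[i - lag] for i in range(lag, len(values))]
    return stdev(diffs)


# Toy series with only four readings per day, to keep the arithmetic visible.
day1 = [100, 120, 140, 110]
day2 = [110, 125, 150, 100]
modd_value = modd(day1 + day2, per_day=4)   # (10 + 5 + 10 + 10) / 4
print(modd_value)

# A steadily rising series changes by the same amount at every lag, so CONGA is 0.
conga_value = conga(list(range(0, 50, 5)), per_hour=1, n_hours=2)
print(conga_value)
```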
Fig. 1. (A) Boxplot for glucose levels in several subjects, (B) time series plot for glucose levels of a subject over a 3-day span, (C) individual MSE in a group, (D) average MSE in a group, (E) antenna plot for glucose levels and (F) antenna plot for MSE at a scale of 3 min. In Panels (C)–(F), dI, dII, dPRE and H denote type I diabetes, type II diabetes, pre-diabetes and healthy individuals, respectively.
2.4 Non-linear statistics and multiscale entropy plot
MSEbyC.fn calculates MSE. The calculation is fast because the function calls a C program (i.e. ‘mse.c’ from physionet.org) to obtain the major results. The calculated MSE can be displayed by individuals (Fig. 1C) or by groups (Fig. 1D) through MSEplot.fn. When the main code is run, a PDF file MSEplot.pdf is automatically generated to show the calculated MSE by individuals and by groups. When MSE is shown by groups, the error bar can be chosen to represent standard error or SD for each group at each scale.
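For readers unfamiliar with MSE, the computation can be sketched in pure Python: the series is coarse-grained by averaging non-overlapping windows at each scale, and sample entropy is computed on each coarse-grained series. This is a slow illustrative version, not the C program the package calls. The function names are hypothetical, and the parameter choices (m = 2, tolerance r = 0.15 × SD of the original series, matches counted when the distance is at most r) are assumptions that follow common practice.

```python
import math
import random
from statistics import stdev


def coarse_grain(x, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    n = len(x) // scale
    return [sum(x[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def sample_entropy(x, m, r):
    """ln(B / A): B and A count template pairs matching for m and m + 1 points."""
    n = len(x)

    def count(length):
        c = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    c += 1
        return c

    b, a = count(m), count(m + 1)
    if a == 0 or b == 0:
        return float("nan")  # undefined when no matches are found
    return math.log(b / a)


def multiscale_entropy(x, max_scale, m=2, r_factor=0.15):
    """Sample entropy of the coarse-grained series at scales 1..max_scale."""
    r = r_factor * stdev(x)  # tolerance fixed from the original series
    return [sample_entropy(coarse_grain(x, s), m, r)
            for s in range(1, max_scale + 1)]


# A perfectly regular signal has zero sample entropy.
regular = sample_entropy([1, 2] * 10, m=2, r=0.1)
print(regular)

# Synthetic noise (seeded, not CGM data) evaluated at three scales.
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(300)]
mse = multiscale_entropy(noise, max_scale=3)
print(mse)
```

The nested loop over template pairs is quadratic in the series length, which is why a compiled implementation pays off on long CGM recordings.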
2.5 Group comparison and antenna plot
There are four major groups of subjects related to diabetes: type I diabetes, type II diabetes, pre-diabetes and healthy individuals. Researchers and doctors are interested in the pairwise comparison between any two of these groups in either glucose levels or MSE. The key statistics for each comparison include the mean difference, its confidence interval, SSMD and the P-value of a t-test. pairwiseComparison.fn can calculate those statistics in addition to the mean, SD and number of subjects in each group. The calculated results can be displayed using a forest plot. SSMD is the mean divided by the SD of a difference between two groups. Thus, SSMD measures effect size for group comparison effectively (Zhang, 2011). Based on SSMD, here we propose a new plot in which the x-axis is similar to that in a forest plot but the y-axis is SSMD. Because the shape of this plot looks like an antenna, it is termed an antenna plot. The function antennaPlot.fn can generate antenna plots for glucose levels (Fig. 1E) and for MSE at each scale (Fig. 1F). When the main code is run, groupCompSSMDpvalue.MSE.csv and groupComp.mean.csv are generated automatically and contain the calculated MSE results for each pairwise comparison and for each group, respectively; groupEffect.csv contains the strength of difference; and antennaPlot.pdf contains a series of antenna plots for glucose levels and MSE at each scale.
The antenna plot in Figure 1F illustrates that for sample entropy at a scale of 3 min, there is no significant difference between three pairs, dI and dII, dPRE and dII, dPRE and H, but significant differences between three other pairs, dI and dPRE, dI and H, dII and H. The strengths of differences are very weak between dI and dII and between dPRE and H, fairly weak between dII and dPRE, fairly moderate between dI and dPRE, and moderate between dI and H and between dII and H, based on the SSMD criteria in Zhang (2011). Similarly, we can interpret the results in Figure 1E for average glucose levels.
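The quantities behind an antenna plot can be sketched as follows. This Python example assumes two independent groups and estimates SSMD as the difference in group means divided by the square root of the sum of the group variances, which is one common estimator and not necessarily the one used by pairwiseComparison.fn. The function names and the glucose values are invented for illustration, and confidence intervals and P-values are omitted.

```python
import math
from itertools import combinations
from statistics import mean, variance


def ssmd(a, b):
    """Mean difference divided by the SD of the difference (independent groups)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) + variance(b))


def antenna_points(groups):
    """Return (mean difference, SSMD) for every pair of groups.

    The first value is the x-coordinate and the second the y-coordinate
    of the corresponding point in an antenna-style plot.
    """
    points = {}
    for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
        points[(name_a, name_b)] = (mean(a) - mean(b), ssmd(a, b))
    return points


# Made-up mean glucose values (mmol/L) per subject, for illustration only.
groups = {
    "dI": [9.8, 10.5, 11.2, 9.1],
    "dPRE": [6.4, 6.9, 6.1, 6.6],
    "H": [5.2, 5.6, 5.0, 5.4],
}
points = antenna_points(groups)
for pair, (diff, effect) in points.items():
    print(pair, round(diff, 2), round(effect, 2))
```

Because SSMD divides by the variability of the difference rather than by a standard error, it does not grow simply because more subjects are added, which is what makes it usable as an effect size on the y-axis.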
3 Discussion and conclusion
Here, we develop CGManalyzer, an analytic tool for CGM studies. This tool has multiple useful features. First, it can be applied to data measured by various existing CGM devices such as FreeStyle Libre, Glutalor, Dexcom and Medtronic CGM. Second, it can analyze a CGM study from beginning to end. Third, it reads a series of data files, each representing a sensor or subject. Fourth, it converts various formats of time stamps as long as the time stamp format is fixed in all data files. Fifth, it handles missing values. Finally, it calculates regular and non-linear statistics. Moreover, it has been partially applied to analyze CGM experiments successfully (Zhang et al., 2017). Therefore, this package should greatly facilitate the analysis of data generated by a fast-growing technology, the wearable CGM device.
MSE measures the irregularity and complexity of dynamic physiological signals and may have novel utility in the diagnosis and prognosis of various diseases (Chen et al., 2017; Jin et al., 2017). This package calculates MSE by essentially calling a C program developed by Costa et al. (2014). Thus, it is much faster and more efficient than existing R packages such as ‘pracma’ for calculating MSE and could be used to analyze densely sampled continuous monitoring data such as continuously monitored respiratory sound data (e.g. Niu et al., 2017).
Furthermore, we introduce a new plot called an antenna plot for displaying the analytic results in CGM experiments, which may show the CGM changing pattern better than a forest plot (He et al., 2016) or a dual-flashlight plot (Zhang and Zhang, 2013).
Funding
This work was supported by the Start-up Research Grant (SRG2016-00083-FHS) at the University of Macau.
Conflict of Interest: none declared.
References
Niu,J. et al. (
Author notes
The authors wish it to be known that, in their opinion, Xiaohua Douglas Zhang and Zhaozhi Zhang should be regarded as Joint First Authors.

