CropSight: a scalable and open-source information management system for distributed plant phenotyping and IoT-based crop management

Abstract

Background:
High-quality plant phenotyping and climate data lay the foundation for phenotypic analysis and genotype-environment interaction (GxE) studies, providing important evidence not only for plant scientists to understand the dynamics between crop performance, genotypes, and environmental factors but also for agronomists and farmers to closely monitor crops in fluctuating agricultural conditions. With the rise of Internet of Things (IoT) technologies in recent years, many IoT-based remote sensing devices have been applied to plant phenotyping and crop monitoring, and these devices are generating terabytes of biological datasets every day. However, it is still technically challenging to calibrate, annotate, and aggregate these big data effectively, especially when they are produced in multiple locations and at different scales.

Findings:
CropSight is a server platform based on PHP (PHP: Hypertext Preprocessor) and SQL (Structured Query Language) that provides automated data collation, storage, and information management through distributed IoT sensors and phenotyping workstations. It provides a two-component solution to monitor biological experiments through networked sensing devices, with interfaces specifically designed for distributed plant phenotyping and centralized data management. Data transfer and annotation are accomplished automatically through an HTTP-accessible RESTful API installed on both the device side and the server side of the CropSight system, which synchronizes daily representative crop growth images for visual-based crop assessment and hourly microclimate readings for GxE studies. CropSight also supports the comparison of historical and ongoing crop performance while different experiments are being conducted.

Conclusions:
As a scalable and open-source information management system, CropSight can be used to maintain and collate important crop performance and microclimate datasets captured by IoT sensors and distributed phenotyping installations. It provides near real-time environmental and crop growth monitoring in addition to historical and current experiment comparison through an integrated cloud-ready server system. Accessible both locally in the field through smart devices and remotely in an office using a personal computer, CropSight has been applied to field experiments of bread wheat prebreeding since 2016 and speed breeding since 2017. We believe that the CropSight system could have a significant impact on scalable plant phenotyping and IoT-style crop management to enable smart agricultural practices in the near future.


• Fig. 1 has been modified and a legend has been added to clarify data flows throughout the user-system interactions, both internally and externally.
• Supplementary Fig. 3 has been added to show the Star Network topology applied to the wheat field experiment, as well as data transfer between distributed nodes and a server node.
• The Star Network topology is described in lines 185-197.

2. The flowchart (Fig. 2C) of the data transmission from each node and server would be easier to understand if it were visualized as a complete flowchart; kindly refer to the example in this paper (https://doi.org/10.1016/j.compag.2016.04.025).
Response:
• Fig. 2D has been improved by adding a complete section of detailed data flows.
• The paper suggested by the reviewer has now been added to the literature review as a representative research-based data management system (lines 80-85).

3. Regarding the use of a camera outdoors, is there any calibration method for white balance? The sunlight intensity differs at every sampling, so a method for white balance adjustment would be useful.
Response:
• Although the imaging function is not part of the CropSight system, the infield crop growth imaging function has been described briefly in lines 260-264.
• The Python-based imaging script has also been added to the GitHub CropSight project repository for download and reference (please go to https://github.com/Crop-Phenomics-Group/CropSight/releases/, camera_capture_script.py).

4. The position of the environmental sensors during environmental measurement should also be standardized; if the readings will be used for estimating the reference evapotranspiration (ETo), the placement should follow the FAO-56 Penman-Monteith standard.
Response:
• While the placement of sensors is out of the scope of this information system article, as it is independent of the CropSight system, we have improved the manuscript to emphasise the importance of sensor standardisation and infield positioning in lines 299-305 and lines 334-339.
Reviewer #2

1. Lines 60-93: the introduction of different platforms is good. One concern is that remote sensing imagery has long been recognized as an essential data source for evaluating crop properties over large areas (as sensors cannot be deployed to cover large areas). How do the platforms mentioned here deal with remote sensing imagery and extract crop information?
Response:
• The focus of this manuscript is researching and developing data and experiment management software systems, including image- and sensor-based data transfer and data collation. Hence, we focused on reviewing the literature published in the relevant research domains.
• To reflect the reviewer's concerns about evaluating crops over large areas using imagery sensing, we have improved the introduction section by adding new text on image-based phenotyping approaches and a new reference (lines 64-85).
• The manuscript discusses how sensors and analysis algorithms could be utilised to deal with larger areas and maintain quality crop information. To emphasise this matter, lines 267-271 and lines 334-339 have now been added to the manuscript.

2. Line 235: the authors described uploading images of crops to the server so that users can check the images to understand crop condition. Since there may be a large number of photos taken every day or week, manual evaluation would be labour intensive. Is it possible to add some software that can automatically analyze these images and provide results to the users?
Response:
• Computer-vision based algorithms developed for analysing crop growth and phenotypic analysis using crop image series are independent of the CropSight system and have been described in Zhou et al. [1], which is under review at the moment.
• We followed the reviewer's comments and made this clear in the text (lines 267-271).
• The analysis algorithms are not integrated into the CropSight system, because:
  a. These algorithms have been described in [1];
  b. They rely on specific phenotyping devices (e.g. CropQuant workstations);
  c. CropSight is platform independent, which means it is expandable to incorporate other hardware sensors and single-board computers;
  d. It is beyond the scope of this open-source data/experiment information management system.

3. The authors introduced extensively the integration or connection of various sensors in the system, but didn't describe clearly which specific sensors can be integrated (e.g., soil moisture sensor? Fertilizer sensor?), how to set up these sensors in the field, and how the data from sensors are analyzed. This information will help readers to further understand the operation of the monitoring system.
Response:
• Lines 299-305 have now been added to specify exactly which sensors have been used in experiments and their installation in the field, together with a clarification of how the CropSight system collates data generated by these sensing modules as well as the future expansion.
• Lines 334-336 have been added to explain the sensor placement.
• Although data analysis is not within the scope of the CropSight system, we have added and described briefly the Python-based imaging script (lines 260-264), image selection (lines 267-271), and Additional File 2 (an algorithm to analyse environmental factors using plotted figures).
• All scripts described above have been added to the GitHub project repository for download and reference (https://github.com/Crop-Phenomics-Group/CropSight/releases/).

4. In the Discussion and Outlook section, specifically lines 343-357, the authors discussed the potential of applying the monitoring system in the real world to solve various challenges, which is good. However, the authors didn't describe clearly the challenges in deploying the system over large areas. How many sensors are needed, and at what cost? Although the authors indicated in lines 370-391 that the system is scalable and the cost can be reduced, more specific suggestions on the application of the system to large areas would be helpful.
Response:
• Lines 334-336 have been added to the paper to describe the deployment of the system and sensors to a larger area.
• Approximate costs of an individual phenotyping cluster (with 10 distributed nodes and one server node) have been included in lines 191-197.
• The effective range of a star network and infrastructure requirements in terms of data storage have been added in lines 191-197 and line 234.

[...] agronomists and farmers to closely monitor crops in fluctuating agricultural conditions.

Recently, some research-based systems have also been introduced to the scientific community. [...]
The above industrial and academic efforts identify the need to develop a scalable and openly available information management system to deal with our growing experimental needs and biological datasets. It needs to handle different types of datasets acquired in plant phenotyping experiments. To integrate data transfer, calibration, annotation and aggregation effectively, such a system should be flexible for changeable experimental designs and expandable with third-party hardware and external software. More importantly, the system needs to enable users to closely monitor experiments conducted in different locations whilst experiments are being carried out.

With these design requirements in mind, we developed CropSight, a scalable IoT-based information management system that is easy to use and flexible to deploy in diverse experimental scenarios.

CropSight is an open-source software system, which provides a range of interfacing options for the community to adopt and extend. We followed a distributed systems design during the development, so [...]

IoT is a fast-growing field. IoT-based sensors are generating terabytes of data for plant research and agriculture services every day [25]. Since the existing data/experiment management solutions rely heavily on bespoke data collection approaches, they cannot be easily adopted and extended. Also, most of the present solutions require the construction of a centralised management system, which cannot resolve the problem of scalability and accessibility, because the distributed nature of IoT technologies and the centralised data administration infrastructure are likely to conflict with each other. Instead, we developed a two-component solution. The first component is a device-side system that is lightweight and capable of interacting directly with distributed IoT sensing devices, which ensures onboard data standardisation and data collection. The second component is a server-side system that collates and stores image- and sensor-based data, with SQL as the back-end. This server-side system is more [...]

[...] side system can give access to each phenotyping device, so that live video streaming and remote system configuration can be initiated by users to deploy phenotyping devices (Supplementary Fig. 1) as well as to establish indoor or infield experiments just using a smartphone or a tablet. Also, the GUI allows users to enter metadata including trials, experiments (e.g. genotypes, treatments and biological replicates), and a brief description, while phenotyping devices are being installed. The distributed IoT-based design has greatly improved the mobility and flexibility of phenotyping tasks.
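To make the two-component idea concrete, the following is a minimal Python sketch of how a device-side record annotated with experiment metadata might be serialised and then collated on the server side. It is an illustration under stated assumptions, not the CropSight implementation (which is written in PHP): the `Reading` fields, the `readings` table, and the use of SQLite as a stand-in for the SQL back-end are all hypothetical.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass


@dataclass
class Reading:
    """One standardised hourly record (all field names are hypothetical)."""
    device_id: str
    experiment: str
    genotype: str
    treatment: str
    replicate: int
    timestamp: str          # ISO 8601 string
    ambient_temp_c: float
    humidity_pct: float


def to_payload(reading: Reading) -> str:
    """Device side: serialise a reading as JSON, e.g. for an HTTP request body."""
    return json.dumps(asdict(reading), sort_keys=True)


def create_store(conn: sqlite3.Connection) -> None:
    """Server side: create the (hypothetical) table that collates readings."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS readings (
               device_id      TEXT NOT NULL,
               experiment     TEXT NOT NULL,
               genotype       TEXT NOT NULL,
               treatment      TEXT NOT NULL,
               replicate      INTEGER NOT NULL,
               timestamp      TEXT NOT NULL,
               ambient_temp_c REAL,
               humidity_pct   REAL,
               PRIMARY KEY (device_id, timestamp)
           )"""
    )


def collate(conn: sqlite3.Connection, payload: str) -> bool:
    """Server side: store one payload; return False if it was already stored."""
    record = json.loads(payload)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                """INSERT INTO readings VALUES
                   (:device_id, :experiment, :genotype, :treatment,
                    :replicate, :timestamp, :ambient_temp_c, :humidity_pct)""",
                record,
            )
    except sqlite3.IntegrityError:
        # Same device and timestamp seen before: a repeated synchronisation.
        return False
    return True
```

Keying each row on device and timestamp means a repeated upload of the same reading is rejected rather than duplicated, which is one simple way to keep regular synchronisation safe to retry.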

The server-side system bridges the connection between data aggregation and cloud-based interfacing (Fig. 1B). This approach allows biological data acquired at different locations to be synchronised with a centralised server for data management, detailed trait analyses, and decision making in crop management. PHP5+ was used to develop the system, which supports Apache and an SQL server such as [...]

Whilst CropSight is designed to be usable by people with no technical background, installing the system still requires an IT technician (see Additional File 1 for detailed instructions).

The device-side CropSight provides a tailored GUI window, within which users can deploy (see Additional File 1), monitor, assess and download captured data on demand. Second, the device-side system synchronises with the server at regular intervals, based on which CropSight provides a more comprehensive GUI to present both the experimental and technical status (i.e. system status) of ongoing experiments. The device-side system is designed to be distributed. So, if a given IoT device cannot make a direct internet connection for any reason, the device-side system will enable local data storage as a server node. After the network connection is re-established, the system can then forward collected data automatically (the onboard USB memory stick can store up to 60 days' image and sensor data).

[...] challenging if we need to calibrate and verify datasets collected from sensing devices deployed in different sites. In particular, low-quality and missing data often lead to analysis errors and unusable results, which normally can only be identified after the completion of experiments [33]. Hence, the server-side CropSight system was designed to oversee ongoing experiments based on representative daily images, hourly sensor data collected from each phenotyping device, as well as experimental settings such as genotype, treatment, drilling date, plot position and biological replicate.
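The behaviour described above for a device that loses its connection (store data locally, then forward it once the network returns) is commonly called store-and-forward. The Python sketch below illustrates the general pattern only; it is not taken from the CropSight code base. The `StoreAndForward` class, the spool directory layout, and the injected `send` callable (assumed to raise `OSError` when the server is unreachable) are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


class StoreAndForward:
    """Buffer records on local storage while the uplink is down."""

    def __init__(self, spool_dir: Path, send: Callable[[dict], None]):
        self.spool_dir = Path(spool_dir)
        self.spool_dir.mkdir(parents=True, exist_ok=True)
        self.send = send  # hypothetical uploader; raises OSError on failure
        # Continue numbering after any files left over from a previous run.
        existing = (int(p.stem) for p in self.spool_dir.glob("*.json"))
        self._counter = max(existing, default=0)

    def _try_send(self, record: dict) -> bool:
        try:
            self.send(record)
            return True
        except OSError:  # ConnectionError and timeouts are OSError subclasses
            return False

    def _spool(self, record: dict) -> None:
        """Write the record to a zero-padded, monotonically numbered file."""
        self._counter += 1
        path = self.spool_dir / f"{self._counter:08d}.json"
        path.write_text(json.dumps(record))

    def flush(self) -> bool:
        """Forward spooled records oldest first; stop at the first failure."""
        for path in sorted(self.spool_dir.glob("*.json")):
            if not self._try_send(json.loads(path.read_text())):
                return False
            path.unlink()  # delete only after a successful upload
        return True

    def submit(self, record: dict) -> bool:
        """Send now if possible; otherwise keep the record locally."""
        # Older records go first so the server receives data in capture order.
        if self.flush() and self._try_send(record):
            return True
        self._spool(record)
        return False
```

A spooled file is removed only after its upload succeeds, so a failure part-way through a flush leaves the remaining records on local storage for the next attempt.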

Furthermore, the climate datasets can be used for cross-validating the soundness of infield sensors, for example, whether soil temperature correlates with ambient temperature (Supplementary Fig. 5A); and why readings from many low-cost sensors could provide more representative information of the field in comparison with one expensive central weather station (Supplementary Fig. 5B).

[...] sensor data and manages these historical datasets with easy reference and access (Fig. 6) [...]

[...] the GPS-tagged geolocation of an accomplished project and devices used in the project together with project references (Fig. 6A). By clicking a specific plot within the experimental field, CropSight can directly reference environmental and image datasets in the plot, with device name, date of last capture, and last image taken by the phenotyping device (Fig. 6B). If users want to revisit previous datasets in the project, they can download both sensor data packages and/or growth image series in monthly archives by clicking the archive links (Fig. 6C). This design enables a unified cloud-ready platform to facilitate both ongoing and historical data management for in- and post-experiment comparison.

[...] automatically, which can be used to enable both phenotypic analyses and agricultural decision making.
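As an illustration of the sensor cross-validation mentioned above (checking whether soil temperature correlates with ambient temperature), the following Python sketch computes a Pearson correlation coefficient between two reading series. It is a generic example rather than the analysis used in the paper; the function names and the 0.7 threshold are assumptions chosen for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A flat series (e.g. a stuck sensor) has no defined correlation.
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def sensor_looks_sound(ambient: Sequence[float],
                       soil: Sequence[float],
                       min_r: float = 0.7) -> bool:
    """Return True if the two series correlate at least as strongly as min_r.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    return pearson(ambient, soil) >= min_r
```

In practice the two series would need to be aligned on the same hourly timestamps before comparison, and a low coefficient would only flag a sensor for inspection rather than prove a fault.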

By associating environmental conditions with crop growth data, we also trust that the system is capable [...]

[...] be scaled up to the national scale if a broader IoT in agriculture infrastructure is in place. As collected data are annotated and pre-selected on distributed phenotyping or IoT devices, only standardised crop-environment datasets are collated to support detailed trait analyses and cross-referencing. Finally, openly sharing results from different sites and different experiments will enable crop researchers, breeders, and farmers to gain great benefits, for example, predicting and giving early warning of disease spread at the national scale so that early adoption of preventative measures can be arranged.

[...] much easier and thus establish an important tool to inform farmers and growers to apply fungicides and chemical treatments to the appropriate areas. Hence, CropSight has a high potential to serve sustainable agriculture and the environmental friendliness of food production under today's changeable climates.

We believe that, with further development, CropSight in connection with distributed IoT sensors can meet the future demand for usability and scalability in establishing a data and experiment information management system for regional, national or even global crop research and agricultural practices. One area of expansion is scalability. The system is currently tested on a local server with a direct network connection to at least one of the distributed nodes. For expansion to a larger, national, or even global scale, the reliance on maintained servers would be less effective than a true cloud-based service. Hence, moving the CropSight system to a globally accessible cloud server with cloud-enabled distributed storage is a potentially feasible approach that removes the requirement for institutions and agricultural practitioners to maintain servers and storage. Given the lack of network infrastructure in rural areas in many countries, the addition of 3G or 4G mobile data networks to key distributed nodes in the field can improve the infield network, upon which the data communication of a large number of Agri-Tech devices can rely.