GUIdock-VNC: using a graphical desktop sharing system to provide a browser-based interface for containerized software

Abstract Background: Software container technology such as Docker can be used to package and distribute bioinformatics workflows consisting of multiple software implementations and dependencies. However, Docker is a command line–based tool, and many bioinformatics pipelines consist of components that require a graphical user interface. Results: We present a container tool called GUIdock-VNC that uses a graphical desktop sharing system to provide a browser-based interface for containerized software. GUIdock-VNC uses the Virtual Network Computing protocol to render the graphics within most commonly used browsers. We also present a minimal image builder that can add our proposed graphical desktop sharing system to any Docker packages, with the end result that any Docker packages can be run using a graphical desktop within a browser. In addition, GUIdock-VNC uses the Oauth2 authentication protocols when deployed on the cloud. Conclusions: As a proof-of-concept, we demonstrated the utility of GUIdock-noVNC in gene network inference. We benchmarked our container implementation on various operating systems and showed that our solution creates minimal overhead.


Background
Modern workflows in computational fields such as bioinformatics consist of multiple software implementations, each with their own set of dependencies. Software container technology such as Docker (http://www.docker.com) package the dependencies with the software and provide a method to reproduce these complex pipelines on multiple hardware and cloud platforms. For example, BioShadock [1], BioDocker and Bioboxes [2] are two frameworks aimed at reproducibly deploying bioinformatics workflows using Docker containers.
Many bioinformatics pipelines have a component that requires a graphical user interface (GUI) that can potentially limit the portability of the Dockerized workflows as different platforms use different methodologies to render the GUI. We have previously described and implemented GUIdock-X11 [3], an X11based methodology for portably supporting GUI applications in containers on different platforms. While the X11-based display method can be conveniently deployed in the local environment by exposing a file socket from a container, deploying the image on a cloud and accessing it remotely is non-trivial. In addition, on systems such as Windows, where there is no native X11 support, additional client software must be installed by the user to render the X11 graphics locally. Here we describe GUIdock-VNC, which implements an improved browser-based solution that does not require the user to map ports, configure firewalls, or install any additional specialized software.
GUIdock-VNC uses the Virtual Network Computing (VNC) protocol [4] to render the graphics. Instead of transferring commands and allowing a local client to render the graphics, VNC transfers a pre-rendered screen. Bandwidth requirements are minimized by only transferring the differences between the current screen and the last screen. This can actually be less chatty than the X11 methodology, which is constantly sending display commands. noVNC is a browser-based VNC client implemented using HTML5 Canvas and WebSockets [5]. Modern browsers can use the HTML5-based noVNC client to display the screen locally. The browser transparently downloads the noVNC client from the container and becomes the terminal, thus eliminating the need for the user to configure and install separate software. This is a major advantage as the users of bioinformatics workflows are not necessarily technically trained in configuring computer systems.
Most importantly, GUIdock-VNC also facilitates the deployment of Docker applications on the cloud. With a browser-based solution, we also have access to web-based authentication protocols such as Oauth2 [6], which allows for authentication using an email account. The host service is accessed and authenticated through the HTTP/HTTPS port, greatly simplifying the configuration necessary to support cloud-based platforms.

Our contributions
We implemented GUIdock-VNC, which adds and configures a software layer inside a Docker container to allow applications to export a GUI using the VNC protocol. When deployed on the cloud, authentication is provided using Oauth2. In addition, we provide a set of minimal base images to allow the users to add the host graphical desktop interface to any existing Dockerfiles. No client software installation is necessary for the users as GUIdock-VNC uses the HTML5 noVNC browserbased client to display the GUI. All our tools are publicly available on GitHub.
We benchmarked the implementation on a real-world bioinformatics pipeline. Our results showed that noVNC creates minimal overhead and GUIdock-VNC is superior to our previous work, GUIdock-X11 [3], and other virtual machine-based deployment solutions.

Software containers and Docker
A software container packages an application with everything it needs to run, including supporting libraries and system resources. Containers differ from traditional virtual machines (VMs) in that the resources of the operating system (OS), and not the hardware, are virtualized. In addition, multiple containers share a single OS kernel, thus saving considerable resources over multiple VMs. Linux has supported OS-level virtualization for several years. Docker (http://www.docker.com/) is an open source project that provides tools to setup and deploy Linux software containers. While Docker can run natively on Linux hosts, a small Linux VM is necessary to provide the virtualization services on Mac OS and Windows systems. On non-Linux systems, a single Docker container consists of a mini-VM, the Docker software layer, and the software container. However, multiple Docker containers can share the same mini-VM, saving considerable resources over using multiple individual VMs. Recently, support for OS-level virtualization has been added to Windows and the Macintosh operating system (Mac OS). Beta versions of Docker for both Windows and Mac OS are now available that allow Docker to run natively. Subsequently, these beta versions allow native Windows and Mac OS software to be containerized and deployed in a similar manner [7]. Docker containers therefore provide a convenient and light method for deploying open source workflows on multiple platforms.

GUIdock-X11
Although Docker provides a container with the original software environment, the host system, where the container software is executed, is responsible for rendering graphics. Our previous work, GUIdock-X11 [3], is one of the solutions in bridging the graphical information from user and Docker containers by using the X11 common graphic interface. GUIdock-X11 passes the container X11 commands to a host X11 client, which renders the GUI. Security is handled by encrypting the commands through secure shell (ssh) tunneling. We demonstrated the use of GUIdock-X11 [3] for systems biology applications, including Bioconductor packages written in R, C++, and Fortran, as well as Cytoscape, a standalone Java-based application with a graphical user interface. Neither Windows nor Mac OS uses X11 natively to render their graphics. Additional software such as MobaXterm [8] or socat [9] is needed to emulate X11 and locally render the graphics commands exported by the Docker container. However, a major advantage of the X11 method is that the commands to render the graphics and not the graphics themselves are transmitted, potentially reducing the total bandwidth required. Table 1 summarizes the differences between GUIdock-VNC and our previous work, GUIdock-X11.

Case study: inference of gene networks
The inference of gene networks is a fundamental challenge in systems biology. We use gene network inference as a case study to demonstrate that GUIdock-X11 and GUIdock-VNC can be used to yield reproducible results from bioinformatics workflows. We have previously developed inference methods  In the proposed architecture, each container is a self-contained web server that can be accessed using a single port. When deployed on the cloud, the services can be accessed using the cloud provider's network address translation (NAT) mechanism. Each container is also capable of OAuth2 authentication that can be enabled while deploying the application. Once enabled, the user will be required to sign in through an identity provider (such as Google in the current prototype). After the authentication, the user browser will be automatically redirected to the application.
using a regression-based framework, in which we searched for candidate regulators (i.e., parent nodes) for each target gene [10][11][12]. Our methods are implemented in R, C++, and Fortran, and the implementation is available as a Bioconductor package called networkBMA (http://bioconductor.org/packages/ release/bioc/html/networkBMA.html) [13]. In order to visualize the resulting gene networks, we previously developed a Cytoscape app called CyNetworkBMA (http://apps.cytoscape.org/ apps/cynetworkbma) [14]. Cytoscape is a Java-based stand-alone application with a GUI to analyze and visualize graphs and networks [15][16][17]. Our app, CyNetworkBMA [14], integrates our networkBMA Bioconductor package into Cytoscape, allowing the user to directly visualize the resulting gene networks inferred from networkBMA using the Cytoscape utilities. The integration of multiple pieces of software, each with its own software dependencies, makes CyNetworkBMA an ideal proof-of-concept application for the illustration of the utility of GUIdock-VNC. Fig. 1 shows an overview of GUIdock-VNC.

Virtual Network Computing
VNC is a framebuffer-based protocol that was written to view and control a remote desktop over the internet [4]. VNC is essentially a server program that attaches to a display server like X11 and creates a proxy between the client and the display server. The proxy server takes in input from the client and relays it to the display server while at the same time the display server sends pre-rendered display images to the client. VNC is thus a network intensive protocol, although the amount of data transferred back and forth can be reduced by using various compression technologies on the transfer layer. In our implementation, we use Xvnc/Xvfb (X virtual frame buffer) [18]   to act as an interface for the container. The first web service is the noVNC interface connected to the VNC server. The noVNC server is a Python plus JavaScript application framework used for establishing WebSocket for VNC packet interchange. The second web service is an optional broker service that helps in exchanging data through a datastore interface (Mongodb) and a message-passing queue (RabbitMQ).
built in. An additional layer on the host is required for the VNC server to communicate through WebSocket. We use nginx (https:/www.nginx.com/), a fast and light reverse-proxy web server, for this purpose.

Authentication methods: Oauth2
For containers that are not deployed locally, i.e., on a network or cloud, security is a concern as the traffic between the viewer and server can be seen by anyone with access to the network. Oauth2 [6] is an authentication method that is commonly used by major corporations to validate third-party applications. Specifically, Oauth allows users to log onto third-party websites using their existing Google, Twitter, or Facebook accounts, thus avoiding the creation of additional accounts for the thirdparty websites. In the present era, where it is extremely difficult to host identity services and secure communication, public authentication services like Oauth2 plays an important role. Providers like Google, Facebook, and LinkedIn can be used to validate any user registered with an email at one of these websites. In our prototype, we have created a prototype for the Google identity server. Each container can be forced to login through one of these public providers.

Automatic conversion of base Docker images
To add the noVNC graphical desktop to a Docker image, the converted image requires a web server, a headless display server running inside the container, and a VNC server (see Fig. 2). The web server is required to serve the JavaScript-based noVNC client, and the headless display server routes all drawing instructions to the VNC server; they are then sent to the client running the browser-based noVNC JavaScript client. Due to the three active components running inside the container, we generate a bash script to work as the entry point for the container. Therefore, in order to assist users and to let them start application-dependent services, we have created a tool to bootstrap standard images with noVNC capability. The tool accepts a JSON script as input with defined parameters for files to be copied, additional software to be installed (using apt-get), and the location of the startup script, which is then tied into the entry point. Given that the interaction to the container is made through a browser, any device with a browser supporting HTML5 will have consistent user experience and results, even on mobile devices such as Android and iOS devices. In the case of mobile devices, the Docker container runs on remote machines (e.g., on a virtual machine instance on cloud).
The tool is extensible and can be used to define all Dockerfile parameters as standard run commands.

Applications
We illustrated the utility of GUIdock-VNC in a proof-of-concept case study of gene network inference. Specifically, we applied GUIdock-VNC to a RNA-seq dataset consisting of 675 human cancer cell lines [19]. We downloaded the variance-stabilized version of the normalized RNA-seq data and extracted a subset of 84 genes that belong to 21 cancer-related pathways (see Supplementary Table 12 in Klijn et al. [19]). We applied the ScanBMA [12] gene network inference algorithm as implemented in the CyNetworkBMA app from within the GUIdock-VNC container.
We show that we get identical results after deploying the package across different browsers on different operating systems. Fig. 3 shows screenshots of using (a) Internet Explorer on Windows 8.1, (b) Google Chrome on Ubuntu Linux, (c) Safari on Mac OS, and (d) Google Chrome on Android. To summarize, we demonstrate the reproducibility of analytical results when GUIdock-VNC is deployed on different browsers and different operating systems.

Benchmarking computational efficiency
Since we have added extra services to the container, it is essential to investigate the performance overhead introduced by these additional services and the container. We conducted an extensive empirical study on comparing performance over different platforms with different hypervisors using GUIdock-X11 and GUI-VNC. As a baseline, we compared the performance of running GUIdock-X11 and GUIdock-VNC on Docker containers to running the CyNetworkBMA app natively. In addition, we also compared our results to running the CyNetworkBMA via a virtual machine (VM). We tested the performance of each of these four options (native, GUIdock-X11, GUIdock-VNC, VM) on the Linux, Macintosh, and Windows operating systems.
In our benchmarking experiments, we used the time series data of the first network from the DREAM 4 crowdsourcing challenge [20,21]. This simulated dataset consists of 100 genes across 21 time points. In order to account for variability in our empirical experiments, we repeated each configuration, i.e., each (OS, option) pair four times. These replicated experiments are represented by "RUN1," "RUN2," "RUN3," "RUN4" in Table 2. In addition, we added warm-up runs to ensure steady-state execution time. Table 2 shows a consistent minimal overhead of running the proposed container, which is only marginally higher than running the application natively. In particular, we computed the ratio of the average execution time over the four runs to the "native" baseline execution time. Fig. 4 shows the ratio of the average execution time of each of GUIdock-X11, GUIdock-VNC, and VM to the baseline "native" on the Linux, Mac OS, and Windows operating systems. On the Linux, Mac OS, and Windows operating systems, we observed comparable execution time for both GUIdock-X11 and GUIdock-VNC. "Native" means running the CyNetworkBMA app natively on the corresponding OS. "VM" means running the CyNetworkBMA app from a virtual machine on the corresponding OS. The column "average" is the average execution time over the four runs. The column "ratio" is the ratio of the average running time to the "native" baseline. The first value for each platform is the ratio of the native execution runtime to itself, and therefore is always equal to 1. The remaining three values correspond to the ratio of the average execution time for GUIdock-X11, GUIdock-VNC, and VM, respectively, to the baseline "native."

Discussion
We present a container tool called GUIdock-VNC that uses a graphical desktop sharing system to provide a browser-based interface for containerized software. The merits of GUdock-VNC are summarized in the following sections.

No installation on client side.
Our proposed container GUIdock-VNC is a self-contained display server with user application and an HTML5-based VNC client, which can be accessed using the web browser. Therefore, any HTML5-capable browser is sufficient on the client side. Almost all modern popular browsers such as Mozilla Firefox, Webkit (Apple Safari and Google Chrome), and Microsoft Edge support HTML5 extensions.

Mobile capable.
Since HTML5 extensions that are required to access noVNC are also supported on mobile versions on modern web browsers, our GUIdock-VNC solution is also accessible through mobile devices such as phones and tablets. The container is a stateful solution, such that it preserves session information even after a disconnected user session. Therefore, researchers can access the container on the go and subsequently continue working on the workstations in the labs.

Cloud integration.
There is a self-contained web server inside the GUIdock-VNC container that is required to access the applications. Our GUIdock-VNC container can be hosted on the cloud by using a reverse proxy or simple NAT rule to pass on any incoming request to the container. We tested the solution with major cloud vendors such as Amazon AWS, Google Cloud, and Microsoft Azure, and the NAT forwarding works without any network interference.

Security using OAuth2.
To ensure secure access to the container, we are using OAuth2 from Google (replaceable by any other identity provider such as LinkedIn, Facebook, Twitter, etc.) to authenticate users by ensuring that the identity registered with the container is owned by the user requesting container access. This can be done by providing email ID, provider user ID and secret password as parameters to the container. If these parameters are provided while initiating the container, the broker will redirect the user to the identity provider to verify the email address. Once authenticated, the identity provider redirects the user to the container.