Special Topic: High Performance Computing
High-performance computing environment: a review of twenty years of experiments in China

A high-performance computing environment, also known as a supercomputing environment, e-Science environment, or cyberinfrastructure, is a crucial system that connects users' applications to supercomputers and provides usability, efficiency, sharing, and collaboration capabilities. This review presents important lessons drawn from China's nationwide efforts to build and use a high-performance computing environment over the past 20 years (1995–2015), including three observations and two open problems. We present evidence that such an environment helps grow China's nationwide supercomputing ecosystem by orders of magnitude, and that a loosely coupled architecture accommodates diversity. An important open problem is why technology for global networked supercomputing has not yet become as widespread as the Internet or the Web. In the next 20 years, high-performance computing environments will need to provide zettaflops computing capability and 10 000 times better energy efficiency, and to support seamless human-cyber-physical ternary computing.


INTRODUCTION
High-performance computing (HPC), also called supercomputing, has become an essential tool for science and engineering. A high-performance computing environment (HPCE) is a system that connects users' applications to supercomputers belonging to multiple institutions. An advanced HPCE connects scientific computing users, data, application software, middleware, and supercomputer centers, and integrates them into a single research environment. It is a crucial system that turns supercomputing resources into a nationwide, sometimes even worldwide, productive environment by providing integration, usability, efficiency, sharing, and collaboration capabilities. An HPCE is also called an e-Science environment, especially in Europe. It is synonymous with the term 'cyberinfrastructure' used by the US National Science Foundation (NSF). Another frequently used name is computational grid, or computing grid.
An example of a worldwide HPCE is the Worldwide LHC Computing Grid (WLCG) in high-energy physics. Its mission is 'to provide global computing resources to store, distribute and analyse the ∼30 Petabytes of data annually generated by the Large Hadron Collider (LHC) at CERN' [1,2]. It integrates resources from over 170 supercomputing centers in 40 countries and handles over 1 million computational jobs per day for 1700 scientist users. The WLCG played an important role in the discovery of the Higgs boson (Nobel Prize in Physics, 2013).
REVIEW Xu et al.
China initiated activities to build a nationwide HPCE in 1995. Over the past 20 years, despite numerous difficulties, a set of coherent nationwide efforts persisted to enable the growth of the supercomputing field in China. Coherence in this instance means that these efforts were aimed, more or less, at a common goal. Drawing on these 20 years of experiments, this review presents important lessons related to research agenda, technology choice, and science policy. These lessons are organized into three observations and two open problems:
(i) Observation 1 (research agenda): HPCE helps to grow China's supercomputing ecosystem by orders of magnitude. A persistent research agenda for a nationwide HPCE, instead of ad hoc, isolated projects to develop supercomputers and their applications, is crucial for building a healthy supercomputing ecosystem. We present concrete evidence for this observation.
(ii) Observation 2 (technology): Loose coupling accommodates diversity. In scientific computing, there is diversity in science drivers, technology advances, management models, and user preferences. This diversity is inevitable, and an HPCE has to cope with it. An important technology design decision is that the HPCE technology stack should be loosely coupled to provide flexibility.
(iii) Observation 3 (policy): International cooperation is essential, even for a nationwide HPCE such as CNGrid.
(iv) Open Problem 1 (policy): What is a feasible institution to enable a sustainable HPCE?
(v) Open Problem 2 (technology): The worldwide scientific community benefits from a global Internet and a global Web, but why is a global HPCE not yet widespread?
HPC will become increasingly important in the coming decades. We offer perspectives on HPCE for the next 20 years (2015-2035), focusing on three questions: Will there be zettaflops system capability? Can we achieve an energy efficiency of 10 trillion computational operations per joule (10 TOPJ)? Can we build, by 2035, a seamless environment for human-cyber-physical ternary computing?

Two approaches for supercomputing development
When setting the country's supercomputing research agenda, China followed two approaches in the 39 years between 1976 and 2015. (The year 1976 was significant worldwide because the Cray-1 was announced, heralding the modern supercomputing era. It was significant in China because the Cultural Revolution ended in 1976, and China began to re-emphasize science and technology development.) We call them the machine approach and the environment approach. The machine approach was the main practice in China before 1995. It was ad hoc to some extent, featuring isolated projects for building supercomputers and their applications. This approach was flexible but lacked a coherent long-term direction for the nation's supercomputing field. After 19 years (1976-1995), China's supercomputing field had grown significantly, but it still lagged behind the world level (see Fig. 1 and Table 1). For instance, by 1995 the peak speed of a single world-class supercomputer had already exceeded 235 gigaflops [9]. More significantly, the Information Wide Area Year (I-WAY) experiment, demonstrated at the International Conference on Supercomputing in 1995, provided a wide-area visual supercomputing environment (SCE) by interconnecting 17 sites [10], and heralded grid computing [11]. By contrast, China had only five disconnected public supercomputer centers nationwide (not counting private supercomputer centers within companies), with a total peak speed of approximately 22 gigaflops. These systems mostly ran C/Fortran programs with little parallelism. Only a few scientific papers using computational results from these supercomputers were published in peer-reviewed journals or presented at international conferences. There were many external reasons for this lag, such as a severe lack of funding and human resources. However, the ad hoc, supercomputer-centric machine approach also had disadvantages of its own.
Since 1995, a more systematic approach has gradually been adopted, both in China and worldwide. The research agenda became to build, upgrade, and maintain a nationwide HPCE, with the development of supercomputers and applications as part of its objectives. An HPCE also needs to connect and integrate supercomputer centers, scientific data, and middleware, and to offer services to a growing number of users. Figure 1 and Table 1 show that after 1995, HPC development accelerated significantly, both in China and worldwide. Although many factors contributed, we argue that, at least for China, an HPCE helps to grow the supercomputing ecosystem and that the environment approach has advantages. Measured by speed on the Linpack benchmark [9] (floating-point operations per second, or flops), the annual growth rate before 1995 was 40% in China and 47% worldwide. After 1995, it was 136% in China and 84% worldwide.
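These rates compound annually, so small differences in the rate become large differences over two decades. The sketch below is our own arithmetic, not data taken from Fig. 1 or Table 1, and the helper names (growth_factor, annual_rate, doubling_time) are hypothetical. It converts between an annual growth rate and a cumulative growth factor, and turns a rate into a doubling time.

```python
import math


def growth_factor(annual_rate: float, years: int) -> float:
    """Cumulative multiplier after compounding annual_rate (0.84 means 84%) for years years."""
    return (1.0 + annual_rate) ** years


def annual_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes start to end in years years."""
    return (end / start) ** (1.0 / years) - 1.0


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant compound annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    # Linpack growth rates quoted in the text, before and after 1995.
    rates = [("China, before 1995", 0.40), ("World, before 1995", 0.47),
             ("China, after 1995", 1.36), ("World, after 1995", 0.84)]
    for label, rate in rates:
        print(f"{label}: doubles every {doubling_time(rate):.2f} years")
```

Read this way, 40% a year doubles speed roughly every two years, while 136% a year doubles it in under ten months.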

HPCE helps to form a core community: CNGrid
Table 2 shows something more significant than the growth in raw computing power: China's HPC users and scientific outcomes also increased by two orders of magnitude in the 20 years from 1995 to 2015.
The main reason that the environment approach is more effective than the machine approach is that the former enables the formation of a nationwide supercomputing community and, consequently, a growing supercomputing ecosystem. This community has two circles. At its core is CNGrid, funded by China's Ministry of Science and Technology, with matching funding from the Chinese Academy of Sciences (CAS), the Ministry of Education, the National Natural Science Foundation of China, and local governments. The outer circle, called CNGrid+, is a larger community consisting of SCEs within companies (both private and state-owned), universities, research institutes, and regional supercomputing centers.
Community formation is very important for the development of the national supercomputing field. Community building takes time, and it should not exclude innovation, dissenting voices, and diversity, which are important for science and technology development. Nevertheless, the advantages of forming a community outweigh its disadvantages. We list three main benefits, a common direction, a common platform, and a common school, with evidence provided in Table 2.
A common direction: A community eases consensus building, especially in establishing a long-term direction. By contrast, without a community, efforts may resemble Brownian motion: many uncoordinated movements that largely cancel one another out. The community's efforts can coalesce into a clear, accessible articulation that gains the support of the public and decision makers, resulting in executable national policies. An example is China's 'National Planning Framework for Mid- and Long-Term Science and Technology Development (2006-2020)', in which petaflops supercomputing was chosen as one of the 62 national priority directions [12]. This choice was made in 2005 after two years of intensive deliberation and consultation among thousands of scientists from all fields of science and technology.
A common school: A community built around an HPCE provides a hands-on platform to train young people. As shown in Table 2, the numbers of power users and HPC-related PhD awardees have steadily increased.
Table 2 details the growth of China's core HPCE and the CNGrid community over the past 20 years. The mode of HPCE evolved from nonexistence (ad hoc, isolated supercomputer centers), to interconnected but unmanaged HPC centers, to a nationwide environment providing grid services with a single-system image, and finally to general-purpose grid services supporting multiple science domains. The number of HPC centers, also called HPCE sites, increased from 5 in 1995 to 17 in 2015. Supercomputer architecture converged to Linux clusters. The middleware for managing the HPCE evolved from nonexistence, to rudimentary National High-Performance Computing Environment software, to the Web services-based CNGrid GOS [8,13], to today's simplified, more lightweight SCE. Total computing speed and total storage capacity increased by factors of about 5 million and 0.4 million, respectively. The number of applications grew from around 100 small Fortran and C programs to thousands of Fortran/C/Java/Python programs, including hundreds of large programs. In addition to user-developed software, applications include both commercial and open source software packages. Application software for over 10 application domains has been developed and deployed, ranging from a digital observatory for astronomy, gene sequencing, and climate computing to automobile simulation. The annual number of published peer-reviewed scientific papers supported by the HPCE grew from fewer than 10 in 1995 to hundreds in 2015.

LOOSE COUPLING ACCOMMODATES DIVERSITY
Technology must provide for usage diversity
Over the past 20 years, HPC users in China have used supercomputing resources in multiple, different ways, and this trend is continuing. This diversity seems inevitable, and technology has to cope with it. In fact, Richard Karp recently highlighted that this diversity may be fundamental from a scientist user's viewpoint. He summarized four generations of the relationship between computing and the sciences [14,15], shown in Table 3. An HPCE needs to support all of them.
From a technology perspective, the 20 years of experiments and experience in developing and using HPCE in China reveal four ways of performing supercomputing, and thus four types of technology needs, as shown in Fig. 2. They are organized along two dimensions. The horizontal dimension concerns whether users' applications are executed on a single site or on multiple sites, where a 'site' is another name for a supercomputing center. The vertical dimension concerns control, that is, whether the system is owned, and thus managed, by one institution (centralized) or by multiple institutions (decentralized). In more technical terms, a decentralized system has multiple administrative domains, and a centralized system has one administrative domain.
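The two dimensions form a 2 x 2 grid, which can be sketched as a simple classification. This is only an illustration of Fig. 2 as we read it: the Mode enum and the classify function are hypothetical names, and the example attached to each quadrant is the one named in the surrounding text. A two-valued classification cannot capture systems such as the WLCG, which the text describes as having both decentralized and centralized features.

```python
from enum import Enum


class Mode(Enum):
    # Quadrant labels use the examples named in the text.
    CENTRALIZED_SINGLE_SITE = "traditional supercomputer center"
    DECENTRALIZED_SINGLE_SITE = "co-hosted data center"
    CENTRALIZED_MULTISITE = "cloud computing system (e.g., Amazon EC2)"
    DECENTRALIZED_MULTISITE = "Internet and World Wide Web"


def classify(num_sites: int, num_admin_domains: int) -> Mode:
    """Place a system in the grid: execution sites (horizontal) x administrative domains (vertical)."""
    if num_sites < 1 or num_admin_domains < 1:
        raise ValueError("a system needs at least one site and one administrative domain")
    multisite = num_sites > 1
    decentralized = num_admin_domains > 1
    if multisite:
        return Mode.DECENTRALIZED_MULTISITE if decentralized else Mode.CENTRALIZED_MULTISITE
    return Mode.DECENTRALIZED_SINGLE_SITE if decentralized else Mode.CENTRALIZED_SINGLE_SITE
```

For example, classify(1, 1) yields the traditional supercomputer center, and a system spanning many sites under one owner yields the cloud quadrant.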
The most familiar type is a traditional supercomputer center, which is a centralized, single-site system. Users apply to the site administrator for an account to use resources, which include the account itself, a home directory, job queues, a work directory (scratch space), software, data, and various resource quotas. Even today, many HPC users in China prefer this usage mode. Decentralized single-site systems are mostly used in industry. A typical example is a co-hosted data center, where resources are allocated to different institutions under long-term contracts. The institutions manage and operate their own portions of the resources.
The most popular centralized multisite systems in industry are cloud computing systems, such as the Amazon EC2 cloud [16]. Such a system is owned and managed by a single institution, such as Amazon. Users can rent computing resources, such as a cluster of virtual machine instances and S3 storage space. An application in such a centralized HPC system may execute on multiple sites or within a single site. Such HPC clouds have become increasingly popular in recent years because they usually have a viable business model and are easier to use and manage than a federation of resources from multiple institutions. High-speed interconnects, such as InfiniBand, and accelerators, such as GPUs, are added to HPC clouds to provide higher performance.
The most popular decentralized multisite systems are the Internet and the World Wide Web (WWW). The WLCG is a multisite system consisting of over 170 supercomputing centers in 40 countries. However, the WLCG has both decentralized and centralized features. It is decentralized in the sense that its supercomputing centers are owned and managed by more than 100 institutions, which volunteer to contribute their resources to the common research agenda in high-energy physics. It is centralized in the sense that these resources, once contributed, are controlled and administered in a common way, with a single-system image. For instance, the 1700 scientist users share the same global user account scheme, computational jobs are scheduled centrally, and petabytes of data are managed in one data space. A cross-site middleware platform supported by the European Middleware Initiative provides single-system image functionality and central management [17].
The Supercomputing Center of the Chinese Academy of Sciences (SCCAS) has developed a RESTful web interface called SCEAPI to enable access to various resources such as computing queues, applications, and data.SCEAPI allows scientists to securely connect to non-WLCG computers to analyze data for ATLAS experiments at CERN without a complicated WLCG middleware setup.
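The REST style behind an interface such as SCEAPI addresses each resource (a queue, a job, a data set) by a URL and manipulates it with standard HTTP methods. SCEAPI's actual paths, authentication scheme, and payload format are not given here, so everything below is a hypothetical placeholder: the base URL, the /queues and /jobs paths, the bearer-token header, and the JSON field names. The sketch only builds requests with Python's standard urllib and does not send them.

```python
import json
import urllib.request

# Hypothetical endpoint; not the real SCEAPI address.
BASE_URL = "https://sceapi.example.org/api"


def list_queues_request(token: str) -> urllib.request.Request:
    """GET the collection of computing queues visible to this user (hypothetical path)."""
    return urllib.request.Request(
        f"{BASE_URL}/queues",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def submit_job_request(token: str, app: str, queue: str, cores: int) -> urllib.request.Request:
    """POST a new job resource; the JSON field names are illustrative assumptions."""
    body = json.dumps({"application": app, "queue": queue, "cores": cores}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/jobs",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )

# Sending is left to the caller, e.g. urllib.request.urlopen(req) against a real service.
```

The point of the design is that a client needs only an HTTP library and a token, with no grid middleware installed locally, which matches the text's claim that SCEAPI avoids a complicated middleware setup.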
By common understanding, a nationwide HPCE needs to interconnect and manage users, data, and programs hosted in multiple geographically distributed supercomputer centers. However, many users still prefer the single-site usage modes. Thus, the HPCE in China is designed to enable all four modes in Fig. 2. Table 4 lists the main generations of progressively more sophisticated technology. Note that older technology is not necessarily worse technology for a user. In addition, these four generations are not cumulative; that is, later technology does not necessarily include earlier technology, and the generations reflect different user preferences. In fact, all four generations are needed and in use today.
HPCE 0 is characterized by isolated sites. Some sites are not even on the Internet, or are blocked by firewalls. The HPCE 0.1 mode only allows users to enter a supercomputer site physically to conduct on-site computing. The HPCE 0.2 mode allows users to access resources at a supercomputer site remotely from a client machine, through protocols such as SSH and SFTP. A user may access multiple sites from the same client device, but the sites themselves are not interconnected. For security reasons, some sites even require users to log in with a dynamic passcode token.
HPCE 1 features interconnected sites. The I-WAY project in 1995 was probably the first interconnected HPC environment, or HPCE 1.1. If there are n sites, a user needs to submit n resource applications for n user accounts, one for each site. The user can then develop multisite resource sharing and collaboration capabilities at the application level, but with no platform support. In HPCE 1.2, some platform support is provided through middleware for cross-site resource sharing and collaboration. A commonly used middleware component is GridFTP [18] for intersite data transfer. Furthermore, a user only needs to submit one resource application to the environment, covering the desired resources from all sites.
HPCE 2 features federated sites. Federation in this instance means that each site sets aside some resources to be managed by the environment. An important feature is a global, environment-level user account system, which enables a user to use resources at all sites. Some type of single-system image is also available. Although the sites still belong to different institutions or organizations, a virtual organization is set up to control and manage the environment's resources. In HPCE 2.1, the environment is designed mainly for one application domain. A good example is the WLCG for high-energy physics. In HPCE 2.2, the environment needs to support multiple application domains. For example, the CNGrid environment today is designed to support three communities, in addition to general-purpose scientific and engineering computing. The three application domains are drug discovery, movie rendering, and industrial simulation.
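The generation labels can be read as a decision procedure over a few observable features. The sketch below encodes our reading of the text and nothing more: the Environment fields, the decision order (most distinctive feature first), and the function names are hypothetical, and HPCE 3.0 is omitted because the text treats it as a future trend. It also captures one practical difference the text stresses: how many resource applications a user must file.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    interconnected: bool         # are the sites networked with one another?
    remote_access: bool          # can users reach a site from a client (e.g., SSH/SFTP)?
    cross_site_middleware: bool  # e.g., GridFTP transfer plus a single resource application
    global_accounts: bool        # federation: one environment-level account valid at all sites
    application_domains: int     # number of application domains the environment targets


# Generations in which one application to the environment covers all sites.
SINGLE_APPLICATION = {"HPCE 1.2", "HPCE 2.1", "HPCE 2.2"}


def generation(env: Environment) -> str:
    """Label an environment with an HPCE generation, checking the most distinctive feature first."""
    if env.global_accounts:
        return "HPCE 2.2" if env.application_domains > 1 else "HPCE 2.1"
    if env.interconnected:
        return "HPCE 1.2" if env.cross_site_middleware else "HPCE 1.1"
    return "HPCE 0.2" if env.remote_access else "HPCE 0.1"


def resource_applications(env: Environment, num_sites: int) -> int:
    """Number of resource applications a user files to use num_sites sites."""
    # Before HPCE 1.2 the user applies to each site separately.
    return 1 if generation(env) in SINGLE_APPLICATION else num_sites
```

Under this reading, an I-WAY-like environment over 17 sites requires 17 applications, while a CNGrid-like federated environment requires one.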
With HPCE 3.0, the cyberinfrastructure is no longer just for cyberspace. It is extended to human society and the physical world for human-cyber-physical ternary computing. This is a future trend.

Loosely coupled architecture demonstrates flexibility
An HPCE should support general-purpose scientific and engineering computing, in addition to domain-specific systems, over multiple sites belonging to different organizations. This is a challenging task. Over the course of China's HPCE development, four technical guidelines have emerged. The central concept is a loosely coupled system architecture; that is, the HPCE system architecture needs to allow multiple technology stacks to run side by side.
Guideline 1: Users are different, sites are different, and communities are different. An HPCE needs to recognize and respect these differences. It is difficult or even impractical to force all stakeholders to follow a single management policy or a single usage mode.
Guideline 2: The HPCE needs to provide site-level command line, environment-level command line, and Web portal interfaces. Many users only want to run applications and are not concerned about optimized performance. They are often satisfied with an environment-level Web portal interface. Power users are concerned about performance and efficiency, and are willing to take the time to learn system details. They need command line interfaces at both the environment and site levels.
Guideline 3: HPCE middleware is there to help, not impede.The HPCE system architecture must allow users and sites to bypass HPCE middleware when developing and executing applications.The HPCE can provide information and knowledge, not just management.
Guideline 4: The HPCE consists of two circles: at the core is CNGrid itself, and the outer circle is CNGrid+. CNGrid is managed by common middleware but also allows users to access a site directly. CNGrid+ has additional users and sites that can be managed differently but share information and knowledge with CNGrid. They can even leave CNGrid later to form a subcommunity of a nationwide HPCE.
By 2015, the CNGrid HPCE had 17 sites and over 3000 users, 40% of whom accessed resources through the CNGrid SCE middleware. CNGrid+ spun off many private HPC environments. They began as members of CNGrid, developed their networked computing knowledge base, and then left CNGrid to form their own HPC environments. For instance, Beijing Genomics Institute (BGI) was a member of CNGrid in 2000-2008. BGI used resources in CNGrid to quickly become a genome 'sequencing superpower' [19], and to develop its rice genomics information service system, BGI-RIS [20]. BGI is now one of the leading genome sequencing institutions in the world. Another example is China Aviation Corp (AVIC), which developed its HPCE prototype [21] as a project in CNGrid, but later ran the resulting HPCE production system as an intracompany computational grid. These HPCEs became private mainly because their computations involved proprietary software and data, or private information.
Loosely coupled architecture also enables innovations, sometimes extending beyond the boundary of traditional HPC into emerging areas such as cloud computing and big data computing. For instance, building on its HPCE experience, Tsinghua University's HPC site now offers all students and faculty members a cloud account with storage space and computing power. The CNGrid GOS team developed open source software for big data computing, such as CCIndex [22], RCFile [23], and DataMPI [24,25].
Although loosely coupled architecture allows users to bypass HPCE middleware, implementing
Figure 3 illustrates the 'environment abstraction' benefit, that is, an HPCE user can see and use an abstract collection of resources from all sites through a familiar interface, although the computations actually execute at the sites.
The CNGrid HPCE offers each user two types of accounts: an environment account and a set of site accounts, one for each granted site. Through the environment account, the user sees a uniform user interface that hides site heterogeneity and provides site location transparency. An HPCE user can access resources through either the Web portal or the command line, with the same environment account name and password.
The HPCE user sees an abstract collection of permitted job queues, software (encapsulated as services), and storage spaces across all participating clusters of the sites. A special storage space is set aside to hold all users' environment accounts and home directories. HPCE users execute the same operating steps or commands as they would on a single cluster, but the resources used can be at any site. For example, the HPCE command line 'bsub -n 128 -q normal vasp' submits a job that uses 128 cores and the VASP software to the queue 'normal', even though the executing clusters may have different job management systems or VASP environments and commands.
When any cluster in the HPCE is down, users can transparently use other clusters without applying for another user account or reading new cluster manuals. In September 2010, one cluster was down for almost a month because of damage to its water cooling system. Thanks to the HPCE, users were able to use other clusters, and most of their jobs were executed without interruption.
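One way to picture the environment abstraction and the failover described above is a translation layer. It maps an environment-level job (cores, queue, software) onto whichever site's own job manager is available. This is a minimal sketch under stated assumptions, not CNGrid's actual SCE middleware: the Site and Job classes, the site names, the queue and executable mappings (such as vasp_std), and the first-available selection policy are all hypothetical. The scheduler options used are real: LSF's bsub -n and -q, and Slurm's sbatch --ntasks, --partition, and --wrap. The sketch only builds command lines and does not execute them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Site:
    name: str
    scheduler: str            # local job manager; only "lsf" and "slurm" are handled here
    queues: dict[str, str]    # environment queue name -> local queue/partition name
    software: dict[str, str]  # environment software name -> local executable name
    up: bool = True           # False while the cluster is down


@dataclass
class Job:
    cores: int
    queue: str       # environment-level queue name, e.g. "normal"
    software: str    # environment-level software name, e.g. "vasp"


def site_command(site: Site, job: Job) -> list[str]:
    """Translate an environment-level job into the site's own submit command (argv list)."""
    queue = site.queues[job.queue]
    exe = site.software[job.software]
    if site.scheduler == "lsf":
        # LSF: -n sets the number of tasks (cores), -q the queue.
        return ["bsub", "-n", str(job.cores), "-q", queue, exe]
    if site.scheduler == "slurm":
        # Slurm: --ntasks and --partition; --wrap turns a command string into a batch script.
        return ["sbatch", f"--ntasks={job.cores}", f"--partition={queue}", f"--wrap={exe}"]
    raise ValueError(f"unsupported scheduler: {site.scheduler}")


def submit(sites: list[Site], job: Job) -> tuple[str, list[str]]:
    """Return (site name, command) for the first running site offering the queue and software."""
    for site in sites:
        if site.up and job.queue in site.queues and job.software in site.software:
            return site.name, site_command(site, job)
    raise RuntimeError("no available site can run this job")
```

With a hypothetical LSF site A listed first and a Slurm site B, the job from the text maps to a bsub command at A. If A is marked down, the same request maps to an sbatch command at B without any change on the user's side, mirroring the 2010 outage. A real environment would choose sites by load, quota, and data location rather than list order.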
Figure 4 illustrates the 'site augmentation' benefit. More precisely, the benefit is 'site user augmentation': HPCE middleware can augment a traditional HPC site user with extended job queue and software resources. The site user sees an augmented site, with more job queues and software than the existing site originally offered.
The key point is that the HPCE administrator can elevate a site user to an HPCE user. The user can then remain a user of the existing site, supported by the familiar site team, while using job queues and software at other sites. A recent example occurred in 2014, when a user of the SCCAS site repeatedly complained that his jobs waited a very long time in the queue. This scientist studies the effect of aluminum on protein structure using quantum computations with the Gromacs software. The SCCAS site elevated him by providing an HPCE account, and he could then submit Gromacs jobs to other sites. He remains one of the most active users at the SCCAS site.
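Site user augmentation amounts to widening the set of queues a user can see. Before elevation, only the home site is visible; afterwards, other sites' queues are appended. The function below is a hypothetical illustration, not CNGrid code, and the site and queue names in any catalog passed to it are placeholders. Software catalogs could be merged in the same way.

```python
from __future__ import annotations


def augmented_queues(home: str, catalog: dict[str, list[str]], elevated: bool) -> list[tuple[str, str]]:
    """List the (site, queue) pairs a site user can submit to.

    A plain site user sees only the home site's queues; once elevated to an
    HPCE account, queues at the other sites are appended in site-name order.
    """
    visible = [(home, queue) for queue in catalog[home]]
    if elevated:
        visible += [
            (site, queue)
            for site, queues in sorted(catalog.items())
            if site != home
            for queue in queues
        ]
    return visible
```

Keeping the home site's queues first reflects the point in the text that an elevated user remains a user of the existing site, supported by the familiar site team.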

INTERNATIONAL COOPERATION IS ESSENTIAL
Although CNGrid is a nationwide HPCE mainly for China, we found that international cooperation is essential in three respects. First, China's HPC field is part of the international HPC community. Interactions among scientists and engineers are very important for the healthy development of the worldwide HPC field, especially when the world community joins forces and moves in the same direction.
Second, China's HPCE development has benefitted from international cooperation projects.We participated in the WLCG [1,2], the European Commission's XtreemOS project [27], and the UK's e-Science projects [28], and collaborated with US cyberinfrastructure projects such as Globus [29] and TeraGrid [3].We learned the importance of supporting application domains from the NanoHub project [30].
Third, China's HPCE development contributes to the worldwide HPC community.Hundreds of graduate students educated in the process of developing China's HPCE now work in the HPC field worldwide.Scientific data produced by China's HPCE are used by colleagues worldwide.Early examples are rice genome data [31] and rice information services [20], which have been widely used and cited.China's HPCE has also contributed open source software to the worldwide community, a partial list of which is shown in Table 5.

What is a sustainable development model for an HPCE?
Figure 1 and Tables 1 and 2 summarize how the environment approach of 1995-2015 helped to grow China's supercomputing field. China's sustainable development and scientific advance would benefit if these growth trends continued in 2015-2035. For instance, there is still a severe shortage of HPC human resources in China's IT industry today. People who understand HPC are in high demand, whether their knowledge is in HPC systems architecture, computational models, parallel algorithms, performance optimization, or coding MPI, OpenMP, or GPU application programs. Can we increase China's HPC users by two orders of magnitude, as we did in the 20 years from 1995 to 2015? That is, can we have 100 000 HPCE users by 2035? This is not an unrealistic goal, because many students in China's science and engineering schools already use GPUs in their personal computers and local servers to speed up computation.
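To put the target in perspective (our arithmetic, not a projection from Table 2), a hundredfold increase over the 20 years from 2015 to 2035 corresponds to a compound annual growth rate r of roughly 26% in the number of users:

```latex
\begin{aligned}
(1 + r)^{20} &= 100 \\
r &= 100^{1/20} - 1 = 10^{0.1} - 1 \approx 0.259
\end{aligned}
```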
There is no evidence that the growth trend will continue for the next 20 years. In particular, the environment approach is not guaranteed to be adopted in the future. In fact, we face the danger that future development of China's supercomputing field may degenerate into the isolated-site situation (HPCE 0) of 1995, reverting to the machine approach.
The problem is that we have not found a sustainable model, especially a sustainable institution model for the long-term development of the supercomputing field in China.By contrast, the space program in China has such a long-term, sustainable development model.
There is related work and practice worldwide, especially in Europe, the USA, and Japan.The USA seems to have a relatively comprehensive model.The federal government plays a leadership role, with participation from industry and local governments.Approximately every 10 years, the US federal government issues a binding strategy document.The most recent White House executive order was announced in July 2015 and set up a coordinated 20-year strategy called the National Strategic Computing Initiative (NSCI).At the agency level, an interesting example is how NSF supports HPCE, which is called cyberinfrastructure by NSF.Cyberinfrastructures are 'research environments that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services distributed over the Internet beyond the scope of a single institution' [47].Before 2004, NSF supported cyberinfrastructures by project.Between 2005 and 2012, NSF established a temporary Office of Cyberinfrastructure to perform the role.After 2013, this office became a permanent Division of Cyberinfrastructure.
China can learn from all the above initiatives, but has to find its own model to suit its situation.Two central questions are as follows: How should the central government play a leadership role in setting up a long-term national strategy?What institution should be established to conduct the strategy?

Why not a global HPCE like the Internet and WWW?
Both the Internet and the WWW were mainly created and implemented by the public research community, and both are 'beyond the scope of a single institution'. They reached massive global public use in fewer than 20 years. Today, we have a single, global Internet and a single, global Web used by billions of ordinary citizens.
If we take I-WAY [10] as the world's first HPCE, 20 years have passed since its invention. Why, then, is HPCE not yet widespread? Why is there not a single, global HPCE, or cyberinfrastructure? What makes HPCE different from the Internet and the Web? What is missing?
We have heard many anecdotal 'reasons'. However, research is greatly needed to gain a systematic, scientific understanding. We briefly discuss three common complaints below.
First, an HPCE is supposed to provide resource sharing and collaboration capabilities among multiple sites. However, resources in most HPC centers in China and worldwide are already heavily booked, leaving little or nothing available for sharing. For instance, the utilization of the CAS Supercomputing Center's machines is above 75%. Second, a common complaint is that the current HPCE technology stack is too heavy and complex, and difficult to learn, use, and operate. This was made worse by the Web services movement that started in the early 2000s, when big companies arrived and pushed Web services standards, such as WS-I [48], the simplest example; this had an overall negative effect on HPCE development. More recent HPCE software has shed much of this complexity and has opted for the simpler REST architectural style [49]. Third, the current HPCE lacks the elasticity and agility of cloud computing, with which users can expand their computing resources from 10 cores to 10 000 cores within minutes or even seconds. A cloud makes this possible because it is centralized, owned, and operated by a single institution. An HPCE is a federation of multiple sites 'beyond the scope of a single institution' [47], so such elasticity is much harder to realize.

OUTLOOK AND PERSPECTIVES
What will happen in the next 20 years? What will we see when looking back in 2035? We offer perspectives on three important issues: Will we see zettaflops system capability? Could this zettaflops capability be provided with an energy efficiency of 10 TOPJ (trillion operations per joule)? Can we build, by 2035, a seamless environment for human-cyber-physical ternary computing?
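The first two targets are linked through power. As back-of-the-envelope arithmetic (ours, not the authors' projection), assume each floating-point operation counts as one operation and consider sustained compute power only. A zettaflops system (R = 10^21 op/s) running at 10 TOPJ (η = 10^13 op/J) would then draw about 100 MW. The 10 000-fold improvement cited in the abstract corresponds to a baseline efficiency of 10^9 op/J, at which the same system would need about a terawatt:

```latex
\begin{aligned}
P &= \frac{R}{\eta} = \frac{10^{21}\ \text{op/s}}{10^{13}\ \text{op/J}} = 10^{8}\ \text{W} = 100\ \text{MW}, \\
\eta_{\text{baseline}} &= \frac{10^{13}\ \text{op/J}}{10^{4}} = 10^{9}\ \text{op/J},
\qquad \frac{10^{21}\ \text{op/s}}{10^{9}\ \text{op/J}} = 10^{12}\ \text{W}.
\end{aligned}
```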

Zettaflops computing capability by 2035
In China, the installation of 100-petaflops systems by 2016 is scheduled for the CNGrid HPCE. President Obama of the USA recently issued an executive order to create an NSCI, which will 'create systems that can apply exaflops of computing power to exabytes of data', probably around 2023. What about 2035? Will the world see zettaflops computing capability? Should China set zettaflops supercomputing on zettabytes of data as a national research priority? This is a serious question, not idle speculation. In 2012, the CAS launched a New-Generation Information and Communication Technology strategic priority project, in which a core component was cloud-sea computing systems for zettabytes of data [50]. The Chinese Academy of Engineering recently started a community consultation process to identify potential national priority research directions in engineering technology for 2035 [51]. One of the six candidates in the information technology area is zettaflops supercomputing. However, Thomas Sterling, a coauthor of the influential book Enabling Technologies for PetaFLOPS Computing, has conjectured that 'we will never reach zettaflops' [52].
We believe that zettaflops supercomputing on zettabytes of data should be a long-term national research priority, because there are scientific and societal needs for it, especially in intelligent computing with multiscale, high-dimensional data in a human-cyber-physical universe. However, this capability may best be provided by an HPCE rather than by a single supercomputer system. The systems architecture, programming model, and application frameworks could be quite different from those of today's SCEs. Research should start now.
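As a rough, back-of-the-envelope check on the zettaflops question, the Python sketch below computes the constant yearly growth factor implied by the milestones mentioned above: 100-petaflops CNGrid systems in 2016, and exaflops probably around 2023, each extrapolated to zettaflops (10^21 flops) in 2035. It assumes smooth exponential growth between milestones, which real installations do not follow, and the function names are our own illustration rather than part of any CNGrid software.

```python
import math

def required_annual_growth(start_value, end_value, years):
    """Constant yearly growth factor that takes start_value to end_value in `years`."""
    return (end_value / start_value) ** (1.0 / years)

def doubling_time(annual_factor):
    """Years needed to double at a constant yearly growth factor."""
    return math.log(2) / math.log(annual_factor)

PETA, EXA, ZETTA = 1e15, 1e18, 1e21  # flops

# Scenario A: 100-petaflops CNGrid systems (2016) extrapolated to zettaflops (2035).
factor_a = required_annual_growth(100 * PETA, ZETTA, 2035 - 2016)
# Scenario B: exaflops "probably around 2023" (per the text) to zettaflops (2035).
factor_b = required_annual_growth(EXA, ZETTA, 2035 - 2023)

for label, factor in [("100 PF (2016) -> 1 ZF (2035)", factor_a),
                      ("1 EF (2023) -> 1 ZF (2035)", factor_b)]:
    print(f"{label}: x{factor:.2f} per year, "
          f"doubling every {doubling_time(factor):.2f} years")
```

Under these assumptions, the 2016 baseline implies roughly 1.62× growth per year (doubling about every 1.4 years), and the exaflops baseline roughly 1.78× per year.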

10 000 times improvement of energy efficiency
The modern HPC era started in 1976 with the introduction of the Cray-1 supercomputer. Looking back carefully at the history of the past four decades, one may observe two major phases in HPC systems development, each lasting roughly 20 years. The first is the performance-first phase, lasting roughly from 1976 (Cray-1) to 1994. The top priority of HPC systems development in this phase was performance, or flops speed. Factors such as system cost and application scope were secondary considerations, and energy consumption was considered insignificant. The second is the scalability-first phase, spanning from 1994 (IBM SP-2) to the present. The top priority of HPC systems development in this phase is scalability, including both market scalability and systems scalability. Worldwide HPC market revenue grew from US$2 billion in 1990 to approximately US$25 billion in 2015, according to the market research firm IDC, and the application scope expanded significantly. An important architectural feature supporting this market expansion is the convergence to clusters, so that a big system can scale down to smaller systems; the resulting increase in product volume improves the performance/cost ratio. For high-end applications, a system can scale up and scale out to provide more parallelism: the number of cores per system increased from 140 in 1995 to 31 million in 2015. Now we are entering a third phase: the efficiency-first phase. In the next 20 years, supercomputing systems research needs to increase energy efficiency by 10 000 times, in addition to continuing advances in performance and scalability. The reason is illustrated in Fig. 5, which lists the speed (operations executed per second), energy efficiency (operations executed per kilowatt hour), and system power consumption (watts) of the world's fastest computers over the past 70 years. It reveals a disturbing recent trend. For 60 years, energy efficiency improved at the same rate as speed. In the past 10 years, however, this has changed, and energy efficiency improvement now lags behind speed improvement.
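The trend in Fig. 5 follows from a simple identity: system power equals speed divided by energy efficiency, so whenever speed grows faster than efficiency, power consumption rises. The Python sketch below applies this identity to the targets discussed in this section, assuming a zettaflops system (10^21 operations per second) running at the 10 TOPJ goal; it also converts that efficiency into the operations-per-kilowatt-hour unit used in Fig. 5 and computes the constant yearly improvement implied by a 10 000-fold gain over 20 years. The helper names and the growth factors in the last lines are hypothetical, chosen only for illustration.

```python
"""Back-of-the-envelope energy arithmetic for HPC systems (illustrative sketch)."""

J_PER_KWH = 3.6e6  # 1 kWh = 1000 W * 3600 s = 3.6 million joules

def power_watts(ops_per_second, ops_per_joule):
    """System power implied by speed and efficiency: P = S / E.

    Operations per joule equal operations per second per watt,
    so 10 TOPJ and 10 TOPS/W are the same figure.
    """
    return ops_per_second / ops_per_joule

def ops_per_kwh(ops_per_joule):
    """Convert efficiency from operations per joule to operations per kWh."""
    return ops_per_joule * J_PER_KWH

def annual_factor(total_improvement, years):
    """Constant yearly factor that compounds to total_improvement over `years`."""
    return total_improvement ** (1.0 / years)

def power_growth(speed_factor, efficiency_factor):
    """Factor by which power grows when speed and efficiency grow by these factors."""
    return speed_factor / efficiency_factor

ZETTAFLOPS = 1e21  # operations per second
TARGET_OPJ = 1e13  # 10 TOPJ = 10 tera operations per joule

zetta_power = power_watts(ZETTAFLOPS, TARGET_OPJ)
eff_factor = annual_factor(1e4, 20)  # 10 000x between 2015 and 2035

print(f"1 ZFLOPS at 10 TOPJ draws {zetta_power / 1e6:.0f} MW")
print(f"10 TOPJ = {ops_per_kwh(TARGET_OPJ):.2e} operations per kWh")
print(f"Required efficiency gain: x{eff_factor:.3f} per year")

# Hypothetical decade (illustration only): speed grows 1000x, efficiency only 100x.
print(f"Power growth over that decade: x{power_growth(1000, 100):.0f}")
```

Under these assumptions, a zettaflops machine at 10 TOPJ would draw about 100 MW, and the 10 000-fold goal corresponds to roughly 1.58× better energy efficiency every year.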
The research community should reverse this trend by setting a bold goal of achieving an energy efficiency of 10 tera operations per joule (10 TOPJ), or equivalently 10 tera operations per second per watt (10 TOPS/W), by 2035. Today, CPUs can deliver […] top or laptop, but from tablets, smartphones, or even sensor devices. The HPCE we see today exists mostly within cyberspace. We may see a trend to extend the HPC environment into the physical world. In fact, today's CNGrid already hosts environmental science applications, in which sensor devices are used together with the HPCE to collect and analyze data on bird migration patterns in the Qinghai Lake Nature Reserve (see Fig. 7) [56]. This is a rudimentary human-cyber-physical ternary computing scenario: sensors and wireless communication technology extend the HPCE infrastructure to the physical world, the backend supercomputer performs data analytics, and scientist users orchestrate and steer both field research and backend computation tasks. Can we have, by 2035, a seamless environment for human-cyber-physical ternary computing [57]? In such an environment, scientist users, cyberspace, and the physical world all become resources and research targets in a new type of HPCE. We can expect some form of seamless intelligence [58] to become available for scientific research by 2035. Such a seamless environment would not only enable high-end scientific research but also benefit scientific and engineering experiments for high school and university students.

Figure 3. The CNGrid architecture as seen by an environment user.

Figure 4. The CNGrid architecture as seen by a site user.

Figure 5. The trends of speed, energy efficiency, and power consumption of the world's fastest computers over the past 70 years (1945–2015).

REVIEW Xu et al.
Figure 7. A video monitoring system deployed in the core protection area of the Qinghai Lake Nature Reserve, networked to the CNGrid backend.

Table 1. Growth trends of top supercomputers in China and the world before and after 1995.

Table 3. Richard Karp's four phases of relationship between computing and sciences.

Table 4. Evolution of four generations of the HPCE.

Table 5. A partial list of open source software contributed by China HPCE.