Fortran interface layer of the framework for developing particle simulator FDPS

Numerical simulations based on particle methods have been widely used in various fields including astrophysics. To date, simulation softwares have been developed by individual researchers or research groups in each field, with a huge amount of time and effort, even though numerical algorithms used are very similar. To improve the situation, we have developed a framework, called FDPS, which enables researchers to easily develop massively parallel particle simulation codes for arbitrary particle methods. Until version 3.0, FDPS have provided API only for C++ programing language. This limitation comes from the fact that FDPS is developed using the template feature in C++, which is essential to support arbitrary data types of particle. However, there are many researchers who use Fortran to develop their codes. Thus, the previous versions of FDPS require such people to invest much time to learn C++. This is inefficient. To cope with this problem, we newly developed a Fortran interface layer in FDPS, which provides API for Fortran. In order to support arbitrary data types of particle in Fortran, we design the Fortran interface layer as follows. Based on a given derived data type in Fortran representing particle, a Python script provided by us automatically generates a library that manipulates the C++ core part of FDPS. This library is seen as a Fortran module providing API of FDPS from the Fortran side and uses C programs internally to interoperate Fortran with C++. In this way, we have overcome several technical issues when emulating `template' in Fortran. By using the Fortran interface, users can develop all parts of their codes in Fortran. We show that the overhead of the Fortran interface part is sufficiently small and a code written in Fortran shows a performance practically identical to the one written in C++.


Introduction
Numerical simulations based on particle methods have been widely used in many fields of science and engineering. For example, in astrophysics, gravitational N-body and smoothed particle hydrodynamics (SPH) simulations are commonly used to study dynamics of celestial bodies. In the context of engineering and biology, moving particle semiimplicit (MPS) method, molecular dynamics (MD), power dynamics, and distinct element method (DEM) simulations are utilized for a variety of purposes such as disaster prevention and drug discovery. The reason for the frequent use of particle-based methods may be attributed to the fact that various kinds of physical systems can be well modeled by collections of particles, and particle-based methods have several advantages (e.g., adaptivity in space and time, Galilean invariance). The numbers of particles used in simulations go on increasing because there has been a rise in the need for refining results drawn from numerical simulations. For instance, cosmological N-body simulations can be used to study the evolution of rare objects such as quasars and active galactic nuclei. In order to make a prediction for such objects with a good statistical accuracy, we need to simulate a large volume of the universe (Ishiyama et al. 2015). Large-scale simulations are also required in MD simulations of vapor-to-liquid nucleation for a direct comparison of predicted nucleation rates with those measured by laboratory experiments (Diemand et al. 2013). This trend requires researchers to develop simulation codes that run efficiently on modern distributed-memory parallel supercomputers.
However, developing such a code is quite difficult, timeconsuming task, because in order to achieve high efficiency we need to implement efficient algorithms, such as the Barnes-Hut tree algorithm (Barnes & Hut 1986), as well as some load-balancing mechanism into the code. To date, simulation codes have been developed by individual researchers or research groups in each field of science, costing a huge amount of time and effort, even though the numerical algorithms used are very similar to each other. Besides, in many cases, simulation codes are being developed for each specific application of particle methods, making it difficult for researchers to try a new method or an alternative method. This is not efficient.
In order to improve this situation, we have developed a framework called FDPS (Framework for Developing Particle Simulators; Iwasawa et al. 2015Iwasawa et al. , 2016 1 that enables researchers to develop codes for massively parallel particle simulations for arbitrary particle methods easily, without parallelizing their simulation codes by themselves. More specifically, FDPS provides a set of functions that (1) divide a computational domain into subdomains based on a given distribution of particles and assigns each subdomain to one Message Passing Interface (MPI) process, (2) exchange the information of particles among processes so that each process owns particles in its subdomain, (3) collect the information of particles necessary to perform interaction calculations in each process, and (4) perform interaction calculations using the Barnes-Hut tree algorithm for long-range force, or a fast tree-based neighbor search for short-range force. All of these functions [hereafter called FDPS API (application programming interface)] are highly optimized, and codes developed using FDPS show good scalings for a large number of processes (up to ∼10 5 processes; Iwasawa et al. 2016). Thus, FDPS allows researchers to concentrate on their studies without spending time on the parallelization and complex optimization of the codes. FDPS has already been used to develop various applications (e.g., Hosono et al. 2016aHosono et al. , 2016bHosono et al. , 2017Michikoshi & Kokubo 2017;Tanikawa et al. 2017;Iwasawa et al. 2017;Tanikawa 2018aTanikawa , 2018b. One issue affecting the previous versions of FDPS (version 2.0 or ealier) is that it requires researchers to develop codes in the C++ programming language. This limitation comes from the fact that FDPS is written in C++, in order to use the template feature in C++ to support arbitrary data types of particles. However, there are many supercomputer users who mainly use Fortran language to develop codes. In order for such people to use the functionalities of FDPS, they must first spend time learning the C++ language. In addition, programs which they have written in Fortran in the past can no longer be used in any newly-developed codes. To cope with this problem, we have redesigned FDPS to have a Fortran interface layer (hereinafter, we call it "Fortran interface"). This layer provides an API for Fortran, i.e., a set of subroutines/functions that enables the use of the functionalities of FDPS from Fortran. Thus, in FDPS version 3.0 or later, researchers can develop their simulation codes in Fortran. In this paper, we present the basic design and implementation of Fortran interface and compare the performance of a simple application written using Fortran interface with that written in C++ using FDPS. We show that the overhead of Fortran interface is sufficiently small and the performances of both codes are almost identical. This paper is organized as follows: In section 2, we briefly review the C++ core part of FDPS necessary to explain the design and implementation of Fortran interface, which are themselves described in section 3. The performance of the application developed using Fortran interface is shown in section 4. In section 5, we present one example of practical application. Finally, in section 6, we summarize this study.

Overview of FDPS
In this section, we provide a brief overview of the previous versions (version 2.0 or earlier) of FDPS. In subsection 2.1, we describe what FDPS does and what FDPS looks like for users. In subsection 2.2, we explain the implementation details of the previous versions of FDPS.

What FDPS does and its user interface design
FDPS provides C++ library functions that perform tasks required for parallel particle simulations. In distributedmemory parallel supercomputers, particle simulations are generally performed through the following procedure.
(1) The computational domain is divided into subdomains based on a given distribution of particles and each subdomain is assigned to one MPI process. We call this task "domain decomposition." (2) Particles are exchanged among processes so that each process owns particles in its subdomain. We call this task "particle exchange." (3) Each process collects the information necessary to perform interaction calculation for the particles belonging to this process. (4) Each process performs interaction calculation. (5) The information about particles is updated using the result of the interaction calculation in each process.
Among the procedures above, procedures (1)-(3) are specific for parallel computation and the implementation of them into a code is a daunting task. As described in section 1, FDPS provides APIs, a suite of functions that perform these procedures efficiently. Therefore, using FDPS, researchers can avoid the implementation of these complex procedures. In addition, FDPS also provide APIs that perform procedure (4), i.e., interaction calculation, using fast algorithms for both long-and short-range forces. Thus, users of FDPS can perform these procedures just by calling FDPS APIs. The other parts of users' codes including procedure (5) are implemented by the users. These parts do not involve parallelization and thus users of FDPS do not need to consider the parallelization of the codes explicitly. Figure 1 schematically illustrates a typical structure of a parallel particle simulation code developed using FDPS and how a user uses FDPS. The user program first defines particle data set and interaction using class and function in the C++ programming language, respectively. Thus, the user program must be written in C++. This is because, as shown in the right-hand portion of figure 1, FDPS is written in C++, for the reason given in subsection 2.2. FDPS provides APIs to perform domain decomposition, particle exchange, and calculation of interaction. They are defined as function templates in C++ (see subsection 2.2). These function templates are combined with the definitions of particle and interaction in order to instantiate functions that perform domain decomposition, particle exchange, and calculation of interaction in the user program. The instantiated functions are used in the main loop of the user program, as shown in the lower left-hand portion of figure 1. FDPS accepts arbitrary types of particle and interaction. Thus, we can use FDPS to develop any kinds of particle simulation codes.
In the next section, we will explain how the user interface design described here is realized using features in the C++ programming language.

Implementation details of FDPS
One development goal of FDPS is to make it possible for FDPS to support arbitrary types of particle and interaction.
In order to achieve this without a decrease in performance of simulation codes, FDPS is implemented using the template feature in the C++ programming language. Roughly speaking, this feature allows functions and classes to be described by the use of general data types (for detail, see descriptions in the ISO C++ standard [ISO/IEC 14882:2014]). Functions and classes described using the template feature are called function templates and class templates, respectively. All general data types used in a function template or class template are replaced by specific data types given through special arguments, called template arguments, at the compile-time of a program. Hence, there are no unknown data types at the compiletime, allowing compilers to optimize function templates aggressively. Thus, using the template feature, we can make a high-performance library for general particle types. These are the reasons why we adopted the C++ programming language to develop FDPS.
In order to make it possible for FDPS to provide the APIs that perform domain decomposition, particle exchange, and interaction calculation for arbitrary types of particle data and interaction, we adopt an internal structure for FDPS as described below. First, all functions in FDPS API are implemented as function templates to manipulate general particle types. As a result, users of FDPS must implement particle data as C++ classes and pass them to FDPS API through template arguments when using the API. Secondly, we require that particle classes defined by FDPS users must have some public member functions that have specific names and specific functionalities. This is because FDPS, for example, must know which member variable represents the position of a particle to perform domain decomposition and particle exchange. We solve this by requiring that a particle class has a public member function, named getPos, that returns the position of a particle. For similar reasons, there are other public member functions that a user-defined particle class should have. For a complete list, one can refer to the specification document for FDPS. Thirdly, we require that particle-particle interactions must be implemented as functions or functors that have a specific interface. With this, we can implement FDPS without knowing the contents of interaction functions. This requirement does not restrict types of interactions; we can still implement arbitrary types of interaction functions as long as they have the specified interface.
The FDPS API is actually provided as the public member functions of two class templates, ParticleSystem and TreeForForce, and one (non-template) class, DomainInfo. Hereinafter, they are collectively called FDPS classes. Each of them is used to make FDPS do a certain task. To be more precise, instances of FDPS classes ParticleSystem, DomainInfo, and TreeForForce are used, respectively, for particle exchange, domain decomposition, and interaction calculation.
To illustrate the usage of FDPS API, we show in figure 2 an example of a C++ code that uses FDPS. At the first line of the example, the file particle_simulator.hpp is included. This is the file where all FDPS classes are implemented. Hence, all user programs include this file. As explained above, in order to use the functionalities of FDPS, we first create instances of FDPS classes. This is done at lines 9-11 in the example, where "PS" is the name space in which FDPS classes are defined and the words separated by commnas in the angle brackets associated with FDPS classes ParticleSystem and TreeForForce are template arguments. ParticleSystem class takes FullParticle (FP) class as a template argument, where FP class is a class that contains all the information about a particle necessary to perform simulations as member variables. Tfp in the example is the user-defined FP class. In the example, TreeForForceLong class is used. It is a TreeForForce class specifically for long-range force and takes the following three particle classes as template arguments-the first one is Force class which contains the quantities of an i-particle used to store the results of the calculations of interactions as member variables; the second and third ones are EssentialParticleI (EPI) and EssentialParticleJ (EPJ) classes which contain the minimum sets of the quantities of i-and j-particles necessary to calculate interactions as member variables, where an i-particle is a particle that receives a force and a j-particle is a particle that exerts a force. In the example, Tforce, Tepi, and Tepj are the user definitions of these three classes. Hereinafter, we call the four kinds of particle classes described above user-defined types. 2 After creating the instances of FDPS classes, we can  More details about the previous version of FDPS can be found in Iwasawa et al. (2016).

Design, usage, and implementation of Fortran interface
In this section, we describe the details of our Fortran interface. In subsection 3.1, we describe the user interface design of Fortran interface. In subsection 3.2, we explain the usage of Fortran interface using a specific example of a Fortran program. In subsection 3.3, we describe the implementation details of Fortran interface. In the following, we call the part of FDPS corresponding to the previous version of FDPS the core part.

User interface design of Fortran interface
The Fortran interface to FDPS developed in this study wraps the core part of FDPS and hence has essentially the same user interface design as shown in figure 3. The user program first defines particle and interaction using derived data type and subroutine in Fortran 2003, respectively. Thus, the user program must be written in Fortran 2003. This is because, as will be explained in subsection 3.3, the Fortran interface uses features in Fortran 2003 in order to make Fortran interoperate with C++. In the definition of particles, the user must describe complementary information about particles using FDPS directives, which are Fortran comments having special format and are introduced by us (for details, see subsection 3.2). This procedure corresponds to define specific public member functions with special names in a particle class when using FDPS from C++ (see subsection 2.2). The Fortran interface to FDPS is written in a Fortran module, as shown in the right-hand portion of figure 3, and all of the FDPS API are defined as public member functions of a Fortran class for manipulating the core part of FDPS. Corresponding with the APIs in the core part of FDPS, there are APIs for domain decomposition, particle exchange, and calculation of interaction in Fortran interface. The user program calls these APIs in the main loop, as shown in the lower left-hand portion of figure 3. The Fortran interface is also designed to accept arbitrary types of particle and interaction as with the core part of FDPS. In the core part of FDPS, this is realized using the template feature in C++ as described in subsection 2.2. Fortran does not have such a feature. In the Fortran interface developed in this study, we realize it via an automatic generation of source programs of Fortran interface specifically for particle types given by users based on the information given through FDPS directives (the reason why this kind of mechanism is needed will be described in subsection 3.3). The generation of source programs of Fortran interface is done by a PYTHON script provided by us. The users of Fortran interface must run this script to use Fortran interface.
In the following subsections, we first explain the usage of Fortran interface in subsection 3.2, and then describe the implementation details of Fortran interface in subsection 3.3.

Usage of Fortran interface
In this section, we explain using a sample code how a user can write an application program using Fortran interface. The basic procedures to develop codes using Fortran interface are as follows. I. Define a particle using a Fortran-derived data type. II. Generate Fortran interface programs. III. Define an interaction using a Fortran subroutine. IV. Develop the main part of a code using the FDPS API given through Fortran interface.
Thus, the development procedures are similar to the case in which we develop codes in C++ using FDPS (see section 2). The only difference is the existence of Procedure II. There is no difficulty in this procedure because a user only has to execute the auto-generation script provided by us as stated earlier.
The sample code used is a gravitational N-body simulation code and is essentially the same as the one shown in subsection 2.2 in Iwasawa et al. (2016), except that the code shown here is written in Fortran. The behavior of the code is very simple: it sets the initial condition by reading particle data from a binary file, and then it follows the time evolution of the system by integrating Newton's equations of motion of particles with the leap-frog method. The gravitational forces acting on particles are calculated using FDPS with the tree algorithm in which the gravitational forces from distant particles are approximated as those from a superparticle with multipole moments. For simplicity, we Downloaded from https://academic.oup.com/pasj/article-abstract/70/4/70/5036218 by Kobe University Library user on 14 July 2020 use the center-of-mass approximation. In this case, the gravitational acceleration of particle i is evaluated as the sum of contributions from other particles and superparticles: (1) where r i , r j , r j are the positions of particle i, particle j, and superparticle j, respectively. m j and m j are the masses of particle j and superparticle j, respectively. i the gravitational softening of particle i, and G is the gravitational constant. Figure 4 shows the sample code, which runs either in a single-process execution case or an MPI parallel environment. The code consists of three parts: the definition of the particle (lines 7-18), the definition of the interaction functions (lines 22-88), and the other parts including numerical integration and I/O (lines 92-205). These three parts correspond to Procedures I, III, IV described at the beginning of this section. In the following, we explain each procedure in detail.

Defining particle types
Users of FDPS must first define particles as derived data types in Fortran (Procedure I). In the sample code, the particle data is defined as full_ptcl type in lines 7-18. It has the following member variables: id (particle identification number), mass (m i ), eps ( i ), pos (r i ), vel (v i ; the velocity of particle i), pot (φ i ; the gravitational potential at r i ), and acc (dv i /dt).
As we will describe in subsection 3.3, the particle data type must be interoperable with C because Fortran interface uses the C interoperability feature in Fortran 2003 to exchange data between Fortran programs and the core part of FDPS. In order to be interoperable with C, a derived data type must satisfy the following conditions. r It has the bind(c) attribute. r The data types of all member variables must be interoperable with C.
The first condition is satisfied if a derived data type is declared with the bind(c) keyword (line 7). As for the second condition, Fortran 2003 or later offers data types corresponding to primitive data types in the C language such as int, float, double, etc., through the iso_c_binding module. The second condition is fulfilled if we define member variables using these data types. integer(kind=c_long_long) is one such example. type(fdps_f64vec) type used at lines 14, 15, and 17, which is defined in module fdps_vector and represents a space vector, is also interoperable with C because their member variables are all interoperable with C.
Users of FDPS must write FDPS directives within the definition part of the particles. One can see a variety of Fortran comments which begin with !$fdps in the sample code. They are FDPS directives. FDPS directives are used to pass complementary information about particles, e.g., which member variable represents a physical quantity that FDPS must know, to FDPS. FDPS directives can be classified into the following three types.
(a)Directive specifying the type of a particle. (b)Directive specifying which member variable represents which physical quantity. (c) Directive specifying the way the data operation is performed in FDPS.
In the following, we explain each type of directive. Directive (a) is used to specify which user-defined type a derived data type corresponds to out of FP, EPI, EPJ, and Force described in subsection 2.2. A derived data type can serve multiple user-defined types. Directive (a) in this sample code is written at line 7: !$fdps FP,EPI,EPJ,Force With this, full_ptcl type can be used as any of user-defined types in the sample code.
Directive (b) is used to specify which member variable represents the physical quantity required by FDPS. In the present case, we need to specify the mass and the position of a particle (the mass is required to calculate the monopole information of superparticles). Therefore, the (b) directives are written as follows (see lines 12 and 14): real(kind=c_double) mass !$fdps charge type(fdps_f64vec) :: pos !$fdps position With these, FDPS recognizes that the variables mass and pos represent the mass and the position of a particle.
In the sample code, three directives of type (c) are written in lines 8-10 as follows. !$fdps copyFromForce full_ptcl (pot,pot) (acc,acc) !$fdps copyFromFP full_ptcl (mass,mass) (eps,eps) \ (pos,pos) !$fdps clear id=keep, mass=keep, eps=keep, \ pos=keep, vel=keep where the symbol \ is used to represent that the current line continues to the next line because of space limitations; it cannot be used in an actual code (FDPS directive must be written within a line). The first two directives containing keywords copyFromForce and copyFromFP specify how data copy is performed between different user-defined types. The former specifies the method of data copy from Force type to FP type and the latter specifies that from FP type to EPI type or EPJ type. The directive including the Downloaded from https://academic.oup.com/pasj/article-abstract/70/4/70/5036218 by Kobe University Library user on 14 July 2020 keyword clear is used to specify how to initialize Force type before starting the calculations of interactions.
The details of FDPS directives can be found in the specification document for Fortran interface in the distributed FDPS package.

Generating Fortran interface
The next step is the generation of Fortran interface programs (Procedure II), which can be done simply by executing the auto-generation script gen_ftn_if.py provided by us in the command line as follows.
./$(FDPS_LOC)/scripts/gen_ftn_if.py sample_ code.F90 where $(FDPS_LOC) represents the PATH of the top directory of the FDPS library. If the auto-generation succeeds, all the files described in sub-subsection 3.3.2 are created at the current directory.

Defining interaction functions
Users of FDPS must define interactions as subroutines in Fortran (Procedure III). In the sample code, two subroutines are defined-one for the gravitational interaction between particles (lines 22-54) and the other for the interaction between particles and superparticles (lines 56-88).
The interaction functions must also be interoperable with C because their function pointers are passed to the core part of FDPS using the C interoperability feature in Fortran 2003. To be interoperable with C, a subroutine must satisfy the following conditions. r It has the bind(c) attribute. r The data types of all variables used in the subroutine must be interoperable with C.
These conditions are the same as those for derived data types which are interoperable with C (see subsubsection 3.2.1). Hence, we can clear these requirements in the same way: first add the bind(c) keyword after the argument list of the subroutine (see lines 22 and 56), and define variables using data types which are interoperable with C.
The interaction functions must have the following arguments: (from the beginning) the array of i particles (ep_i), the number of i particles (n_ip), the array of j particles or superparticles (ep_j), the number of j particles or superparticles (n_jp), and the array of variables that stores the calculated interaction on i particles (f). Due to the specification of the core part of FDPS, the numbers of j particles and superparticles must be pass-by-value arguments. Hence, the value keyword is needed in the type declaration statements for these arguments (see lines 21 and 57). FDPS does not optimize the interaction functions. Thus, users of FDPS must optimize the interaction functions by themselves in order to achieve high efficiency. For simplicity, the interaction functions in this sample code are implemented without any optimization techniques. Some typical optimization techniques are presented in section 4.

Developing the main part of a user code
Users of FDPS must implement the main routine of a user code within a subroutine named f_main() for the reason explained in subsection 3.3 (Procedure IV). f_main() of the sample code is implemented in lines 92-122 and it consists of the following steps.
(1) Initialize FDPS (line 103). In the following, we first explain the variable declaration part of f_main(), then we explain each step in detail.
In the Fortran interface, FDPS API is provided as typebound procedures of Fortran 2003 class fdps_controller defined in the module fdps_module. This module is defined in FDPS_module.F90, one of the source programs of Fortran interface. In order to use FDPS API, we first need to make this module accessible from f_main(), and then we should create an instance of this class. These things are done at lines 93 and 102, respectively. Thus, in this sample code, FDPS API is used by calling type-bound procedures of class instance fdps_ctrl.
In step 1, API ps_initialize is called. This procedure initializes MPI and OpenMP libraries if they are used. If not, it does nothing.
In step 2, we create and initialize three instances of FDPS classes ParticleSystem, DomainInfo, and TreeForForce by calling type-bound procedures whose names contain "create" or "init", which, as the names suggest, create and initialize instances, respectively. Each of these procedures takes an integer argument (psys_num, dinfo_num, and tree_num in the sample code) because all instances of FDPS classes are identified by descriptors in integer variables in the Fortran interface. API create_psys receives the name of a derived data type representing FP as the second argument, and it creates an instance of ParticleSystem<Tfp>, where Tfp is the name of the C++ class corresponding to the derived data type having the specified name. In the sample code, full_ptcl is specified. Note that this API accepts only the names of derived data types qualified by FDPS directive of type (a) explained in subsubsection 3.2.1. API create_tree receives the type of an instance of class TreeForForce as the second argument. The type information must be given by a single string of characters that consists of five strings of characters delimited by commas. The first field represents the type of force (long-range or short-range), which is followed by the three names of derived data types corresponding to Force, EPI, and EPJ types, and the final field shows the subtype of tree for long-/short-range force. In this example, we calculate the gravitational forces as the long-range force and use the monopole approximation. In addition, full_ptcl type is used as Force, EPI, and EPJ types. Therefore, "Long,full_ptcl,full_ptcl,full_ptcl,Monopole" is specified.
In step 3, we read particle data from a binary file, using subroutine read_IC defined in lines 178-205. In the framework of the Fortran interface, all the particle data is stored in a C++ program, which is one of the Fortran interface programs (see subsection 2.2). In order to set up the particle data from a Fortran program, we should take the following steps.
(i) Create and initialize an instance of ParticleSystem class (this is already done at step 2). (ii) Allocate memory for this instance. In the sample code, this is done by calling API set_nptcl_loc. This API allocates a memory for the instance enough to store an array of the particles of size specified by the second argument (i.e., nptcl_loc in this sample). (iii) Get the pointer to this instance. This is done by calling API get_psys_fptr. Because this API returns the pointer corresponding to the instance specified by its first argument (i.e., psys_num in this sample), we have to prepare the pointer of the same type. In the present case, the pointer for an array of full_type type should be prepared. After the pointer is set, we can use this pointer like an array. (iv) Set the particle data using the pointer. This is done in lines 199-201.
In step 4, the interaction calculation is performed through the subroutine calc_gravity, which is defined in lines 124-140. This subroutine consists of the following three operations.
(i) Perform domain decomposition. This is done by calling API domain_decomposition (line 131). The first argument of this API is an integer variable corresponding to an instance of DomainInfo class. The second one is an integer variable corresponding to an instance of ParticleSystem class, which is needed because the positional information of the particles is required to perform domain decomposition. (ii) Perform particle exchange. This is done by calling API exchange_particle (line 132). This API takes an integer variable corresponding to an instance of DomainInfo class as the first argument. The second argument is an integer variable corresponding to an instance of ParticleSystem class. This is required because the information of subdomains is necessary to perform particle exchange. (iii) Perform the calculation of interactions. This is done by calling API call_force_all_and_write_back (line 135). The number of arguments of this API depends on the type of force. In the present case, it takes five arguments because of the long-range interaction. The definitions of the first, fourth, and fifth arguments should be obvious, and hence we do not explain them here. The second and third arguments are the function pointers for the subroutines that calculate particle-particle interaction and particle-superparticle interaction. The function pointers must be obtained by the intrinsic function c_funloc, and be stored in variables of c_funptr type. In the sample code, calc_gravity_pp and calc_gravity_psp are used for the calculation of interactions.
In step 5, the time integration is performed through the subroutines kick and drift, which are defined, respectively, in lines 142-158 and 160-176. Both subroutines have the same structure: first get the pointer to the particle data stored in the C++ side of Fortran interface programs using API get_psys_fptr, and then update the particle data using this pointer.
In step 6, API ps_finalize is called. It calls the MPI_Finalize function if MPI is used.
In this section, we have described how an application program can be written using Fortran interface and have shown that researchers can develop particle-based simulation codes by writing Fortran codes only. Thus, learning the C++ programming language is totally unnecessary in order to use FDPS.

Implementation details of Fortran interface
In this section, we describe the implementation details of Fortran interface. In sub-subsection 3.3.1, we describe why we need a generation of Fortran interface. In subsubsection 3.3.2, we describe the file structure of Fortran interface programs and the role of each file.

Necessity of generation of Fortran interface
One difficulty in developing a Fortran interface to FDPS is that the template feature in the C++ programming language cannot be used directly from Fortran (Fortran is not designed to interoperate with the C++ programming language). As described earlier, the template feature is essential for FDPS to support arbitrary types of particle and interaction. Hence, we need to work around this problem in some way in order to make Fortran interface support arbitrary types of particle and interaction.
In this study, we solve this problem by making use of (i) the fact that Fortran can interoperate with the C programming language through the use of the iso_c_binding module introduced in Fortran 2003, (ii) the fact that C++ functions can be called from C programs using extern "C" modifier, and (iii) auto generation of source codes. The outline of our solution is as follows. As a result of (i) and (ii), we can make Fortran interoperate with C++ via C. Therefore, it is also possible to use an FDPS library (a set of functions to manipulate FDPS) made for a specific particle class from a Fortran code through a C interface to the library. However, what we want to realize is to enable researchers to use an FDPS library for an arbitrary particle class from a Fortran code. The only way to realize this would be to generate an FDPS library for a given particle class and use it from a Fortran code. Another requirement in designing a Fortran interface is that researchers must be able to develop codes via Fortran only so that they will not need to invest much time in learning C++. To make this possible, we must generate an FDPS library based on some Fortran data structures representing particles, instead of based on C++ classes. It is natural to use Fortran's derived data type, which is similar to structure in C, to define a particle. Thus, we adopted the following solution: first define a particle as a derived data type in Fortran. Then, generate a C++ class corresponding to this derived data type. Finally, generate an FDPS library for this particle class. The generation of required C++ classes and a library is automatically done by a script provided by us. Hence, researchers do not have to be concerned about it.
The one remaining problem is how to generate a C++ class from a given Fortran derived data type. In FDPS, C++ class representing particles must have specific public member functions such as getPos as described in section 2. Fortran 2003 supports object-oriented programming and offers classes for it, which are derived data types having type-bound procedures and are essentially the same as classes in C++. Thus, a simple way to generate a C++ class is to define a particle as a class in Fortran 2003 that has public member functions required by FDPS and to convert it to a C++ class. However, we cannot take this approach because Fortran 2003 Standard specifies that derived data types must not have member procedures/functions for them to be interoperable with C. In order to solve this problem, we introduce FDPS directives. They are used by users of FDPS to pass all information necessary to generate public member functions of the corresponding C++ classes to the auto-generation script described above. For example, the auto-generation script needs to know which member variable of a given Fortran derived data type represents the position of a particle in order to generate a public member function getPos in the corresponding C++ class. This information is passed to the script by adding FDPS directive !$fdps position to the corresponding member variable of the Fortran derived data type. In other words, our auto-generation script is designed to generate getPos in the corresponding C++ class using a member variable of a given Fortran derived data type indicated by FDPS directive !$fdps position. In this way, we can pass all of the necessary information to the auto-generation script. In subsection 3.2, we have given a rough explanation of the FDPS directive for simplicity, but this kind of thing is indeed done.
The generation of a Fortran interface can be summarized schematically as shown in figure 5: First, researchers define particles as derived data types in Fortran with FDPS directives (Step in figure 5). Next, the auto-generation script generates C++ classes corresponding to the given Fortran derived data types (Step ). Using the information given by FDPS directives, the script can generate public member functions required by FDPS. Then, the script generates an FDPS library specifically for these C++ classes (Step ). The generated library is not a template library, but a normal library that can be called in the manner of C. Finally, the script also generates a Fortran module to manipulate the generated library (Step ). Users of FDPS can use FDPS through this module (Step ).

Structure of Fortran interface programs
In this section, we explain the file structure of the generated programs and their internal structures. Figure 6 is a brief summary of the file structure of the Fortran interface programs and their roles. Four files enclosed by the dashed line (FDPS_module.F90,FDPS_ftn_if.cpp,FDPS_Manipulators.cpp,main.cpp) are the files to be generated by the script, and the file f_main.F90 corresponds to user programs. In the files enclosed by the dotted line (FDPS_vector.F90,FDPS_matrix.F90,FDPS_super_particle.F90,etc.), several derived data types are defined, which are needed to define user-defined types and interaction functions in Fortran (see also subsection 3.2). In the following, we explain the role of each interface program.
Publications of the Astronomical Society of Japan (2018), Vol. 70, No. 4 70-11 Fig. 5. Schematic illustration of our solution to enable the use of an arbitrary type of particle in a Fortran interface.
At first, we explain the roles of FDPS_Manipulators.cpp and main.cpp. Because the core part of FDPS is written in C++, all the C++ instances of FDPS classes described in section 2 (i.e., DomainInfo, ParticleSystem, and TreeForForce classes) must be created and be managed in C++ codes. This task is performed by FDPS_Manipulators.cpp, which corresponds to a library shown in Step in figure 5. For the same reason, we must place the main function of the user program in a C++ file. Thus, main.cpp is generated. It calls a Fortran subroutine named f_main(). Hence, users should prepare a Fortran subroutine f_main() and must implement all parts of the simulation code inside f_main(). In the Fortran interface, all the C++ instances created in FDPS_Manipulators.cpp are assigned to Fortran's integer variables. Hence, users also need to manage these instances using integer variables.
The file FDPS_ftn_if.cpp provides C interfaces to the functions defined in FDPS_Manipulators.cpp. These C functions can be called from a Fortran program using the functionalities provided by module iso_c_binding in Fortran 2003, as described in sub-subsection 3.3.1. The file FDPS_module.F90 provides a class in Fortran 2003, named fdps_controller, for users, which is used to call the C interface functions described above. All of the FDPS API for Fortran are provided as public type-bound procedures of this class. Therefore, in order to use the FDPS API, we first need to create an instance of this class, and then call its type-bound procedures.
As described above, the particle data is stored in FDPS_Manipulators.cpp as explained above. It is actually an array of C structures representing particles (cf. Step in figure 5). Fortran module iso_c_binding enables us to access it if a derived data type in Fortran corresponding to this C structure is interoperable with C. The Fortran interface makes use of this functionality, and therefore we require that all derived data types that will be used in FDPS Downloaded from https://academic.oup.com/pasj/article-abstract/70/4/70/5036218 by Kobe University Library user on 14 July 2020 must be interoperable with C. We also use the functionalities of module iso_c_binding to pass interaction functions to FDPS_Manipulators.cpp. To take this approach, interaction functions must be implemented as subroutines in Fortran and they must be interoperable with C. This is because subroutines are passed in the form of the C addresses (i.e., function pointers in C), which can be obtained only if the subroutines are interoperable with C. These are the reasons why we require C interoperability in subsection 3.2.

Performance of application developed by Fortran interface
In this section, we present and discuss the performance of an application developed by Fortran interface to FDPS.
Here, we mainly focus on the overhead of the FDPS APIcall from a Fortran application because all the tasks of the API are processed in its C++ core and it is expected that the times required to process these tasks do not depend on development language. To this end, we compare the difference between the performances of Fortran and C++ codes for N-body simulations. The performance measurements are carried out in different computer systems to check the dependencies of the performance on CPU architecture and compilers. Table 1 lists the computer systems and the compiler information used in the measurements. The Fortran code used in the performance measurement is essentially same as the sample code described in subsection 3.2 (figure 4), except for the following two differences. First, we apply standard optimization techniques to interaction functions. Figure 7 shows a Fortran subroutine for the gravitational interaction between particles. In the subroutine, the information of j-particles are stored in local arrays (lines 11-16), and all the calculations in the innermost loop are described using arrays (lines 25-39). These modifications are aiming to facilitate compilers to optimize the code more easily. We use almost the same subroutine for the interaction between particles and superparticles. For a fair comparison, in the C++ version of the N-body simulation code, we use a function with the same structure, which is shown in figure 8. Secondly, we rewrite all the operations between derived data types using their member variables explicitly. This is because in the combination of Intel Xeon Phi and Intel compilers the executable file which is generated with the highest optimization compile flags does not perform some numerical operations correctly. We believe that this is a problem on the compiler side and this situation will be improved by future upgrade of the compiler. For now, however, we recommend users of Fortran interface to follow this prescription.
Both of the (Fortran and C++) codes solve the cold collapsed problem, which is one of the standard test problems in N-body simulation codes. Most of the calculation times are spent on the interaction calculation part, and the overhead due to Fortran interface may be hidden by this part. In order to clarify the overhead cost, we measure the performances for the following two cases: One is the case when the interaction functions are empty (empty cases) and the other is the case when the interaction functions shown in figures 7 and 8 are used (normal cases). Both codes are executed with a single core. In this case, the APIs for domain decomposition and particle exchange do nothing and just increase the overhead. The upper panels of figure 9 show the wall-clock time per step for different numbers of particles per processes for each computer system for the former case, while the lower panels show those for the latter case. As shown in the upper panels, the calculation times in both codes are almost the same for all computer systems we used. Relative differences of the wall-clock times between both codes are ≈5%. Thus, the overhead of Fortran interface is sufficiently small. The overall performances also show a similar behavior, as shown in the lower panels of figure 9. Downloaded from https://academic.oup.com/pasj/article-abstract/70/4/70/5036218 by Kobe University Library user on 14 July 2020 Compared to the case of empty interaction function, the relative differences become small in each computer system. Thus, a code written by Fortran interface shows a nearly identical performance to that written by C++.

Example of practical application
In order to demonstrate its ability, we perform a large-scale global planetary ring simulation using a code developed by Fortran interface. Ring particles around a planet interact with each other through mutual gravity and inelastic collision. In the code, we calculate the gravity using the Tree method with an opening angle criterion of θ = 0.5. The inelastic collision is modeled using the soft sphere model following Salo (1995) and Michikoshi and Kokubo (2017).
The initial condition is taken from Michikoshi and Kokubo (2017), who investigated the dynamics of two narrow rings around Centaur Chariklo through changing the radius and mass of ring particles. Among their models, we consider the case of r p = 5 m and ρ p /ρ C = 0.5, where r p and ρ p are the radius and density of ring particles, respectively. The density of Chariklo is ρ C = 1.0 g cm −3 . For simplicity, we consider the inner ring only, which is located at a distance of a = 390.6 km from the center of Chariklo. The radial width and optical depth of the ring are 6.7 km and 0.38, respectively. We model the ring using 79557408 particles. With these parameters, it is expected that self-gravitational wakes form in the ring in a dynamical time (Toomre 1964). The simulation is performed using 1088 MPI processes and four OpenMP threads on Oakforest-PACS.  Figure 10 show the distribution of ring particles at t = 10 t Kep , where t Kep is the orbital period at the distance of a. The particle distribution closely resembles that of Michikoshi and Kokubo (2017) as expected (see their figure 1).
Based on our experience, a great deal of time was not needed to complete this application. Thus, using Fortran interface, researchers can concentrate on their study without spending a lot of time developing simulation codes.

Summary
In this paper, we have presented the basic design and the implementation of the Fortran interface layer of FDPS, which is newly introduced in version 3.0. This layer provides API for Fortran and supports arbitrary types of particles and interactions as with the C++ core part of FDPS. This is realized by means of both a feature introduced in Fortran 2003 and auto-generation of interface programs. The Fortran interface also supports almost all the standard features of FDPS. Using Fortran interface, researchers can develop massively-parallel particle simulation codes in Fortran.
We also have presented the performance of an application developed by Fortran interface. By comparing it with the performance of a C++ version of the application, we have shown that the overhead cost due to Fortran interface is sufficiently small and the overall performances of both codes are very similar to each other.