ECHO: A hierarchical combination of classical and multi-agent epistemic planning problems

The continuous interest in Artificial Intelligence (AI) has brought, among other things, the development of several scenarios where multiple artificial entities interact with each other. As for all the other autonomous settings, these multi-agent systems require orchestration. This is, generally, achieved through techniques derived from the vast field of Automated Planning. Notably, arbitration in multi-agent domains is not only tasked with regulating how the agents act, but must also consider the interactions between the agents' information flows and must, therefore, reason on an epistemic level. This brings a substantial overhead that often diminishes the usability of the reasoning process in real-world situations. To address this problem, we present ECHO, a hierarchical framework that embeds classical and multi-agent epistemic (epistemic, for brevity) planners in a single architecture. The idea is to combine (i) classical; and (ii) epistemic solvers to efficiently model the agents' interactions with (i) the 'physical world'; and (ii) information flows, respectively. In particular, the presented architecture starts by planning on the 'epistemic level', with a high level of abstraction, focusing only on the information flows. Then it refines the planning process, by means of the classical planner, to fully characterize the interactions with the 'physical' world. To further optimize the solving process, we introduced the concept of macros in epistemic planning and enriched the 'classical' part of the domain with goal-networks. Finally, we evaluated our approach in an actual robotic environment, showing that our architecture indeed reduces the overall computational time.


Introduction
In recent years, the field of cognitive robotics has experienced significant progress, both from the academic (see [7] for a detailed survey) and the industrial point of view (e.g. [11,36,50]). While most of this attention is directed to frameworks designed for scenarios where only a single entity interacts with the environment, multi-agent settings are gaining more and more attention [33,43]. However, most of the tools envisioned for multi-agent systems do not focus on modelling the agents' information flows, i.e. they do not consider the epistemic aspects of the autonomous reasoning process. In fact, the intrinsic difficulty of dealing with these concepts [10] often brings an overhead far too great for 'real-world' tools to actually exploit them. Nonetheless, as hinted in [5], considering the knowledge/beliefs of the entities involved in a multi-agent scenario should be the natural option, as it is the most accurate representation of the interactions between independent agents. This is why, starting from a real-world setting with two Franka Emika manipulators as agents, we decided to model a framework that could arbitrate them while considering the epistemic aspects of the domain and maintaining acceptable performance. After the first iteration, we generalized the architecture so that it can deal with different scenarios that can be formally described with specific action languages.
In particular, we envisioned and developed ECHO (EpistemiC Hierarchical sOlver): a hierarchical planning framework that coordinates a set of autonomous agents. This set-up is able to reason on complex multi-agent epistemic concepts while dealing with the computational issues that arise from the inherent complexity of epistemic reasoning. This is achieved by combining techniques from the Multi-agent Epistemic Planning (MEP) field with the more efficient classical planning approaches. Due to this combination, our architecture can benefit from the generality of MEP solvers and the efficiency of classical ones. Broadly, our planner considers both an epistemic and a classical description of the planning problem and combines the two solving techniques in a hierarchical way. As we will see in much greater detail later (Section 3), the architecture first solves the problem at a higher level of abstraction using the epistemic solver. Then, the classical resolution process is used to refine the world-altering actions suggested by the epistemic counterpart.
While ECHO is an architecture that is independent of the specific planners, in this work we present a specific version of our tool. In particular, we decided to define two distinct classical planners, both implemented through the multi-shot Answer Set Programming (ASP) paradigm [23], to tackle the refined reasoning part. The former is a standard solver that employs a search without any heuristics, while the latter exploits the idea of goal-networks, as introduced in [46], to take care of the classical planning problems. The idea is that the goal-network planner is used when we have extra information on the domain, as often happens, which can help in selecting the right decomposition of the current goal into a set of partially ordered sub-goals. To tackle the MEP problem, instead, we employed EFP [17], an off-the-shelf reasoner that can comprehensively reason on epistemic domains.
Finally, inspired by [6], we also propose the concept of macros in the MEP setting. The use of macros helps in reducing the burden of the epistemic planning process, allowing ECHO to address even more multi-agent planning scenarios. In fact, such domains present several complications, e.g. the number of possible actions, which, if not addressed correctly, could render the solving process infeasible.
To summarize, in this paper we present ECHO, a hierarchical planning architecture to efficiently solve MEP problems. This, in turn, comprises (i) an off-the-shelf epistemic reasoner, whose solving process we enriched with the concept of macros; and (ii) two ASP-based classical planners, one of which exploits accumulated domain knowledge, by means of goal-networks, to guide the search process. The contributions are coded in a Python library that is able to decode planning problems in various formats and solve them with the ECHO architecture. This API is available at http://clp.dimi.uniud.it/sw/ and, similar in spirit to the Unified Planning Framework of the AIPlan4EU project [1], it is able to tackle (1) classical planning problems (whether or not enriched with goal-networks as defined in [46]); (2) MEP problems, as defined in [19]; and (3) hierarchical MEP problems as defined in this work.
The remainder of the paper is structured as follows: (i) Section 2 provides some background to better understand the various contributions of the paper; (ii) in Section 3 we present our contributions, namely the definitions of our architecture and of its principal components, i.e. the ASP solvers and the epistemic solver enriched with the idea of macros; (iii) Section 4, to better illustrate the behaviour and the capabilities of ECHO, presents the case study that motivated our work; (iv) in Section 5 we validate the behaviour of our framework through experimental results and, alongside them, provide some explanatory examples on how to employ ECHO; (v) finally, in Section 6, we conclude our manuscript by summarizing its main contributions and discussing some future work.

Classical Planning
The area of automated planning is one of the most prominent in Artificial Intelligence (AI). This field studies how to devise tools that help us decide the best course of action to reach a given goal, an activity that we carry out continuously in our lives. Automated planning, therefore, represents one of the most interesting aspects of AI and, consequently, has been vastly studied [30,41,44,45].
In its 'basic' form, planning is usually referred to as classical planning. This setting represents the best-known and most studied variant of automated planning and has been continuously improved since its early days in the '60s. The initial research focused on ways to formulate the problem of generating long-term plans for achieving goals, for problems of non-trivial size, in a computationally feasible manner [9]. In order to achieve such feasibility and have tractable and approachable problems, classical planning must consider constrained environments, i.e. environments that are (i) single-agent: only one entity is acting upon the world; (ii) static: the environment is not subjected to external variations but can only be modified by the acting agent; (iii) deterministic: each action executed by the agent must have at most one possible outcome; and (iv) fully observable: every action executed must be witnessed by the acting agent itself [28].
An example of a classical domain, which respects all the aforementioned conditions, is the well-known Blocksworld (BW) domain [29]. Blocksworld, due to its simplicity, is one of the most employed examples when it comes to explaining the basics of planning. This domain consists of a few simple elements:
- blocks of the same size that can be placed either on the table or on top of another block; and
- a mechanical arm (i.e. the acting agent) that can move the blocks and can determine whether or not it is holding a block.
Moreover, there are some constraints that regulate BW:
- the mechanical arm can only hold, and therefore move, one block at a time; and
- a block can only be placed on the table or on top of a clear block, i.e. a block with no blocks on top of it and that is not held by the mechanical arm.
In this domain, a single acting agent, i.e. the mechanical arm, wants to move the blocks around in order to achieve a specific disposition of them. This, very informal, specification constitutes a planning problem in the classical setting, where the properties of the world, e.g. the status of the blocks and of the arm, are defined through logical statements called fluents. More formally, a planning problem is defined as follows:

DEFINITION 1 (Planning Problem).
A planning problem is a tuple $\langle D, I, G \rangle$ where:
- $D$ is an action domain expressed in some language, i.e. $D$ describes the properties of interest of the environment upon which the agents act and specifies how the agents themselves can manipulate these properties through actions;
- $I$ is a set of states of the domain, called the initial state, that describes the diverse (possible) starting configurations of the world. The initial state of BW is the position of the blocks and of the mechanical arm before the plan execution;
- $G$ is a set of states of the domain, called the goal state, that describes the desired configurations of the domain. In the case of BW, this identifies the disposition of the blocks that we want to reach.
Let us note that $I$ and $G$ are described as sets to keep the definition general. In fact, in various planning domains, we can have multiple initial states, due to partial information for example, as well as multiple desired states. As it is not in the scope of this paper to provide a complete explanation of the broad field of classical planning, we refer the interested reader to [9,28,45] for a complete introduction to the topic.
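To make Definition 1 concrete, the following Python sketch encodes a small Blocksworld instance and solves it with uninformed breadth-first search. It is only an illustration under stated assumptions: the domain $D$ is the textbook STRIPS formulation of BW (the fluent names on, ontable, clear, holding, handempty and the four operators are assumptions, not the encoding used by ECHO), $I$ is a single state, and $G$ is given as a set of fluents that must hold, which implicitly denotes the set of all states satisfying them.

```python
from collections import deque
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # fluents that must hold
    add: frozenset     # fluents made true
    delete: frozenset  # fluents made false

def blocksworld_actions(blocks):
    """Ground STRIPS actions for a one-arm Blocksworld (assumed textbook encoding)."""
    acts = []
    for b in blocks:
        acts.append(Action(f"pickup({b})",
                           frozenset({f"clear({b})", f"ontable({b})", "handempty"}),
                           frozenset({f"holding({b})"}),
                           frozenset({f"clear({b})", f"ontable({b})", "handempty"})))
        acts.append(Action(f"putdown({b})",
                           frozenset({f"holding({b})"}),
                           frozenset({f"clear({b})", f"ontable({b})", "handempty"}),
                           frozenset({f"holding({b})"})))
    for b, c in permutations(blocks, 2):
        acts.append(Action(f"unstack({b},{c})",
                           frozenset({f"on({b},{c})", f"clear({b})", "handempty"}),
                           frozenset({f"holding({b})", f"clear({c})"}),
                           frozenset({f"on({b},{c})", f"clear({b})", "handempty"})))
        acts.append(Action(f"stack({b},{c})",
                           frozenset({f"holding({b})", f"clear({c})"}),
                           frozenset({f"on({b},{c})", f"clear({b})", "handempty"}),
                           frozenset({f"holding({b})", f"clear({c})"})))
    return acts

def bfs_plan(actions, init, goal):
    """Return a shortest action sequence from init to a state satisfying goal, or None."""
    start = frozenset(init)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                      # every goal fluent holds
            return plan
        for a in actions:
            if a.pre <= state:                 # a is executable in state
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [a.name]))
    return None

# I: C is on A, while A and B are on the table.  G: A on B and B on C.
init = {"on(c,a)", "ontable(a)", "ontable(b)", "clear(c)", "clear(b)", "handempty"}
goal = {"on(a,b)", "on(b,c)"}
plan = bfs_plan(blocksworld_actions(["a", "b", "c"]), init, goal)
print(plan)
```

Breadth-first search is chosen only because it is the simplest complete strategy; it returns a shortest plan (six actions here) but scales poorly, which is exactly why informed techniques such as goal-networks are attractive.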

Goal-networks
As mentioned, classical planning has been studied since the birth of AI. This extensive research has brought, among other things, very interesting and powerful ways to exploit any knowledge that we might have on the domain we want to plan on. In fact, while planning on specific scenarios, it is often the case that our domain knowledge is richer than what we simply write in the formal description of the problems. Even if solvers, in general, should be able to plan without any 'additional' knowledge, not exploiting this extra information would be a waste of resources. From this idea, the whole area of informed planning [45] was born. This specific field of study researches ways to exploit any additional knowledge we might have, whether dependent on or independent of the domain itself, to enhance the solving process.
Using the previously cited Blocksworld as an example, we know that some conditions must be met before others when we want to achieve certain goals. For example, if we want to reach the configuration where block A is on the table, we know that the condition clear A must be achieved before picking up A and placing it on the table. Therefore, if the condition clear A is not yet true, we first have to remove the blocks placed on top of block A. This kind of reasoning is elegantly captured by goal-networks, which, in turn, can be introduced in the planning domain. Informally, a goal-network is a partially ordered set (poset, for brevity) of goals, where each goal is a pair: goal id and creation time. The goal id is associated with a specific consistent conjunction of Boolean literals. The creation time is used to disambiguate between goals with the same id.
The precedence relation in the goal-network guides the search for a plan: at each step we would like to execute an action that leads to the satisfaction of a goal chosen among the minimal ones, i.e. among the goals that have no predecessors in the goal-network. Once a chosen goal is satisfied in the current state of the world, it can be deleted from the goal-network, and another minimal goal is selected. Sometimes, instead of executing an action, we may attack the currently chosen minimal goal by applying a method, which introduces new goals that take precedence over the chosen goal. More formally, we follow the definition of goal-network given in [46].

DEFINITION 2 (Goal-network). Let $F$ be a finite set of propositional atoms, also known as fluents, let $L = \{f : f \in F\} \cup \{\neg f : f \in F\}$ be the set of literals obtained over $F$, and let $G_{id}$ be a set of strings. A goal-network $GN$ is a tuple $\langle G, \prec, \alpha \rangle$ where:
- $G \subseteq G_{id} \times \mathbb{N}$ is a finite set of goals;
- $\prec \subseteq G \times G$ is a partial order over $G$;
- $\alpha : G_{id} \to 2^{L}$ is a function such that, for every goal id $g_{id} \in G_{id}$, $\alpha(g_{id})$ is a conjunction of literals in $L$.

Given a goal $g = \langle g_{id}, t \rangle$, the intended meaning of the goal formula $\alpha(g_{id}) = \varphi$ is that, in order to consider $g$ satisfied, $\varphi$ must hold in the current state; $t$ represents the time at which the goal $g$ was added to the poset.
Given a goal-network $GN = \langle G, \prec, \alpha \rangle$, a planning domain $D = \langle A, M \rangle$ consists of an action domain $A$ enriched by a set of methods $M$. A method $m$ is a triple $\langle head, subgoals, prec \rangle$ where $subgoals(m) = \langle g^{1}_{id}, \ldots, g^{k}_{id} \rangle$ is a sequence of goal ids, while $prec(m)$ is a conjunction of literals expressing the precondition for the method to be executed. We say that the post-conditions of $m$ are $post(m) = \alpha(g^{k}_{id})$ if $subgoals(m)$ is not empty, and $post(m) = prec(m)$ otherwise. A method $m$ (or an action $a$) is relevant to a goal formula $\varphi_G = \alpha(g_{id})$ if $post(m)$ (or $effects(a)$, respectively) has a non-empty intersection with $\varphi_G$ and does not contain the negation of any literal in $\varphi_G$.
The idea is that a method (or an action) can be applied (or executed) only when it is relevant to a current minimal goal. The application of a method $m$ at step $t$ consists of injecting into the current goal-network the goals $\{\langle g_{id}, t \rangle \mid g_{id} \in subgoals(m)\}$, with a higher priority, in terms of the precedence relation, than the goals already in the goal-network.
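The notions of minimal goal, relevance and method application can be sketched in a few lines of Python. This is a hedged illustration, not the ECHO encoding: literals are assumed to be strings, with a leading "-" standing for negation; the precedence relation is stored as explicit pairs $(g_1, g_2)$ meaning $g_1 \prec g_2$; and we assume, since $subgoals(m)$ is a sequence, that consecutive subgoals are ordered as listed. The method name free_then_move and the goal ids are hypothetical.

```python
from dataclasses import dataclass, field

def neg(lit):
    """Complement of a literal written as 'f' or '-f'."""
    return lit[1:] if lit.startswith("-") else "-" + lit

@dataclass(frozen=True)
class Method:
    head: str
    subgoals: tuple   # sequence of goal ids
    prec: frozenset   # conjunction of literals (checked by the caller)

@dataclass
class GoalNetwork:
    goals: set = field(default_factory=set)     # pairs (goal_id, creation_time)
    prec: set = field(default_factory=set)      # pairs (g1, g2) meaning g1 precedes g2
    alpha: dict = field(default_factory=dict)   # goal_id -> frozenset of literals

    def minimal(self):
        """Goals with no predecessor in the network."""
        return {g for g in self.goals if not any(p[1] == g for p in self.prec)}

def post(method, alpha):
    """post(m) = alpha(last subgoal), or prec(m) when there are no subgoals."""
    return alpha[method.subgoals[-1]] if method.subgoals else method.prec

def relevant(changes, phi):
    """Non-empty overlap with phi and no literal that negates one in phi."""
    return bool(changes & phi) and not any(neg(l) in changes for l in phi)

def apply_method(gn, method, t):
    """Inject the subgoals of `method`, created at step t, ahead of all existing goals."""
    new = [(gid, t) for gid in method.subgoals]
    old = set(gn.goals)
    order = set(gn.prec)
    order |= {(n, o) for n in new for o in old}   # new goals take precedence
    order |= set(zip(new, new[1:]))               # assumption: subgoals ordered as listed
    return GoalNetwork(old | set(new), order, gn.alpha)

# Blocksworld flavour: to put A on the table, first make A clear, then re-achieve the goal.
alpha = {"a_on_table": frozenset({"ontable(a)"}), "a_clear": frozenset({"clear(a)"})}
gn = GoalNetwork({("a_on_table", 0)}, set(), alpha)
m = Method("free_then_move", ("a_clear", "a_on_table"), frozenset())
print(relevant(post(m, alpha), alpha["a_on_table"]))   # → True
gn1 = apply_method(gn, m, 1)
print(gn1.minimal())                                    # → {('a_clear', 1)}
```

Note how the creation time distinguishes the injected goal ('a_on_table', 1) from the original ('a_on_table', 0), which is precisely the role the definition assigns to it.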
As in [46], let us define the set of solutions inductively.
DEFINITION 3 (Solution of the goal-network planning problem). Given a goal-network problem $P$ with a goal-network $GN = \langle G, \prec, \alpha \rangle$, a planning domain $D = \langle A, M \rangle$, and a current state $s_t$ at time step $t$:
- If $G = \emptyset$, then the empty plan is a solution.
- Otherwise, let $g = \langle g_{id}, \_ \rangle \in G$ be a goal in $GN$ without predecessors, and let $F = \alpha(g_{id})$ be the goal formula to satisfy.
  • If $F$ is already satisfied in $s_t$, let $P'$ be the problem that results from removing $g$ from $GN$; if $\pi$ is a solution to $P'$, then $\pi$ is a solution to $P$.
  • Let $a$ be an action that is relevant for $\alpha(g_{id})$ and executable in $s_t$. Let $P'$ be the problem obtained from $P$ by taking as current state $s_{t+1} = \Phi(s_t, a)$, at time step $t+1$. If $\pi$ is a solution of $P'$, then $\langle a, \pi \rangle$ is a solution to $P$.
  • Let $m$ be a method relevant to $g$. Then the set of solutions for $P$ includes the set of solutions for $P'$, in which $GN$ has been replaced by $GN'$, resulting from the application of $m$ to $g$ at time step $t$.
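The three inductive cases of Definition 3 translate naturally into a depth-bounded depth-first search. The Python sketch below is a hypothetical illustration, not the ASP-based goal-network planner of ECHO. Its assumptions are: literals are strings with "-" for negation; creation times come from a counter that advances at every action or method expansion (so that goal ids injected by a method never collide with existing goals); only the state is required to satisfy $prec(m)$ before applying a method; and the depth bound is introduced here to keep recursive method application finite.

```python
from collections import namedtuple

Action = namedtuple("Action", "name pre eff")        # pre, eff: frozensets of literals
Method = namedtuple("Method", "name subgoals prec")   # subgoals: tuple of goal ids
Domain = namedtuple("Domain", "actions methods alpha")

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def holds(lits, state):
    """Positive literals must be in the state, negative ones must not."""
    return all((l[1:] not in state) if l.startswith("-") else (l in state) for l in lits)

def progress(state, eff):
    """Successor state Phi(s, a): drop negated fluents, add positive ones."""
    removed = {l[1:] for l in eff if l.startswith("-")}
    added = {l for l in eff if not l.startswith("-")}
    return frozenset((state - removed) | added)

def relevant(changes, phi):
    return bool(changes & phi) and not any(neg(l) in changes for l in phi)

def post(m, alpha):
    return alpha[m.subgoals[-1]] if m.subgoals else m.prec

def solve(dom, state, goals, prec, t=0, depth=10):
    """Depth-bounded DFS over the cases of Definition 3; returns a plan or None."""
    if not goals:
        return []                                   # empty goal network: empty plan
    if depth == 0:
        return None
    minimal = sorted(g for g in goals if not any(q == g for _, q in prec))
    for g in minimal:
        phi = dom.alpha[g[0]]
        if holds(phi, state):                       # already satisfied: drop g
            plan = solve(dom, state, goals - {g}, {p for p in prec if g not in p}, t, depth)
            if plan is not None:
                return plan
            continue
        for a in dom.actions:                       # relevant and executable action
            if relevant(a.eff, phi) and holds(a.pre, state):
                plan = solve(dom, progress(state, a.eff), goals, prec, t + 1, depth - 1)
                if plan is not None:
                    return [a.name] + plan
        for m in dom.methods:                       # relevant method: inject subgoals
            if relevant(post(m, dom.alpha), phi) and holds(m.prec, state):
                new = [(gid, t + 1) for gid in m.subgoals]
                order = {(n, o) for n in new for o in goals} | set(zip(new, new[1:]))
                plan = solve(dom, state, goals | set(new), prec | order, t + 1, depth - 1)
                if plan is not None:
                    return plan
    return None

def act(name, pre, eff):
    return Action(name, frozenset(pre), frozenset(eff))

# Hypothetical BW instance: B is on A, A is on C; the goal is to have A on the table.
actions = [
    act("unstack(b,a)", {"on(b,a)", "clear(b)", "handempty"},
        {"holding(b)", "clear(a)", "-on(b,a)", "-clear(b)", "-handempty"}),
    act("putdown(b)", {"holding(b)"}, {"ontable(b)", "clear(b)", "handempty", "-holding(b)"}),
    act("unstack(a,c)", {"on(a,c)", "clear(a)", "handempty"},
        {"holding(a)", "clear(c)", "-on(a,c)", "-clear(a)", "-handempty"}),
    act("putdown(a)", {"holding(a)"}, {"ontable(a)", "clear(a)", "handempty", "-holding(a)"}),
]
alpha = {g: frozenset({f}) for g, f in [("a_on_table", "ontable(a)"), ("a_held", "holding(a)"),
                                         ("a_clear", "clear(a)"), ("hand_free", "handempty")]}
methods = [Method("put_a_on_table", ("a_held", "a_on_table"), frozenset()),
           Method("grab_a", ("a_clear", "hand_free", "a_held"), frozenset())]
s0 = frozenset({"on(b,a)", "on(a,c)", "ontable(c)", "clear(b)", "handempty"})
plan = solve(Domain(actions, methods, alpha), s0, frozenset({("a_on_table", 0)}), set())
print(plan)  # → ['unstack(b,a)', 'putdown(b)', 'unstack(a,c)', 'putdown(a)']
```

Because only relevant actions may be executed, the search never tries, say, putdown(b) while the minimal goal is holding(a); the methods are what make such "helper" steps reachable, which illustrates how strongly the goal-network constrains the search.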
Let us observe that only actions or methods relevant to the currently chosen goal can be executed. This means that the goal-network setting constrains the search for a plan more than, for instance, ordered landmarks [32]. While several other studies also take into account the idea of goal preferences, e.g. [27], we leave the investigation of this topic to future work.

The $\mathcal{A}_V$ Language
In this section, we briefly introduce the $\mathcal{A}_V$ action description language, which has been designed to be integrated into ECHO to define classical planning problems. It essentially corresponds to language $\mathcal{A}$, extended with typed variables (see [25] as a reference for the nomenclature of planning languages). Typed variables are merely used to write schemes describing finite sets of causal laws that are formed according to the same pattern. Schemes are particularly useful in some application domains, e.g. when an action that models an agent's movement can be parameterized by a variable that can be instantiated with a series of different positions.
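The following Python sketch shows, under stated assumptions, what a scheme with typed variables stands for: the finite set of ground causal laws obtained by substituting every constant of the right type for each variable. The type table, the law representation (name, precondition, effects) and the move/at scheme are hypothetical illustrations and do not reproduce the concrete $\mathcal{A}_V$ syntax.

```python
from itertools import product

# Hypothetical type declarations: each type names a finite set of constants.
TYPES = {"pos": ["p1", "p2", "p3"]}

def ground(scheme, params):
    """Instantiate a scheme for every assignment of constants to its typed variables.

    `scheme` maps a variable binding to one ground causal law (or None to skip it);
    `params` maps each variable name to its type.
    """
    names = list(params)
    domains = [TYPES[params[v]] for v in names]
    laws = []
    for values in product(*domains):
        law = scheme(dict(zip(names, values)))
        if law is not None:              # the scheme may rule out some bindings
            laws.append(law)
    return laws

def move_scheme(b):
    # move(From, To) causes at(To) and -at(From) if at(From), for From != To
    if b["From"] == b["To"]:
        return None
    return (f"move({b['From']},{b['To']})",
            {f"at({b['From']})"},
            {f"at({b['To']})", f"-at({b['From']})"})

laws = ground(move_scheme, {"From": "pos", "To": "pos"})
print(len(laws))   # → 6  (3 * 3 bindings minus the 3 with From == To)
```

One scheme thus replaces six hand-written laws here, and the saving grows quadratically with the number of positions, which is why typed variables pay off in movement-heavy robotic domains.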
In ECHO, the $\mathcal{A}_V$ planning instances are translated into Answer Set Programming (ASP) instances. ASP is one of the most prominent logic programming paradigms and is particularly useful in knowledge-intensive applications (see [40] for an introduction to its semantics). Its use as a planning framework is explained in depth in [16].
The choice of encoding the planning problems in ASP, already presented in [12], is justified by the fact that ASP, as a planning engine, outperforms standard PDDL task planners in domains where (i) there are many fluents; and (ii) plans are usually short [35]. These are precisely the characteristics of the situations in which we intend to leverage the classical planner in the robotic use cases. Moreover, ASP, due to its declarative nature, lets us easily debug any functionality and test new features, while also allowing for formal proofs of their correctness.

Multi-Agent Epistemic Planning
Even if we often need to make decisions based on our beliefs about the environment and about others' beliefs, automated planners usually do not consider such intricacy. In fact, as said before, most of the efforts in the planning community address domains where the concepts of belief and/or knowledge are not taken into account. Nonetheless, the growing interest in AI is pushing researchers to steadily improve and model more realistic scenarios. This momentum led, among other things, to the formalization of a far more compelling (w.r.t. more classical approaches) form of planning, i.e. the so-called MEP. This area of planning reasons within environments where the streams of 'knowledge' or 'beliefs' need to be considered.
Formalizing and reasoning on the ideas of knowledge and belief has always been of great interest in various research fields (e.g. philosophy, logic and computer science). In particular, in 1962, Hintikka proposed the first complete axiomatization of these concepts [31]. From this initial effort stemmed the field of epistemic/doxastic logic, which aims to formalize and reason on information itself. While this area of research presents various challenges, it only focuses on capturing the knowledge relations in static domains. To represent even more interesting and realistic scenarios, Dynamic Epistemic Logic (DEL) was introduced, i.e. the logic of reasoning on information flows in dynamic domains where agents can act and alter these relations themselves.
DEL represents the foundation of MEP, the setting concerned with finding the best series of actions that modify the information flows to reach goals that (might) refer to agents' knowledge/beliefs. In what follows, for brevity, we will use the term 'knowledge' to encapsulate both the notions of an agent's knowledge and beliefs. In fact, these concepts are captured by the same modal operator in DEL, and their difference resides in the structural properties that the epistemic states respect (see [21] for more details). As it is not the objective of this paper to present MEP completely, in what follows we provide only the fundamental concepts that are necessary to explain the contribution of this work. Far more complete introductions to this topic may be found in [4,21,51].
Let us start by presenting the language of well-formed DEL formulae used to express agents' knowledge. This is expressed as follows:
$$\varphi ::= f \mid \neg \varphi \mid \varphi \wedge \psi \mid \mathbf{B}_i \varphi \mid \mathbf{C}_\alpha \varphi$$
where $f \in F$ is a propositional atom called fluent, $i$ is an agent belonging to the set of agents $AG$ with $|AG| \geq 1$, $\varphi$ and $\psi$ are belief formulae, and $\emptyset \neq \alpha \subseteq AG$. A fluent formula is a DEL formula with no occurrences of modal operators. A belief formula is recursively defined as follows:
- a fluent formula is a belief formula;
- if $\varphi$ is a belief formula and $i \in AG$, then $\mathbf{B}_i \varphi$ ($i$ knows/believes that $\varphi$) is a belief formula, where the modal operator $\mathbf{B}$ captures the concept of knowledge;
- if $\varphi_1$ and $\varphi_2$ are belief formulae, then $\neg \varphi_1$ and $\varphi_1 \; op \; \varphi_2$ are belief formulae, where $op \in \{\wedge, \vee, \Rightarrow\}$;
- if $\varphi$ is a belief formula and $\emptyset \neq \alpha \subseteq AG$, then $\mathbf{C}_\alpha \varphi$ is a belief formula, where $\mathbf{C}_\alpha$ captures the common knowledge of the set of agents $\alpha$.
The formula $\mathbf{C}_\alpha \varphi$ translates intuitively into the conjunction of the following belief formulae:
- every agent in $\alpha$ knows $\varphi$;
- every agent in $\alpha$ knows that every agent in $\alpha$ knows $\varphi$;
- and so on ad infinitum.
The semantics of DEL formulae is traditionally expressed using pointed Kripke structures [38], but other representations are also possible [17,26].We refer the interested reader to [8,21,26] for a comprehensive introduction to how Kripke structures, or similar formalisms, are used to capture the idea of an epistemic state and how the concept of entailment is defined.
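As a rough illustration of the pointed-Kripke-structure semantics mentioned above, the Python sketch below evaluates belief formulae at a world. It follows the standard textbook reading: $\mathbf{B}_i \varphi$ holds if $\varphi$ holds in every world agent $i$ considers possible, and $\mathbf{C}_\alpha \varphi$ holds if $\varphi$ holds in every world reachable in one or more steps along the union of the relations of the agents in $\alpha$, which is the usual finite way of capturing the infinite conjunction. The nested-tuple encoding of formulae and the two-world example are assumptions made for illustration; they are not EFP's internal representation.

```python
def entails(model, w, phi):
    """Evaluate a belief formula at world w of a Kripke structure.

    model = (val, rel): val[w] is the set of fluents true in w,
    rel[i] is the set of pairs (u, v) accessible for agent i.
    Formulae are nested tuples: ("atom", f), ("not", p), ("and", p, q),
    ("or", p, q), ("imp", p, q), ("B", i, p), ("C", agents, p).
    """
    val, rel = model
    op = phi[0]
    if op == "atom":
        return phi[1] in val[w]
    if op == "not":
        return not entails(model, w, phi[1])
    if op == "and":
        return entails(model, w, phi[1]) and entails(model, w, phi[2])
    if op == "or":
        return entails(model, w, phi[1]) or entails(model, w, phi[2])
    if op == "imp":
        return (not entails(model, w, phi[1])) or entails(model, w, phi[2])
    if op == "B":   # true in every world agent i considers possible from w
        return all(entails(model, v, phi[2]) for (u, v) in rel[phi[1]] if u == w)
    if op == "C":   # true in every world reachable in >= 1 step via agents in alpha
        edges = {e for i in phi[1] for e in rel[i]}
        frontier = {v for (u, v) in edges if u == w}
        seen = set()
        while frontier:
            v = frontier.pop()
            if v in seen:
                continue
            seen.add(v)
            frontier |= {y for (x, y) in edges if x == v}
        return all(entails(model, v, phi[2]) for v in seen)
    raise ValueError(f"unknown operator {op!r}")

# p is true in the actual world w1; agent b can tell w1 and w2 apart, agent a cannot.
val = {"w1": {"p"}, "w2": set()}
rel = {"a": {("w1", "w1"), ("w1", "w2"), ("w2", "w1"), ("w2", "w2")},
       "b": {("w1", "w1"), ("w2", "w2")}}
M = (val, rel)
p = ("atom", "p")
b_knows_whether_p = ("or", ("B", "b", p), ("B", "b", ("not", p)))
print(entails(M, "w1", ("B", "a", p)))                       # → False
print(entails(M, "w1", ("C", ("a", "b"), b_knows_whether_p)))  # → True
```

In this example agent a does not know p, yet it is common knowledge among a and b that b knows whether p holds, a typical distinction that purely classical states cannot express.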
Let us note that the epistemic action language that we consider in our work implements three types of action and three observability relations. These are standard concepts in the epistemic planning community; therefore, we only provide an intuitive description of them and refer the interested reader to [4] for a complete description of this topic. In particular, we assume that each agent can execute one of the following types of action:
- World-altering action (also called ontic): used to modify certain properties (i.e. fluents) of the world.
- Sensing action: used by an agent to refine her beliefs about the world.
- Announcement action: used by an agent to affect the beliefs of other agents.
Moreover, each agent is associated with one of the following observability relations during an action execution:
- Fully-observant: the agent is aware of the action execution and also knows the effects of the action.
- Partially-observant: the agent is aware of the action execution without knowing the effects of the action. Let us note that no agent can be partially observant of an ontic action, as it is impossible to decouple witnessing a world-altering action from witnessing its effects.
- Oblivious: the agent is not even aware of the action execution.
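The action types and observability relations above, together with the constraint that no agent can be partially observant of an ontic action, can be captured by a small validation routine. The Python sketch below is hypothetical: the enum names and the idea of an observability frame mapping each agent to a relation are illustrative assumptions, not the E-PDDL or EFP data model.

```python
from enum import Enum

class ActionType(Enum):
    ONTIC = "world-altering"
    SENSING = "sensing"
    ANNOUNCEMENT = "announcement"

class Obs(Enum):
    FULL = "fully-observant"
    PARTIAL = "partially-observant"
    OBLIVIOUS = "oblivious"

def check_observability(kind, frame):
    """Validate an observability frame {agent: Obs} for an action of type `kind`.

    Raises ValueError if some agent is partially observant of an ontic action,
    since witnessing a world-altering action implies witnessing its effects.
    """
    if kind is ActionType.ONTIC:
        bad = sorted(agent for agent, o in frame.items() if o is Obs.PARTIAL)
        if bad:
            raise ValueError(f"partial observability of an ontic action: {bad}")
    return frame

# A sensing action may leave some agents partially observant ...
check_observability(ActionType.SENSING, {"a": Obs.FULL, "b": Obs.PARTIAL, "c": Obs.OBLIVIOUS})
# ... whereas the same frame for an ontic action is rejected.
try:
    check_observability(ActionType.ONTIC, {"a": Obs.FULL, "b": Obs.PARTIAL})
except ValueError as err:
    print(err)
```

Checking this constraint when a domain is loaded, rather than during search, keeps ill-formed action specifications from ever reaching the update functions.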
Each type of action defines a transition function and alters an epistemic state in a different way. Given the complexity of the topic, we refer the reader to [4,17,18] for a formal definition of these, and other, update functions on diverse epistemic state representations.
Finally, let us introduce the concept of MEP domain that, intuitively, contains the information needed to describe a planning problem in a multi-agent epistemic setting.

DEFINITION 4 (MEP Domain).
A MEP domain is a tuple $D = \langle F, AG, A, \varphi_{ini}, \varphi_{goal} \rangle$, where $F$, $AG$ and $A$ are the sets of fluents, agents and actions of $D$, respectively; $\varphi_{ini}$ and $\varphi_{goal}$ are DEL formulae that must be entailed by the initial and goal e-states, respectively. The former e-state describes the domain's initial configuration, while the latter encodes the desired one.
We refer to the elements of a domain $D$ with the parenthesis operator; e.g. the fluent set of $D$ is denoted by $D(F)$. An action instance $a_i \in D(AI) = D(A) \times D(AG)$ identifies the execution of action $a$ by agent $i$. Let $D(S)$ be the set of all possible e-states of the domain. The transition function $\Phi : D(AI) \times D(S) \to D(S) \cup \{\emptyset\}$ formalizes the semantics of action instances (the result is the empty set if the action instance is not executable).
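The following Python sketch illustrates only the shape of these definitions: action instances as the Cartesian product $D(A) \times D(AG)$, and a partial transition function where None plays the role of $\emptyset$. It rests on a strong simplifying assumption: e-states are replaced by plain sets of true fluents, so no belief update is modelled. The open_door action and its agent-dependent executability condition are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, FrozenSet, Optional

State = FrozenSet[str]   # simplified stand-in for an e-state: just the true fluents

@dataclass(frozen=True)
class MEPDomain:
    fluents: frozenset
    agents: tuple
    actions: tuple
    pre: Callable[[str, str, State], bool]    # executability of action a by agent i
    eff: Callable[[str, str, State], State]   # (ontic-only) effect of a executed by i

    def instances(self):
        """D(AI) = D(A) x D(AG): every action paired with every agent."""
        return list(product(self.actions, self.agents))

    def transition(self, inst, s) -> Optional[State]:
        """Phi(a_i, s); None stands for the empty set (not executable)."""
        a, i = inst
        return self.eff(a, i, s) if self.pre(a, i, s) else None

dom = MEPDomain(
    fluents=frozenset({"has_key(a)", "has_key(b)", "opened"}),
    agents=("a", "b"),
    actions=("open_door",),
    pre=lambda act, i, s: f"has_key({i})" in s,   # only the key holder can open
    eff=lambda act, i, s: s | {"opened"},
)
s0 = frozenset({"has_key(a)"})
print(dom.instances())                         # → [('open_door', 'a'), ('open_door', 'b')]
print(dom.transition(("open_door", "b"), s0))  # → None
```

Making the transition function return an explicit "not executable" value, instead of raising an error, mirrors the $D(S) \cup \{\emptyset\}$ codomain and lets a search procedure simply skip inapplicable instances.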

Answer Set Programming
We briefly introduce the syntax and semantics of Answer Set Programming, a dialect of logic programming widely used in knowledge representation and reasoning (see, e.g. [24]).
Let $\mathcal{P}$ be a set of predicate symbols, $\mathcal{F}$ a set of constant and function symbols, and $\mathcal{V}$ an enumerable set of variable symbols. $ar(\cdot)$ is a function, applied to predicate and function symbols, that returns their number of arguments. Terms are defined recursively as usual: a variable is a term; if $f \in \mathcal{F}$, with $ar(f) = n \geq 0$, and $t_1, \ldots, t_n$ are terms, then $f(t_1, \ldots, t_n)$ is a term. Let us observe that if $c \in \mathcal{F}$ and $ar(c) = 0$, then $c$ is a term (a constant). If $p \in \mathcal{P}$, with $ar(p) = n \geq 0$, and $t_1, \ldots, t_n$ are terms, then $p(t_1, \ldots, t_n)$ is an atomic formula (or atom). Let us observe that if $p \in \mathcal{P}$ and $ar(p) = 0$, then $p$ (without arguments) is an atom, referred to as a propositional atom. A literal is either an atom $A$ or its negation (as failure) $\mathit{not}\ A$. An ASP rule $r$ is of the form
$$A_0 \leftarrow A_1, \ldots, A_m, \mathit{not}\ A_{m+1}, \ldots, \mathit{not}\ A_n$$
where $A_0, \ldots, A_n$ are atoms. $A_0$ is referred to as the head of the rule $r$, while the set of literals $\{A_1, \ldots, A_m, \mathit{not}\ A_{m+1}, \ldots, \mathit{not}\ A_n\}$ is referred to as its body.
Variables possibly occurring in $r$ are intended as universally quantified. For instance, the rule
sibling(X,Y) :- parent(Z,X), parent(Z,Y), X != Y.
defines the predicate sibling: for all X, Y, Z, if Z is a parent of X and of Y, and X and Y are not the same person (they are not equal), then X and Y are siblings. These universal properties are intended to hold for all the terms the program deals with. Thus, since the program uses the constants a, b, c, d (in the facts defining the predicate person), the above rule can be instantiated by replacing X, Y and Z with these constants in every possible way. This process is called grounding, and it transforms a first-order program into a propositional program (atoms without variables, i.e. ground atoms, can be seen as propositional atoms). In what follows, we focus on programs that have a finite grounding.
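The naive grounding of the sibling rule described above, written in clingo syntax as sibling(X,Y) :- parent(Z,X), parent(Z,Y), X != Y, can be sketched in Python as follows. This is purely illustrative: it enumerates every substitution of the constants a, b, c, d for X, Y and Z and evaluates the built-in comparison X != Y at grounding time. Real grounders such as clingo's are far smarter and only emit instances whose bodies can possibly be satisfied.

```python
from itertools import product

CONSTANTS = ["a", "b", "c", "d"]   # the constants occurring in the person/1 facts

def ground_sibling_rule(constants):
    """Naively instantiate sibling(X,Y) :- parent(Z,X), parent(Z,Y), X != Y.

    Every variable is replaced by every constant; the built-in X != Y is
    evaluated during grounding, so instances with X == Y are dropped.
    """
    ground = []
    for x, y, z in product(constants, repeat=3):
        if x != y:
            head = f"sibling({x},{y})"
            body = [f"parent({z},{x})", f"parent({z},{y})"]
            ground.append((head, body))
    return ground

rules = ground_sibling_rule(CONSTANTS)
print(len(rules))   # → 48  (4 choices of X, 3 of Y != X, 4 of Z)
print(rules[0])     # → ('sibling(a,b)', ['parent(a,a)', 'parent(a,b)'])
```

The count grows as $|constants|^{3}$ for a three-variable rule, which is why the finite-grounding assumption, and the size of the ground program, matter so much in practice.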
If a program does not use negated literals in rule bodies (definite clause programs), then its semantics is given by the minimum model (the set of atoms that hold in all the models of the program). This set can be computed in polynomial time. In the case of ASP programs using negation, the notion of minimum model is no longer adequate: for instance, a program with negation in its rule bodies can admit two incomparable minimal models, such as {p(a)} and {q(a)}, and hence no minimum one. The semantics of ASP programs is therefore given in terms of answer sets; intuitively, an answer set $S$ is a minimal model of the program that supports each true atom, i.e. assuming that the atoms in $S$ hold and the others do not, for each atom in $S$ there is a rule of the program having that atom as head and whose body is satisfied by $S$.
The precise definition is given in a guess-and-verify style: a set of ground atoms $S$ is an answer set of a program $P$ if $S$ is the minimum model of the reduct program $P^S$, which is obtained from $P$ and $S$ as follows:
• remove from $P$ all rules $r$ such that there is a negated literal $\mathit{not}\ q$ in the body of $r$ and $q \in S$;
• remove all negated literals from the remaining rules.
By construction, $P^S$ is a set of rules that do not contain any occurrence of negation and, as recalled above, it admits a unique minimum model.
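The guess-and-verify definition can be turned directly into a brute-force checker. The Python sketch below is for illustration only (it enumerates all subsets of atoms and is exponential, unlike real solvers such as clingo): ground rules are assumed to be triples (head, positive body, negative body); the minimum model of a negation-free program is computed by fixpoint iteration; and a candidate $S$ is accepted when it equals the minimum model of its reduct $P^S$. The example program, p(a) :- not q(a). q(a) :- not p(a)., is a classic illustration chosen here, not taken from the text.

```python
from itertools import chain, combinations

def minimum_model(definite_rules):
    """Least model of a negation-free ground program (fixpoint iteration)."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos, _ in definite_rules:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return model

def reduct(program, s):
    """P^S: drop rules with some 'not q' where q is in S, then drop negated literals."""
    return [(h, pos, ()) for h, pos, neg in program if not (set(neg) & s)]

def is_answer_set(program, s):
    return minimum_model(reduct(program, s)) == set(s)

def answer_sets(program):
    """Guess every subset of the atoms and keep those that pass the check."""
    atoms = sorted({h for h, _, _ in program} | {a for _, p, n in program for a in (*p, *n)})
    subsets = chain.from_iterable(combinations(atoms, k) for k in range(len(atoms) + 1))
    return [set(c) for c in subsets if is_answer_set(program, set(c))]

# p(a) :- not q(a).    q(a) :- not p(a).
prog = [("p(a)", (), ("q(a)",)), ("q(a)", (), ("p(a)",))]
print(answer_sets(prog))   # → [{'p(a)'}, {'q(a)'}]
```

Both minimal models are answer sets here because each is supported by a rule whose negated literal refers to an atom outside the set; by contrast, the single rule p :- not q has {p} as its only answer set, even though {q} is also a minimal classical model.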
Given an ASP program, an ASP-solver is used to compute its answer sets.ASP solvers typically work in two stages: first, the program is grounded, then an answer set of the ground version is looked for.In this paper we will use the ASP solver Clingo [22].

The ECHO System
In this section, we will present the original contribution of our work which is, essentially, threefold.We will start by presenting the overall architecture, in Section 3.1, to then illustrate its components with their relative novelties.In particular, Section 3.2 introduces the classical planners, one that uses a standard search process and the other that exploits the goal-network formalism, based on ASP.Finally, in Section 3.3 we describe how we implemented the idea of macros in MEP, within the planner EFP [17].

The Architecture
Let us now describe the general functioning of the ECHO system, which is the main contribution of this work. The complete code, alongside some working examples and several guides on how to use it, can be found at the following link: http://clp.dimi.uniud.it/sw/. The main objective of our framework is to provide a tool that supports MEP and that is able to reason on this problem within acceptable times. We decided to accomplish this through a combination of MEP and classical planning techniques. In particular, ECHO combines these two strategies in a hierarchical way: our architecture first employs MEP to generate tasks (actions that have a certain degree of abstraction w.r.t. their functionality in the real world) and then exploits classical planners to break down these tasks into their refined components.
Once again, we can use the Blocksworld domain to exemplify the difference between the levels of abstraction that the epistemic and classical components of ECHO can consider. In particular, if we imagine modelling a 'real-world' version of this domain, where an actual robotic arm has to move blocks, the task move block A on top of B would comprise much more detailed actions: (i) move the arm to the position of A; (ii) pick up A; (iii) move the arm to the position of B; and (iv) release A on top of B. This final set of instructions, when converted into the right formalism (MoveIt! commands), represents the atomic actions that the arm should follow in order to execute the plan properly. Nonetheless, when we are planning on the epistemic level, we are not interested in such a level of detail but rather in the information flows. For this reason, we introduced the idea of hierarchical planning, which allows us to plan on different levels of abstraction. This means that, when we are concerned with the epistemic part of the planning, we can simply assume that the agent is able to move blocks correctly, thus reducing the computational resources needed by such an intricate process. Once the MEP problem is solved, ECHO can refine all the ontic tasks by finding plans that are actually composed of atomic actions and can, therefore, be used to manoeuvre the arm. To summarize, we employ a hierarchical combination of the two solving techniques (i.e. classical and epistemic) to avoid infeasibility in the planning process for domains rich in fluents and actions. The key idea is to abstract most of the domain intricacy away from the epistemic level and handle it at the classical level, when possible, given the latter's vastly superior performance.
Let us now explain in greater detail how the ECHO system processes, solves and eventually executes a planning problem. To better visualize the overall framework, we present a graphical class-based representation of it in Figure 1. In the scheme, we also included the 'actuators' because, as we will see later, we implemented a pipeline that directly sends instructions to actual robotic arms and executes the plan in a real-world domain. While what follows explains the complete scheme, let us note that when we want to use ECHO in a simulated environment we just need to skip the interactions with the actuators.
First of all, the master process reads the domain and problem descriptions, which contain the initial state description. This description is specified using a Python encoding that allows defining initial states, goal states, fluents and both epistemic and classical actions, all with a coherent syntax. ECHO then generates the epistemic planning instance, specified in E-PDDL [20], and sends it to an epistemic planner, i.e. EFP [17], which is employed as a black box and returns the sequence of epistemic tasks to be executed. After this first resolution, each task is processed following Listing 1.1 and, when needed, converted into a classical planning problem instance. The first step is to break down, or flatten, macros (introduced in Section 3.3); then we can reason on a sequence of non-macro tasks, hereinafter called simple tasks. Intuitively, a simple purely epistemic task can be directly executed. Simple ontic tasks that alter the physical world undergo further processing. To break them down, a classical planning instance is generated where:
- the initial state is the classical initial state description for the first run of the classical planner; otherwise, it coincides with the final state reached by the previous run of the classical planner;
- the goal is extracted from the effects of the simple ontic task if the plain classical planner is run as a sub-routine. Otherwise, if the goal-network classical planner is used, an initial poset of goals to satisfy is added. This is possible since the library allows enriching simple ontic tasks with a poset of goals.
Then, iteratively, the master process sends the classical action tasks to the actuator process, which translates them into MoveIt! commands and finally makes the robots execute the movements.
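The content of Listing 1.1 is not reproduced in this text. As a rough illustration of the loop just described, the following Python sketch flattens macros, executes pure epistemic tasks directly, and hands every simple ontic task to a classical planner whose final state seeds the next call. The `Task` class, `flatten`, `run_echo`, the toy planner and the `execute` callback are hypothetical stand-ins, not ECHO's API; only the name `is_pure_epistemic_task` comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical epistemic task returned by the epistemic planner."""
    name: str
    kind: str                                     # "ontic", "announcement", "sensing" or "macro"
    subtasks: list = field(default_factory=list)  # components, only used by macros
    goal: frozenset = frozenset()                 # goal literals taken from an ontic task's effects


def flatten(tasks):
    """Recursively break down macros into simple (non-macro) tasks."""
    simple = []
    for task in tasks:
        if task.kind == "macro":
            simple.extend(flatten(task.subtasks))
        else:
            simple.append(task)
    return simple


def is_pure_epistemic_task(task):
    # announcements and sensing actions do not change the physical world
    return task.kind in ("announcement", "sensing")


def run_echo(epistemic_plan, classical_init, classical_planner, execute):
    """Refine and execute an epistemic plan; returns the final classical state."""
    state = classical_init                        # the first run starts from the classical initial state
    for task in flatten(epistemic_plan):
        if is_pure_epistemic_task(task):
            execute([task.name])                  # executed directly
        else:
            # every later run starts from the state reached by the previous one
            sub_plan, state = classical_planner(state, task.goal)
            execute(sub_plan)                     # e.g. forwarded to the actuator process
    return state


# Usage with a toy planner that pretends one action achieves each goal literal.
def toy_planner(state, goal):
    return [f"achieve({g})" for g in sorted(goal)], state | goal


log = []
plan = [
    Task("check(robot1,red)", "sensing"),
    Task("move_and_announce", "macro", subtasks=[
        Task("from_private_to_shared(robot1,red)", "ontic", goal=frozenset({"on_shared(red)"})),
        Task("announce(robot1,red)", "announcement"),
    ]),
]
final_state = run_echo(plan, frozenset(), toy_planner, log.extend)
print(log)
# ['check(robot1,red)', 'achieve(on_shared(red))', 'announce(robot1,red)']
```

Passing the planner and the executor as callbacks keeps the sketch agnostic to whether the plain or the goal-network classical planner is used, and to whether execution is simulated or sent to real actuators.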
Let us stress that, while our architecture has been devised to tackle hierarchical epistemic problems, the generality of the master process allows ECHO to 'understand' several scenarios. In particular, our framework can tackle:
(i) 'pure' classical planning problems, possibly enriched with the knowledge needed to create goal-networks. In our specific instance of ECHO, these instances are solved by the planners presented in Section 3.2; nonetheless, these solvers can easily be swapped with alternatives from the literature, e.g. Fast Downward [30], if needed.
(ii) Epistemic planning domains with no component to be refined, i.e. all those problems where the classical planning component of ECHO is not employed. These domains can be solved whether or not they are enriched with the knowledge to 'create' macros (formally presented in Section 3.3) that speed up the solving process.
(iii) Finally, as expected, ECHO is able to solve problems defined in the epistemic setting that have 'abstracted' ontic actions which need to be refined by a classical solving process. Problems of this type may also contain the information to create goal-networks or epistemic macros, although this is not necessary.
LISTING 1.1 Pseudo-code for the main steps of ECHO.

Answer Set Classical Planning
The idea of Answer Set Programming (ASP) is to represent a given computational problem by a logic program whose answer sets correspond to solutions, and then use an answer set solver, such as clingo [23], to compute a solution. Answer set planning was first introduced in [39] and has received considerable interest among researchers in recent years, as illustrated in [48]. In what follows, we present a high-level description of the ASP encoding used to compute plans for single-agent actions. We begin by introducing the plain ASP planner and then the one that uses the goal-network formalism.

Plain ASP Planner
We begin the description of the encoding by highlighting the most important predicates and their meaning:
- holds(F,T): the fluent F, or its negation, is true at step T;
- A(T): the action A occurs at step T;
- A(T): the action A is executable at step T.
To make the encoding more readable, we divide it into two sets of rules: the first is domain-dependent, while the second is present in every encoding. From a notation point of view, P = ⟨D, Γ, G⟩ is the instance written in A V, and π(P) is its encoding in ASP. To avoid unnecessary clutter, we report simplified versions of the rules and only those whose meaning is not straightforward. For example, we do not report the types of variables when they are clear from the context.
where t_i is the type of the i-th variable and limits the range of values it can be instantiated with.
- For each statement initially f in Γ such that f is a positive fluent literal, we have the rule holds(f, 0).
- For goal achievement, for each finally(f), we add the fact goal(f).
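A minimal sketch of how these two domain-dependent rules could be emitted from a Python description of the instance, assuming literals are plain strings in which a leading `-` marks negation. The function name `compile_instance` and the fluent names are hypothetical; the sketch does not reproduce ECHO's actual compiler.

```python
def compile_instance(initially, finally_):
    """Emit the ASP facts for the initial state and the goal of an instance.

    initially: literals of the 'initially' statements ('-f' is a negative literal)
    finally_:  literals of the 'finally' statements
    """
    facts = []
    for f in initially:
        if not f.startswith("-"):          # only positive literals yield holds(f, 0)
            facts.append(f"holds({f},0).")
    for f in finally_:
        facts.append(f"goal({f}).")        # one goal/1 fact per 'finally' literal
    return facts


facts = compile_instance(["on(red,stack1)", "-on_shared(red)"], ["on_shared(red)"])
print("\n".join(facts))
# holds(on(red,stack1),0).
# goal(on_shared(red)).
```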

Domain-Independent Rules
- We define the opposite of a fluent as follows:
- Action executability is expressed with the constraint:
- Inertia is guaranteed with the rules:
- Only one action occurs at each step:
- We satisfy executability with the following constraint:
Note that all the rules that contain the term t belong to the subprogram #program step(t). We also add the following subprogram #program check(t) to check the satisfaction of the goal: The length of the solution trajectory is t. This compilation exploits clingo's multi-shot features [23].
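To make the semantics of these rules concrete without depending on clingo, the sketch below re-implements them in plain Python: the opposite of a literal, executability as a precondition check, inertia, exactly one action per step, and a goal check at each horizon t. Increasing t by one until the check succeeds mirrors the step(t)/check(t) multi-shot scheme, but the breadth-first search used here is a swapped-in technique, not the authors' ASP encoding. The action and fluent names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset   # literals that must hold for the action to be executable
    eff: frozenset   # literals made true; '-f' makes f false


def opposite(lit):
    """The opposite of f is -f and vice versa."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def apply(state, action):
    # inertia: a literal persists unless the action makes its opposite true
    kept = {l for l in state if opposite(l) not in action.eff}
    return frozenset(kept) | action.eff


def plan(initial, goal, actions, max_horizon=10):
    """Return a shortest action sequence reaching goal, or None within max_horizon."""
    frontier = {frozenset(initial): []}           # states reachable in exactly t steps
    for t in range(max_horizon + 1):
        for state, seq in frontier.items():
            if goal <= state:                     # check(t): every goal literal holds at t
                return seq
        successors = {}
        for state, seq in frontier.items():
            for a in actions:
                if a.pre <= state:                # executability
                    # one action per step; keep the first sequence reaching each state
                    successors.setdefault(apply(state, a), seq + [a.name])
        frontier = successors                     # move on to step(t + 1)
    return None


actions = [
    Action("pick(red)", frozenset({"clear(red)", "free(hand)"}),
           frozenset({"holding(red)", "-free(hand)", "-clear(red)"})),
    Action("place_shared(red)", frozenset({"holding(red)"}),
           frozenset({"on_shared(red)", "free(hand)", "-holding(red)"})),
]
print(plan({"clear(red)", "free(hand)"}, frozenset({"on_shared(red)"}), actions))
# ['pick(red)', 'place_shared(red)']
```

Because the horizon grows one step at a time, the first plan found is a shortest one, which is also what the incremental step(t)/check(t) scheme yields when solving stops at the first satisfiable horizon.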

Goal-Network ASP Planner
To integrate the goal-network concept into the ASP planner, we enrich the previous compilation with the following predicates:
- selected_goal(G,T1,T2): the goal selected among the minimal ones;
- relevant(M,T): the method M is relevant for the selected goal at time step T;
- relevant(A,T): the action A is relevant for the selected goal at time step T;
- goal_to_sat(G,T1,T2): G is the id of the goal, T1 is its creation time and T2 is the current time. We need the creation time because the same goal may appear more than once in the same goal-network;
- prec_to_sat(G1,T1,G2,T2): where T1 and T2 are creation times, this predicate encodes the precedence relation between the introduced goals (G1,T1) and (G2,T2).
We also enriched the sets of domain-dependent and -independent rules with the following rules.

Domain-Dependent Rules
- For each goal G labelled with a set of literals {l_1, ..., l_n}, we introduce the fact goal_labeled(G,l_i) for each i.
- For each method M that, upon execution, enriches the current poset with new goals and precedence relationships to be fulfilled, we introduce:
- For each method m(X_1, ..., X_n), we introduce the fact method(m(X_1, ..., X_n)) and specify its executability conditions.

Domain-Independent Rules
- We exploit ASP's expressive features to identify the goals that are currently minimal with respect to the precedence relation within the current poset of goals to be satisfied:
- We introduce a predicate to select a goal among the minimal ones:
- To execute a method or an action, we must guarantee that it is relevant for the selected_goal.
  • For methods, we introduce the following rules:
  • For actions, we introduce the following rules:
- For the sake of readability, we omit the rules for the inertia of goal_to_sat. We just mention that, when a goal is satisfied in a state, it is no longer propagated.
- We also have to model what happens when a method occurs:
- We guarantee that at most one operation among methods and actions is executed at each time step.
It may happen that no operation occurs at all, namely when we select a goal that is already satisfied in the current state. In this case, we do not propagate this goal to the next step.
As in the previous encoding, all the rules that contain the term t belong to the subprogram #program step(t).To check the satisfaction of the goal we also add the following subprogram #program check(t): This states that after the execution of the last action or method, we do not want any goal in the current goal-network, i.e. there is no goal to satisfy.
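The following Python sketch mimics these rules outside ASP, assuming goals are identified by (id, creation time) pairs as in goal_to_sat/3. At each step it selects a minimal goal and drops it if it is already satisfied (no operation at that step); otherwise it applies one relevant method, which adds ordered sub-goals that precede the selected goal, or one relevant action, i.e. one whose effects include a goal literal. It succeeds when the goal-network is empty. A depth-first search with a fixed horizon stands in for the answer set solver's search; the way a method orders its sub-goals and all domain names are assumptions, loosely modelled on the 'pick C1 first, then C2' method discussed in the experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset
    eff: frozenset            # '-f' makes f false


@dataclass(frozen=True)
class Method:
    name: str
    for_goal: str             # id of the goal this method is relevant for
    pre: frozenset
    subgoals: tuple           # goal ids to satisfy, in this order, before for_goal


def opposite(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit


def apply(state, action):
    return frozenset(l for l in state if opposite(l) not in action.eff) | action.eff


def minimal(network, prec):
    """Goals in the network with no pending predecessor (cf. prec_to_sat/4)."""
    return [g for g in network if not any(a in network and b == g for a, b in prec)]


def solve(labels, actions, methods, state, network, prec, t=0, horizon=10):
    """Depth-first search, one operation per step; goals are (id, creation_time)."""
    if not network:                                   # check(t): nothing left to satisfy
        return []
    if t == horizon:
        return None
    for g in sorted(minimal(network, prec)):          # candidate selected_goal
        gid, _ = g
        target = labels[gid]                          # goal_labeled literals
        if target <= state:                           # satisfied: no operation, goal dropped
            options = [(None, state, network - {g}, prec)]
        else:
            options = []
            for m in methods:                         # relevant methods expand the poset
                if m.for_goal == gid and m.subgoals and m.pre <= state:
                    new = [(sg, t) for sg in m.subgoals]          # created at time t
                    order = set(zip(new, new[1:])) | {(new[-1], g)}
                    options.append((m.name, state, network | set(new), prec | order))
            for a in actions:                         # relevant actions achieve a goal literal
                if a.pre <= state and a.eff & target:
                    options.append((a.name, apply(state, a), network, prec))
        for op, s2, net2, prec2 in options:
            rest = solve(labels, actions, methods, s2, net2, prec2, t + 1, horizon)
            if rest is not None:
                return ([op] if op else []) + rest
    return None


labels = {
    "hold_red": frozenset({"holding(red)"}),
    "hold_black": frozenset({"holding(black)"}),
    "black_away": frozenset({"on(black,stack2)"}),
}
actions = [
    Action("pick(black)", frozenset({"clear(black)", "free(hand)"}),
           frozenset({"holding(black)", "-free(hand)", "-clear(black)", "-on(black,red)", "clear(red)"})),
    Action("place(black,stack2)", frozenset({"holding(black)"}),
           frozenset({"on(black,stack2)", "free(hand)", "-holding(black)"})),
    Action("pick(red)", frozenset({"clear(red)", "free(hand)"}),
           frozenset({"holding(red)", "-free(hand)"})),
]
methods = [Method("clear_then_pick", "hold_red", frozenset({"on(black,red)"}),
                  ("hold_black", "black_away"))]
init = frozenset({"on(black,red)", "clear(black)", "free(hand)"})
print(solve(labels, actions, methods, init, {("hold_red", 0)}, frozenset()))
# ['clear_then_pick', 'pick(black)', 'place(black,stack2)', 'pick(red)']
```

Note how the relevance filter prunes the search: at the first step, pick(black) is not even considered because it achieves no literal of the selected goal, so only the method can fire.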

Macros in MEP
As already mentioned, a significant contribution of this work is the formalization, and consequent employment, of macros in the MEP setting. A macro, which can be informally described as 'an encapsulated sequence of elementary planning operators', is formally defined as follows:
DEFINITION 5 (Macro). Let D(i), D(j) ∈ D(AI) be two action instances, and s ∈ D(AG) be an e-state of a given domain D. A macro m_{i,j} ∈ D representing the execution of j immediately after i is defined by Φ(m_{i,j}, s) = Φ(j, Φ(i, s)). Let us note that i) we assume that if an action a is not executable in a state s, then the result of the update Φ(a, s) is ⊥; and ii) the execution of any action a over ⊥ results in ⊥ as well.
The introduction of macros is justified by the fact that patterns of actions performed in sequence are often repeated within the same domain. For example, an ontic action is frequently followed by an announcement action, because an agent may want to communicate the results of the former to another (oblivious) agent.
As pointed out at the Dagstuhl seminar [3], many classifications of epistemic actions have been proposed. Nonetheless, the community mostly agrees that the basic classification comprises ontic, announcement and sensing actions. Among these three types, only ontic actions have an effect on the physical world. That is why we consider the remaining two, i.e. sensing and announcement, as purely epistemic actions. Consequently, we say that a task is purely epistemic if it involves only announcement or sensing actions, or a macro aggregating these two types. In Listing 1.1, the function call is_pure_epistemic_task(task) checks exactly this property.
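Definition 5 amounts to function composition with ⊥ propagated. The sketch below illustrates it in Python under two simplifying assumptions: an e-state is abstracted as an opaque set of facts (real e-states are richer, Kripke-like structures), and `None` plays the role of ⊥. The action functions and fluent names are hypothetical; `is_pure_epistemic_task` follows the property stated in the text, taking the kinds of a task's components as input.

```python
from typing import Callable, Optional

State = frozenset                                  # opaque stand-in for an e-state
Update = Callable[[Optional[State]], Optional[State]]  # Φ(a, ·); None plays the role of ⊥


def macro(i: Update, j: Update) -> Update:
    """Φ(m_ij, s) = Φ(j, Φ(i, s)), with ⊥ propagated through the composition."""
    def m(s: Optional[State]) -> Optional[State]:
        mid = i(s) if s is not None else None      # any action over ⊥ results in ⊥
        return j(mid) if mid is not None else None
    return m


def from_shared_to_private(s):
    # ontic: executable only if the red block is on the shared table
    if "on_shared(red)" not in s:
        return None
    return (s - {"on_shared(red)"}) | {"owns(robot1,red)", "shared_free"}


def announce_shared_free(s):
    # announcement: the other agent comes to believe that the shared table is free
    if "shared_free" not in s:
        return None
    return s | {"B(robot2,shared_free)"}


def is_pure_epistemic_task(kinds):
    """A (possibly macro) task is purely epistemic if every component is an
    announcement or a sensing action."""
    return all(k in ("announcement", "sensing") for k in kinds)


m = macro(from_shared_to_private, announce_shared_free)
print(sorted(m(frozenset({"on_shared(red)"}))))
# ['B(robot2,shared_free)', 'owns(robot1,red)', 'shared_free']
print(m(frozenset()))                              # the ontic step is not executable
# None
print(is_pure_epistemic_task(["ontic", "announcement"]))
# False
```

The macro in this example is ontic followed by an announcement, so it is not purely epistemic and, in ECHO's terms, would still require refinement by the classical planner.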

Case study
We can now present an application of ECHO in a two-agent scenario. The framework presented here has been validated in a simple multi-agent scenario involving two Franka Emika robots. Franka Emika is a robotic arm with 7 Degrees Of Freedom (DOF) and torque sensors at each joint. It is also equipped with a gripper that allows the arm to handle objects. The client side of the Franka Control Interface is called libfranka. At a higher level we find franka_ros, a ROS [42] package that contains the description of the robot and the end-effector in terms of kinematics, joint limits, visual surfaces and collision space, a hardware abstraction of the robot for the ROS control framework based on the libfranka API, and a set of services to expose the full libfranka API in the ROS ecosystem. Let us note that, although the target robots are two Franka Emika arms, the architecture can be adapted to other manipulators with only minor changes. As is usual in robotics domains, the problem instances contain several fluents and actions.
The two agents involved are the two robotic arms, called robot1 and robot2. In front of each of them, three stacks of coloured blocks are placed on a private table. Figure 2a presents an example of a state. Initially, the two robots may not know which blocks are placed in front of them, but they can execute a sensing action to check whether they have a particular block. They can also move a block from the top of a stack to the top of another stack or to a shared table between the two robots. We assume that a robot can communicate the colour of a block to the other arm only once it has placed that block on the shared table. The examples in Figure 2 were obtained using the RViz simulator [42] integrated into MoveIt!.

Modelling the Problem Instance
When we model a scenario as a planning problem in order to eventually execute it, we need to address the 'anchoring' problem, i.e. we need to connect the model description to the physical objects in the real (or simulated) world. Let us take the process of grasping, executed by a robotic arm, as an example. Grasping is usually composed of different phases: (i) pre-grasping, in which the end-effector approaches the object to be grasped with a given direction and orientation; (ii) actual grasping, in which the final joints of the gripper close; and (iii) post-grasping, in which the end-effector moves away from the position in which it grasped the object with a given direction and orientation. During each of these sub-processes, we must ensure that no collisions occur between the robotic arm and the objects in its workspace. In the approach we adopted, we hard-coded the execution of the three grasping sub-processes in the ROS program, modelling only the 'act of grasping' at the action description level. Since this is not the focus of this article, we just mention that we used the MoveIt! library to compute collision-free trajectories. MoveIt! is provided with information about the location of objects to compute trajectories with fast algorithms, such as Rapidly-exploring Random Trees. The initial state of the domain, the effects of the actions and some static information about the environment, such as the position of the shared table, are used to update the current state of the MoveIt! environment before and after the execution of each action, under a closed-world assumption.
FIGURE 2. Examples of the robotic test environment states. These images have been generated using RViz. A video of the 'real' execution with the Panda Robots is available: http://clp.dimi.uniud.it/sw/.
The two-level action domain setting of ECHO allows the modeller to decide what to model at the MEP level and what at the classical level. The main idea is that the classical action description domain can be viewed as a refinement of the ontic actions of the MEP action description domain. For instance, we are not interested in modelling the picking and placing actions as ontic actions at the epistemic level; we just want an ontic action that models moving a block from the private table to the shared table and vice versa. Furthermore, the set of MEP fluents is a subset of the classical fluents. Fluents that appear at the MEP level are the only ones over which we want to characterize the beliefs of agents. However, to make the agents effectively act in the real world, we may enrich the classical description with further fluents. In this case study, we just want to express at the MEP level the fact that an agent owns a block, and we push down to the classical level the fact that a block may be on top of another block.
For the sake of readability, we will not report all the actions' descriptions; rather, we will show only some meaningful examples. Since ECHO provides an intuitive and user-friendly Python API, we show how to describe a few actions of both an MEP and a classical domain, and how they are then compiled into E-PDDL and ASP, respectively. The initial state description and the fluents encoding are also presented. We can express both classical actions and epistemic tasks with the provided ECHO API. Since actions are a fundamental part of the domain description, let us show some examples of their encoding in Listing 1.4. In particular, we present the following actions:
- pick, a classical action that describes the grasping of a block situated on top of a stack;
- from_private_to_shared, an ontic task that describes the process of moving an object from a private table to the shared table;
- check, a sensing task in which the robot checks whether there is a block of a specific colour in front of it.
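ECHO's Python API and the content of Listing 1.4 are not reproduced in this text, so the following is only a hypothetical sketch of how the three actions above could be represented as data: a classical action with preconditions and effects, and epistemic tasks tagged with their kind (ontic or sensing). The class names, the `ground` helper and the fluents `top`, `hand_free`, `owns` and `on_shared` are assumptions chosen to match the case study description, not ECHO's actual definitions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassicalAction:
    name: str
    params: tuple
    pre: frozenset
    eff: frozenset                      # '-f' makes f false


@dataclass(frozen=True)
class EpistemicTask:
    name: str
    kind: str                           # "ontic", "sensing" or "announcement"
    params: tuple
    pre: frozenset
    eff: frozenset = frozenset()        # physical effects (ontic tasks)
    sensed: frozenset = frozenset()     # fluents whose truth value is observed (sensing)


def ground(literals, binding):
    """Replace every ?variable in the literals according to binding."""
    def sub(match):
        return binding.get(match.group(0), match.group(0))
    return frozenset(re.sub(r"\?\w+", sub, lit) for lit in literals)


pick = ClassicalAction(
    "pick", ("?r", "?b", "?s"),
    pre=frozenset({"top(?b,?s)", "hand_free(?r)"}),
    eff=frozenset({"holding(?r,?b)", "-hand_free(?r)", "-top(?b,?s)"}),
)
from_private_to_shared = EpistemicTask(
    "from_private_to_shared", "ontic", ("?r", "?b"),
    pre=frozenset({"owns(?r,?b)"}),
    eff=frozenset({"on_shared(?b)", "-owns(?r,?b)"}),
)
check = EpistemicTask(
    "check", "sensing", ("?r", "?c"),
    pre=frozenset(),
    sensed=frozenset({"owns(?r,?c)"}),
)

print(sorted(ground(pick.eff, {"?r": "robot1", "?b": "black", "?s": "stack1"})))
# ['-hand_free(robot1)', '-top(black,stack1)', 'holding(robot1,black)']
```

Consistently with the two-level modelling described earlier, the epistemic tasks mention only `owns` and `on_shared`, while the stacking fluent `top` appears only in the classical action.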
As previously mentioned, during the execution of ECHO these actions are compiled into ASP and E-PDDL. Listing 1.5 shows a fragment of the compiled representation of the actions defined in Listing 1.4. For a detailed explanation of the E-PDDL syntax, we refer the reader to [20].
Let us now consider what happens when an epistemic ontic action appears in the epistemic plan. Consider a configuration in which the red block is stacked under a black block and the epistemic ontic action requires moving the red block onto the shared table. The robot cannot directly take the red block: it must first move the black one to the top of another stack and then take the red block to put it on the shared table. Therefore, such an epistemic ontic task may be translated into the sequence of actions presented in Listing 1.6.
LISTING 1.4 Examples of actions.
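In ECHO this sequence is computed by the classical planner, and Listing 1.6 is not reproduced in this text. The hand-written Python procedure below only illustrates the shape of the refinement for the stacked-block situation above: blocks above the target are moved one by one onto another stack, then the target is picked and placed on the shared table. The function name, the action names and the policy of always using the first other stack are hypothetical.

```python
def refine_to_shared(stacks, block):
    """Hypothetical refinement of 'move block to the shared table'.

    stacks maps a stack name to its blocks listed from bottom to top; at least
    one other stack is assumed to exist to host the blocks that are moved away.
    """
    stacks = {name: list(blocks) for name, blocks in stacks.items()}  # work on a copy
    src = next(name for name, blocks in stacks.items() if block in blocks)
    dst = next(name for name in stacks if name != src)  # simplest choice: first other stack
    actions = []
    while stacks[src][-1] != block:                 # clear every block above the target
        top = stacks[src].pop()
        stacks[dst].append(top)
        actions += [f"pick({top},{src})", f"place({top},{dst})"]
    stacks[src].pop()
    actions += [f"pick({block},{src})", f"place_shared({block})"]
    return actions


print(refine_to_shared({"stack1": ["red", "black"], "stack2": ["blue"], "stack3": []}, "red"))
# ['pick(black,stack1)', 'place(black,stack2)', 'pick(red,stack1)', 'place_shared(red)']
```

A classical planner reaches the same kind of sequence without such a hard-coded policy, which is why ECHO delegates the refinement to it.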

Experimental results
To further assess and demonstrate the capabilities of the developed framework, we tested our architecture on a wide spectrum of scenarios that stem from the configuration described in Section 4. The experiments showed the positive impact of using macros in the MEP setting. Furthermore, we were also able to demonstrate the feasibility of the solving procedure in real-world situations.
All the experiments were conducted on an Intel i7-8565U CPU at 1.80GHz running Ubuntu 18.04. For each problem instance, a time limit of 120 s (2 min) was applied. Clingo [23] was used to solve the ASP encodings, and EFP [17] to tackle the resolution of MEP problems. The benchmarks used in this manuscript and other encoding examples can be found at http://clp.dimi.uniud.it/sw/. Besides the domain presented in Section 4, we also formalized (i) the 'pure' classical Blocksworld domain [29]; (ii) the Coin in the Box [34,37] domain, which does not have actions that require refinement; (iii) the Table domain, which is used to show an example of a domain enriched with the knowledge to form a goal-network; and (iv) the well-known Gossip Problem [15].
Table 1 reports a comparison of three solving approaches for an MEP domain. In particular, we compare the solving times of EFP, the solver used as the epistemic planner in ECHO, with two variations of ECHO itself: the plain version of ECHO and the one enriched with the concept of macros. This allows us to highlight how the use of macros can help in reducing the solving time.
LISTING 1.5 Examples of ontic actions.
LISTING 1.6 List of single-arm actions.
The test cases are obtained as the cross-product of goals of increasing difficulty with an increasing number of blocks in the domain (rows and columns of Table 1, respectively). In particular:
- α requires one agent to know the colour of at least one block initially located on the table in front of the other robot;
- β has the same goal as α, while also requiring the shared table to be free at the end of the plan;
- γ requires a robot to know two different colours of blocks initially placed in front of the other arm;
- δ shares the goal of γ with an 'extra' condition imposing that the shared table must be free when the planning process is concluded.
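The structure of the benchmark grid, including which cells are undefined, follows directly from the goal descriptions above: α and β need at least one colour, γ and δ at least two. The sketch below builds that grid in Python; the range of one to four colours is an assumption (it matches the colour counts for which initial-state times are reported), and the instance labels are hypothetical.

```python
# Minimum number of colours each goal needs, derived from the goal descriptions.
MIN_COLOURS = {"alpha": 1, "beta": 1, "gamma": 2, "delta": 2}


def instance_grid(colours):
    """Rows are goals, columns are numbers of colours; 'ND' marks undefined instances."""
    return {goal: {c: "ND" if c < need else f"{goal}_{c}" for c in colours}
            for goal, need in MIN_COLOURS.items()}


grid = instance_grid(range(1, 5))
for goal, row in grid.items():
    print(goal, list(row.values()))
# alpha ['alpha_1', 'alpha_2', 'alpha_3', 'alpha_4']
# beta ['beta_1', 'beta_2', 'beta_3', 'beta_4']
# gamma ['ND', 'gamma_2', 'gamma_3', 'gamma_4']
# delta ['ND', 'delta_2', 'delta_3', 'delta_4']
```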
For the sake of readability, we use the following notation in Table 1:
- ND stands for Not Defined. This is used, e.g., to indicate that instance γ cannot be performed with just one colour, as it requires a robot to learn two different colours while only one is available.
- TO stands for Time Out. As said before, after 120 s the planning procedure is forcefully stopped if it has not found a solution.
From the domain, it is evident that, when an agent takes a block from the shared table to place it on top of one of its stacks, the agent often announces that the table has been freed. This small sequence of actions constitutes the macro announce:from_private_to_shared. As mentioned before, in Table 1 we compare the solving process when this macro is not activated (ECHO), when it is activated (ECHO-M), and when we only employ EFP (EFP).
To summarize, Table 1 shows that EFP is always outperformed by ECHO and that macros considerably improve performance. This shows that ECHO, especially when coupled with macros, allows epistemic domains to be tackled in reasonable times. ECHO achieves better scalability both in the number of fluents and in plan length, thereby introducing the possibility of reasoning at an epistemic level in complex multi-agent environments.
For the sake of completeness, let us now address the initial state generation. This is the first step performed during the computation of the epistemic plan. We report evaluations for this task since it is well established in the epistemic planning literature that generating the initial state from its description is a very resource-heavy problem that also requires specific formalization constraints [17,49]. This process is empirically almost independent of the use of macros, but it is affected by the number of fluents and actions used in the planning model. To better exemplify this, let us report the average times needed to compute the initial states as the number of fluents, in our case the number of coloured blocks, increases: (1) for one colour 0.003 s; (2) for two colours 0.031 s; (3) for three colours 0.126 s; and (4) for four colours 2.117 s. As a consequence of this observation, a good rule of thumb is to limit the fluents considered to those strictly necessary for epistemic planning.
As a design choice, we decided to run the classical planner at run-time, each time it is required to break down an ontic task into a sub-plan of classical actions. This choice is justified by Table 2, which reports, for each instance, the cumulative time required by the A V solver and the number of times it was called. Let us note that the times required by the epistemic planner without the support of the classical one, i.e. if the domain were entirely described at the epistemic level, always exceeded the timeout.
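A quick computation on the average initial-state generation times reported in the text (0.003, 0.031, 0.126 and 2.117 s for one to four colours) makes the growth explicit: the increase is not uniform, but adding the fourth colour multiplies the time by roughly 17, and going from one to four colours by roughly 700.

```python
# Average initial-state generation times (seconds), by number of colours.
times = {1: 0.003, 2: 0.031, 3: 0.126, 4: 2.117}

# Growth factor when one more colour (and thus more fluents) is added.
growth = {c: times[c] / times[c - 1] for c in (2, 3, 4)}
for c, g in growth.items():
    print(f"{c - 1} -> {c} colours: x{g:.1f}")
# 1 -> 2 colours: x10.3
# 2 -> 3 colours: x4.1
# 3 -> 4 colours: x16.8
```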
Finally, let us justify the introduction of the goal-network formalism, which can replace the 'plain' classical solver as the subroutine that breaks down ontic tasks into simpler actions executable by a robot. In what follows, we illustrate the main characteristics of the scenarios in which the goal-network formalism provides advantages with respect to the 'plain' solver: (i) the actions that are (usually) needed to satisfy the goals are just a small subset of all the available actions; (ii) specific domain knowledge is available and can be formalized and injected into the solving process through the goal-network; (iii) a specific execution order among the actions needs to be enforced. In what follows, we provide experimental results supporting (i) and (ii). As for condition (iii), we argue that, while it could be met by modifying the actions' preconditions or by iterating over the successful plans produced by the 'plain' solver (discarding those that do not respect the required action order), the use of a partially ordered set of goals is far more elegant and clean.
To better highlight the comparison between the 'plain' and the goal-network formalisms, we decided to remove the epistemic components from the problem description. This means that the case study is a restricted version of the main one previously discussed, where we require a robot to move the block at the bottom of a stack to the shared table.
We compared the 'plain' classical planner with the goal-network planner enriched with different types of knowledge. For the sake of readability, let us introduce some notation:
- plain represents the 'plain' classical planner, which does not exploit any additional domain or instance knowledge.
- domn represents the goal-network planner used in combination with knowledge at the domain level. This extra information, shown in Listing 1.7, specifies the following method: in order to pick a block C2 that is currently under block C1, you have to (i) pick C1 first, (ii) place C1 in another location and (iii) finally pick C2.
- inst represents the goal-network planner used in combination with knowledge at the instance level. This knowledge is encoded in Listing 1.8 and suggests a sequence of sub-goals that need to be satisfied in order to place the red block on the shared table. Note that such a sequence depends strongly on the configuration of the stacks.
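Listing 1.8 is not reproduced in this text. To illustrate how instance-level knowledge of the inst kind can be expressed as a poset, the Python sketch below turns an ordered list of sub-goals into initial facts that use the argument order of goal_to_sat(G,T1,T2) and prec_to_sat(G1,T1,G2,T2) defined earlier, with every goal created and pending at time 0. Whether ECHO emits exactly these facts is an assumption, and the sub-goal names are hypothetical.

```python
def initial_poset_facts(ordered_goals):
    """Emit goal_to_sat/3 and prec_to_sat/4 facts for a chain of sub-goals
    g1 < g2 < ... < gn, all created at time 0 and pending at time 0."""
    facts = [f"goal_to_sat({g},0,0)." for g in ordered_goals]
    facts += [f"prec_to_sat({a},0,{b},0)." for a, b in zip(ordered_goals, ordered_goals[1:])]
    return facts


for fact in initial_poset_facts(["black_off_red", "hold_red", "red_on_shared"]):
    print(fact)
# goal_to_sat(black_off_red,0,0).
# goal_to_sat(hold_red,0,0).
# goal_to_sat(red_on_shared,0,0).
# prec_to_sat(black_off_red,0,hold_red,0).
# prec_to_sat(hold_red,0,red_on_shared,0).
```

A total order is the simplest poset; a partial order would only list the precedence pairs that actually matter, leaving the planner free to interleave the others.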
LISTING 1.7 Encoding of the domain knowledge exploitable by the goal-network planner.
LISTING 1.8 Encoding of the instance knowledge exploitable by the goal-network planner.
We also aimed to test our classical planning solver when the 'useful' actions constitute only a subset of all the possible actions. To evaluate this scenario, we incorporated dummy actions, which have no significant impact on the given planning problem and whose sole purpose is to increase the size of the planning description. Table 3 shows the execution times (in seconds) of plain, domn and inst on the case study (stripped of its epistemic component) when enriched with 0, 500, 1000 and 1500 dummy actions.
Table 3 shows that including knowledge about the planning domain and the single instance is advantageous in terms of computational time when the action description is large. Conversely, when the action description is small, the plain planner outperforms the goal-network approach due to the additional overhead introduced by the methods and goal rules in the ASP program. Our interest in potentially huge planning domains stems from the fact that our framework is designed to refine ontic actions, even though it can also be used for pure classical planning problems. In principle, the action description could be significantly large, and for each refinement task only a fraction of it may be necessary. Notably, the inst configuration yields better results than the other methods thanks to the guided search for the plan, which requires sub-goals to be satisfied in a specific order. Although one may argue that this approach is unrealistic in a planning scenario, we emphasize that the addition of instance-based knowledge is not mandatory but can be beneficial if available, and our approach facilitates its integration. Additionally, we observe that methods are more effective when they are not parameterized by numerous variables, due to the grounding phase; we suggest that a maximum of three variables is a suitable modelling choice.

Conclusions
While the field of cognitive robotics has made significant progress over the last few years, there are still some open questions regarding how to integrate new AI components. Moreover, multi-agent scenarios where the acting entities can perform low-level sensing and control tasks are becoming more and more available in both industrial and academic environments. At the same time, epistemic planning is receiving a lot of interest.
In [13] and [14], Capitanelli et al. proposed a set of PDDL+ formulations that allow modelling the problem of manipulating articulated objects in a three-dimensional workspace with a dual-arm robot. Instead, [6] addresses the same domains while encoding the planning problem directly in ASP, making extensive use of macros. Another similar tool that inspired our research is the ROSoClingo [2] ROS package. This framework integrates Clingo [23] into the ROS service and actionlib architecture, providing a high-level ASP-based interface to control the behaviour of a robot. While these works provided the foundation for our research and are far more complete tools, they consider neither the information flows between agents nor their knowledge. This aspect is key in every multi-agent scenario, where reasoning from the perspective of others should be taken into consideration.
That is why we proposed ECHO, a framework that offers the possibility of modelling MEP problems in 'rich' multi-agent environments, built upon the architecture first presented in [47]. This tool stems from the aforementioned approaches and integrates the latest MEP techniques to tackle planning problems while also considering the information flows. While integrating the epistemic aspect of multi-agent domains is, in our opinion, of utmost importance, it requires high computational resources. That is why ECHO also introduces two methods to improve planning times and make the solving process feasible: (i) a hierarchical usage of epistemic and classical planners to abstract and simplify the problems considered by the epistemic solver; and (ii) the employment of macros in epistemic planning.
While the presented results and the performance of ECHO are very promising, we are currently working on some aspects to further improve our architecture. First of all, we are studying how to automatically generate macros, action refinements and other components that are currently part of the input definition. We plan to do so by extrapolating experience from previous executions of ECHO, which would highlight which actions can form a macro or which of them can be easily abstracted into an epistemic task. Another idea we are currently investigating is to decouple ECHO from epistemic planning and use it to 'hierarchically' solve diverse types of planning problems, exploiting the highly optimized techniques derived from classical planning as a base. Finally, we are also working on improving the ASP planners themselves, to make them faster and more flexible, e.g. by permitting the definition of custom heuristics.

FIGURE 1. A class-based representation of the ECHO system.
Let us start by showing how to define types, fluents and variables in Listing 1.2. Then, in Listing 1.3, we present how to define predicates by providing the following examples:
- literals;
- disjunctions and conjunctions of predicates;
- belief and common knowledge formulas.
LISTING 1.2 Examples of types, fluents and variables.
LISTING 1.3 Examples of predicates.

TABLE 1.
Time, in seconds, to find a plan achieving the goal from a given initial state. Each instance varies the number of available colours. The fastest solving process with respect to the activation of the from_private_to_shared_announce and from_shared_to_private_announce macros is highlighted in boldface. All the values were obtained by averaging the times of 5 iterations on the same instance.

TABLE 2.
Cumulative time, in seconds, to break down ontic tasks into classical actions. The number of calls to the classical planner is reported in the column 'calls'.

TABLE 3.
Execution times, in seconds, to solve the case study (stripped of its epistemic component) domain with the diverse configurations of the classical planner.