Middleware for providing activity-driven assistance in cyber-physical production systems

Assistance is becoming increasingly relevant in carrying out industrial work in the context of cyber-physical production systems (CPPSs) and Industry 4.0. While assistance in a single task via a single interaction modality has been explored previously, crossdevice interaction could improve the quality of assistance, especially given the concurrent and distributed nature of work in CPPSs. In this paper, we present the theoretical foundations and implementation of MiWSICx (Middleware for Work Support in Industrial Contexts), a middleware that showcases how multiple interactive computing devices such as tablets, smartphones, augmented/virtual reality glasses, and wearables could be combined to provide crossdevice industrial assistance. Based on activity theory, MiWSICx models human work as activities combining multiple users, artifacts, and cyber-physical objects. MiWSICx is developed using the actor model for deployment on a variety of hardware alongside a CPPS to provide multiuser, crossdevice, multiactivity assistance.


Introduction
The production environment of the future is marked by an increasing use of cyber-physical systems (CPSs), in which physical processes and the means of their control (sensors, actuators, and processors) are both interconnected and distributed. In manufacturing, the terms cyber-physical production systems (CPPSs) and Industry 4.0 (Broy & Schmidt, 2014) are used. It is postulated that the advancement of communication technologies and machine-learning algorithms could automate many human activities in manufacturing, resulting in a Smart-CPS (Tepjit, Horváth, & Rusák, 2019). Nonetheless, the history of automation shows that while new technologies may compensate for human weaknesses, they often give rise to new strengths and weaknesses in unanticipated ways (Strauch, 2018), rendering human involvement unavoidable, even necessary. Accordingly, Becker and Stern (2016) predict the following five distinct qualities of future production work:

- Humans will be absolutely necessary in the factories of the future.
- New tasks will be more complex.
- New tasks will be intensely connected to computational devices.
- Easy and repetitive tasks will be automated.
- Unique human abilities will play a more significant role in human task design.

Scenario 2: activity assistance
Employee Y is assembling an order at a projection-based assembly station, where the tablet on his station is currently running an assembly application. Y is alerted to a fault on the manufacturing line via the smartwatch Y is wearing, an alert raised only when events of a certain severity are reported. The smartwatch also reports the location of the line and a QR code that grants access to the machine where the fault may have occurred, which Y transfers to the tablet. Y goes to the machine and scans the QR code, which activates a troubleshooting activity for the machine, and uses a pair of mixed reality (MR) glasses that show him the location of various sensors on the machine and their current readings. In addition, Y can see "augmented" notes made by other operators about points to look out for, in the form of textual information on the tablet and as associated 3D locations on the MR glasses - local minutiae that cannot be found in the generic troubleshooting guide. Y fixes the issue, which occurs only when this particular product configuration is manufactured, and writes an augmented note for future reference. After resolving the issue, Y takes off the glasses, returns to the assembly station with the tablet, switches back to the assembly application, and resumes work on assembling the product.
Scenario 2 is reimagined as a crossdevice, multiuser scenario where assistance adapts to the users' information needs and respects the situational view of problem solving. It takes an activity-centric view of work, where activities overlap and, sometimes, override one another. Further, it acknowledges the need to capture the evolving nature of human understanding and knowledge at work, and the fact that activities on the shop floor can span many machines. The devices here play a pivotal role in helping the user switch between different modes of work, each involving inputs from different devices and, possibly, other users as well. In short, scenario 2 describes a situation in which assistance is activity-driven:

- it presents relevant information through a combination of interactive interfaces and devices most suited to the nature of the underlying cyber-physical entities, and
- it can be adapted to the flow of work at the workplace.
In order to realize the second scenario of activity-driven assistance in a CPPS context, an intermediary is needed: a system that can

- communicate with the different, interconnected low-level system components, and simultaneously
- manage crossdevice interaction as operators initiate, stop, and switch between activities.
The most commonly used term for such a solution is middleware. In this paper, we elaborate on the design and prototype implementation of such an intermediary - MiWSICx (Middleware for Work Support in Industrial Contexts). MiWSICx is a middleware solution that mediates the interaction across industrial systems and multiple devices and users to support them in an informationally and spatially fluid manner.
The rest of the paper proceeds as follows: Section 2 describes how the nature of work in a CPPS also affects the design of interactive solutions for operators, leading to requirement formulation. Section 3 discusses the role of activity theory (AT) in providing a basis for comprehending these requirements, and presents the fundamentals of AT to derive a framework that applies it in the CPPS domain. Further, in Sections 4 and 5 this framework is applied to create an ontology and metamodel for MiWSICx, and the design and deployment of MiWSICx to support CPPS activities are demonstrated. Section 6 elaborates on some of the unresolved issues and sketches themes for further exploration, and Section 7 concludes this paper.

Nature of Work in a CPPS
Industrial control systems have, until now, been developed to realize distributed, real-time system control and long-term stability. Manufacturing and process control applications utilize several independent industrial communication technologies (Modbus, Ethernet, and Profinet, to name a few), and the possibilities of human-machine interaction have traditionally been tightly coupled to these systems, in the form of dedicated industrial human-machine interfaces (HMIs) located on site or in control rooms that provide an overview of the state of the system. As automation increases, Industry 4.0 envisions a future consisting of smart factories as opposed to deserted factories (Spath et al., 2013). According to Gorecky, Schmitt, Loskyll, and Zühlke (2014), in smart factories human workers are seen "as the most flexible entity in CPPSs," as they will be "faced with a large variety of jobs ranging from specification and monitoring to verification of production strategies." In the following subsections, the link between the technical qualities of future production and their impact on human work is explored.

Scalability, concurrency, and operational flexibility
Moving beyond the principles of cellular manufacturing and group technology, models of manufacturing in Industry 4.0 propose distributed, flexible, scalable, and hence complex production environments. In such smart factories (Lucke, Constantinescu, & Westkämper, 2008), on-demand and completely customizable smart products (Abramovici, 2015) can be manufactured on the same manufacturing lines. Based on their capabilities, components can be arranged in many configurations - sequential, parallel, or hierarchical - in the form of distributed services (Bettenhausen & Kowalewski, 2013) or agent-based systems (Vogel-Heuser, Lee, & Leitão, 2015).
While technical flexibility may imply that technology can be arranged and made to operate in different modes and across various hierarchical levels, it does not naturally follow that the technology itself is capable of handling the resulting variability - instead, the variability will be passed on to human operators in the form of task complexity.

Increase in human task complexity
ElMaraghy, ElMaraghy, Tomiyama, and Monostori (2012) provide a comprehensive overview of the term complexity and its manifestation in manufacturing, assembly, and enterprise. According to the authors, a system could be complex in functional, structural, spatial, and temporal domains. For instance, the structural complexity in the design of a product is guided by the functional complexity required. Spatial complexity refers to the distribution of components in different locations, while temporal complexity refers to how events in time affect system behavior and lead to complex behavior. In the future workplace, human operators will have to confront all four aspects of complexity in a CPPS. Tasks will become increasingly distributed and concurrent, requiring information about the structure and function of the underlying system from a multitude of sources, increasing cognitive demands and the need for both explicit and tacit knowledge (Rasmussen & Lind, 1981; Byström & Järvelin, 1995; Vakkari, 1999; Becker & Stern, 2016).
The use of HMIs in the manufacturing industry has been shown to reduce the negative effect of system complexity for human operators (Guimaraes, Martensson, Stahre, & Igbaria, 1999). A combination of various interaction possibilities such as tablets, head-mounted displays, AR systems, and wearable technology could reduce the cognitive complexity of tasks in an Industry 4.0 workplace (Lucke et al., 2008; Valdez, Brauner, Schaar, Holzinger, & Ziefle, 2015), but how this combination could be achieved in an industrial context remains unexplored.

Multidevice versus crossdevice interaction
We could use multiple interactive devices to carry out tasks in two ways: simultaneous or sequential use (Brudy et al., 2019). In simultaneous use, single/multiple activities are distributed on multiple devices and executed in parallel. In sequential use, single/multiple activities are carried out in sequence on different devices. Whereas multidevice interaction could refer to either scenario, crossdevice interaction concerns the coordinated, fluid use of multiple devices by simultaneously or sequentially distributing interactive elements across these devices. The distribution strategy can be logical, spatial, or temporal (Brudy et al., 2019).
Since multitasking on multiple devices has been shown to reduce performance and increase strain, the indiscriminate use of multiple interactive devices could be detrimental to occupational health and safety (Paridon & Kaufmann, 2010). Consequently, much research effort in recent years has been devoted to constructing crossdevice workspaces, which comes with several challenges (Dearman & Pierce, 2008; Santosa & Wigdor, 2013; Jokela, Ojala, & Olsson, 2015). First, devices vary in their modalities, and while different modality combinations could be used to reduce cognitive overload (Elting, Zwickel, & Malaka, 2002), the device best suited to each task element has to be chosen. Second, since most interactive devices nowadays support multiple applications, a related issue involves switching among activities using the same device ecosystem depending on user needs, and maintaining this configuration across sessions in the background. Third, with use, data tend to spread across different devices and cloud platforms, resulting in fragmentation. Further, any multidevice ecosystem also needs to cater for combinations of individual or collaborative use of devices (Sørensen, Raptis, Kjeldskov, & Skov, 2014): one-user-one-artifact, one-user-many-artifacts, many-users-one-artifact, and many-users-many-artifacts. Therefore, designing applications for crossdevice interaction requires an approach that handles challenges at different levels. First, this approach needs to tackle the aspects of distributed interaction: parallel versus sequential use, specialized versus redundant use, and individual versus collaborative use. Second, it needs to address the maintenance of such a configuration of devices based on spatial, logical, and temporal strategies, and lastly, it has to manage sources of information in the background. In the next section, we describe how AT could inform this approach.
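To make the first challenge concrete, the following sketch illustrates one way of matching task elements to devices by modality coverage. It is not part of MiWSICx; the device names, modality sets, and scoring rule are all hypothetical.

```python
# Illustrative sketch: assigning a task element to the most suitable
# device in a small ecosystem, based on the modalities each device
# offers. Devices, modalities, and the scoring rule are hypothetical.

DEVICES = {
    "tablet":     {"modalities": {"touch", "text", "image"}},
    "smartwatch": {"modalities": {"haptic", "glance"}},
    "mr_glasses": {"modalities": {"spatial", "image"}},
}

def assign_device(required_modalities):
    """Pick the device covering the most required modalities."""
    best, best_score = None, -1
    for name, props in DEVICES.items():
        score = len(props["modalities"] & required_modalities)
        if score > best_score:
            best, best_score = name, score
    return best

# A textual instruction lands on the tablet, a spatial cue on the MR
# glasses, and a haptic alert on the smartwatch.
```

A real middleware would refine this with the spatial, logical, and temporal distribution strategies discussed above, rather than a single static score.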

Figure 2:
The mediated action triad, as conceptualized by Vygotsky (1980). S, A, and O denote the subject, artifact, and object, respectively.

Activity theory
Initially developed as a theory of psychology, AT shares some similarities with other goal-oriented approaches to human psychology, such as action regulation theory (Zacher & Frese, 2018). While the latter has been applied in the field of job and work design, AT has informed research and design in the HCI domain.
AT is a socio-cultural framework that analyzes human behavior as the realization of human motives and goals through activities. In the model developed by Leontiev (Leont'ev, 1978), an activity is what links any subject, human or nonhuman, to artifacts in the world in which this subject exists. Leontiev proposes a three-level hierarchy to describe human activity, shown in Fig. 1. At the first level, an activity realizes a motive by being directed at an object. For instance, a meeting at the office could be a motive for going to the office. At the second level, actions are carried out to realize conscious goals. In our example, that corresponds to the action of travelling to work, and in some cases, consciously deciding on a means of transport to do so. The object at this level would be the means of transport. At the third and final level, actions are accomplished by means of operations, internalized patterns of behavior acquired through learning or social interactions (Baerentsen & Trettvik, 2002). For example, when driving a car, an experienced driver is able to operate the steering wheel and shift gears without them being the focus of attention. Operations can exist only within the structure and conditions of actions; for example, the car's condition itself determines how well a driver is able to drive it.
An activity can be differentiated from another only when it is intended toward a different object, even though some actions may be common to both activities. In the previous example, the motive for going to the supermarket differs from going to work; however, as long as the second level object (car) and the conditions (route, weather, etc.) stay the same, similar actions and operations would be involved.
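Leontiev's three-level hierarchy can be sketched as a nested data structure. All class and field names below are illustrative and are not taken from the MiWSICx metamodel.

```python
# A minimal sketch of the activity/action/operation hierarchy as
# nested dataclasses, using the commuting example from the text.
from dataclasses import dataclass, field

@dataclass
class Operation:          # internalized, condition-dependent behavior
    description: str

@dataclass
class Action:             # realizes a conscious goal
    goal: str
    operations: list = field(default_factory=list)

@dataclass
class Activity:           # accomplishes a motive directed at an object
    motive: str
    object: str
    actions: list = field(default_factory=list)

commute = Activity(
    motive="attend a meeting",
    object="the office",
    actions=[Action(goal="travel to work",
                    operations=[Operation("steer"),
                                Operation("shift gears")])],
)
```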
An essential distinction between AT and other HCI theories is that AT tries to explain how humans move between the different levels of activity via mediation. Vygotsky (1980) introduced the tool, or artifact, as a mediator of the interaction between the subject and the object in an asymmetrical relationship (Fig. 2). Human interaction with objects in the environment rarely takes place without the use of physical and cognitive tools - the process of learning itself involves mastering the use of tools to achieve goals, be it solving a mathematical problem or preparing a meal.
In a complex activity system, numerous varieties of mediating artifacts may be involved. Wartofsky (2012) proposes a three-tiered hierarchy of artifacts. Primary artifacts are the most apparent in everyday operations, for example, pens, keyboards, etc. Secondary artifacts are representations of tools, as well as plans, explanatory models, and hypotheses. Tertiary artifacts are mobilized at the most abstract level of an activity system to comprehend it and to shape its course. In Fig. 3, we use the terms why, how/what, and which to denote the categories of mediating artifacts. "Why" artifacts represent externalized reasons to engage in activities, for example, maintenance schedules. "What" artifacts represent the statement of goals, and "how" artifacts show how to achieve them. The final category is "which," pointing to the physical artifacts used to achieve goals. No "how" artifact is needed at this level, since operations are internalized in the subject.
Nowadays, the process of digitalization is transferring many forms of physical artifacts, such as documents, into the digital domain; hence, all digitalized why/how/what artifacts are represented by an interactive "which" artifact (for example, desktops, tablets, wearable devices, etc.). This relationship is also shown in Fig. 3.
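The why/what/how/which vocabulary above can be captured as a small enumeration. The enum and the example mapping are illustrative only, drawing on the maintenance example used later in the paper.

```python
# Wartofsky's artifact hierarchy, mapped to the why/what/how/which
# labels used in the text. Names and examples are illustrative.
from enum import Enum

class ArtifactRole(Enum):
    WHY = "externalized reason to engage in an activity"
    WHAT = "statement of a goal"
    HOW = "description of how to achieve a goal"
    WHICH = "physical/interactive artifact used to act"

examples = {
    ArtifactRole.WHY: "maintenance schedule",
    ArtifactRole.WHAT: "work order",
    ArtifactRole.HOW: "maintenance manual",
    ArtifactRole.WHICH: "tablet",
}
```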

An activity-centric CPPS
While the notions of activity-based computing (Norman, 1986) and activity-centric computing (Bardram, 2011) have long entered the desktop computing environment (KDE.org, 2018), the potential of AT remains relatively unexplored in the industrial domain. While we are unsure of the reason behind this state of affairs, we postulate that it may have something to do with the socioeconomic dynamics of research in the HCI domain, which has traditionally focused more on the consumer segment. In this section, we establish the link between AT and CPPS.
We start with a human-centric (or anthropomorphic) CPS architecture for a smart factory proposed by Zamfirescu, Pârvu, Schlick, and Zühlke (2013), shown in Fig. 4. In this architecture, a CPS is divided into three components: the physical component (PC), the cyber/computational component (CC), and the human component (HC). Each of these components is connected outside the CPS to a specific physical, computational, and social dimension. Adaptors transfer information between pairs of these components. The model identifies the components of a CPPS and captures the relationships between them, but does not commit to any particular computational or interactive model for putting the framework to use. In this model, both the PC and CC can support interactions, the former through "special displays" and the latter via "classic HCI devices," both of which are forms of adaptors.
On closer inspection, the parallels between the models shown in Figs 4 and 2 are evident. Borrowing the terminology of AT, the intentional object is the CPS, while the coupling, or mediation, is supported by the adaptors. This mediating nature of the adaptors becomes apparent when one turns the model "inside out" and replaces the HC, CC, and PC in a CPPS by their activity-centric counterparts, as shown in Fig. 5. In an activity-centric CPPS, cyber-physical components are the objects of all control applications in a CPPS; they are information producers and carriers that support interaction, mediated by interactive artifacts. Using this model as an entry point, the physical and informational relationships between these entities can be better understood.
As the granularity of the activity changes from activity to actions and operations, so does the granularity of the objects and artifacts. For example, to carry out the maintenance of a machine, at the activity level, the object is the machine itself. Successfully carrying out the maintenance means that the machine is in running order. The "why" artifact at this level is the maintenance plan that helps initiate the activity. At the second level, actions represent the work to be done, for example, cleaning a part, checking the physical or electrical condition of parts, or replacing them. The "what" and "how" artifacts in use here are the maintenance manual, along with cognitive tools such as interactive devices. At the final level, no "how" artifact exists, since operations are internalized; they are not described but assumed to be carried out through a "which" artifact, for example, unscrewing a part or a subpart. The table in Fig. 6 summarizes these levels.
In a CPPS scenario, we use the word resource to denote the digital representation of why/what/how artifacts in the form of uniquely identifiable and locatable object representations, as well as the plans to manipulate, maintain, and modify them. A what resource could be a digital twin (Uhlemann, Lehmann, & Steinhilper, 2017) of a CPPS object in the Industry 4.0 context, whose related how artifacts provide additional information to the operator to carry out the actions supported by this object via an interactive artifact. A change in the state and properties of the associated CPPS object would be reflected in the what and how resources. Why resources are represented by plans and events that form the motivation behind carrying out an activity. The actions that are allowed depend on a combination of the capability of the mediating device and the properties of the resource - for instance, a tablet allows comfortable text manipulation, whereas doing so on a smartwatch would be cumbersome.

Activity contexts
In a crossdevice scenario, each interactive artifact may be delegated a different role, that is, to act as a mediator for different artifacts, feedback, or actions, depending on its instrumental (what it helps the user achieve) and operational (how it helps the user achieve it) capabilities (Bødker & Klokmose, 2011). Figure 7 shows a more practical way of looking at an activity. The motivation for performing actions here is the completion of a workpiece assembly. Interactive artifacts represent how/what artifacts, or mediate actions, or both, each playing to its own strength: the tablet is best suited for representing a how/what artifact, that is, instructions, while in situ projection draws the worker's attention to the objects of these instructions, that is, bins on the workstation. A smartwatch provides immediate, haptic feedback suited to a noisy environment to relay why events, along with natural interaction capabilities.
From an abstract perspective, an activity context can be seen as the distribution of interaction with resources among different devices, as illustrated by Fig. 8. Each device, owing to its instrumental and operational capabilities, affords different actions under the same activity. Individual what/why resources are assigned to a device, but could also be shared between some of the devices. The first approach favors specialization, while the second is more redundant, both in physical (Vernier & Nigay, 2000) and informational aspects (Moore, Chrysanthakopoulos, & Nielsen, 2007). The same concept can be extended to collaborative activities, where users, by definition, work toward a common objective. This does not imply that the users need to be involved in the same activity, only that the object of their activities is common. For example, two workers may carry out the maintenance of the same machine but perform different tasks. In the simplest case, the users share only the object and have their own individual interactive and resource artifacts. In a more complex case, they may also share the resource artifacts, as shown in Fig. 9.
Until this point, AT has been utilized to create an abstract representation of an activity context consisting of a combination of users, artifacts, activities, goals, and actions. In the following section, an ontology of the activity context is derived to implement the data model for the middleware (MiWSICx), following which a suitable technical framework is chosen to create a software architecture that can meet the technical requirements.

Summary and requirement formulation
Based on the discussions in the previous sections, the following technical and functional requirements were derived for MiWSICx. Technically, the middleware needs to

- be distributed;
- support concurrency;
- be scalable;
- reduce implementation complexity.
As for functional requirements, MiWSICx will have to

- communicate with various artifacts and objects over different channels;
- provide access to resources and services;
- support activity-centric computing for multiple users and on various devices.
Based on these requirements, the following sections elaborate on the design of MiWSICx. Figure 10 shows the activity ontology in MiWSICx. Bardram (2011) and Moran, Cozzi, and Farrell (2005) have developed similar ontologies, but ours differs in two aspects. First, the concept of an object is realized as a networked CPPS object, and second, the concept of a resource is introduced as the facilitator of this interaction. Figure 11 shows the derived entity structure for MiWSICx. An activity context entity aggregates different activities for a user; a user is associated with an activity context, defined as a persistent entity that can be saved and reloaded when the user leaves and reenters it.

Ontology
An artifact, or a device, is what mediates user action, and is uniquely identifiable by its description, name, and location. Most digital devices support various modalities and communication interfaces, termed capabilities, through which they can exchange information. An activity can contain a combination of such devices.
An activity consists of resources currently under use along with the devices a user is interacting with. To adapt to the contextual constraints in an environment, devices are not persistently saved with the activity or the activity context, but added to the activity each time a new device connects.
Further, the term action encapsulates the corresponding changes of state that an abstract entity affords. For instance, an activity context supports actions to process user login and to switch between different activities, whereas an activity entity supports changes in activity states and the corresponding assignment to devices. Resources support actions that provide a service as a change in the resource itself or in the states of its underlying objects, for which the objects also need to support actions themselves. Therefore, in conjunction with the action entity, almost every entity also contains a state, which signifies the semantic representation of the entity's condition brought about by an action.
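The entity structure described above can be condensed into a sketch like the following. The field names are assumptions derived from the prose; the actual MiWSICx metamodel may differ in detail.

```python
# Condensed sketch of the MiWSICx ontology entities: devices are
# attached to activities at runtime rather than persisted, and the
# activity context is the persistent, per-user aggregate.
from dataclasses import dataclass, field

@dataclass
class Device:
    name: str
    description: str = ""
    location: str = ""
    capabilities: set = field(default_factory=set)

@dataclass
class Resource:
    uri: str                 # uniquely identifiable and locatable
    state: dict = field(default_factory=dict)

@dataclass
class Activity:
    name: str
    state: str = "inactive"
    resources: list = field(default_factory=list)
    devices: list = field(default_factory=list)   # not persisted

@dataclass
class ActivityContext:       # persistent, saved/reloaded per user
    user: str
    activities: list = field(default_factory=list)

    def switch_to(self, name):
        """Activate one activity and suspend the others."""
        for a in self.activities:
            a.state = "active" if a.name == name else "suspended"
```

Switching activities, as in scenario 2, then amounts to a single `switch_to` call on the user's context.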

Description entities and types
The description object encapsulates basic entities through whose combination an entity can be fully described, as shown in Fig. 12:

- when: describes a point in time, in the form of a timestamp in combination with a timeout value.
- where: describes a location in space, consisting of a reference point and a relative coordinate. A where object also contains a representation of this point in space, for example in the form of an image.
- representation: contains additional information about an entity that is helpful in representing this entity via some modality. A representation consists of either a text description or a resource.
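As a sketch, the three description entities map onto small value types. Field names and defaults below follow the prose but are assumptions, not the exact Fig. 12 schema.

```python
# Sketch of the when/where/representation description entities.
from dataclasses import dataclass
from typing import Optional

@dataclass
class When:
    timestamp: float                    # point in time
    timeout: Optional[float] = None     # validity window

@dataclass
class Where:
    reference: str                      # reference point
    offset: tuple = (0.0, 0.0, 0.0)     # relative coordinate
    image: Optional[bytes] = None       # visual representation

@dataclass
class Representation:
    text: Optional[str] = None          # either a text description...
    resource_uri: Optional[str] = None  # ...or a reference to a resource
```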

MiWSICx: Implementation
Section 3.4 listed the technical requirements for MiWSICx: distribution, concurrency, scalability, and low implementation complexity. In this section, the different approaches available for implementing such a middleware solution are compared.

Services, agents, or actors?
In choosing a foundation for implementing MiWSICx, it was necessary not only to take into consideration the management of activities and the associated user interaction, but also the possibility that these activities may involve distributed computational tasks such as image processing, data analysis, and machine learning in the background. Therefore, both asynchronous messaging and computational concurrency would have to be managed such that a deployment could manage multiple users, activities, and associated background processes at the same time.
The conventional approach would decouple communication from computation: communication would be handled by a shared-data-space/broker/publish-subscribe technology such as the Data Distribution Service (Pardo-Castellote, 2003), MQTT (Banks & Gupta, 2014), or ZeroMQ (Hintjens, 2013), in conjunction with a client-server model implemented as a service-oriented architecture (SOA).
While this decoupling would more than adequately support communication between systems and user devices, the problem of distributed computing would still remain unsolved at an individual system level. Computational concurrency would still have to be managed by each application at a local level, and applications would have to use a different interprocess communication protocol to exchange data between the local (within the application itself as well as within applications on the same system) and global (between systems) levels. This lack of transparency between the global and local scale means added complexity as different mechanisms for managing communication and concurrency would have to be maintained (function calls, threads, processes, interprocess communication, and messages between systems using a broker), which goes against our design aim to exert minimal effort in maintaining and deploying the system.
Searching further for a suitable foundation for MiWSICx, two approaches emerged as promising candidates - multiagent systems and the actor model. At a broad level, both frameworks share some similarities: both offer a means for concurrent computation by implementing communicating autonomous entities. The roots of both concepts can be traced back to the 1970s; however, they were designed to address different requirements. The actor model is a mathematical theory of computation that treats actors as the universal primitives of concurrent digital computation (Hewitt, Bishop, & Steiger, 1973). It was formulated to address some of the core issues of computing, namely, shared, mutable state (shared resources whose state changes with time), concurrency (activities that could be carried out in parallel where possible), and reconfigurability (new objects are created and can be communicated with after their creation; Agha, 1986). Multiagent systems, on the other hand, were conceived to harness ubiquity and interconnection by delegating tasks to intelligent agents capable of independent, autonomous action toward particular goals (Wooldridge, 2009). Over the years, multiagent systems have enjoyed widespread adoption in the industrial sector, whereas the actor model has gained popularity more recently in the web services sector.
In our view, the multiagent paradigm and the actor model address different levels of abstraction. While the actor model provides a solution to concurrent, distributed, and reconfigurable computation, multiagent systems are designed to solve problems intelligently and autonomously. In other words, a multiagent system could use the actor model as its foundation (and many agent frameworks have incorporated actor-like behavior), but the converse may not hold. Similarly, there is no inherent incompatibility between the actor model and an SOA - the former can simply replicate the latter without the need for an additional messaging subsystem.
Given the functional requirements described in Section 4, an agent-based model would also be feasible, but the actor model offers a simpler, more elegant solution, as described in the following sections.

Actor model
The actor model itself does not prescribe any programming language. It is a programming abstraction that only specifies requirements, which fall into three categories: the structure of an actor system, the behavior of actors, and the communication between them. Specifically, an actor is a computing abstraction that contains a local state, does not share this state with other actors, and is solely responsible for updating it.
Structurally, as shown in Fig. 13, an actor system comprises concurrent, autonomous entities, called actors, and the messages exchanged between them; an actor requires an immutable name, or address, for messages to be sent to it, and actors communicate exclusively by sending asynchronous messages to one another (Agha, 1986). Within an actor system, each actor is assigned a unique address upon its creation, and an actor is free to share this address with other actors. Addresses follow the principle of location transparency, which means that an actor may reside on the same core or processor, or on a different node in the network (Karmani, Shali, & Agha, 2009). Unlike typical applications, actors are also mobile, which means that they can be updated, moved across nodes, or reconfigured to handle varying loads without recompiling the application.
Therefore, the messaging mechanism in actors provides an abstraction that goes beyond the constraints imposed by publishsubscribe, remote-procedure call or a request-response service pattern, and yet, it provides a uniform interface on both global and local scales. Actors will communicate in the same manner, be it on the same hardware or the same network.
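The properties described above (a private local state, an addressable mailbox, and asynchronous fire-and-forget messaging) can be illustrated with a minimal sketch. The following is not MiWSICx or Thespian code; the names (Actor, Counter, registry) are our own, and a plain dictionary stands in for a location-transparent address space:

```python
import queue
import threading

class Actor:
    """Minimal actor sketch: a private state, a mailbox, and a thread
    that processes one message at a time."""
    def __init__(self, registry, address):
        self.registry = registry      # address -> actor lookup
        self.address = address        # immutable name other actors use
        self.mailbox = queue.Queue()  # buffer for asynchronous messages
        self.state = {}               # local state, never shared
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, address, message):
        # Fire-and-forget: the sender never blocks on the receiver.
        self.registry[address].mailbox.put((self.address, message))

    def _run(self):
        while True:
            sender, message = self.mailbox.get()
            self.receive(sender, message)

    def receive(self, sender, message):
        pass  # overridden by concrete actors

class Counter(Actor):
    def receive(self, sender, message):
        # Only this actor ever touches its own state.
        self.state["count"] = self.state.get("count", 0) + message

registry = {}
registry["counter"] = Counter(registry, "counter")
registry["main"] = Actor(registry, "main")
registry["main"].send("counter", 2)
registry["main"].send("counter", 3)
```

Because senders only hold addresses, the `registry` lookup could equally resolve to a network socket without changing the sending code, which is the essence of location transparency.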
In summary, actors offer a level of abstraction that combines message passing and computation in a single, elegant package. In addition, actors could not only be used to deploy the middleware for managing activities as presented in this paper, but also to run the associated data-intensive tasks, such as image processing, data analysis, and filtering applications, many of which could constitute the computational side of activities. Depending on one's choice of programming language, several flavors of actor models are available, the most popular being the Java Virtual Machine-based Akka Framework (Thurau, 2012), the C++ Actor Framework (Charousset, Hiesgen, & Schmidt, 2014), the C#-based Orleans Framework (Bernstein & Bykov, 2016), and the Python-based Thespian Actor Framework (Quick, 2018).

MiWSICx actors
MiWSICx has been developed in Python, chosen for its crossplatform compatibility, its dynamic typing system, and its rich repertoire of stable libraries serving both scientific and engineering needs. The Thespian Actor Framework (Quick, 2018) provides the actor model. In the following sections, the different actors in MiWSICx are explained. Figure 14 shows the core actors in MiWSICx. The root, or top-level, actor in MiWSICx is called by the same name. Its job is to start up the MiWSICx base system, which consists of:
- the comms actor, which maintains communication with devices;
- the resource manager actor, responsible for handling persistent data storage, activity templates, and loading activities at runtime;
- the activity context manager actor, which instantiates activities based on events and user requests;
- the activity manager actor, which runs the activity at a given time for a particular user;
- the discovery actor, which allows devices to discover the MiWSICx node;
- the external comms actor, which creates respective actors to communicate with nonactor-based services, such as RS485 networks or an OPC-UA server.

Core actors
The core actors are informed of each other's addresses at startup. Upon shutdown, the MiWSICx actor sends an exit message to all of its child actors, which are then responsible for shutting down their respective child actors. The communication between different actors during startup, initialization, the various stages of activity-driven interaction, and shutdown is detailed with the help of sequence diagrams in the appendix (Figs. A1 to A14).
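The cascading shutdown described above can be sketched as follows. This is an illustration of the exit-message propagation pattern, not MiWSICx code; the class and actor names are ours:

```python
class Node:
    """Sketch of hierarchical shutdown: an exit message cascades from
    the root down to every descendant."""
    def __init__(self, name):
        self.name = name
        self.children = []
        self.running = True

    def spawn(self, name):
        child = Node(name)
        self.children.append(child)
        return child

    def exit(self, log):
        # Each actor shuts down its own children before itself.
        for child in self.children:
            child.exit(log)
        self.running = False
        log.append(self.name)

root = Node("MiWSICx")
comms = root.spawn("comms")
comms.spawn("device_handler")
root.spawn("resource_manager")

shutdown_order = []
root.exit(shutdown_order)
# children stop before their parents; the root stops last
```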
Each MiWSICx node runs its actors in an actor system. Thespian combines multiple actor systems in a convention, with one actor system acting as the convention leader. Each actor system lists specific capabilities, which determine the kind of actors it can run. Upon starting up, an actor system registers itself with the convention leader, allowing actor systems to communicate with each other and, if needed, instantiate actors on other actor systems that support a specific capability.
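The capability-based placement decision can be sketched as a lookup performed by the convention leader. This is a simplified illustration of the idea, not Thespian's actual API; the node names and capability keys are hypothetical:

```python
def pick_actor_system(systems, required_capability):
    """Return the name of a registered actor system that advertises
    the required capability, or None if no system supports it."""
    for name, capabilities in systems.items():
        if capabilities.get(required_capability):
            return name
    return None

# Hypothetical registrations held by the convention leader:
registered = {
    "node_a": {"MiWSICx": True},               # convention leader
    "node_b": {"activity_context_1": True},
    "node_c": {"activity_context_2": True},
}

target = pick_actor_system(registered, "activity_context_2")
```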

Device handlers
When a device connects to a MiWSICx node, it announces its capabilities to the corresponding comms actor, which then creates a device handler actor that represents this device to MiWSICx. Once the device disconnects, the corresponding device handler actor is destroyed.
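The connect/disconnect lifecycle can be sketched as below. The class and method names are illustrative, not the MiWSICx API, and a dictionary entry stands in for the handler actor:

```python
class CommsActor:
    """Sketch of the device handler lifecycle: one handler per
    connected device, created on announcement and destroyed on
    disconnect."""
    def __init__(self):
        self.handlers = {}  # device id -> handler state

    def on_connect(self, device_id, capabilities):
        # The device announces its capabilities; a handler represents it.
        self.handlers[device_id] = {"capabilities": capabilities}

    def on_disconnect(self, device_id):
        # The handler is destroyed together with the connection.
        self.handlers.pop(device_id, None)

comms = CommsActor()
comms.on_connect("tablet-01", ["touch", "camera"])
comms.on_disconnect("tablet-01")
```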

Activity contexts
The activity context actor itself is responsible for routing messages within the context of a user activity, and it is a composite entity: the state of devices in an activity context, for example, is handled by the device handler actor, while the resources themselves are handled by the resource handler actor.
An activity context is started when a user logs in on a particular device. First, the activity context manager directs the request to the resources manager, which searches for an existing context for this user. If no context is found, a new activity context is created for the user. Any subsequent device connection with the same user name is added directly to the same activity context.
Once an activity context is created, the user has access to the services and resources available for either restoring previously paused activities or creating new ones. Only one activity is active at a time. As new users connect, each user is assigned their own activity context, and when a user logs out of the system, the context is saved by the resources manager. In the case of multiple users, the same activity could be assigned to two different contexts.
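The per-user lookup-or-create behavior described above can be sketched as follows. The names (ActivityContextManager, login) are illustrative, not the MiWSICx API:

```python
class ActivityContextManager:
    """Sketch of per-user context management: a returning user (or a
    second device with the same user name) joins the existing context;
    an unknown user gets a new one."""
    def __init__(self):
        self.contexts = {}  # user name -> activity context state

    def login(self, user, device):
        # setdefault reuses an existing context or creates a fresh one.
        context = self.contexts.setdefault(
            user, {"devices": [], "activities": []})
        context["devices"].append(device)
        return context

manager = ActivityContextManager()
manager.login("alice", "tablet")
manager.login("alice", "smartwatch")  # same context, second device
manager.login("bob", "hmd")           # a separate context
```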

MiWSICx: communication
Within actor systems, Thespian communicates by pickling the Python MiWSICx event entity. To communicate with devices that do not support actors or that use a different implementation of the actor model, the MiWSICx messaging protocol is used.

Events
Between the different actors, artifacts, and objects, the abstraction that suitably covers conveying prompts and actions is that of an event.
Events can be enriched semantically by defining a subtype, adding a payload, and prioritizing them according to the specific importance of the message. A prompt relating to an emergency situation, for example, is evidently more important than a prompt signaling the end of a process. Routing and filtering events requires that events specify their producers and consumers (Etzion, Niblett, & Luckham, 2011). This leads to the definition of an event in MiWSICx, which consists of the following attributes:
- message type, which denotes the type of an event;
- message subtype, which specifies the event for a particular activity;
- id, which uniquely identifies an event;
- priority, which signifies event importance; an error event, for example, may be of higher importance than other messages;
- payload, the content that elaborates the context of an event;
- source, the sender's address, represented as a hierarchical path to a particular task or application on a device; for artifacts and components that implement the TCP/IP stack, the address begins with the IP address, whereas for lower-level communication networks such as RS485, it begins with the master or slave address;
- sink, the receiver's address, represented in the same way as the source.
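The attribute list above maps naturally onto a Python data class. The field names follow the text; the types, defaults, and the priority convention (lower number = more urgent) are our assumptions, as are the example addresses:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Event:
    """Sketch of a MiWSICx event, following the attributes listed in
    the text."""
    message_type: str     # e.g. "prompt", "action", "error"
    message_subtype: str  # activity-specific refinement of the type
    id: str               # unique identifier for this event
    priority: int         # assumption: lower number = more urgent
    payload: Any          # content elaborating the event's context
    source: str           # hierarchical sender path
    sink: str             # hierarchical receiver path

emergency = Event(
    message_type="error",
    message_subtype="emergency_stop",
    id="evt-001",
    priority=0,
    payload={"machine": "folder-A", "reason": "door open"},
    source="192.168.0.12/maintenance_app/step_3",
    sink="192.168.0.7/activity_context_1",
)
```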

Messaging protocol
Communication both within MiWSICx actors and with external devices is maintained via a messaging protocol, divided into a header and a payload, as shown in Fig. 15. The header contains all the information needed to route events to their destination, while the payload contains information regarding the activity at hand. For networked devices relying on TCP/IP communication, a header format similar to HTTP is used with JSON message payloads; for machine-level communication with field-bus devices such as those using RS485, a specific hex code is used for each event field, followed by a value.

Implementation example
Figure 16 shows the example implemented in our smart-factory demonstrator. Two activity contexts are supported at the moment: the first consists of a laundry folding machine and the second of an assembly station. Three actor systems are in use, each running on its own hardware (a Raspberry Pi). The actor system with the "MiWSICx" capability acts as the convention leader and serves as the entry point to the entire network of actor systems. The respective activity contexts run on the other two actor systems. Activity context 1 runs two instructional activities: explorer, which shows the user the location of the various sensors on the machine in a 3D environment, and a maintenance manual, which guides the user through a maintenance process. Activity context 2 is an assembly instructor that delivers step-by-step instructions to the user on assembling diverse products. A change in the user's context is detected by a positioning device linked to a portable Arduino board with Wi-Fi capability, connected to the root actor system. Another possibility would be to use a smartwatch to exchange near-field communication data with a sensor located in the specific activity context.

Since MiWSICx is implemented in Python, which also supports scripting, the metamodel is instantiated directly without a domain-specific/descriptive language. The script instantiates the entities defined in the ontology (Section 4.1) as Python objects that are then serialized and stored for future use. A snippet from an activity hosted by activity context 1 is shown below. Creating an activity also creates the associated messages that the actor will act on; the behaviors that these messages activate are already supplied by the activity template.
template_dir = PersistentStorage.get_template_dir()
image_dir = PersistentStorage.get_image_dir()

"""Example of a multi-object, multi-step activity"""
activity_1 = MultiStepActivity()
activity_1.set_name("Cleaning Machine A")
activity_1.set_behavior("locate", activity_1.locate)
activity_1.set_behavior("next", activity_1.next)
activity_1.set_behavior("prev", activity_1.prev)

These activities are inflated back into objects by an actor whenever a user joins a particular activity context through a device and wishes to work on an activity. The user's own activity context (and hence its state, including any changes the user made) is stored back in serialized form after logout, available for future use. Since the source code itself does not have to be compiled, new activities can be added as serialized objects without recompiling the rest of the system.
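The serialize-on-logout, inflate-on-login round trip can be sketched with Python's standard pickle module, which Thespian also uses internally. The MultiStepActivity below is a hypothetical stand-in for the template class named in the snippet, reduced to what serialization needs:

```python
import pickle

class MultiStepActivity:
    """Hypothetical stand-in for an activity template, reduced to the
    state that must survive a logout/login cycle."""
    def __init__(self):
        self.name = None
        self.step = 0

    def set_name(self, name):
        self.name = name

activity = MultiStepActivity()
activity.set_name("Cleaning Machine A")
activity.step = 4                  # user progress to be preserved

stored = pickle.dumps(activity)    # serialized on logout
restored = pickle.loads(stored)    # inflated when the user returns
```

Because the serialized objects are data rather than compiled code, adding a new activity amounts to storing another pickled object.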
The actor system itself exchanges the event object among actors, but for communicating with artifacts outside the actor system, this object is encoded using the protocol specified in Section 5.4.2, with a JSON payload (supported natively by Python) preceded by the MiWSICx protocol header.
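The header-plus-JSON-payload encoding for external, TCP/IP-based devices can be sketched as follows. The header layout and field names here are our assumptions for illustration, not the actual MiWSICx wire format:

```python
import json

def encode_event(event: dict) -> bytes:
    """Sketch of the two-part message: a small routing header followed
    by a JSON payload (header fields are assumed, not the real spec)."""
    header = "MIWSICX/1.0 {type} {source} {sink}\r\n".format(
        type=event["message_type"],
        source=event["source"],
        sink=event["sink"],
    )
    body = json.dumps(event["payload"])
    return header.encode("ascii") + body.encode("utf-8")

wire = encode_event({
    "message_type": "prompt",
    "source": "10.0.0.5/assembly_app",
    "sink": "10.0.0.2/activity_context_2",
    "payload": {"step": 7, "text": "Attach the side panel"},
})
```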

Challenges and Issues for Further Exploration
The scenario presented in the previous section represents our attempt at realizing the metamodel introduced in Section 4. In our view, it only presents one possible approach to the problem of activity-driven assistance, since the kind of activities to be supported will vary based on the needs of operators, the tools, and the environment on each shop floor. To this end, we propose some suggestions and themes for future research.

Activity dynamics: behavior versus content
The middleware conveys the ordering and actions supported by resources and objects in activities, but this information becomes actionable only if the frontend also replicates the ordering and affords the actions. For example, if a resource's text supports editing, the frontend application has to provide the user with the tools to edit the text. Unlike dynamic webpages generated by servers, mobile applications have to be programmed such that a symmetrical relationship between the actions supported by the backend and the frontend can be maintained. In our prototype, we use the same frontend application to support different activities because of their similar nature. This approach will not work for activities with distinct presentation and interaction requirements, since separate mobile applications would have to be programmed for each platform/device. Future research efforts could focus on reducing this effort by coupling activity instantiation to the creation of frontends for multiple devices using crossplatform frameworks.

User created, managed, and enriched activities
Over time, as the nature of activities changes or new activities are introduced, the best-case scenario would allow users to compose and edit their own activity workflows and reduce their reliance on others to program and instantiate activities for them. We use the term "best-case scenario" for the following reason: the users themselves are first-hand observers of their activities, in contrast to designers, who can infer user intent only during the design phase. Applications allow users to add, remove, and edit content, but changing their structure requires a new design iteration and compilation. Future research efforts could explore methods (such as visual programming) for users to compose and maintain their own workflows on a combination of mobile devices by modifying both content and structure. Likewise, users could enrich these activities by creating and adding their own combinations of images, annotations, and 3D coordinates.

Conclusion
The only way of handling complexity is to embrace it, a theme that is recurrent in recent research on CPPSs. The human operator is seen as the final stop on the path to handling complex systems; nonetheless, the vision of Operator 4.0 remains far from realized. As a first step, there is a need to think beyond static human-machine interfaces and to move from providing assistance to providing activity-driven assistance: delivering timely, adaptive, and hence activity-centric information on a combination of multiple interactive devices. Our approach reimagines work as activities composed of the human operator, his/her artifacts as devices and informational resources, and the objects that are the recipients of this activity.
This paper presented the ontology, information model, implementation, and application of MiWSICx, a distributed middleware designed to provide activity-driven assistance in industrial contexts. MiWSICx utilizes the actor model to maintain a high level of responsiveness in handling multiple users, activities, and crossdevice interaction simultaneously, while also avoiding the drawbacks of thread-based concurrency approaches. By modeling the constituents of human activity (users, artifacts, resources, and objects), MiWSICx manages activities across various interactive device configurations. Through the work presented in this paper, we hope to motivate the design of systems that are adaptable and extensible by the human operators themselves as their own creative problem-solving tools.