A hybrid case-based reasoning approach to detecting the optimal solution in nurse scheduling problem

Demand for healthcare is increasing due to a growing and ageing population. Choosing an adequate schedule for medical staff can be a difficult dilemma for managers. The goal of nurse scheduling is to minimize the cost of the staff while maximizing their preferences and the overall benefits for the unit. This paper focuses on a new hybrid strategy for detecting the optimal solution in the nurse scheduling problem. The proposed hybrid approach is obtained by combining case-based reasoning and a general linear empirical model with arbitrary coefficients. The model is tested with an original real-world data set obtained from the Oncology Institute of Vojvodina in Serbia.


Introduction
Demand for healthcare is increasing due to a growing and ageing population, which makes access to medical care more difficult. Medical staff performance is a significant determinant of public healthcare quality. There is excessive pressure for cost reduction, which negatively influences the work-life balance of the small number of employed physicians and nurses and often lowers the quality of services below what is demanded. Moreover, due to challenging economic conditions, not only has an ever greater number of physicians and nurses from public hospitals moved abroad to live and work, but many of them have also taken jobs in private healthcare organizations to earn higher salaries. This tendency has made medical staff preferences a critical issue; medical staff satisfaction is a fundamental part of providing the necessary care for patients. There are two aspects to this problem: (i) the physician scheduling problem (PSP) and (ii) the nurse scheduling problem (NSP). The PSP is more complex than the NSP since residents still need educational practice to get licensed as physicians. In addition, many physicians have individual contracts with their hospital with specific and limited details, so it is more challenging for the scheduling process to accommodate these intricate agreements, and these physicians will not be scheduled with other hospital staff.
Choosing an adequate schedule for nursing staff can be a difficult dilemma for nurse managers. They need to strike a balance between the staff's individual preferences and the overall benefits for the unit and, consequently, the patients. Traditional approaches to addressing the challenges of clinical staff organization and scheduling are therefore not always effective in the modern complex healthcare environment [23]. More state legislatures are mandating specific nurse-staffing levels, and many nurses are dissatisfied with their work schedules. Optimal solutions derived from techniques with high computing times are usually less valuable than those based on a flexible algorithm or a user-intuitive application [2]. This paper focuses on a new strategy based on a hybrid approach to detecting the optimal solution in the NSP. The proposed hybrid approach is obtained by combining case-based reasoning (CBR) and a general linear empirical model with arbitrary coefficients. The model is tested with an original real-world data set obtained from the Oncology Institute of Vojvodina (OIoV) in Serbia. This paper also continues the authors' previous research in nurse decision-making, scheduling and rostering in healthcare organizations, which is presented in [12,15,17-19].
The rest of the paper is organized as follows: Section 2 provides an overview of the basic idea of the NSP, related work on solution approaches based on general methods, classical heuristic methods and metaheuristics, and a subsection on the CBR method. Section 3 presents the NSP model proposed in this paper, based on hard/soft constraints, the CBR representation of the empirical data set and the proposed algorithm for NSP. Experimental results and their verification are presented in Section 4. Section 5 provides conclusions and some points for future work.

NSP and related work
The NSP is a well-known nondeterministic polynomial-time (NP)-hard scheduling problem that aims to allocate the required workload to the available staff nurses at healthcare organizations so as to meet the operational requirements and a range of preferences. The NSP is a 2D timetabling problem that deals with the assignment of nursing staff to shifts across a scheduling period subject to certain constraints.
In general, there are two basic types of scheduling used for the NSP: cyclic and non-cyclic scheduling. In cyclic scheduling, each nurse works in a pattern which is repeated in consecutive scheduling periods; whereas, in non-cyclic scheduling, a new schedule is generated for each scheduling period: weekly, fortnightly or monthly. Cyclic scheduling was first used in the early 1970s due to its low computational requirements and the possibility for manual solution [5].

Related work in NSP
Studies of NSPs date back to the early 1960s. Despite decades of research into automated methods for nurse scheduling and some academic success, there is no consistency in the knowledge that has been built up over the years, and many healthcare institutions still resort to manual practices. One possible reason for this gap between nurse scheduling theory and practice is that the academic community often focuses on developing new techniques rather than on developing systems for healthcare institutions [3].
In the past decades, many approaches have been proposed to solve the NSP, as it is manifested in different models. The three commonly used general methods are mathematical programming, heuristics and artificial intelligence approaches. Many heuristic approaches were straightforward automations of manual practices, which have been widely studied and documented [7,22].
For combinatorial problems, exact optimization usually requires large computational times to produce optimal solutions. In contrast, metaheuristic approaches can produce satisfactory results in reasonably short times. In recent years, metaheuristics including the tabu search algorithm (TS), the genetic algorithm (GA) and simulated annealing have proven very efficient in obtaining near-optimal solutions for a variety of hard combinatorial problems, including the NSP [4].
Some TS approaches have been proposed to solve the NSP. In TS, hard constraints remain fulfilled, while solutions move in the following way: calculate the best possible move which is not tabu, perform the move and add the characteristics of the move to the tabu list. A TS with strategic oscillation used to tackle the NSP in a large hospital is presented in [10].
GA, which is a stochastic metaheuristic method, has also been used to solve the NSP. In GA, the basic idea is to find a genetic representation of the problem so that 'characteristics' can be inherited. Starting with a population of randomly created solutions, better solutions are more likely to be selected for recombination into novel solutions. In addition, these novel solutions may be formed by mutating or randomly changing the old ones [8].

CBR
CBR is a technique that has its origins in knowledge-based systems. CBR systems learn from previous situations. The main element of a CBR system is the CASE BASE, a structure that stores problems, elements (cases), and their solutions. A case base can thus be visualized as a database that stores a collection of problems together with their relationships to solutions; this is what gives the system the ability to generalize in order to solve new problems.
The learning capabilities of a CBR system rely on its structure, which consists of four main phases: retrieval, reuse, revision and retain. Figure 1 shows a graphical representation of these four phases. The retrieval phase consists of finding the cases in the CASE BASE that most closely resemble the proposed problem. Once a series of cases has been extracted from the CASE BASE, they must be reused by the system: in this second phase, the selected cases are adapted to fit the current problem. After a solution to the problem has been offered, it is revised to check whether the proposed alternative is in fact a reliable solution to the problem. If the proposal is confirmed, it is retained by the system, which modifies some knowledge containers, and it could eventually serve as a solution for problems in the future.
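The four phases can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Case` structure, the suffix-based `similarity` measure and the `is_valid` and `fallback` arguments are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Case:
    problem: str             # assumed encoding: a nurse's recent shift pattern
    solution: Optional[str]  # the shift that followed; None for a new case


def similarity(a: str, b: str) -> int:
    """Hypothetical similarity: length of the common suffix of two patterns."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def cbr_cycle(case_base: List[Case], problem: str,
              is_valid: Callable[[str, str], bool], fallback: str) -> str:
    # Retrieve: find the solved case that most closely resembles the problem.
    solved = [c for c in case_base if c.solution is not None]
    best = max(solved, key=lambda c: similarity(c.problem, problem), default=None)
    # Reuse: adopt the solution of the retrieved case as a proposal.
    proposal = best.solution if best is not None else fallback
    # Revise: check the proposal and replace it if it is not acceptable.
    if not is_valid(problem, proposal):
        proposal = fallback
    # Retain: store the confirmed solution for future problems.
    case_base.append(Case(problem, proposal))
    return proposal
```

In this sketch the revise step simply falls back to a default value; a real system would apply its domain constraints at that point.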
CBR has been used to solve a variety of problems: in health care sciences [24], in financial predictions [13,14,16], in an agent system for detecting Structured Query Language (SQL) injection attacks [11], in solving the oil spill problem [9] and wherever images can play a key role [6].

Modelling the NSP
This research focuses on cyclic scheduling for the NSP over a planning period in the intensive care unit (ICU) at the OIoV, where each nurse follows a pattern that is repeated in consecutive scheduling periods.

Hard constraints
Currently, duty rosters for the ICU are generated manually by the head nurse, which enables the nurses to express their requests and preferences for working or not working certain shifts, holidays and days off. Nurses in the unit belong to different skill categories, meaning different qualifications, specialization training, experience and gender, as presented in Table 1. Regular work days are 5 days per week, from Monday to Friday. Regular working hours are 7 hours and 12 minutes. The full-time workload is defined as the number of regular work days multiplied by the regular working hours. When this number is rounded, it represents the total number of shifts allowed per month. Nurses can work in three On-duty shifts:  Table 2.
Hard requests define a constraint that must be respected in the roster, and Soft requests define the preferred option expressed by a nurse, which is desirable but can be violated in the roster if needed. Some typical values for a few of the constraints are given below:
• Min (max) nurses on shifts: in the OIoV, three nurses in Day shifts and three nurses in Night shifts;
• It is not desirable to work a Night shift followed by a Day shift;
• After 5 Morning shifts, 2 days off must be assigned;
• After a break of more than 7 days (annual leave, sick leave), a Day shift must be assigned;
• The maximum difference between Day shifts and Night shifts per nurse can be no greater than five;
• At least one of the members of a Shift must be a shift leader, which is defined for every nurse in Table 1;
• Max (min) days: full-time nurses may not work more than a predetermined number of days.
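Several of the constraints listed above can be checked mechanically once a roster is encoded as a string. The following Python sketch assumes one letter per day for each nurse (D = Day, N = Night, X = Morning, O = off); this encoding and the function names are illustrative assumptions, not part of the original system.

```python
from typing import Dict, List


def violations(roster: str, max_diff: int = 5) -> List[str]:
    """Return the names of the constraints that one nurse's roster violates."""
    found = []
    # A Night shift directly followed by a Day shift is not desirable.
    if "ND" in roster:
        found.append("night-then-day")
    # After 5 Morning shifts, 2 days off must be assigned.
    i = roster.find("XXXXX")
    while i != -1:
        after = roster[i + 5:i + 7]
        # Only judge runs whose two following days lie inside the roster.
        if len(after) == 2 and after != "OO":
            found.append("no-rest-after-5-mornings")
            break
        i = roster.find("XXXXX", i + 1)
    # The difference between Day and Night shifts must not exceed max_diff.
    if abs(roster.count("D") - roster.count("N")) > max_diff:
        found.append("day-night-imbalance")
    return found


def staffing_ok(rosters: Dict[str, str], day: int, required: int = 3) -> bool:
    """Check that exactly `required` nurses work Day and Night on a given day."""
    column = [r[day] for r in rosters.values()]
    return column.count("D") == required and column.count("N") == required
```

For example, `violations("DNDOO")` reports the Night-then-Day sequence, while `violations("XXXXXOO")` reports nothing.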
The ideal proposed work shift dynamic is Day-Night-Off-Off-Off (DNOOO), meaning 2 work shifts and 3 days off in 5 days. This is recommended by the OIoV management. The DONOO dynamic is also allowed, as are other combinations of 2 work shifts and 3 days off in 5 days. In the real world, however, it is impossible to maintain the ideal work shift dynamic when creating the nurse schedule. For that reason, a more difficult shift dynamic is allowed, e.g. 3 working days and 2 days off (DDNOO, DNNOO). Other dynamics of 3 working days and 2 days off are allowed as well. After five Morning shifts (XXXXX), 2 days off must be assigned to create the (XXXXXOO) dynamic.
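As a small illustration of these shift dynamics, the Python sketch below classifies a 5-day window of D, N and O letters. The category labels and the function name are assumptions introduced here for illustration only.

```python
def classify_dynamic(window: str) -> str:
    """Classify a 5-day window of D (Day), N (Night) and O (off) letters."""
    if len(window) != 5:
        raise ValueError("a shift dynamic covers exactly 5 days")
    work = sum(ch in "DN" for ch in window)
    if window == "DNOOO":
        return "ideal"      # recommended by the management
    if work == 2:
        return "allowed"    # 2 work shifts and 3 days off, e.g. DONOO
    if work == 3:
        return "difficult"  # 3 working days and 2 days off, e.g. DDNOO
    return "other"
```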

Empirical data set-CBR representation
For this experiment, an original real-world data set from the ICU of the OIoV, covering the period between 1 January and 31 January 2014, is used.
Part of the experimental data set is presented in Table 2. In the basic CBR representation, Case No. = 26 represents the NEW CASE, and the hybrid system will try to find the Optimal Solution for it. There is no Solution for that case yet; it will be calculated when the system calculates the schedule for nurse N-01 for 01.02. All cases (the data set) stored in the CASE BASE can be described in the same manner as for the previous nurse. All the cases in the CASE BASE which have a Solution will be used in the reuse and revise CBR phases for detecting the best Solution in the NSP.

The algorithm for NSP
The proposed hybrid model is obtained by combining CBR and a general linear empirical model with arbitrary coefficients. The basic steps of the proposed hybrid algorithm for NSP are summarized by the pseudo code shown in Algorithm 1, where the most important CBR phases and the general linear empirical model with arbitrary coefficients are presented in blue bold colour. Our algorithm is inspired by the integration of the CBR method, which is discussed in Section 2.2; the CASE BASE representation is shown in Section 3.2. The general linear empirical model with arbitrary coefficients defined by (1)-(4) is presented and discussed in detail in [20].
Equation (1) presents the calculation of the weighted value Va(t,d) of a Day shift occurrence d for nurse t, where S_7(t,d) is the frequency of next letter = D calculated for nurse t for the pattern String-7. The same logic applies to S_6(t,d), S_5(t,d), S_4(t,d) and S_3(t,d). Likewise, (2) presents the calculation of the weighted value Va(t,n) of a Night shift occurrence n for nurse t, where S_7(t,n) is the frequency of next letter = N calculated for nurse t, and (3) presents the calculation of the weighted value Va(t,o) of a Day off occurrence o for nurse t, where S_7(t,o) is the frequency of next letter = O calculated for nurse t. The same logic applies to S_6, S_5, S_4 and S_3 in both cases.

\[ Va(t,d) = S_7(t,d) + 0.8\,S_6(t,d) + 0.6\,S_5(t,d) + 0.4\,S_4(t,d) + 0.2\,S_3(t,d) \tag{1} \]

\[ Va(t,n) = S_7(t,n) + 0.8\,S_6(t,n) + 0.6\,S_5(t,n) + 0.4\,S_4(t,n) + 0.2\,S_3(t,n) \tag{2} \]

\[ Va(t,o) = S_7(t,o) + 0.8\,S_6(t,o) + 0.6\,S_5(t,o) + 0.4\,S_4(t,o) + 0.2\,S_3(t,o) \tag{3} \]

The values of Va(t,d), Va(t,n) and Va(t,o) must then be normalized, and as such they represent the probability of a shift occurrence for a specific worker for the next day.
The letters that are candidates for the next shift after pattern strings of different lengths do not all have the same significance, so a weighting factor is introduced for each pattern length. The weighting factor for the frequency of pattern String-7 is 1, for String-6 it is 0.8, for String-5 it is 0.6, for String-4 it is 0.4 and for String-3 it is 0.2, as shown in (1)-(3).
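The weighted values and their normalization can be sketched in Python as follows. The weights are those stated in the text; how the frequencies S_k are counted is an assumption made here for illustration (the last k letters of a nurse's shift history are searched for within the same history, and the letter that follows each earlier occurrence is counted). The function names are likewise hypothetical.

```python
from typing import Dict

# Weighting factors for pattern lengths String-7 ... String-3 (from the text).
WEIGHTS = {7: 1.0, 6: 0.8, 5: 0.6, 4: 0.4, 3: 0.2}


def next_letter_frequency(history: str, pattern: str, letter: str) -> int:
    """Count how often `letter` immediately follows `pattern` in `history`."""
    k = len(pattern)
    return sum(
        1
        for i in range(len(history) - k)
        if history[i:i + k] == pattern and history[i + k] == letter
    )


def weighted_values(history: str) -> Dict[str, float]:
    """Weighted value Va for each candidate next letter: D, N or O."""
    return {
        letter: sum(
            w * next_letter_frequency(history, history[-k:], letter)
            for k, w in WEIGHTS.items()
        )
        for letter in "DNO"
    }


def normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Scale the weighted values so that they sum to 1 (all zeros if empty)."""
    total = sum(values.values())
    if total == 0:
        return {k: 0.0 for k in values}
    return {k: v / total for k, v in values.items()}
```

For a perfectly regular history such as `"DNOOO" * 4`, every pattern length predicts a Day shift next, so the normalized value for D is 1.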
The partial target function, defined by the arguments of the maxima (arg max) of the values in (1)-(3), is presented in (4). Here arg max refers to the inputs, or arguments, at which the function outputs are as large as possible, that is, the points of the domain of a function at which its values are maximized. Table 3 presents the calculation for the nurse candidates for 1 February based on (1)-(3). For every workday it is necessary to select three nurses for the Day shift and three nurses for the Night shift.
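The role of arg max in the candidate selection can be illustrated with a short Python sketch. The probability table below contains made-up values, not the figures of Table 3, and the function names are hypothetical.

```python
from typing import Dict, List


def most_likely_shift(probs: Dict[str, float]) -> str:
    """arg max over the normalized values of D, N and O for one nurse."""
    return max(probs, key=probs.get)


def rank_candidates(table: Dict[str, Dict[str, float]], shift: str) -> List[str]:
    """Nurses whose most likely next shift is `shift`, strongest first."""
    chosen = [n for n, p in table.items() if most_likely_shift(p) == shift]
    return sorted(chosen, key=lambda n: table[n][shift], reverse=True)


# Illustrative values only.
example = {
    "N-01": {"D": 0.7, "N": 0.2, "O": 0.1},
    "N-02": {"D": 0.1, "N": 0.6, "O": 0.3},
    "N-03": {"D": 0.5, "N": 0.4, "O": 0.1},
    "N-04": {"D": 0.2, "N": 0.2, "O": 0.6},
}
day_candidates = rank_candidates(example, "D")    # ["N-01", "N-03"]
night_candidates = rank_candidates(example, "N")  # ["N-02"]
```

The first three entries of each ranked list would then be the initial candidates for that shift, subject to the revision step.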
The revision process performs additional calculations that check the hard constraints and correct and modify the calculated Day and Night shift candidates to produce the final solutions for the Day and Night shifts. For example, Table 3 shows that N-09 has a great imbalance between Day shifts (9) and Night shifts (4). The system allows the greatest difference between Day shifts and Night shifts to be <= 2 in the same month. Therefore, N-09 cannot work the Day shift, because the shift type imbalance would become even greater, and N-09 becomes a Night shift candidate.
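This balance check can be expressed as a small Python function. It is a sketch under the assumption that a shift may be assigned only if the resulting surplus of that shift type stays within the allowed difference; the function name and the `max_diff` parameter are illustrative.

```python
def assignment_allowed(shift: str, day_count: int, night_count: int,
                       max_diff: int = 2) -> bool:
    """Return True if adding one more `shift` keeps Day/Night counts balanced."""
    if shift == "D":
        return (day_count + 1) - night_count <= max_diff
    if shift == "N":
        return (night_count + 1) - day_count <= max_diff
    return True  # days off do not affect the Day/Night balance


# N-09 has 9 Day shifts and 4 Night shifts so far.
n09_day = assignment_allowed("D", day_count=9, night_count=4)    # False
n09_night = assignment_allowed("N", day_count=9, night_count=4)  # True
```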
After revision, the Day shift candidates are, in order: N-17, N-06, N-15, N-11 and N-12. The first three Day shift candidates satisfy the constraint that at least one member of the Shift must be a shift leader; in fact two of them, N-06 and N-15, are shift leaders. Finally, the Day shift for 1 February can be completed: N-17 and N-15, with N-06 as shift leader.
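The shift leader constraint can be sketched in Python as follows. The replacement rule used when the first three candidates contain no leader (swap the last member for the next available leader) is an assumption made for illustration, as is the function name.

```python
from typing import List, Set, Tuple


def pick_team(candidates: List[str], leaders: Set[str],
              size: int = 3) -> Tuple[List[str], str]:
    """Take the first `size` candidates, ensuring at least one shift leader."""
    team = list(candidates[:size])
    if not any(n in leaders for n in team):
        # Assumed rule: replace the last member by the next available leader.
        backup = next((n for n in candidates[size:] if n in leaders), None)
        if backup is None:
            raise ValueError("no shift leader among the candidates")
        team[-1] = backup
    leader = next(n for n in team if n in leaders)
    return team, leader


day_team, day_leader = pick_team(
    ["N-17", "N-06", "N-15", "N-11", "N-12"], leaders={"N-06", "N-15"}
)
# day_team == ["N-17", "N-06", "N-15"], day_leader == "N-06"
```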
Following the rules and constraints of the CBR revise phase, N-09 is a Night shift candidate; on the other hand, N-03 has a great imbalance between Night shifts (8) and Day shifts (4). Therefore, N-11 and N-12 are added from the Day shift candidates, also because both of them have a higher working percentage (16.41) than N-13 (12.50). The Night shift candidates are now as follows: N-09, N-07, N-08, N-16, N-11, N-12 and N-13. Finally, the Night shift for 1 February can be completed: the shift leader is N-09, and the members are N-07 and N-08. The shifts for 1 February are now complete.
In the CBR retain phase, the cases stored in the CASE BASE that are still empty, shown in grey in Table 2, are updated. It is necessary to fill Field 7, Solution, with the appropriate Shift.

[Algorithm 1: pseudo code of the proposed hybrid algorithm for NSP]

The data stored in the CASE BASE after the retain phase are presented in Table 4. The rest of the schedule for the whole period is produced as previously described, using the algorithm, the hard constraints, the soft constraints and the established rules, and the resulting 2D timetable is presented in Table 5.

Verification of experimental results
To verify our research methods, the experimental results are compared with the real-world data set obtained from the OIoV. A cumulative Workload is calculated for every nurse and summarized for every month, after which the Coefficient of variation (CV) is calculated and compared between the original data set and the experimental data set. The cumulative workload for February represents the workload from the beginning of the year until February. The same logic applies to the cumulative workload for March. The coefficient of variation is dimensionless and is defined as the ratio of the standard deviation (SD) to the mean. Hence, the coefficient of variation is a useful quantity for comparing the variability of different data sets [21]. Then the Correlation between Means and the statistical Significant Difference F-test between the data sets are calculated, and the results are presented in Table 6. The means of the real-world data set and the experimental data set are highly correlated, and the CV is lower in every month in the experimental data set, as presented in Table 6. The experimental model is also verified by Univariate Analysis of Variance, by the Tests of Between-Subjects Effects: the interaction effect between the real-world data set and the experimental data set for June was not statistically significant, F = 0.108, Sig = 0.744, as shown in Table 6. This result shows that there is no statistically significant difference between the real-world and experimental data sets, and it can be concluded that the experimental results fit quite well when compared in terms of Workload.
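The cumulative workload and the coefficient of variation can be computed with the Python standard library, as in the sketch below. Whether the original analysis used the sample or the population standard deviation is not stated; the sample SD is assumed here, and the workload figures in the usage lines are made up for illustration.

```python
from itertools import accumulate
from statistics import mean, stdev
from typing import List


def cumulative_workload(monthly: List[float]) -> List[float]:
    """Workload from the beginning of the year up to and including each month."""
    return list(accumulate(monthly))


def coefficient_of_variation(workloads: List[float]) -> float:
    """Ratio of the (sample) standard deviation to the mean; dimensionless."""
    return stdev(workloads) / mean(workloads)


# Illustrative values only: monthly workload of one nurse,
# and the cumulative workloads of three nurses in one month.
per_month = cumulative_workload([160, 150, 170])
cv = coefficient_of_variation([8, 10, 12])
```

A lower CV across nurses indicates that the workload is distributed more evenly among them.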
The nurses' satisfaction levels under the original (previous) real-world shift planning and under the new nurse scheduling timetable are compared. On the one hand, it can be concluded that nurses are, in general, much more satisfied; on the other hand, a small group of ward nurses no longer work together as they did in the previous timetable. In general, it is much better for ward work quality when employees are combined according to different criteria: years of service (experience), specialization training, shift leader status, sex and age. It is also important to mention that the nurse schedule is now generated by a 'machine-computer', an unknown person, so the nurses cannot react to it as they would to a personal, subjective decision of the head nurse.

Conclusion and future work
The aim of this paper is to propose a new hybrid strategy for detecting the optimal solution in the NSP. The proposed hybrid approach is obtained by combining CBR and a general linear empirical model with arbitrary coefficients. The model is tested with an original real-world data set obtained from the OIoV in Serbia.
The data set is represented in the CASE BASE as a database structure where problems, elements (cases), and their solutions are stored. All the cases in the CASE BASE which have a Solution are used in the reuse and revise CBR phases for detecting the best Solution in the NSP, using hard/soft constraints, rules and the general linear empirical model with arbitrary coefficients. In the CBR retain phase, some cases stored in the CASE BASE are updated, and new cases are added.
Preliminary experimental results encourage further research because the data set is stored in a database and is easily manipulated. Our future research will focus on creating a new hybrid model combined with an intuitive thinking style, solving problems logically and considering different options until the best solution is discovered, which will efficiently solve the NSP. The new model will be tested with an original real-world data set for longer periods, including the year 2017, obtained from the OIoV in Serbia.