A hybrid fuzzy clustering approach for diagnosing primary headache disorder

Clustering is one of the most fundamental and essential data analysis tasks with broad applications. It has been studied in various research fields: data mining, machine learning and pattern recognition, as well as engineering, economics and biomedical data analysis. Headache is not a disease that typically shortens one’s life, but it can be a serious social as well as health problem. Approximately 27 billion euros per year are lost through reduced work productivity in the European community. This paper focuses on a new strategy based on a hybrid model that combines a fuzzy partition method with a maximum likelihood estimation clustering algorithm for diagnosing primary headache disorder. The proposed hybrid system is tested on two data sets for diagnosing headache disorder collected from the Clinical Centre of Vojvodina in Serbia.


Introduction
Clustering is one of the most fundamental and essential data analysis tasks with broad applications. It is a process in which a group of unlabeled patterns are partitioned into several sets so that similar patterns are assigned to the same cluster, and dissimilar patterns are assigned to different clusters. The purpose of clustering is to identify natural groupings of data from a large data set to produce a concise representation of a system's behaviour.
The unsupervised nature of the problem implies that its structural characteristics are not known, except in the case of domain knowledge available in advance. There are some goals for clustering algorithms: (1) estimating the optimal number of clusters, (2) determining good clusters and (3) doing so efficiently. One of the main difficulties for cluster analysis is estimating the optimal and the correct number of clusters of different types of data sets.
Modern medicine generates a great deal of information that is stored in medical databases. Extracting useful knowledge from these databases and making scientific decisions for the diagnosis and treatment of disease is becoming increasingly necessary. Medicine is primarily directed at patient care and only secondarily serves as a research resource. The only justification for collecting medical data is to benefit the individual patient.
Headache disorders are the most prevalent of all the neurological conditions and are among the most frequent medical complaints seen in general practice. More than 90% of the general population report experiencing a headache during any given year, which amounts to a lifetime history of head pain [1]. Headache is not a disease that typically shortens one's life, but it can be a serious social as well as health problem. Approximately 27 billion euros per year are lost through reduced work productivity in the European community. The diagnostic criteria developed by the International Headache Society (IHS) have been extensively used in epidemiological research [2], and automatic methods, expert systems and knowledge-based systems have been developed as tools that help physicians make diagnoses.
This research focuses on diagnosing certain primary headache types in different populations, distinguished by age, type of employment, and hospitalized or outpatient status. Two different data sets for diagnosing headache disorder were collected from the Clinical Centre of Vojvodina in Serbia. This paper presents a hybrid clustering approach for diagnosing primary headache disorder that combines a fuzzy partition method and a maximum likelihood estimation clustering algorithm. In addition, the Calinski-Harabasz index is used to estimate the optimal number of clusters. The proposed hybrid system is tested on these data sets and facilitated by the application of the IHS criteria for diagnosing primary headache disorder.
This paper is an extension of our previous research [3] and continues the authors' previous research in computer-assisted diagnosis methods [4][5][6] and industrial applications for clustering methods presented in [7,8].
The rest of the paper is organized in the following way: Section 2 provides an overview of the basic idea of clustering and related work. Primary headache classification is shown in Section 3. Section 4 presents the model for the fuzzy clustering approach for diagnosing primary headache. The preliminary experimental results are presented in Section 5. Section 6 provides conclusions and some points for future work.

Clustering, classification and related work
Clustering and classification are basic scientific tools used to systematize knowledge and analyze the structure of phenomena. Both techniques refer to the process of partitioning a set of objects into groups as dissimilar as possible from one another. The conventional distinction made between clustering and classification is the following. Clustering is a process of partitioning a set of items into a set of categories. Classification is a process of assigning a new item or observation to its proper place in an established set of categories [9]. In clustering, little or nothing is known about category structure, and the objective is to discover a structure that fits the observations. Classification is used mostly as a supervised learning method, but on the other hand clustering is used for unsupervised learning. The goal of clustering is descriptive, that of classification is predictive.

Clustering
Clustering groups data instances into subsets in such a manner that similar instances are grouped together, while different instances belong to different groups. The instances are thereby organized into an efficient representation that characterizes the population being sampled.
Formally, the clustering structure is represented as a set of subsets $C = \{C_1, \ldots, C_k\}$ of $S$, such that $S = \bigcup_{i=1}^{k} C_i$ and $C_i \cap C_j = \emptyset$ for $i \neq j$. Consequently, any instance in $S$ belongs to exactly one subset.
Clustering of objects is as ancient as the human need for describing the salient characteristics of men and objects and identifying them with a type. Therefore, it embraces various scientific disciplines: from mathematics and statistics to biology and genetics, each of which uses different terms to describe the topologies formed using this analysis. From biological 'taxonomies' to medical 'syndromes' and genetic 'genotypes' to manufacturing 'group technology'-the problem is identical: forming categories of entities and assigning individuals to the proper groups within it [10].
Cluster analysis, an important technology in data mining, is an effective method of analyzing and discovering useful information in large amounts of data. A clustering algorithm groups the data into classes or clusters so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters.
General references regarding data clustering are presented in [11]. A very good presentation of contemporary data mining clustering techniques can be found in the textbook [12].

Related work in primary headache
In the past decades, many approaches have been proposed to solve the clustering problem in medical data and to help physicians make decisions regarding patient illness and future treatment. One of the first studies, which analyzes 726 headache patients, dates back to 1982 and is presented in [13]. In that research paper, cluster analysis was used to find groups (clusters) of patients with similar symptoms. After the clusters were formed, the frequency of each symptom in each cluster was tabulated. Twelve physicians (eight internists, a neurologist, a pediatrician, a cardiologist and a pathologist) were then asked to use the symptom frequencies to give a name to each cluster and to prescribe a typical therapy for each cluster. The 'IF-THEN' rule algorithm was developed and tested, and 92.3% of the headache sufferers were placed in the correct cluster. Only 56 (7.7%) patients were misclassified.
The estimation of the prevalence of headaches and the relationship of headache symptoms with the severity and duration of attacks are presented in [14]. The data were collected by telephone interviews carried out among 10,169 subjects and analyzed to determine whether symptoms clustered into specific types of headache. The reports of symptoms are all binary (present/absent); therefore, the statistical requirements for parametric analyses precluded strategies based on correlations and commonly used clustering algorithms. An iterative solution was used to represent the proximity clustering matrices in n-dimensional space; these matrices are described in detail in [15].
Responses of 150 headache-prone subjects: 49 migraine and 101 tension-type headache (TTH) were examined [16]. Using a cluster analysis, the adjectives were grouped into 7 clusters including 5 sensory and 2 affective sub-groups. Headache was most commonly described in terms of clusters that reflected discomfort and aching pain sensations. Migraine and TTH sufferers did not differ markedly in pain quality, but the intensity of pain differentiated these groups. A complete linkage cluster analysis, one of several methods of agglomerative hierarchical clustering, was carried out to examine the relationship between descriptors. Experimental results show the correspondence of migraine and TTH with two new cluster groups derived using cluster analysis: migraine, 93 of 101 (92.1%); TTH, 49 of 49 (100%); misclassified, 8 (5.33%) patients; the average diagnosis accuracy is 94.67%.
Structured diagnostic interviews were conducted on 443 headache sufferers from a community sample and hierarchical cluster analysis of symptoms in both sub-samples revealed two distinct clusters: (1) unilateral pulsating pain, photophobia and phonophobia; (2) bilateral pressing/tightening pain, mild to moderate intensity and absence of nausea/vomiting [17]. These clusters indicate that headache symptoms cluster empirically in a manner consistent with IHS criteria for migraine and TTH, respectively. Also, criterion overlap problems regarding pain intensity and duration were identified.
A new migraine analysis method was proposed that uses electroencephalography signals under flash stimulation in the time domain. These types of signals are commonly pre-processed before the analysis procedure, and pre-processing techniques affect the analysis results. Histogram differences under flash stimulation were calculated and used as features for the healthy subjects and migraine patients. These features were applied to a k-means clustering algorithm to assess the clustering results of the proposed technique. Silhouette clustering results show good clustering performance, with an 86.6% correct clustering rate in migraine patients [18].
An ant colony optimization classification algorithm for the diagnosis of primary headaches, using a website questionnaire expert system, achieved an overall diagnosis accuracy of 96.9% [19]. In an evaluation of artificial immune system algorithms for the classification of migraine, TTH and cluster headache, the maximum accuracy was 71% [20].
On the other hand, many statistical methods have been applied in the diagnosis of primary headache disorder to analyze different features and variables. Categorical data are described using frequencies and percentages, and continuous data using means and standard deviations. Categorical variables in two groups are compared using the Chi-square test or Fisher's exact test, continuous variables using the Student 't' test or the Mann-Whitney 'U' test, and logistic regression analysis is used for binary dependent variables [21]- [24].
From the related work, it can be concluded that some classification methods achieve higher accuracy than clustering methods; in a real-world setting, however, physicians do not know the type of primary headache in advance.

Primary headache classification
The International Classification of Headache Disorders-the Third Edition (ICHD-3) established uniform terminology and consistent operational diagnostic criteria for a wide range of headache disorders around the world [2]. The ICHD-3 provides a hierarchy of diagnoses with varying degrees of specificity. Headache disorders are identified with three- or sometimes five-digit codes; a short identification using just the two most important digits is presented in Table 1. All headache disorders are classified into two major groups: (A) primary headaches, ICHD-3 codes 1. to 4., and (B) secondary headaches, ICHD-3 codes 5. to 12. The first digit specifies the major diagnostic category (e.g. migraine). The second digit indicates a disorder within the category (e.g. migraine without aura). Each category is then subdivided into groups, types, subtypes and subforms. Subsequent digits permit a more specific diagnosis for some headache types.
When first meeting a patient, physicians who are more concerned with the detailed anamnesis and clinical examinations apply the ICHD-3 criteria and can easily establish the primary headache diagnosis. If the criteria are not satisfied, the physicians will have to suggest an additional examination to the patient. The study [25] analyzes different studies that are all based on IHS recommendations. These studies deal with different approaches to attribute selection based on automatic methods, expert systems, knowledge-based systems and physicians' expert knowledge, as shown in Table 2. Feature selection can be divided into stochastic and non-stochastic methodologies; an initial stochastic feature selection can be refined with a non-stochastic method to further reduce the subset of features to be retained [26]. The study [25] shows that the most important features are (4), (5), (6), (7), (8), (10), (12), (13) and (15), which are marked in bold. These 9 features are used in the rest of the research.

Modelling the fuzzy clustering approach
In general, clustering algorithms group a given data set into clusters using one of 2 main approaches:
• Hard clustering: each object either belongs to a specific cluster or does not.
• Soft clustering, also called fuzzy clustering: each object belongs to each cluster to a certain degree.
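The difference between the two approaches can be illustrated with a small sketch. The membership values below are hypothetical and chosen only for illustration, and the layout (one row per cluster, one column per object) is an assumption of this example. The helper `harden` is likewise an illustrative name: it converts a fuzzy partition into a hard one by assigning each object to its highest-membership cluster.

```python
import numpy as np

# Hypothetical fuzzy memberships: 2 clusters (rows) x 4 objects (columns).
# Each column sums to 1, so every object is shared between the clusters.
U_fuzzy = np.array([
    [0.9, 0.6, 0.3, 0.1],
    [0.1, 0.4, 0.7, 0.9],
])

def harden(U):
    """Convert a fuzzy partition into a hard one.

    Each object is assigned to the cluster with the highest membership and
    receives membership 1 there and 0 in every other cluster.
    """
    labels = U.argmax(axis=0)
    U_hard = np.zeros_like(U)
    U_hard[labels, np.arange(U.shape[1])] = 1.0
    return U_hard, labels

U_hard, labels = harden(U_fuzzy)
print(labels)  # [0 0 1 1]
print(U_hard)
```

The second object (memberships 0.6 and 0.4) shows what is lost by hardening: the fuzzy partition records that it is a borderline case, while the hard partition does not.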
The primary representative hard clustering partitioning methods are: k-means, k-medoids, k-medians and k-means++. On the other hand, the representative fuzzy partitioning methods are: the fuzzy c-means clustering method, the fuzzy Gustafson-Kessel clustering method and the fuzzy Gath-Geva clustering method (FGGC). In this research, fuzzy maximum likelihood estimation (FMLE) with a direct distance norm based on the FGGC is used. FMLE with a direct distance norm belongs to the fuzzy partitioning methods [27].

Optimal number of clusters
Based on the concept of dense and well-separated clusters, the Calinski-Harabasz index is used to estimate the optimal number of clusters c by means of two measures: the variance ratio criterion and the total within sum of squares. To build the Calinski-Harabasz index, it is first necessary to define the inter-cluster dispersion [28]. Given the total number of observations (data points) N and c clusters with their centroids and the global centroid, the inter-cluster dispersion B(c) (between-cluster variation) is defined as:
$$B(c) = \sum_{i=1}^{c} n_i \, \lVert \mu_i - \mu \rVert^2 \tag{1}$$
In the above expression, $n_i$ is the number of elements belonging to cluster $i$, $\mu$ is the global centroid and $\mu_i$ is the centroid of cluster $i$.
The intra-cluster dispersion W(c) (within-cluster variation) is defined as:
$$W(c) = \sum_{i=1}^{c} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 \tag{2}$$
where $C_i$ is the set of points in cluster $i$. The Calinski-Harabasz index is defined as the weighted ratio between B(c) and W(c):
$$CH(c) = \frac{B(c)/(c-1)}{W(c)/(N-c)} = \frac{N-c}{c-1} \cdot \frac{B(c)}{W(c)} \tag{3}$$
The Calinski-Harabasz index thus compares the between-cluster sum of squares (the measure of cluster separation) with the within-cluster sum of squares (the measure of how tightly packed the points are within a cluster). To obtain a low intra-cluster dispersion and a high inter-cluster dispersion, the number of clusters that maximizes this index has to be found. Ideally, the clusters should be well separated, so the between-cluster sum of squares should be large, while points within a cluster should be as close as possible to one another, resulting in smaller values of the within-cluster sum of squares [28].
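As a minimal sketch, the index can be computed as follows, assuming its standard definition: the between-cluster dispersion B(c) divided by (c - 1), over the within-cluster dispersion W(c) divided by (N - c). The function name and the four toy points are illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np

def calinski_harabasz(X, labels):
    """Calinski-Harabasz index for a hard labelling of the rows of X."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    N = X.shape[0]
    clusters = np.unique(labels)
    c = len(clusters)
    mu = X.mean(axis=0)                      # global centroid
    B = 0.0                                  # between-cluster dispersion
    W = 0.0                                  # within-cluster dispersion
    for i in clusters:
        Xi = X[labels == i]
        mu_i = Xi.mean(axis=0)               # centroid of cluster i
        B += len(Xi) * np.sum((mu_i - mu) ** 2)
        W += np.sum((Xi - mu_i) ** 2)
    return (B / (c - 1)) / (W / (N - c))

# Toy data (hypothetical): two tight, well-separated pairs of points.
X = [[0, 0], [0, 1], [5, 5], [5, 6]]
print(calinski_harabasz(X, [0, 0, 1, 1]))  # 100.0
# A labelling that mixes the two pairs scores far lower.
print(calinski_harabasz(X, [0, 1, 0, 1]))
```

To estimate the number of clusters, one would cluster the data for each candidate value of c, compute the index for each resulting labelling and keep the c with the largest value.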
The decision to assign a point to a cluster depends only on its features and sometimes on the position of a set of other points. However, there are different algorithms based on alternative strategies to solve this problem, and they can yield very different results. The improved covariance estimation technique for the Gustafson-Kessel clustering algorithm is employed in the extraction of rules from data and in the estimation of the optimal number of clusters for fuzzy partitioning methods. It calculates 7 different coefficients to estimate the optimal number of clusters: partition coefficient, classification entropy, partition index, separation index, Xie and Beni index, Dunn index and alternative Dunn index [29].

Fuzzy partition method
The data set is typically an observation of some physical process. Each observation consists of n measured variables, grouped into an n-dimensional vector $x_k = [x_{k1}, x_{k2}, \ldots, x_{kn}]^T$, $x_k \in \mathbb{R}^n$. A set of N observations is denoted by $X = \{x_k \mid k = 1, 2, \ldots, N\}$ and is represented as an N × n matrix, the data set. Since clusters can formally be viewed as subsets of the data set, the number of subsets (clusters) is denoted by c. A fuzzy partition can be seen as a generalization of a hard partition: it allows $\mu_{ik}$, the membership of $x_k$ in the i-th cluster, to attain real values in [0, 1]. A c × N matrix $U = [\mu_{ik}]$ represents the fuzzy partition, and its conditions are given by:
$$\mu_{ik} \in [0, 1], \quad 1 \le i \le c, \; 1 \le k \le N \tag{4}$$
$$\sum_{i=1}^{c} \mu_{ik} = 1, \quad 1 \le k \le N \tag{5}$$
$$0 < \sum_{k=1}^{N} \mu_{ik} < N, \quad 1 \le i \le c \tag{6}$$
Let $X = [x_1, x_2, \ldots, x_N]$ be a finite set and let $2 \le c < N$ be an integer. The fuzzy partitioning space for X is the set
$$M_{fc} = \left\{ U \in \mathbb{R}^{c \times N} \;\middle|\; \mu_{ik} \in [0, 1], \forall i, k; \; \sum_{i=1}^{c} \mu_{ik} = 1, \forall k; \; 0 < \sum_{k=1}^{N} \mu_{ik} < N, \forall i \right\} \tag{7}$$
The i-th row of U contains the values of the membership function of the i-th fuzzy subset of X. Equation (5) constrains the sum of each column to 1, and thus the total membership of each $x_k$ in X equals one. The distribution of memberships among the c fuzzy subsets is not constrained.
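The conditions on the partition matrix can be checked mechanically. The sketch below assumes that U is stored with one row per cluster and one column per observation, so that the memberships of each observation sum to one down a column. The function name and the example matrices are hypothetical.

```python
import numpy as np

def is_fuzzy_partition(U, tol=1e-9):
    """Check the fuzzy partition conditions for a c x N membership matrix."""
    U = np.asarray(U, dtype=float)
    c, N = U.shape
    # Every membership lies in [0, 1].
    in_range = np.all((U >= -tol) & (U <= 1.0 + tol))
    # The memberships of each observation (column) sum to one.
    columns_sum_to_one = np.allclose(U.sum(axis=0), 1.0, atol=tol)
    # No cluster (row) is empty and no cluster absorbs all observations.
    row_sums = U.sum(axis=1)
    non_degenerate = np.all((row_sums > 0.0) & (row_sums < N))
    return bool(in_range and columns_sum_to_one and non_degenerate)

valid = [[0.8, 0.5, 0.1],
         [0.2, 0.5, 0.9]]
bad_sum = [[0.8, 0.7, 0.1],      # second column sums to 1.2
           [0.2, 0.5, 0.9]]
empty_cluster = [[1.0, 1.0, 1.0],
                 [0.0, 0.0, 0.0]]

print(is_fuzzy_partition(valid))          # True
print(is_fuzzy_partition(bad_sum))        # False
print(is_fuzzy_partition(empty_cluster))  # False
```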

FMLE clustering algorithm
The basic steps of the proposed hybrid algorithm for the FMLE clustering algorithm, which employs a distance norm based on FMLE proposed in [30], are summarized by the pseudo code shown in Algorithm 1.
Consistent with the theory in the previous subsection on the fuzzy partition method, given a data set X, specify the number of clusters c, and choose a weighting exponent m > 1 and a termination tolerance ε > 0. Initialize the partition matrix with a more robust method. It is important to mention that in Step 3, the distance to the cluster center (centroid) is calculated on the basis of the fuzzy covariance matrices of the cluster.
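Algorithm 1 is not reproduced in this text, so the following is only a minimal sketch of a Gath-Geva style FMLE iteration in its commonly described form, not the authors' exact procedure. Several choices are assumptions of this sketch: the function names, the farthest-point seeding followed by a short fuzzy c-means warm-up as the "more robust" initialization, the use of the exponent m as the weight in the fuzzy covariance matrices, the small regularization term added to each covariance matrix and the evaluation of the distance in log space for numerical stability.

```python
import numpy as np

def farthest_point_centers(X, c):
    """Deterministic seeding (an assumption of this sketch)."""
    centers = [X[np.argmax(((X - X.mean(axis=0)) ** 2).sum(axis=1))]]
    while len(centers) < c:
        d2 = np.min([((X - v) ** 2).sum(axis=1) for v in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    return np.array(centers)

def fcm_memberships(X, centers, m):
    """Fuzzy c-means memberships from squared Euclidean distances, shape (c, N)."""
    d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2) + 1e-12
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

def fmle_gath_geva(X, c, m=2.0, eps=1e-6, max_iter=200):
    """Return (U, centers, iterations); U has shape (c, N)."""
    X = np.asarray(X, dtype=float)
    N, n = X.shape
    # Initialization: farthest-point seeds plus a short fuzzy c-means warm-up.
    centers = farthest_point_centers(X, c)
    for _ in range(10):
        U = fcm_memberships(X, centers, m)
        W = U ** m
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
    for iteration in range(1, max_iter + 1):
        W = U ** m
        # Cluster centers as membership-weighted means.
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
        log_d2 = np.empty((c, N))
        for i in range(c):
            diff = X - centers[i]
            # Fuzzy covariance matrix of cluster i (weights U**m: assumption).
            F = (W[i][:, None] * diff).T @ diff / W[i].sum()
            F += 1e-9 * np.eye(n)            # regularization (assumption)
            alpha = U[i].mean()              # prior probability of cluster i
            maha = np.einsum("kj,jl,kl->k", diff, np.linalg.inv(F), diff)
            # Logarithm of the maximum-likelihood distance:
            # D^2 = (2*pi)^(n/2) * sqrt(det F) / alpha * exp(maha / 2)
            log_d2[i] = (0.5 * n * np.log(2.0 * np.pi)
                         + 0.5 * np.linalg.slogdet(F)[1]
                         - np.log(alpha) + 0.5 * maha)
        # Membership update: U_ik proportional to (D_ik^2)^(-1/(m-1)).
        z = -log_d2 / (m - 1.0)
        z -= z.max(axis=0, keepdims=True)
        U_new = np.exp(z)
        U_new /= U_new.sum(axis=0, keepdims=True)
        change = np.abs(U_new - U).max()
        U = U_new
        if change < eps:                     # termination tolerance
            break
    return U, centers, iteration
```

A call such as `U, centers, iterations = fmle_gath_geva(X, c=2)` on an N × n array returns the c × N partition matrix, the cluster centers used in the last distance computation and the number of iterations performed. Because the distance contains an exponential term, the memberships tend to be much crisper than those of fuzzy c-means, which is also why the result is sensitive to the initial partition.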

Implemented model for diagnosing primary headache disorder
The proposed hybrid model for diagnosing primary headache disorder implemented in this research is presented in Figure 1. It has 2 phases. The first phase includes: (1) estimating the optimal number of clusters of the input data set; (2) a fuzzy partitioning step in which the input data set is divided into two classes, only one of which is appropriate for further analysis; it is called selected data.
Our previous research [3] has shown that the FGGC method for FMLE fits much better when there are only 2 clusters to distinguish. Therefore, the selected data can be considered to have only 2 clusters.
Selected data are determined by the appropriate values in questions 6 and 8 from Table 2, which are marked in bold. The second phase is realized in 2 steps of fuzzy Gath-Geva clustering. In the first step, the patients whose diagnosis undoubtedly confirms a type of primary headache are selected; they are called first time match patients and are marked in red bold in Figure 1. The remaining first time no-match patients are the input to the second step of fuzzy Gath-Geva clustering. The second step also creates new clusters for the same types of primary headache, but the number of these patients is much smaller than the number of patients in the first phase; they are called second time match patients and are also marked in red bold. The rest of the patients are 'unclassified' and are marked in green bold. First and second time match patients are summed, and together they represent the patients diagnosed with an appropriate type of primary headache.

Experimental results
The proposed hybrid FMLE clustering algorithm was tested on 2 data sets for diagnosing headache disorder collected from the Clinical Centre of Vojvodina in Serbia. Headache data set 1 is part of a large study [31] encompassing the adult working population. Headache data set 2 is part of a study encompassing the student population [32]. As mentioned before, this research uses the most important features: (4), (5), (6), (7), (8), (10), (12), (13) and (15) defined in Table 2. Both headache data sets have 9 features (attributes); 4 classes (types of primary headache): migraine without aura (MWoA), migraine with aura (MWA), tension-type headache (TTH) and other primary headaches (other); and no missing data.

Experimental results headache data set 1
The input data set is headache data set 1 and consists of 579 instances. The calculated maximum Calinski-Harabasz value is 1250 and the optimal number of clusters is 4 (Figure 2). After the fuzzy partition process, there are 289 selected data. The pairwise comparison of classes (MWoA-other) for headache data set 1 is used in the first step of FGGC. After the first step of FGGC, 205 instances are first time match: 78 patients suffer from MWoA and 127 are categorized as other headaches; the remaining instances are first time no-match (Table 3). Finally, 480 instances out of 579 instances in headache data set 1 have been correctly evaluated. Average accuracy is 82.91%.

Experimental results headache data set 2
The input data set is headache data set 2 and consists of 132 instances. After the fuzzy partition process, there are 97 selected data. The pairwise comparison of classes (MWoA-other) for headache data set 2 is used in the first step of FGGC. After 190 iterations, 74 instances are first time match: 33 patients suffer from MWoA and 41 are categorized as other headaches; the remaining 23 are first time no-match. The cluster centroids are: centroid 1: 2.25, 1.32; centroid 2: 1.41, 1.74, as given in Figure 4 (a).
After the second step of FGGC, 4 instances are second time match patients: 0 patients are found to suffer from MWoA and 4 are categorized as other headaches; the remaining 19 instances are unclassified. After 35 iterations, the cluster centroids are: centroid 1: 2.65, 1.47; centroid 2: 1.70, 1.57, as given in Figure 4 (b). Accuracy in the pairwise comparison of classes (MWoA-other) is 80.41%. The experimental results presented here are only for the pairwise comparison between MWoA and other headaches; the rest can be discussed in the same manner according to Table 4. Finally, 109 instances out of 132 instances in headache data set 2 have been correctly evaluated. Average accuracy is 85.68%.
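The counts reported for headache data set 2 can be cross-checked with a few lines of arithmetic. The accuracy formula is not stated in the text; the sketch below assumes that pairwise accuracy is the number of first and second time match patients divided by the number of selected data, a reading that reproduces the reported 80.41%.

```python
# Counts reported for the MWoA-other comparison on headache data set 2.
selected = 97
first_match = 74
first_no_match = 23
second_match = 4
unclassified = 19

# The two FGGC steps account for every selected instance.
assert first_match + first_no_match == selected
assert second_match + unclassified == first_no_match

# Assumed definition: matched patients over selected data.
accuracy = 100 * (first_match + second_match) / selected
print(f"{accuracy:.2f}%")  # 80.41%
```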

Conclusion and future work
The aim of this paper is to propose a new hybrid strategy for a fuzzy clustering approach for diagnosing primary headache disorder. First, the algorithm employs the model to estimate the optimal number of clusters using the Calinski-Harabasz index. The proposed hybrid approach is obtained by combining the fuzzy partition method and the FGGC algorithm with a distance norm based on FMLE.
Preliminary experimental results encourage further research by the authors: on the two experimental data sets in the domain of primary headache, the average accuracy is 83% for the first and 86% for the second. Our future research will focus on creating a new hybrid model combined with evolutionary techniques that will efficiently solve different well-known data sets as well as real-world medical data sets.