The k-means algorithm is a straightforward and widely used clustering algorithm. Given a set of objects (records), the goal of clustering or segmentation is to divide these objects into groups or “clusters” such that objects within a group tend to be more similar to one another than to objects belonging to different groups. In other words, clustering algorithms place similar points in the same cluster and dissimilar points in different clusters. Note that, in contrast to supervised tasks such as regression or classification, where there is a notion of a target value or class label, the objects that form the inputs to a clustering procedure do not come with an associated target. Clustering is therefore often referred to as unsupervised learning. Because there is no need for labeled data, unsupervised algorithms are suitable for the many applications where labeled data is difficult to obtain. Unsupervised tasks such as clustering are also often used to explore and characterize a dataset before running a supervised learning task.
Since clustering makes no use of class labels, some notion of similarity must be defined based on the attributes of the objects. The definition of similarity and the method by which points are clustered differ from one clustering algorithm to another. Thus, different clustering algorithms are suited to different types of datasets and different purposes, and the “best” algorithm to use depends on the application. It is not uncommon to try several different algorithms and choose the one that proves most useful.
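To make the idea of grouping similar points concrete, here is a minimal sketch of the standard k-means iteration (Lloyd's algorithm) in plain Python. It assumes numeric attributes and squared Euclidean distance as the similarity measure, and for reproducibility it takes the first k points as initial centroids; practical implementations usually pick starting points more carefully. The function name `kmeans` and the sample points are illustrative only.

```python
def kmeans(points, k, iterations=100):
    """Cluster equal-length numeric tuples with Lloyd's algorithm.

    Assumption for this sketch: the first k points serve as the
    initial centroids, which keeps the run deterministic.
    """
    centroids = list(points[:k])
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the result is stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids


# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Even though both starting centroids fall inside the same group here, the alternating assignment and update steps separate the two groups within a few iterations. On less clearly separated data the result can depend on the starting centroids.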
Here is the list of the top 10 algorithms used in data mining:
- C4.5: Classification
- k-means: Clustering
- SVM: Statistical learning
- Apriori: Association analysis
- EM: Statistical learning
- PageRank: Link mining
- AdaBoost: Ensemble learning
- kNN: Classification
- Naïve Bayes: Classification
- CART: Classification
When it comes to evaluating the financial condition of a government, the process focuses on four dimensions: budgetary sustainability, short-term solvency, flexibility and financial independence.
For data mining of financial crises, a well-known rule-based induction algorithm is CHAID, which provides an efficient technique for segmentation or tree growing. For the levels of budgetary sustainability, the most significant variables are related to the current margin, together with the importance of capital expenditure in the budgetary structure. Short-term solvency depends directly on the liquid funds held by the entity. Flexibility depends mainly on the financial load per inhabitant of the municipality and on the total sum of fixed charges. Finally, financial independence depends fundamentally on the transfers that the entity receives and on the fiscal pressure, among other elements.
The detection and rectification of financial crises in local authorities is of fundamental interest to public-sector managers. To decide whether a local authority is well or badly managed, it is necessary to take into account a series of external factors that are relevant in this respect. In general, a control system of this kind makes it possible to alert different types of users to the existence of financial tensions. Interested users may include public-sector managers in authorities responsible for supervising the financial situation of town and city councils, or senior officers who need to know how resources are being managed and how this is done in comparable councils. In order to determine whether a local authority is experiencing a financial crisis, we consider the concept of financial condition, which is measured by means of the different elements listed above. This concept can be divided into other, more specific, aspects, such as those of flexibility, sustainability and vulnerability (Greenberg and Hillier 1995; CICA 1997). Finally, long-term solvency is measured through the incorporation of a considerable period of time into the indicators considered. However, there is a problem concerning the measurement of the elements that constitute financial condition: no instrument exists that can measure the different aspects that make it up, bearing in mind the large body of variables that can be applied for this purpose, as well as the need to take a long-term view. We propose a means of overcoming this problem by applying data mining using the CHAID algorithm. This methodology enables us to create non-binary decision trees, with multiple branches for each node, providing occurrence probabilities via exclusionary rules. It is especially suitable for large sample sizes for which, in principle, no model has yet been established for the phenomenon in question.
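The way a CHAID-style tree produces a node with multiple branches can be illustrated with a simplified sketch. It scores each categorical predictor by its Pearson chi-square statistic against the target and then splits on the strongest one, creating one branch per category. This is not the full CHAID procedure: it omits category merging and adjusted p-values, and it compares raw chi-square statistics, which is only a rough guide when predictors have different numbers of categories. The variable names and records are hypothetical.

```python
from collections import Counter


def chi_square(rows, predictor, target):
    """Pearson chi-square statistic of the predictor-by-target table."""
    n = len(rows)
    pred_counts = Counter(r[predictor] for r in rows)
    targ_counts = Counter(r[target] for r in rows)
    cell_counts = Counter((r[predictor], r[target]) for r in rows)
    stat = 0.0
    for p, p_n in pred_counts.items():
        for t, t_n in targ_counts.items():
            expected = p_n * t_n / n  # count expected under independence
            observed = cell_counts.get((p, t), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def best_split(rows, predictors, target):
    """Pick the predictor most associated with the target and split on it."""
    scores = {p: chi_square(rows, p, target) for p in predictors}
    best = max(scores, key=scores.get)
    # Multiway (non-binary) split: one branch per category of the predictor.
    branches = {}
    for r in rows:
        branches.setdefault(r[best], []).append(r)
    return best, scores, branches


# Hypothetical records for six local authorities.
rows = [
    {"transfers": "low", "pressure": "low", "independence": "high"},
    {"transfers": "low", "pressure": "high", "independence": "high"},
    {"transfers": "medium", "pressure": "low", "independence": "medium"},
    {"transfers": "medium", "pressure": "high", "independence": "medium"},
    {"transfers": "high", "pressure": "low", "independence": "low"},
    {"transfers": "high", "pressure": "high", "independence": "low"},
]
best, scores, branches = best_split(rows, ["transfers", "pressure"], "independence")
print(best, sorted(branches))  # transfers ['high', 'low', 'medium']
```

In this toy data the level of transfers determines the target exactly, so it is chosen for the split and the node receives three branches rather than two.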
The financial condition, in the terms defined in the present chapter, provides the characteristics necessary for such an application. The results obtained from applying the above methodology to evaluating financial independence, short-term solvency, flexibility and budgetary sustainability are highly satisfactory. The models derived, for all the Spanish local authorities analyzed, produced a success rate of over 63%, while in the case of financial independence, over 80% accuracy was achieved. Clearly, the model developed presents a high degree of explanatory and predictive capacity. For the specific cases of the worst and best values, i.e. the first and fourth quartiles for each of the elements of financial condition analyzed, an even higher rate of accuracy was recorded, ranging from 86% (for the local authorities with the worst situation regarding short-term solvency) to 99.9% (for those authorities with the highest levels of financial independence). The results also suggest that characterizing the financial condition by means of four models is a good method, as the main rules created by the different decision trees are made up of variables that differ depending on the element analyzed. For the levels of budgetary sustainability, the most significant variables are those related to the current margin (gross savings), together with the importance of capital expenditure in the budgetary structure. Short-term solvency, on the other hand, depends on the liquid funds possessed by the entity, on the time elapsing before payments are made and received, and on the fixed charges to be met. Flexibility, however, depends on the financial load per inhabitant of the municipality, on the total sum of fixed charges, and on certain variables related to the implementation of current receipts.
Finally, financial independence depends fundamentally on the transfers that the entity receives (an aspect that is predictable) and on the fiscal pressure, among other elements. It would be useful in the future to include other lines of research based on the introduction of variables concerning the social and economic context, as well as variables related to the way in which public services are managed, as these factors influence the characterization of local authorities’ financial behavior. Furthermore, we recommend considering other data mining algorithms that could achieve higher success rates, and thus reduce the risks involved, when classifying and predicting the financial behavior of all the local authorities in question. From the methodological point of view, it would be appropriate to apply other algorithms in order to compare the stability and predictive power of the model created, in particular the advanced version C5.0 (Chesney, 2009), which improves how missing values are dealt with. In addition, we are aware that the automatic discretization of the continuous explanatory variables could be a pre-processing step with a strong impact, one that might not be necessary in certain other tree algorithms. However, since our goal is to measure the four elements of the financial condition, such an extension of the study would exceed the word limits of a book chapter. We therefore focus on applying the CHAID method to each of the four elements above, to obtain preliminary results as a starting point for future research on which we are currently working.
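The success rates quoted for the four models, both overall and for individual quartiles, can be illustrated with a short sketch. It assumes that "success rate" means the share of authorities whose predicted quartile matches the observed one; the function name and the sample labels are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def success_rates(actual, predicted):
    """Return overall accuracy and accuracy per actual class (quartile)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for a, p in zip(actual, predicted):
        totals[a] += 1
        if a == p:
            hits[a] += 1
    overall = sum(hits.values()) / len(actual)
    per_class = {c: hits[c] / totals[c] for c in totals}
    return overall, per_class


# Hypothetical quartile labels (1 = worst, 4 = best) for eight authorities.
actual = [1, 1, 2, 2, 3, 3, 4, 4]
predicted = [1, 1, 2, 3, 2, 3, 4, 4]
overall, per_quartile = success_rates(actual, predicted)
print(overall)       # 0.75
print(per_quartile)  # {1: 1.0, 2: 0.5, 3: 0.5, 4: 1.0}
```

In this made-up example the extreme quartiles are classified more reliably than the middle ones, which is the same kind of pattern the chapter reports for the first and fourth quartiles.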