# k-means Clustering

The k-means algorithm is a straightforward and widely used clustering algorithm. Given a set of objects (records), the goal of clustering or segmentation is to divide these objects into groups or “clusters” such that objects within a group tend to be more similar to one another than to objects belonging to different groups. In other words, clustering algorithms place similar points in the same cluster while placing dissimilar points in different clusters.

Note that, in contrast to supervised tasks such as regression or classification, where there is a notion of a target value or class label, the objects that form the inputs to a clustering procedure do not come with an associated target. Therefore, clustering is often referred to as unsupervised learning. Because there is no need for labeled data, unsupervised algorithms are suitable for many applications where labeled data is difficult to obtain. Unsupervised tasks such as clustering are also often used to explore and characterize the dataset before running a supervised learning task.

Since clustering makes no use of class labels, some notion of similarity must be defined based on the attributes of the objects. The definition of similarity and the method by which points are clustered differ from one clustering algorithm to another. Thus, different clustering algorithms are suited to different types of datasets and different purposes. The “best” clustering algorithm therefore depends on the application. It is not uncommon to try several different algorithms and choose the one that proves most useful. The k-means algorithm is given below.

`Input: Dataset D, number of clusters k`

`Output: Set of cluster representatives C, cluster membership vector m`

`/* Initialize cluster representatives C */`

`Randomly choose k data points from D`

`Use these k points as the initial set of cluster representatives C`

`repeat`

`    /* Data Assignment */`

`    Reassign points in D to the closest cluster mean`

`    Update m such that m_i is the cluster ID of the ith point in D`

`    /* Relocation of means */`

`    Update C such that c_j is the mean of the points in the jth cluster`

`until convergence of the objective function`

Here the objective function is $\sum_{i=1}^{N} \min_{j} \lVert x_i - c_j \rVert_2^2$, the total squared distance from each point $x_i$ to its closest cluster representative $c_j$.
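To make the steps above a bit more concrete, here is a minimal sketch in plain Python. It is only one way to write it: the names `kmeans` and `squared_distance`, the `max_iter` and `seed` parameters, the choice to stop when the assignments no longer change, and the rule that an empty cluster keeps its old center are all my own assumptions and are not part of the pseudocode.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(data, k, max_iter=100, seed=0):
    """Return (centers, membership, objective) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Initialize the representatives with k randomly chosen data points.
    centers = [tuple(p) for p in rng.sample(data, k)]
    membership = [None] * len(data)
    for _ in range(max_iter):
        # Data assignment: each point goes to its closest center.
        new_membership = [
            min(range(k), key=lambda j: squared_distance(x, centers[j]))
            for x in data
        ]
        if new_membership == membership:
            break  # assignments are stable, so the objective has stopped changing
        membership = new_membership
        # Relocation of means: each center becomes the mean of its cluster.
        for j in range(k):
            members = [x for x, m in zip(data, membership) if m == j]
            if members:  # assumption: an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Total squared distance from each point to its assigned center.
    objective = sum(squared_distance(x, centers[m]) for x, m in zip(data, membership))
    return centers, membership, objective

# Toy data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, membership, objective = kmeans(points, k=2)
print(centers)
print(membership, objective)
```

Keep in mind that k-means only finds a local optimum, so on less tidy data the result can depend on the random initial points. Running it with a few different `seed` values and keeping the run with the smallest objective is a common workaround.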

# C4.5 Classification

C4.5 is a suite of algorithms for classification problems in machine learning and data mining. It is targeted at supervised learning: given an attribute-valued dataset where instances are described by collections of attributes and belong to one of a set of mutually exclusive classes, C4.5 learns a mapping from attribute values to classes that can be applied to classify new, unseen instances.

C4.5, designed by J. Ross Quinlan, is so named because it is a descendant of the ID3 approach to inducing decision trees, which in turn is the third incarnation in a series of “iterative dichotomizers.” A decision tree is a series of questions systematically arranged so that each question queries an attribute and branches based on the value of that attribute. The leaves of the tree hold predictions of the class variable. The algorithm is given below.

`Input: an attribute-valued dataset D`

`Tree = {}`

`if D is “pure” OR other stopping criteria met then`

`    terminate`

`end if`

`for all attribute a ∈ D do`

`    Compute information-theoretic criteria if we split on a`

`end for`

`a_best = Best attribute according to above computed criteria`

`Tree = Create a decision node that tests a_best in the root`

`D_v = Induced sub-datasets from D based on a_best`

`for all D_v do`

`    Tree_v = C4.5(D_v)`

`    Attach Tree_v to the corresponding branch of Tree`

`end for`

`return Tree`
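The recursion above can be sketched in a few lines of Python. This is a simplified illustration and not Quinlan's implementation: it handles only categorical attributes and leaves out pruning, continuous attributes, and missing values. I am assuming gain ratio as the information-theoretic criterion, and the names `entropy`, `gain_ratio`, `build_tree`, and `classify`, as well as the tiny weather-style dataset, are made up for the example. Leaves are plain class labels and internal nodes are `(attribute, branches)` tuples, so class labels must not themselves be tuples.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """Information gain of splitting on attr, divided by the split information."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    split_info = entropy([row[attr] for row in rows])
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0

def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or an (attribute, {value: subtree}) node."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criteria: the data is pure or no attributes remain.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority  # no attribute separates the classes any further
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        # Induced sub-dataset for this value of the best attribute.
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        branches[value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return (best, branches)

def classify(tree, row, default=None):
    """Follow the questions in the tree until a leaf (class label) is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        if row[attr] not in branches:
            return default  # attribute value not seen during training
        tree = branches[row[attr]]
    return tree

# Hypothetical toy dataset: the class depends only on "outlook".
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
print(classify(tree, {"outlook": "sunny", "windy": "yes"}))
```

Dividing the gain by the split information penalizes attributes that split the data into many small groups, which plain information gain tends to favor.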

# How to define intelligence?

Well, it's been a very long time since my last blog post. I have been looking for answers to the question of how we define intelligence, and here is what I have learnt. The debate over what is essential to intelligence has been going on for a very long time, but sadly, there is little sign of consensus. Let's have a brief look at it:

“AI is concerned with methods of achieving goals in situations in which the information available has a certain complex character. The methods that have to be used are related to the problem presented by the situation and are similar whether the problem solver is human, a Martian, or a computer program.”

Intelligence usually means “the ability to solve hard problems”.

“By ‘general intelligent action’ it seems to be the same sort of intelligence as we see in human action. In any real situation behavior appropriate to the ends of the system and adaptive to the demands of the environment can occur, within some limits of speed and complexity.”

And after all these years of study, we still don’t know very much about it. There are a lot more questions than answers. We all know that a well-founded definition is usually the result, rather than the starting point, of scientific research. However, there are still reasons for us to be concerned about the definition of intelligence at the current time. Though clarifying the meaning of a concept always helps communication, this problem is especially important for AI. Without a clear idea of what intelligence is, it is very hard to say why AI is different from computer science or psychology. More importantly, researchers in this field need to justify their research plans according to such a definition. Anyone who wants to work on artificial intelligence has to face a two-phase assignment:

• to choose a working definition of intelligence
• to produce it in a computer.

A working definition is a definition concrete enough that you can directly work with it. Accepting a working definition of intelligence does not mean that you really believe it fully captures the concept “intelligence”, only that you will take it as a goal for your current research project. Therefore, the lack of a consensus on what intelligence is does not prevent each researcher from picking a working definition of intelligence. In fact, unless we commit to one definition, we won’t be able to claim that we are working on artificial intelligence. By accepting a working definition of intelligence, the most important commitments a researcher makes are on the acceptable assumptions and the desired results, which bind all the concrete work that follows.

Before studying concrete working definitions of intelligence, we need to set up a general standard for what makes one definition better than others. Carnap met the same problem when he tried to clarify the concept of “probability”. The task “consists in transforming a given more or less inexact concept into an exact one or, rather, in replacing the first by the second”, where the first may belong to everyday language or to a previous stage in the scientific language, and the second must be given by explicit rules for its use. According to him, the working definition must fulfill the following requirements:

1. It is similar to the concept to be defined, as far as the latter’s vagueness permits.
2. It is defined in an exact form.
3. It is fruitful for further study.
4. It is simple, as far as the other requirements permit.

All these requirements are reasonable and suitable for the current purpose. Let us look at what they mean concretely for the working definition of intelligence:

• Similarity: Though “intelligence” has no exact meaning in everyday language, it does have some common usages with which the working definition should agree. For example, under the definition normal human beings should count as intelligent, while most animals and machines should be either not intelligent at all or much less intelligent than human beings.
• Exactness: Given the working definition, whether a system is intelligent should be clearly decidable. For this reason, intelligence cannot be defined in terms of other ill-defined concepts, such as mind, thinking, cognition, intentionality, rationality, wisdom, consciousness, and so on, though these concepts do have close relationships with intelligence.
• Fruitfulness: The working definition should provide concrete guidance for the research based on it, for instance, what assumptions can be accepted, what phenomena can be ignored, what properties are desired, and so on. Most importantly, the working definition of intelligence should contribute to solving the fundamental problems in AI.
• Simplicity: Although intelligence is surely a complex mechanism, the working definition should be simple. Theoretically, a simple definition makes it possible to explore a theory in detail; practically, a simple definition is easy to implement.

For our current purpose, there is no exactly “right” or “wrong” working definition of intelligence, but some definitions are better than others. When comparing proposed definitions, the four requirements may conflict. For example, one definition may be more fruitful while another is simpler, a third may be more exact, and a fourth may be closer to everyday usage. In such a situation, some weighting and trade-off becomes necessary. However, there is no evidence showing that, in general, the requirements cannot be satisfied at the same time.