I will try to explain, step by step, what clustering is (discussed only in the context of the web) without going into the cruel mathematics behind it.
1) What is clustering?: In web clustering, pages of similar categories are grouped together. Not clear? Keep reading.
2) How is it useful?: The easiest way to understand this is by looking at clustered information.
For example, if you search for "Agglomerative" on Google, you get something like:
Searches related to: agglomerative
agglomerative cluster analysis, agglomerative hierarchical, agglomerative algorithm, agglomeration thesaurus,
agglomeration wikipedia
These related searches are hyperlinks, and when you click on one of them you get
more specific results related to the query you entered, which helps you "drill down" on a specific topic!
Now that really looks helpful, doesn't it ;)
Clustering is also helpful in data analysis, social networking and data mining (how will be discussed in a later post).
3) How is clustering done?: Clustering is the most common form of unsupervised learning (i.e. no human-labelled examples are needed, so almost no human intervention is required).
Classification (will be discussed later) differs from clustering in that classification requires supervised learning.
Anyway, I must point out that clustering and classification (simultaneously) are best implemented by manual construction, e.g. http://dmoz.org/ and the Yahoo Directory.
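To make the unsupervised vs. supervised difference concrete, here is a minimal sketch. Everything in it is my own assumption for illustration: the word-overlap (Jaccard) similarity, the 0.3 threshold, the greedy single-pass grouping and the nearest-example classifier are just about the simplest choices possible, not what any real search engine uses.

```python
def jaccard(a, b):
    """Word-overlap similarity between two strings, from 0.0 to 1.0."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def cluster(docs, threshold=0.3):
    """Unsupervised: group docs using only their similarity, no labels."""
    clusters = []
    for doc in docs:
        for group in clusters:
            # Join the first group that holds a similar enough document.
            if any(jaccard(doc, other) >= threshold for other in group):
                group.append(doc)
                break
        else:
            # No similar group found, so this doc starts a new cluster.
            clusters.append([doc])
    return clusters


def classify(doc, labelled_examples):
    """Supervised: copy the label of the most similar human-labelled example."""
    _, best_label = max(labelled_examples, key=lambda pair: jaccard(doc, pair[0]))
    return best_label


docs = [
    "cluster analysis algorithms",
    "cluster analysis matlab",
    "football match results",
    "football league results",
]
groups = cluster(docs)

# Hypothetical labels supplied by a human; this is the "supervision".
examples = [("cluster analysis matlab", "data mining"),
            ("football match results", "sports")]
label = classify("cluster analysis dendrogram", examples)
```

Notice that cluster() never sees a label, while classify() cannot work without the labelled examples.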
Clustering is done by various algorithms, but before going through them let's look at a few terms:
1) Flat clustering: creates a flat set of clusters without any explicit structure that would relate the clusters to each other.
Now let's understand what that means. Go back to the search for: agglomerative
Click on the first cluster, i.e. agglomerative cluster analysis (subject to change :P)
Clicking the hyperlink takes you to the results page for agglomerative cluster analysis, which contains clusters of the form:
hierarchical agglomerative cluster analysis, euclidean distance cluster analysis, cluster analysis dendrogram, multivariate cluster analysis,
cluster analysis algorithms, cluster analysis matlab
If Google had been using flat clustering, we wouldn't have got these sub-clusters (a structure), so what we are looking at is hierarchical clustering :), not flat.
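The flat vs. hierarchical difference can be sketched as two small data structures. The labels are taken from the related searches quoted above, but the nesting, the variable names and the helper functions drill_down() and depth() are my own assumptions for illustration, not Google's actual data or method.

```python
# Flat clustering: every cluster sits at the same level, with no sub-clusters.
flat = {
    "agglomerative cluster analysis": {},
    "agglomerative algorithm": {},
    "agglomeration thesaurus": {},
}

# Hierarchical clustering: a cluster may contain sub-clusters of its own.
hierarchy = {
    "agglomerative": {
        "agglomerative cluster analysis": {
            "hierarchical agglomerative cluster analysis": {},
            "cluster analysis dendrogram": {},
            "cluster analysis algorithms": {},
        },
        "agglomerative algorithm": {},
    }
}


def drill_down(tree, path):
    """Follow a list of cluster labels and return the sub-clusters found there."""
    node = tree
    for label in path:
        node = node[label]
    return sorted(node)


def depth(tree):
    """Number of levels in the tree; a flat clustering has depth 1."""
    if not tree:
        return 0
    return 1 + max(depth(child) for child in tree.values())


sub_clusters = drill_down(hierarchy, ["agglomerative", "agglomerative cluster analysis"])
```

With the flat structure, drilling into any cluster returns an empty list, which is exactly the "no sub-cluster" situation described above.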
2) Hard clustering: each document is a member of exactly one cluster.
3) Soft clustering: a document may have fractional membership in several clusters.
4) Non-exhaustive clustering: may assign no cluster at all to a particular document.
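The last three terms are easiest to see as membership tables. This is a minimal sketch: the document names, the cluster names, the weights and the helper functions are all hypothetical, made up only to illustrate the definitions above.

```python
# Each table maps a document to its clusters and membership weights.
hard = {"doc1": {"sports": 1.0}, "doc2": {"politics": 1.0}}
soft = {"doc1": {"sports": 0.7, "politics": 0.3}, "doc2": {"politics": 1.0}}
non_exhaustive = {"doc1": {"sports": 1.0}, "doc2": {}}  # doc2 gets no cluster


def is_hard(assignment):
    """True if every document belongs to exactly one cluster."""
    return all(len(members) == 1 for members in assignment.values())


def is_exhaustive(assignment):
    """True if every document belongs to at least one cluster."""
    return all(len(members) >= 1 for members in assignment.values())


def harden(assignment):
    """Turn a soft clustering into a hard one by keeping the heaviest cluster."""
    return {
        doc: (max(members, key=members.get) if members else None)
        for doc, members in assignment.items()
    }


hardened = harden(soft)
```

Here harden() simply keeps the cluster with the largest weight for each document, which is one common way to read a soft clustering as a hard one.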