Abstract
In this study we present an effective method for clustering Web
pages. We extract keywords from flat HTML files, form feature vectors
that represent the Web pages, and cluster these vectors with the
Fuzzy C-Means (FCM) algorithm. We also describe a systematic,
well-organized procedure for data collection: to build our datasets,
we retrieve Web pages from various categories of the ODP (Open
Directory Project). The clustering results show that the method
performs well on all datasets. Finally, we present a comprehensive
experimental study covering the behavior of the algorithm under
different input parameters, the internal structure of the datasets,
and classification experiments.
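The abstract names Fuzzy C-Means without stating its update rules. As a rough illustration, here is a minimal pure-Python sketch of the standard FCM iteration (alternating weighted-mean center updates and inverse-distance membership updates). The function name `fuzzy_c_means`, the fuzzifier m = 2, Euclidean distance, and random membership initialization are assumptions for illustration, not parameters reported in the study.

```python
import random
from typing import List, Tuple

Vector = List[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(data: List[Vector], c: int, m: float = 2.0,
                  max_iter: int = 100, tol: float = 1e-5,
                  seed: int = 0) -> Tuple[List[Vector], List[List[float]]]:
    """Hypothetical FCM sketch: returns (centers, memberships).

    memberships[i][j] is the degree to which point i belongs to cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])

    centers: List[Vector] = []
    exp = 1.0 / (m - 1.0)  # (d_ij / d_ik)^(2/(m-1)) on squared distances
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * data[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        new_u = []
        for i in range(n):
            d2 = [_sq_dist(data[i], centers[j]) for j in range(c)]
            if any(x == 0.0 for x in d2):
                # Point sits exactly on a center: give it full membership there.
                row = [1.0 if x == 0.0 else 0.0 for x in d2]
                s = sum(row)
                row = [v / s for v in row]
            else:
                row = [1.0 / sum((d2[j] / d2[k]) ** exp for k in range(c))
                       for j in range(c)]
            new_u.append(row)
        # Stop when memberships no longer change noticeably.
        shift = max(abs(new_u[i][j] - u[i][j])
                    for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

Each row of the returned membership matrix sums to 1, and taking the largest entry per row gives a hard assignment. In the setting described above, the input rows would be the keyword feature vectors of the Web pages; the weighting scheme for those vectors is not specified here.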