Abstract
Web pages are heterogeneous and unstructured. The heterogeneity arises from the hybrid nature of the documents; the lack of structure arises from multilingual or multimedia content in the page. Mining such pages should therefore be independent of language and software. Whenever data or content mining is performed, a set of data is chosen to form the basis, as is done with keywords. If this base data is chosen arbitrarily, the mining is automatic; if some 'knowledge' or 'background' informs the choice, it is adaptive. Statistical features are extracted from the pixel map of each image and presented to two clustering algorithms, Fuzzy C-Means and subtractive clustering. Each algorithm classifies the given image as either a text or an image representation. The classification accuracies of the two algorithms are compared and presented.
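The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes 8-bit grayscale pixel maps and picks four statistical features (mean, standard deviation, skewness, histogram entropy) purely for illustration, since the abstract does not name the features used. The function names `pixel_statistics` and `fuzzy_c_means` are hypothetical, the synthetic "text-like" and "image-like" inputs are made up, and only the Fuzzy C-Means step is shown; subtractive clustering is omitted.

```python
import numpy as np


def pixel_statistics(pixels):
    """Return [mean, std, skewness, entropy] for an 8-bit grayscale pixel map.

    The choice of these four features is an assumption made for illustration.
    """
    x = np.asarray(pixels, dtype=float).ravel()
    mean = x.mean()
    std = x.std()
    # Skewness is set to 0 for a constant image to avoid dividing by zero.
    skew = 0.0 if std == 0 else float(np.mean(((x - mean) / std) ** 3))
    # Shannon entropy (in bits) of the 256-bin intensity histogram.
    hist, _ = np.histogram(x, bins=256, range=(0, 256))
    p = hist[hist > 0] / x.size
    entropy = float(-np.sum(p * np.log2(p)))
    return np.array([mean, std, skew, entropy])


def fuzzy_c_means(data, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard Fuzzy C-Means with fuzzifier m; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Random initial memberships, each row normalised to sum to 1.
    u = rng.random((data.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers are membership-weighted means of the samples.
        w = u ** m
        centers = (w.T @ data) / w.sum(axis=0)[:, None]
        # Distance from every sample to every center (floored to avoid 1/0).
        d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # Membership update: inversely related to relative distance.
        inv = d ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u


# Synthetic stand-ins (assumed): sparse dark strokes on white vs. noisy intensities.
rng = np.random.default_rng(1)
text_like = [np.where(rng.random((64, 64)) < 0.1, 0, 255) for _ in range(10)]
image_like = [rng.integers(0, 256, size=(64, 64)) for _ in range(10)]

features = np.array([pixel_statistics(p) for p in text_like + image_like])
centers, memberships = fuzzy_c_means(features, n_clusters=2)
labels = memberships.argmax(axis=1)  # hard label = cluster with highest membership
```

The cluster indices are arbitrary, so mapping a cluster to "text" or "image" needs a separate step, such as inspecting the cluster centers. The features here are left unscaled; in practice they would likely be normalised so that no single feature dominates the distance.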