Abstract
Image recognition tasks are becoming important components of multi-modal interfaces. To develop feasible components, the problems of high dimensionality and non-linearity must be resolved. Image recognition consists of three stages: calibration, feature extraction (or representation), and recognition. For the recognition stage, state-of-the-art methods, including non-linear ones, have been proposed. For the feature extraction stage, on the other hand, linear methods such as principal component analysis and linear discriminant analysis are still commonly used. The self-organizing feature map and spectral clustering are candidates for non-linear feature extraction. Both methods have had many empirical successes because of their simplicity and non-linearity. In this paper, we analyze the characteristics of these methods. A summary of their characteristics suggests the possibility of combining the two methods into a new approach. To clarify the importance of this topic, we also give an overview of our multi-modal interface, which includes lip-reading.