Abstract
This paper proposes a novel semisupervised regression framework for estimating chlorophyll concentrations in subsurface waters from remotely sensed imagery. The framework integrates multiobjective optimization and Gaussian processes (GPs) to improve estimation accuracy when few labeled samples are available. To this end, the labeled samples are exploited together with unlabeled ones (available at zero cost from the image under analysis) to learn the regression model. The targets of the unlabeled samples are estimated by simultaneously optimizing two criteria that express the generalization capability of the GP estimator. The first is the empirical risk, quantified by the mean squared error; the second is the log marginal likelihood, which combines two terms expressing the model complexity and the data-fit capability, respectively. To alleviate the computational burden and, possibly, to improve estimation accuracy, two strategies for selecting unlabeled samples are compared with simple random sampling. They are based on the variance estimated by the GP estimator and on the differential entropy measure, respectively. Experimental results obtained on simulated and real data sets are reported and discussed.
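As a rough illustration of the quantities named above, the following sketch computes the two criteria (mean squared error on the labeled samples and the GP log marginal likelihood, split into its data-fit and complexity terms) and ranks unlabeled samples by predictive variance. It is a minimal example under stated assumptions, not the framework proposed here: the squared-exponential kernel, the fixed hyperparameters, the synthetic one-dimensional data, and all function names are hypothetical, and the multiobjective estimation of the unlabeled targets is not implemented. The entropy function is that of a single Gaussian predictive distribution, for which the ranking coincides with the variance ranking; the entropy-based strategy evaluated in the paper may be defined differently.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential covariance between the rows of a and b (assumed kernel).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_fit_predict(x_train, y_train, x_test, noise_var=1e-2):
    """Zero-mean GP regression: predictive mean, variance, log marginal likelihood."""
    n = len(x_train)
    k = rbf_kernel(x_train, x_train) + noise_var * np.eye(n)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    k_star = rbf_kernel(x_train, x_test)
    mean = k_star.T @ alpha
    v = np.linalg.solve(chol, k_star)
    # Prior variance k(x*, x*) equals 1 for this kernel; add the noise variance.
    var = 1.0 + noise_var - (v ** 2).sum(axis=0)
    data_fit = -0.5 * y_train @ alpha            # -1/2 y^T K^{-1} y
    complexity = -np.log(np.diag(chol)).sum()    # -1/2 log|K|
    log_ml = data_fit + complexity - 0.5 * n * np.log(2 * np.pi)
    return mean, var, log_ml

def differential_entropy(var):
    # Differential entropy of a univariate Gaussian with the given variance.
    return 0.5 * np.log(2 * np.pi * np.e * var)

# Synthetic data standing in for labeled and unlabeled samples (assumption).
rng = np.random.default_rng(0)
x_lab = rng.uniform(-3, 3, size=(15, 1))
y_lab = np.sin(x_lab[:, 0]) + 0.1 * rng.standard_normal(15)
x_unl = rng.uniform(-3, 3, size=(200, 1))

mean_unl, var_unl, log_ml = gp_fit_predict(x_lab, y_lab, x_unl)
mean_lab, _, _ = gp_fit_predict(x_lab, y_lab, x_lab)
empirical_risk = np.mean((y_lab - mean_lab) ** 2)  # MSE on the labeled samples

# Keep the 20 unlabeled samples the GP is least certain about.
top_by_variance = np.argsort(var_unl)[::-1][:20]
entropy_unl = differential_entropy(var_unl)
```

Selecting the highest-variance samples is one common heuristic; whether high- or low-uncertainty samples are preferable in a given setting is a design choice that this sketch does not settle.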