Abstract
We address the problem of learning a similarity measure for cross-view person reidentification in a network of disjoint cameras. Our learning framework is based on the projected gradient approach and is suitable for large-scale applications. The objective function is the hinge loss with triplet constraints. In contrast to approaches that use a dual formulation, we optimize the objective in its primal form using a minibatch gradient descent algorithm. Choosing the learning rate and its schedule is nontrivial in such optimization schemes, so we empirically study the effect of different learning rate schedules. The study covers conventional strategies, such as fixed or diminishing learning rates, and recently developed adaptive gradient methods, such as root-mean-square propagation (RMSProp). Experimental results are reported on three person reidentification benchmark datasets: VIPeR, CUHK Campus, and CUHK03. The results demonstrate that RMSProp is well suited to the proposed learning framework, and a comparison with recent methods in the domain supports the proposed approach. © 2019 SPIE and IS&T
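
The ingredients named in the abstract (triplet hinge loss, primal minibatch gradient descent, an RMSProp step size, and a projection step) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Mahalanobis-style measure d_M(x, y) = (x - y)^T M (x - y) and assumes the projection is onto the cone of positive semidefinite matrices; the abstract specifies neither. The function names (`triplet_hinge_grad`, `project_psd`, `train`) and all hyperparameter values are hypothetical.

```python
import numpy as np


def triplet_hinge_grad(M, anchors, positives, negatives, margin=1.0):
    """Mean triplet hinge loss and its (sub)gradient w.r.t. M for one minibatch.

    Assumed loss per triplet: max(0, margin + d_M(a, p) - d_M(a, n)),
    with d_M(x, y) = (x - y)^T M (x - y).
    """
    dp = anchors - positives                      # (B, d) anchor-positive differences
    dn = anchors - negatives                      # (B, d) anchor-negative differences
    d_pos = np.einsum("bi,ij,bj->b", dp, M, dp)   # distances to positives
    d_neg = np.einsum("bi,ij,bj->b", dn, M, dn)   # distances to negatives
    slack = margin + d_pos - d_neg
    active = slack > 0                            # triplets that violate the margin
    loss = np.maximum(slack, 0.0).mean()
    # Only violated triplets contribute: sum of dp dp^T - dn dn^T, averaged over the batch.
    grad = (dp[active].T @ dp[active] - dn[active].T @ dn[active]) / len(anchors)
    return loss, grad


def project_psd(M):
    """Project onto the PSD cone (assumed constraint set): clip negative eigenvalues."""
    M = (M + M.T) / 2.0
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0.0, None)) @ V.T


def train(anchors, positives, negatives, lr=0.01, decay=0.9, eps=1e-8,
          epochs=20, batch_size=32, seed=0):
    """Primal minibatch projected gradient descent with an RMSProp step size."""
    rng = np.random.default_rng(seed)
    n, d = anchors.shape
    M = np.eye(d)                 # start from the Euclidean metric
    avg_sq = np.zeros((d, d))     # running average of squared gradients
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            _, g = triplet_hinge_grad(M, anchors[idx], positives[idx], negatives[idx])
            avg_sq = decay * avg_sq + (1.0 - decay) * g ** 2
            M = M - lr * g / (np.sqrt(avg_sq) + eps)   # RMSProp-scaled step
            M = project_psd(M)                         # projection step
    return M
```

Replacing the RMSProp-scaled step with `M - lr * g` (fixed rate) or `M - (lr / (1 + t)) * g` (one possible diminishing rate, with `t` a step counter) gives the conventional schedules that the study compares against.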