Deep Person Re-identification
- Shengyong Ding, Liang Lin, Guangrun Wang, and Hongyang Chao, “Deep Feature Learning with Relative Distance Comparison for Person Re-identification”. Pattern Recognition, 48(10): 2993-3003, 2015. PDF Caffe Model
Identifying the same individual across different scenes is an important yet difficult task in intelligent video surveillance. The main difficulty lies in preserving the similarity of the same person under large variations in appearance and structure while still discriminating between different individuals. In this paper, we present a scalable distance-driven feature learning framework based on a deep neural network for person re-identification, and demonstrate its effectiveness in handling these challenges. Specifically, given training images with class labels (person IDs), we first produce a large number of triplet units, each of which contains three images, i.e., one person image with a matched reference and a mismatched reference. Treating the units as input, we build a convolutional neural network to generate layered representations, followed by the $L_2$ distance metric. Through parameter optimization, our framework tends to maximize the relative distance between the matched pair and the mismatched pair for each triplet unit. A nontrivial issue arising with this framework is that the triplet organization cubically enlarges the number of training triplets, as one image can be involved in several triplet units. To overcome this problem, we develop an effective triplet generation scheme and an optimized gradient descent algorithm, so that the computational load depends mainly on the number of original images rather than the number of triplets. On several challenging databases, our approach achieves very promising results and outperforms other state-of-the-art approaches.
Experiments
Comparisons with state-of-the-art methods
In this project, we present a scalable deep feature learning model for person re-identification via relative distance comparison. In this model, we construct a CNN that is trained on a set of triplets to produce features that satisfy the relative distance constraints defined by that triplet set. To cope with the cubically growing number of triplets, we present an effective triplet generation scheme and an extended network propagation algorithm that trains the network efficiently and iteratively. Our learning algorithm ensures that the overall computational load depends mainly on the number of training images rather than the number of triplets. The results of extensive experiments demonstrate the superior performance of our model compared with state-of-the-art methods. In future research, we plan to extend our model to more datasets and tasks.
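The relative distance comparison and the per-image accounting described above can be illustrated with a minimal sketch in plain Python. It is not the released Caffe implementation: the function names `make_triplets` and `triplet_loss_and_grads`, the random sampling strategy, the hinge form of the loss, and the `margin` value are all assumptions chosen for illustration. The sketch samples triplets from person IDs, penalizes a triplet when the matched pair is not closer than the mismatched pair by the margin, and accumulates the gradient with respect to each image's feature vector, so that one backward pass per image (rather than per triplet) would suffice.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared L2 distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_triplets(labels, per_anchor, rng):
    """Sample (anchor, matched, mismatched) index triplets from person IDs.

    Hypothetical sampling scheme: each image is used as an anchor
    `per_anchor` times with a random matched and mismatched reference.
    """
    by_id = defaultdict(list)
    for idx, pid in enumerate(labels):
        by_id[pid].append(idx)
    triplets = []
    for a, pid in enumerate(labels):
        matched = [i for i in by_id[pid] if i != a]
        mismatched = [i for i, q in enumerate(labels) if q != pid]
        if not matched or not mismatched:
            continue  # this anchor cannot form a valid triplet
        for _ in range(per_anchor):
            triplets.append((a, rng.choice(matched), rng.choice(mismatched)))
    return triplets


def triplet_loss_and_grads(features, triplets, margin=1.0):
    """Hinge loss on relative distance, with gradients summed per image.

    Assumed loss per triplet: max(0, margin + d(a, p) - d(a, n)),
    where d is the squared L2 distance.
    """
    dim = len(features[0])
    grads = [[0.0] * dim for _ in features]  # one slot per image, not per triplet
    loss = 0.0
    for a, p, n in triplets:
        fa, fp, fn = features[a], features[p], features[n]
        violation = margin + sq_dist(fa, fp) - sq_dist(fa, fn)
        if violation <= 0:
            continue  # constraint already satisfied, no gradient
        loss += violation
        for k in range(dim):
            grads[a][k] += 2.0 * (fn[k] - fp[k])
            grads[p][k] += 2.0 * (fp[k] - fa[k])
            grads[n][k] += 2.0 * (fa[k] - fn[k])
    return loss, grads


if __name__ == "__main__":
    labels = [0, 0, 1, 1]  # toy person IDs
    features = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.5, 2.0]]  # toy features
    triplets = make_triplets(labels, per_anchor=2, rng=random.Random(0))
    loss, grads = triplet_loss_and_grads(features, triplets)
    print(len(triplets), "triplets,", len(grads), "per-image gradients")
```

In this sketch, `grads` always has one entry per image regardless of how many triplets are sampled, which reflects the idea that the expensive network propagation scales with the number of images while only the cheap feature-level arithmetic scales with the number of triplets.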