Discovering Human Interactions in Videos with Limited Data Labeling
We present a novel approach for discovering human interactions in videos. Activity
understanding techniques usually require a large number of labeled examples, which
are not available in many practical cases. Here, we focus on recovering semantically
meaningful clusters of human-human and human-object interactions in an unsupervised
fashion. A new iterative solution is introduced based on Maximum Margin Clustering
(MMC), which also incorporates user feedback to refine the clusters. This is achieved by
formulating the whole process as a unified constrained latent max-margin clustering
problem. Extensive experiments have been carried out over three challenging datasets:
Collective Activity, VIRAT, and UT-Interaction. Empirical results demonstrate that
the proposed algorithm can efficiently discover perfect semantic clusters of human
interactions with only a small amount of labeling effort.