Training Object Class Detectors with Click Supervision [CVPR 2017]


Training object class detectors typically requires a large set of images with objects annotated by bounding boxes. However, manually drawing bounding boxes is very time consuming. In this paper we greatly reduce annotation time by proposing center-click annotations: we ask annotators to click on the center of an imaginary bounding box which tightly encloses the object instance.

The authors have annotators click once on the center of each object's bounding box as the form of human interaction, and then combine these clicks with existing multiple instance learning models to perform weakly supervised learning.

We then incorporate these clicks into existing Multiple Instance Learning techniques for weakly supervised object localization, to jointly localize object bounding boxes over all training images. Extensive experiments on PASCAL VOC 2007 and MS COCO show that: (1) our scheme delivers high-quality detectors, performing substantially better than those produced by weakly supervised techniques, with a modest extra annotation effort, (2) these detectors in fact perform in a range close to those trained from manually drawn bounding boxes, (3) as the center-click task is very fast, our scheme reduces total annotation time by 9x to 18x.
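The excerpt above does not spell out how a click enters the Multiple Instance Learning re-localization step. One plausible reading, shown here purely as an assumption, is to down-weight each candidate box by how far its center lies from the click. The function names, the Gaussian form, and the `sigma` value below are all hypothetical and are not taken from the paper.

```python
import math


def box_center(box):
    """Return the (x, y) center of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def click_weight(box, click, sigma):
    """Gaussian weight in (0, 1]: equals 1 when the box center is on the click.

    ASSUMPTION: the Gaussian form and sigma are illustrative choices only.
    """
    cx, cy = box_center(box)
    squared_distance = (cx - click[0]) ** 2 + (cy - click[1]) ** 2
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def select_proposal(proposals, appearance_scores, click, sigma=20.0):
    """Pick the proposal maximizing appearance score times click weight."""
    best_index = max(
        range(len(proposals)),
        key=lambda i: appearance_scores[i] * click_weight(proposals[i], click, sigma),
    )
    return proposals[best_index]


# Three made-up candidate boxes with made-up appearance scores.
proposals = [(0, 0, 40, 40), (80, 60, 120, 100), (200, 200, 260, 240)]
scores = [0.9, 0.6, 0.8]
click = (101, 79)

chosen = select_proposal(proposals, scores, click)
print(chosen)
```

In this toy example the second box is selected even though it has the lowest appearance score, because it is the only box whose center is near the click. That is the intuition behind using clicks to steer localization.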


Object detectors can also be trained under weak supervision using only image-level labels. While this is substantially cheaper, the resulting detectors typically deliver only about half the accuracy of their fully supervised counterparts. In this paper, we aim to minimize human annotation effort while producing high-quality detectors. To this end we propose annotating objects by clicking on their center.

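Class-specific object detection works well under full supervision, but only when a large manually annotated dataset is available. The same task can of course be trained from image-level labels, but the resulting models reach only about half the accuracy of fully supervised ones. This paper adds a light form of user interaction to get both high accuracy and low manual annotation cost.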

For the purpose of image annotation, clicking on an object is therefore a natural choice. Clicking offers several advantages over other ways to annotate bounding boxes:
(1) it is substantially faster than drawing bounding boxes,
(2) it requires little instruction or annotator training compared to drawing or verifying bounding boxes, because it is a task that comes naturally to humans,
(3) it can be performed using a simple annotation interface (unlike bounding box drawing), and requires no specialized hardware (unlike eye-tracking).

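Clicking has three main advantages: 1. it is much faster than drawing a bounding box; 2. it is simpler than drawing a bounding box; 3. a simple interface program is all that is needed, with no special hardware (unlike eye tracking).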

Note that the scheme we propose does not require a human-in-the-loop setup [12, 46, 47, 72, 24]: clicks can be acquired separately, independently of the detector training framework used.

The proposed method does not need user clicks inside the training loop: click collection is decoupled from detector training.

Moreover, we can also ask two different annotators to provide center-clicks on the same object. As their errors are independent, we can obtain a more accurate estimate of the object center by averaging their click positions. Interestingly, given the two clicks, we can even estimate the size of the object, by exploiting a correlation between the object size and the distance of the click to the true center (the error). Because each annotator's error tends to grow with object size and the two errors are independent, the distance between the two clicks also increases with object size. This makes it possible to estimate the object size from the distance between the clicks.
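The two-click idea can be sketched in a few lines: average the clicks to estimate the center, and turn the distance between them into a rough size estimate. The excerpt does not say which function maps click distance to size, so the linear mapping below, together with its `slope` and `intercept` parameters, is a hypothetical placeholder that would have to be fitted on data with known box sizes.

```python
import math


def combine_clicks(click_a, click_b):
    """Average two (x, y) clicks; return the center estimate and their distance."""
    (xa, ya), (xb, yb) = click_a, click_b
    center = ((xa + xb) / 2.0, (ya + yb) / 2.0)
    distance = math.hypot(xa - xb, ya - yb)
    return center, distance


def estimate_size(distance, slope=1.0, intercept=0.0):
    """Map click distance to an object-size estimate.

    ASSUMPTION: the linear form and the default parameters are illustrative
    placeholders, not values or a model taken from the paper.
    """
    return slope * distance + intercept


# Two made-up clicks from different annotators on the same object.
center, distance = combine_clicks((100.0, 80.0), (106.0, 88.0))
print(center, distance)
```

Averaging helps because independent errors partly cancel, while the distance between the clicks keeps the information about how large those errors were.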

