Training object class detectors typically requires a large set of images with objects annotated by bounding boxes. However, manually drawing bounding boxes is very time-consuming. In this paper we greatly reduce annotation time by proposing center-click annotations: we ask annotators to click on the center of an imaginary bounding box which tightly encloses the object instance.
The authors use a single click on the center of the object's bounding box in each image as the human input, then combine these clicks with existing multiple instance models to perform weakly supervised learning. We then incorporate these clicks into existing Multiple Instance Learning techniques for weakly supervised object localization, to jointly localize object bounding boxes over all training images. Extensive experiments on PASCAL VOC 2007 and MS COCO show that: (1) our scheme delivers high-quality detectors, performing substantially better than those produced by weakly supervised techniques, with a modest extra annotation effort, (2) these detectors in fact perform in a range close to those trained from manually drawn bounding boxes, (3) as the center-click task is very fast, our scheme reduces total annotation time by 9x to 18x.
Object detectors can also be trained under weak supervision using only image-level labels. While this is substantially cheaper, the resulting detectors typically deliver only about half the accuracy of their fully supervised counterparts. In this paper, we aim to minimize human annotation effort while producing high-quality detectors. To this end we propose annotating objects by clicking on their center.
For the purpose of image annotation, clicking on an object is therefore a natural choice. Clicking offers several advantages over other ways to annotate bounding boxes:
(1) it is substantially faster than drawing bounding boxes,
(2) it requires few instructions and little annotator training compared to drawing or verifying bounding boxes, because it is a task that comes naturally to humans,
(3) it can be performed using a simple annotation interface (unlike bounding box drawing), and requires no specialized hardware (unlike eye-tracking).
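Clicking has three main advantages: 1. it is much faster than drawing a bounding box; 2. it is simpler than drawing a bounding box; 3. a simple interface program is all that is needed, with no special hardware (unlike eye-tracking).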
Note that the scheme we propose does not require a human-in-the-loop setup [12, 46, 47, 72, 24]: clicks can be acquired separately, independently of the detector training framework used.
Moreover, we can also ask two different annotators to provide center-clicks on the same object. As their errors are independent, we can obtain a more accurate estimate of the object center by averaging their click positions. Interestingly, given the two clicks, we can even estimate the size of the object, by exploiting a correlation between the object size and the distance of a click from the true center (the click error). As the errors are independent, the distance between the two clicks increases with object size. This makes it possible to estimate the object size from the distance between the clicks.
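The two-click idea above can be illustrated with a minimal sketch. It averages two click positions to estimate the center and maps the distance between the clicks to a size estimate. The function name `combine_clicks`, the linear form of the mapping, and the `slope` and `intercept` values are assumptions made for illustration only; the text above states just that click distance correlates with object size, not how that relation is modeled or fitted.

```python
import math


def combine_clicks(click_a, click_b, slope=1.0, intercept=0.0):
    """Combine two independent center-clicks on the same object.

    click_a, click_b: (x, y) pixel coordinates from two annotators.
    slope, intercept: hypothetical parameters of an assumed linear
        mapping from click distance to object size; in practice they
        would have to be fitted on data where the true size is known.

    Returns (center, distance, size_estimate).
    """
    (xa, ya), (xb, yb) = click_a, click_b

    # Averaging two clicks with independent errors gives a better
    # estimate of the object center than either click alone.
    center = ((xa + xb) / 2.0, (ya + yb) / 2.0)

    # Larger objects tend to produce larger click errors, so the
    # distance between the two clicks grows with object size.
    distance = math.hypot(xa - xb, ya - yb)

    # Assumed linear relation between click distance and object size.
    size_estimate = slope * distance + intercept

    return center, distance, size_estimate


if __name__ == "__main__":
    center, distance, size = combine_clicks((100, 50), (106, 58),
                                            slope=2.0, intercept=5.0)
    print(center, distance, size)
```

With a single pair of clicks the size estimate is necessarily noisy, so it is better read as a rough prior on the box size than as a measurement.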