Deep Interactive Object Selection (Summary)

Abstract

Previous algorithms require substantial user interactions to estimate the foreground and background distributions. In this paper, we present a novel deep-learning-based algorithm which has a much better understanding of objectness and thus can reduce user interactions to just a few clicks. Our algorithm transforms user-provided positive and negative clicks into two Euclidean distance maps, which are then concatenated with the RGB channels of images to compose (image, user interactions) pairs.

1. Introduction

The advantage of our approach over others is its capability to understand objectness and semantics by leveraging deep learning techniques.

A seemingly plausible way to adapt those approaches to interactive segmentation is to first perform semantic segmentation on the whole image and then select the connected components that contain the user-provided selections. However, there are at least three problems with this approach. First, it is not always clear how to respond to user inputs. For example, if the user places a foreground click and a background click inside the same class label, this approach cannot respond meaningfully. Second, current semantic segmentation methods do not support instance-level segmentation, which is often what the user wants. Last but not least, current semantic segmentation approaches do not generalize to unseen objects. This means we would have to train a model for every possible object in the world, which is obviously impractical.

2. Related works

Other work has looked at improving the boundary localization of CNN semantic segmentation approaches. Chen et al. [3] combine the outputs of FCNs with a fully connected CRF. Zheng et al. [26] formulate mean-field approximate inference as RNNs and train them with FCNs end-to-end. They improve the mean intersection over union (IU) accuracy of FCNs from 62.2% to 71.6% and 72%, respectively. Although our FCN models are general enough to be combined with their approaches, their segmentation results are still far from acceptable for the interactive segmentation task. Therefore, we propose a simple yet effective approach that combines graph cut optimization with our FCN output maps, which enables our algorithm to achieve high IU accuracy with even a single click.
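To give a feel for this kind of refinement, here is a minimal sketch of cutting on unary terms derived from an FCN foreground-probability map, assuming the PyMaxflow library. This is not the paper's exact energy; `graphcut_refine`, `prob`, and `pairwise_weight` are illustrative names.

```python
import numpy as np
import maxflow  # PyMaxflow


def graphcut_refine(prob, pairwise_weight=1.0, eps=1e-6):
    """Binarize an H x W FCN probability map with a 4-connected graph cut."""
    g = maxflow.Graph[float]()
    nodeids = g.add_grid_nodes(prob.shape)

    # Pairwise (smoothness) edges between neighbouring pixels.
    g.add_grid_edges(nodeids, pairwise_weight)

    # Unary terms: cost of labelling each pixel background / foreground.
    p = np.clip(prob, eps, 1.0 - eps)
    g.add_grid_tedges(nodeids, -np.log(p), -np.log(1.0 - p))

    g.maxflow()
    # With the t-edge capacities above, pixels on the sink side of the cut
    # (returned as True) correspond to the foreground.
    return g.get_grid_segments(nodeids)
```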

Note: in object detection evaluation there is a metric called IoU (Intersection over Union), the overlap ratio between the window produced by the model and the ground-truth window. It can be understood simply as the intersection of the detection result and the ground truth divided by their union, which measures how accurate the detection is (a minimal computation is sketched after the figure below).

As shown in the figure below:
The blue box is the GroundTruth.
The yellow box is the DetectionResult.
The green box is DetectionResult ∩ GroundTruth.
The red box is DetectionResult ∪ GroundTruth.

[Figure: IoU of the DetectionResult and GroundTruth boxes]
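To make the definition concrete, here is a minimal sketch of the box-IoU computation (the box coordinates in the example are made up; for segmentation the same ratio is computed over pixel masks instead of boxes):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
    # Coordinates of the intersection rectangle (the green region in the figure).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    # Union area (the red region) = area A + area B - intersection.
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((10, 10, 60, 60), (30, 30, 90, 90)))  # ~0.17
```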

3.1. Transforming user interactions

For further details, refer to the paper: https://arxiv.org/pdf/1603.04042.pdf
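As described in the abstract, each set of clicks is turned into a Euclidean distance map and concatenated with the RGB channels. Below is a minimal sketch of that transformation, assuming SciPy's distance transform and the paper's truncation at 255; the function names and the 5-channel stacking order are illustrative.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def clicks_to_distance_map(clicks, height, width, truncate=255):
    """clicks: list of (row, col) user clicks of one type (positive or negative)."""
    mask = np.ones((height, width), dtype=bool)
    for r, c in clicks:
        mask[r, c] = False  # zeros at the click locations
    if mask.all():
        # No clicks of this type: the distance is effectively infinite,
        # so the map saturates at the truncation value.
        return np.full((height, width), truncate, dtype=np.float32)
    dist = distance_transform_edt(mask)  # distance to the nearest click
    return np.minimum(dist, truncate).astype(np.float32)


def make_input(image, pos_clicks, neg_clicks):
    """Stack RGB with the two distance maps -> an H x W x 5 network input."""
    h, w = image.shape[:2]
    pos_map = clicks_to_distance_map(pos_clicks, h, w)
    neg_map = clicks_to_distance_map(neg_clicks, h, w)
    return np.dstack([image.astype(np.float32), pos_map, neg_map])
```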

3.2. Simulating user interactions

Let O be the set of ground-truth pixels of the object and let us define a new set G = {p_ij | p_ij ∈ O or f(p_ij|O) ≥ d}. (A note from the writer: I have some doubt here. If points satisfying f(p_ij|O) ≥ d are added to G, then it seems many background points farther than d from the object would also qualify; shouldn't it be f(p_ij|O) < d instead?) Let G^c denote the complementary set of G. It is easy to see that the pixels in G^c have two properties: 1) they are background pixels, and 2) they are within a certain distance range of the object.
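One of the paper's sampling strategies draws negative clicks at random from G^c. Below is a minimal sketch of constructing G^c with a distance transform and sampling from it; the function name, the value of d, and the number of clicks are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def sample_negative_clicks(object_mask, d=40, n_clicks=5, rng=None):
    """object_mask: boolean H x W ground-truth mask of the object O."""
    rng = np.random.default_rng() if rng is None else rng

    # f(p|O): Euclidean distance from each pixel to the nearest object pixel.
    dist_to_object = distance_transform_edt(~object_mask)

    # G = O ∪ {p : f(p|O) >= d}, so G^c is the band of background pixels
    # closer than d to the object.
    g_complement = (~object_mask) & (dist_to_object < d)

    candidates = np.argwhere(g_complement)  # (row, col) coordinates
    if len(candidates) == 0:
        return []
    idx = rng.choice(len(candidates), size=min(n_clicks, len(candidates)), replace=False)
    return [tuple(p) for p in candidates[idx]]
```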

 
