microsoft/AI-For-Beginners

Public

mirrored fromhttps://github.com/microsoft/AI-For-BeginnersAvailable

CodeCommitsIssuesPull requestsActionsInsightsSecurity
6ec4c57be15223de213e5d506ac99ec417875580

Branches

Tags

  • No tags available.
0Branches0Tags
Go to file
Add file
Code

Clone

HTTPS

Download ZIP

lessons/4-ComputerVision/11-ObjectDetection/README.md

176lines · modecode

1# Object Detection
2
3The image classification models we have dealt with so far took an image and produced a categorical result, such as the class 'number' in a MNIST problem. However, in many cases we do not want just to know that a picture portrays objects - we want to be able to determine their precise location. This is exactly the point of **object detection**.
4
5## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/111)
6
7![Object Detection](images/Screen_Shot_2016-11-17_at_11.14.54_AM.png)
8
9> Image from [YOLO v2 web site](https://pjreddie.com/darknet/yolov2/)
10
11## A Naive Approach to Object Detection
12
13Assuming we wanted to find a cat on a picture, a very naive approach to object detection would be the following:
14
151. Break the picture down to a number of tiles
162. Run image classification on each tile.
173. Those tiles that result in sufficiently high activation can be considered to contain the object in question.
18
19![Naive Object Detection](images/naive-detection.png)
20
21> *Image from [Exercise Notebook](ObjectDetection-TF.ipynb)*
22
23However, this approach is far from ideal, because it only allows the algorithm to locate the object's bounding box very imprecisely. For more precise location, we need to run some sort of **regression** to predict the coordinates of bounding boxes - and for that, we need specific datasets.
24
25## Regression for Object Detection
26
27[This blog post](https://towardsdatascience.com/object-detection-with-neural-networks-a4e2c46b4491) has a great gentle introduction to detecting shapes.
28
29## Datasets for Object Detection
30
31You might run across the following datasets for this task:
32
33* [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) - 20 classes
34* [COCO](http://cocodataset.org/#home) - Common Objects in Context. 80 classes, bounding boxes and segmentation masks
35
36![COCO](images/coco-examples.jpg)
37
38## Object Detection Metrics
39
40### Intersection over Union
41
42While for image classification it is easy to measure how well the algorithm performs, for object detection we need to measure both the correctness of the class, as well as the precision of the inferred bounding box location. For the latter, we use the so-called **Intersection over Union** (IoU), which measures how well two boxes (or two arbitrary areas) overlap.
43
44![IoU](images/iou_equation.png)
45
46> *Figure 2 from [this excellent blog post on IoU](https://pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/)*
47
48The idea is simple - we divide the area of intersection between two figures by the area of their union. For two identical areas, IoU would be 1, while for completely disjointed areas it will be 0. Otherwise it will vary from 0 to 1. We typically only consider those bounding boxes for which IoU is over a certain value.
49
50### Average Precision
51
52Suppose we want to measure how well a given class of objects $C$ is recognized. To measure it, we use **Average Precision** metrics, which is calculated as follows:
53
541. Consider Precision-Recall curve shows the accuracy depending on a detection threshold value (from 0 to 1).
552. Depending on the threshold, we will get more or less objects detected in the image, and different values of precision and recall.
563. The curve will look like this:
57
58<img src="https://github.com/shwars/NeuroWorkshop/raw/master/images/ObjDetectionPrecisionRecall.png"/>
59
60> *Image from [NeuroWorkshop](http://github.com/shwars/NeuroWorkshop)*
61
62The average Precision for a given class $C$ is the area under this curve. More precisely, Recall axis is typically divided into 10 parts, and Precision is averaged over all those points:
63
64$$
65AP = {1\over11}\sum_{i=0}^{10}\mbox{Precision}(\mbox{Recall}={i\over10})
66$$
67
68### AP and IoU
69
70We shall consider only those detections, for which IoU is above a certain value. For example, in PASCAL VOC dataset typically $\mbox{IoU Threshold} = 0.5$ is assumed, while in COCO AP is measured for different values of $\mbox{IoU Threshold}$.
71
72<img src="https://github.com/shwars/NeuroWorkshop/raw/master/images/ObjDetectionPrecisionRecallIoU.png"/>
73
74> *Image from [NeuroWorkshop](http://github.com/shwars/NeuroWorkshop)*
75
76### Mean Average Precision - mAP
77
78The main metric for Object Detection is called **Mean Average Precision**, or **mAP**. It is the value of Average Precision, average across all object classes, and sometimes also over $\mbox{IoU Threshold}$. In more detail, the process of calculating **mAP** is described
79[in this blog post](https://medium.com/@timothycarlen/understanding-the-map-evaluation-metric-for-object-detection-a07fe6962cf3)), and also [here with code samples](https://gist.github.com/tarlen5/008809c3decf19313de216b9208f3734).
80
81## Different Object Detection Approaches
82
83There are two broad classes of object detection algorithms:
84
85* **Region Proposal Networks** (R-CNN, Fast R-CNN, Faster R-CNN). The main idea is to generate **Regions of Interests** (ROI) and run CNN over them, looking for maximum activation. It is a bit similar to the naive approach, with the exception that ROIs are generated in a more clever way. One of the majors drawbacks of such methods is that they are slow, because we need many passes of the CNN classifier over the image.
86* **One-pass** (YOLO, SSD, RetinaNet) methods. In those architectures we design the network to predict both classes and ROIs in one pass.
87
88### R-CNN: Region-Based CNN
89
90[R-CNN](http://islab.ulsan.ac.kr/files/announcement/513/rcnn_pami.pdf) uses [Selective Search](http://www.huppelen.nl/publications/selectiveSearchDraft.pdf) to generate hierarchical structure of ROI regions, which are then passed through CNN feature extractors and SVM-classifiers to determine the object class, and linear regression to determine *bounding box* coordinates. [Official Paper](https://arxiv.org/pdf/1506.01497v1.pdf)
91
92![RCNN](images/rcnn1.png)
93
94> *Image from van de Sande et al. ICCV’11*
95
96![RCNN-1](images/rcnn2.png)
97
98> *Images from [this blog](https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e)
99
100### F-RCNN - Fast R-CNN
101
102This approach is similar to R-CNN, but regions are defined after convolution layers have been applied.
103
104![FRCNN](images/f-rcnn.png)
105
106> Image from [the Official Paper](https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Girshick_Fast_R-CNN_ICCV_2015_paper.pdf), [arXiv](https://arxiv.org/pdf/1504.08083.pdf), 2015
107
108### Faster R-CNN
109
110The main idea of this approach is to use neural network to predict ROIs - so-called *Region Proposal Network*. [Paper](https://arxiv.org/pdf/1506.01497.pdf), 2016
111
112![FasterRCNN](images/faster-rcnn.png)
113
114> Image from [the official paper](https://arxiv.org/pdf/1506.01497.pdf)
115
116### R-FCN: Region-Based Fully Convolutional Network
117
118This algorithm is even faster than Faster R-CNN. The main idea is the following:
119
1201. We extract features using ResNet-101
1211. Features are processed by **Position-Sensitive Score Map**. Each object from $C$ classes is divided by $k\times k$ regions, and we are training to predict parts of objects.
1221. For each part from $k\times k$ regions all networks vote for object classes, and the object class with maximum vote is selected.
123
124![r-fcn image](images/r-fcn.png)
125
126> Image from [official paper](https://arxiv.org/abs/1605.06409)
127
128### YOLO - You Only Look Once
129
130YOLO is a realtime one-pass algorithm. The main idea is the following:
131
132 * Image is divided into $S\times S$ regions
133 * For each region, **CNN** predicts $n$ possible objects, *bounding box* coordinates and *confidence*=*probability* * IoU.
134
135 ![YOLO](images/yolo.png)
136
137> Image from [official paper](https://arxiv.org/abs/1506.02640)
138
139### Other Algorithms
140
141* RetinaNet: [official paper](https://arxiv.org/abs/1708.02002)
142 - [PyTorch Implementation in Torchvision](https://pytorch.org/vision/stable/_modules/torchvision/models/detection/retinanet.html)
143 - [Keras Implementation](https://github.com/fizyr/keras-retinanet)
144 - [Object Detection with RetinaNet](https://keras.io/examples/vision/retinanet/) in Keras Samples
145* SSD (Single Shot Detector): [official paper](https://arxiv.org/abs/1512.02325)
146
147## ✍️ Exercises: Object Detection
148
149Continue your learning in the following notebook:
150
151[ObjectDetection.ipynb](ObjectDetection.ipynb)
152
153## Conclusion
154
155In this lesson you took a whirlwind tour of all the various ways that object detection can be accomplished!
156
157## 🚀 Challenge
158
159Read through these articles and notebooks about YOLO and try them for yourself
160
161* [Good blog post](https://www.analyticsvidhya.com/blog/2018/12/practical-guide-object-detection-yolo-framewor-python/) describing YOLO
162 * [Official site](https://pjreddie.com/darknet/yolo/)
163 * Yolo: [Keras implementation](https://github.com/experiencor/keras-yolo2), [step-by-step notebook](https://github.com/experiencor/basic-yolo-keras/blob/master/Yolo%20Step-by-Step.ipynb)
164 * Yolo v2: [Keras implementation](https://github.com/experiencor/keras-yolo2), [step-by-step notebook](https://github.com/experiencor/keras-yolo2/blob/master/Yolo%20Step-by-Step.ipynb)
165
166## [Post-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/211)
167
168## Review & Self Study
169
170* [Object Detection](https://tjmachinelearning.com/lectures/1718/obj/) by Nikhil Sardana
171* [A good comparison of object detection algorithms](https://lilianweng.github.io/lil-log/2018/12/27/object-detection-part-4.html)
172* [Review of Deep Learning Algorithms for Object Detection](https://medium.com/comet-app/review-of-deep-learning-algorithms-for-object-detection-c1f3d437b852)
173* [A Step-by-Step Introduction to the Basic Object Detection Algorithms](https://www.analyticsvidhya.com/blog/2018/10/a-step-by-step-introduction-to-the-basic-object-detection-algorithms-part-1/)
174* [Implementation of Faster R-CNN in Python for Object Detection](https://www.analyticsvidhya.com/blog/2018/11/implementation-faster-r-cnn-python-object-detection/)
175
176## [Assignment: Object Detection](lab/README.md)
177