# Head Detection using Hollywood Heads Dataset

Lab Assignment from [AI for Beginners Curriculum](https://github.com/microsoft/ai-for-beginners).

## Task

Counting the number of people in a video surveillance camera stream is an important task that allows us to estimate the number of visitors in a shop, the busy hours in a restaurant, etc. To solve this task, we need to be able to detect human heads from different angles. To train an object detection model to detect human heads, we can use the [Hollywood Heads Dataset](https://www.di.ens.fr/willow/research/headdetection/).

## The Dataset

The [Hollywood Heads Dataset](https://www.di.ens.fr/willow/research/headdetection/release/HollywoodHeads.zip) contains 369,846 human heads annotated in 224,740 frames from Hollywood movies. It is provided in [PASCAL VOC](https://host.robots.ox.ac.uk/pascal/VOC/) format, where each image comes with an XML description file that looks like this:

```xml
<annotation>
    <folder>HollywoodHeads</folder>
    <filename>mov_021_149390.jpeg</filename>
    <source>
        <database>HollywoodHeads 2015 Database</database>
        <annotation>HollywoodHeads 2015</annotation>
        <image>WILLOW</image>
    </source>
    <size>
        <width>608</width>
        <height>320</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>head</name>
        <bndbox>
            <xmin>201</xmin>
            <ymin>1</ymin>
            <xmax>480</xmax>
            <ymax>263</ymax>
        </bndbox>
        <difficult>0</difficult>
    </object>
    <object>
        <name>head</name>
        <bndbox>
            <xmin>3</xmin>
            <ymin>4</ymin>
            <xmax>241</xmax>
            <ymax>285</ymax>
        </bndbox>
        <difficult>0</difficult>
    </object>
</annotation>
```

In this dataset, there is only one object class, `head`, and for each head you get the coordinates of its bounding box. You can parse the XML using standard Python libraries, or use [this library](https://pypi.org/project/pascal-voc/) to deal directly with the PASCAL VOC format.
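
As an illustration of the first option, here is a minimal sketch that reads one annotation with Python's standard `xml.etree.ElementTree` module. The `parse_annotation` helper and the shortened `SAMPLE_XML` string are hypothetical and only mirror the layout of the example above; in practice you would read each `.xml` file from the dataset folder instead.

```python
import xml.etree.ElementTree as ET

# Shortened annotation with the same layout as the dataset example.
SAMPLE_XML = """
<annotation>
    <filename>mov_021_149390.jpeg</filename>
    <size><width>608</width><height>320</height><depth>3</depth></size>
    <object>
        <name>head</name>
        <bndbox><xmin>201</xmin><ymin>1</ymin><xmax>480</xmax><ymax>263</ymax></bndbox>
        <difficult>0</difficult>
    </object>
    <object>
        <name>head</name>
        <bndbox><xmin>3</xmin><ymin>4</ymin><xmax>241</xmax><ymax>285</ymax></bndbox>
        <difficult>0</difficult>
    </object>
</annotation>
"""


def parse_annotation(xml_text):
    """Return (filename, (width, height), [(xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename")
    size = root.find("size")
    image_size = (int(size.findtext("width")), int(size.findtext("height")))

    boxes = []
    # Only direct <object> children of the root describe annotated heads.
    for obj in root.findall("object"):
        if obj.findtext("name") != "head":
            continue
        bndbox = obj.find("bndbox")
        # float() first, as a precaution in case a file stores non-integer
        # coordinates (an assumption, not something the example shows).
        box = tuple(
            int(float(bndbox.findtext(key)))
            for key in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append(box)
    return filename, image_size, boxes


filename, image_size, boxes = parse_annotation(SAMPLE_XML)
print(filename, image_size, boxes)
```

Each box is returned in pixel coordinates as `(xmin, ymin, xmax, ymax)`, which is a convenient starting point for converting to whatever format your chosen training framework expects.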

## Training Object Detection

You can train an object detection model in one of the following ways:

* Using [Azure Custom Vision](https://docs.microsoft.com/azure/cognitive-services/custom-vision-service/quickstarts/object-detection?tabs=visual-studio&WT.mc_id=academic-57639-dmitryso) and its Python API to programmatically train the model in the cloud. Custom Vision will not be able to use more than a few hundred images for training the model, so you may need to limit the dataset.
* Using the example from the [Keras tutorial](https://keras.io/examples/vision/retinanet/) to train a RetinaNet model.
* Using the built-in [torchvision.models.detection.RetinaNet](https://pytorch.org/vision/stable/_modules/torchvision/models/detection/retinanet.html) module in torchvision.
## Takeaway

Object detection is a task that is frequently required in industry. While there are some services that can be used to perform object detection (such as [Azure Custom Vision](https://docs.microsoft.com/azure/cognitive-services/custom-vision-service/quickstarts/object-detection?tabs=visual-studio&WT.mc_id=academic-57639-dmitryso)), it is important to understand how object detection works and to be able to train your own models.