microsoft/AI-For-Beginners

Mirrored from https://github.com/microsoft/AI-For-Beginners

lessons/4-ComputerVision/11-ObjectDetection/lab/README.md

# Head Detection using Hollywood Heads Dataset

Lab Assignment from [AI for Beginners Curriculum](https://github.com/microsoft/ai-for-beginners).

## Task

Counting the number of people in a video surveillance camera stream is an important task that allows us to estimate the number of visitors in a shop, the busy hours in a restaurant, and so on. To solve this task, we need to be able to detect human heads from different angles. To train an object detection model to detect human heads, we can use the [Hollywood Heads Dataset](https://www.di.ens.fr/willow/research/headdetection/).

## The Dataset

The [Hollywood Heads Dataset](https://www.di.ens.fr/willow/research/headdetection/release/HollywoodHeads.zip) contains 369,846 human heads annotated in 224,740 movie frames from Hollywood movies. It is provided in [PASCAL VOC](https://host.robots.ox.ac.uk/pascal/VOC/) format, where each image comes with an XML description file that looks like this:

```xml
<annotation>
    <folder>HollywoodHeads</folder>
    <filename>mov_021_149390.jpeg</filename>
    <source>
        <database>HollywoodHeads 2015 Database</database>
        <annotation>HollywoodHeads 2015</annotation>
        <image>WILLOW</image>
    </source>
    <size>
        <width>608</width>
        <height>320</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>head</name>
        <bndbox>
            <xmin>201</xmin>
            <ymin>1</ymin>
            <xmax>480</xmax>
            <ymax>263</ymax>
        </bndbox>
        <difficult>0</difficult>
    </object>
    <object>
        <name>head</name>
        <bndbox>
            <xmin>3</xmin>
            <ymin>4</ymin>
            <xmax>241</xmax>
            <ymax>285</ymax>
        </bndbox>
        <difficult>0</difficult>
    </object>
</annotation>
```

In this dataset, there is only one class of objects, `head`, and for each head you get the coordinates of its bounding box. You can parse the XML using Python libraries, or use [this library](https://pypi.org/project/pascal-voc/) to work directly with the PASCAL VOC format.
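
As a minimal sketch of the first option, the annotation above can be read with Python's standard `xml.etree.ElementTree` module. The function name `parse_voc_annotation` and the shape of the returned dictionary are illustrative choices, not part of the dataset or of any library; the sample string is a trimmed copy of the annotation shown earlier.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample annotation (only the fields read below).
SAMPLE_XML = """
<annotation>
  <filename>mov_021_149390.jpeg</filename>
  <size><width>608</width><height>320</height><depth>3</depth></size>
  <object>
    <name>head</name>
    <bndbox><xmin>201</xmin><ymin>1</ymin><xmax>480</xmax><ymax>263</ymax></bndbox>
  </object>
  <object>
    <name>head</name>
    <bndbox><xmin>3</xmin><ymin>4</ymin><xmax>241</xmax><ymax>285</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text):
    """Extract file name, image size and head boxes from one VOC annotation.

    Hypothetical helper: boxes are returned as (xmin, ymin, xmax, ymax) tuples.
    """
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # int(float(...)) also tolerates coordinates written as "201.0".
        boxes.append(tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        ))
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "boxes": boxes,
    }


info = parse_voc_annotation(SAMPLE_XML)
print(len(info["boxes"]), "heads in", info["filename"])
```

To process the real dataset, the same function could be applied to the text of each XML file; the directory layout inside the archive is not described here, so check it after unpacking.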

## Training Object Detection

You can train an object detection model using one of the following ways:

* Using [Azure Custom Vision](https://docs.microsoft.com/azure/cognitive-services/custom-vision-service/quickstarts/object-detection?tabs=visual-studio&WT.mc_id=academic-57639-dmitryso) and its Python API to programmatically train the model in the cloud. Custom Vision will not be able to use more than a few hundred images for training the model, so you may need to limit the dataset.
* Using the example from the [Keras tutorial](https://keras.io/examples/vision/retinanet/) to train a RetinaNet model.
* Using the built-in [torchvision.models.detection.RetinaNet](https://pytorch.org/vision/stable/_modules/torchvision/models/detection/retinanet.html) module in torchvision.
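
If you need to limit the dataset, for example for the Custom Vision option, one simple approach is to draw a reproducible random subset of images. The sketch below is only an illustration: the helper name `sample_subset`, the image identifiers, and the subset size of 300 are all assumptions, not values prescribed by the lab or by any service.

```python
import random


def sample_subset(image_ids, k, seed=0):
    """Pick up to k image ids uniformly at random, reproducibly.

    Hypothetical helper: sorting first makes the result independent of the
    order in which the ids were listed (e.g. by the file system).
    """
    ids = sorted(image_ids)
    rng = random.Random(seed)  # local generator, global random state untouched
    return rng.sample(ids, min(k, len(ids)))


# Made-up identifiers standing in for the real frame names.
all_ids = [f"frame_{i:06d}" for i in range(10_000)]
subset = sample_subset(all_ids, k=300, seed=42)
print(len(subset), "images selected")
```

Fixing the seed means the same subset is chosen on every run, which makes experiments with different models easier to compare.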

## Takeaway

Object detection is a task that is frequently required in industry. While there are some services that can be used to perform object detection (such as [Azure Custom Vision](https://docs.microsoft.com/azure/cognitive-services/custom-vision-service/quickstarts/object-detection?tabs=visual-studio&WT.mc_id=academic-57639-dmitryso)), it is important to understand how object detection works and to be able to train your own models.
