

# Multi-Modal Networks

After the success of transformer models in solving NLP tasks, there were many attempts to apply the same or similar architectures to computer vision tasks. There is also growing interest in models that combine vision and natural language capabilities. One such attempt is CLIP, developed by OpenAI.

## Contrastive Language-Image Pre-Training (CLIP)