# Multi-Modal Networks

After the success of transformer models on NLP tasks, there were many attempts to apply the same or similar architectures to computer vision. There is also growing interest in building models that *combine* vision and natural language capabilities. One such attempt is CLIP, developed by OpenAI.

## Contrastive Language-Image Pre-Training (CLIP)
