microsoft/AI-For-Beginners

Public

mirrored fromhttps://github.com/microsoft/AI-For-BeginnersAvailable

CodeCommitsIssuesPull requestsActionsInsightsSecurity
968117204e3d0612b3a8a8c7b6eab42ecc1ea926

Branches

Tags

  • No tags available.
0Branches0Tags
Go to file
Add file
Code

Clone

HTTPS

Download ZIP

lessons/X-Extras/X1-MultiModal/README.md

6lines · modepreview

# Multi-Modal Networks

After the success of transformer models for solving NLP tasks, there were many attempts to apply the same or similar architectures to computer vision tasks. Also, there is a growing interest in building models that would *combine* vision and natural language capabilities. One of such attempts was done by OpenAI, which is called CLIP.

## Contrastive Image Pre-Training (CLIP)