Ilya Gusev, Nov 30, 2022
In this post, I highlight influential papers about matching texts and images. OpenAI published the original CLIP model in March 2021, and many things have changed since then. I will try to answer the following questions:
"one robot standing back to camera, staring into two screens, left screen displays colorful image, right screen displays some document, futuristic", Midjourney V4****
Paper: Radford et al., 2021
Date: February 2021
PR post: https://openai.com/blog/clip/
Organization: OpenAI
Availability: Models and code are open, and the dataset is closed
Model: https://huggingface.co/openai/clip-vit-base-patch32
Code: https://github.com/openai/CLIP
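Since the checkpoint above is hosted on the Hugging Face Hub, here is a minimal sketch (my illustration, not part of the OpenAI release) of scoring image-text similarity with it through the transformers library; the image URL and text prompts are arbitrary placeholders.

```python
# Minimal CLIP zero-shot scoring sketch with Hugging Face transformers.
# Assumes `pip install transformers pillow requests torch`.
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Example image and candidate captions (placeholders).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds image-text similarities scaled by the learned temperature;
# softmax over the candidate texts gives a zero-shot "classification".
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(texts, probs[0].tolist())))
```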