Community Computer Vision Course documentation
Supplementary Reading and Resources 🤗
We hope you found the unit on multimodal models exciting. If you’d like to explore multimodal learning and models in more detail, here is a list of resources for your reference:
- Hugging Face Tasks offers an overview of various tasks under domains like Computer Vision, Audio, NLP, Multimodal Learning and Reinforcement Learning. The tasks contain demos, use cases, models, datasets, etc.
- 11-777 MMML, a course on multimodal machine learning by CMU. The video lectures are available online.
- Blog on Multimodality and LLMs by Chip Huyen, which provides a comprehensive overview of multimodality, large multimodal models, and systems such as BLIP and CLIP.
- Awesome Multimodal ML, a GitHub repository containing papers, courses, architectures, workshops, tutorials, etc.
- Awesome Multimodal Large Language Models, a GitHub repository containing papers and datasets related to multimodal LLMs.
- EE/CS 148, a Caltech course on Large Language and Vision Models.
In the next unit, we will take a look at another kind of neural network model that multimodality has revolutionized in recent years: generative neural networks. Get your paintbrush ready and join us on another exciting adventure in the realm of computer vision 🤠