Supplementary Reading and Resources 🤗

We hope you found the unit on multimodal models exciting. If you'd like to explore multimodal learning and models in more detail, here is a list of resources for your reference:

Hugging Face Tasks offers an overview of tasks across domains such as Computer Vision, Audio, NLP, Multimodal Learning, and Reinforcement Learning. Each task page includes demos, use cases, models, and datasets.
11-777 MMML, a course on multimodal machine learning by CMU. The video lectures are also available online.
Blog on Multimodality and LLMs by Chip Huyen, which provides a comprehensive overview of multimodality and large multimodal models, including systems like BLIP and CLIP.
Awesome Multimodal ML, a GitHub repository containing papers, courses, architectures, workshops, tutorials, etc.
Awesome Multimodal Large Language Models, a GitHub repository containing papers and datasets related to multimodal LLMs.
EE/CS 148, Caltech course on Large Language and Vision Models.

In the next unit, we will look at another kind of neural network model that multimodality has revolutionized in recent years: generative neural networks. Get your paintbrush ready and join us on another exciting adventure in the realm of Computer Vision 🤠
