Info
The model creates its own supervision signals (pseudo-labels) from raw, unlabeled data to learn representations. No human-labeled data is required for the pretext task (the surrogate task used to train the model). The model generates tasks based on the structure or properties of the data, for example:
- predict missing parts of an input
- identify whether two views of data belong to the same instance
Example 1: Predict Missing Parts of Data
- Data Structure: For an image, the pixels are naturally arranged spatially in a grid.
- Pretext Task: Mask part of the image (e.g., a patch) and train the model to predict the missing part using the rest of the image.
- This is the basis of Masked Autoencoders (MAE) in vision and of masked language modeling in BERT for NLP.
- Supervision: The unmasked parts of the image or text provide the “supervision.” The model learns to understand the context of the image/text to make predictions about the missing part (a minimal sketch follows this list).
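Below is a minimal sketch of this masked-prediction idea, assuming PyTorch. The `patchify` helper, the `TinyMaskedAutoencoder` class, the 75% masking ratio, and the tensor sizes are illustrative placeholders, not the actual MAE or BERT implementation; the point is that the pseudo-labels are just the original pixels of the masked patches.

```python
import torch
import torch.nn as nn

def patchify(images, patch_size=8):
    """Split (B, C, H, W) images into flattened non-overlapping patches: (B, N, C*P*P)."""
    B, C, H, W = images.shape
    P = patch_size
    patches = images.unfold(2, P, P).unfold(3, P, P)            # (B, C, H/P, W/P, P, P)
    return patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * P * P)

class TinyMaskedAutoencoder(nn.Module):
    def __init__(self, patch_dim, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(patch_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, patch_dim)
        self.mask_token = nn.Parameter(torch.zeros(patch_dim))   # stands in for hidden patches

    def forward(self, patches, mask):
        # Replace masked patches with the learned mask token, then try to reconstruct all patches.
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(patches), patches)
        return self.decoder(self.encoder(x))

# The pseudo-labels are simply the original pixels of the masked patches; no human labels needed.
images = torch.randn(4, 3, 32, 32)                  # a toy batch of unlabeled images
patches = patchify(images)                          # (4, 16, 192)
mask = torch.rand(patches.shape[:2]) < 0.75         # hide ~75% of the patches
model = TinyMaskedAutoencoder(patch_dim=patches.shape[-1])
recon = model(patches, mask)
loss = ((recon - patches) ** 2)[mask].mean()        # reconstruction error only on the masked patches
loss.backward()
```

Computing the loss only on the masked patches forces the model to use the surrounding context rather than simply copying its input.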
Example 2: Contrastive Learning (SimCLR)
- Data Structure: Different augmentations of the same image should represent the same underlying object or scene.
- Pretext Task: Create “positive pairs” (two augmentations of the same image) and “negative pairs” (augmentations of different images). Train the model to make the embeddings of positive pairs close and those of negative pairs far apart (see the loss sketch after this list).
- Supervision: The assumption that augmentations of the same image should remain similar is what provides the supervision.
- For example, if you take an image of a cat, crop it, rotate it, or change the colors, it’s still a cat. These augmentations are treated as “positive pairs.”
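Below is a minimal sketch of a SimCLR-style contrastive loss (NT-Xent), assuming PyTorch. The linear `encoder`, the additive-noise “augmentations,” the batch size, and the `temperature` value are placeholders for illustration; SimCLR itself uses a ResNet backbone with a projection head and strong image augmentations (random crop, color jitter, blur).

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (B, D) embeddings of two augmented views of the same B images."""
    B = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)           # (2B, D), unit length
    sim = z @ z.t() / temperature                                 # (2B, 2B) cosine similarities
    self_mask = torch.eye(2 * B, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))               # never contrast a view with itself
    # The positive for row i is its other augmentation (i + B or i - B); all other rows are negatives.
    targets = torch.cat([torch.arange(B) + B, torch.arange(B)]).to(z.device)
    return F.cross_entropy(sim, targets)                          # pull positives together, push negatives apart

# Toy usage: a linear "encoder" and noise-based "augmentations" stand in for the real ones.
encoder = torch.nn.Linear(3 * 32 * 32, 128)
images = torch.randn(8, 3, 32, 32)                                # unlabeled batch
view1 = images + 0.1 * torch.randn_like(images)                   # first augmented view
view2 = images + 0.1 * torch.randn_like(images)                   # second view of the same images
loss = nt_xent_loss(encoder(view1.flatten(1)), encoder(view2.flatten(1)))
loss.backward()
```

Treating the loss as a classification problem over the batch is the key trick: each view must “pick out” its positive partner among all other samples, which pushes embeddings of different images apart without any labels.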