for object detection and instance segmentation
- Identifies multiple objects in an image, locates each one with a bounding box, and predicts a pixel-level mask that outlines each detected object's shape
TL;DR
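To make the two output formats concrete, here is a minimal sketch relating a binary mask to a bounding box. The helper `mask_to_box` and the toy mask are assumptions made up for illustration. This only shows how the two representations relate, not how the model produces them.

```python
def mask_to_box(mask):
    """Return the tight (x1, y1, x2, y2) box around the nonzero pixels
    of a 2D binary mask, or None if the mask is empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    # Upper bounds are exclusive, so width = x2 - x1 and height = y2 - y1.
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


# Toy mask for one detected object (1 = object pixel, 0 = background).
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
box = mask_to_box(mask)
```

The box only records the extent of the object, while the mask keeps its exact outline. That is the extra detail instance segmentation adds on top of detection.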
- Backbone Network (Feature Extraction)
- Typically a CNN (ResNet)
- Region Proposal Network
- Takes in the feature map from the backbone, and generates region proposals
- torchvision provides an implementation in *torchvision.models.detection.rpn*
- The RPN slides over the feature map, proposing regions with object-like features
- Region of Interest Alignment (RoI Align)
- Resamples each region of interest to a fixed size while preserving spatial details (important for segmentation)
- Heads for Detection
- Object classification and bounding box regression
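The RoI alignment step can be sketched in plain Python. This is a simplified illustration, not the torchvision implementation. It assumes a single-channel feature map stored as a list of rows, a box already given in feature-map coordinates, and feature values located at integer coordinates. The names `bilinear_sample`, `roi_align`, `output_size` and `samples_per_bin` are chosen here for illustration.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2D feature map (list of rows) at a fractional (y, x) location."""
    h, w = len(feature), len(feature[0])
    # Clamp so that the four neighbouring cells always exist.
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, box, output_size=2, samples_per_bin=2):
    """Pool box = (x1, y1, x2, y2) to an output_size x output_size grid.

    Box coordinates are never rounded: each bin is sampled at fractional
    positions, which is what preserves the spatial detail.
    """
    x1, y1, x2, y2 = box
    bin_h = (y2 - y1) / output_size
    bin_w = (x2 - x1) / output_size
    pooled = []
    for i in range(output_size):
        row = []
        for j in range(output_size):
            total = 0.0
            # Average a regular grid of sample points inside the bin.
            for sy in range(samples_per_bin):
                for sx in range(samples_per_bin):
                    y = y1 + (i + (sy + 0.5) / samples_per_bin) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples_per_bin) * bin_w
                    total += bilinear_sample(feature, y, x)
            row.append(total / samples_per_bin ** 2)
        pooled.append(row)
    return pooled


# Toy 4x4 feature map whose value at (row r, column c) is 4*r + c.
feature_map = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pooled = roi_align(feature_map, box=(0.5, 0.5, 3.0, 2.5))
```

Whatever the size of the input box, `pooled` always has shape `output_size x output_size`, so the downstream heads can assume a fixed input size.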
Region Proposal:
- Propose multiple candidate regions of the input image that are likely to contain objects. This is done by external methods such as Selective Search or EdgeBoxes.
Region Classification:
- For each proposed region, a CNN extracts features and classifies the object. Region proposal and classification run as separate stages.
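The control flow of the two stages above can be sketched as follows. Everything here is a toy stand-in: `propose_regions` is a hypothetical placeholder for an external proposer such as Selective Search, and `classify_region` replaces the CNN with a mean-intensity threshold. The point is only that proposals are produced first and each one is then classified independently.

```python
def propose_regions(image):
    """Hypothetical stand-in for an external region proposer.
    Returns the four image quadrants as (x1, y1, x2, y2) boxes."""
    h, w = len(image), len(image[0])
    mh, mw = h // 2, w // 2
    return [(0, 0, mw, mh), (mw, 0, w, mh), (0, mh, mw, h), (mw, mh, w, h)]


def crop(image, box):
    """Cut the pixels inside box out of the image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def classify_region(patch):
    """Toy stand-in for the CNN: label by mean pixel intensity."""
    pixels = [p for row in patch for p in row]
    score = sum(pixels) / len(pixels)
    label = "object" if score > 0.5 else "background"
    return label, score


def detect(image):
    """Stage 1: propose regions. Stage 2: classify each region separately."""
    detections = []
    for box in propose_regions(image):
        label, score = classify_region(crop(image, box))
        if label != "background":
            detections.append({"box": box, "label": label, "score": score})
    return detections


# Toy image with a bright patch in the top-right quadrant.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
detections = detect(image)
```

Because every proposal is cropped and classified on its own, the cost grows with the number of proposals. The shared backbone and RPN design summarized in the TL;DR avoids this.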