for object detection and instance segmentation

  • Identifies multiple objects in an image, locates each one with a bounding box, and produces pixel-level segmentation: every detected object gets its own mask outlining its shape

TL;DR

  • Backbone Network (Feature Extraction)
    • Typically a CNN such as ResNet, which turns the image into a feature map used by every later stage
  • Region Proposal Network
    • Takes the feature map from the backbone and generates region proposals
      • torchvision provides this in *torchvision.models.detection.rpn*
      • The RPN slides over the feature map; at each position it scores a set of anchor boxes for "objectness" and regresses offsets that refine them into proposals
  • Region of Interest Alignment
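    • Pools each region of interest into a fixed-size feature map (e.g. 7x7) using bilinear interpolation instead of rounding box coordinates, which keeps features spatially aligned with the input (important for pixel-accurate masks)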
  • Heads for Detection
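    • Object classification and bounding box regression
    • A parallel mask head predicts a binary mask for each detected object (the instance segmentation part)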
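The RoI Align step above can be sketched in plain Python for a single-channel feature map. This is a minimal sketch under stated assumptions, not the torchvision implementation: box coordinates are assumed to be already in feature-map units, feature values are assumed to sit at integer grid positions, and out-of-range sample points are clamped to the map (a simplification). Each output bin averages `sampling_ratio` x `sampling_ratio` bilinearly interpolated samples; the box itself is never rounded.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # single-channel feature map, indexed [y][x]


def bilinear(feat: Grid, y: float, x: float) -> float:
    """Sample feat at a continuous (y, x) by bilinear interpolation.
    Assumption: values live at integer coordinates; points are clamped to the grid."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feat: Grid, box: Tuple[float, float, float, float],
              out_size: Tuple[int, int] = (2, 2), sampling_ratio: int = 2) -> Grid:
    """Pool one box (x1, y1, x2, y2) into an out_size grid without quantizing coordinates."""
    x1, y1, x2, y2 = box  # fractional coordinates are kept as-is
    out_h, out_w = out_size
    bin_h = (y2 - y1) / out_h
    bin_w = (x2 - x1) / out_w
    out: Grid = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Regularly spaced sample points inside bin (i, j), averaged together.
            for sy in range(sampling_ratio):
                for sx in range(sampling_ratio):
                    y = y1 + (i + (sy + 0.5) / sampling_ratio) * bin_h
                    x = x1 + (j + (sx + 0.5) / sampling_ratio) * bin_w
                    total += bilinear(feat, y, x)
            row.append(total / sampling_ratio ** 2)
        out.append(row)
    return out


# Demo on a linear feature map, where bilinear sampling is exact.
feat = [[x + 10 * y for x in range(4)] for y in range(4)]
print(roi_align(feat, (0.5, 0.5, 2.5, 2.5)))  # → [[11.0, 12.0], [21.0, 22.0]]
```

Each output value is the feature value at its bin's center, even though the box starts at a non-integer coordinate; rounding the box to the grid (as RoI Pooling does) would shift that result. In a real model, `torchvision.ops.roi_align` performs this step on batched multi-channel tensors; its `sampling_ratio` parameter plays the same role as the one here.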

Region Proposal:

  • Divide the input image into multiple regions that are likely to contain objects. This is done by external methods such as Selective Search or EdgeBoxes, not by the network that later classifies the regions

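Region Classification:

  • Each proposed region is cropped, warped to a fixed size, and passed through a CNN separately to extract features and classify the object. Region proposal and classification are therefore separate stages: the proposals do not come from the classifying network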
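The per-region proposal-then-classify scheme (as in the original R-CNN) can be sketched as a loop over proposals. This is a toy sketch, not a real detector: the proposals are hard-coded boxes standing in for Selective Search output, `extract_features` and `classify` are hypothetical callables standing in for the CNN and the classifier, and `crop_and_warp` uses nearest-neighbour resizing for simplicity.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels, x2/y2 exclusive


def crop_and_warp(img: Image, box: Box, size: int) -> Image:
    """Crop one proposal and warp it to size x size with nearest-neighbour sampling.
    Aspect ratio is ignored because every region must reach the CNN at one fixed size."""
    x1, y1, x2, y2 = box
    h, w = y2 - y1, x2 - x1
    return [[img[y1 + (r * h) // size][x1 + (c * w) // size] for c in range(size)]
            for r in range(size)]


def classify_regions(img: Image, proposals: Sequence[Box],
                     extract_features: Callable[[Image], float],
                     classify: Callable[[float], str],
                     size: int = 4) -> List[Tuple[Box, str]]:
    """Run the (hypothetical) feature extractor and classifier on every proposal independently."""
    results = []
    for box in proposals:
        patch = crop_and_warp(img, box, size)
        feats = extract_features(patch)  # one full forward pass per region
        results.append((box, classify(feats)))
    return results


# Toy demo: an 8x8 image containing a bright 4x4 square.
def mean_brightness(patch: Image) -> float:
    """Hypothetical stand-in for a CNN feature extractor."""
    return sum(map(sum, patch)) / (len(patch) * len(patch[0]))


def threshold_label(feature: float) -> str:
    """Hypothetical stand-in for the region classifier."""
    return "object" if feature > 0.5 else "background"


img = [[1.0 if 2 <= r < 6 and 2 <= c < 6 else 0.0 for c in range(8)] for r in range(8)]
proposals = [(2, 2, 6, 6), (0, 0, 2, 2)]  # stand-ins for Selective Search boxes
print(classify_regions(img, proposals, mean_brightness, threshold_label))
# → [((2, 2, 6, 6), 'object'), ((0, 0, 2, 2), 'background')]
```

The feature extractor runs once per proposal, so the cost grows with the number of regions; this is the redundancy that sharing a single backbone feature map across all regions avoids.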