--- toc: true title: Swin Transformer tags: ['temp'] --- # Swin [Transformer](Transformer.md) - [Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows](https://arxiv.org/abs/2103.14030) - [Vision Transformer](Vision%20Transformer.md) - general-purpose backbone for computer vision - hierarchical feature representation - linear computational complexity with respect to input image size - shifted window based [Self Attention](Self%20Attention.md) - address the challenges in adapting [Transformer](Transformer.md) from language to vision - limiting self-[Attention](Attention.md) computation to non-overlapping local windows while also allowing for cross-window connection - flexibility to model at various scales - linear computational complexity with respect to image size - [ImageNet](ImageNet.md) - [COCO](COCO.md) - [ADE20K](ADE20K.md) - The hierarchical design and the shifted window approach also prove beneficial for all [Perception](Perception.md) [Architectures](Architectures). - Ratio of 1:1:3:1