Vision Transformers Deep Learning models for Image Processing. A Survey

Sana Cheema; Ali Daud; Tariq Alsahfi; Hikmat Ullah Khan; Kainat Naeem

doi:10.61503/Ijmcp.v2i1.190

Authors

Sana Cheema, The Islamia University Bahawalpur, Pakistan
Ali Daud, The Islamia University Bahawalpur, Pakistan
Tariq Alsahfi, The Islamia University Bahawalpur, Pakistan
Hikmat Ullah Khan, The Islamia University Bahawalpur, Pakistan
Kainat Naeem, The Islamia University Bahawalpur, Pakistan

DOI:

https://doi.org/10.61503/Ijmcp.v2i1.190

Keywords:

Vision Transformers, Deep Learning, Image Processing, Architectural Innovations, Performance Analysis

Abstract

Transformers have emerged as a transformative technology in deep learning, providing an alternative for image processing, which has historically relied on Convolutional Neural Networks (CNNs). The present study demonstrates the adaptation of Vision Transformers (ViTs) from natural language processing to a variety of image analysis applications. ViT incorporates elements of the Transformer architecture, such as the self-attention mechanism, which allows the model to capture global dependencies in images. The performance of ViT is also evaluated against that of CNNs. The authors emphasize the advantages of ViTs in processing complex visual information and the disadvantages that arise from their computational and data requirements. The paper also briefly considers the wide spectrum of ViT applications, including medical imaging and object detection. Finally, the paper discusses current trends and expected developments and states that Vision Transformers may be key to advancing image processing technologies.
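As an illustration of the self-attention mechanism mentioned in the abstract, the following is a minimal sketch, not the survey's own implementation. It assumes NumPy, a single attention head, random untrained projection matrices, and a toy 32x32 RGB image split into 8x8 patches; the names `patchify`, `self_attention`, `w_q`, `w_k`, and `w_v` are hypothetical and chosen only for this example. It shows how every patch receives a weighted contribution from every other patch, which is what is meant by capturing global dependencies.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group the two patch-grid axes first
    return x.reshape(-1, patch * patch * c)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


# Toy data and random (untrained) projections: assumptions for illustration.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
tokens = patchify(image, 8)  # 16 patches, each of length 8 * 8 * 3 = 192
dim = 16
w_q, w_k, w_v = (
    rng.normal(scale=0.1, size=(tokens.shape[1], dim)) for _ in range(3)
)

out, weights = self_attention(tokens, w_q, w_k, w_v)
print(tokens.shape, out.shape, weights.shape)
```

Each row of `weights` is a probability distribution over all 16 patches, so the output for any one patch mixes information from the whole image in a single layer. A full ViT would additionally use learned patch embeddings, positional encodings, multiple heads, and stacked Transformer blocks, none of which are shown here.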
