Vision Transformers Deep Learning models for Image Processing. A Survey
DOI:
https://doi.org/10.61503/Ijmcp.v2i1.190
Keywords:
Vision Transformers, Deep Learning Image Processing, Architectural Innovations, Performance Analysis
Abstract
Transformers have emerged as a transformative technology in deep learning, offering an alternative to image processing approaches that have historically relied on Convolutional Neural Networks (CNNs). The present study demonstrates how Vision Transformers (ViTs) adapt the Transformer architecture from natural language processing to a variety of image analysis applications. ViTs retain core elements of that architecture, such as the self-attention mechanism, which allows the model to capture global dependencies across an image. The performance of ViTs is also evaluated in comparison with CNNs. The study emphasizes the advantages of ViTs in processing complex visual information, as well as the disadvantages they face because of their computational and data requirements. The paper also briefly considers the wide spectrum of ViT applications, including medical imaging and object detection. Finally, the paper discusses current trends and expected developments and argues that Vision Transformers may be key to the advancement of image processing technologies.