Image Segmentation Based on Transformer-Class Methods
DOI:
https://doi.org/10.61173/ygcq9v41

Keywords:
Transformer, U-Net, Image Segmentation

Abstract
Convolutional neural networks (CNNs) exhibit several limitations in image segmentation tasks, including an inability to encode long-range dependencies in images, a failure to capture global context, and vanishing or exploding gradients. Since its introduction in 2017, the Transformer model has proven highly effective in natural language processing and computer vision. This paper reviews the application and development of Transformer models for image segmentation in biomedical, industrial manufacturing, and agricultural settings, with the aim of promoting the joint development of Transformer architectures and image segmentation research. The article explains the background and relevance of this study and discusses three key challenges: improving contour accuracy, strengthening generalization ability and robustness, and lowering computational cost. In the future, the Transformer model is expected to become more versatile through optimized architectures and algorithms and through reducing the number of parameters that must be updated when adapting to new tasks. Moreover, combining the Transformer with other network architectures is likely to yield a dynamic trade-off between local and long-range dependencies.