Intelligent Data Annotation Based on Generative Artificial Intelligence: Techniques, Analysis, and Future Opportunities

Authors

  • Yanzi Guo

DOI:

https://doi.org/10.61173/x0qzyp45

Keywords:

Generative AI, Data annotation, Large language models (LLMs), Human-AI collaboration, Synthetic data

Abstract

The rapid development of AI depends heavily on the availability of large-scale, high-quality annotated datasets. However, manual labeling is becoming increasingly unsustainable because of its high cost and limited scalability. This has led to the adoption of Generative AI (e.g., large language models (LLMs) and large multimodal models (LMMs)) to automate and augment data engineering workflows. This review provides a systematic analysis of this shift and identifies three primary methodologies based on their target application scenarios: Generation-Annotation Integration, Understanding-Annotation, and Interaction-Annotation Augmentation. The paper curates the literature across several domains, including remote sensing, clinical psychology, and the creative arts, to summarize the current status of generative annotation with respect to the "Quality-Efficiency-Credibility" triad. These methodologies can deliver significant cost savings and efficiency gains, but they also have drawbacks and limitations, including model hallucinations, domain knowledge gaps, and a lack of standardized evaluation metrics for generative annotation processes. Finally, the paper proposes a research roadmap to address these fundamental problems, with an emphasis on human-AI collaboration ecosystems, multi-agent architectures, and privacy-preserving local inference. This work aims to provide a foundational perspective for developing the next generation of intelligent, trustworthy, and scalable data annotation frameworks.

Published

2026-04-24

Section

Articles