Abstract:
This paper introduces a methodology for building a dataset that addresses
the challenges of Named Entity Recognition (NER) in Algerian Dialectal Arabic
(ADA). While significant advancements have been made in NER for Modern
Standard Arabic (MSA), dialectal varieties like ADA remain largely
underrepresented in natural language processing (NLP) resources. To bridge this
gap, we propose a systematic method for collecting and annotating ADA texts.
The dataset will be curated from informal sources, including social media
platforms, where ADA is predominantly used.