ArabicNLP 2026 Shared Task

ArA-DF 2026

Arabic speech deepfake detection: build systems that distinguish bonafide human speech from spoofed Arabic audio across dialect and robustness conditions.

Task: Binary audio classification
Metric: Equal Error Rate
Venue: ArabicNLP 2026 @ EMNLP

Participant resources are live.

Dataset live
Baseline live
Development phase
Codabench open

Overview

A focused benchmark for Arabic audio authenticity.

ArA-DF 2026 is listed by ArabicNLP 2026 as Shared Task 14. Participating systems assign each Arabic speech sample to one of two labels: bonafide, for genuine human speech, or spoofed, for synthetic speech produced by text-to-speech or voice-conversion systems.

The task covers Modern Standard Arabic and Arabic dialects, with evaluation designed to reward detectors that generalize beyond narrow speakers, synthesis systems, and recording conditions.

Tracks

Two participant tracks on Codabench.

Both tracks use Equal Error Rate as the primary ranking metric, with Accuracy and macro-F1 reported for additional interpretation.

Track 1

Dialect Generalization

Evaluate whether systems remain reliable on Arabic dialects and speakers that are not represented in the training data.
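Because Track 1 scores systems on dialects and speakers that are absent from the training data, a random utterance-level split of the labeled data can overstate how well a detector generalizes. One way to approximate the condition locally is a group-disjoint split, where no speaker (or dialect) appears on both sides. The sketch below is a hypothetical helper that assumes each metadata row carries such a field. The column name `speaker_id` is an assumption, not a field documented on the dataset card.

```python
import random
from collections import defaultdict


def group_disjoint_split(rows, group_key, holdout_fraction=0.2, seed=0):
    """Split rows so that no value of row[group_key] appears in both parts.

    `rows` is a list of dicts; `group_key` (e.g. a hypothetical "speaker_id"
    or "dialect" column) is an assumption about the metadata schema.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)  # deterministic shuffle of whole groups
    n_holdout = max(1, round(len(keys) * holdout_fraction))
    holdout_keys = set(keys[:n_holdout])
    train, holdout = [], []
    for key in keys:
        (holdout if key in holdout_keys else train).extend(groups[key])
    return train, holdout


# Toy metadata: 20 utterances from 5 hypothetical speakers.
rows = [{"id": i, "speaker_id": f"spk{i % 5}", "label": i % 2} for i in range(20)]
train, holdout = group_disjoint_split(rows, "speaker_id", holdout_fraction=0.4)
```

Splitting whole groups rather than individual utterances keeps every held-out speaker unseen during training, which mirrors the unseen-speaker condition more closely than a random split.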

Track 2

Acoustic Robustness

Test robustness under practical audio conditions such as compression artifacts, background noise, and re-recording effects.
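The page does not specify how the Track 2 conditions were produced. As one illustration of the background-noise case, noise is commonly mixed into clean audio at a chosen signal-to-noise ratio (SNR). The pure-Python sketch below shows only that calculation. It is not the organizers' corruption pipeline, it does not cover compression or re-recording, and the 10 dB value and the sine tone standing in for speech are arbitrary.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise with the requested SNR (in dB).

    `speech` and `noise` are equal-length lists of float samples.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise signal has zero power")
    # Choose gain g so that 10 * log10(p_speech / (g**2 * p_noise)) == snr_db.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(speech, noise)]


# Example: one second of a 440 Hz tone at 16 kHz mixed with white noise at 10 dB SNR.
SAMPLE_RATE = 16_000
speech = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_RATE)]
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```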


Data & Baseline

Live dataset and baseline resources.

The ArA-DF 2026 dataset and baseline are hosted by ArabicSpeech on Hugging Face. Full loading commands and repository instructions are documented on the Hugging Face dataset and model cards.

Dataset on Hugging Face

Labeled train/dev data plus unlabeled development-test and final-test configs.


Baseline on Hugging Face

wav2vec2 XLS-R 300M frontend with an AASIST backend and helper scripts.


Codabench leaderboards

Competition pages for the two released tracks; leaderboards and submission phases are maintained there.


ArabicNLP shared-tasks page

Conference-wide dates, contact listing, and paper-submission updates.


Dataset snapshot

Labels use 0 for bona fide speech and 1 for spoofed speech. Released audio is 16 kHz mono PCM, packaged as lossless FLAC in WebDataset TAR shards, with metadata available through Hugging Face Parquet configs.
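For readers unfamiliar with the WebDataset layout: a shard is an ordinary TAR file in which consecutive members sharing a key (the path up to the first dot of the file name) form one sample. The stdlib-only sketch below groups members that way and applies the 0/1 label convention above. The field names `flac` and `cls` in the example are assumptions made for illustration; check the dataset card for the actual member names and for where labels live (the Parquet metadata may be the intended source). Decoding the FLAC bytes needs an audio library such as soundfile, which is omitted here.

```python
import io
import re
import tarfile

# Label convention from the dataset snapshot.
LABELS = {0: "bonafide", 1: "spoofed"}

# WebDataset convention: members that share a key (the path up to the first dot
# in the file name) form one sample; the remainder of the name is the field name.
_KEY_AND_FIELD = re.compile(r"^((?:.*/)?[^./]+)\.([^/]+)$")


def iter_webdataset_samples(source):
    """Yield one dict per sample, e.g. {"__key__": "utt1", "flac": b"...", ...}.

    `source` is a path to a .tar shard or a seekable binary file object.
    """
    if hasattr(source, "read"):
        tar = tarfile.open(fileobj=source)
    else:
        tar = tarfile.open(source)
    with tar:
        key, sample = None, {}
        for member in tar:
            match = _KEY_AND_FIELD.match(member.name)
            if not member.isfile() or match is None:
                continue
            member_key, field = match.groups()
            if member_key != key:
                # Members of one sample are stored next to each other in the shard.
                if sample:
                    yield sample
                key, sample = member_key, {"__key__": member_key}
            sample[field] = tar.extractfile(member).read()
        if sample:
            yield sample


# Example: build a tiny in-memory shard with two samples and read it back.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as writer:
    for name, payload in [("utt1.flac", b"<flac bytes>"), ("utt1.cls", b"0"),
                          ("utt2.flac", b"<flac bytes>"), ("utt2.cls", b"1")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        writer.addfile(info, io.BytesIO(payload))
buf.seek(0)
for sample in iter_webdataset_samples(buf):
    print(sample["__key__"], LABELS[int(sample["cls"])])  # utt1 bonafide / utt2 spoofed
```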

Dataset split rows and label availability
Split	Rows	Labels
Train	22,500	Yes
Dev	21,000	Yes
Track 1 development-test	16,023	No
Track 1 final-test	144,210	No
Track 2 development-test	14,193	No
Track 2 final-test	127,746	No

Baseline snapshot

The baseline repository includes a trained checkpoint, XLS-R frontend weights, a configuration, a Parquet-generation utility, and a published results file. Lower EER is better.

Baseline model-card EER results
Baseline split	EER (%)	Utterances
Track 1 development-test	14.67	16,023
Track 1 final-test	14.54	144,210
Track 2 development-test	27.21	14,193
Track 2 final-test	27.11	127,746

Values are baseline model-card results, not participant rankings.
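When comparing a system against these numbers, relative EER reduction is (baseline - system) / baseline. For example, a hypothetical system at 11.00% EER on the Track 1 development-test set would improve on the 14.67% baseline by (14.67 - 11.00) / 14.67, or about 25.0%. The sketch below encodes the baseline table for that arithmetic. The dictionary keys are illustrative labels, not identifiers used by the organizers.

```python
# Baseline EERs (%) from the baseline model-card table; the keys are illustrative labels.
BASELINE_EER = {
    ("track1", "dev-test"): 14.67,
    ("track1", "final-test"): 14.54,
    ("track2", "dev-test"): 27.21,
    ("track2", "final-test"): 27.11,
}


def relative_eer_reduction(system_eer, track, split):
    """Relative EER reduction (%) versus the baseline; positive means better than baseline."""
    baseline = BASELINE_EER[(track, split)]
    return 100.0 * (baseline - system_eer) / baseline


# Hypothetical system EER of 11.00% on the Track 1 development-test set.
print(f"{relative_eer_reduction(11.00, 'track1', 'dev-test'):.1f}%")  # 25.0%
```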

Submission

Phases and submission limits.

Each track has development-test sets for experimentation and blind final-test sets for official evaluation.

Experimentation: June 16 - July 20, 2026

Development phase

Build systems with the released train and dev data, then submit predictions on the development-test sets through Codabench.

Up to 100 submissions per team.
Maximum 10 submissions per day.
Best submission appears on the public leaderboard.
No final commitment required; this phase is for experimentation.
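Codabench enforces these limits itself, but a team sharing one account may want to check its own budget before submitting. The hypothetical helper below encodes the two development-phase limits. It assumes the limits are counted separately for each Codabench competition and that days follow the platform's clock; neither detail is stated on this page.

```python
from collections import Counter
from datetime import date

MAX_TOTAL = 100   # development phase: up to 100 submissions per team
MAX_PER_DAY = 10  # development phase: at most 10 submissions per day


def can_submit(previous_days, today):
    """True if one more development-phase submission on `today` respects both limits.

    `previous_days` holds the calendar date of each earlier submission to the same
    Codabench competition (assumption: limits are counted per competition, and days
    follow the platform's clock).
    """
    if len(previous_days) >= MAX_TOTAL:
        return False
    return Counter(previous_days)[today] < MAX_PER_DAY


history = [date(2026, 7, 1)] * 10
print(can_submit(history, date(2026, 7, 1)))  # False: daily limit reached
print(can_submit(history, date(2026, 7, 2)))  # True
```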

Official scoring: July 20 - July 25, 2026

Evaluation phase

Submit predictions on the blind final-test sets. These submissions determine the official challenge results.

Up to 3 system submissions per team.
The best (lowest) EER among a team's submissions is its official score.
Submit separately for Track 1, Track 2, or both.
A system description paper is required for eligibility in the final ranking.
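The scoring rule amounts to taking the minimum EER over a team's evaluation-phase runs. The sketch below assumes the limit of three system submissions applies to each track separately, which the page does not state explicitly; the EER values in the example are hypothetical.

```python
def official_score(final_test_eers):
    """Official score for one track: the lowest EER among evaluation-phase runs.

    Assumption: the limit of 3 system submissions applies to each track separately.
    """
    if not 1 <= len(final_test_eers) <= 3:
        raise ValueError("expected between 1 and 3 final-test submissions")
    return min(final_test_eers)


# Hypothetical EERs (%) for three Track 1 final-test submissions.
print(official_score([16.8, 13.9, 14.2]))  # 13.9
```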

Important Dates

Task timeline.

ArA-DF 2026 milestones for data release, submissions, official scoring, and system-description papers.

June 16, 2026

Resources released

Training, dev, and open test data, evaluation scripts, and the baseline.

June 16 - July 20, 2026

Development phase

Experimentation and submissions on the development-test sets.

July 20 - July 25, 2026

Evaluation phase

Submissions on the final-test sets and official scoring.

July 25, 2026

Leaderboard freeze

Final leaderboard state is frozen for official results.

August 8, 2026

System papers due

System description paper submissions due.

August 17, 2026

Acceptance notification

Notification of acceptance for system-description papers.

August 17, 2026

Final results released

Official final results are released to participants.

August 22, 2026

Camera-ready due

Camera-ready versions of accepted system papers due.
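For teams that track deadlines in scripts, the milestones above can be kept as data. The sketch below stores only calendar dates because the page does not give a deadline time or time zone. The helper name `next_milestone` is hypothetical.

```python
from datetime import date

# Milestones from the ArA-DF 2026 timeline (dates only; time zone not stated).
MILESTONES = [
    (date(2026, 6, 16), "Resources released"),
    (date(2026, 7, 20), "Evaluation phase starts"),
    (date(2026, 7, 25), "Leaderboard freeze"),
    (date(2026, 8, 8), "System papers due"),
    (date(2026, 8, 17), "Acceptance notification; final results released"),
    (date(2026, 8, 22), "Camera-ready due"),
]


def next_milestone(today):
    """Return (date, name, days_left) for the first milestone on or after `today`, or None."""
    for when, name in MILESTONES:
        if when >= today:
            return when, name, (when - today).days
    return None


when, name, days_left = next_milestone(date(2026, 7, 21))
print(f"{name} on {when.isoformat()} ({days_left} days left)")
# Leaderboard freeze on 2026-07-25 (4 days left)
```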

Complete team registration before requesting Codabench access. Paper-submission instructions will be shared with registered teams when available.

Evaluation

Ranked by Equal Error Rate.

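Equal Error Rate is the primary metric because it is threshold-independent and widely used in audio anti-spoofing evaluation. EER is the error rate at the operating point where the false-acceptance rate (spoofed speech accepted as bonafide) equals the false-rejection rate (bonafide speech rejected as spoofed), so it summarizes a detector's scores without requiring a fixed decision threshold. Lower EER is better; official scores come from the blind final-test submissions.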
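EER is usually computed by sweeping a threshold over detection scores and finding where the two error rates meet. The plain-Python sketch below approximates that procedure: it takes the sweep point where the miss rate on spoofed speech and the false-alarm rate on bonafide speech are closest and averages them. It is not the official Codabench scorer. It follows the dataset's label convention (1 = spoofed) and assumes that a higher score means "more likely spoofed"; flip the scores if a system uses the opposite direction.

```python
def compute_eer(scores, labels):
    """Equal Error Rate (as a fraction) for bonafide-vs-spoofed detection.

    Assumptions: label 1 = spoofed, label 0 = bonafide (as in the dataset
    snapshot), and a HIGHER score means "more likely spoofed".
    """
    pairs = sorted(zip(scores, labels))
    n_spoof = sum(1 for _, y in pairs if y == 1)
    n_bona = len(pairs) - n_spoof
    if n_spoof == 0 or n_bona == 0:
        raise ValueError("need at least one bonafide and one spoofed sample")

    # Threshold below every score: no spoof is missed (miss = 0) and every
    # bonafide sample is a false alarm (false alarm = 1), so the gap is 1.
    best_gap, eer = 1.0, 0.5
    spoof_below = bona_below = 0
    i = 0
    while i < len(pairs):
        # Raise the threshold past every sample sharing this score (handles ties).
        score = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                spoof_below += 1
            else:
                bona_below += 1
            i += 1
        miss = spoof_below / n_spoof                # spoofed accepted as bonafide
        false_alarm = 1.0 - bona_below / n_bona     # bonafide rejected as spoofed
        gap = abs(miss - false_alarm)
        # Keep the operating point where the two error rates are closest.
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


# Example: one bonafide sample (0.6) outscores one spoofed sample (0.4).
bona_scores = [0.1, 0.2, 0.3, 0.6]
spoof_scores = [0.4, 0.7, 0.8, 0.9]
eer = compute_eer(bona_scores + spoof_scores, [0] * 4 + [1] * 4)
print(f"EER = {100 * eer:.2f}%")  # EER = 25.00%
```

Interpolating between sweep points (for example on a ROC curve) gives slightly different values on small sets; with test sets of 14,000 or more utterances the difference is usually negligible.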

EER: primary ranking metric

Accuracy: secondary leaderboard metric

Macro-F1: secondary leaderboard metric
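Unlike EER, Accuracy and macro-F1 are computed on hard labels, so scores must first be thresholded. This page does not say how the leaderboard derives those labels, so the threshold is left to the reader here. The sketch below computes both metrics from 0/1 predictions. Macro-F1 is the unweighted mean of the per-class F1 scores for bonafide and spoofed, and should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")` on binary labels.

```python
def accuracy_and_macro_f1(y_true, y_pred):
    """Accuracy and macro-F1 for hard labels (0 = bonafide, 1 = spoofed)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1_scores = []
    for cls in (0, 1):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return accuracy, sum(f1_scores) / len(f1_scores)


acc, f1 = accuracy_and_macro_f1([0, 0, 1, 1], [0, 1, 1, 1])
print(f"accuracy={acc:.2f}, macro-F1={f1:.3f}")  # accuracy=0.75, macro-F1=0.733
```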

Contact & Forum

Where to ask questions and report issues.

Use the public Google Group for task questions, discussions, announcements, and clarifications. Route private, platform, and repository issues to the appropriate channel below.

Registration

Team registration form

Register your team first, then request access on the Track 1 and/or Track 2 Codabench pages using the same team name and contact email.

Public forum

Questions, discussions, and announcements

Use the ArA-DF 2026 Google Group for participant-wide questions, official clarifications, and shared announcements.


Private issues

Team-specific or sensitive support

Email the organizers for private issues such as team changes, registration mistakes, or sensitive access problems.


Submission platform

Codabench submissions and leaderboards

Use the corresponding Codabench competition page for track-specific submission, leaderboard, and platform issues.


Dataset and baseline

Hugging Face repository issues

Use the Hugging Face dataset and baseline repositories for data, loading, baseline, and model-card questions.


Organizers

Organizing team.

For public task questions, discussions, and announcements, use the ArA-DF 2026 Google Group: ara-df-2026@googlegroups.com.

Vasista Sai Lodagala

HUMAIN, Saudi Arabia

Yassine El Kheir

DFKI, Germany

Sara Althubaiti

HUMAIN, Saudi Arabia

Pedro Moreno Mengibar

HUMAIN, Saudi Arabia

Ahmed Ali

HUMAIN, Saudi Arabia