ArabicNLP 2026 Shared Task
Arabic speech deepfake detection: build systems that distinguish bonafide human speech from spoofed Arabic audio across dialect and robustness conditions.
Participant resources are live.
Overview
ArA-DF 2026 is listed by ArabicNLP 2026 as Shared Task 14. Participating systems assign each Arabic speech sample to one of two labels: bonafide, for genuine human speech, or spoofed, for synthetic speech produced by text-to-speech or voice-conversion systems.
The task covers Modern Standard Arabic and Arabic dialects, with evaluation designed to reward detectors that generalize beyond a narrow set of speakers, synthesis systems, and recording conditions.
Tracks
Both tracks use Equal Error Rate as the primary ranking metric, with Accuracy and macro-F1 reported for additional interpretation.
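The two secondary metrics can be computed from hard label predictions, as in the minimal sketch below. It assumes predictions have already been thresholded to 0 (bonafide) or 1 (spoofed); the function names are illustrative and are not taken from the official evaluation scripts, which may differ in detail.

```python
from typing import Sequence

BONAFIDE, SPOOFED = 0, 1  # label convention used by the released data


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the reference."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 score treating `cls` as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    # Return 0.0 when the class never appears in either list (an assumption
    # of this sketch; other scorers may handle this case differently).
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores for the two labels."""
    return (f1_for_class(y_true, y_pred, BONAFIDE)
            + f1_for_class(y_true, y_pred, SPOOFED)) / 2
```

Because macro-F1 weights both classes equally, it stays informative when bonafide and spoofed samples are unevenly represented, which plain accuracy does not guarantee.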
Track 1: Evaluate whether systems remain reliable on Arabic dialects and speakers that are not represented in the training data.
Track 2: Test robustness under practical audio conditions such as compression artifacts, background noise, and re-recording effects.
Data & Baseline
The ArA-DF 2026 dataset and baseline are hosted by ArabicSpeech on Hugging Face. Full loading commands and repository instructions remain on the Hugging Face cards.
Dataset: Labeled train/dev data plus unlabeled development-test and final-test configs.
Baseline: wav2vec2 XLS-R 300M frontend with an AASIST backend and helper scripts.
Codabench: Competition pages for the two released tracks; leaderboards and submission phases are maintained there.
ArabicNLP 2026: Conference-wide dates, contact listing, and paper-submission updates.
Labels use 0 for bonafide speech and 1 for spoofed speech. Released audio is 16 kHz mono PCM, packaged as lossless FLAC in WebDataset TAR shards, with metadata available through Hugging Face Parquet configs.
| Split | Rows | Labels |
|---|---|---|
| Train | 22,500 | Yes |
| Dev | 21,000 | Yes |
| Track 1 development-test | 16,023 | No |
| Track 1 final-test | 144,210 | No |
| Track 2 development-test | 14,193 | No |
| Track 2 final-test | 127,746 | No |
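As a rough illustration of the packaging described above, the sketch below walks one WebDataset-style TAR shard using only the Python standard library and groups files into samples. It relies on the general WebDataset naming convention (files that share the name up to the first dot belong to one sample). The actual member names and per-sample files in the ArA-DF shards are not specified here, so anything beyond that convention is an assumption; the Hugging Face dataset card remains the reference for loading.

```python
import io
import tarfile
from collections import defaultdict
from typing import Dict, Iterator, Tuple

LABEL_NAMES = {0: "bonafide", 1: "spoofed"}  # label convention stated above


def split_key(member_name: str) -> Tuple[str, str]:
    """Split 'dir/utt001.flac' into ('dir/utt001', 'flac'), WebDataset style."""
    directory, _, filename = member_name.rpartition("/")
    stem, _, extension = filename.partition(".")
    key = f"{directory}/{stem}" if directory else stem
    return key, extension


def iter_samples(shard: tarfile.TarFile) -> Iterator[Tuple[str, Dict[str, bytes]]]:
    """Yield (key, {extension: raw bytes}) for every sample in one shard."""
    samples: Dict[str, Dict[str, bytes]] = defaultdict(dict)
    for member in shard:
        if not member.isfile():
            continue  # skip directories and other non-file entries
        key, extension = split_key(member.name)
        samples[key][extension] = shard.extractfile(member).read()
    yield from samples.items()
```

A shard opened with `tarfile.open(path)` can be passed directly to `iter_samples`. The FLAC payload is returned as raw bytes; decoding it to a waveform needs an audio library, which is left out of this sketch.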
The baseline repository includes a trained checkpoint, XLS-R frontend weights, configuration, a Parquet-generation utility, and a published result file. Lower EER is better.
| Baseline split | EER (%) | Utterances |
|---|---|---|
| Track 1 development-test | 14.67 | 16,023 |
| Track 1 final-test | 14.54 | 144,210 |
| Track 2 development-test | 27.21 | 14,193 |
| Track 2 final-test | 27.11 | 127,746 |
Values are baseline model-card results, not participant rankings.
Submission
Each track has development-test sets for experimentation and blind final-test sets for official evaluation.
Build systems with the released train and dev data, then submit predictions on the development-test sets through Codabench.
Submit predictions on the blind final-test sets. These submissions determine the official challenge results.
Important Dates
ArA-DF 2026 milestones for data release, submissions, official scoring, and system-description papers.
Training, dev and open test data, evaluation scripts, and baseline.
Experimentation and submissions on the development-test sets.
Submissions on the final-test sets and official scoring.
Final leaderboard state is frozen for official results.
System description paper submissions due.
Notification of acceptance for system-description papers.
Official final results are released to participants.
Camera-ready versions of accepted system papers due.
Complete team registration before requesting Codabench access. Paper-submission instructions will be shared with registered teams when available.
Evaluation
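Equal Error Rate (EER), the error rate at the operating point where the false-acceptance and false-rejection rates are equal, is the primary metric because it does not depend on a chosen decision threshold and is widely used in audio anti-spoofing evaluation. Lower EER is better; official scores come from the blind final-test submissions.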
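For intuition, EER can be approximated from per-utterance scores by sweeping the decision threshold and taking the point where the two error rates are closest. The sketch below is an illustration only, not the official scorer. It assumes that a higher score means "more likely spoofed" (label 1), and it does not interpolate between thresholds, so its values may differ slightly from the released evaluation scripts.

```python
from typing import Sequence


def equal_error_rate(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Approximate EER in [0, 1]; assumes higher score = more likely spoofed (1)."""
    n_spoof = sum(1 for y in labels if y == 1)
    n_bona = len(labels) - n_spoof
    if len(labels) != len(scores) or n_spoof == 0 or n_bona == 0:
        raise ValueError("need equally long inputs containing both classes")

    # Walk thresholds from high to low; samples scoring at or above the
    # current threshold are flagged as spoofed.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    flagged_spoof = flagged_bona = 0
    best_gap, eer = 1.0, 0.5  # starting point: nothing flagged (FPR 0, FNR 1)
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score so they share one operating point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                flagged_spoof += 1
            else:
                flagged_bona += 1
            i += 1
        fpr = flagged_bona / n_bona          # bonafide wrongly flagged as spoofed
        fnr = 1.0 - flagged_spoof / n_spoof  # spoofed samples that were missed
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer
```

The function returns a fraction; multiply by 100 to compare with the percentages in the baseline table. If a system's scores follow the opposite convention (higher means more likely bonafide), negate them before calling it.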
Contact & Forum
Use the public Google Group for task questions, discussions, announcements, and clarifications. Route private, platform, and repository issues to the appropriate channel below.
Register your team first, then request access on the Track 1 and/or Track 2 Codabench pages using the same team name and contact email.
Use the ArA-DF 2026 Google Group for participant-wide questions, official clarifications, and shared announcements.
Email the organizers for private issues such as team changes, registration mistakes, or sensitive access problems.
Use the corresponding Codabench competition page for track-specific submission, leaderboard, and platform issues.
Use the Hugging Face dataset and baseline repositories for data, loading, baseline, and model-card questions.
Organizers
For public task questions, discussions, and announcements, use the ArA-DF 2026 Google Group: ara-df-2026@googlegroups.com.
HUMAIN, Saudi Arabia
DFKI, Germany
HUMAIN, Saudi Arabia
HUMAIN, Saudi Arabia
HUMAIN, Saudi Arabia