Annotation team report — Batch 34 + 35

Generated 2026-05-25 from the annotation-data refresh (the annotation and review pull, run through the audit, IAA, violations, lifecycle, and silent-alif forensics stages).

Executive summary

Inter-annotator agreement is solid: Cohen κ on hasMistakes = 0.819, Fleiss κ = 0.818 over 63,671 tasks, word-level κ = 0.911. The team labels consistently.
The pipeline bottleneck is the review queue on Batch 34. Batch 35 reviews start within 9 minutes of submission; Batch 34 reviews sit in the queue for a median of 12.3 days. End to end, Batch 34 takes 3.5× as long as Batch 35 (77 days vs 22 days).
Silent-alif violations originate mainly in tdreeb's preannotation seed rather than in annotator behavior: 56,096 preannotations carry the violation, annotators clean up 49,598 of them, and 507 still leak into accepted text. The fix needs to land upstream, in tdreeb/jobs/create_tasks.
Sherif Bakry is the single biggest individual signal: a heavy producer on both sides (4,927 annotations and 27,339 reviews, the 3rd-most-active reviewer), the worst silent-alif producer rate on the team (0.97 per 100 words), and the reviewer responsible for 36% of accepted leaks (182 of 507). Worth a 1:1.
25,817 clips (95.8 hours) of clean, agreed-upon audio, each at most 20 s long, are ready to train on. The manifest is drop-in for tdreeb.
Batch 35 has a duration mismatch: 72% of its tasks exceed the 20 s training cap. Either trim those clips before annotation or raise the training cap.
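For reference, Cohen's κ corrects raw agreement for the agreement two annotators would reach by chance: κ = (p_o − p_e) / (1 − p_e). A minimal sketch for two annotators' hasMistakes flags, assuming the input is two equal-length lists with one entry per shared task; the function name and input shape are illustrative, not the IAA stage's actual code.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    labels_a[i] and labels_b[i] are the two raters' labels for item i,
    e.g. each annotator's hasMistakes flag on a shared task.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each rater's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # kappa is undefined here; return 1.0 by convention
    return (p_o - p_e) / (1 - p_e)
```

For example, two raters who agree on 3 of 4 items, with label splits of 2/2 and 1/3, get p_o = 0.75, p_e = 0.5, and κ = 0.5.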

1. Annotation & review audit

What this answers: What's the raw shape of the corpus? How much disagreement are reviewers having to resolve? Where in the RCA taxonomy is the team spending its energy?

annotations (all states): 270,472 · review events: 296,700 · distinct tasks: 171,717

Findings

270,472 annotations across 171,717 tasks. The largest annotation state is accepted: 141,022 rows (52.1%).
296,700 review events. Reviewers accepted 56.2% of review events and sent back 40.7%. This is event-level review traffic, not a unique-task acceptance rate.
Top RCA on annotations is annotator_focus_issue (75,111 hits, 27.8% of all annotations). This is broad enough that managers should audit whether the label is being used consistently.
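The difference between event-level review traffic and a per-task outcome is easy to blur. A minimal sketch, assuming a hypothetical list of (task_id, timestamp, action) review events: the event-level rate counts every review action, while the task-level rate looks only at each task's latest action. The tuple layout and action labels are illustrative, not the export's schema.

```python
from collections import Counter


def review_rates(events):
    """Return (event-level, task-level) acceptance rates.

    events: iterable of (task_id, timestamp, action) tuples. The action
    labels ("accepted", "sent_back", ...) are hypothetical.
    """
    events = list(events)
    if not events:
        raise ValueError("no review events")
    # Event level: every review action counts, so a task that is sent back
    # twice and then accepted contributes three events.
    event_rate = Counter(a for _, _, a in events)["accepted"] / len(events)

    # Task level: keep only the most recent action for each task.
    latest = {}
    for task_id, ts, action in events:
        if task_id not in latest or ts > latest[task_id][0]:
            latest[task_id] = (ts, action)
    final = Counter(action for _, action in latest.values())
    task_rate = final["accepted"] / len(latest)
    return event_rate, task_rate
```

With events [(1, 1, "sent_back"), (1, 2, "sent_back"), (1, 3, "accepted"), (2, 1, "accepted")], the event-level rate is 0.5 even though both tasks end accepted, which is why an event-level figure such as 56.2% should not be read as task yield.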

Question this plot answers: What is the current state mix of all annotation rows?

Observation: The state mix shows corpus inventory, not final yield. Submitted and draft rows are still work-in-progress, while accepted rows are candidate ground truth.

Question this plot answers: How much review traffic turns into acceptance, rejection, escalation, or rework?

Observation: Sent-back is high at 40.7% of review events, so rework volume is a core operating cost. Because this is event-level, repeated rework on one annotation appears multiple times.

Question this plot answers: Which annotation RCA reasons dominate the audit taxonomy?

Recommendation: The RCA chart is useful for policy review, and the main action is taxonomy design: broad catch-all labels such as annotator_focus_issue should be split into more granular buckets. Otherwise one huge general tag hides multiple operational problems that need different fixes.

2. Inter-annotator divergence — severity patterns

What this answers: When two annotators both annotated the same task, how often do they substantively disagree — versus disagree only on diacritic spelling? Where is the team's real disagreement (vs. normalisation noise)?

What "severity" means — exact thresholds

Every pair of annotators that touched the same task is scored by character-level edit distance between their submissions, ignoring whitespace. These are the exact absolute character-edit bins from src/basirah/divergence.py::_severity, applied independently to the normalised and raw views:

bin	condition	plain English
`identical`	`char_edits == 0`	no character edits in this view
`minor`	`char_edits` is 1-3	tiny orthographic disagreement
`moderate`	`char_edits` is 4-15	bounded text disagreement
`grave`	`char_edits ≥ 16`	large character-level disagreement

Word edits, WER, and CER are still computed and stored as diagnostics, but only absolute character edits decide the severity bucket. Severity is computed twice: once on normalised text (diacritics stripped, so only consonants are compared) and once on raw text (every character and diacritic counts). The gap between the two views separates core consonant-level disagreement from strict orthographic/diacritic correctness.
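A minimal sketch of the binning, assuming plain Levenshtein distance on the two submissions after all whitespace is removed. It follows the thresholds in the table above but is not the code in src/basirah/divergence.py::_severity; both function names are hypothetical. Because Arabic diacritics are separate code points, running it on raw text counts each haraka as one character.

```python
def char_edits(a: str, b: str) -> int:
    """Levenshtein distance between a and b after removing all whitespace."""
    a = "".join(a.split())
    b = "".join(b.split())
    if len(a) < len(b):
        a, b = b, a  # keep the DP rows sized by the shorter string
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def severity(edits: int) -> str:
    """Map an absolute character-edit count to a severity bucket."""
    if edits == 0:
        return "identical"
    if edits <= 3:
        return "minor"
    if edits <= 15:
        return "moderate"
    return "grave"
```

For example, ذَلِكَ against ذَالِكَ is one inserted alif, so the pair lands in "minor". Calling the two functions once on normalised text and once on raw text gives the two views.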

Findings

After diacritic normalisation, 64,951 of 98,283 pairs (66.1%) are identical. That is the team's exact-agreement rate on consonant-level text; the other 33.9% of pairs differ even with diacritics removed, so their disagreement is real rather than diacritic noise.
The diacritic-noise wedge is 8,291 pairs. Comparing raw text instead of normalised text moves that many pairs out of "identical", and the net changes in minor, moderate, and grave sum to exactly the same number (-54 + 6,197 + 2,148 = 8,291). That is how much of the perceived disagreement is purely diacritic-spelling noise.
"Grave" pairs (severity_norm = grave): 1,675 (1.7%) — now defined by absolute character edits only: 16+ normalised character edits, ignoring whitespace. What drives them: 910 (54.3%) have a reject_* RCA or skipped/rejected state on at least one side; 36 (2.1%) look like wrong-passage/submission mismatches; 24 (1.4%) look one-sided/truncated; 162 (9.7%) disagree on has_mistakes; and 543 (32.4%) are the remaining hard-audio/substantive disagreement cases. Top RCA reasons on grave-pair annotations: reject_non_quranic_content (624), annotator_focus_issue (551), reject_audio_quality_issue (199), reject_multiple_recitations_overlap (128).
The most frequent word confusions are silent-alif spellings (ذلك ↔ ذالك, هذا ↔ هاذا, الرحمن ↔ الرحمان, ولكن ↔ ولاكن, كذلك ↔ كذالك); see Rule-1 violations in §3 and the standalone Silent-alif forensics in §8. The main other family is the ذ↔ز swap (الذين ↔ الزين, الذي ↔ الزي, إذا ↔ إزا).
Taa-marbuta (ة↔ه), alif-maksura (ى↔ي), and hamza-variant pairs are hidden in the fully normalised table because arabic_encode collapses them. The diacritic-stripped word table below keeps those letters distinct, so such pairs are counted there, although none of them reaches its top 20 rows. The disagreement is clearly visible at finer granularities: ةْ↔هْ is the #1 phoneme confusion (16,095, §4) and ة↔ه is #9 at the character level (1,455).
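The two normalisation levels can be sketched as follows. Both functions are hypothetical stand-ins: strip_diacritics drops tanween, the short vowels, shadda, and sukun (U+064B to U+0652), and collapse_variants additionally folds the letter variants that arabic_encode is described as merging. The folding table itself (for example, which side of ى↔ي survives, and where ؤ goes) is an assumption, not arabic_encode's actual mapping.

```python
import re

# Tanween, fatha, damma, kasra, shadda and sukun (U+064B to U+0652).
_DIACRITICS = re.compile("[\u064b-\u0652]")

# Assumed folding table for the fully normalised view.
_FOLD = str.maketrans({
    "\u0623": "\u0627",  # alif with hamza above -> bare alif
    "\u0625": "\u0627",  # alif with hamza below -> bare alif
    "\u0622": "\u0627",  # alif with madda -> bare alif
    "\u0626": "\u0621",  # hamza on yaa seat -> bare hamza
    "\u0624": "\u0621",  # hamza on waw seat -> bare hamza
    "\u0629": "\u0647",  # taa marbuta -> haa
    "\u0649": "\u064a",  # alif maksura -> yaa
})


def strip_diacritics(text: str) -> str:
    """Diacritic-stripped view: every letter, including ة/ه, ى/ي and hamza seats, is kept."""
    return _DIACRITICS.sub("", text)


def collapse_variants(text: str) -> str:
    """Fully normalised view: strip diacritics, then fold letter variants."""
    return strip_diacritics(text).translate(_FOLD)
```

With this table, أُولَئِكَ becomes أولئك in the diacritic-stripped view and اولءك in the fully normalised view, matching the spellings in the two word tables below.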

Question this plot answers: After diacritic normalisation, how severe are annotator-vs-annotator text disagreements, and which batch contributes them?

Observation: The normalised chart is the one to use for human disagreement: diacritics are stripped first, and the severity bucket comes from absolute character edits.

Question this plot answers: Under the strict quality bar, how much disagreement appears when every raw character and diacritic counts?

Strict-quality view: The raw chart is the right view when the deliverable requires correct diacritics and every character matters. The normalised chart answers “did they agree on the consonant-level text?”; the raw chart answers “did they submit exactly the same fully written text?”.

Severity totals (across all 98,283 pairs)

severity	normalised pairs	raw pairs	Δ (raw − norm)	normalised %	raw %
identical	64,951	56,660	-8,291	66.1	57.6
minor	22,605	22,551	-54	23.0	22.9
moderate	9,052	15,249	6,197	9.2	15.5
grave	1,675	3,823	2,148	1.7	3.9
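As a consistency check, the Δ column and the 8,291-pair wedge can be re-derived from the two count columns of the severity-totals table. A short Python check with the counts copied from that table:

```python
# Pair counts per severity bucket, copied from the severity-totals table.
NORM = {"identical": 64951, "minor": 22605, "moderate": 9052, "grave": 1675}
RAW = {"identical": 56660, "minor": 22551, "moderate": 15249, "grave": 3823}

total = sum(NORM.values())
assert total == sum(RAW.values()) == 98283  # same pairs in both views

delta = {k: RAW[k] - NORM[k] for k in NORM}
# Pairs only move between buckets, so the deltas cancel out.
assert sum(delta.values()) == 0
# The wedge: pairs lost from "identical" equal the net gain elsewhere.
wedge = -delta["identical"]
assert wedge == delta["minor"] + delta["moderate"] + delta["grave"] == 8291

for k in NORM:
    print(f"{k:9s} norm {100 * NORM[k] / total:5.1f}%  "
          f"raw {100 * RAW[k] / total:5.1f}%  delta {delta[k]:+d}")
```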

Definition check: Word edits, WER, and CER are diagnostic columns only. A short pair with one changed word no longer becomes grave just because its WER is high.

Grave-pair root causes (enriched analysis)

Source: reports/audit/grave_pairs_enriched.parquet. The classifier looks at state, RCA reason, length asymmetry, hasMistakes disagreement, and normalised-token overlap to assign each grave pair to one of five root causes.

root cause	technical label	count	%
Should have been rejected	should_have_been_rejected	910	54.3
Hard audio / substantive disagreement	normal_hard_audio	543	32.4
hasMistakes disagreement	has_mistakes_disagreement	162	9.7
Wrong passage / submission mismatch	wrong_passage_or_submission_mismatch	36	2.1
One side truncated	one_side_truncated	24	1.4

Observation: The grave bucket is now much more manager-actionable, but it should not be read as “all hard audio.” A distinct slice is wrong-passage/submission mismatch: both sides are long enough, the normalised character edit distance is huge, and the two submissions share very little normalised vocabulary. Those cases are more consistent with a wrong-text submission, task/audio mapping issue, copy/paste or UI-selection error, or an early review miss. They should be caught upstream instead of treated as normal ambiguous recitation.

Heuristic: The wrong-passage/submission-mismatch bucket currently means: both sides have at least 8 words, char_edits_norm ≥ 50, and normalised token overlap over the shorter side is ≤20%. This is a triage signal, not proof of a UI bug.
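A minimal sketch of that triage rule, assuming whitespace tokenisation of the normalised text and set-based overlap (distinct shared tokens divided by the distinct-token count of the shorter side). The report does not say whether overlap is computed over distinct tokens or all tokens, so treat that choice, and the function name, as assumptions.

```python
def looks_like_wrong_passage(norm_a: str, norm_b: str, char_edits_norm: int,
                             min_words: int = 8, min_edits: int = 50,
                             max_overlap: float = 0.20) -> bool:
    """Triage flag for the wrong-passage/submission-mismatch bucket.

    norm_a, norm_b: the two submissions after normalisation.
    char_edits_norm: whitespace-insensitive character edit distance
    between the normalised texts, computed elsewhere.
    """
    tokens_a, tokens_b = norm_a.split(), norm_b.split()
    # Both sides must be long enough for low overlap to be meaningful.
    if min(len(tokens_a), len(tokens_b)) < min_words:
        return False
    if char_edits_norm < min_edits:
        return False
    set_a, set_b = set(tokens_a), set(tokens_b)
    # Overlap measured against the side with fewer distinct tokens.
    overlap = len(set_a & set_b) / min(len(set_a), len(set_b))
    return overlap <= max_overlap
```

Because the rule only flags pairs that are long, far apart, and lexically unrelated, a truncated submission (short side) or a genuinely hard recitation (high overlap) falls through to the other buckets.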

Examples (up to 5 per root cause)

Examples are sorted by largest normalised character edit distance and de-duplicated by task_id within each root cause. Full pairs are in reports/audit/grave_pairs_enriched.parquet.

Should have been rejected · task 329533 · Batch 35 · 243 char edits · 63 word edits · token overlap 62%

A — Khaled Hussein (submitted)

لَا تُدْرِكُهُ الْأَبْصَارَ وَهُوَ يُدْرِكُ الْأَبْصَارَ وَهُوَ اللَّطِيفُ الْخَبِيرُ كُلَّ شَيْءٍ لَّطِيفُ الْخَبِيرْ قَدْ جَاءَكُمْ بَصَائِرُ مِنْ رَبِّكُمْ فَمَنْ أَبْصَرَ فَلِنَفْسِهِ وَمَنْ عَمِيَ فَعَلَيْهَا وَمَا أَنَا عَلَيْكُمْ بِحَفِيظٍ وَمَا أَنَا عَلَيْكُمْ بِحَفِيظْ وَذَالِكَ نُصَرِّفُ الْآيَاتِ وَلِيَقُولُوا دَرَسْتَ وَلِنُبَيِّنَهُ وَلِيَقُولُوا دَرَسْتَ وَلِنُبَيِّنَهُ لِقَوْمٍ يَعْلَمُونْ اتَّبِعْ مَا أُوحِيَ إِلَيْكَ مِنْ رَبِّكَ لَا إِلَاهَ إِلَّا هُوَ وَأَعْرِضْ عَنِ الْمُشْرِكِينْ وَلَوْ شَاءَ اللَّهُ مَا وَلَوْ شَاءَ اللَّهُ مَا لم أَشْرَكُوا وَمَا جَعَلْنَاكَ عَلَيْهِمْ حَفِيظًا وَمَا أَنْتَ عَلَيْهِمْ بِوَكِيلْ وَلَا تَسُبُّوا الَّذِينَ يَدْعُونَ رَبَهُمْ بِالْغَدَاةَ اللَّهِ شِيْ

B — Moniem Gamal (skipped, reject_audio_quality_issue)

الْخَبِيرْ وَكُلَّ شَيْءٍ خَبِيرْ قَدْ جَاءَكُمْ مَثَلُكُمْ بَصَارُ عَلَيْهِ وَإِنَّ عَلَيْكُمْ لَحَافِظِينْ وَمَا أَنَا عَلَيْكُمْ بِحَفِيظْ وَذَلِكَ نُبَيِّنُ لَكُمْ لِقَوْمٍ يَعْلَمُونْ مِنْ رَبِّكَ مُشْرِكِينْ شَاءَ اللَّهُ وَلَوْ شَاءَ اللَّهُ مَا اللَّهُ عَلَيْهِمْ بِالْغَدَاةَ

Should have been rejected · task 342076 · Batch 35 · 223 char edits · 59 word edits · token overlap 34%

A — بيسان الغرباوي (submitted)

بِسْمِ اللَّهِ الرَّحْمَانِ الرَّحِيمْ وَيَا قَوْمِ لَا يَجْرِمَنَّكُمْ شِقَا شِقَاقِي أَنْ يُصِيبَكُمْ مِثْلَ مَا أَقَا مِثْلَ مَا أَصَابَ قَوْمَ نُوحٍ أَوْ قَوْمَ هُودٍ أَوْ قَوْمَ صَالِحْ وَمَا قَوْمَ عَادٍ مِنْكُمْ بِبَعِيدْ وَاسْتَغْفِرُوا رَبَّكُمْ ثُمَّ تُوبُوا إِلَيْهْ إِنَّ رَبِّي رَحِيمٌ وَدُودْ قَالُوا يَا شُعَيْبُ مَا نَفْقَهُ كَثِيرًا مِمَّا تَقُولُ وَإِنَّا لَنَرَاكَ فِينَا ضَعِيفَا وَلَوْلَا رَهْطُكَ لَرَجَمْنَاكَ وَمَا أَنْتَ عَلَيْنَا بِعَزِيزْ قَالَ يَا قَوْمِ أَرَهْطِي أَعَزُّ عَلَيْكُمْ مِنَ اللَّهْ

B — tahaelkarem (skipped, reject_audio_quality_issue)

بِسْمِ اللَّهِ الرَّحْمَنِ الرَّحِيمْ وَالْقَلَمِ وَمَا يَسْطُرُونْ مَا أَنْتَ بِنِعْمَةِ رَبِّكَ بِمَجْنُونْ وَإِنَّ لَكَ لَأَجْرًا غَيْرَ مَمْنُونْ وَإِنَّكَ لَعَلَى خُلُقٍ عَظِيمْ فَسَتُبْصِرُ وَيُبْصِرُونْ ثُمَّ إِلَيْهِ مَرْجِعُكُمْ قَالَ يَا شُعَيْبُ أَيْنَ مَا تَكُونُوا أَيَخْرَجُ

Should have been rejected · task 358520 · Batch 35 · 214 char edits · 54 word edits · token overlap 27%

A — awad fdila (skipped, reject_audio_quality_issue)

وَإِذْ قَالَ إِبْرَاهِيمُ رَبِّ اجْعَلْ هَذَا بَلَدًا آمِنًا وَارْزُقْ أَهْلَهُ مِنَ الثَّمَرَاتِ مَنْ آمَنَ مِنْهُمْ بِاللَّهِ وَالْيَوْمِ الْآخِرْ قَالَ وَمَنْ كَفَرَ فَأُمَتِّعُهُ قَلِيلًا ثُمَّ أَضْطَرُّهُ إِلَى عَذَابِ النَّارِ وَبِئْسَ الْمَصِيرْ أَلَمْ يُنْفِقُونْ

B — بيسان الغرباوي (submitted)

مِنَ الشَّيْطَانِ الرَّجِيمْ بِسْمِ اللَّهْ وَإِذْ قَالَ إِبْرَاهِيمُ رَبِّ أَرِنِي كَيْفَ تُحْيِي الْمَوْتَى قَالَ أَوَلَمْ تُؤْمِنْ قَالَ بَلَى وَلَاكِنْ لِيَطْمَئِنَّ قَلْبِي قَالَ فَخُذْ أَرْبَعَةً مِنَ الطَّيْرِ فَصُرْهُنَّ إِلَيْكْ ثُمَّ اجْعَلْ عَلَى مِنْهُنَّ جُزْءًا ثُمَّ ادْعُهُنَّ يَأْتِينَكَ سَعْيَا وَاعْلَمْ أَنَّ اللَّهَ عَزِيزٌ حَكِيمْ مَثَلُ الَّذِينَ يُنْفِقُونَ أَمْوَالَهُمْ

Should have been rejected · task 352386 · Batch 35 · 205 char edits · 45 word edits · token overlap 55%

A — Walid Ahmad Muhammad (submitted)

يَقُولُونَ أَإِنْ كُنَّا مَعَكُمْ أَلَيْسَ اللَّهُ بِأَعْلَمَ بِمَا فِي صُدُورِ الْعَالَمِينْ وَلَيَعْلَمَ اللَّهْ اللَّهُ الَّذِينَ آمَنُوا وَيَعْلْ وَلَيَعْلَمَنَّ الْمُنَافِقِينْ وَقَالَ الَّذِينَ آمَنُوا اتَّبِعُونَا وَلْنَحْمِلْ خَطَايَاكُمْ وَمَا هُمْ بِحَامِلِينَ مِنْ خَطَايَاهُمْ مِنْ شَيْءٍ إِنَّهُمْ لَكَاذِبُونَ وَلَيَحْمِلُنَّ أَثْقَالَهُمْ وَأَثْقَالًا مَعَ أَثْقَالِهِمْ وَلَيُسْأَلُنَّ يَوْمَ الْقِيَامَةِ عَمَّا كَانُوا يَفْتَرُونْ وَلَقَدْ أَرْسَلْنَا نُوحًا إِلَى قَوْمِهِ فَلَبِثَ فِيهِمْ أَلْفَ سَنَةٍ إِلَّا خَمْسِينَ عَامًا فَأَخَذَهُمُ الطُّوفَانُ وَهُمْ ظَالِمُونْ فَأَنْجَيْنَاهُ وَأَصْحَ

B — Yasser Mohamad Mohamad (skipped, reject_audio_quality_issue)

يَقُولُونَ أَإِنَّا كُنَّا مَعَكُمْ أَلَيْسَ اللَّهُ بِأَعْلَمَ بِمَا فِي صُدُورِ الْعَالَمِينْ وَلَيَعْلَمَنَّ اللَّهُ الَّذِينَ آمَنُوا وَيَعْلَمَ وَلَيَعْلَمَنَّ الْمُنَافِقِينْ وَقَالَ الَّذِينَ لَا يَرْجُونَ لِقَاءَنَا لَوْلَا نُطَاعُونَ لِقَومٍ مَعَهُمُ الْكِتَابَ فَاعْبُدُوا إِلَّا خُمْسٍ فَأَمْرِكُو

Should have been rejected · task 331216 · Batch 35 · 202 char edits · 50 word edits · token overlap 82%

A — عصام عبد الحميد عبد العزيز (submitted, annotator_focus_issue)

فَجَعَلَهُمْ جُذَاذًا إِلَّا كِبَرَ لَهُمْ لَعَلَّهُمْ لَيْهِ يَهْجَعُونْ غَالُوا مَنْ فَعَلَ هَاذَا بِآلِهَتِنَا إِنَّهُ لَمِنَ الظَّالِمِينْ قَالُوا سَمِعْنَا فَتًى يَذْكُرُهُمْ يُقَالْ لُهُ إِبْرَاهِيمْ قَالُوا فَأْتُوا بِهِ عَلَى أَعْيُنِ النَّاسِ لَعَهُمْ يَشْهَدُونْ قَالَ أَأَنْتَ فَعَلْتَ هَاذَا بِآلِهَتِنَا إِنَّهُ لَمِنَ الظَّالِمِينْ قَالُوا سَمِعْنَا فَتًى يَذْكُرُهُمْ يُقَالْ لِهْ إِبْرَاهِيمْ

B — saidmohamed (skipped, reject_multiple_recitations_overlap)

فَجَعَلَهُمْ جُذَاذًا إِلَّا كِفْلَهُمْ لَعَلَّهُمْ يَهْجَعُونْ وَلَهُمْ عَذَابْ إِنَّهُمْ لَمِنَ الظَّالِمِينْ قَالُوا سَمِعْنَا فَتًى يَذْكُرُهُمْ يُقَالُ إِبْرَاهِيمْ قَالُوا فَأْتُوا بِهِ عَلَى أَعْيُنِ النَّاسِ لَعَلَّهُمْ يَشْهَدُونْ قَالُوا أَأَنْتَ فَعَلْتَ هَذَا بِآلِهَتِنَا إِنَّهُ لَمِنَ الظَّالِمِينْ قَالُوا سَمِعْنَا فَتًى يَذْكُرُهُمْ يُقَالُ إِبْرَاهِيمْ

Wrong passage / submission mismatch · task 130845 · Batch 34 · 210 char edits · 37 word edits · token overlap 0%

A — Marwan (escalated, annotator_focus_issue)

وَإِذَا الْجِبَالُ سُيِّرَتْ وَإِذَا الْعِشَارُ عُطِّلَتْ وَإِذَا الْوُحُوشُ حُشِرَتْ وَإِذَا الْبِحَارُ سُجِّرَتْ وَإِذَا النُّفُوسُ زُوِّجَتْ وَإِذَا الْمَوْءُودَةُ سُئِلَتْ بِأَيِّ ذَنْبٍ قُتِلَتْ وَإِذَا الصُّحُفُ نُشِرَتْ وَإِزَا السَّمَاءُ كُشِطَتْ

B — Mohamed Abdelghany (accepted, annotator_speed_mistake)

أَمِنْهُمْ مَنْ آمَنَ وَمِنْهُمْ مَنْ كَفَرْ وَلَوْ شَاءَ اللَّهُ مَا اقْتَتَتَلُوا وَلَاكِنَّ اللَّهَ يَفْعَلُ مَا يُرِيتْ يَا أَيُّهَا الَّزِينَ آمَنُوا أَنْفِقُوا مِمَّا كَسَ أَنْفُقُوا مِمَّا رَ رَزَقْنَاكُمْ مِنْ قَبْلْ أَنْ يَأْتِيَ يَوْمٌ لَا بَيْعٌ فِيهِ وَلَا خُلَّةُ

Wrong passage / submission mismatch · task 131136 · Batch 34 · 198 char edits · 40 word edits · token overlap 15%

A — عبدالله صلاح العيسوي (rejected, reject_technical_issue)

طَاسِينْ مِيم تِلْكَ آيَاتُ الْكِتَابِ الْمُبِينْ لَعَلَّكَ بَاخِعٌ نَفْسَكَ أَلَّا يَكُونُوا مُؤْمِنِينْ إِنْ نَشَأْ نُنَزِّلْ عَلَيْهِمْ مِنَ السَّمَاءِ آيَةً فَظَلَّتْ أَعْنَاقُهُمْ لَهَا خَاضِعِينْ وَمَا يَأْتِيهِمْ مِنْ ذِكْرٍ إِلَّا كَانُوا عَنْهُ مُعْرِ مِنَ الرَّحْمَانِ إِلَّا كَانُوا عَنْهَا مُعْرِضِينْ فَقَدْ كْ

B — Fares Moustafa (accepted)

إِلَّا مَوْتَتُنَا الْأُولَى وَمَا نَحْنْ بِمُعَذَّبِينْ إِنَّ هَاذَا لَهُوَ الْفَوْزُ الْعَظِيمْ لِمِثْلِ هَاذَا فَلْيَعْمَلِ الْعَامِلُونْ أَذَالِكَ خَيْرٌ نُزُلًا أَمْ شَشَرَةُ الزَّقُّومْ

Wrong passage / submission mismatch · task 232340 · Batch 34 · 164 char edits · 53 word edits · token overlap 17%

A — Yasser Waled (escalated, annotator_focus_issue)

وَلَا تَقُولُوا لِمَ تَصِفُ أَلْسِنَتُكُمُ وَلَا تَقُولُوا لِمَ تَصِفُ أَلْسِنَتُكُمُ الْكَذِبَ هَاذَا حَلَالٌ وَهَاذَا حَرَامٌ لِتَفْتَرُوا عَلَى اللَّهِ الْكَذِبْ إِنَّ الَّذِينَ يَفْتَرُونَ عَلَى اللَّهِ الْكَذِبَ لَا يُفْلِحُونْ مَتَاعٌ قَلِيلٌ وَلَهُمْ عَذَابٌ مَوَدَّةْ

B — Mariam Khaled (accepted)

بِسْمِ اللَّهِ الرَّحْمَانِ الرَّحِيمْ لَقَدْ كَانَ لَكُمْ فِيهُمْ أُسْوَةٌ حَسَنَةٌ لِمَنْ كَانَ يَرْجُو اللَّهَ وَالْيَوْمَ الْآخِرَ وَمَنْ يَتَوَلَّ فَإِنَّ اللَّهَ هُوَ الْغَنِيُّ الْحَمِيدْ عَسَى اللَّهُ إِنْ يَجْعَلْ بَيْنَكُمْ وَبَيْنَ الَّذِينَ عَادَيْتُمْ مِنْهُمْ مَوَدَّةْ

Wrong passage / submission mismatch · task 130998 · Batch 34 · 148 char edits · 46 word edits · token overlap 5%

A — Fares Moustafa (accepted, annotator_focus_issue)

وَكَتَبْنَا عَلَيْهُمْ فِيهَا أَنَّ النَّفْسَ بِالنَّفْسِ وَالْعَيْنَ بِالْعَيْنْ وَالْأَنْفَ بِالْأَنْفِ وَالْأُذُنَ بِالْأُذُنِ وَالسِّنَّ بِالسِّنِّ وَالْجُرُوحَ قِصَاصْ فَمَنْ تَصَدَّقَ بِهِ فَهُوَ

B — Mohamed Abdelghany (accepted)

وَمَنْ وَلَمْ وَمَا كَانْ وَمَا كَانَ وَمَا وَلَمْ تَكُنْ لَهُ فِئَةٌ يَنْصُرُونَهُ مِنْ دُونِ اللَّهِ وَمَا كَانَ مُنْتَصِرَا وَلَوْلَا إِذْ دَخَلْتَ جَنَّتَكَ خُلْتَ مَا شَاءَ اللَّهْ لَا قُوَّةَ إِلَّا بِاللَّهِ إِنْ تَرَنِ أَقَ

Wrong passage / submission mismatch · task 131088 · Batch 34 · 139 char edits · 28 word edits · token overlap 11%

A — Marwan (escalated, annotator_focus_issue)

حَوْلِكْ فَاعْفُ عَنْهُمْ وَاسْتَأْ فِرْ لَهُمْ وَشَاوِرَهُمْ فِي الْأَمْرْ فَإِذَا عَزِمْتَ فَعَ فَإِذَا عَزَمْتَ تَوَكَّلْ تَوَكَّلْ عَلَى اللَّهْ إِنَّ اللَّهَ يُحِبُّ الْمُتَكِينْ

B — Mahmoud Elsaey (rejected, reject_technical_issue)

قَالُوا ادْعُ لَنَا رَبَّكَ يُبَيِّنْ لَنَا مَا هِيَ إِنَّ الْبَقَرَ تَشَابَهَ عَلَيْنَا وَإِنَّا إِنْ شَاءَ اللَّهُ لَمُهْتَدُونْ قَالَ إِنَّهُ يَقُولُ إِنَّهَا بَقَرَةٌ ذَلُولٌ لَا تَثِيرُ الْأَرْضَ وَلَا

One side truncated · task 335987 · Batch 35 · 159 char edits · 35 word edits · token overlap 100%

A — عصام عبد الحميد عبد العزيز (accepted, annotator_focus_issue)

مَوْلَى الَّزِينَ آمَنُوا وَأَنَّ الْكَافِرِينَ لَا مَوْلَى لَهُمْ إِنَّ اللَّهَ يُدْخِلُ الَّذِينَ آمَنُوا وَعَمِلُوا الصَّالِحَاتِ جَنَّاتٍ تَجْرِي مِنْ تَحْتِهَا الْأَنْهَارْ وَالَّذِينَ كَفَرُوا يَتَمَتَّعُونَ وَيَأْكُلُونَ كَمَا تَأْكُلُ الْأَنْعَامُ وَالنَّارُ مَسْوًى لَهُمْ وَكَأَيِّنْ مِنْ قَرْيَةٍ هِيَ أَشَدُّ قُوَّةً مِنْ قَرْيَتِكَ الَّتِي أَخْرَجَتْكَ أَهْلَكْنَاهُمْ فَلَا نَاصِرَ

B — awad fdila (draft, annotator_focus_issue)

مَوْلَى الَّزِينَ آمَنُوا وَأَنَّ الْكَافِرِينَ لَا مَوْلَى لَهُمْ

One side truncated · task 351542 · Batch 35 · 107 char edits · 38 word edits · token overlap 31%

A — Abdullah Mohamed Samir (draft, annotator_focus_issue)

أَفَمَنْ يَنْصُرُنَا رَبِّهْ فَيَذَرُوهَا فَيَذَرُوهَا طَاغًا وَلَا أَمْتَا يَوْمَئِذٍ يَتَذَكَّرُ يَوْمَئِذٍ وَ خَشِيَةٍ عَلَى الرَّحْمَنِ لَا تَسْمَعُ إِلَّا

B — Gehad Refaat (submitted, annotator_focus_issue)

الْجِبَ فَقُلْ يَنْسِفُهَا رَبِّي نَسْفًا فَيَذَرْ فَيَذْ فَيَذَرُهَا فَيَذَرُهَا قَاءً صَفْصَفَا لَا تَرَى فِيهَا عِوَجًا وَلَا أَمْتَا يَوْمَئِذٍ يَتَّبِعُونَ الدْ يَوْمَئِذٍ يَتَّبِعُونَ الدَّائِيَ لَا حِوَجَ مِنْهُ لَا عِوَجًا لَهْ وَخَشَعْ وَخَشِ وَخَشِيَ وَخَشِيَتِ الْ أَصْوَاتُ لِلرَّحْمَانِ فَلَا تَسْمَعُ هِلْ لَا هَمْسًا يَوْمَئِذٍ

One side truncated · task 131075 · Batch 34 · 98 char edits · 22 word edits · token overlap 0%

A — Marwan (escalated, annotator_focus_issue)

B — Mahmoud Elsaey (accepted)

لِنُخْرِجَ بِهِ هَبًّا وَنَبَاتَا وَجَنَّةٍ أَلْفَافَا

One side truncated · task 356100 · Batch 35 · 97 char edits · 28 word edits · token overlap 75%

A — Fares Moustafa (submitted)

عَالِمُ الْغَيْبِ لَا يَعْلَمُهَا

B — Taha sobhi (submitted)

وَعِنْدَهُ مَفَاتِحُ الْغَيْبِ لَا يَعْلَمُهَا إِلَّا هُوْ يَعْلَمُ مَا فِي الْبَرِّ وَالْبَحْرْ وَمَا تَسْقُطُ مِنْ وَرَقَةٍ إِلَّا يَعْلَمُهَا وَلَا حَبَّةٍ فِي ظُلُمَاتِ الْأَرْضِ وَلَا رَطْبٍ وَلَا يَابِسٍ إِلَّا فِي كِتَابٍ مُبِينْ

One side truncated · task 327285 · Batch 35 · 65 char edits · 16 word edits · token overlap 25%

A — Mariam Khaled (accepted)

اءُ مِنْ بَعْدُ وَلَا أَنْ تَبَدَّلَ بِهِنَّ مْ

B — awad fdila (escalated, annotator_focus_issue)

وَكَذَالِكَ نَجْزِي مَنْ أَسْرَفَ وَلَمْ يُؤْمِنْ بِآيَاتِ رَبِّهْ وَلَعَدَابُ الْآخِرَةِ أَشَدُّ وَأَبْقَى وَكَمْ أَهْلَكْنَا قَبْلَهُمْ مِ أَفَلَمْ يَهْدِ

hasMistakes disagreement · task 155050 · Batch 34 · 157 char edits · 10 word edits · token overlap 64%

A — ghada ahmed (draft, annotator_focus_issue)

فِي الْحَيَاةِ الدُّنْيَا وَتَزْهَقَ أَنْفُسُهُمْ وَهُمْ كَافِرُونْ فِ ي الْحَيَاةِ الدُّنْيَا وَتَزْهَقَ أَنْفُسُهُمْ وَهُمْ كَا فِي الْحَيَاةِ الدُّنْيَا وَتَزْهَقَ أَنْفُسُهُمْ فِي الْحَيَاةِ الدُّنْيَا وَفِي الْحَيَاةِ الدُّنْيَا وَتَزْهَقَ أَنْفُسُهُمْ وَهُمْ كَافِرُونْ

B — عصام عبد الحميد عبد العزيز (accepted, annotator_focus_issue)

فِي الْحَيَاةِ الدُّنْيَا وَتَزْهَقَ أَنْفُسُهُمْ وَهُمْ كَافِرُونْ فِي الْحَيَاةِ الدُّ وَتَزْهَقَ أَنْفُسُهُمْ وَهُمْ كَافِ فِي الْحَيَاةِ الدُّنْيَا وَتَزْهَقَ أَنْفُسُهُمْ فِي الْحَيَاةِ الدُّنْيَا وَ تَزْهَقَ أَنْفُسُهُمْ وَهُمْ كَافِرُونْ فِي الْحَيَاةِ الدُّنْيَا وَتَزْهَقَ أَنْفُسُهُمْ وَهُمْ كَافِرُونْ

hasMistakes disagreement · task 220396 · Batch 34 · 125 char edits · 38 word edits · token overlap 79%

A — Ahmed Saber (escalated)

لَاهَ كَانَ غَفُورْ رَحِيمَا وَلَقَدْ وَصَّيْنَا الَّذِينَ أُوتُوا الْكِتَابَ مِنْ قَبْلِكُمْ لَعَلَّكُمْ تَتَّقُونْ وَإِنْ تَكْفُرُوا فَإِنَّ اللَّهَ غَنِيٌّ حَمِيدْ اللَّهَ يُؤْتِي وَكَانَ اللَّهُ عَلَى ذَلِكَ قَدِيرَا مَنْ كَانَ يُرِيدُ ثَوَابَ الدُّنْيَا فَعِنْدَ اللَّهِ ثَوَابُ الْآخِرَةِ كَانَ اللَّهُ سَمِيعٌ بَصِيرَا يَا أَيُّهَا الَّذِينَ آمَنُوا

B — عبد الرحمن (draft, annotator_focus_issue)

اللَّهَ كَانَ غَفُورًا رَحِيمَا مَا فِي السَّمَاوَاتِ وَمَا فِي الْأَرْضْ وَلَقَدْ وَصَّيْنَا الَّذِينَ أُوتُوا الْكِتَابَ مِنْ قَبْلِكُمْ وَإِيَّاكُمْ أَنِ تَتَّقُونْ وَإِنْ تَكْفُرُوا فَإِنَّ للَّهِ مَا فِي السَّمَاوَاتِ وَمَا فِي الْأَرْضْ وَكَانَ اللَّهَ غَنِيًّا حَمِيدَا للَّهِ مَا فِي السَّمَاوَاتِ وَمَا فِي الْأَرْضْ وَكَفَى بِاللَّهِ وَكِيلَا إِنْ يَشَأْ يُذْهِبْكُمْ أَيُّهَا النَّاسُ وَيَأْتِ بِآخَرِينَ وَكَانَ اللَّهُ عَلَى ذَالِكَ قَدِيرًا مَنْ كَانَ يُرِيدُ ثَوَابَ الدُّنْيَا فَعِنْدَ اللَّهِ ثَوَابُ الْآخِرَهْ كَانَ اللَّهُ سَمِيعٌ بَصِيرَا يَا أَيُّهَا الَّذِينَ آمَنُوا

hasMistakes disagreement · task 196169 · Batch 34 · 114 char edits · 30 word edits · token overlap 70%

A — Ahmed Saber (escalated)

إِنَّ أَصْحَابَ الجَّنَّةِ الْيَوْمَ فِي شُغُلٍ فَاكِهُونْ هُمْ وَأَزْوَاجُهُمْ فِي ظِلَالٍ عَلَى الْأَرَائِكِ مُتَّكِئُونْ لَهُمْ فِيهَا فَاكِهَةُ وَلَهُمْ مَا يَدَّعُونْ سَلَامٌ قَوْلًا مِنْ رَبٍّ رَحِيمْ وَامْتَازُوا الْيَوْمَ أَيُّهَا الْمُجْرِمُونْ أَلَمْ أَعْهَدْ إِلَيْكُمْ يَا بَنِي آدَمَ لَا تَعْبُدُوا الشَّيْطَانْ إِنَّهُ عَدُوٌّ مُبِينٌ مُبِينْ أَعُوذُ بِاللهِ مِنَ الشَّيْطَانِ الرَّجِيمْ وَمَا كَانَ لِتَعْلَمُوا أَنَّمَا تَفْعَلُونْ

B — Mariam Khaled (submitted)

إِنَّ أَصْحَابَ الْجَنَّةِ الْيَوْمَ فِي شَغَلٍ فَاكِهُونْ هُمْ وَأَزْوَاجُهُمْ فِي ظِلَا عَلَى الْأَرَائِكِ مُتَّكِ لَهُمْ فِيهَا فَاكِهَةٌ وَلَهُمْ مَا سَلَامٌ قُوْلًا مِنْ رَبٍّ رَ وَامْتَازُوا الْيَوْمَ أَيُّهَا الْمُجْرِمْ أَلَمْ أَعْهَدْ إِلَيْكُمْ يَا بَنِي آدَمَ لَا تَعْبُدُوا الشَّيْطَانْ إِنَّهُ عَدُوٌّ مُمِنٌ وَ أَ أَعْبُدُونِ هَاذَا صِرَاطَ مُسْتَ وَ أَضَلَ مِنْكُمْ جِبْلْاً كَثِ أَمْ تَكُ تَقِلُو هَاذِهِ جَهَنْمُ الْتِي كُنْتُمْ تُوعَدُ اصْلَوْهَا الْيَوْمَ بِمَا كُنْتُمْ تَكْفُرُو

hasMistakes disagreement · task 330138 · Batch 35 · 97 char edits · 24 word edits · token overlap 92%

A — Yasser Waled (submitted)

فَلَمَّا ذَهَبَ عَنْ إِبْرَاهِيمَ الرَّوْعُ وَجَاءَتْهُ الْبُشْرَى يُجَادِلُنَا فِي قَوْمِ لُوطْ فَلَمَّا ذَهَبَ عَنْ إِبْرَاهِيمَ الرَّوْعُ وَجَاءَتْهُ الْبُشْرَى يُجَادِلُنَا فِي قَوْمِ لُوطْ فَلَمَّا ذَهَبَ عَنْ إِبْرَاهِيمَ الرَّوْعُ وَجَاءَتْهُ الْبُشْرَى يُجَادِلُنَا فِي قَوْمِ لُوطْ فَلَمَّا فَلَمَّا ذَهَبَ عَنْ إِبْ

B — Mahmoud Abdo (submitted)

فَلَمَّا ذَهَبَ عَنْ إِبْرَاهِيمَ الرَّوْعُ وَجَاءَتْهُ الْبُشْرَى يُجَادِلُنَا فِي قَوْمِ لُوتْ فَلَمَّا زَهَبَ عَنْ إِبْرَاهِيمَ الرَّوْعُ وَجَاءَتْهُ الْبُشْرَى يُجَادِلُنَا فِي قَوْمِ لُوتْ فَلَمَّا ذَهَبَ عَنْ إِبْرَاهِيمَ الرَّوْعُ وَجَاءَتْهُ الْبُشْرَى يُجَادِلُنَا فِي قَوْمِ لُوتْ فَلَمَّا فَلَمَّا ذَهَبَ عَنْ إِبْ

hasMistakes disagreement · task 219168 · Batch 34 · 96 char edits · 25 word edits · token overlap 92%

A — أبو مسلم الأزهري (draft, annotator_focus_issue)

إِذًا لَأَذَقْنَاكَ ضِعْفَ الْحَيَاةِ وَضِعْفَ الْمَمَاتِ ثُمَّ لَا تَجِدُ لَكَ عَلَيْنَا نَصِيرَا إِذًا لَأَذَقْنَاكَ ضِعْفَ الْحَيَاةِ وَضِعْفَ الْمَمَاتِ ثُمَّ لَا تَجِدُ لَكَ عَلَيْنَا نَصِيرَا

B — Omar Yosri (accepted)

إِذًا لَأَدْقْنَاكَ طِعْفَ الْحَيَاةِ وَضِعْفَ الْمَمَاتِ ثُمَّ لَا تَجِدُ لَكَ عَلَيْنَا نَصِيرَا إِذًا لَأَضَقْنَاكَ ضِعْفَ الْحَيَاةِ وَضِعْفَ الْمَمَاتِ ثُمَّ لَا تَجِدُ لَكَ عَلَيْنَا نَصِيرَا

Hard audio / substantive disagreement · task 365062 · Batch 35 · 335 char edits · 107 word edits · token overlap 74%

A — Taha sobhi (submitted)

وَهَازَا وَهَاذَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا وَفَصِيلَ الْآيَاتِ لِقَوْمٍ يَ لِقَوْمٍ يَذَّكَّرُونْ وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا وَهَازَا صِرَاطُ رَ صَ صِرَاهَطْ وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا قَدْ فَصَّلْنَا الْآيَاتِ لِقَوْمٍ يَذَّكَّرُونْ وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا قَدْ فَصَّلْنَا الْآيَاتِ لِقَوْمٍ يَذَّكَّرُونْ وَمَنْ فَمَنْ يُرِدِ اللهْ أَنْ يَ أَنْ يَهْدِيهْ سَ يَشْرَحْ صَدْرَهُ لِلْإِسْلَامْ وَمَنْ يُرِدْ أَنْ يُضِلَّهْ يَجْعَلْ صَدْرَهُ ضَيِّقًا جَرَحًا كَأَنَّمَا يَصَّعَّدْ فِي السَّمَاءِ كَذَالِكَ يَجْعَلُ اللَّهُ الرِّجْسَ عَلَى الَّذِينَ لَا يُؤْمِنُوا وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا قَدْ فَصَّلْنَا الْآيَاتِ لِقَوْمٍ يَذَّكَّرُونْ

B — محمد سلامة محمد (submitted)

وَهَازَا وَهَازَا صِرَاطُ رَبُّكَ مُسْتَقِيمًا هَفَصَّلْنَا الْآيَاتِ لِقَوْمٍ يَ لِقَوْمٍ يَزَّكَّرُونْ وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا وَهَازَا صِرَاطُ ظِ سَصِرَ هَازَ صِرَاطُ هَازَا صِرَاطُ رَبُّكَ مُسْتَقِيمًا قَدْ فَصَّلْنَا الْآيَاتِ لِقَوْمٍ يَزَّكَّرُونْ وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا فَصَّلْنَا الْآيَاتِ لِقَوْمٍ يَزَّكَّرُونْ مَنْ فَمَنْ يُرِدَ اللهُ أَنْ يَ أَنْ يَهْدِيهِ وَمَنْ يَشْرَحْ صَدْرَ لِلْإِسْلَامِ وَمَنْ يُرِدْ أَنْ يُضِلَّهْ يَجْعَلُهُ صَدْرَهُ ضَيِّقًا جَرَحًا كَأَنَّمَا يَصَّعَّدُ فِي السَّمَاءْ كَزَالِكَ يَجْعَلُ اللَّهُ الرِّجْزَ عَلَى الَّذِينَ لَا يُؤْمِنُونْ وَهَازَا صِرَاطُ رَبُّكَ مُسْتَقِيمًا قَدْ فَصَّلْنَا الْآيَاتِ لِقَوْمٍ يَزَّكَّرُونْ

Hard audio / substantive disagreement · task 349112 · Batch 35 · 293 char edits · 68 word edits · token overlap 100%

A — حسناء علاء حسن عبدالظاهر (accepted, annotator_focus_issue)

B — Ahmed Khairy (accepted, annotator_focus_issue)

يَقُولُ السُّفَهَاءُ مِنَ النَّاسِ مَا وَلَّاهُمْ عَنْ قِبْلَتِهِمُ الَّتِي كَانُوا عَلَيْهِمْ قُلْ لِلَّهِ الْمَشْرِقْ وَالْمَغْرِبُ يَهْدِي مَنْ يَشَاءُ إِلَى صِرَاطٍ مُسْتَقِيمْ وَكَذَالِكَ جَعَلْنَاكُمْ أُمَّةً وَسَطًا لِتَكُونُوا شُهَدَاءَ عَلَى النَّاثِ وَيَكُونَ الرَّسُولُ عَلَيْكُمْ شَهِيدَمَ وَمَا جَعَلْنَا الْقِبْلَةَ الَّتِي كُنْتَ عَلَيْهَا إِلَّا لِنَعْلَمْ مَنْ يَتَّبِعُ الرَّسُولُ مِمَّنْ يَنْقَلِبُ عَلَى عَقِبَيْهْ وَإِنْ كَانَتْ لَكَبِيرَةً إِلَّا عَلَى الَّذِينَ هَدَى اللَّهْ وَمَا كَانَ اللَّهُ لِيُضِيعَ إِمَانَكُمْ إِنَّ اللَّهَ بِالنَّاسِ لَرَءُوفٌ رَحِيمْ قَدْ نَرَى تَقَلُّبَ وَجْهِكَ فِي السَّمَاءْ فَلَنُوَلِّيَنَّكَ قِبْلَةً تَرْضَاهْ فَوَلِّ وَجْهَكَ شَرَ الْمَسْجِدِ الْحَرَامْ وَحَيْثُ مَا كُنْتُمْ فَوَلُّوا وُجُوهَكُمْ شَطْرَهْ إِلَى صِرَاطٍ مُسْتَقِيمْ وَكَذَالِكَ جَعَلْنَاكُمْ أُمَّةً وَسَطًا لِتَكُونُوا شُهَدَاءَ عَلَى النَّاثِ وَيَكُونَ الرَّسُولُ عَلَيْكُمْ شَهِيدَمَ وَمَا جَعَلْنَا الْقِبْلَةَ الَّتِي كُنْتَ عَلَيْهَا إِلَّا لِنَعْلَمْ مَنْ يَتَّبِعُ الرَّسُولُ مِمَّنْ يَنْقَلِبُ عَلَى عَقِبَيْهْ وَإِنْ كَانَتْ لَكَبِيرَةً إِلَّا عَلَى الَّذِينَ هَدَى اللَّهْ وَمَا كَانَ اللَّهُ لِيُضِيعَ إِمَانَكُمْ إِنَّ اللَّهَ بِالنَّاسِ لَرَءُوفٌ رَحِيمْ قَدْ نَرَى تَقَلُّبَ وَجْهِكَ فِي السَّمَاءْ فَلَنُوَلِّيَنَّكَ قِبْلَةً تَرْضَاهْ فَوَلِّ وَجْهَكَ شَرَ الْمَسْجِدِ الْحَرَامْ وَحَيْثُ مَا كُنْتُمْ فَوَلُّوا وُجُوهَكُمْ شَطْرَهْ

Hard audio / substantive disagreement · task 372788 · Batch 35 · 192 char edits · 49 word edits · token overlap 95%

A — عصام عبد الحميد عبد العزيز (draft, annotator_speed_mistake)

ذَالِكَ بِأَنَّهُمْ خَالُوا إِنَّمَا الْبَيْعُ مِثْلُ الرِّبَا وَأَحَلَّ اللَّهُ الْبَيْعَ وَحَرَّمَ الرِّبَا فَمَنْ جَاءَهُ مَوْعِظَةٌ مِنْ رَبِّهِ فَانْسَهَى فَلَهُ مَا سَلَفَ وَأَمْرُهُ إِلَى اللَّهْ ذَالِكَ بِأَنَّهُمْ قَالُوا إِنَّمَا الْبَيْعُ مِثْلُ الرِّبَا وَأَحَلَّ اللَّهُ الْبَيْعَ وَحَرَّمَ الرِّبَا فَمَنْ جَاءَهُ مَوْعِظَةٌ مِنْ رَبِّهِ فَانْتَهَى فَلَهُ مَا سَلَفَ وَأَمْرُهُ إِلَى اللَّهْ وَمَنْ عَادَ فَأُلَائِكَ أَصْحَابُ النَّارِ هُمْ فِيهَا خَالِدُونْ يَمْحَقُ اللَّهُ الرِّبَا وَيُرْبِي الصَّدَخَاتْ وَاللَّهُ لَا يُحِبُّ كُ يَمْحَقُ اللَّهْ

B — Waled Mohamed (submitted, annotator_speed_mistake)

ذَالِكَ بِأَنَّهُمْ خَالُوا إِنَّمَا الْبَيْعُ مِثْلُ الرِّبَا وَحَلَّ اللَّهُ الْبَيْعَ وَحَرَّمَ الرِّبَا فَمَنْ جَاءَهُ مَوْعِظَةٌ مِنْ رَبِّهِ فَانْتَهَى فَلَهُ مَا سَلَفَ وَأَمْرُ إِلَى اللَّهْ ذَالِكَ بِأَنَّهُمْ خَالُوا إِنَّمَا الْبَيْعُ مِثْلُ الرِّبَا وَأَحَلَّ اللَّهُ الْبَيْعَ وَحَرَّمَ الرِّبَا فَمَنْ جَاءَهُ مَوْعِدَظٌ مِنْ رَبِّهِ فَانْتَهَى فَلَهُ مَا سَلَفَ وَأَمْرُهُ إِلَى اللَّهْ وَمَنْ عَادَ فَأُلَائِكَ أَصْحَابُ النَّارِ هُمْ فِيهَا خَالِدُونْ يَمْحَقُ اللَّهُ الرِّبَا وَيُرْبِي الصَّدَخَاتْ وَاللَّهُ لَا يُحِبُّ كُ يَمْحَقُ اللَّهْ

Hard audio / substantive disagreement · task 361203 · Batch 35 · 167 char edits · 42 word edits · token overlap 98%

A — Yasser Waled (submitted)

B — awad fdila (submitted)

وَلَئِنْ أَرْسَلْنَا رِيحًا فَرَأَوْهُ مُصْفَرًّا لَظَلُّوا مِنْ بَعْدِهِ يَكْفُرُونْ فَإِنَّكَ لَا تُسْمِعُ الْمَوْتَى وَلَا تُسْمِعُ الصُّمَّ الدُّعَاءَ إِذَا وَلَّوْا مُدْبِرِينْ وَمَا أَنْتَ بِهَادِي الْعُمْيِ عَنْ ضَلَالَتِهِمْ إِنْ تُسْمِعُ إِلَّا مَنْ يُؤْمِنُ مُسْلِمُونْ اللَّهُ الَّذِي خَلَقَكُمْ مِنْ ضَعْفٍ ثُمَّ جَعَلَ مِنْ بَعْدِ ضَعْفٍ قُوَّةً ثُمَّ جَعَلَ مِنْ قُوَّةٍ ضَعْفًا وَشَيْبَهْ يَخْلُقُ مَا يَشَاءْ وَهُوَ الْعَلِيمُ الْقَدِيرْ وَيَوْمَ تَقُومُ السَّاعَةُ يُقْسِمُ الْمُجْرِمُونْ مَا لَبِثُوا غَيْرَ سَاعَهْ كَذَالِكَ كَانُوا يُؤْفَكْ كَذَالِكَ يُؤْفَكُونَ وَقَالَ الَّذِينَ أُوتُوا الْعِلْمَ وَالْإِيمَانَ لَقَدْ وَلَقَدْ ضَرَبْنَا لِلنَّاسِ فِي هَاذَا الْقُرْآنِ مِنْ كُلِّ مَثَلٍ وَلَئِنْ جِئْتَهُمْ بِآيَةٍ لَيَقُولَنَّ الَّذِينَ كَفَرُوا إِنْ أَنْتُمْ إِلَّا مُبْطِلُونَ كَذَالِكَ يَطْبَعُ عَلَى قُلُوبِ الَّذِينَ لَا يَعْلَمُونَ فَاصْبِرْ إِنَّ وَعْدَ اللَّهِ حَقٌّ وَلَا يَسْتَخِفَّنَّكَ الَّذِينَ لَا يُوقِنُونَ

Hard audio / substantive disagreement · task 356781 · Batch 35 · 151 char edits · 52 word edits · token overlap 67%

A — Yasser Mohamad Mohamad (submitted)

حُرِّمَتْ عَلَيْكُمُ الْمَيْتَةُ وَالدَّمُ وَلَحْمُ الْخِنْزِيرِ وَمَا أُهِلَّ لِغَيْرِ اللَّهِ بِهِ وَالْمُنْخَنِقَةُ وَالْمَوْقُوذَةُ وَالْمُتَرَدِّيَةُ وَالنَّطِيحَةُ وَمَا أَكَلَ السَّبُعُ إِلَّا مَا ذَكَّيْتُمْ وَمَا ذُبِحَ عَلَى النُّصُبِ وَأَنْ تَسْتَقْسِمُوا بِالْأَزْلَامِ ذَالِكُمْ فِسْقٌ الْيَوْمَ يَئِسَ الَّذِينَ كَفَرُوا مِنْ دِينِكُمْ فَلَا تَخْشَوْهُمْ وَاخْشَوْنِ الْيَوْمَ أَكْمَلْتُ لَكُمْ دِينَكُمْ وَأَتْمَمْتُ عَلَيْكُمْ نِعْمَتِي وَرَضِيتُ لَكُمُ الْإِسْلَامَ دِينًا وَتُوأَ وَمَا أَكَلَ السَّبُعُ إِلَّا مَا ذَكَرَ وَتُحَمْ وَ تَسْقِمُوا بِالْأَزْلَامِ وَمَا أَكَلَ السَّبُعُ إِلَّا مَا ذَكَّيْتُمْ وَمَا ذُبِحَ عَلَى النُّصُبِ وَأَنْ تَسْتَقْسِمُوا بِالْأَزْلَامِ وَمَا أَكَلَ السَّبُعُ إِلَّا مَا ذَكَّيْتُمْ وَمَا ذُبِحَ عَلَى النُّصُبِ وَأَنْ تَسْتَقْسِمُوا بِالْأَزْلَامِ

B — Ahmed Fawzy (submitted)

حُرِّمَتْ عَلَيْكُمْ مَيْتَةُ وَالدَّمُ وَلَحْمُ الْخِنْزِيرْ مُلَّ لِغَيْرِ اللَّهِ بِهْ وَالْمُنْخَنِقَةُ وَالْمَوْقُوسَهْ وَالْمُتَرَدِّيَةُ وَالنَّطِيحَةُ وَمَا أَكَلَ السَّبُعُ إِلَّا مَا سَكَّيْتُمْ وَمَا سُبْحَ عَلَى النُّصُبِ أَنْ تَسْتَلِمُوا بِالْأَزْلَامْ سَالِكُمْ فِسْقْ الْيَوْمَ يَئِسَ الَّذِينَ أَفَوْا مِنْ دِينِكُمْ فَلَا تَخْشَوْهُمْ وَاخْشَوْنْ الْيَوْمَ أَكْمَتُ لَكُمْ دِينَكُمْ وَأَتْمَمْتُ عَلَيْكُمْ نِعْمَتِي وَرَدِيتُ لَكُمُ الْإِسْلَامَ دِينَا وَتُأَ وَمَا أَكَلَ السَّبُعُ إِلَّا مَا سَكَّيْتُمْ وَمَا سُبْهَ عَلَى النُّزُبْ تَسْلِمُوا بِالْأَزْلَامْ مَتْ مَتْ وَمَا أَكَلَ السَّبُعُ مَا سَكَّيْتُمْ مَا سُبْحَ عَلَى النُّصُبْ تَسْلِمُوا بِالْأَزْ مَا أَكَلَ سَبُعُ إِلَّا مَا سَكَّيْتُمْ وَمَا سُبْهَ عَلَى النُّزُبْ لَى اللَّهِ أَنْ تَسْلِمُوا بِالْأَزْلَامْ

Top word confusions (normalised — diacritic-collapsed)

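Each row is a word pair that annotators disagree on. The top four pairs (ذالك/ذلك, هاذا/هذا, الرحمان/الرحمن, ولاكن/ولكن) and several lower rows are Rule-1 silent-alif spellings. Most of the rest are consonant swaps (ذ↔ز, ث↔س, ض↔ظ) or added and dropped letters (ان/انا, رب/ربي, ما/وما). arabic_encode collapses ة↔ه, ى↔ي, and hamza variants here, so disagreements on those letters are invisible in this table and only appear in the next one.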

word A	word B	count
ذالك	ذلك	591
هاذا	هذا	525
الرحمان	الرحمن	409
ولاكن	ولكن	405
الذين	الزين	268
كذالك	كذلك	201
الاه	اله	197
الاءك	اولءك	190
ان	انا	145
الدين	الذين	138
الذي	الزي	136
الارض	الارظ	135
الذي	الذين	126
اذا	ازا	121
ثم	سم	109
هاذه	هذه	100
رب	ربي	99
وكذالك	وكذلك	97
ما	وما	92
ذالك	زالك	86

Top word confusions (diacritic-stripped — consonants preserved)

Diacritics are removed, but ة, ه, ى, ي, and the hamza family are kept distinct. This surfaces pausal taa-marbuta (ة↔ه), alif-maksura (ى↔ي), and hamza-seat disagreements that the fully normalised table hides.

word A	word B	count
ذالك	ذلك	592
هاذا	هذا	525
الرحمان	الرحمن	410
ولاكن	ولكن	405
الذين	الزين	264
كذالك	كذلك	201
إلاه	إله	196
ألائك	أولئك	190
الدين	الذين	138
الأرض	الأرظ	135
الذي	الزي	135
إن	إنا	132
الذي	الذين	123
إذا	إزا	121
ثم	سم	106
هاذه	هذه	100
رب	ربي	99
وكذالك	وكذلك	96
ما	وما	92
ذالك	زالك	86

Top word confusions (raw — every diacritic counts)

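The same comparison with every diacritic kept, which is useful for sanity-checking which haraka pattern is in play. Pairs that differ only in harakat also surface here and are folded together in the tables above, e.g. اللَّهُ / اللَّهْ (case ending vs pausal sukun) and وَهُوَ / وَهْوَ.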

word A (raw)	word B (raw)	count
ذَالِكَ	ذَلِكَ	556
هَاذَا	هَذَا	525
الرَّحْمَانِ	الرَّحْمَنِ	316
اللَّهُ	اللَّهْ	304
اللَّهَ	اللَّهْ	291
اللَّهِ	اللَّهْ	273
الَّذِينَ	الَّزِينَ	258
وَلَاكِنْ	وَلَكِنْ	228
كَذَالِكَ	كَذَلِكَ	193
أُلَائِكَ	أُولَئِكَ	186
وَلَاكِنَّ	وَلَكِنَّ	156
الَّذِي	الَّزِي	134
الَّدِينَ	الَّذِينَ	132
وَهُوَ	وَهْوَ	126
إِذَا	إِزَا	115
الْحَقّْ	الْحَقْ	111
إِنَّ	إِنَّا	110
إِلَاهَ	إِلَهَ	103
ثُمَّ	سُمَّ	102
هَاذِهِ	هَذِهِ	99
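The three tables differ only in how much of each word is folded before two transcriptions are compared. A minimal sketch of the three tiers follows. It assumes "diacritics" means the Arabic tanween/harakat/shadda/sukun range U+064B–U+0652, and it approximates arabic_encode's folding with a small hypothetical table (ة→ه, ى→ي, hamza seats folded to bare alif or ء). That table only reproduces the folds visible above; the real arabic_encode may do more.

```python
# Three comparison tiers used by the word-confusion tables (illustrative sketch).
# Assumption: "diacritics" = Arabic tanween, harakat, shadda and sukun, U+064B..U+0652.
HARAKAT = frozenset(chr(cp) for cp in range(0x064B, 0x0653))

# Hypothetical stand-in for arabic_encode's letter folding; the real rule set may differ.
FOLD = str.maketrans({
    "\u0629": "\u0647",  # ة taa marbuta      -> ه haa
    "\u0649": "\u064A",  # ى alif maksura     -> ي yaa
    "\u0623": "\u0627",  # أ hamza above alif -> ا bare alif
    "\u0625": "\u0627",  # إ hamza below alif -> ا bare alif
    "\u0622": "\u0627",  # آ alif madda       -> ا bare alif
    "\u0624": "\u0621",  # ؤ hamza on waw     -> ء standalone hamza
    "\u0626": "\u0621",  # ئ hamza on yaa     -> ء standalone hamza
})


def raw(word: str) -> str:
    """Raw tier: every diacritic counts."""
    return word


def strip_diacritics(word: str) -> str:
    """Stripped tier: drop diacritics, keep ة/ه, ى/ي and hamza seats distinct."""
    return "".join(ch for ch in word if ch not in HARAKAT)


def normalise(word: str) -> str:
    """Normalised tier: strip diacritics, then fold ة/ه, ى/ي and hamza variants."""
    return strip_diacritics(word).translate(FOLD)


def disagree(a: str, b: str, tier=normalise) -> bool:
    """True if two transcriptions of a word still differ under the given tier."""
    return tier(a) != tier(b)
```

For example, `disagree("اللَّهُ", "اللَّهْ", tier=raw)` is True while `tier=strip_diacritics` gives False, which is why that pair only appears in the raw table. `normalise("أُولَئِكَ")` returns "اولءك", the form shown in the normalised table.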

Top character confusions

Single-character substitutions counted across all pair-edits. The top rows are diacritic-mark swaps (fatha ↔ sukun, damma ↔ sukun, fatha ↔ damma, …), and these make up the bulk of the diacritic-noise wedge above. The first consonant swap, ذ ↔ ز, only comes in at rank 6.

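char A	char B	count
◌َ (fatha)	◌ْ (sukun)	7291
◌ُ (damma)	◌ْ (sukun)	4450
◌َ (fatha)	◌ُ (damma)	4234
◌َ (fatha)	◌ِ (kasra)	3980
◌ِ (kasra)	◌ْ (sukun)	3451
ذ	ز	2368
◌ُ (damma)	◌ِ (kasra)	2109
◌ّ (shadda)	◌ْ (sukun)	1498
ة	ه	1455
د	ذ	1259
ح	ه	1143
ض	ظ	1133
◌ً (fathatan)	◌َ (fatha)	1085
ث	س	1031
س	ص	959
ت	د	875
ق	ك	814
أ	ع	752
ل	ن	725
أ	ا	694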
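One way to count single-character substitutions is sketched below. It assumes each disagreement arrives as a pair of raw strings and uses Python's difflib.SequenceMatcher as the aligner; the production aligner is not described in this report. Only one-for-one `replace` opcodes are counted, and each pair is stored in sorted order, so fatha → sukun and sukun → fatha land in the same bucket.

```python
from collections import Counter
from difflib import SequenceMatcher


def count_char_substitutions(pairs):
    """Count unordered single-character substitutions across (text_a, text_b) pairs."""
    counts = Counter()
    for a, b in pairs:
        # autojunk=False: never treat frequent characters (e.g. fatha) as junk on long inputs.
        matcher = SequenceMatcher(None, a, b, autojunk=False)
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            # Keep only 1-for-1 replacements; insertions, deletions and longer blocks are skipped.
            if op == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
                counts[tuple(sorted((a[i1], b[j1])))] += 1
    return counts
```

Skipping multi-character replace blocks keeps the count conservative. A production version might instead split equal-length blocks position by position.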

3. Annotator rule violations

What this answers: Which annotators produce the most silent-alif and hasMistakes-flip violations, and how much does review catch?

Findings

Sherif Bakry has the worst silent-alif rate: 0.97 per 100 words among annotators with at least 200 annotations.
hany saied has the highest hasMistakes=True/text-match rate: 41.9%. The top offenders cluster tightly, which points to SOP interpretation rather than one isolated person.
Review catch rate ranges from 2% to 100% in the cleaned chart. 6 annotators had negative catch values in the raw table; those are denominator artifacts from comparing pre-review and accepted-only word pools, so the chart drops them.

Question this plot answers: Which high-volume annotators produce silent-alif violations most densely, after normalising by word count?

Observation: This is a density metric (violations per 100 words) restricted to annotators with at least 200 annotations. That makes it the right chart for coaching producers, and it does not simply reward low-volume annotators.

Question this plot answers: Who most often marks hasMistakes=True even when their submitted text matches the reference?

Caveat: A high value means the annotator marked hasMistakes=True while the text still matched the reference. That is usually a flag-rubric problem, not necessarily bad transcription.

Question this plot answers: For annotators with enough silent-alif evidence, how much does review reduce the violation rate before final acceptance?

Caveat: Review-catch is a rate comparison, not a direct count of caught rows. Negative raw values are possible when accepted rows have a different word denominator; those rows are excluded here for manager readability.
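The rate comparison can be sketched as follows. The report does not give the exact formula. This sketch assumes catch = 1 − (accepted-text rate ÷ pre-review rate), with both rates expressed per 100 words, and the helper names are hypothetical. It shows how a raw value goes negative when the accepted pool has a different word denominator.

```python
def per_100_words(violations: int, words: int) -> float:
    """Silent-alif density: violations per 100 words."""
    return 100.0 * violations / words


def review_catch(pre_violations, pre_words, acc_violations, acc_words):
    """Assumed definition: share of the pre-review violation rate removed by review.

    The accepted pool has its own word denominator, so its rate can exceed the
    pre-review rate, which makes this value negative.
    """
    pre_rate = per_100_words(pre_violations, pre_words)
    if pre_rate == 0:
        return None  # nothing to catch
    return 1.0 - per_100_words(acc_violations, acc_words) / pre_rate


def chart_rows(catch_by_annotator):
    """Drop missing and negative catch values, as the cleaned chart does."""
    return {name: c for name, c in catch_by_annotator.items() if c is not None and c >= 0}
```

For example, 10 violations in 1,000 pre-review words and 2 in 500 accepted words give a catch of 0.6. Two in 1,000 and 3 in 500 give a negative value, which the chart drops.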

4. Inter-annotator agreement (κ)

What this answers: When two annotators see the same audio, how often do they agree? Where do they systematically diverge?

metric	κ	note
Cohen κ, hasMistakes	0.819	two-rater, pair-pooled
Fleiss κ, hasMistakes	0.818	63,671 tasks
κ, word-level mistake	0.911	closest to model supervision
κ, phoneme-level	0.879	

What κ means and why we use it

Raw "% agreement" is misleading on imbalanced labels. If 90% of tasks have no mistake, two raters who each flag a random 10% of tasks will agree on about 82% of tasks by chance alone (0.9 × 0.9 + 0.1 × 0.1), without either of them actually listening to the audio. Cohen's κ and Fleiss' κ correct for chance agreement: they answer "how much better than chance, given how often each label is used, do the raters agree?". The formula is κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance.
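A minimal, dependency-free sketch of Cohen's κ for two raters who labelled the same tasks follows. It mirrors the formula above and is not the report's production code. The worked example uses a 90/10 base rate: raw agreement is 80%, but chance alone predicts 0.9² + 0.1² = 82%, so κ comes out slightly negative.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's κ for two raters who labelled the same tasks in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # p_o: observed agreement, the share of tasks where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: chance agreement from each rater's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1:
        return float("nan")  # both raters used one identical label: κ is undefined
    return (p_o - p_e) / (1 - p_e)


# 10 tasks, 9 "no mistake" (0) per rater; each rater flags a different task.
rater_a = [0] * 9 + [1]
rater_b = [0] * 8 + [1, 0]
print(cohen_kappa(rater_a, rater_b))  # 80% raw agreement, κ = -1/9 ≈ -0.111
```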

Cohen's κ: pairwise (two raters at a time). We use it to score every annotator pair that shares at least one task. This produces the heatmap and the worst-pair leaderboard, which are granular and name specific pairs.
Fleiss' κ: extends chance-corrected agreement to more than two raters. When three or more raters touched the same task, it cannot be reduced to a single pair, so Fleiss aggregates across the full rater pool. We report it for the team as a whole over the 63,671 tasks that have multiple annotators. Here most tasks have exactly 2 raters, where Fleiss' κ reduces to Scott's π. It differs from Cohen's κ only in estimating chance agreement from pooled rather than per-rater label frequencies, so the two land very close (0.819 vs 0.818).
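Fleiss' κ can be sketched from a task × label count table. The sketch assumes every task has the same number of raters (the classic formulation); tasks with varying rater counts would need a generalised estimator. With two raters who each flag the same share of tasks, it returns the same value as Cohen's κ on that data. In the example below, both raters flag 10% of tasks and the result is -1/9.

```python
def fleiss_kappa(counts):
    """Fleiss' κ from counts[i][j] = number of raters who gave task i label j.

    Assumes every task was rated by the same number of raters (>= 2).
    """
    if not counts:
        raise ValueError("need at least one task")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("need a constant rater count >= 2 on every task")
    n_tasks, n_labels = len(counts), len(counts[0])
    # P_i: share of agreeing rater pairs on task i.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_i) / n_tasks
    # p_j: pooled share of all ratings that used label j (shared across raters).
    p_j = [sum(row[j] for row in counts) / (n_tasks * n_raters) for j in range(n_labels)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        return float("nan")  # every rating used one label: κ is undefined
    return (p_bar - p_e) / (1 - p_e)


# Two raters per task on hasMistakes (columns: no mistake, mistake).
tasks = [[2, 0]] * 8 + [[1, 1], [1, 1]]
print(fleiss_kappa(tasks))  # ≈ -0.111
```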

Beyond hasMistakes (binary), we also report word-level κ (every word labelled as mistake-or-not — closest to the model supervision signal) and phoneme-level κ (after aligning each word's phonemes).

In §7, median_word_kappa is simply the median of each cohort's per-annotator word_kappa_mean. In plain English: for a volume cohort, it asks "what is the typical annotator's chance-corrected agreement with peers on which words are mistakes?" It is a cohort baseline, not an individual score.

How to read the numbers (Landis & Koch convention):

κ range	Agreement
0.00 – 0.20	Slight
0.21 – 0.40	Fair
0.41 – 0.60	Moderate
0.61 – 0.80	Substantial
0.81 – 1.00	Almost perfect ← where we are

Findings

Team-wide Cohen κ on hasMistakes = 0.819, in the "almost perfect" band of the Landis & Koch scale (>0.80 is the production-quality threshold). Fleiss κ over 63,671 tasks = 0.818.
Word-level mistake κ = 0.911 — excellent. This is the metric closest to the model's supervision signal.
Text agreement: 62.5% exact, 69.1% Imlaei-normalised, 78.6% diacritic-normalised. So about 16 pp of the disagreement (6.6 pp from Imlaei spelling alone) is normalisation noise (spelling convention, diacritics, spacing), not real annotation divergence.
Top phoneme confusion is ةْ ↔ هْ (16,095 times): the pausal taa-marbuta ambiguity (ة vs ه on a stopped ending) dominates. Next: ا ↔ نْ (6,092), likely pausal alif vs pronounced tanween, and ذ ↔ ز (~5,400 across kasra and fatha forms).

Question this plot answers: How much text agreement do we recover as we apply stronger normalisation?

Observation: The widening gap from exact to diacritic-normalised agreement quantifies normalisation noise. The diacritic-normalised value is the best proxy for semantic transcription agreement.

Question this plot answers: Which phoneme-level substitutions recur most often between annotators?

Observation: The phoneme chart surfaces recurring sound/spelling ambiguities. High counts here can come from many tiny endings, so use examples before writing a rule change.

Question this plot answers: Which annotator pairs have the weakest word-level agreement after requiring enough shared tasks?

Caveat: Low pairwise κ identifies annotator pairs that disagree; it does not prove which annotator is wrong. Use it as a sampling list for audio review.

5. Audio duration per batch

What this answers: How does per-task audio length distribute, and how much sits above the 20 s training cap?

Findings

Batch 34 is cleanly under cap: 121,733 tasks · median 14.4 s · p95 19.4 s · 0 tasks over the 20 s training cap.
Batch 35 has 72.1% of its tasks over the cap: 49,984 tasks · median 25.8 s · p95 38.2 s · 36,033 tasks over 20 s. Either the batch needs trimming or the training-side cap needs to be lifted.

Question this plot answers: What is the audio-length distribution, and how many tasks cross the 20 s training cap?

Observation: The 20 s line is the training-cap boundary, not an annotation-quality threshold. Batch 35 contributes the most over-cap clips: 36,033 tasks (72.1%).

Question this plot answers: How do the median, spread, and tail length compare between batches?

Caveat: The histogram is capped at 60 s for readability. The summary table is the safer source for cap-policy decisions because it keeps the full per-task counts.

Per-batch summary

batch	n_tasks	median_s	p95_s	over_20s	over_20s_pct
Batch 34	121733	14.380	19.380	0	0.0%
Batch 35	49984	25.800	38.180	36033	72.1%
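The per-batch summary can be reproduced from per-task durations with the standard library, as sketched below. Two assumptions: "over cap" means strictly longer than 20 s, and p95 uses linear interpolation between order statistics (`statistics.quantiles` with `method="inclusive"`, the same convention as NumPy's default percentile). The report does not state which percentile method it used.

```python
from statistics import median, quantiles

TRAINING_CAP_S = 20.0  # training-side clip-length cap, in seconds


def batch_summary(durations_s, cap=TRAINING_CAP_S):
    """Summarise one batch's per-task audio durations (seconds); needs >= 2 tasks."""
    over = sum(d > cap for d in durations_s)  # strictly longer than the cap
    return {
        "n_tasks": len(durations_s),
        "median_s": median(durations_s),
        # 99 cut points; index 94 is the 95th percentile (linear interpolation).
        "p95_s": quantiles(durations_s, n=100, method="inclusive")[94],
        "over_20s": over,
        "over_20s_pct": 100.0 * over / len(durations_s),
    }
```

Applied to the Batch 35 counts, 100 × 36,033 / 49,984 rounds to the 72.1% shown in the table.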

6. Task lifecycle — where does time go?

What this answers: How long does a task spend in each pipeline stage, and where is the bottleneck?

How each metric is calculated

Producer: jobs/lifecycle/build.py → basirah.lifecycle.build_task_lifecycle. Two independent clocks per task: wall-clock timestamps (*_at) and front-end UI-active milliseconds (lead_time_ms).

metric	source	formula
`wait_for_annotator_s`	wall-clock	`first_annotation.annotation_created_at − task.task_created_at`
`annotation_lead_time_total_s`	UI-active	`sum(annotation.lead_time_ms) / 1000` over all annotations on the task (resubmissions counted)
`wait_between_annotation_and_review_s`	wall-clock (inferred)	`review_start_proxy − submission_proxy`, where `submission_proxy = first_annotation.annotation_created_at + first_annotation.lead_time_ms` and `review_start_proxy = first_review.created_at − first_review.lead_time_ms`. Negative values dropped to NA.
`review_lead_time_total_s`	UI-active	`sum(review.lead_time_ms) / 1000` over all review events on the task
`idle_time_s`	derived	`total_time_to_accept_s − (annotation_lead_time_total_s + review_lead_time_total_s)` — wall-clock minus active UI time. Negatives dropped to NA.
`total_time_to_accept_s`	wall-clock	`max(annotation.accepted_at) − task.task_created_at`. Unresolved tasks (`is_resolved = False`) are excluded from the plots below.

annotation.updated_at is not used as a submission timestamp — review events and draft saves both bump it, so it isn't a reliable proxy for when the annotator stopped editing. lead_time_ms is the only honest UI-active duration in the schema.
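The two derived queue metrics can be sketched as plain functions. The sketch assumes timezone-aware datetime values for the *_at columns and integer milliseconds for lead_time_ms. It mirrors the formulas in the table, but it is not the code of basirah.lifecycle.build_task_lifecycle, and None stands in for NA.

```python
from datetime import datetime, timedelta, timezone


def wait_between_annotation_and_review_s(ann_created_at, ann_lead_time_ms,
                                         review_created_at, review_lead_time_ms):
    """Inferred queue time (s) from annotation submission to the start of the first review."""
    # submission_proxy = first_annotation.annotation_created_at + first_annotation.lead_time_ms
    submission_proxy = ann_created_at + timedelta(milliseconds=ann_lead_time_ms)
    # review_start_proxy = first_review.created_at - first_review.lead_time_ms
    review_start_proxy = review_created_at - timedelta(milliseconds=review_lead_time_ms)
    wait_s = (review_start_proxy - submission_proxy).total_seconds()
    return wait_s if wait_s >= 0 else None  # negative values dropped to NA


def idle_time_s(total_time_to_accept_s, annotation_lead_ms, review_lead_ms):
    """Wall-clock time to accept minus all UI-active time (every annotation and review)."""
    active_s = (sum(annotation_lead_ms) + sum(review_lead_ms)) / 1000
    idle_s = total_time_to_accept_s - active_s
    return idle_s if idle_s >= 0 else None  # negative values dropped to NA


# Example: 1 min of annotating, then a review event 1 h later that took 2 min of UI time.
t0 = datetime(2026, 1, 12, 9, 0, tzinfo=timezone.utc)
print(wait_between_annotation_and_review_s(t0, 60_000, t0 + timedelta(hours=1), 120_000))  # 3420.0
```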

Findings

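Batch 34 takes 77.1 days median end-to-end; Batch 35 takes 21.8 days median, so Batch 35 is 3.5× faster. Almost all of the difference is queue time:
The clearest bottleneck is the wait between annotation submission and review. Batch 34: 12.3 days median; Batch 35: 9 minutes median. That is a ~2000× difference, the largest relative gap of any stage. Reviewers on Batch 35 stay on top of the queue; reviewers on Batch 34 don't. The wait for an annotator also differs (45.8 d vs 21.3 d median) and accounts for more absolute days.
Hands-on work is only minutes per task: about 3 minutes overall (1.4 min annotation + 1.5 min review), with per-batch medians of 1.2 + 1.0 min on Batch 34 and 1.8 + 2.3 min on Batch 35. Everything else in the 21 to 77 day window is queue / idle time.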
60% of resolved tasks needed at least one sent-back round; 12% needed two or more.

Median per phase, per batch

phase	batch	n	median	p25	p75	p95
Wait for annotator (queue)	Batch 34	53743	45.8 d	36.5 d	74.0 d	81.4 d
Wait for annotator (queue)	Batch 35	27479	21.3 d	14.5 d	28.0 d	32.9 d
Annotation hands-on (UI)	Batch 34	53743	1.2 min	37.2 s	2.1 min	4.5 min
Annotation hands-on (UI)	Batch 35	27479	1.8 min	1.1 min	2.9 min	5.8 min
Wait between annotation & review (queue)	Batch 34	53504	12.3 d	3.9 d	24.9 d	55.5 d
Wait between annotation & review (queue)	Batch 35	26226	9.0 min	3.8 min	33.5 min	11.1 h
Review hands-on (UI)	Batch 34	53743	1.0 min	23.5 s	2.9 min	9.7 min
Review hands-on (UI)	Batch 35	27479	2.3 min	1.3 min	4.4 min	10.9 min
Idle (total − all hands-on)	Batch 34	53743	77.1 d	42.6 d	94.8 d	126.0 d
Idle (total − all hands-on)	Batch 35	27479	21.8 d	14.9 d	28.5 d	33.1 d
Total elapsed	Batch 34	53743	77.1 d	42.6 d	94.8 d	126.0 d
Total elapsed	Batch 35	27479	21.8 d	14.9 d	28.5 d	33.1 d

Distributions (one panel per metric × batch)

Question this plot answers: For each lifecycle phase, where does time concentrate and which batch has the longer queue/tail?

Reading guide: Each lifecycle histogram has a dashed red median line. Long right tails are clipped for display only, so the median line is the stable reference point.

Question this plot answers: How often did a task bounce back to annotation before it was finally accepted, and is that different by batch?

How it was made: Same logic as the Streamlit lifecycle page: one row per resolved task from reports/lifecycle/task_lifecycle.parquet, using n_sent_back_max as the number of sent-back review rounds. Counts are grouped by batch and clipped visually at 8 rounds.

Observation: The sent-back chart counts review loops per resolved task. A rate above zero is normal; multiple rounds are the expensive cases because they add reviewer events and queue time.
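The chart's counting logic can be sketched as below. The sketch assumes resolved tasks arrive as (batch, n_sent_back_max) pairs, for example read from reports/lifecycle/task_lifecycle.parquet, and that "clipped at 8 rounds" means tasks with 8 or more rounds share the last bin. The helper names are hypothetical. The second function gives the kind of share quoted in the findings (at least one round, two or more rounds).

```python
from collections import Counter, defaultdict


def sent_back_histogram(resolved_tasks, clip=8):
    """Per-batch counts of n_sent_back_max; rounds >= clip share the last bin (assumed)."""
    hist = defaultdict(Counter)
    for batch, n_sent_back_max in resolved_tasks:
        hist[batch][min(n_sent_back_max, clip)] += 1
    return hist


def share_with_at_least(resolved_tasks, k):
    """Fraction of resolved tasks that needed at least k sent-back rounds."""
    rounds = [n for _, n in resolved_tasks]
    return sum(n >= k for n in rounds) / len(rounds)
```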

7. Annotator scorecard, volume-normalised

What this answers: Comparing annotators fairly when some produced 50× more output than others — who's actually worse than their volume cohort?

Findings

Cohort medians: heavy producers (n=24, median 6,677 annotations) accept rate 49.9%; medium (22, 2,442) 57.6%; light (24, 883) 55.7%; low_n (13, <50) excluded from comparisons.
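Heavy producers have the lowest median accept rate (49.9%), the volume-vs-quality trade-off made visible by normalising for cohort. Their median sent_back_rate (0.382) is also the lowest of the three compared cohorts, so the lower accept rate is not explained by more review bounce-backs alone.
Worst relative to cohort: Saleh Diaa Ahmed (light), Ahmed Khalifa (light), Marwan (medium), Gaber Alshykh (heavy), Mohamed Abdelghany (heavy). Their mean_worse_z ranges from 0.97 to 1.18, i.e. on average about one within-cohort standard deviation worse than their peers across the rate metrics.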
sent_back_rate can be larger than 1. It is n_sent_back_total / n_total, where n_sent_back_total counts review events, not unique annotations. One annotation can be sent back multiple times, so this is "sent-back events per annotation", not a probability. In this dataset, 1 annotator has sent_back_rate > 1: Saleh Diaa Ahmed (62 sent-back events / 61 annotations = 1.016).
median_word_kappa is the cohort median of individual word_kappa_mean: the chance-corrected agreement of the cohort's typical annotator with peers on which aligned words are mistakes. Seven annotators have no κ because they lack enough paired-overlap rows in the IAA table.

How to read this scorecard

Do not read sent_back_rate as "% of annotations sent back". Read it as workload-normalised rework pressure: sent-back review events divided by total annotations. Values above 1 mean the annotator averaged more than one sent-back event per annotation, usually because some annotations cycled through review multiple times.

mean_worse_z averages signed within-cohort z-scores. Positive means worse than peers at similar volume; negative means better. This is the fair leaderboard because heavy producers naturally accumulate more raw mistakes.
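A sketch of the scorecard arithmetic follows. Three details are assumptions: the metric set, the "worse" direction of each metric, and the use of the population standard deviation. The report only says that mean_worse_z averages signed within-cohort z-scores. An annotator with a missing κ simply contributes fewer z-scores.

```python
from statistics import mean, median, pstdev

# Assumed metric set and direction: +1 if a higher value is worse, -1 if lower is worse.
WORSE_DIRECTION = {
    "accept_rate": -1,
    "sent_back_rate": +1,
    "rate_silent_alif_per_100w": +1,
    "word_kappa_mean": -1,
}


def sent_back_rate(n_sent_back_total: int, n_total: int) -> float:
    """Sent-back review events per annotation; can exceed 1."""
    return n_sent_back_total / n_total


def mean_worse_z(cohort_rows):
    """Average signed within-cohort z-score per annotator (positive = worse than peers).

    cohort_rows: list of dicts for ONE volume cohort; a metric may be None (e.g. missing κ).
    """
    signed = {r["annotator_name"]: [] for r in cohort_rows}
    for metric, sign in WORSE_DIRECTION.items():
        present = [r for r in cohort_rows if r.get(metric) is not None]
        if len(present) < 2:
            continue
        values = [r[metric] for r in present]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # no spread in this cohort, so no signal
        for r in present:
            signed[r["annotator_name"]].append(sign * (r[metric] - mu) / sd)
    return {name: mean(zs) for name, zs in signed.items() if zs}


def cohort_median_kappa(cohort_rows):
    """median_word_kappa: median of per-annotator word_kappa_mean, skipping missing κ."""
    kappas = [r["word_kappa_mean"] for r in cohort_rows if r.get("word_kappa_mean") is not None]
    return median(kappas) if kappas else None
```

For instance, `sent_back_rate(62, 61)` is about 1.016, matching the value quoted for Saleh Diaa Ahmed.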

Cohort summary

cohort	n	median_n_total	median_accept_rate	median_sent_back_rate	median_word_kappa
heavy	24	6676.5	0.499	0.382	0.912
medium	22	2442	0.576	0.447	0.898
light	24	882.5	0.557	0.455	0.907
low_n	13	10	0.568	0.100	0.814

Observation: Cohorts are volume buckets, not quality labels. Compare an annotator to their cohort first; only then compare across cohorts.

Question this plot answers: Does annotation volume correlate with accept rate, and which cohort does each annotator belong to?

Caveat: The x-axis is log-scaled. Low-volume outliers can look visually close to heavy producers, but their rates are much less stable.

15 worst annotators relative to their cohort

annotator_name	cohort	n_total	accept_rate	sent_back_rate	rate_silent_alif_per_100w	word_kappa_mean	mean_worse_z
Saleh Diaa Ahmed	light	61	0.623	1.016	0.104	0.667	1.181
Ahmed Khalifa	light	192	0.750	0.599	0.041	0.800	1.059
Marwan	medium	2926	0.605	0.309	0.040	0.817	1.033
Gaber Alshykh	heavy	7830	0.533	0.272	0.049	0.828	1.020
Mohamed Abdelghany	heavy	4973	0.548	0.266	0.043	0.814	0.969
Abdullah Mohamed Samir	medium	2002	0.584	0.463	0.064	0.839	0.857
Sherif Bakry	heavy	4927	0.039	0.048	0.974	0.864	0.830
عبدالله صلاح العيسوي	light	1402	0.380	0.322	0.150	0.791	0.783
Mahmoud Elsaey	medium	1940	0.412	0.284	0.109	0.812	0.697
Basma Mohammad	light	1263	0.781	0.678	0.006	0.828	0.648
مودة جمال	light	203	0.478	0.458	0.342	0.763	0.643
Ibrahim zaid	light	1542	0.423	0.309	0.102	0.788	0.606
Aya Mostafa	heavy	7811	0.382	0.251	0.051	0.859	0.491
ghada ahmed	medium	2674	0.427	0.255	0.035	0.820	0.485
ياسر ربيع	heavy	5004	0.472	0.447	0.286	0.872	0.450

Observation: This table intentionally excludes low_n annotators. The listed people are not necessarily the worst raw counts; they are worst relative to peers with similar annotation volume.

8. Silent-alif forensics

What this answers: Where does the Uthmani-form violation enter the pipeline, and when?

stage	has silent alif (SA)	share
preannotation seed	56,096	21.1%
submitted annotation	7,272	2.7%
accepted annotation	507	0.2%
annotator removed (seed → submission cleanup)	49,598	

Findings

56,096 preannotations carry the silent-alif violation → 7,272 annotations submitted with it → 507 acceptances leak it through. Annotators clean up 49,598 (seed dirty → submission clean); only 39 introduce it themselves.
The violation has been in the preannotation seed since the very first task (2026-01-12), and no fix was ever in place. It originates in tdreeb's create_tasks pipeline, which writes referenceText into Tawseem without running an Imlaei normalisation.
The seed got worse between the Jan-12 and Apr-13 task batches: the preannotation violation rate rose from 17.4% to 28.0%. Over the same period the accepted-leak rate fell (0.41% → 0.29%) because annotators got better at cleaning up, which masked the seed regression.
Sherif Bakry, as reviewer, let through 182 leaked acceptances, 36% of all leaks. He is a high-volume contributor on both sides: 4,927 annotations (heavy cohort, rank 19/83) and 27,339 reviews (3rd-most-active reviewer). He also has the worst silent-alif producer rate in the team (0.97/100w, see §3). The combined producer + reviewer pattern at this volume is the single biggest signal in the report.
Top leaking annotators: جويرية جمال الليثي (48), عصام عبد الحميد عبد العزيز (33), Yasser Waled (31), Mariam Khaled (26), Fares Moustafa (21).
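The funnel terms can be made concrete with a small counter over per-annotation flags. The sketch assumes each row carries three booleans: "preannotation has SA", "submitted text has SA", and "was accepted". It also treats an accepted leak as an accepted row whose submitted text has SA, ignoring any reviewer edits. The category names are hypothetical labels for this sketch.

```python
from collections import Counter


def funnel(rows):
    """rows: iterable of (seed_has_sa, submitted_has_sa, accepted) booleans per annotation."""
    c = Counter()
    for seed, submitted, accepted in rows:
        c["seed_has_sa"] += seed
        c["submitted_has_sa"] += submitted
        c["accepted_has_sa"] += submitted and accepted  # accepted leak
        if seed and not submitted:
            c["annotator_removed"] += 1     # seed dirty -> submission clean
        elif submitted and not seed:
            c["annotator_introduced"] += 1  # seed clean -> submission dirty
        elif seed and submitted:
            c["carried_through"] += 1       # seed dirty -> submission still dirty
    return c
```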

Question this plot answers: At which pipeline stage does the silent-alif violation enter, get fixed, or leak into accepted text?

Observation: The funnel shows the violation shrinking sharply from preannotation to submission, which means annotators are mostly cleaning an upstream seed problem. The remaining accepted leaks are reviewer enforcement failures.

Question this plot answers: Which annotators account for the most accepted silent-alif leaks by raw count?

Caveat: Accepted-leak counts are volume-sensitive. Use this chart to pick examples, then pair it with per-100-word rates before coaching.

Question this plot answers: Which reviewers accepted the most silent-alif leaks by raw count?

Observation: Reviewer leak counts identify where final acceptance allowed a known violation through. This is an enforcement signal, not a producer-rate signal.

Recommended actions

Fix the silent-alif seed in tdreeb. Add an Imlaei normalisation step in tdreeb/jobs/create_tasks/pipeline before referenceText is written into Tawseem. The rule table in basirah/src/rule_violations.py is portable.
1:1 with Sherif Bakry on silent-alif — heavy contributor on both sides (4.9k annotations + 27.3k reviews), highest annotator production rate of the violation (0.97/100w), and lets through 36% of all accepted leaks as reviewer. Walk through 5 examples from reports/audit/silent_alif_per_annotation.parquet filtered to his rows.
Split annotator_focus_issue into granular RCA buckets. It accounts for 28% of all annotations and 3× the next RCA, so it is too broad to drive targeted fixes. Replace it with specific sub-causes such as missed word, wrong passage, skipped segment, uncertainty handling, speed slip, or UI/workflow issue.
Triage the Batch 34 review queue. The 12.3-day median between submission and review is the stage with the largest relative gap between batches (~2000×) and the biggest lever reviewers control for shortening end-to-end time. Staff or re-route reviewers.
Decide the Batch 35 over-cap policy. With 72% of its clips over 20 s, the batch is largely unusable for training as-is. Either re-segment those clips before annotation, or lift the training cap and accept the memory cost.
Coach the worst-relative-to-cohort annotators (Saleh Diaa Ahmed, Ahmed Khalifa, Marwan, Gaber Alshykh, Mohamed Abdelghany). Each averages roughly one standard deviation worse than their volume peers across the rate metrics (mean_worse_z 0.97 to 1.18).
Clarify the hasMistakes flag SOP. Top-5 annotators all sit at ~41% "hasMistakes=True but text matches" — that's an interpretation problem, not a personal one.
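A minimal sketch of the proposed Imlaei step follows. It assumes a word-level rule table keyed on the diacritic-stripped skeleton. The rules are a hypothetical subset built from this report's top confusion pairs, not the contents of basirah/src/rule_violations.py, and the function names are illustrative. ألائك → أولئك is left out because it changes letters rather than only dropping an alif. Harakat on the remaining letters are preserved, so ذَالِكَ becomes ذَلِكَ.

```python
# Hypothetical Imlaei (standard-spelling) step for silent-alif words before referenceText is written.
HARAKAT = frozenset(chr(cp) for cp in range(0x064B, 0x0653))  # tanween, harakat, shadda, sukun

# Illustrative subset: Uthmani skeleton -> index (within the skeleton) of the silent alif to drop.
SILENT_ALIF_RULES = {
    "ذالك": 1, "كذالك": 2, "وكذالك": 3,
    "هاذا": 1, "هاذه": 1,
    "الرحمان": 5, "ولاكن": 2, "إلاه": 2,
}


def normalise_word(word: str) -> str:
    """Drop the silent alif from a diacritised word if its skeleton matches a rule."""
    letter_pos = [i for i, ch in enumerate(word) if ch not in HARAKAT]
    skeleton = "".join(word[i] for i in letter_pos)
    k = SILENT_ALIF_RULES.get(skeleton)
    if k is None:
        return word
    start = letter_pos[k]
    end = start + 1
    while end < len(word) and word[end] in HARAKAT:  # also drop any mark on the alif
        end += 1
    return word[:start] + word[end:]


def imlaei_normalise(text: str) -> str:
    """Word-by-word normalisation; note that split() collapses runs of whitespace."""
    return " ".join(normalise_word(w) for w in text.split())


def has_silent_alif(text: str) -> bool:
    """True if any word would be changed by the rule table."""
    return any(normalise_word(w) != w for w in text.split())
```

A rule keyed on skeletons only matches whole words. Prefixed forms (e.g. with بِ) would need extra entries or a prefix-stripping step.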