Annotation team report — Batch 34 + 35

Generated 2026-05-25 from the annotation-data refresh (annotation + review pull through audit + IAA + violations + lifecycle + silent-alif forensics).

Executive summary

1. Annotation & review audit

What this answers: What's the raw shape of the corpus? How much disagreement are reviewers having to resolve? Where in the RCA taxonomy is the team spending its energy?

annotations: 270,472 (all states)
review events: 296,700
distinct tasks: 171,717

Findings

Question this plot answers: What is the current state mix of all annotation rows?
Observation: The state mix shows corpus inventory, not final yield. Submitted and draft rows are still work-in-progress, while accepted rows are candidate ground truth.
Question this plot answers: How much review traffic turns into acceptance, rejection, escalation, or rework?
Observation: Sent-back is high at 40.7% of review events, so rework volume is a core operating cost. Because this is event-level, repeated rework on one annotation appears multiple times.
Question this plot answers: Which annotation RCA reasons dominate the audit taxonomy?
Recommendation: The RCA chart is useful for policy review, and the main action is taxonomy design: broad catch-all labels such as annotator_focus_issue should be split into more granular buckets. Otherwise one huge general tag hides multiple operational problems that need different fixes.
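
To illustrate why an event-level sent-back share can overstate how many annotations actually need rework, here is a minimal Python sketch on toy data. The event tuples, the outcome label "sent_back", and the function name are hypothetical and do not reflect the real review-event schema.

```python
from collections import Counter

def sent_back_rates(events):
    """events: iterable of (annotation_id, outcome) pairs, one per review event.

    Returns (event_level_rate, annotation_level_rate) for the 'sent_back' outcome.
    """
    events = list(events)
    outcome_counts = Counter(outcome for _, outcome in events)
    event_rate = outcome_counts["sent_back"] / len(events)

    all_ids = {ann_id for ann_id, _ in events}
    sent_back_ids = {ann_id for ann_id, outcome in events if outcome == "sent_back"}
    annotation_rate = len(sent_back_ids) / len(all_ids)
    return event_rate, annotation_rate

# Toy data: annotation 1 is sent back three times before acceptance,
# the other three annotations are accepted on first review.
toy_events = [
    (1, "sent_back"), (1, "sent_back"), (1, "sent_back"), (1, "accepted"),
    (2, "accepted"),
    (3, "accepted"),
    (4, "accepted"),
]
event_rate, annotation_rate = sent_back_rates(toy_events)
print(f"event-level: {event_rate:.1%}, annotation-level: {annotation_rate:.1%}")
```

In this toy case three of seven events are sent-back, but only one of four annotations was ever sent back, so the two rates answer different questions.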

2. Inter-annotator divergence — severity patterns

What this answers: When two annotators both annotated the same task, how often do they substantively disagree — versus disagree only on diacritic spelling? Where is the team's real disagreement (vs. normalisation noise)?

What "severity" means — exact thresholds

Every pair of annotators that touched the same task is scored by character-level edit distance between their submissions, ignoring whitespace. These are the exact absolute character-edit bins from src/basirah/divergence.py::_severity, applied independently to the normalised and raw views:

bin        condition             plain English
identical  char_edits == 0       no character edits in this view
minor      1 ≤ char_edits ≤ 3    tiny orthographic disagreement
moderate   4 ≤ char_edits ≤ 15   bounded text disagreement
grave      char_edits ≥ 16       large character-level disagreement

Word edits, WER, and CER are still computed and stored as diagnostics, but only absolute character edits decide the severity bucket. Severity is computed twice: once on normalised text (diacritics stripped, so only the consonant-level text is compared) and once on raw text (every character and diacritic counts). The gap between the two views separates core consonant-level disagreement from strict orthographic and diacritic correctness.
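
The binning described above can be sketched in a few lines of Python. This is an illustration only, not the code in src/basirah/divergence.py: the function names char_edit_distance and severity_bin are hypothetical, and plain Levenshtein distance is assumed as the character-edit measure.

```python
def char_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, ignoring all whitespace."""
    a = "".join(a.split())
    b = "".join(b.split())
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def severity_bin(char_edits: int) -> str:
    """Map an absolute character-edit count to the four bins listed above."""
    if char_edits == 0:
        return "identical"
    if char_edits <= 3:
        return "minor"
    if char_edits <= 15:
        return "moderate"
    return "grave"

# A short pair with one changed word: word-level error is high,
# but the bin depends only on the absolute character edits.
edits = char_edit_distance("kitten sat", "sitten sat")
print(edits, severity_bin(edits))
```

Applying the same two functions once to the normalised strings and once to the raw strings would give the two views discussed in this section.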

Findings

Question this plot answers: After diacritic normalisation, how severe are annotator-vs-annotator text disagreements, and which batch contributes them?
Observation: The normalised chart is the one to use for human disagreement: diacritics are stripped first, and the severity bucket comes from absolute character edits.
Question this plot answers: Under the strict quality bar, how much disagreement appears when every raw character and diacritic counts?
Strict-quality view: The raw chart is the right view when the deliverable requires correct diacritics and every character matters. The normalised chart answers “did they agree on the consonant-level text?”; the raw chart answers “did they submit exactly the same fully written text?”.

Severity totals (across all 98,283 pairs)

severity    normalised pairs    raw pairs    Δ (raw − norm)    norm %    raw %
identical   64,951              56,660       -8,291            66.1      57.6
minor       22,605              22,551       -54               23.0      22.9
moderate    9,052               15,249       +6,197            9.2       15.5
grave       1,675               3,823        +2,148            1.7       3.9
Definition check: Word edits, WER, and CER are diagnostic columns only. A short pair with one changed word no longer becomes grave just because its WER is high.
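
The Δ and percentage columns of the severity totals can be re-derived from the pair counts. The short sketch below uses only the counts reported in the table above; the variable names are illustrative.

```python
# Pair counts per severity bin, copied from the severity totals table.
norm = {"identical": 64951, "minor": 22605, "moderate": 9052, "grave": 1675}
raw = {"identical": 56660, "minor": 22551, "moderate": 15249, "grave": 3823}
total_pairs = 98283

# Both views must account for every pair exactly once.
assert sum(norm.values()) == total_pairs
assert sum(raw.values()) == total_pairs

derived = {}
for name in norm:
    derived[name] = {
        "delta": raw[name] - norm[name],
        "norm_pct": round(100 * norm[name] / total_pairs, 1),
        "raw_pct": round(100 * raw[name] / total_pairs, 1),
    }

for name, row in derived.items():
    print(f"{name:10s} {row['delta']:+6d} {row['norm_pct']:5.1f} {row['raw_pct']:5.1f}")
```

The deltas sum to zero because switching views only moves pairs between bins.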

Grave-pair root causes (enriched analysis)

From grave_pairs_enriched.parquet. The classifier looks at state, RCA reason, length asymmetry, hasMistakes disagreement, and normalised-token overlap to bucket each grave pair into one of five causes.

root cause                               technical label                         count    %
Should have been rejected                should_have_been_rejected               910      54.3
Hard audio / substantive disagreement    normal_hard_audio                       543      32.4
hasMistakes disagreement                 has_mistakes_disagreement               162      9.7
Wrong passage / submission mismatch      wrong_passage_or_submission_mismatch    36       2.1
One side truncated                       one_side_truncated                      24       1.4
Observation: The grave bucket is now much more manager-actionable, but it should not be read as “all hard audio.” A distinct slice is wrong-passage/submission mismatch: both sides are long enough, the normalised character edit distance is huge, and the two submissions share very little normalised vocabulary. Those cases are more consistent with a wrong-text submission, task/audio mapping issue, copy/paste or UI-selection error, or an early review miss. They should be caught upstream instead of treated as normal ambiguous recitation.
Heuristic: The wrong-passage/submission-mismatch bucket currently means: both sides have at least 8 words, char_edits_norm ≥ 50, and normalised token overlap over the shorter side is ≤20%. This is a triage signal, not proof of a UI bug.
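
The three conditions of this heuristic can be written as a small predicate. The sketch below is an assumption-laden illustration: the function names are hypothetical, tokens are taken to be whitespace-separated words of the normalised text, and "overlap over the shorter side" is read as the share of the shorter side's tokens that also occur on the other side. The real classifier may define overlap differently.

```python
def token_overlap_shorter(tokens_a, tokens_b):
    """Share of the shorter side's tokens that also appear on the longer side."""
    shorter, longer = sorted((tokens_a, tokens_b), key=len)
    if not shorter:
        return 0.0
    longer_set = set(longer)
    hits = sum(1 for tok in shorter if tok in longer_set)
    return hits / len(shorter)

def looks_like_wrong_passage(norm_a: str, norm_b: str, char_edits_norm: int) -> bool:
    """Triage flag: both sides >= 8 words, >= 50 normalised char edits, overlap <= 20%."""
    tokens_a, tokens_b = norm_a.split(), norm_b.split()
    return (
        min(len(tokens_a), len(tokens_b)) >= 8
        and char_edits_norm >= 50
        and token_overlap_shorter(tokens_a, tokens_b) <= 0.20
    )

# Toy inputs with Latin placeholders standing in for normalised Arabic tokens.
side_a = "a b c d e f g h i j"
side_b = "k l m n o p q r"
print(looks_like_wrong_passage(side_a, side_b, char_edits_norm=60))
```

As the heuristic note says, a True result is a triage signal for manual review, not proof of a mapping or UI fault.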

Examples (up to 5 per root cause)

Changed tokens are highlighted. Examples are sorted by largest normalised character edit distance and de-duplicated by task_id within each root cause. Full pairs are in reports/audit/grave_pairs_enriched.parquet.

Should have been rejected task 329533 · Batch 35 · 243 char edits · 63 word edits · token overlap 62%
A — Khaled Hussein (submitted)
لَا تُدْرِكُهُ الْأَبْصَارَ وَهُوَ يُدْرِكُ الْأَبْصَارَ وَهُوَ اللَّطِيفُ الْخَبِيرُ كُلَّ شَيْءٍ لَّطِيفُ الْخَبِيرْ قَدْ جَاءَكُمْ بَصَائِرُ مِنْ رَبِّكُمْ فَمَنْ أَبْصَرَ فَلِنَفْسِهِ وَمَنْ عَمِيَ فَعَلَيْهَا وَمَا أَنَا عَلَيْكُمْ بِحَفِيظٍ وَمَا أَنَا عَلَيْكُمْ بِحَفِيظْ وَذَالِكَ نُصَرِّفُ الْآيَاتِ وَلِيَقُولُوا دَرَسْتَ وَلِنُبَيِّنَهُ وَلِيَقُولُوا دَرَسْتَ وَلِنُبَيِّنَهُ لِقَوْمٍ يَعْلَمُونْ اتَّبِعْ مَا أُوحِيَ إِلَيْكَ مِنْ رَبِّكَ لَا إِلَاهَ إِلَّا هُوَ وَأَعْرِضْ عَنِ الْمُشْرِكِينْ وَلَوْ شَاءَ اللَّهُ مَا وَلَوْ شَاءَ اللَّهُ مَا لم أَشْرَكُوا وَمَا جَعَلْنَاكَ عَلَيْهِمْ حَفِيظًا وَمَا أَنْتَ عَلَيْهِمْ بِوَكِيلْ وَلَا تَسُبُّوا الَّذِينَ يَدْعُونَ رَبَهُمْ بِالْغَدَاةَ اللَّهِ شِيْ
B — Moniem Gamal (skipped, reject_audio_quality_issue)
الْخَبِيرْ وَكُلَّ شَيْءٍ خَبِيرْ قَدْ جَاءَكُمْ مَثَلُكُمْ بَصَارُ عَلَيْهِ وَإِنَّ عَلَيْكُمْ لَحَافِظِينْ وَمَا أَنَا عَلَيْكُمْ بِحَفِيظْ وَذَلِكَ نُبَيِّنُ لَكُمْ لِقَوْمٍ يَعْلَمُونْ مِنْ رَبِّكَ مُشْرِكِينْ شَاءَ اللَّهُ وَلَوْ شَاءَ اللَّهُ مَا اللَّهُ عَلَيْهِمْ بِالْغَدَاةَ
Should have been rejected task 342076 · Batch 35 · 223 char edits · 59 word edits · token overlap 34%
A — بيسان الغرباوي (submitted)
بِسْمِ اللَّهِ الرَّحْمَانِ الرَّحِيمْ وَيَا قَوْمِ لَا يَجْرِمَنَّكُمْ شِقَا شِقَاقِي أَنْ يُصِيبَكُمْ مِثْلَ مَا أَقَا مِثْلَ مَا أَصَابَ قَوْمَ نُوحٍ أَوْ قَوْمَ هُودٍ أَوْ قَوْمَ صَالِحْ وَمَا قَوْمَ عَادٍ مِنْكُمْ بِبَعِيدْ وَاسْتَغْفِرُوا رَبَّكُمْ ثُمَّ تُوبُوا إِلَيْهْ إِنَّ رَبِّي رَحِيمٌ وَدُودْ قَالُوا يَا شُعَيْبُ مَا نَفْقَهُ كَثِيرًا مِمَّا تَقُولُ وَإِنَّا لَنَرَاكَ فِينَا ضَعِيفَا وَلَوْلَا رَهْطُكَ لَرَجَمْنَاكَ وَمَا أَنْتَ عَلَيْنَا بِعَزِيزْ قَالَ يَا قَوْمِ أَرَهْطِي أَعَزُّ عَلَيْكُمْ مِنَ اللَّهْ
B — tahaelkarem (skipped, reject_audio_quality_issue)
بِسْمِ اللَّهِ الرَّحْمَنِ الرَّحِيمْ وَالْقَلَمِ وَمَا يَسْطُرُونْ مَا أَنْتَ بِنِعْمَةِ رَبِّكَ بِمَجْنُونْ وَإِنَّ لَكَ لَأَجْرًا غَيْرَ مَمْنُونْ وَإِنَّكَ لَعَلَى خُلُقٍ عَظِيمْ فَسَتُبْصِرُ وَيُبْصِرُونْ ثُمَّ إِلَيْهِ مَرْجِعُكُمْ قَالَ يَا شُعَيْبُ أَيْنَ مَا تَكُونُوا أَيَخْرَجُ
Should have been rejected task 358520 · Batch 35 · 214 char edits · 54 word edits · token overlap 27%
A — awad fdila (skipped, reject_audio_quality_issue)
وَإِذْ قَالَ إِبْرَاهِيمُ رَبِّ اجْعَلْ هَذَا بَلَدًا آمِنًا وَارْزُقْ أَهْلَهُ مِنَ الثَّمَرَاتِ مَنْ آمَنَ مِنْهُمْ بِاللَّهِ وَالْيَوْمِ الْآخِرْ قَالَ وَمَنْ كَفَرَ فَأُمَتِّعُهُ قَلِيلًا ثُمَّ أَضْطَرُّهُ إِلَى عَذَابِ النَّارِ وَبِئْسَ الْمَصِيرْ أَلَمْ يُنْفِقُونْ
B — بيسان الغرباوي (submitted)
مِنَ الشَّيْطَانِ الرَّجِيمْ بِسْمِ اللَّهْ وَإِذْ قَالَ إِبْرَاهِيمُ رَبِّ أَرِنِي كَيْفَ تُحْيِي الْمَوْتَى قَالَ أَوَلَمْ تُؤْمِنْ قَالَ بَلَى وَلَاكِنْ لِيَطْمَئِنَّ قَلْبِي قَالَ فَخُذْ أَرْبَعَةً مِنَ الطَّيْرِ فَصُرْهُنَّ إِلَيْكْ ثُمَّ اجْعَلْ عَلَى مِنْهُنَّ جُزْءًا ثُمَّ ادْعُهُنَّ يَأْتِينَكَ سَعْيَا وَاعْلَمْ أَنَّ اللَّهَ عَزِيزٌ حَكِيمْ مَثَلُ الَّذِينَ يُنْفِقُونَ أَمْوَالَهُمْ
Should have been rejected task 352386 · Batch 35 · 205 char edits · 45 word edits · token overlap 55%
A — Walid Ahmad Muhammad (submitted)
يَقُولُونَ أَإِنْ كُنَّا مَعَكُمْ أَلَيْسَ اللَّهُ بِأَعْلَمَ بِمَا فِي صُدُورِ الْعَالَمِينْ وَلَيَعْلَمَ اللَّهْ اللَّهُ الَّذِينَ آمَنُوا وَيَعْلْ وَلَيَعْلَمَنَّ الْمُنَافِقِينْ وَقَالَ الَّذِينَ آمَنُوا اتَّبِعُونَا وَلْنَحْمِلْ خَطَايَاكُمْ وَمَا هُمْ بِحَامِلِينَ مِنْ خَطَايَاهُمْ مِنْ شَيْءٍ إِنَّهُمْ لَكَاذِبُونَ وَلَيَحْمِلُنَّ أَثْقَالَهُمْ وَأَثْقَالًا مَعَ أَثْقَالِهِمْ وَلَيُسْأَلُنَّ يَوْمَ الْقِيَامَةِ عَمَّا كَانُوا يَفْتَرُونْ وَلَقَدْ أَرْسَلْنَا نُوحًا إِلَى قَوْمِهِ فَلَبِثَ فِيهِمْ أَلْفَ سَنَةٍ إِلَّا خَمْسِينَ عَامًا فَأَخَذَهُمُ الطُّوفَانُ وَهُمْ ظَالِمُونْ فَأَنْجَيْنَاهُ وَأَصْحَ
B — Yasser Mohamad Mohamad (skipped, reject_audio_quality_issue)
يَقُولُونَ أَإِنَّا كُنَّا مَعَكُمْ أَلَيْسَ اللَّهُ بِأَعْلَمَ بِمَا فِي صُدُورِ الْعَالَمِينْ وَلَيَعْلَمَنَّ اللَّهُ الَّذِينَ آمَنُوا وَيَعْلَمَ وَلَيَعْلَمَنَّ الْمُنَافِقِينْ وَقَالَ الَّذِينَ لَا يَرْجُونَ لِقَاءَنَا لَوْلَا نُطَاعُونَ لِقَومٍ مَعَهُمُ الْكِتَابَ فَاعْبُدُوا إِلَّا خُمْسٍ فَأَمْرِكُو
Should have been rejected task 331216 · Batch 35 · 202 char edits · 50 word edits · token overlap 82%
A — عصام عبد الحميد عبد العزيز (submitted, annotator_focus_issue)
فَجَعَلَهُمْ جُذَاذًا إِلَّا كِبَرَ لَهُمْ لَعَلَّهُمْ لَيْهِ يَهْجَعُونْ غَالُوا مَنْ فَعَلَ هَاذَا بِآلِهَتِنَا إِنَّهُ لَمِنَ الظَّالِمِينْ قَالُوا سَمِعْنَا فَتًى يَذْكُرُهُمْ يُقَالْ لُهُ إِبْرَاهِيمْ قَالُوا فَأْتُوا بِهِ عَلَى أَعْيُنِ النَّاسِ لَعَهُمْ يَشْهَدُونْ قَالَ أَأَنْتَ فَعَلْتَ هَاذَا بِآلِهَتِنَا إِنَّهُ لَمِنَ الظَّالِمِينْ قَالُوا سَمِعْنَا فَتًى يَذْكُرُهُمْ يُقَالْ لِهْ إِبْرَاهِيمْ
B — saidmohamed (skipped, reject_multiple_recitations_overlap)
فَجَعَلَهُمْ جُذَاذًا إِلَّا كِفْلَهُمْ لَعَلَّهُمْ يَهْجَعُونْ وَلَهُمْ عَذَابْ إِنَّهُمْ لَمِنَ الظَّالِمِينْ قَالُوا سَمِعْنَا فَتًى يَذْكُرُهُمْ يُقَالُ إِبْرَاهِيمْ قَالُوا فَأْتُوا بِهِ عَلَى أَعْيُنِ النَّاسِ لَعَلَّهُمْ يَشْهَدُونْ قَالُوا أَأَنْتَ فَعَلْتَ هَذَا بِآلِهَتِنَا إِنَّهُ لَمِنَ الظَّالِمِينْ قَالُوا سَمِعْنَا فَتًى يَذْكُرُهُمْ يُقَالُ إِبْرَاهِيمْ
Wrong passage / submission mismatch task 130845 · Batch 34 · 210 char edits · 37 word edits · token overlap 0%
A — Marwan (escalated, annotator_focus_issue)
وَإِذَا الْجِبَالُ سُيِّرَتْ وَإِذَا الْعِشَارُ عُطِّلَتْ وَإِذَا الْوُحُوشُ حُشِرَتْ وَإِذَا الْبِحَارُ سُجِّرَتْ وَإِذَا النُّفُوسُ زُوِّجَتْ وَإِذَا الْمَوْءُودَةُ سُئِلَتْ بِأَيِّ ذَنْبٍ قُتِلَتْ وَإِذَا الصُّحُفُ نُشِرَتْ وَإِزَا السَّمَاءُ كُشِطَتْ
B — Mohamed Abdelghany (accepted, annotator_speed_mistake)
أَمِنْهُمْ مَنْ آمَنَ وَمِنْهُمْ مَنْ كَفَرْ وَلَوْ شَاءَ اللَّهُ مَا اقْتَتَتَلُوا وَلَاكِنَّ اللَّهَ يَفْعَلُ مَا يُرِيتْ يَا أَيُّهَا الَّزِينَ آمَنُوا أَنْفِقُوا مِمَّا كَسَ أَنْفُقُوا مِمَّا رَ رَزَقْنَاكُمْ مِنْ قَبْلْ أَنْ يَأْتِيَ يَوْمٌ لَا بَيْعٌ فِيهِ وَلَا خُلَّةُ
Wrong passage / submission mismatch task 131136 · Batch 34 · 198 char edits · 40 word edits · token overlap 15%
A — عبدالله صلاح العيسوي (rejected, reject_technical_issue)
طَاسِينْ مِيم تِلْكَ آيَاتُ الْكِتَابِ الْمُبِينْ لَعَلَّكَ بَاخِعٌ نَفْسَكَ أَلَّا يَكُونُوا مُؤْمِنِينْ إِنْ نَشَأْ نُنَزِّلْ عَلَيْهِمْ مِنَ السَّمَاءِ آيَةً فَظَلَّتْ أَعْنَاقُهُمْ لَهَا خَاضِعِينْ وَمَا يَأْتِيهِمْ مِنْ ذِكْرٍ إِلَّا كَانُوا عَنْهُ مُعْرِ مِنَ الرَّحْمَانِ إِلَّا كَانُوا عَنْهَا مُعْرِضِينْ فَقَدْ كْ
B — Fares Moustafa (accepted)
إِلَّا مَوْتَتُنَا الْأُولَى وَمَا نَحْنْ بِمُعَذَّبِينْ إِنَّ هَاذَا لَهُوَ الْفَوْزُ الْعَظِيمْ لِمِثْلِ هَاذَا فَلْيَعْمَلِ الْعَامِلُونْ أَذَالِكَ خَيْرٌ نُزُلًا أَمْ شَشَرَةُ الزَّقُّومْ
Wrong passage / submission mismatch task 232340 · Batch 34 · 164 char edits · 53 word edits · token overlap 17%
A — Yasser Waled (escalated, annotator_focus_issue)
وَلَا تَقُولُوا لِمَ تَصِفُ أَلْسِنَتُكُمُ وَلَا تَقُولُوا لِمَ تَصِفُ أَلْسِنَتُكُمُ الْكَذِبَ هَاذَا حَلَالٌ وَهَاذَا حَرَامٌ لِتَفْتَرُوا عَلَى اللَّهِ الْكَذِبْ إِنَّ الَّذِينَ يَفْتَرُونَ عَلَى اللَّهِ الْكَذِبَ لَا يُفْلِحُونْ مَتَاعٌ قَلِيلٌ وَلَهُمْ عَذَابٌ مَوَدَّةْ
B — Mariam Khaled (accepted)
بِسْمِ اللَّهِ الرَّحْمَانِ الرَّحِيمْ لَقَدْ كَانَ لَكُمْ فِيهُمْ أُسْوَةٌ حَسَنَةٌ لِمَنْ كَانَ يَرْجُو اللَّهَ وَالْيَوْمَ الْآخِرَ وَمَنْ يَتَوَلَّ فَإِنَّ اللَّهَ هُوَ الْغَنِيُّ الْحَمِيدْ عَسَى اللَّهُ إِنْ يَجْعَلْ بَيْنَكُمْ وَبَيْنَ الَّذِينَ عَادَيْتُمْ مِنْهُمْ مَوَدَّةْ
Wrong passage / submission mismatch task 130998 · Batch 34 · 148 char edits · 46 word edits · token overlap 5%
A — Fares Moustafa (accepted, annotator_focus_issue)
وَكَتَبْنَا عَلَيْهُمْ فِيهَا أَنَّ النَّفْسَ بِالنَّفْسِ وَالْعَيْنَ بِالْعَيْنْ وَالْأَنْفَ بِالْأَنْفِ وَالْأُذُنَ بِالْأُذُنِ وَالسِّنَّ بِالسِّنِّ وَالْجُرُوحَ قِصَاصْ فَمَنْ تَصَدَّقَ بِهِ فَهُوَ
B — Mohamed Abdelghany (accepted)
وَمَنْ وَلَمْ وَمَا كَانْ وَمَا كَانَ وَمَا وَلَمْ تَكُنْ لَهُ فِئَةٌ يَنْصُرُونَهُ مِنْ دُونِ اللَّهِ وَمَا كَانَ مُنْتَصِرَا وَلَوْلَا إِذْ دَخَلْتَ جَنَّتَكَ خُلْتَ مَا شَاءَ اللَّهْ لَا قُوَّةَ إِلَّا بِاللَّهِ إِنْ تَرَنِ أَقَ
Wrong passage / submission mismatch task 131088 · Batch 34 · 139 char edits · 28 word edits · token overlap 11%
A — Marwan (escalated, annotator_focus_issue)
حَوْلِكْ فَاعْفُ عَنْهُمْ وَاسْتَأْ فِرْ لَهُمْ وَشَاوِرَهُمْ فِي الْأَمْرْ فَإِذَا عَزِمْتَ فَعَ فَإِذَا عَزَمْتَ تَوَكَّلْ تَوَكَّلْ عَلَى اللَّهْ إِنَّ اللَّهَ يُحِبُّ الْمُتَكِينْ
B — Mahmoud Elsaey (rejected, reject_technical_issue)
قَالُوا ادْعُ لَنَا رَبَّكَ يُبَيِّنْ لَنَا مَا هِيَ إِنَّ الْبَقَرَ تَشَابَهَ عَلَيْنَا وَإِنَّا إِنْ شَاءَ اللَّهُ لَمُهْتَدُونْ قَالَ إِنَّهُ يَقُولُ إِنَّهَا بَقَرَةٌ ذَلُولٌ لَا تَثِيرُ الْأَرْضَ وَلَا
One side truncated task 335987 · Batch 35 · 159 char edits · 35 word edits · token overlap 100%
A — عصام عبد الحميد عبد العزيز (accepted, annotator_focus_issue)
مَوْلَى الَّزِينَ آمَنُوا وَأَنَّ الْكَافِرِينَ لَا مَوْلَى لَهُمْ إِنَّ اللَّهَ يُدْخِلُ الَّذِينَ آمَنُوا وَعَمِلُوا الصَّالِحَاتِ جَنَّاتٍ تَجْرِي مِنْ تَحْتِهَا الْأَنْهَارْ وَالَّذِينَ كَفَرُوا يَتَمَتَّعُونَ وَيَأْكُلُونَ كَمَا تَأْكُلُ الْأَنْعَامُ وَالنَّارُ مَسْوًى لَهُمْ وَكَأَيِّنْ مِنْ قَرْيَةٍ هِيَ أَشَدُّ قُوَّةً مِنْ قَرْيَتِكَ الَّتِي أَخْرَجَتْكَ أَهْلَكْنَاهُمْ فَلَا نَاصِرَ
B — awad fdila (draft, annotator_focus_issue)
مَوْلَى الَّزِينَ آمَنُوا وَأَنَّ الْكَافِرِينَ لَا مَوْلَى لَهُمْ
One side truncated task 351542 · Batch 35 · 107 char edits · 38 word edits · token overlap 31%
A — Abdullah Mohamed Samir (draft, annotator_focus_issue)
أَفَمَنْ يَنْصُرُنَا رَبِّهْ فَيَذَرُوهَا فَيَذَرُوهَا طَاغًا وَلَا أَمْتَا يَوْمَئِذٍ يَتَذَكَّرُ يَوْمَئِذٍ وَ خَشِيَةٍ عَلَى الرَّحْمَنِ لَا تَسْمَعُ إِلَّا
B — Gehad Refaat (submitted, annotator_focus_issue)
الْجِبَ فَقُلْ يَنْسِفُهَا رَبِّي نَسْفًا فَيَذَرْ فَيَذْ فَيَذَرُهَا فَيَذَرُهَا قَاءً صَفْصَفَا لَا تَرَى فِيهَا عِوَجًا وَلَا أَمْتَا يَوْمَئِذٍ يَتَّبِعُونَ الدْ يَوْمَئِذٍ يَتَّبِعُونَ الدَّائِيَ لَا حِوَجَ مِنْهُ لَا عِوَجًا لَهْ وَخَشَعْ وَخَشِ وَخَشِيَ وَخَشِيَتِ الْ أَصْوَاتُ لِلرَّحْمَانِ فَلَا تَسْمَعُ هِلْ لَا هَمْسًا يَوْمَئِذٍ
One side truncated task 131075 · Batch 34 · 98 char edits · 22 word edits · token overlap 0%
A — Marwan (escalated, annotator_focus_issue)
حَوْلِكْ فَاعْفُ عَنْهُمْ وَاسْتَأْ فِرْ لَهُمْ وَشَاوِرَهُمْ فِي الْأَمْرْ فَإِذَا عَزِمْتَ فَعَ فَإِذَا عَزَمْتَ تَوَكَّلْ تَوَكَّلْ عَلَى اللَّهْ إِنَّ اللَّهَ يُحِبُّ الْمُتَكِينْ
B — Mahmoud Elsaey (accepted)
لِنُخْرِجَ بِهِ هَبًّا وَنَبَاتَا وَجَنَّةٍ أَلْفَافَا
One side truncated task 356100 · Batch 35 · 97 char edits · 28 word edits · token overlap 75%
A — Fares Moustafa (submitted)
عَالِمُ الْغَيْبِ لَا يَعْلَمُهَا
B — Taha sobhi (submitted)
وَعِنْدَهُ مَفَاتِحُ الْغَيْبِ لَا يَعْلَمُهَا إِلَّا هُوْ يَعْلَمُ مَا فِي الْبَرِّ وَالْبَحْرْ وَمَا تَسْقُطُ مِنْ وَرَقَةٍ إِلَّا يَعْلَمُهَا وَلَا حَبَّةٍ فِي ظُلُمَاتِ الْأَرْضِ وَلَا رَطْبٍ وَلَا يَابِسٍ إِلَّا فِي كِتَابٍ مُبِينْ
One side truncated task 327285 · Batch 35 · 65 char edits · 16 word edits · token overlap 25%
A — Mariam Khaled (accepted)
اءُ مِنْ بَعْدُ وَلَا أَنْ تَبَدَّلَ بِهِنَّ مْ
B — awad fdila (escalated, annotator_focus_issue)
وَكَذَالِكَ نَجْزِي مَنْ أَسْرَفَ وَلَمْ يُؤْمِنْ بِآيَاتِ رَبِّهْ وَلَعَدَابُ الْآخِرَةِ أَشَدُّ وَأَبْقَى وَكَمْ أَهْلَكْنَا قَبْلَهُمْ مِ أَفَلَمْ يَهْدِ
hasMistakes disagreement task 155050 · Batch 34 · 157 char edits · 10 word edits · token overlap 64%
A — ghada ahmed (draft, annotator_focus_issue)
فِي الْحَيَاةِ الدُّنْيَا وَتَزْهَقَ أَنْفُسُهُمْ وَهُمْ كَافِرُونْ فِ ي الْحَيَاةِ الدُّنْيَا وَتَزْهَقَ أَنْفُسُهُمْ وَهُمْ كَا فِي الْحَيَاةِ الدُّنْيَا وَتَزْهَقَ أَنْفُسُهُمْ فِي الْحَيَاةِ الدُّنْيَا وَفِي الْحَيَاةِ الدُّنْيَا وَتَزْهَقَ أَنْفُسُهُمْ وَهُمْ كَافِرُونْ
B — عصام عبد الحميد عبد العزيز (accepted, annotator_focus_issue)
فِي الْحَيَاةِ الدُّنْيَا وَتَزْهَقَ أَنْفُسُهُمْ وَهُمْ كَافِرُونْ فِي الْحَيَاةِ الدُّ وَتَزْهَقَ أَنْفُسُهُمْ وَهُمْ كَافِ فِي الْحَيَاةِ الدُّنْيَا وَتَزْهَقَ أَنْفُسُهُمْ فِي الْحَيَاةِ الدُّنْيَا وَ تَزْهَقَ أَنْفُسُهُمْ وَهُمْ كَافِرُونْ فِي الْحَيَاةِ الدُّنْيَا وَتَزْهَقَ أَنْفُسُهُمْ وَهُمْ كَافِرُونْ
hasMistakes disagreement task 220396 · Batch 34 · 125 char edits · 38 word edits · token overlap 79%
A — Ahmed Saber (escalated)
لَاهَ كَانَ غَفُورْ رَحِيمَا وَلَقَدْ وَصَّيْنَا الَّذِينَ أُوتُوا الْكِتَابَ مِنْ قَبْلِكُمْ لَعَلَّكُمْ تَتَّقُونْ وَإِنْ تَكْفُرُوا فَإِنَّ اللَّهَ غَنِيٌّ حَمِيدْ اللَّهَ يُؤْتِي وَكَانَ اللَّهُ عَلَى ذَلِكَ قَدِيرَا مَنْ كَانَ يُرِيدُ ثَوَابَ الدُّنْيَا فَعِنْدَ اللَّهِ ثَوَابُ الْآخِرَةِ كَانَ اللَّهُ سَمِيعٌ بَصِيرَا يَا أَيُّهَا الَّذِينَ آمَنُوا
B — عبد الرحمن (draft, annotator_focus_issue)
اللَّهَ كَانَ غَفُورًا رَحِيمَا مَا فِي السَّمَاوَاتِ وَمَا فِي الْأَرْضْ وَلَقَدْ وَصَّيْنَا الَّذِينَ أُوتُوا الْكِتَابَ مِنْ قَبْلِكُمْ وَإِيَّاكُمْ أَنِ تَتَّقُونْ وَإِنْ تَكْفُرُوا فَإِنَّ للَّهِ مَا فِي السَّمَاوَاتِ وَمَا فِي الْأَرْضْ وَكَانَ اللَّهَ غَنِيًّا حَمِيدَا للَّهِ مَا فِي السَّمَاوَاتِ وَمَا فِي الْأَرْضْ وَكَفَى بِاللَّهِ وَكِيلَا إِنْ يَشَأْ يُذْهِبْكُمْ أَيُّهَا النَّاسُ وَيَأْتِ بِآخَرِينَ وَكَانَ اللَّهُ عَلَى ذَالِكَ قَدِيرًا مَنْ كَانَ يُرِيدُ ثَوَابَ الدُّنْيَا فَعِنْدَ اللَّهِ ثَوَابُ الْآخِرَهْ كَانَ اللَّهُ سَمِيعٌ بَصِيرَا يَا أَيُّهَا الَّذِينَ آمَنُوا
hasMistakes disagreement task 196169 · Batch 34 · 114 char edits · 30 word edits · token overlap 70%
A — Ahmed Saber (escalated)
إِنَّ أَصْحَابَ الجَّنَّةِ الْيَوْمَ فِي شُغُلٍ فَاكِهُونْ هُمْ وَأَزْوَاجُهُمْ فِي ظِلَالٍ عَلَى الْأَرَائِكِ مُتَّكِئُونْ لَهُمْ فِيهَا فَاكِهَةُ وَلَهُمْ مَا يَدَّعُونْ سَلَامٌ قَوْلًا مِنْ رَبٍّ رَحِيمْ وَامْتَازُوا الْيَوْمَ أَيُّهَا الْمُجْرِمُونْ أَلَمْ أَعْهَدْ إِلَيْكُمْ يَا بَنِي آدَمَ لَا تَعْبُدُوا الشَّيْطَانْ إِنَّهُ عَدُوٌّ مُبِينٌ مُبِينْ أَعُوذُ بِاللهِ مِنَ الشَّيْطَانِ الرَّجِيمْ وَمَا كَانَ لِتَعْلَمُوا أَنَّمَا تَفْعَلُونْ
B — Mariam Khaled (submitted)
إِنَّ أَصْحَابَ الْجَنَّةِ الْيَوْمَ فِي شَغَلٍ فَاكِهُونْ هُمْ وَأَزْوَاجُهُمْ فِي ظِلَا عَلَى الْأَرَائِكِ مُتَّكِ لَهُمْ فِيهَا فَاكِهَةٌ وَلَهُمْ مَا سَلَامٌ قُوْلًا مِنْ رَبٍّ رَ وَامْتَازُوا الْيَوْمَ أَيُّهَا الْمُجْرِمْ أَلَمْ أَعْهَدْ إِلَيْكُمْ يَا بَنِي آدَمَ لَا تَعْبُدُوا الشَّيْطَانْ إِنَّهُ عَدُوٌّ مُمِنٌ وَ أَ أَعْبُدُونِ هَاذَا صِرَاطَ مُسْتَ وَ أَضَلَ مِنْكُمْ جِبْلْاً كَثِ أَمْ تَكُ تَقِلُو هَاذِهِ جَهَنْمُ الْتِي كُنْتُمْ تُوعَدُ اصْلَوْهَا الْيَوْمَ بِمَا كُنْتُمْ تَكْفُرُو
hasMistakes disagreement task 330138 · Batch 35 · 97 char edits · 24 word edits · token overlap 92%
A — Yasser Waled (submitted)
فَلَمَّا ذَهَبَ عَنْ إِبْرَاهِيمَ الرَّوْعُ وَجَاءَتْهُ الْبُشْرَى يُجَادِلُنَا فِي قَوْمِ لُوطْ فَلَمَّا ذَهَبَ عَنْ إِبْرَاهِيمَ الرَّوْعُ وَجَاءَتْهُ الْبُشْرَى يُجَادِلُنَا فِي قَوْمِ لُوطْ فَلَمَّا ذَهَبَ عَنْ إِبْرَاهِيمَ الرَّوْعُ وَجَاءَتْهُ الْبُشْرَى يُجَادِلُنَا فِي قَوْمِ لُوطْ فَلَمَّا فَلَمَّا ذَهَبَ عَنْ إِبْ
B — Mahmoud Abdo (submitted)
فَلَمَّا ذَهَبَ عَنْ إِبْرَاهِيمَ الرَّوْعُ وَجَاءَتْهُ الْبُشْرَى يُجَادِلُنَا فِي قَوْمِ لُوتْ فَلَمَّا زَهَبَ عَنْ إِبْرَاهِيمَ الرَّوْعُ وَجَاءَتْهُ الْبُشْرَى يُجَادِلُنَا فِي قَوْمِ لُوتْ فَلَمَّا ذَهَبَ عَنْ إِبْرَاهِيمَ الرَّوْعُ وَجَاءَتْهُ الْبُشْرَى يُجَادِلُنَا فِي قَوْمِ لُوتْ فَلَمَّا فَلَمَّا ذَهَبَ عَنْ إِبْ
hasMistakes disagreement task 219168 · Batch 34 · 96 char edits · 25 word edits · token overlap 92%
A — أبو مسلم الأزهري (draft, annotator_focus_issue)
إِذًا لَأَذَقْنَاكَ ضِعْفَ الْحَيَاةِ وَضِعْفَ الْمَمَاتِ ثُمَّ لَا تَجِدُ لَكَ عَلَيْنَا نَصِيرَا إِذًا لَأَذَقْنَاكَ ضِعْفَ الْحَيَاةِ وَضِعْفَ الْمَمَاتِ ثُمَّ لَا تَجِدُ لَكَ عَلَيْنَا نَصِيرَا
B — Omar Yosri (accepted)
إِذًا لَأَدْقْنَاكَ طِعْفَ الْحَيَاةِ وَضِعْفَ الْمَمَاتِ ثُمَّ لَا تَجِدُ لَكَ عَلَيْنَا نَصِيرَا إِذًا لَأَضَقْنَاكَ ضِعْفَ الْحَيَاةِ وَضِعْفَ الْمَمَاتِ ثُمَّ لَا تَجِدُ لَكَ عَلَيْنَا نَصِيرَا
Hard audio / substantive disagreement task 365062 · Batch 35 · 335 char edits · 107 word edits · token overlap 74%
A — Taha sobhi (submitted)
وَهَازَا وَهَاذَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا وَفَصِيلَ الْآيَاتِ لِقَوْمٍ يَ لِقَوْمٍ يَذَّكَّرُونْ وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا وَهَازَا صِرَاطُ رَ صَ صِرَاهَطْ وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا قَدْ فَصَّلْنَا الْآيَاتِ لِقَوْمٍ يَذَّكَّرُونْ وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا قَدْ فَصَّلْنَا الْآيَاتِ لِقَوْمٍ يَذَّكَّرُونْ وَمَنْ فَمَنْ يُرِدِ اللهْ أَنْ يَ أَنْ يَهْدِيهْ سَ يَشْرَحْ صَدْرَهُ لِلْإِسْلَامْ وَمَنْ يُرِدْ أَنْ يُضِلَّهْ يَجْعَلْ صَدْرَهُ ضَيِّقًا جَرَحًا كَأَنَّمَا يَصَّعَّدْ فِي السَّمَاءِ كَذَالِكَ يَجْعَلُ اللَّهُ الرِّجْسَ عَلَى الَّذِينَ لَا يُؤْمِنُوا وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا قَدْ فَصَّلْنَا الْآيَاتِ لِقَوْمٍ يَذَّكَّرُونْ
B — محمد سلامة محمد (submitted)
وَهَازَا وَهَازَا صِرَاطُ رَبُّكَ مُسْتَقِيمًا هَفَصَّلْنَا الْآيَاتِ لِقَوْمٍ يَ لِقَوْمٍ يَزَّكَّرُونْ وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا وَهَازَا صِرَاطُ ظِ سَصِرَ هَازَ صِرَاطُ هَازَا صِرَاطُ رَبُّكَ مُسْتَقِيمًا قَدْ فَصَّلْنَا الْآيَاتِ لِقَوْمٍ يَزَّكَّرُونْ وَهَازَا صِرَاطُ رَبِّكَ مُسْتَقِيمًا فَصَّلْنَا الْآيَاتِ لِقَوْمٍ يَزَّكَّرُونْ مَنْ فَمَنْ يُرِدَ اللهُ أَنْ يَ أَنْ يَهْدِيهِ وَمَنْ يَشْرَحْ صَدْرَ لِلْإِسْلَامِ وَمَنْ يُرِدْ أَنْ يُضِلَّهْ يَجْعَلُهُ صَدْرَهُ ضَيِّقًا جَرَحًا كَأَنَّمَا يَصَّعَّدُ فِي السَّمَاءْ كَزَالِكَ يَجْعَلُ اللَّهُ الرِّجْزَ عَلَى الَّذِينَ لَا يُؤْمِنُونْ وَهَازَا صِرَاطُ رَبُّكَ مُسْتَقِيمًا قَدْ فَصَّلْنَا الْآيَاتِ لِقَوْمٍ يَزَّكَّرُونْ
Hard audio / substantive disagreement task 349112 · Batch 35 · 293 char edits · 68 word edits · token overlap 100%
A — حسناء علاء حسن عبدالظاهر (accepted, annotator_focus_issue)
يَقُولُ السُّفَهَاءُ مِنَ النَّاسِ مَا وَلَّاهُمْ عَنْ قِبْلَتِهِمُ الَّتِي كَانُوا عَلَيْهِمْ قُلْ لِلَّهِ الْمَشْرِقْ وَالْمَغْرِبُ يَهْدِي مَنْ يَشَاءُ إِلَى صِرَاطٍ مُسْتَقِيمْ وَكَذَالِكَ جَعَلْنَاكُمْ أُمَّةً وَسَطًا لِتَكُونُوا شُهَدَاءَ عَلَى النَّاثِ وَيَكُونَ الرَّسُولُ عَلَيْكُمْ شَهِيدَمَ وَمَا جَعَلْنَا الْقِبْلَةَ الَّتِي كُنْتَ عَلَيْهَا إِلَّا لِنَعْلَمْ مَنْ يَتَّبِعُ الرَّسُولُ مِمَّنْ يَنْقَلِبُ عَلَى عَقِبَيْهْ وَإِنْ كَانَتْ لَكَبِيرَةً إِلَّا عَلَى الَّذِينَ هَدَى اللَّهْ وَمَا كَانَ اللَّهُ لِيُضِيعَ إِمَانَكُمْ إِنَّ اللَّهَ بِالنَّاسِ لَرَءُوفٌ رَحِيمْ قَدْ نَرَى تَقَلُّبَ وَجْهِكَ فِي السَّمَاءْ فَلَنُوَلِّيَنَّكَ قِبْلَةً تَرْضَاهْ فَوَلِّ وَجْهَكَ شَرَ الْمَسْجِدِ الْحَرَامْ وَحَيْثُ مَا كُنْتُمْ فَوَلُّوا وُجُوهَكُمْ شَطْرَهْ
B — Ahmed Khairy (accepted, annotator_focus_issue)
يَقُولُ السُّفَهَاءُ مِنَ النَّاسِ مَا وَلَّاهُمْ عَنْ قِبْلَتِهِمُ الَّتِي كَانُوا عَلَيْهِمْ قُلْ لِلَّهِ الْمَشْرِقْ وَالْمَغْرِبُ يَهْدِي مَنْ يَشَاءُ إِلَى صِرَاطٍ مُسْتَقِيمْ وَكَذَالِكَ جَعَلْنَاكُمْ أُمَّةً وَسَطًا لِتَكُونُوا شُهَدَاءَ عَلَى النَّاثِ وَيَكُونَ الرَّسُولُ عَلَيْكُمْ شَهِيدَمَ وَمَا جَعَلْنَا الْقِبْلَةَ الَّتِي كُنْتَ عَلَيْهَا إِلَّا لِنَعْلَمْ مَنْ يَتَّبِعُ الرَّسُولُ مِمَّنْ يَنْقَلِبُ عَلَى عَقِبَيْهْ وَإِنْ كَانَتْ لَكَبِيرَةً إِلَّا عَلَى الَّذِينَ هَدَى اللَّهْ وَمَا كَانَ اللَّهُ لِيُضِيعَ إِمَانَكُمْ إِنَّ اللَّهَ بِالنَّاسِ لَرَءُوفٌ رَحِيمْ قَدْ نَرَى تَقَلُّبَ وَجْهِكَ فِي السَّمَاءْ فَلَنُوَلِّيَنَّكَ قِبْلَةً تَرْضَاهْ فَوَلِّ وَجْهَكَ شَرَ الْمَسْجِدِ الْحَرَامْ وَحَيْثُ مَا كُنْتُمْ فَوَلُّوا وُجُوهَكُمْ شَطْرَهْ إِلَى صِرَاطٍ مُسْتَقِيمْ وَكَذَالِكَ جَعَلْنَاكُمْ أُمَّةً وَسَطًا لِتَكُونُوا شُهَدَاءَ عَلَى النَّاثِ وَيَكُونَ الرَّسُولُ عَلَيْكُمْ شَهِيدَمَ وَمَا جَعَلْنَا الْقِبْلَةَ الَّتِي كُنْتَ عَلَيْهَا إِلَّا لِنَعْلَمْ مَنْ يَتَّبِعُ الرَّسُولُ مِمَّنْ يَنْقَلِبُ عَلَى عَقِبَيْهْ وَإِنْ كَانَتْ لَكَبِيرَةً إِلَّا عَلَى الَّذِينَ هَدَى اللَّهْ وَمَا كَانَ اللَّهُ لِيُضِيعَ إِمَانَكُمْ إِنَّ اللَّهَ بِالنَّاسِ لَرَءُوفٌ رَحِيمْ قَدْ نَرَى تَقَلُّبَ وَجْهِكَ فِي السَّمَاءْ فَلَنُوَلِّيَنَّكَ قِبْلَةً تَرْضَاهْ فَوَلِّ وَجْهَكَ شَرَ الْمَسْجِدِ الْحَرَامْ وَحَيْثُ مَا كُنْتُمْ فَوَلُّوا وُجُوهَكُمْ شَطْرَهْ
Hard audio / substantive disagreement task 372788 · Batch 35 · 192 char edits · 49 word edits · token overlap 95%
A — عصام عبد الحميد عبد العزيز (draft, annotator_speed_mistake)
ذَالِكَ بِأَنَّهُمْ خَالُوا إِنَّمَا الْبَيْعُ مِثْلُ الرِّبَا وَأَحَلَّ اللَّهُ الْبَيْعَ وَحَرَّمَ الرِّبَا فَمَنْ جَاءَهُ مَوْعِظَةٌ مِنْ رَبِّهِ فَانْسَهَى فَلَهُ مَا سَلَفَ وَأَمْرُهُ إِلَى اللَّهْ ذَالِكَ بِأَنَّهُمْ قَالُوا إِنَّمَا الْبَيْعُ مِثْلُ الرِّبَا وَأَحَلَّ اللَّهُ الْبَيْعَ وَحَرَّمَ الرِّبَا فَمَنْ جَاءَهُ مَوْعِظَةٌ مِنْ رَبِّهِ فَانْتَهَى فَلَهُ مَا سَلَفَ وَأَمْرُهُ إِلَى اللَّهْ وَمَنْ عَادَ فَأُلَائِكَ أَصْحَابُ النَّارِ هُمْ فِيهَا خَالِدُونْ يَمْحَقُ اللَّهُ الرِّبَا وَيُرْبِي الصَّدَخَاتْ وَاللَّهُ لَا يُحِبُّ كُ يَمْحَقُ اللَّهْ
B — Waled Mohamed (submitted, annotator_speed_mistake)
ذَالِكَ بِأَنَّهُمْ خَالُوا إِنَّمَا الْبَيْعُ مِثْلُ الرِّبَا وَحَلَّ اللَّهُ الْبَيْعَ وَحَرَّمَ الرِّبَا فَمَنْ جَاءَهُ مَوْعِظَةٌ مِنْ رَبِّهِ فَانْتَهَى فَلَهُ مَا سَلَفَ وَأَمْرُ إِلَى اللَّهْ ذَالِكَ بِأَنَّهُمْ خَالُوا إِنَّمَا الْبَيْعُ مِثْلُ الرِّبَا وَأَحَلَّ اللَّهُ الْبَيْعَ وَحَرَّمَ الرِّبَا فَمَنْ جَاءَهُ مَوْعِدَظٌ مِنْ رَبِّهِ فَانْتَهَى فَلَهُ مَا سَلَفَ وَأَمْرُهُ إِلَى اللَّهْ وَمَنْ عَادَ فَأُلَائِكَ أَصْحَابُ النَّارِ هُمْ فِيهَا خَالِدُونْ يَمْحَقُ اللَّهُ الرِّبَا وَيُرْبِي الصَّدَخَاتْ وَاللَّهُ لَا يُحِبُّ كُ يَمْحَقُ اللَّهْ
Hard audio / substantive disagreement task 361203 · Batch 35 · 167 char edits · 42 word edits · token overlap 98%
A — Yasser Waled (submitted)
وَلَئِنْ أَرْسَلْنَا رِيحًا فَرَأَوْهُ مُصْفَرًّا لَظَلُّوا مِنْ بَعْدِهِ يَكْفُرُونْ فَإِنَّكَ لَا تُسْمِعُ الْمَوْتَى وَلَا تُسْمِعُ الصُّمَّ الدُّعَاءَ إِذَا وَلَّوْا مُدْبِرِينْ وَمَا أَنْتَ بِهَادِي الْعُمْيِ عَنْ ضَلَالَتِهِمْ إِنْ تُسْمِعُ إِلَّا مَنْ يُؤْمِنُ مُسْلِمُونْ اللَّهُ الَّذِي خَلَقَكُمْ مِنْ ضَعْفٍ ثُمَّ جَعَلَ مِنْ بَعْدِ ضَعْفٍ قُوَّةً ثُمَّ جَعَلَ مِنْ قُوَّةٍ ضَعْفًا وَشَيْبَهْ يَخْلُقُ مَا يَشَاءْ وَهُوَ الْعَلِيمُ الْقَدِيرْ وَيَوْمَ تَقُومُ السَّاعَةُ يُقْسِمُ الْمُجْرِمُونْ كَذَالِكَ كَانُوا يُؤْفَكُونْ وَقَالَ الَّذِينَ كَفَرُوا إِنْ أَنْتُمْ مُعْلَمُونْ فَاصْبِرْ
B — awad fdila (submitted)
وَلَئِنْ أَرْسَلْنَا رِيحًا فَرَأَوْهُ مُصْفَرًّا لَظَلُّوا مِنْ بَعْدِهِ يَكْفُرُونْ فَإِنَّكَ لَا تُسْمِعُ الْمَوْتَى وَلَا تُسْمِعُ الصُّمَّ الدُّعَاءَ إِذَا وَلَّوْا مُدْبِرِينْ وَمَا أَنْتَ بِهَادِي الْعُمْيِ عَنْ ضَلَالَتِهِمْ إِنْ تُسْمِعُ إِلَّا مَنْ يُؤْمِنُ مُسْلِمُونْ اللَّهُ الَّذِي خَلَقَكُمْ مِنْ ضَعْفٍ ثُمَّ جَعَلَ مِنْ بَعْدِ ضَعْفٍ قُوَّةً ثُمَّ جَعَلَ مِنْ قُوَّةٍ ضَعْفًا وَشَيْبَهْ يَخْلُقُ مَا يَشَاءْ وَهُوَ الْعَلِيمُ الْقَدِيرْ وَيَوْمَ تَقُومُ السَّاعَةُ يُقْسِمُ الْمُجْرِمُونْ مَا لَبِثُوا غَيْرَ سَاعَهْ كَذَالِكَ كَانُوا يُؤْفَكْ كَذَالِكَ يُؤْفَكُونَ وَقَالَ الَّذِينَ أُوتُوا الْعِلْمَ وَالْإِيمَانَ لَقَدْ وَلَقَدْ ضَرَبْنَا لِلنَّاسِ فِي هَاذَا الْقُرْآنِ مِنْ كُلِّ مَثَلٍ وَلَئِنْ جِئْتَهُمْ بِآيَةٍ لَيَقُولَنَّ الَّذِينَ كَفَرُوا إِنْ أَنْتُمْ إِلَّا مُبْطِلُونَ كَذَالِكَ يَطْبَعُ عَلَى قُلُوبِ الَّذِينَ لَا يَعْلَمُونَ فَاصْبِرْ إِنَّ وَعْدَ اللَّهِ حَقٌّ وَلَا يَسْتَخِفَّنَّكَ الَّذِينَ لَا يُوقِنُونَ
Hard audio / substantive disagreement task 356781 · Batch 35 · 151 char edits · 52 word edits · token overlap 67%
A — Yasser Mohamad Mohamad (submitted)
حُرِّمَتْ عَلَيْكُمُ الْمَيْتَةُ وَالدَّمُ وَلَحْمُ الْخِنْزِيرِ وَمَا أُهِلَّ لِغَيْرِ اللَّهِ بِهِ وَالْمُنْخَنِقَةُ وَالْمَوْقُوذَةُ وَالْمُتَرَدِّيَةُ وَالنَّطِيحَةُ وَمَا أَكَلَ السَّبُعُ إِلَّا مَا ذَكَّيْتُمْ وَمَا ذُبِحَ عَلَى النُّصُبِ وَأَنْ تَسْتَقْسِمُوا بِالْأَزْلَامِ ذَالِكُمْ فِسْقٌ الْيَوْمَ يَئِسَ الَّذِينَ كَفَرُوا مِنْ دِينِكُمْ فَلَا تَخْشَوْهُمْ وَاخْشَوْنِ الْيَوْمَ أَكْمَلْتُ لَكُمْ دِينَكُمْ وَأَتْمَمْتُ عَلَيْكُمْ نِعْمَتِي وَرَضِيتُ لَكُمُ الْإِسْلَامَ دِينًا وَتُوأَ وَمَا أَكَلَ السَّبُعُ إِلَّا مَا ذَكَرَ وَتُحَمْ وَ تَسْقِمُوا بِالْأَزْلَامِ وَمَا أَكَلَ السَّبُعُ إِلَّا مَا ذَكَّيْتُمْ وَمَا ذُبِحَ عَلَى النُّصُبِ وَأَنْ تَسْتَقْسِمُوا بِالْأَزْلَامِ وَمَا أَكَلَ السَّبُعُ إِلَّا مَا ذَكَّيْتُمْ وَمَا ذُبِحَ عَلَى النُّصُبِ وَأَنْ تَسْتَقْسِمُوا بِالْأَزْلَامِ
B — Ahmed Fawzy (submitted)
حُرِّمَتْ عَلَيْكُمْ مَيْتَةُ وَالدَّمُ وَلَحْمُ الْخِنْزِيرْ مُلَّ لِغَيْرِ اللَّهِ بِهْ وَالْمُنْخَنِقَةُ وَالْمَوْقُوسَهْ وَالْمُتَرَدِّيَةُ وَالنَّطِيحَةُ وَمَا أَكَلَ السَّبُعُ إِلَّا مَا سَكَّيْتُمْ وَمَا سُبْحَ عَلَى النُّصُبِ أَنْ تَسْتَلِمُوا بِالْأَزْلَامْ سَالِكُمْ فِسْقْ الْيَوْمَ يَئِسَ الَّذِينَ أَفَوْا مِنْ دِينِكُمْ فَلَا تَخْشَوْهُمْ وَاخْشَوْنْ الْيَوْمَ أَكْمَتُ لَكُمْ دِينَكُمْ وَأَتْمَمْتُ عَلَيْكُمْ نِعْمَتِي وَرَدِيتُ لَكُمُ الْإِسْلَامَ دِينَا وَتُأَ وَمَا أَكَلَ السَّبُعُ إِلَّا مَا سَكَّيْتُمْ وَمَا سُبْهَ عَلَى النُّزُبْ تَسْلِمُوا بِالْأَزْلَامْ مَتْ مَتْ وَمَا أَكَلَ السَّبُعُ مَا سَكَّيْتُمْ مَا سُبْحَ عَلَى النُّصُبْ تَسْلِمُوا بِالْأَزْ مَا أَكَلَ سَبُعُ إِلَّا مَا سَكَّيْتُمْ وَمَا سُبْهَ عَلَى النُّزُبْ لَى اللَّهِ أَنْ تَسْلِمُوا بِالْأَزْلَامْ

Top word confusions (normalised — diacritic-collapsed)

Each row is a word pair on which annotators disagree. The top of this list is dominated by Rule-1 silent-alif spellings. In this view arabic_encode collapses ة↔ه, ى↔ي, and hamza variants, so disagreements on those letters are hidden here and appear in the next table instead.

word A word B count
ذالك ذلك 591
هاذا هذا 525
الرحمان الرحمن 409
ولاكن ولكن 405
الذين الزين 268
كذالك كذلك 201
الاه اله 197
الاءك اولءك 190
ان انا 145
الدين الذين 138
الذي الزي 136
الارض الارظ 135
الذي الذين 126
اذا ازا 121
ثم سم 109
هاذه هذه 100
رب ربي 99
وكذالك وكذلك 97
ما وما 92
ذالك زالك 86

Top word confusions (diacritic-stripped — consonants preserved)

Diacritics removed, but ة, ه, ى, ي, and the hamza family are kept distinct. Surfaces pausal taa-marbuta (ة↔ه), alif-maksura (ى↔ي), and hamza-seat disagreements that the fully-normalised table hides.

word A word B count
ذالك ذلك 592
هاذا هذا 525
الرحمان الرحمن 410
ولاكن ولكن 405
الذين الزين 264
كذالك كذلك 201
إلاه إله 196
ألائك أولئك 190
الدين الذين 138
الأرض الأرظ 135
الذي الزي 135
إن إنا 132
الذي الذين 123
إذا إزا 121
ثم سم 106
هاذه هذه 100
رب ربي 99
وكذالك وكذلك 96
ما وما 92
ذالك زالك 86
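
The two normalisation levels behind the tables above can be sketched as follows. This is not the project's arabic_encode: the folding table is an assumption based on the description in this section (in particular, mapping both hamza seats to a bare hamza is a guess), and the function names are hypothetical.

```python
import unicodedata

# Assumed folding table; the real arabic_encode mapping is not shown in this report.
FOLD_TABLE = str.maketrans({
    "\u0629": "\u0647",  # taa marbuta -> haa
    "\u0649": "\u064a",  # alif maksura -> yaa
    "\u0623": "\u0627",  # alif with hamza above -> bare alif
    "\u0625": "\u0627",  # alif with hamza below -> bare alif
    "\u0622": "\u0627",  # alif with madda -> bare alif
    "\u0626": "\u0621",  # hamza on yaa seat -> bare hamza
    "\u0624": "\u0621",  # hamza on waw seat -> bare hamza (assumption)
})

def strip_diacritics(text: str) -> str:
    """Drop combining marks (harakat, shadda, sukun) but keep every base letter."""
    composed = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in composed if unicodedata.category(ch) != "Mn")

def fold_letters(text: str) -> str:
    """Strip diacritics, then collapse the letter variants listed in FOLD_TABLE."""
    return strip_diacritics(text).translate(FOLD_TABLE)

word = "\u0623\u064f\u0648\u0644\u064e\u0626\u0650\u0643\u064e"  # fully vowelled form
print(strip_diacritics(word))  # diacritic-stripped view
print(fold_letters(word))      # fully normalised view
```

Comparing the counts produced under strip_diacritics with those under fold_letters is what separates letter-variant disagreements from the rest.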

Top word confusions (raw — every diacritic counts)

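The same word-pair count, computed on raw text with diacritics kept. Useful for sanity-checking which haraka pattern is in play. Pairs that differ only in a diacritic (such as اللَّهُ vs اللَّهْ) can appear only in this table.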

word A (raw) word B (raw) count
ذَالِكَ ذَلِكَ 556
هَاذَا هَذَا 525
الرَّحْمَانِ الرَّحْمَنِ 316
اللَّهُ اللَّهْ 304
اللَّهَ اللَّهْ 291
اللَّهِ اللَّهْ 273
الَّذِينَ الَّزِينَ 258
وَلَاكِنْ وَلَكِنْ 228
كَذَالِكَ كَذَلِكَ 193
أُلَائِكَ أُولَئِكَ 186
وَلَاكِنَّ وَلَكِنَّ 156
الَّذِي الَّزِي 134
الَّدِينَ الَّذِينَ 132
وَهُوَ وَهْوَ 126
إِذَا إِزَا 115
الْحَقّْ الْحَقْ 111
إِنَّ إِنَّا 110
إِلَاهَ إِلَهَ 103
ثُمَّ سُمَّ 102
هَاذِهِ هَذِهِ 99

Top character confusions

Single-character substitutions counted across all pair-edits. The first rows are diacritic-mark swaps (fatha ↔ sukun, damma ↔ sukun, …) — these are the bulk of the diacritic-noise wedge above.

char A char B count sample word pairs
َ ْ 7291
ُ ْ 4450
َ ُ 4234
َ ِ 3980
ِ ْ 3451
ذ ز 2368
ُ ِ 2109
ّ ْ 1498
ة ه 1455
د ذ 1259
ح ه 1143
ض ظ 1133
ً َ 1085
ث س 1031
س ص 959
ت د 875
ق ك 814
أ ع 752
ل ن 725
أ ا 694
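One way to tally single-character substitutions like those above is sketched here. The report does not say how pair-edits are aligned, so this uses Python's difflib as a stand-in aligner; char_substitutions and count_confusions are hypothetical names.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple


def char_substitutions(a: str, b: str) -> List[Tuple[str, str]]:
    """Return unordered (char, char) pairs from equal-length replaced runs."""
    pairs = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only count clean substitutions; insertions and deletions are skipped.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for x, y in zip(a[i1:i2], b[j1:j2]):
                pairs.append(tuple(sorted((x, y))))
    return pairs


def count_confusions(word_pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Aggregate substitution counts over many (word A, word B) pairs."""
    counts = Counter()
    for a, b in word_pairs:
        counts.update(char_substitutions(a, b))
    return counts


# Illustrative input only: the fifth row of the normalised word table.
print(count_confusions([("الذين", "الزين")]).most_common(1))
```

Sorting each pair makes the count direction-free, so ذ→ز and ز→ذ land in the same bucket, which is how the table above reads.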

3. Annotator rule violations

What this answers: Which annotators produce the most silent-alif and hasMistakes-flip violations, and how much does review catch?

Findings

Question this plot answers: Which high-volume annotators produce silent-alif violations most densely, after normalising by word count?
Observation: This is a density metric per 100 words, so it is the right chart for coaching producers. Normalising by word count means a low-volume annotator does not look good simply because they wrote less.
Question this plot answers: Who most often marks hasMistakes=True even when their submitted text matches the reference?
Caveat: A high value means the annotator marked hasMistakes=True while the text still matched the reference. That is usually a flag-rubric problem, not necessarily bad transcription.
Question this plot answers: For annotators with enough silent-alif evidence, how much does review reduce the violation rate before final acceptance?
Caveat: Review-catch is a rate comparison, not a direct count of caught rows. Negative raw values are possible when accepted rows have a different word denominator; those rows are excluded here for manager readability.
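The two rate metrics in these findings can be sketched as follows. The per-100-word density follows directly from the description above; the review-catch formula is an assumption (relative drop in rate between submitted and accepted rows), since the report only says it is a rate comparison. Both function names are hypothetical.

```python
from typing import Optional


def rate_per_100_words(violations: int, words: int) -> float:
    """Violation density: violations per 100 transcribed words."""
    if words <= 0:
        raise ValueError("words must be positive")
    return 100.0 * violations / words


def review_catch(submitted_rate: float, accepted_rate: float) -> Optional[float]:
    """Assumed definition: share of the submitted rate removed before acceptance.

    Returns None when there is no submitted violation rate to compare against.
    The value is negative when the accepted rate exceeds the submitted rate,
    which can happen because the two rates use different word denominators.
    """
    if submitted_rate <= 0:
        return None
    return 1.0 - accepted_rate / submitted_rate


submitted = rate_per_100_words(violations=5, words=1000)
accepted = rate_per_100_words(violations=1, words=1000)
print(submitted, accepted, review_catch(submitted, accepted))
```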

4. Inter-annotator agreement (κ)

What this answers: When two annotators see the same audio, how often do they agree? Where do they systematically diverge?

Cohen κ — hasMistakes
0.819
two-rater · pair-pooled
Fleiss κ — hasMistakes
0.818
63,671 tasks
κ — word-level mistake
0.911
closest to model supervision
κ — phoneme-level
0.879

What κ means and why we use it

Raw "% agreement" is misleading on imbalanced labels: if each of two raters marks "no mistake" on 90% of tasks, they would agree on about 82% of tasks by chance alone (0.9 × 0.9 + 0.1 × 0.1), without actually listening to the audio. Cohen's κ and Fleiss' κ correct for chance agreement: they answer "how much better than chance do the raters agree?". The formula is κ = (p_observed − p_expected) / (1 − p_expected), where p_observed is the observed agreement rate and p_expected is the agreement rate expected by chance.

  • Cohen's κ: pairwise (two raters at a time). We use it to score every annotator pair that shares at least one task. This gives us the heatmap and the worst-pair leaderboard, which are granular and name individual pairs.
  • Fleiss' κ: generalises Cohen's to multi-rater tasks. When ≥3 raters touched the same task we can't reduce it to a single pair; Fleiss aggregates across the full rater pool. We report it for the team as a whole over the 63,671 tasks that have multiple annotators. Most of those tasks have only 2 raters, so Cohen and Fleiss land very close: 0.819 vs 0.818.

Beyond hasMistakes (binary), we also report word-level κ (every word labelled as mistake-or-not — closest to the model supervision signal) and phoneme-level κ (after aligning each word's phonemes).
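As a worked illustration of the chance correction, here is a small two-rater Cohen's κ sketch on made-up labels. The function name and the example data are hypothetical, and this is not the project's implementation.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both raters gave the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal rate, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # both raters used one identical label; kappa is undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


# Made-up example: each rater flags one task out of ten, but never the same one.
rater_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
rater_b = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(cohen_kappa(rater_a, rater_b))
```

In this example raw agreement is 80%, yet chance agreement is 82%, so κ comes out slightly negative (about −0.11). That is the situation raw percentages hide.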

In §7, median_word_kappa is simply the median of each cohort's per-annotator word_kappa_mean. In plain English: for a volume cohort, it asks "what is the typical annotator's chance-corrected agreement with peers on which words are mistakes?" It is a cohort baseline, not an individual score.

How to read the numbers (Landis & Koch convention):

κ range Agreement
0.00–0.20 Slight
0.21–0.40 Fair
0.41–0.60 Moderate
0.61–0.80 Substantial
0.81–1.00 Almost perfect ← where we are
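The convention above can be applied mechanically with a small lookup. This is a sketch with a hypothetical function name; the "Poor" label for negative κ comes from the original Landis & Koch scale and is not part of the table above, and treating each upper bound as inclusive is an assumption.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to its Landis & Koch agreement band."""
    if kappa < 0.0:
        return "Poor"  # from the original scale; not listed in the table above
    bands = [
        (0.20, "Slight"),
        (0.40, "Fair"),
        (0.60, "Moderate"),
        (0.80, "Substantial"),
        (1.00, "Almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:  # assumption: upper bounds are inclusive
            return label
    raise ValueError("kappa cannot exceed 1.0")


# The four headline values reported in this section.
for value in (0.819, 0.818, 0.911, 0.879):
    print(value, landis_koch_label(value))
```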

Findings

Question this plot answers: How much text agreement do we recover as we apply stronger normalisation?
Observation: The widening gap from exact to diacritic-normalised agreement quantifies normalisation noise. The diacritic-normalised value is the best proxy for semantic transcription agreement.
Question this plot answers: Which phoneme-level substitutions recur most often between annotators?
Observation: The phoneme chart surfaces recurring sound/spelling ambiguities. High counts here can come from many tiny endings, so use examples before writing a rule change.
Question this plot answers: Which annotator pairs have the weakest word-level agreement after requiring enough shared tasks?
Caveat: Low pairwise κ identifies annotator pairs that disagree; it does not prove which annotator is wrong. Use it as a sampling list for audio review.

5. Audio duration per batch

What this answers: How does per-task audio length distribute, and how much sits above the 20 s training cap?

Findings

Question this plot answers: What is the audio-length distribution, and how many tasks cross the 20 s training cap?
Observation: The 20 s line is the training-cap boundary, not an annotation-quality threshold. Batch 35 contributes all of the over-cap clips: 36,033 of its 49,984 tasks (72.1%). Batch 34 has none.
Question this plot answers: How do the median, spread, and tail length compare between batches?
Caveat: The histogram is capped at 60 s for readability. The summary table is the safer source for cap-policy decisions because it keeps the full per-task counts.

Per-batch summary

batch n_tasks median_s p95_s over_20s over_20s_pct
Batch 34 121733 14.380 19.380 0 0.0%
Batch 35 49984 25.800 38.180 36033 72.1%
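A minimal sketch of how the count and percentage columns in this summary could be derived from per-task durations. The function name is hypothetical, and treating "over 20 s" as strictly greater than the cap is an assumption.

```python
from statistics import median
from typing import Dict, Sequence


def summarise_durations(durations_s: Sequence[float], cap_s: float = 20.0) -> Dict[str, float]:
    """Per-batch summary: task count, median length, and share above the cap."""
    n = len(durations_s)
    if n == 0:
        raise ValueError("no durations given")
    over = sum(1 for d in durations_s if d > cap_s)  # assumption: strictly above the cap
    return {
        "n_tasks": n,
        "median_s": median(durations_s),
        "over_cap": over,
        "over_cap_pct": round(100.0 * over / n, 1),
    }


# Made-up durations, only to show the shape of the output.
print(summarise_durations([10.0, 15.0, 25.0, 30.0]))

# Cross-check of the Batch 35 row: 36,033 of 49,984 tasks.
print(round(100.0 * 36033 / 49984, 1))
```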

6. Task lifecycle — where does time go?

What this answers: How long does a task spend in each pipeline stage, and where is the bottleneck?

How each metric is calculated

Producer: jobs/lifecycle/build.py → basirah.lifecycle.build_task_lifecycle. Two independent clocks per task: wall-clock timestamps (*_at) and front-end UI-active milliseconds (lead_time_ms).

metric | source | formula
wait_for_annotator_s | wall-clock | first_annotation.annotation_created_at − task.task_created_at
annotation_lead_time_total_s | UI-active | sum(annotation.lead_time_ms) / 1000 over all annotations on the task (resubmissions counted)
wait_between_annotation_and_review_s | wall-clock (inferred) | review_start_proxy − submission_proxy, where submission_proxy = first_annotation.annotation_created_at + first_annotation.lead_time_ms and review_start_proxy = first_review.created_at − first_review.lead_time_ms. Negative values dropped to NA.
review_lead_time_total_s | UI-active | sum(review.lead_time_ms) / 1000 over all review events on the task
idle_time_s | derived | total_time_to_accept_s − (annotation_lead_time_total_s + review_lead_time_total_s), i.e. wall-clock minus active UI time. Negatives dropped to NA.
total_time_to_accept_s | wall-clock | max(annotation.accepted_at) − task.task_created_at. Unresolved tasks (is_resolved = False) are excluded from the plots below.

annotation.updated_at is not used as a submission timestamp — review events and draft saves both bump it, so it isn't a reliable proxy for when the annotator stopped editing. lead_time_ms is the only honest UI-active duration in the schema.
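The two inferred metrics can be sketched directly from the formulas in the table. This is an illustration with plain function arguments rather than the real lifecycle rows; None stands in for NA, and the function signatures are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def wait_between_annotation_and_review_s(
    annotation_created_at: datetime,
    annotation_lead_time_ms: int,
    review_created_at: datetime,
    review_lead_time_ms: int,
) -> Optional[float]:
    """Queue time between the end of annotation and the start of review."""
    # Submission is approximated as creation time plus UI-active time.
    submission_proxy = annotation_created_at + timedelta(milliseconds=annotation_lead_time_ms)
    # Review start is approximated as review creation minus its UI-active time.
    review_start_proxy = review_created_at - timedelta(milliseconds=review_lead_time_ms)
    wait_s = (review_start_proxy - submission_proxy).total_seconds()
    return wait_s if wait_s >= 0 else None  # negatives dropped to NA


def idle_time_s(
    total_time_to_accept_s: float,
    annotation_lead_time_total_s: float,
    review_lead_time_total_s: float,
) -> Optional[float]:
    """Wall-clock time minus all hands-on UI time."""
    idle = total_time_to_accept_s - (annotation_lead_time_total_s + review_lead_time_total_s)
    return idle if idle >= 0 else None  # negatives dropped to NA


# Made-up timestamps: 1 min of annotation, 2 min of review, review saved at 00:10.
start = datetime(2026, 1, 1, 0, 0, 0)
print(wait_between_annotation_and_review_s(start, 60_000, start + timedelta(minutes=10), 120_000))
```

Because hands-on time is measured in minutes while the queues run to days, idle time and total elapsed time are nearly identical for most tasks.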

Findings

Median per phase, per batch

phase batch n median p25 p75 p95
Wait for annotator (queue) Batch 34 53743 45.8 d 36.5 d 74.0 d 81.4 d
Wait for annotator (queue) Batch 35 27479 21.3 d 14.5 d 28.0 d 32.9 d
Annotation hands-on (UI) Batch 34 53743 1.2 min 37.2 s 2.1 min 4.5 min
Annotation hands-on (UI) Batch 35 27479 1.8 min 1.1 min 2.9 min 5.8 min
Wait between annotation & review (queue) Batch 34 53504 12.3 d 3.9 d 24.9 d 55.5 d
Wait between annotation & review (queue) Batch 35 26226 9.0 min 3.8 min 33.5 min 11.1 h
Review hands-on (UI) Batch 34 53743 1.0 min 23.5 s 2.9 min 9.7 min
Review hands-on (UI) Batch 35 27479 2.3 min 1.3 min 4.4 min 10.9 min
Idle (total − all hands-on) Batch 34 53743 77.1 d 42.6 d 94.8 d 126.0 d
Idle (total − all hands-on) Batch 35 27479 21.8 d 14.9 d 28.5 d 33.1 d
Total elapsed Batch 34 53743 77.1 d 42.6 d 94.8 d 126.0 d
Total elapsed Batch 35 27479 21.8 d 14.9 d 28.5 d 33.1 d

Distributions (one panel per metric × batch)

Question this plot answers: For each lifecycle phase, where does time concentrate and which batch has the longer queue/tail?
Reading guide: Each lifecycle histogram has a dashed red median line. Long right tails are clipped for display only, so the median line is the stable reference point.
Question this plot answers: How often did a task bounce back to annotation before it was finally accepted, and is that different by batch?
How it was made: Same logic as the Streamlit lifecycle page: one row per resolved task from reports/lifecycle/task_lifecycle.parquet, using n_sent_back_max as the number of sent-back review rounds. Counts are grouped by batch and clipped visually at 8 rounds.
Observation: The sent-back chart counts review loops per resolved task. A rate above zero is normal; multiple rounds are the expensive cases because they add reviewer events and queue time.
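The grouping and clipping described above can be sketched as follows. Input rows here are plain (batch, n_sent_back_max) tuples rather than the parquet file, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def sent_back_histogram(
    rows: Iterable[Tuple[str, int]], clip_at: int = 8
) -> Dict[str, Counter]:
    """Count resolved tasks per batch by number of sent-back rounds.

    Any task with more than `clip_at` rounds is folded into the `clip_at`
    bucket, which mirrors the visual clipping in the chart.
    """
    counts: Dict[str, Counter] = {}
    for batch, n_sent_back_max in rows:
        bucket = min(n_sent_back_max, clip_at)
        counts.setdefault(batch, Counter())[bucket] += 1
    return counts


# Made-up rows, one per resolved task.
rows = [("Batch 34", 0), ("Batch 34", 2), ("Batch 34", 11), ("Batch 35", 0), ("Batch 35", 9)]
for batch, hist in sent_back_histogram(rows).items():
    print(batch, sorted(hist.items()))
```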

7. Annotator scorecard, volume-normalised

What this answers: Comparing annotators fairly when some produced 50× more output than others — who's actually worse than their volume cohort?

Findings

How to read this scorecard

Do not read sent_back_rate as "% of annotations sent back". Read it as workload-normalised rework pressure: sent-back review events divided by total annotations. Values above 1 mean the annotator averaged more than one sent-back event per annotation, usually because some annotations cycled through review multiple times.

mean_worse_z averages signed within-cohort z-scores. Positive means worse than peers at similar volume; negative means better. This is the fair leaderboard because heavy producers naturally accumulate more raw mistakes.
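A sketch of how a mean_worse_z style score could be computed inside one cohort. The report does not list which metrics enter the average or which standard deviation is used, so the metric set, the sign convention table, and the use of the population standard deviation below are all assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Assumption: which metrics are averaged and which direction counts as "worse".
HIGHER_IS_WORSE = {
    "sent_back_rate": True,
    "rate_silent_alif_per_100w": True,
    "accept_rate": False,
    "word_kappa_mean": False,
}


def worse_z_scores(values: List[float], higher_is_worse: bool) -> List[float]:
    """Z-scores within a cohort, signed so that positive always means worse."""
    mu, sigma = mean(values), pstdev(values)  # assumption: population std dev
    if sigma == 0:
        return [0.0] * len(values)
    sign = 1.0 if higher_is_worse else -1.0
    return [sign * (v - mu) / sigma for v in values]


def mean_worse_z(cohort: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average the signed z-scores across metrics for each annotator."""
    names = list(cohort)
    totals = {name: 0.0 for name in names}
    for metric, worse in HIGHER_IS_WORSE.items():
        zs = worse_z_scores([cohort[name][metric] for name in names], worse)
        for name, z in zip(names, zs):
            totals[name] += z
    return {name: totals[name] / len(HIGHER_IS_WORSE) for name in names}


# Made-up two-person cohort: "a" is worse than "b" on every metric.
cohort = {
    "a": {"sent_back_rate": 0.6, "rate_silent_alif_per_100w": 0.2, "accept_rate": 0.4, "word_kappa_mean": 0.8},
    "b": {"sent_back_rate": 0.2, "rate_silent_alif_per_100w": 0.1, "accept_rate": 0.6, "word_kappa_mean": 0.9},
}
print(mean_worse_z(cohort))
```

Flipping the sign for accept_rate and word_kappa_mean is what lets a single average read consistently as "worse than peers".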

Cohort summary

cohort n median_n_total median_accept_rate median_sent_back_rate median_word_kappa
heavy 24 6676.500 0.499 0.382 0.912
medium 22 2442.000 0.576 0.447 0.898
light 24 882.500 0.557 0.455 0.907
low_n 13 10.000 0.568 0.100 0.814
Observation: Cohorts are volume buckets, not quality labels. Compare an annotator to their cohort first; only then compare across cohorts.
Question this plot answers: Does annotation volume correlate with accept rate, and which cohort does each annotator belong to?
Caveat: The x-axis is log-scaled. Low-volume outliers can look visually close to heavy producers, but their rates are much less stable.

15 worst annotators relative to their cohort

annotator_name cohort n_total accept_rate sent_back_rate rate_silent_alif_per_100w word_kappa_mean mean_worse_z
Saleh Diaa Ahmed light 61 0.623 1.016 0.104 0.667 1.181
Ahmed Khalifa light 192 0.750 0.599 0.041 0.800 1.059
Marwan medium 2926 0.605 0.309 0.040 0.817 1.033
Gaber Alshykh heavy 7830 0.533 0.272 0.049 0.828 1.020
Mohamed Abdelghany heavy 4973 0.548 0.266 0.043 0.814 0.969
Abdullah Mohamed Samir medium 2002 0.584 0.463 0.064 0.839 0.857
Sherif Bakry heavy 4927 0.039 0.048 0.974 0.864 0.830
عبدالله صلاح العيسوي light 1402 0.380 0.322 0.150 0.791 0.783
Mahmoud Elsaey medium 1940 0.412 0.284 0.109 0.812 0.697
Basma Mohammad light 1263 0.781 0.678 0.006 0.828 0.648
مودة جمال light 203 0.478 0.458 0.342 0.763 0.643
Ibrahim zaid light 1542 0.423 0.309 0.102 0.788 0.606
Aya Mostafa heavy 7811 0.382 0.251 0.051 0.859 0.491
ghada ahmed medium 2674 0.427 0.255 0.035 0.820 0.485
ياسر ربيع heavy 5004 0.472 0.447 0.286 0.872 0.450
Observation: This table intentionally excludes low_n annotators. The listed people are not necessarily the worst raw counts; they are worst relative to peers with similar annotation volume.

8. Silent-alif forensics

What this answers: Where does the Uthmani-form violation enter the pipeline, and when?

preann seed has SA
56,096
21.1%
annotation has SA
7,272
2.7%
accepted has SA
507
0.2%
annotator removed
49,598
seed → submission cleanup

Findings

Question this plot answers: At which pipeline stage does the silent-alif violation enter, get fixed, or leak into accepted text?
Observation: The funnel shows the violation shrinking sharply from preannotation to submission, which means annotators are mostly cleaning an upstream seed problem. The remaining accepted leaks are reviewer enforcement failures.
Question this plot answers: Which annotators account for the most accepted silent-alif leaks by raw count?
Caveat: Accepted-leak counts are volume-sensitive. Use this chart to pick examples, then pair it with per-100-word rates before coaching.
Question this plot answers: Which reviewers accepted the most silent-alif leaks by raw count?
Observation: Reviewer leak counts identify where final acceptance allowed a known violation through. This is an enforcement signal, not a producer-rate signal.
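For orientation, a silent-alif check of the kind counted in the funnel might look like the sketch below. The rule rows are taken from the top word-confusion table (diacritic-stripped view), reading word A as the silent-alif spelling and word B as the accepted spelling; that reading, the exact-match lookup, and both function names are assumptions, and the real rule table in rule_violations.py is not reproduced here.

```python
# Assumption: word A in the confusion table is the violation, word B the fix.
SILENT_ALIF_RULES = {
    "ذالك": "ذلك",
    "هاذا": "هذا",
    "الرحمان": "الرحمن",
    "ولاكن": "ولكن",
    "كذالك": "كذلك",
    "هاذه": "هذه",
}


def count_silent_alif(text: str) -> int:
    """Count whitespace-separated words that match a silent-alif rule.

    Expects diacritic-stripped text; words with diacritics will not match.
    """
    return sum(1 for word in text.split() if word in SILENT_ALIF_RULES)


def fix_silent_alif(text: str) -> str:
    """Rewrite matching words to the accepted spelling, leaving others as-is."""
    return " ".join(SILENT_ALIF_RULES.get(word, word) for word in text.split())


sample = "ذالك الكتاب"
print(count_silent_alif(sample))
print(fix_silent_alif(sample))
```

An exact-match table like this misses prefixed forms unless they are listed separately (the confusion table shows وكذالك as its own row), so a production rule set would need either more rows or prefix handling.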

Recommended actions

  1. Fix the silent-alif seed in tdreeb. Add an Imlaei normalisation step in tdreeb/jobs/create_tasks/pipeline before referenceText is written into Tawseem. The rule table in basirah/src/rule_violations.py is portable.
  2. 1:1 with Sherif Bakry on silent-alif. He is a heavy contributor on both sides (4.9k annotations + 27.3k reviews), has the highest annotator production rate of the violation (0.97/100w), and as a reviewer lets through 36% of all accepted leaks. Walk through 5 examples from reports/audit/silent_alif_per_annotation.parquet filtered to his rows.
  3. Split annotator_focus_issue into granular RCA buckets. It accounts for 28% of all annotations and is 3× the size of the next RCA, so it is too broad to drive targeted fixes. Replace it with specific sub-causes such as missed word, wrong passage, skipped segment, uncertainty handling, speed slip, or UI/workflow issue.
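  4. Triage the Batch 34 review queue. A median of 12.3 days between submission and review (against 9.0 minutes in Batch 35) is the biggest post-submission lever to shorten end-to-end time; the 45.8-day median wait for an annotator is longer but sits upstream of review. Staff or re-route reviewers.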
  5. Decide Batch 35 over-cap policy. 72% over 20 s is unusable for training as-is. Either re-segment those clips before annotation, or lift the training cap and accept the memory cost.
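  6. Coach the worst-relative-to-cohort annotators (Saleh Diaa Ahmed, Ahmed Khalifa, Marwan, Gaber Alshykh, Mohamed Abdelghany). The first four have a mean_worse_z above +1 and Mohamed Abdelghany sits just below at 0.969, so all five are roughly one standard deviation worse than their volume peers on average across the scored metrics.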
  7. Clarify the hasMistakes flag SOP. Top-5 annotators all sit at ~41% "hasMistakes=True but text matches" — that's an interpretation problem, not a personal one.