Live-pipeline leak — empirical validation

Author: Khaled Abdelhamid · Generated 2026-05-25 · Data source: ClickHouse tarteeldb.public_* · For: TA-4670 preflight gate decision & tomorrow's MCP review meeting
TL;DR. The leak TA-4634 identified is still active at ~7.8 users/day persistent (~0.1% race rate on trial-end transitions, not the ~6K/day TA-4634 projected). About 1,600 additional users are in a transient ~24-hour cleanup window at any given moment — handled by the existing 01:30 UTC process_recording_tagging_queue. The safety-net (TA-4917) catches the persistent residue: ~55 users/week × ~8–10 sessions ≈ ~450–550 sessions per weekly run, ~3–8 seconds wall-clock.

1. What we measured

A leaked session is a profiles_quransession row where:

Restricted to sessions created on or after 2026-05-05 04:19 UTCTA-4632's S3-inventory cutoff — so we measure only the live leak rate, not the historical residue (343 M sessions, separately addressed by TA-4670 Workstream 1).

Query (ClickHouse, executed against tarteeldb.public_*): saved at /tmp/leak_validation_main.sql (embedded in §7). Entitlement CTE mirrors basirah-lab:jobs/orphan_audio_audit/identify_users.sql verbatim except that students_pg (PG-only export) is omitted — at worst slightly overcounts likely_leaked by the count of active students (≪ 1% of users).

2. Empirical findings

2.1 Persistent race-failure rate (per lapse-date, last 21 days)

What this chart shows: for each leaked user, their most-recent lapse moment binned by day. Today's transient cohort (users whose trial just ended and will be cleaned by tomorrow's 01:30 UTC cron) is shown separately on the right — it is not what the safety-net targets. The visible signal is the persistent race-failure backlog: ~5–15 users/day across every past day, flat.
20 15 10 5 0 mean ≈ 7.8/day 05-06 05-11 05-16 05-21 05-25 persistent race-failure backlog (lapse-date) ≈ 1,622 (scale break) 05-26 (today) trial just ended transient cohort
Left panel: persistent race-failure backlog by lapse date. ~5–15 users/day, mean ≈ 7.8/day, flat across the past 20 days. This is what the safety-net catches.
Right panel (scale break): today's transient cohort (1,622 users whose trial ended today, awaiting tomorrow's 01:30 UTC cleanup). Not on the safety-net's path.
Lapse dayPersistent race-failuresNote
2026-05-26 (today)1,622Transient — trial ending today, will be cleaned by next 01:30 UTC cron run.
2026-05-2511Persistent
2026-05-249Persistent
2026-05-2310Persistent
2026-05-227Persistent
2026-05-2113Persistent
2026-05-2013Persistent
2026-05-1916Persistent (10 trial-driven, 6 sub-driven)
5-18 → 5-06flat ~5–13/dayPersistent

Per-week math for the safety-net: ~7.8 persistent race-failures/day × 7 days ≈ ~55 users/week × ~8–10 sessions/user ≈ ~450–550 sessions/week to flip. That's the workload the safety-net actually catches (the existing 01:30 UTC cron handles the transient cohort).

2.2 Composition of leaked users

6,175 unique users were counted leaked at the 2026-05-25 snapshot. The entitlement-history breakdown below is informative: ~all leaked users have a free-trial history, ~none lack any entitlement signal entirely — confirming the gate works at session-create and the bug is in the post-trial path.

CategoryUsersShareInterpretation
Ever had a free trial (any time)6,16299.8%~all leaked users have a trial somewhere in their history. Note: trials are nearly universal in Tarteel (auto-grant on first login), so this is closer to "active user" than "trial lapser".
Ever had a subscription1081.7%Tiny minority. Mostly old churners (6,073 subs expired > 90d ago across all leaked users — overlaps with "ever had a subscription" weakly).
· Subscription expired in last 30d1021.7%Within our measurement window — these are recent paid-tier lapsers.
· Subscription expired 30–90d ago00%Suspiciously empty — likely a clustering artifact in the cohort.
· Subscription expired > 90d ago6,07398.3%**Likely overlap with the "ever had trial" group: same user with both signals counted in both rows.
Ever had family-plan membership40.06%Negligible — not a meaningful lapse path here.
Ever had alim beneficiary1332.2%Small. Likely access-revoked cases.
Never had ANY entitlement signal2~0%Almost zero. Means the PR #1564 entitlement gate is doing its job at session-create time — <1 per ten thousand sessions bypass it.
Most useful read: the leak is overwhelmingly users who consumed a 7-day free trial, exited, but somehow continue producing has_audio=TRUE sessions for now-non-entitled accounts. PR #1564's session-create gate is holding (only 2 users with zero entitlement signal). The bug is in the post-trial path — TA-4634's exact diagnosis: trial expiry races / lost events in profiles/tasks/user_management.py:92 failing to flip has_audio.

3. Why the leak happens — root cause (from TA-4634, structurally unchanged)

TA-4634 traced this to two compounding code-path facts in core-api:

  1. PR #1564 (TA-4027, Feb 2026) was the first commit to gate has_audio at session create on entitlement (if not user.should_store_audio: has_audio = False). The gate is holding: today, only ~2 of 6,175 leaked users had no entitlement signal ever.
  2. process_recording_tagging_queue is event-driven — it consumes a Redis queue of downgrade events. The lapse paths that don't enqueue an event leave the flag stuck at TRUE:
    • Trial expiry (profiles/tasks/user_management.py:92) — enqueues with eligible_at = now(), but if anything filters the user out (e.g. they re-subscribed between trial-end and queue run, then unsubscribed), the flag isn't flipped. This is the dominant lapse path in our data.
    • Subscription EXPIRED transitions that bypass post_subscription_downgrade_events (webhook dedup edge cases).
    • Family-plan / alim / student status changes that don't trigger process_recording_tagging_queue at all.

None of the 4 commits to core-api/main since TA-4634 was filed (2026-05-06) touch profiles/tasks/user_data.py, profiles/tasks/user_management.py, or the _delete_audio_recordings path. Verified by:

git log --since=2026-05-06 --oneline origin/main -- \
  profiles/tasks/user_data.py profiles/tasks/user_management.py \
  profiles/models/users.py billing/ profiles/views/webhooks.py

Adjacent activity worth noting: Yūnus merged #1622 today ("Add event tracking for expired event") which instruments the billing expiration task but doesn't patch the lapse-pipeline gap. He may be staging diagnostic work in this area.

4. What it consists of — the leaked sessions themselves

Reading the chart in §2.1 directly:

5. Recommended action — TA-4917 (filed, PR open)

Weekly safety-net. CH identifies leaked sessions, PG flips them via async psycopg PK lookup. Sundays 02:45 UTC — runs after the existing 01:30 UTC process_recording_tagging_queue so it catches only the residual race-failures (~7.8/day × 7d ≈ ~55 users, ~450–550 sessions per week — see §2.1).

Final shape (matches the codebase pattern):

Ticket: TA-4917. PR: core-api#1624.

6. Implications for TA-4670

7. Queries (reproducible)

All queries were executed against ClickHouse Cloud (v12748f444.us-west-2.aws.clickhouse.cloud) with the read-only khaled_ro credentials per the local clickhouse skill. Saved alongside this report.

Q1 — Main: daily leak rate (post-audit-cutoff, last 21 days)
File: /tmp/leak_validation_main.sql · Returns: day, new has_audio=TRUE sessions, likely leaked, leak %
-- Leak validation: is the live-pipeline gap still active 20 days post-audit?
-- Mirrors identify_users.sql entitlement logic (omits students_pg).
-- See: TA-4634 (Done 2026-05-06), basirah-lab/jobs/orphan_audio_audit/identify_users.sql

WITH
    now64() AS run_now,
    toDateTime64('2026-05-05 04:19:00', 6, 'UTC') AS audit_cutoff,

    entitled_users AS (
        SELECT DISTINCT user_id FROM tarteeldb.public_billing_subscription FINAL
        WHERE _peerdb_is_deleted = 0 AND status IN ('ACTIVE', 'GRACE', 'CANCELLED')
        UNION DISTINCT
        SELECT id AS user_id FROM tarteeldb.public_profiles_user FINAL
        WHERE _peerdb_is_deleted = 0 AND subscription_type = 'PREMIUM'
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_profiles_freetrial FINAL
        WHERE _peerdb_is_deleted = 0 AND started_at <= run_now AND ended_at > run_now
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_profiles_familymember FINAL
        WHERE _peerdb_is_deleted = 0
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_alim_beneficiary FINAL
        WHERE _peerdb_is_deleted = 0 AND access_revoked >= run_now
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_billing_subscription FINAL
        WHERE _peerdb_is_deleted = 0 AND start_date <= run_now - INTERVAL 2 YEAR
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_billing_subscription FINAL
        WHERE _peerdb_is_deleted = 0 AND status = 'EXPIRED' AND expired_at > run_now - INTERVAL 7 DAY
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_billing_subscriptionevent FINAL
        WHERE _peerdb_is_deleted = 0
          AND type IN ('EXPIRED', 'CANCELLED')
          AND created_at > run_now - INTERVAL 7 DAY
    )

SELECT
    toDate(created_at) AS day,
    count() AS new_has_audio_sessions,
    countIf(user_id NOT IN (SELECT user_id FROM entitled_users)) AS likely_leaked,
    round(
        countIf(user_id NOT IN (SELECT user_id FROM entitled_users))
        / nullIf(count(), 0) * 100,
    1) AS leak_pct
FROM tarteeldb.public_profiles_quransession FINAL
WHERE _peerdb_is_deleted = 0
  AND created_at >= audit_cutoff
  AND has_audio = TRUE
  AND session_type = 'RECITE'
  AND is_deleted = FALSE
GROUP BY day
ORDER BY day DESC
FORMAT TabSeparatedWithNames
Q2 — Composition: per-lapse-path user breakdown
File: /tmp/leak_composition.sql · Returns: vertical single-row summary (subscription / trial / family / alim / never-had-any)
-- Composition of the leak: per "lapse path", how many leaked USERS have that history?
-- For each leaked user (currently non-entitled, has_audio=TRUE post-cutoff session),
-- classify what their last-known entitlement signal was.

WITH
    now64() AS run_now,
    toDateTime64('2026-05-05 04:19:00', 6, 'UTC') AS audit_cutoff,

    entitled_users AS (
        -- (same 8-branch UNION as Q1 — see above)
        SELECT DISTINCT user_id FROM tarteeldb.public_billing_subscription FINAL
        WHERE _peerdb_is_deleted = 0 AND status IN ('ACTIVE', 'GRACE', 'CANCELLED')
        UNION DISTINCT
        SELECT id AS user_id FROM tarteeldb.public_profiles_user FINAL
        WHERE _peerdb_is_deleted = 0 AND subscription_type = 'PREMIUM'
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_profiles_freetrial FINAL
        WHERE _peerdb_is_deleted = 0 AND started_at <= run_now AND ended_at > run_now
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_profiles_familymember FINAL
        WHERE _peerdb_is_deleted = 0
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_alim_beneficiary FINAL
        WHERE _peerdb_is_deleted = 0 AND access_revoked >= run_now
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_billing_subscription FINAL
        WHERE _peerdb_is_deleted = 0 AND start_date <= run_now - INTERVAL 2 YEAR
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_billing_subscription FINAL
        WHERE _peerdb_is_deleted = 0 AND status = 'EXPIRED' AND expired_at > run_now - INTERVAL 7 DAY
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_billing_subscriptionevent FINAL
        WHERE _peerdb_is_deleted = 0
          AND type IN ('EXPIRED', 'CANCELLED')
          AND created_at > run_now - INTERVAL 7 DAY
    ),

    leaked_users AS (
        SELECT DISTINCT user_id
        FROM tarteeldb.public_profiles_quransession FINAL
        WHERE _peerdb_is_deleted = 0
          AND created_at >= audit_cutoff
          AND has_audio = TRUE
          AND session_type = 'RECITE'
          AND is_deleted = FALSE
          AND user_id NOT IN (SELECT user_id FROM entitled_users)
    ),

    -- Per leaked user: do they have a record in each source?
    sub_history AS (
        SELECT DISTINCT user_id, max(expired_at) AS last_expired
        FROM tarteeldb.public_billing_subscription FINAL
        WHERE _peerdb_is_deleted = 0
          AND user_id IN (SELECT user_id FROM leaked_users)
        GROUP BY user_id
    ),
    trial_history AS (
        SELECT DISTINCT user_id, max(ended_at) AS last_ended
        FROM tarteeldb.public_profiles_freetrial FINAL
        WHERE _peerdb_is_deleted = 0
          AND user_id IN (SELECT user_id FROM leaked_users)
        GROUP BY user_id
    ),
    family_history AS (
        SELECT DISTINCT user_id
        FROM tarteeldb.public_profiles_familymember FINAL
        WHERE user_id IN (SELECT user_id FROM leaked_users)
    ),
    alim_history AS (
        SELECT DISTINCT user_id
        FROM tarteeldb.public_alim_beneficiary FINAL
        WHERE user_id IN (SELECT user_id FROM leaked_users)
    )

SELECT
    countDistinct(l.user_id) AS total_leaked_users,
    countDistinctIf(l.user_id, sh.user_id != 0) AS ever_had_subscription,
    countDistinctIf(l.user_id, sh.last_expired > run_now - INTERVAL 30 DAY) AS sub_expired_last_30d,
    countDistinctIf(l.user_id, sh.last_expired BETWEEN run_now - INTERVAL 90 DAY AND run_now - INTERVAL 30 DAY) AS sub_expired_30d_90d,
    countDistinctIf(l.user_id, sh.last_expired < run_now - INTERVAL 90 DAY) AS sub_expired_over_90d,
    countDistinctIf(l.user_id, th.user_id != 0) AS ever_had_trial,
    countDistinctIf(l.user_id, fh.user_id != 0) AS ever_had_family,
    countDistinctIf(l.user_id, ah.user_id != 0) AS ever_had_alim,
    countDistinctIf(l.user_id,
        sh.user_id = 0 AND th.user_id = 0 AND fh.user_id = 0 AND ah.user_id = 0
    ) AS never_had_any_entitlement
FROM leaked_users l
LEFT JOIN sub_history sh ON sh.user_id = l.user_id
LEFT JOIN trial_history th ON th.user_id = l.user_id
LEFT JOIN family_history fh ON fh.user_id = l.user_id
LEFT JOIN alim_history ah ON ah.user_id = l.user_id
FORMAT Vertical
Q3 — Subscription expiration pattern (last 30 days) — ruled out Stripe-renewal hypothesis
File: /tmp/leak_expiration_pattern.sql · Returns: expirations per day. Result was flat ~250–300/day, no spike around 2026-05-11.
-- Hypothesis: the 2026-05-18 leak cliff is driven by a batch of subscription
-- EXPIRED events around 2026-05-11 (7 days earlier — grace period rolls off).
-- PR #1621 (Yūnus, 2026-05-25) fixed "Stripe renewal transactions being silently
-- dropped" + ships scripts/backfill_stripe_renewals.py. If that's the cause,
-- we should see an anomalous spike of EXPIRED events around 2026-05-11.

WITH
    now64() AS run_now

SELECT
    toDate(expired_at) AS expired_day,
    count() AS expired_subscriptions,
    -- Compare to active count for context
    countIf(status = 'EXPIRED') AS now_status_expired,
    countIf(status = 'ACTIVE') AS now_status_active,
    countIf(status = 'GRACE') AS now_status_grace,
    countIf(status = 'CANCELLED') AS now_status_cancelled
FROM tarteeldb.public_billing_subscription FINAL
WHERE _peerdb_is_deleted = 0
  AND expired_at >= run_now - INTERVAL 30 DAY
  AND expired_at <= run_now
GROUP BY expired_day
ORDER BY expired_day DESC
FORMAT TabSeparatedWithNames
Q4 — Free trial ending pattern (last 30 days) — ruled out trial-cohort hypothesis
File: /tmp/leak_trial_pattern.sql · Returns: trial-end events per day. Result was flat ~7,500/day, avg trial length 7 days.
-- Free trials ending by day for last 30 days. Looking for cohort spike that
-- explains the 2026-05-18 leak cliff. Trials have no grace period per the audit
-- (profiles/tasks/user_management.py:92), so trial-end → immediate "non-entitled".

WITH
    now64() AS run_now

SELECT
    toDate(ended_at) AS trial_ended_day,
    count() AS trials_ending,
    -- For these trials, were they actually consumed (started_at < ended_at)?
    countIf(started_at < ended_at) AS started_before_end,
    -- Average trial length in days
    round(avg(dateDiff('day', started_at, ended_at)), 1) AS avg_trial_days
FROM tarteeldb.public_profiles_freetrial FINAL
WHERE _peerdb_is_deleted = 0
  AND ended_at >= run_now - INTERVAL 30 DAY
  AND ended_at <= run_now
GROUP BY trial_ended_day
ORDER BY trial_ended_day DESC
FORMAT TabSeparatedWithNames
Q5 — User concentration per day (sessions-per-leaked-user)
File: /tmp/leak_concentration.sql · Returns: leaked sessions, unique leaked users, and avg sessions per user per day. Confirms anomaly is many users (5,736 on 2026-05-18) not a few power-users.
WITH
    now64() AS run_now,
    toDateTime64('2026-05-05 04:19:00', 6, 'UTC') AS audit_cutoff,

    entitled_users AS (
        -- (same 8-branch UNION as Q1 — see above)
        SELECT DISTINCT user_id FROM tarteeldb.public_billing_subscription FINAL
        WHERE _peerdb_is_deleted = 0 AND status IN ('ACTIVE', 'GRACE', 'CANCELLED')
        UNION DISTINCT
        SELECT id AS user_id FROM tarteeldb.public_profiles_user FINAL
        WHERE _peerdb_is_deleted = 0 AND subscription_type = 'PREMIUM'
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_profiles_freetrial FINAL
        WHERE _peerdb_is_deleted = 0 AND started_at <= run_now AND ended_at > run_now
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_profiles_familymember FINAL
        WHERE _peerdb_is_deleted = 0
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_alim_beneficiary FINAL
        WHERE _peerdb_is_deleted = 0 AND access_revoked >= run_now
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_billing_subscription FINAL
        WHERE _peerdb_is_deleted = 0 AND start_date <= run_now - INTERVAL 2 YEAR
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_billing_subscription FINAL
        WHERE _peerdb_is_deleted = 0 AND status = 'EXPIRED' AND expired_at > run_now - INTERVAL 7 DAY
        UNION DISTINCT
        SELECT DISTINCT user_id FROM tarteeldb.public_billing_subscriptionevent FINAL
        WHERE _peerdb_is_deleted = 0
          AND type IN ('EXPIRED', 'CANCELLED')
          AND created_at > run_now - INTERVAL 7 DAY
    )

SELECT
    toDate(created_at) AS day,
    count() AS leaked_sessions,
    countDistinct(user_id) AS leaked_users,
    round(count() / nullIf(countDistinct(user_id), 0), 2) AS avg_sessions_per_user
FROM tarteeldb.public_profiles_quransession FINAL
WHERE _peerdb_is_deleted = 0
  AND created_at >= audit_cutoff
  AND has_audio = TRUE
  AND session_type = 'RECITE'
  AND is_deleted = FALSE
  AND user_id NOT IN (SELECT user_id FROM entitled_users)
GROUP BY day
ORDER BY day DESC
FORMAT TabSeparatedWithNames
References