- Stripe Connect — the payment rail itself, run by Stripe. Stripe’s uptime track record applies directly to subscription billing.
- Recurr’s webhook + control layer — the events stream, retry logic, replay surface, and operator dashboard.
- The customer’s downstream systems — entitlement sync, BI ingest, CS tooling. Recurr’s webhook delivery is the contract; downstream handling is the customer’s.
Service level commitments
Formal SLA terms are finalised in customer-specific MSAs. The operational targets Recurr runs against:

| Surface | Target |
|---|---|
| Webhook delivery (first attempt) | < 5 seconds from billing-event trigger |
| Webhook delivery (after retry) | 99.9% success within 24 hours |
| Subscription state read API | < 200ms p95 |
| Replay API | < 60 seconds for cohort up to 10K events |
| Operator dashboard | 99.9% availability |
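
To make the percentage and throughput targets concrete, the arithmetic below converts them into budgets. It is a minimal sketch: the 30-day measurement window and the helper names (`downtime_budget_minutes`, `required_replay_rate`) are illustrative assumptions, not contract terms. The MSA defines the actual measurement window.

```python
# Illustrative arithmetic only: converts the operational targets in the table
# into concrete budgets. The 30-day window is an assumption, not a contract term.


def downtime_budget_minutes(availability: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability target (e.g. 0.999)."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - availability)


def required_replay_rate(events: int, seconds: float) -> float:
    """Minimum sustained events/second needed to replay `events` within `seconds`."""
    return events / seconds


if __name__ == "__main__":
    # Operator dashboard at 99.9% over a 30-day window.
    print(f"{downtime_budget_minutes(0.999):.1f} min/month")  # 43.2 min/month
    # Replay API: a 10K-event cohort in under 60 seconds.
    print(f"{required_replay_rate(10_000, 60):.1f} events/s")  # 166.7 events/s
```

Under that assumption, the 99.9% dashboard target leaves roughly 43 minutes of unavailability per 30-day month, and the Replay target implies a sustained rate of at least about 167 events per second.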
Incident response
When something breaks, Recurr’s response follows the same pattern:
Detection
Monitoring alerts on known failure signatures: webhook delivery failure rate above baseline, latency spikes, error-code anomalies, and dependency degradation (Stripe, Resend, downstream destinations). Synthetic checks run continuously against the API surface.
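One way to express a "delivery failure rate above baseline" rule is to compare the current window against the mean failure rate of recent windows. This is a hypothetical sketch: `DeliveryWindow`, `should_alert`, the 3× multiplier and the 1% floor are illustrative assumptions, not Recurr's actual alerting configuration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DeliveryWindow:
    """Webhook delivery counts for one monitoring window (hypothetical shape)."""
    attempts: int
    failures: int

    @property
    def failure_rate(self) -> float:
        return self.failures / self.attempts if self.attempts else 0.0


def should_alert(
    history: list[DeliveryWindow],
    current: DeliveryWindow,
    multiplier: float = 3.0,  # assumed: alert at 3x the baseline rate
    floor: float = 0.01,      # assumed: never alert below a 1% failure rate
) -> bool:
    """Alert when the current failure rate exceeds multiplier x baseline.

    The baseline is the mean failure rate of past windows; the floor keeps
    a near-zero baseline from paging on a handful of failures.
    """
    baseline = mean(w.failure_rate for w in history) if history else 0.0
    threshold = max(baseline * multiplier, floor)
    return current.failure_rate > threshold


if __name__ == "__main__":
    history = [DeliveryWindow(attempts=1000, failures=2) for _ in range(5)]
    print(should_alert(history, DeliveryWindow(1000, 5)))   # False
    print(should_alert(history, DeliveryWindow(1000, 50)))  # True
```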
Triage + response
On-call engineer paged within 5 minutes of detection. Initial response within 15 minutes — acknowledgement, scope assessment, communication plan.
Communication
Status page updated within 30 minutes of detection. For customer-impacting incidents, designated technical + commercial contacts notified directly. Updates at 30-minute intervals while the incident is open.
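The detection-to-communication targets can be turned into concrete deadlines for a given incident. A minimal sketch, assuming the 30-minute update cadence is counted from the first status-page post (the text does not say); `IncidentClock` and `incident_deadlines` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class IncidentClock:
    page_by: datetime
    initial_response_by: datetime
    status_page_by: datetime
    update_due: list[datetime] = field(default_factory=list)


def incident_deadlines(detected_at: datetime, resolved_at: datetime) -> IncidentClock:
    """Deadlines from the response targets, for an incident open until resolved_at."""
    status_page_by = detected_at + timedelta(minutes=30)
    clock = IncidentClock(
        page_by=detected_at + timedelta(minutes=5),
        initial_response_by=detected_at + timedelta(minutes=15),
        status_page_by=status_page_by,
    )
    # Assumption: updates every 30 minutes after the first status-page post,
    # for as long as the incident stays open.
    next_update = status_page_by + timedelta(minutes=30)
    while next_update <= resolved_at:
        clock.update_due.append(next_update)
        next_update += timedelta(minutes=30)
    return clock


if __name__ == "__main__":
    detected = datetime(2024, 5, 17, 12, 0, tzinfo=timezone.utc)
    clock = incident_deadlines(detected, resolved_at=detected + timedelta(hours=1, minutes=45))
    print(clock.page_by.time(), clock.status_page_by.time())  # 12:05:00 12:30:00
    print([t.time().isoformat(timespec="minutes") for t in clock.update_due])  # ['13:00', '13:30']
```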
Error modes + handling
The common failure surfaces and how Recurr handles them:
Webhook delivery failure to customer endpoint
The customer’s endpoint returns a 5xx or times out. Recurr retries on a backoff schedule (1m, 5m, 30m, 2h, 12h, then every 24h) for up to 7 days. After 7 days, the event moves to a dead-letter queue for manual replay. Each event’s `event_id` stays stable across retries, so downstream deduplication stays straightforward. See the retry policy for full mechanics.
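The retry schedule, and the consumer-side deduplication a stable event ID enables, can be sketched as follows. This is an illustration under stated assumptions, not Recurr's implementation: each delay is taken as measured from the previous attempt, the 7-day limit as measured from the first failed delivery, and the payload is assumed to be a dict carrying `event_id`. `retry_offsets` and `handle_webhook` are hypothetical names.

```python
from datetime import timedelta

# Documented schedule: 1m, 5m, 30m, 2h, 12h, then every 24h, for up to 7 days.
# Assumptions: each delay runs from the previous attempt; the 7-day window
# runs from the first failed delivery.
INITIAL_DELAYS = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=12),
]
REPEAT_DELAY = timedelta(hours=24)
RETRY_WINDOW = timedelta(days=7)


def retry_offsets() -> list[timedelta]:
    """Offsets (from the first failure) at which retries fire before dead-lettering."""
    offsets: list[timedelta] = []
    elapsed = timedelta(0)
    attempt = 0
    while True:
        delay = INITIAL_DELAYS[attempt] if attempt < len(INITIAL_DELAYS) else REPEAT_DELAY
        elapsed += delay
        if elapsed > RETRY_WINDOW:
            return offsets  # next retry would fall outside 7 days: dead-letter
        offsets.append(elapsed)
        attempt += 1


# Consumer side: because event_id is stable across retries, a handler can skip
# events it has already applied. An in-memory set is for illustration only;
# a real handler would persist processed IDs.
_processed: set[str] = set()


def handle_webhook(event: dict) -> bool:
    """Apply the event once; return False for a redelivered duplicate."""
    event_id = event["event_id"]
    if event_id in _processed:
        return False
    # ... update entitlements, BI, etc. here ...
    _processed.add(event_id)
    return True


if __name__ == "__main__":
    offsets = retry_offsets()
    print(len(offsets), offsets[-1])  # 11 6 days, 14:36:00
    print(handle_webhook({"event_id": "evt_123"}), handle_webhook({"event_id": "evt_123"}))  # True False
```

Under these assumptions the schedule yields 11 retries, the last about 6 days 14.5 hours after the first failure. Recording the ID only after the work succeeds means a handler that fails partway will pick the event up again on the next retry.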
Subscription state collision (Apple receipt vs. web sub)
If a subscriber holds both an active Apple receipt and an active web subscription (a mid-migration race condition), Recurr writes both states to the entitlement system and flags the conflict. The customer’s entitlement logic decides how to resolve it; Recurr surfaces the conflict for manual review.
Failed payment on web sub
Stripe’s Smart Retries recover ~30% of involuntary churn on the first failure. Recurr’s dunning layer then runs the configured recovery cadence (email and in-portal) before the subscription cancels. The full recovery sequence is emitted as events your CS team can react to in real time.
Identity bridge failure (migration arrival)
If a migrated subscriber arrives at branded checkout and the identity bridge can’t match them to their existing app account, the flow routes to a manual-resolution surface. Recurr’s CX support layer holds the subscriber’s state; the customer’s CX team gets a ticket with the relevant subscription context.
Stripe Connect dependency
Stripe is Recurr’s payment rail, so Stripe outages affect subscription billing. Recurr can’t insulate customers from Stripe-side incidents, but it communicates them clearly when they happen: Stripe status flows into Recurr’s status page.
On-call coverage
While Recurr is pre-customer, on-call is founder-led (Matt) with 24/7 paging. Response targets:
- Critical incident (customer impact, data integrity): < 15 minutes
- High severity (operational degradation): < 1 hour
- Standard (non-blocking issue): next business day
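
The severity tiers map to a small lookup that turns an incident's open time into a response deadline. A hypothetical sketch: `Severity`, `response_due`, and the reading of "next business day" as the end of the next Monday-to-Friday day (public holidays ignored) are assumptions, not documented behaviour.

```python
from datetime import date, datetime, time, timedelta
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"  # customer impact, data integrity
    HIGH = "high"          # operational degradation
    STANDARD = "standard"  # non-blocking issue


# Response targets from the list above.
RESPONSE_WINDOWS = {
    Severity.CRITICAL: timedelta(minutes=15),
    Severity.HIGH: timedelta(hours=1),
}


def next_business_day(day: date) -> date:
    """Next Monday-Friday after `day`; public holidays are ignored (assumption)."""
    nxt = day + timedelta(days=1)
    while nxt.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt


def response_due(severity: Severity, opened_at: datetime) -> datetime:
    """Latest acceptable first response for an incident opened at `opened_at`."""
    if severity in RESPONSE_WINDOWS:
        return opened_at + RESPONSE_WINDOWS[severity]
    # Assumption: "next business day" means by the end of that day, same timezone.
    return datetime.combine(
        next_business_day(opened_at.date()), time(23, 59, 59), tzinfo=opened_at.tzinfo
    )


if __name__ == "__main__":
    friday_4pm = datetime(2024, 5, 17, 16, 0)
    print(response_due(Severity.CRITICAL, friday_4pm))  # 2024-05-17 16:15:00
    print(response_due(Severity.STANDARD, friday_4pm))  # 2024-05-20 23:59:59
```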
Status page
Recurr’s status page is at recurr.instatus.com. All incidents, scheduled maintenance, and dependency degradations land there with timestamps and impact scope. Subscribe via email or RSS for incident notifications.
What’s not yet formalised
Honest pre-customer framing:
- SOC 2 Type II: in progress. Type I is targeted alongside the first customer deployment, Type II 12 months after that. See compliance posture.
- Multi-region failover: currently single-region (US-East). Multi-region with active failover is scoped for after the fifth customer.
- Formal SLA percentages with credits: finalised in customer-specific MSAs. The operational targets above are what Recurr runs against; the contractual percentages reflect the same targets with explicit credit mechanics.
Cross-references
- Risk register: what could go wrong, segmented by exposure
- Compliance posture: SOC 2, GDPR, audit posture
- Infrastructure: where data lives, encryption, access controls
- API reference overview: webhook delivery and retry mechanics
