Riverbend Veterinary Group synthetic data methodology

Riverbend is a fictional 4-hospital specialty/ER/urgent-care network we built to demo Central Uplift. Every number you see in the demo is generated; no real patient, clinician, or financial record is involved. This document covers how the data is produced, which distributions are grounded in published research, and the seeded anomalies you can verify yourself. The lead anomaly, and the demo’s lead story, is charge integrity: completed care that got charted but never made it onto a bill, reconciled across every PMS the network runs.

Your practice-management software catches only the charges your team enters in that one system; we show you the completed care that was charted but never billed across all the systems your hospitals run, including the older sites that catch none of it.

Network composition

Hospital	Type	PMS	Location	Open since	Integration date
Riverbend Specialty Pittsburgh	specialty	ezyVet	Pittsburgh PA	2019-03-12	2025-02-04
Riverbend Emergency Cleveland	ER	Instinct	Cleveland OH	2017-06-22	2024-11-18
Riverbend Urgent Care Columbus	urgent care	Cornerstone	Columbus OH	2021-07-08	2025-01-06
Riverbend Specialty Akron	specialty	AVImark	Akron OH	2018-09-01	2026-03-15 (recent acq.)

~8-month visit window: September 2025 → April 2026. Akron only has data from 2026-03-15 forward, matching its post-acquisition integration moment. The PMS mix is the point: two modern self-serve PIMS (ezyVet, Instinct), a Cornerstone site, and a legacy AVImark site with no native charge capture.

Volumes (synthetic targets vs. realized)

Entity	Target	Realized
Clinicians	~64 (~16/hosp)	56
Patients	~7,800	7,617
Referring practices	~110	110
Visits	~28,000	27,632
Procedures (line items)	~100,000	102,332
Referrals	~4,800	4,820
rDVM communications	~3,700	3,754
Daily aggregates	one per (hospital, day) with traffic	773

Procedures per non-empty visit run ~3.9 — a real ER/specialty episode bills several lines (exam + diagnostics + fluids + an injection + nursing/consumables), not an exam plus a coin-flip second item. Network billed revenue annualizes to ~$31.3M.

Benchmark anchoring

Where a real benchmark exists we name it exactly. Where no public veterinary benchmark exists (ER walkout rate, rDVM report-back compliance, imaging revenue share, household value, bail rate), the number is labeled a Central Uplift estimate or construct and is never dressed as a third-party figure.

Missed-charge magnitude (the lead wedge) — Instinct, The State of Emergency and Specialty Veterinary Care in 2024 (545 professionals): 35% of ER/specialty practices leave completed services off the bill (down from 43%). AVMA: the industry misses ~5–10% of all charges. These frame whether a measured rate is high or low; the demo’s per-site rates are computed on the seeded data, not substituted from these.
Species mix (60% canine, 35% feline, 5% exotic) — AVMA 2022 Pet Ownership and Demographics Sourcebook; the specialty-network skew toward dogs is consistent with referral patterns at multi-site groups.
ER walkout baseline (~6%) — no public veterinary ER LWBS (left-without-being-seen) benchmark exists; this is a Central Uplift estimate. Directional support: Instinct’s State of Emergency and Specialty Veterinary Care 2024 found 16% of ER patients wait more than two hours, consistent with walkouts being a real, material loss.
rDVM report-back rate — the 2025 AAHA Referral Guidelines codify the specialty-to-primary discharge summary as the expectation. No public compliance-rate distribution exists, so the ~94% network baseline and Cleveland’s ~50% are Central Uplift constructs illustrating the metric, not cited rates.
Specialty consult average revenue (~$385) — AAHA, The Veterinary Fee Reference, 11th edition (US fees for 500+ services across 950 practices); our EXM-101 line item is $385.
Imaging share of ER procedure revenue — modeled as a Central Uplift construct (no public “imaging share of ER revenue” benchmark exists). Cleveland’s imaging share of procedure revenue trends up across the window (roughly 13% in the first full month to ~22% in the last) as the CT lease comes online. The demo moment is the upward trend, which is a modeled linear ramp; the level (imaging meaningful but well under half of revenue) is deliberately realistic for a vet ER. Thoracic CT attaches to ~5% of ER visits, not the human-academic-center rate a higher number would imply.
Price dispersion — per-line revenue carries a modeled spread (discounts, modifiers, regional/seniority pricing), so SELECT DISTINCT revenue_cents for any procedure code returns a distribution, not a single constant. A real billing extract never has exactly one price per code; the spread is deterministic (drawn from each line’s own hash) so the dataset still regenerates row-for-row.
Procedure codes — these are practice-style invoice item codes (PIMS-native), abstracted from common ezyVet/Cornerstone/AVImark conventions and mapped to the VMG/AAHA Chart of Accounts revenue categories (exam, imaging, lab, surgery, anesthesia, pharmacy, hospitalization, therapeutic, supplies). We do not use CPT codes; CPT is a copyrighted human-medicine code set with no veterinary application. Clinical coding in veterinary medicine is standardized by the VeNom Coding Group, not by a payer taxonomy.
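The price-dispersion item above says the spread is drawn from each line's own hash. A minimal sketch of that idea follows, assuming an FNV-1a-style hash over the line id and a symmetric ±15% spread. The hash choice, the spread width, and the names `fnv1a` and `dispersedPriceCents` are illustrative assumptions, not the generator's actual code.

```typescript
// Hypothetical sketch: none of these names come from the Riverbend generator.

/** FNV-1a-style 32-bit hash over UTF-16 code units, returned unsigned. */
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0;
}

/**
 * Price one line item: the list price scaled by a multiplier in
 * [1 - spread, 1 + spread) that depends only on the line's own id.
 * The 0.15 default spread is an assumed value.
 */
function dispersedPriceCents(lineId: string, listPriceCents: number, spread = 0.15): number {
  const unit = fnv1a(lineId) / 0x100000000; // value in [0, 1)
  const multiplier = 1 - spread + 2 * spread * unit;
  return Math.round(listPriceCents * multiplier);
}

// EXM-101 lists at $385 in this document; the line ids are made up.
const prices = ["line-0001", "line-0002", "line-0003"].map((id) =>
  dispersedPriceCents(id, 38500),
);
```

Because the multiplier depends only on the line id, no random state is consumed, so regenerating the dataset in any order would reproduce the same price for the same line.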

Seeded anomalies (each query-verifiable)

Run the queries below as SELECT statements against the demo D1 database; the SQL shown is what each dashboard tab executes. The dashboards read a trailing 240-day window relative to the current date, so the exact counts below move slightly day to day; the magnitudes and directions are stable. Values shown are the realized figures for the seeded dataset within that window.
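To see why the counts drift, it can help to compute the window bounds by hand. The sketch below assumes the window is anchored at UTC midnight of the current date and reaches back exactly 240 days; the function name `trailingWindow` and the choice of UTC are assumptions for illustration, not taken from the dashboard code.

```typescript
const WINDOW_DAYS = 240; // trailing window length stated in the text
const MS_PER_DAY = 24 * 60 * 60 * 1000;

/** Start and end dates (ISO, UTC) of the trailing window ending on `today`. */
function trailingWindow(today: Date, days: number = WINDOW_DAYS): { start: string; end: string } {
  // Truncate to UTC midnight so the time of day does not shift the window.
  const endMs = Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate());
  const startMs = endMs - days * MS_PER_DAY;
  const toIsoDate = (ms: number): string => new Date(ms).toISOString().slice(0, 10);
  return { start: toIsoDate(startMs), end: toIsoDate(endMs) };
}

// Viewed on 2026-04-30, the window starts on 2025-09-02.
const windowOnApril30 = trailingWindow(new Date("2026-04-30T12:00:00Z"));
```

Each day the dashboard is opened, the start date moves forward by one day, which is why the early-September rows gradually fall out of the counts.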

#	Anomaly	Where in demo	Verifying query
1	Charge integrity (the lead). Completed care charted but never billed: ~9.9% of charted line value network-wide, ~$1.8M net recoverable in the window (gross ~$2.1M before the disclosed 0.85 realization factor). The spread is the story: Akron (legacy AVImark, no native charge capture) misses ~33.6% of charted line value, the Cleveland ER ~14.3%, against Pittsburgh ezyVet ~7.2% and Columbus Cornerstone ~5.1%	`/dashboard/charge-integrity`	`src/lib/queries.ts:chargeIntegrity`; `SUM(expected_revenue_cents) - SUM(revenue_cents)` per hospital, net of the realization factor
2	Westwood Animal Hospital referral loop — about 15.5% of the referrals it originated completed at an outside specialty hospital instead of one of the network’s own sites (115 of 742 in the window; ~$126,500 at the single $1,100 per-episode value, the build shown in the drawer)	`/dashboard/referrals`	per-practice query (`src/lib/queries.ts:referralLeakage`); `status='lost_to_competitor'` / total
3	Dr. Mendez DACVIM at ~75% utilization, Dr. Park DACVIM at ~50% (same specialty, same hospital), against an 11-slot/workday model derived from appointment length, not back-solved	`/dashboard/clinicians`	utilization formula in `src/lib/queries.ts:clinicianUtilization`
4	February 2026 urgent-care downcoding: ~28% of urgent-care exams billed under the brief code UC-EXAM-BRIEF ($81) when the visit complexity matched the complex code UC-EXAM-COMPLEX ($202). This is the same revenue-integrity gap as a missed charge (a wrong code rather than a missing line), not a demand problem	`/dashboard/procedures`	calendar-month mix query + the brief-vs-complex February split query (drawer query on the page)
5	Cleveland ER Fri 21:00–01:00 walkout rate ~16% vs ~6.7% the rest of the week (~2.3×); the weeks with no DVM on shift line up with the spikes	`/dashboard/er-patterns`	walkout heatmap; cells Fri ≥21:00 and Sat ≤01:00 are red
6	Akron integration on 2026-03-15 with one 8-minute full-historical sync_runs row	`/dashboard/audit`	`SELECT * FROM sync_runs WHERE connector_name='avimark' AND duration_ms > 60000`
7	Cleveland ER rDVM report-back: only ~50% of completed referrals get a report sent within 14 days (the other three hospitals in the low-to-mid 90s). The 2025 AAHA Referral Guidelines establish the discharge summary as the expectation; the compliance rates here are constructs	`/dashboard/rdvm-health`	`src/lib/queries.ts:rdvmRelationshipHealth`; report-sent within 14d / completed referrals
8	YoY revenue +8.0% looks healthy; same-store (ex-Akron) is +1.2%, so the headline masks flat organic growth. Current-period revenue is real query output; the prior-year baseline is a synthetic calibration constant (see `networkYoY`)	`/dashboard` callout	`SELECT hospital_id, SUM(revenue_cents) FROM visits GROUP BY hospital_id` for the current period
9	Cleveland imaging share of procedure revenue trends up across the window (~13% in the first full month to ~22% in the last) as the CT lease comes online	`/dashboard/procedures`	category share by calendar month for Cleveland
10	Pittsburgh Tuesday avg wait ~61 min vs ~21 min other days; concentrated 11:00–12:00	`/dashboard/capacity`	per-DOW wait table (`capacityByDow`)
11	200 cross-network owners with patients at ≥3 Riverbend hospitals (a decaying funnel from those at ≥2 sites down to the few at all 4)	`/dashboard/cross-network`	`SELECT owner_external_id, COUNT(DISTINCT primary_hospital_id) … HAVING ≥3`
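
The arithmetic behind row 2 can be reproduced in a few lines. This is a sketch under stated assumptions: the `Referral` shape, the statuses other than `lost_to_competitor`, and the local `referralLeakage` function are hypothetical stand-ins and not the query in `src/lib/queries.ts`. Only the counts (115 of 742) and the $1,100 per-episode value come from the table.

```typescript
// Hypothetical row shape; only 'lost_to_competitor' is named in the table above.
type ReferralStatus = "completed" | "lost_to_competitor" | "pending";

interface Referral {
  practice: string;
  status: ReferralStatus;
}

const EPISODE_VALUE_DOLLARS = 1100; // single per-episode value quoted in row 2

/** Leakage for one referring practice: lost referrals over all it originated. */
function referralLeakage(referrals: Referral[], practice: string) {
  const originated = referrals.filter((r) => r.practice === practice);
  const lost = originated.filter((r) => r.status === "lost_to_competitor").length;
  return {
    total: originated.length,
    lost,
    rate: originated.length === 0 ? 0 : lost / originated.length,
    valueDollars: lost * EPISODE_VALUE_DOLLARS,
  };
}

// Rebuild the row-2 counts: 742 referrals, 115 of them lost to a competitor.
const sample: Referral[] = Array.from(
  { length: 742 },
  (_, i): Referral => ({
    practice: "Westwood Animal Hospital",
    status: i < 115 ? "lost_to_competitor" : "completed",
  }),
);

const westwood = referralLeakage(sample, "Westwood Animal Hospital");
// westwood.rate is about 0.155 and westwood.valueDollars is 126500.
```

The dollar figure is a flat count-times-value build, so it moves only with the lost count; it carries no per-referral pricing.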

Determinism

The generator is seeded with 0xCAFEBABE at the top of scripts/generate-sample-data.ts. Regenerating from a clean checkout produces the same Riverbend dataset row-for-row, including the per-line price dispersion (the spread is drawn deterministically from each line’s hash). To re-roll the network entirely, change the seed and re-run.
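
A small illustration of what seeding buys: the same seed yields the same stream of draws, so every downstream row is reproducible. The text does not say which generator the script uses; the sketch below swaps in mulberry32, a common 32-bit PRNG, purely as an assumed stand-in.

```typescript
/**
 * mulberry32: a small seeded PRNG returning floats in [0, 1).
 * Assumed stand-in; the actual generator in the script is not documented here.
 */
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const SEED = 0xcafebabe; // the seed named in the text

// Two independent generators with the same seed produce the same draws.
const first = mulberry32(SEED);
const second = mulberry32(SEED);
const runA = [first(), first(), first()];
const runB = [second(), second(), second()];
```

Passing a different seed starts the stream from a different state, which is the "re-roll" described above.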

What is NOT in the synthetic data

No notes, no DICOM, no actual exam findings — every visit has notes_redacted='[redacted]'. We are demonstrating the shape of operator data, not the contents of patient records.
No payer/insurance info. Riverbend bills owners directly; insurance flows are out of scope for this demo.
No staff scheduling. The DOW × hour × visit attribution is implied from check-in times.
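
As a sketch of how a day-of-week × hour grid can be derived from check-in times alone, the function below buckets timestamps into a 7 × 24 count matrix. It assumes the timestamps are already expressed on the hospital's local clock and reads them with UTC accessors to stay deterministic; the function name and sample timestamps are hypothetical.

```typescript
/** Count visits per (day-of-week, hour) cell from ISO check-in timestamps. */
function dowHourCounts(checkIns: string[]): number[][] {
  // 7 rows (0 = Sunday ... 6 = Saturday) by 24 hourly columns, all starting at zero.
  const grid: number[][] = Array.from({ length: 7 }, () => new Array<number>(24).fill(0));
  for (const iso of checkIns) {
    const d = new Date(iso);
    // Assumption: timestamps are stored on the hospital's local clock.
    grid[d.getUTCDay()][d.getUTCHours()] += 1;
  }
  return grid;
}

// Made-up check-ins: two on Friday 2026-03-13 in the 21:00 hour, one just after midnight.
const grid = dowHourCounts([
  "2026-03-13T21:15:00Z",
  "2026-03-13T21:40:00Z",
  "2026-03-14T00:30:00Z",
]);
// grid[5][21] is 2 (Friday, 21:00 hour); grid[6][0] is 1 (Saturday, 00:00 hour).
```

A late-Friday block such as 21:00 to 01:00 spans two rows of this grid, which is why the heatmap check in row 5 of the anomaly table names both Friday and Saturday cells.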

How the missed-charge number is built (so it survives an audit)

The recoverable dollar is a real subtraction over two real columns: expected_revenue_cents (the full charted line value of the visit) minus revenue_cents (what the invoice actually captured), summed per visit so an over-collected visit can never net against an under-collected one. There is no undisclosed multiplier — the one business constant, the 0.85 realization factor (the fraction of charted-but-unbilled dollars that would realistically have collected), is surfaced inline in the SQL drawer the way networkYoY discloses its prior-year ratio. On a real engagement the realization factor is derived from the operator’s own write-off/decline rate, never an industry assumption.
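
The build described above can be sketched as follows. One assumption is labeled loudly: for an over-collected visit never to net against an under-collected one, each visit's gap has to be floored at zero before summing, so the sketch does that. A plain `SUM(expected_revenue_cents) - SUM(revenue_cents)` per hospital, read literally, would net them. The interface and function names are hypothetical; the two fields mirror the two columns named in the text.

```typescript
interface VisitCharges {
  hospital: string;
  expectedRevenueCents: number; // full charted line value of the visit
  revenueCents: number; // what the invoice actually captured
}

const REALIZATION_FACTOR = 0.85; // the one disclosed business constant

/** Gross and net recoverable cents per hospital, with no cross-visit netting. */
function recoverableByHospital(
  visits: VisitCharges[],
): Record<string, { grossCents: number; netCents: number }> {
  const out: Record<string, { grossCents: number; netCents: number }> = {};
  for (const v of visits) {
    // Assumption: floor each visit's gap at zero so over-collection cannot offset a miss.
    const gap = Math.max(0, v.expectedRevenueCents - v.revenueCents);
    const entry = out[v.hospital] ?? { grossCents: 0, netCents: 0 };
    entry.grossCents += gap;
    entry.netCents = Math.round(entry.grossCents * REALIZATION_FACTOR);
    out[v.hospital] = entry;
  }
  return out;
}

// Made-up visits: one under-billed by $100, one over-collected by $20.
const result = recoverableByHospital([
  { hospital: "Akron", expectedRevenueCents: 50000, revenueCents: 40000 },
  { hospital: "Akron", expectedRevenueCents: 30000, revenueCents: 32000 },
]);
// result["Akron"] is { grossCents: 10000, netCents: 8500 }.
// A naive SUM minus SUM would have reported 8000 gross instead.
```

Keeping the realization factor as a named constant, applied once at the end, is what lets the same figure be shown both gross and net.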

Honest disclosures for buyers

Riverbend’s $-figures should not be benchmarked against your own ops. They are calibrated for demo legibility, not for clinical or operational comparability.
The anomalies are seeded, meaning the dashboards will find them by construction. The same query patterns applied to your own data have to find anomalies in your data. Whether they do is the question the paid diagnostic answers, on your real consented extract, at your own prices.
Central Uplift measures and recommends. It contacts no pet owner and no referring vet, and sends nothing; the operator acts on the findings inside its own PIMS.
The Akron integration moment is intentionally clean (8 min, no errors). Real-world first-time PMS ingestion typically involves at least one schema delta requiring connector follow-up. We are transparent about this in the architecture deep-dive (see docs/cto-deep-dive.md).