Atlas — per-cohort ThermoCas9 target shortlists

Per-cohort top-100 candidate PAM sites for ThermoCas9 across four public methylation cohorts, scored under the compositional skeleton p_targ × (gap factor) × p_trust. The headline figure visualizes per-positive whole-genome rank percentiles, including an honest disclosure of the GSE69914 ESR1 reversal under V2.5-sigmoid that the methods paper documents in §6.1.

Source paper: paper-5-10j · Atlas tag: memo-2026-04-22-bw · Cohorts: 4 (n_pos = 3 Roth Fig. 5d targets) · Status: Hypothesis generation only

Scope
Every row on this page is a candidate, not a target. The scoring framework is open educational research. It has been benchmarked retrospectively against four public methylation cohorts, but it has not been validated prospectively against any wet-lab editing readout. Do not select a row from any of the top-100 tables for therapeutic application without running the full freeze-and-validate pipeline described in the external-validation instruction.
New · 2026-05-02 · v0.5 Posture B

v0.5 Posture B coordinated release set — four cancers, mixed-platform

v0.5 is the first coordinated release set of the ThermoCas9 Candidate Opportunity Atlas: BRCA on EPIC-v2 / hg38 (cell-line cohort) plus TCGA-COAD / TCGA-LUAD / TCGA-LIHC on HM450 / hg19 (patient cohorts). Each release is hash-pinned via SHA-256 in atlas_manifest.json and tagged immutably in the atlas repository. Hypothesis generation only — no prospective wet-lab validation. The HM450 / hg19 catalog is 19,787,820 sites; the EPIC-v2 / hg38 catalog is 35,406,213 sites. Per-cancer detail pages are under /atlas/brca/, /atlas/coad/, /atlas/luad/, /atlas/lihc/.
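Hash pinning of this kind can be checked locally by recomputing each file's SHA-256 digest and comparing it with the pinned value. The sketch below is illustrative only: the page does not show the layout of atlas_manifest.json, so the `"files"` key mapping relative paths to hex digests is a hypothetical structure, as are the function names.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(manifest_path, base_dir):
    """Compare each file listed in the manifest with its pinned digest.

    ASSUMPTION: the manifest looks like {"files": {"rel/path": "<hex>"}}.
    The real atlas_manifest.json schema may differ.
    """
    manifest = json.loads(Path(manifest_path).read_text())
    results = {}
    for rel_path, expected in manifest["files"].items():
        actual = sha256_of_file(Path(base_dir) / rel_path)
        results[rel_path] = actual == expected.lower()
    return results
```

Any entry that maps to `False` means the local file differs from the pinned release artifact and should not be treated as that release.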

Per-release download tiles

Each tile links to the immutable v0.5 GitHub release tag on sigaihealth/atlas. Reviewer-access caveat: the atlas repository is currently restricted to organization members, so unauthenticated requests return HTTP 404. Public read access is being held as a GigaScience submission gate. The release URLs below will resolve once the repository is made public.

BRCA · EPIC-v2 / hg38
Source: GSE322563 MCF-7 vs MCF-10A cell-line cohort (n=2 / n=2)
Tier B candidates: 0 — the cell-line cohort sits below the framework p_trust_ramp_n=30 floor; this is a documented release shape, not a bug.
manifest sha256: 94cc08baa800…
atlas-brca-v0.5.0-wg-epic-v2-hg38 ↗
TCGA-COAD · HM450 / hg19
Source: TCGA-COAD patient tissue (n=312 tumor / n=38 normal)
Tier B candidates: 63,853
manifest sha256: 89d592be4104…
atlas-tcga-coad-v0.5.0-wg-sigmoid ↗
TCGA-LUAD · HM450 / hg19
Source: TCGA-LUAD patient tissue (n=473 tumor / n=32 normal)
Tier B candidates: 21,318
manifest sha256: a277af86a5b9…
atlas-tcga-luad-v0.5.0-wg-sigmoid ↗
TCGA-LIHC · HM450 / hg19
Source: TCGA-LIHC patient tissue (n=377 tumor / n=50 normal)
Tier B candidates: 149,099
manifest sha256: 2ae90a7a48f3…
atlas-tcga-lihc-v0.5.0-wg-sigmoid ↗

panel_layer normal-tissue coverage

v0.5 introduces a first-class panel_layer manifest block that records each release's normal-tissue panel coverage against the underlying probe catalog. The three TCGA HM450 / hg19 releases share a 10-tissue, 687-sample broad-normal envelope.

COAD: 83.68% (16,558,370 / 19,787,820 HM450 catalog sites covered)
LUAD: 83.63% (16,549,327 / 19,787,820 HM450 catalog sites covered)
LIHC: 83.77% (16,576,426 / 19,787,820 HM450 catalog sites covered)
BRCA: unavailable_controlled_access_pending. No public EPIC-v2 / hg38 normal breast cohort surfaces in any GPL33022 series. Recorded as a release limitation, not substituted by a non-breast panel.
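
The coverage percentages above follow directly from the covered-site counts and the HM450 / hg19 catalog size. A minimal arithmetic check, using only the numbers quoted on this page (the variable and function names are illustrative):

```python
# Site counts as quoted on this page for the three HM450 / hg19 releases.
CATALOG_SITES = 19_787_820
COVERED_SITES = {
    "COAD": 16_558_370,
    "LUAD": 16_549_327,
    "LIHC": 16_576_426,
}


def coverage_percent(covered, catalog=CATALOG_SITES):
    """Percentage of catalog sites covered by the normal-tissue panel."""
    return 100.0 * covered / catalog


for cancer, covered in COVERED_SITES.items():
    print(f"{cancer}: {coverage_percent(covered):.2f}%")
```

Rounded to two decimals, the three values reproduce the 83.68%, 83.63%, and 83.77% figures shown for COAD, LUAD, and LIHC.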

Cross-cancer shared Tier B loci (3-way HM450 intersection)

Across the three HM450 / hg19 patient-cohort releases (COAD, LUAD, LIHC), the Tier B candidate sets share a 3-way intersection of 6,441 candidates / 1,378 probe loci / 1,635 nearest-gene symbols. Same loci, different signal strength across cancers. BRCA EPIC-v2 / hg38 is omitted from this intersection by design: cross-platform candidate-id overlap is incoherent (different probe space + different reference assembly).
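
The three reported counts (candidates, probe loci, nearest-gene symbols) are three views of one intersection. A minimal sketch of how such a summary could be derived, assuming each Tier B row carries a candidate id, a probe id, and a nearest-gene symbol; the `Candidate` record, its field names, and the toy rows are hypothetical and are not the atlas schema:

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    # Hypothetical row shape; the real per-cohort TSV has 30+ columns.
    candidate_id: str
    probe_id: str
    nearest_gene: str


def shared_tier_b(*cohorts):
    """Intersect cohorts on candidate_id, then count distinct probes and genes."""
    shared_ids = set.intersection(
        *({c.candidate_id for c in cohort} for cohort in cohorts)
    )
    # All cohorts share one probe space, so rows from the first cohort suffice.
    shared_rows = [c for c in cohorts[0] if c.candidate_id in shared_ids]
    return {
        "candidates": len(shared_ids),
        "probe_loci": len({c.probe_id for c in shared_rows}),
        "gene_symbols": len({c.nearest_gene for c in shared_rows}),
    }


# Toy data only: several candidate PAM sites can sit near the same probe.
c1 = Candidate("cand-1", "probe-A", "GENE_X")
c2 = Candidate("cand-2", "probe-A", "GENE_X")
c3 = Candidate("cand-3", "probe-B", "GENE_Y")
c4 = Candidate("cand-4", "probe-C", "GENE_Z")
summary = shared_tier_b([c1, c2, c3], [c1, c2, c4], [c1, c2, c3])
```

Intersecting on candidate id only makes sense when the cohorts share a probe space and reference assembly, which is why a cross-platform release cannot be included.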

Headline figure — per-positive whole-genome rank percentiles

Per-positive whole-genome rank percentiles for the three Roth Fig. 5d positives across four cohort paths, comparing V2.5-diff (open circles) and V2.5-sigmoid (filled circles).
Figure 4. For each of the three Roth Fig. 5d positives (ESR1, EGFLAM, GATA3), the dot-plot shows the rank percentile of that positive's PAM site within the cohort's whole-genome candidate universe (HM450 ≈ 19.8 M loci; EPIC v2 ≈ 35.4 M). Filled circles are V2.5-sigmoid; open circles are V2.5-diff; the line connects the two so the rank-lift direction is immediately visible. Underlying data: per_positive_wg_percentile.json · SVG.
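
A rank percentile of this kind can be sketched as follows. The page does not state the tie-handling rule or the direction of the scale, so this example assumes that a higher score is better and defines the percentile as the share of the candidate universe scoring strictly below the positive. The function names are illustrative.

```python
def rank_percentile(scores, index):
    """Percent of the universe scoring strictly below scores[index].

    ASSUMPTION: higher score = better rank; ties do not count as 'below'.
    """
    target = scores[index]
    below = sum(1 for s in scores if s < target)
    return 100.0 * below / len(scores)


def rank_lift(scores_diff, scores_sigmoid, index):
    """Signed percentile change from the diff variant to the sigmoid variant."""
    return rank_percentile(scores_sigmoid, index) - rank_percentile(scores_diff, index)


# Toy universe of four candidates; index 2 is the 'positive'.
toy_diff = [0.1, 0.5, 0.4, 0.3]
toy_sigmoid = [0.1, 0.5, 0.9, 0.3]
lift = rank_lift(toy_diff, toy_sigmoid, 2)
```

A positive lift corresponds to the filled circle sitting above the open one under this convention, and a negative lift corresponds to a reversal. At the scale of the real catalogs (tens of millions of loci), a vectorized array library would be the practical choice over a Python loop.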

What to read off the plot.

Per-cohort top-100 explorer

Pick a cohort, then filter by gene symbol or PAM family. Rows highlighted in the accent color overlap a Roth Fig. 5d positive at the wide (±500 bp) tier. The full per-row schema (30+ columns, including RepeatMasker overlap and CGI distance) lives in the per-cohort TSV linked below; this table shows only the slim-schema columns that fit on a screen.
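
The wide-tier highlight amounts to a window test around each positive. A minimal sketch, assuming single-base positions on a shared assembly and an inclusive ±500 bp window; the coordinates below are placeholders for illustration and are not the real positions of any Roth Fig. 5d target:

```python
WIDE_TIER_BP = 500  # half-width of the wide tier, as stated on this page


def overlapping_positives(chrom, pos, positives, window=WIDE_TIER_BP):
    """Return names of positives within +/- window bp of a candidate site.

    ASSUMPTION: the window is inclusive and both coordinates use the same
    reference assembly; the atlas may define overlap differently.
    """
    return [
        name
        for name, (p_chrom, p_pos) in positives.items()
        if p_chrom == chrom and abs(pos - p_pos) <= window
    ]


# Placeholder coordinates, not real target positions.
demo_positives = {"demo_positive": ("chr1", 1_000_000)}
hits = overlapping_positives("chr1", 1_000_350, demo_positives)
```

A row would be highlighted whenever the returned list is non-empty.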

Summary by cohort

GSE322563 HM450
Source: Roth MCF-7 / MCF-10A · Path: HM450-intersect · n_t / n_n: 2 / 2
GSE322563 native EPIC v2
Source: Roth MCF-7 / MCF-10A · Path: Native EPIC v2 · n_t / n_n: 2 / 2
GSE77348
Source: δ-development cohort · Path: HM450 · n_t / n_n: 3 / 3
GSE69914 tissue
Source: Primary tumor + healthy donor · Path: HM450 · n_t / n_n: 305 / 50

How rows on this page map to rows in the paper

What is not in this atlas (intentionally)

Cite or reproduce