
Methods — what p_targ × gap × p_trust actually does

Plain-language walkthrough of the thermocas scoring framework. The compositional skeleton, why tie-band-aware benchmarking is the durable contribution, and how to read the benchmark numbers honestly.

Paper tag: paper-5-10j · Manuscript tag: memo-2026-04-22-bw · Tests: 245 passing · License: BSD-3
Preprint · open educational research

Compositional probability-scale scoring and tie-band-aware benchmarking for methylome-guided ThermoCas9 target-site ranking

Author
Allison Huang, Columbia University · allisonhmercer@gmail.com
Date
2026-04-22
PAPER.pdf
long-form technical memo (audit-trail revision; cite paper-5-10j)
MANUSCRIPT.pdf
Bioinformatics-shaped short version (cite memo-2026-04-22-bw)
Repository
github.com/AllisonH12/thermocas9 · BSD-3 · Python 3.11+
bioRxiv DOI
to be added on posting
Zenodo DOI
to be minted on first GitHub release (citable archive of the tagged source)
Status — read this before reading anything else
This is a methods and benchmarking paper on public methylation data. It does not include prospective wet-lab editing validation. Per-site p-values are not reported because p_observation_trustworthy saturates by EvidenceClass, not continuously. Read the score as a probability-scale ranking axis, not as a calibrated hypothesis test. The framework is open educational research — not a clinical decision-support system.

The biology, in one paragraph

Roth et al. (Nature 2026) showed that ThermoCas9 — a thermophilic Cas9 from Geobacillus thermodenitrificans T12 — refuses to cleave when the fifth cytosine of its PAM is methylated. An unmethylated PAM cleaves as usual; a 5-methylcytosine at that position abolishes binding. The therapeutic implication is that loci which are unmethylated in tumor cells but methylated in matched normal cells become selectively editable — the editor only cuts the disease side. Roth demonstrated this with three breast-cancer targets in MCF-7 vs MCF-10A.

Turning that mechanism into a genome-scale shortlist is a ranking problem. Which loci, across millions of candidates, look like the best bets?

Why a generic guide-scorer does not work

Off-the-shelf CRISPR guide-scoring tools (Azimuth, DeepCRISPR, CRISPRitz) score sequence properties and off-target risk. They have no notion of methylation state at the PAM cytosine, because they were not designed for a methylation-sensitive Cas9 variant. Methylation-array differential-analysis tools (methylKit, minfi, limma) produce per-probe statistics but not ranked target lists; their output feeds a scorer, it does not replace one.

Three problems, specifically, made it worth building a new scorer:

  1. Replicate counts are tiny. Methylation cohorts often ship with n = 2–3 samples per side. Any scorer must carry its own uncertainty rather than emit a scalar that hides it.
  2. The "normal" arm is inconsistent. Adjacent-normal tissue in a bulk methylation study is not the same object as a matched cell-line pair. A scorer that bakes in a fixed β_normal threshold will fail across cohort types.
  3. Top-K can sit inside a tied band. On low-replicate cohorts the top of any composite can collapse into hundreds or thousands of equally-scored candidates. Reporting a single Precision@K without that interval misleads the reader.

The compositional skeleton

thermocas answers all three with the same multiplicative form:

p_therapeutic_selectivity = p_targ × (gap factor) × p_trust

Each factor maps to one independently-replaceable question.

p_targ — is the tumor side actually targetable? This is the probability that the tumor-arm methylation at the PAM cytosine is low enough to permit cleavage. It is computed by integrating a method-of-moments Beta posterior on the per-probe summary β, with a piecewise-linear fallback when the moment match degenerates. A site cannot be a candidate if the editor cannot bind to it.
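The shape of that computation can be sketched in a few stdlib-only lines. This is a minimal illustration, not the repository's implementation: the `cleavable_below` threshold, the linear-ramp fallback, and the trapezoid CDF are all stand-in assumptions (the real code would use something like `scipy.stats.beta.cdf`).

```python
import math

def beta_mom(mean, var):
    """Method-of-moments Beta(a, b) fit from a per-arm mean/variance of β.

    Returns None when the moment match degenerates (var >= mean * (1 - mean)),
    which is where a piecewise-linear fallback would take over.
    """
    if not (0.0 < mean < 1.0) or var <= 0.0 or var >= mean * (1.0 - mean):
        return None
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common

def beta_cdf(x, a, b, steps=20_000):
    """P(B < x) for B ~ Beta(a, b), by trapezoid integration of the pdf.

    Stdlib-only stand-in for scipy.stats.beta.cdf; adequate when a, b > 1.
    """
    ln_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp(ln_norm + (a - 1.0) * math.log(t)
                        + (b - 1.0) * math.log(1.0 - t))

    h = x / steps
    total = 0.5 * (pdf(0.0) + pdf(x))
    for i in range(1, steps):
        total += pdf(i * h)
    return total * h

def p_targ(tumor_mean, tumor_var, cleavable_below=0.3):
    """Probability that the tumor-arm β sits below a cleavability threshold."""
    fit = beta_mom(tumor_mean, tumor_var)
    if fit is None:
        # illustrative piecewise-linear fallback: ramp down across [0, 1]
        return max(0.0, min(1.0, 1.0 - tumor_mean))
    a, b = fit
    return beta_cdf(cleavable_below, a, b)
```

A tumor arm averaging β ≈ 0.15 with modest variance yields a high p_targ; a degenerate variance drops cleanly into the fallback branch instead of producing an invalid Beta fit.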

Gap factor — is the tumor-vs-normal contrast meaningful? Two instances ship in the current tag: V2.5-diff and V2.5-sigmoid.

The point is not which gap factor "wins" — the point is that the slot is replaceable. A future axis (an exact difference-of-Betas, an SE-on-mean variant, a mixture-aware version) can be plugged in without rewriting the scorer.
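The replaceable-slot idea is just dependency injection on one callable. The sketch below shows the pattern; the two gap-factor bodies are illustrative placeholders (the actual V2.5-diff and V2.5-sigmoid definitions, and the `midpoint`/`steepness` parameters, are not specified here).

```python
import math

# The scorer treats the gap factor as a replaceable slot: any callable
# mapping (beta_tumor, beta_normal) -> [0, 1].

def gap_diff(beta_t, beta_n):
    # raw methylation contrast, clamped to the probability scale
    return max(0.0, min(1.0, beta_n - beta_t))

def gap_sigmoid(beta_t, beta_n, midpoint=0.2, steepness=15.0):
    # smooth gate on the same contrast; parameters are assumptions
    return 1.0 / (1.0 + math.exp(-steepness * ((beta_n - beta_t) - midpoint)))

def selectivity(p_targ, p_trust, beta_t, beta_n, gap_factor=gap_diff):
    # p_targ x (gap factor) x p_trust -- the slot is just an argument
    return p_targ * gap_factor(beta_t, beta_n) * p_trust
```

Swapping `gap_factor=gap_sigmoid` changes the gap axis without touching p_targ, p_trust, or the composition itself, which is the point of the multiplicative skeleton.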

p_trust — is the per-probe observation trustworthy? Discrete EvidenceClass (EXACT, PROXIMAL_CLOSE, PROXIMAL, REGIONAL, UNOBSERVED) times a sample-count ramp min(1, min(n_t, n_n) / 30). A site whose methylation is nominated by a regional proxy on a 2-sample cohort cannot dominate the top-K just because its p_targ × gap happens to peak.
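As a formula this factor is tiny; the sketch below makes the two parts explicit. Only the class names and the min(1, min(n_t, n_n) / 30) ramp come from the text — the numeric weights per class are illustrative assumptions.

```python
# EvidenceClass weights are illustrative assumptions, ordered as the
# paper orders the classes; only the ramp formula is given in the text.
EVIDENCE_WEIGHT = {
    "EXACT": 1.0,
    "PROXIMAL_CLOSE": 0.8,
    "PROXIMAL": 0.6,
    "REGIONAL": 0.3,
    "UNOBSERVED": 0.0,
}

def p_trust(evidence_class, n_tumor, n_normal):
    """Discrete evidence weight times the sample-count ramp."""
    ramp = min(1.0, min(n_tumor, n_normal) / 30.0)
    return EVIDENCE_WEIGHT[evidence_class] * ramp
```

On a 2-vs-3-sample cohort the ramp alone caps p_trust at 2/30 ≈ 0.067, which is exactly the penalty that keeps regional-proxy sites out of the top-K.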

The multiplicative form is a gating-style ranking heuristic: a candidate is penalized if any required component is weak. An additive score over the same signals would let a strong component compensate for a failed gate, which is less aligned with a selectivity screen. The product is not a calibrated joint probability — p_targ, the gap factor, and p_trust are correlated on real catalogs (EXACT-class records cluster at extreme β values that also drive p_targ). Read the score as a probability-scale ranking axis throughout, not as "0.9 means 90% of 0.9-scored sites edit successfully."
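The gating argument is easy to see numerically. The two candidates below are made up for illustration: one is moderately good on every axis, the other has a spectacular contrast but a failed trust gate.

```python
# Hypothetical candidates; all values invented for illustration.
solid  = {"p_targ": 0.60, "gap": 0.60, "p_trust": 0.60}
flashy = {"p_targ": 0.95, "gap": 0.95, "p_trust": 0.02}  # failed gate

def product_score(c):
    # multiplicative form: any weak required component drags the score down
    return c["p_targ"] * c["gap"] * c["p_trust"]

def additive_score(c):
    # additive form: strong components can compensate for a failed gate
    return (c["p_targ"] + c["gap"] + c["p_trust"]) / 3.0
```

Under the product, the failed gate puts `flashy` far below `solid`; under the mean, `flashy`'s strong components paper over the gate and it ranks first — which is the wrong ordering for a selectivity screen.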

Tie-band-aware benchmarking — the actual durable contribution

This is the part of the paper most likely to outlive any specific gap-factor choice.

On an n = 2/2 matched cell-line cohort with a probabilistic composite, the score distribution at K = 100 routinely sits inside a tied band. At whole-genome scale, V2.5-diff produces tied bands of 421 to 1,493 records at K = 100 across the four evaluated cohort paths — a single Precision@100 number for that score is meaningless without an interval.

Every BenchmarkResult row that thermocas produces carries its tie-band size alongside each point metric, so Precision@K is always readable as an interval rather than a bare scalar.

Switching from V2.5-diff to V2.5-sigmoid does not change matched-cell-line AUC by more than 0.002, but it collapses every WG tie_band@100 to 1. That is a strict improvement in top-K usability on the same biological signal — and it is only visible because the benchmark contract reports tie bands explicitly.
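One way to see what "Precision@K with an interval" means mechanically is the sketch below. It is an illustration of the idea, not the thermocas BenchmarkResult implementation: find the score band straddling rank K, then bound precision by filling the remaining K slots from that band with negatives first (worst case) or positives first (best case).

```python
def precision_at_k_tieband(scores, labels, k):
    """Precision@K as an interval when rank K falls inside a tied band.

    Returns (band_size, worst_case, best_case): the size of the score band
    straddling rank K, and the precision if the band's contribution to the
    top-K is all negatives (worst) or all positives (best).
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    cut = scores[order[k - 1]]                    # score at rank K
    above = [i for i in order if scores[i] > cut] # strictly above the band
    band = [i for i in order if scores[i] == cut] # the tied band itself
    hits_above = sum(labels[i] for i in above)
    band_hits = sum(labels[i] for i in band)
    need = k - len(above)                         # slots drawn from the band
    worst = hits_above + max(0, need - (len(band) - band_hits))
    best = hits_above + min(need, band_hits)
    return len(band), worst / k, best / k
```

When the band has size 1 the interval collapses to a point, which is exactly the V2.5-sigmoid behavior described above: same AUC, but every whole-genome tie_band@100 shrinks to 1 and Precision@100 becomes a single usable number.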

What the benchmark says

Four public methylation cohorts, scored on three positive tiers (validated: the Roth Fig. 5d targets; narrow: ±50 bp; wide: ±500 bp).

The atlas page has a per-positive whole-genome rank dot-plot that visualizes this heterogeneity directly — including the GSE69914 ESR1 reversal under V2.5-sigmoid.

Honest scope

A few things the paper is explicit about:

How to cite

Until the bioRxiv DOI is live, cite the immutable git tag:

Huang, A. (2026). Compositional probability-scale scoring and tie-band-aware benchmarking for methylome-guided ThermoCas9 target-site ranking. Technical memo, version paper-5-10j. https://github.com/AllisonH12/thermocas9/tree/paper-5-10j

For the Bioinformatics-shaped short version, cite memo-2026-04-22-bw instead.

Where to go next