Methodology
This page explains how we test theories, score results, and determine whether a cipher approach has been eliminated.
The K4 Ciphertext
K4 consists of 97 characters, and all 26 letters of the English alphabet appear at least once. The full ciphertext is:
OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR
Known plaintext (positions numbered from 0, confirmed by Jim Sanborn):
- Positions 21–33: EASTNORTHEAST (13 characters)
- Positions 63–73: BERLINCLOCK (11 characters)
These 24 known characters are the foundation of our scoring system. They also reveal the internal key values the cipher uses at those positions, which constrain what methods and keys are possible.
Scoring: Crib Matching (0–24)
For any proposed decryption method and key, we check how many of the 24 known plaintext positions produce the correct letter. This gives a score from 0 to 24.
| Score | Classification | Meaning |
|---|---|---|
| 0–9 | NOISE | Expected performance from a random key/method. Not recorded. |
| 10–17 | INTERESTING | Above noise floor. Recorded for completeness, but almost certainly coincidental. |
| 18–23 | SIGNAL | Statistically unusual. Warrants investigation, but may be a false positive at high periods. |
| 24 | FULL MATCH | All 24 cribs match. Requires Bean constraint validation, quadgram analysis, and human review before any claim is made. |
These thresholds are conservative. For a single fixed key chosen at random and a 26-letter alphabet, each of the 24 positions matches by chance with probability 1/26, so the expected score is 24/26 ≈ 0.9: fewer than 1 letter out of 24. For one fixed key, a score of 6 is already far beyond chance. The SIGNAL threshold at 18 is set high because longer keys can produce misleadingly high scores (see below).
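The scoring rule and the classification bands can be sketched in a few lines of Python. This is a minimal illustration, not the project's scoring code: the names `K4`, `CRIBS`, `crib_score`, `classify`, and `vigenere_decrypt` are hypothetical, the ciphertext string is the commonly published 97-character transcription (an assumption here), and the trial key `KRYPTOS` is only a placeholder.

```python
from itertools import cycle

# Commonly published K4 transcription (assumed here; 97 characters).
K4 = (
    "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFB"
    "NYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
)

# Known plaintext, keyed by 0-based start position.
CRIBS = {21: "EASTNORTHEAST", 63: "BERLINCLOCK"}

# Flatten to {position: expected plaintext letter} (24 entries).
CRIB_MAP = {
    start + i: ch for start, word in CRIBS.items() for i, ch in enumerate(word)
}


def crib_score(plaintext: str) -> int:
    """Count crib positions where the candidate plaintext has the right letter."""
    return sum(
        1
        for pos, ch in CRIB_MAP.items()
        if pos < len(plaintext) and plaintext[pos] == ch
    )


def classify(score: int) -> str:
    """Map a 0-24 crib score to the bands in the table above."""
    if score == 24:
        return "FULL MATCH"
    if score >= 18:
        return "SIGNAL"
    if score >= 10:
        return "INTERESTING"
    return "NOISE"


def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Standard Vigenere decryption: PT = (CT - K) mod 26, key repeated."""
    return "".join(
        chr((ord(c) - ord(k)) % 26 + 65) for c, k in zip(ciphertext, cycle(key))
    )


if __name__ == "__main__":
    candidate = vigenere_decrypt(K4, "KRYPTOS")  # placeholder trial key
    score = crib_score(candidate)
    print(score, classify(score))
```

A FULL MATCH from a function like this is only the first gate; the other validation steps still apply.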
Expected Random Scores by Period
A critical subtlety: how well a random key scores depends on the key’s length. Longer keys have fewer checks per key position, so random matches become more common, inflating scores artificially.
| Period | Approx. Expected Random Score | Discriminative? |
|---|---|---|
| 2–7 | ∼8 / 24 | Yes, meaningful range |
| 8 | ∼10 / 24 | Marginally |
| 13 | ∼14 / 24 | Weak |
| 17 | ∼17 / 24 | No (random matches 17+) |
| 24 | ∼19 / 24 | No (random matches 19+) |
| 26 | ∼20 / 24 | No |
These values are approximate, based on how many known-letter positions share each key position at a given key length. The key insight: at key length 17 or above, random keys routinely score at or near our SIGNAL threshold of 18 (about 17 at period 17, about 19 at period 24), making those scores meaningless. Only scores at key length 7 or below are reliable; period 8 is marginal.
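The structural reason behind this effect can be seen by counting how many crib positions land on each key position for a given period. The sketch below does only that counting; it does not reproduce the approximate expected scores in the table, which come from the project's own procedure. The function name `cribs_per_key_position` is hypothetical.

```python
from collections import Counter

# 0-based crib positions: 21-33 (EASTNORTHEAST) and 63-73 (BERLINCLOCK).
CRIB_POSITIONS = list(range(21, 34)) + list(range(63, 74))


def cribs_per_key_position(period: int) -> Counter:
    """For a repeating key of this length, count cribs checked by each key slot."""
    return Counter(pos % period for pos in CRIB_POSITIONS)


if __name__ == "__main__":
    for period in (7, 8, 13, 17, 24):
        load = cribs_per_key_position(period)
        # A key slot checked by exactly one crib can always be chosen to match it.
        singles = sum(1 for n in load.values() if n == 1)
        print(
            f"period {period:2d}: {len(load):2d} key slots touched, "
            f"{singles:2d} checked by a single crib"
        )
```

At short periods every key slot is checked by several cribs at once, so a wrong key is quickly exposed. At long periods many slots are checked by one crib only, and a key fitted to the cribs satisfies those checks automatically.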
Bean Constraints
In 2021, Richard Bean published additional constraints on the K4 keystream, derived from the repeated letter P at positions 27 and 65 of the ciphertext (both of which decrypt to R). This gives:
- Equality: The key values at positions 27 and 65 must be equal
- 242 inequalities: Hundreds of pairs of key positions that must differ, regardless of which cipher variant is used
These constraints hold no matter which cipher variant is used (Vigenère, Beaufort, or Variant Beaufort). Any valid solution must satisfy all of them, provided the cipher uses an additive key model (CT[i] = f(PT[i], K[i]) mod 26). If K4 uses a non-additive cipher (lookup table, physical overlay, or grid-based system), Bean constraints do not apply.
Combined with a counting argument about repeated key values, these constraints show that no repeating key of any length can produce the K4 plaintext under direct positional correspondence (CT[i] → PT[i]) and additive-key assumptions. This is a deterministic proof (Level A), reproducible from the codebase. It does not apply if a transposition layer reorders positions before substitution, or if the cipher is non-additive.
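The key values implied by the cribs can be derived directly for each additive variant. The sketch below is an illustration under stated assumptions, not the project's proof code: it uses the commonly published K4 transcription, the textbook definitions of the three variants, and hypothetical names (`crib_keystream`, `periodic_conflicts`). It checks the equality at positions 27 and 65 and shows how a repeating key can conflict with the crib-derived values; the 242 inequalities themselves are not reproduced here.

```python
# Commonly published K4 transcription (assumed here; 97 characters).
K4 = (
    "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFB"
    "NYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
)
CRIBS = {21: "EASTNORTHEAST", 63: "BERLINCLOCK"}

# Key value as a function of (ciphertext value c, plaintext value p), mod 26.
VARIANTS = {
    "vigenere": lambda c, p: (c - p) % 26,          # CT = PT + K
    "beaufort": lambda c, p: (c + p) % 26,          # CT = K - PT
    "variant_beaufort": lambda c, p: (p - c) % 26,  # CT = PT - K
}


def crib_keystream(variant: str) -> dict[int, int]:
    """Key value (0-25) forced at each crib position under the given variant."""
    key_of = VARIANTS[variant]
    stream = {}
    for start, word in CRIBS.items():
        for i, ch in enumerate(word):
            pos = start + i
            stream[pos] = key_of(ord(K4[pos]) - 65, ord(ch) - 65)
    return stream


def periodic_conflicts(variant: str, period: int) -> list[tuple[int, int]]:
    """Crib position pairs that a repeating key of this length would force
    to share a key value, but whose derived key values differ."""
    ks = crib_keystream(variant)
    return [
        (i, j)
        for i in ks
        for j in ks
        if i < j and (j - i) % period == 0 and ks[i] != ks[j]
    ]


if __name__ == "__main__":
    for name in VARIANTS:
        ks = crib_keystream(name)
        print(name, "k[27] == k[65]:", ks[27] == ks[65])
        print(name, "conflicts at period 7:", len(periodic_conflicts(name, 7)))
```

Because positions 27 and 65 share both the ciphertext letter P and the plaintext letter R, any function of the pair gives the same key value at both, which is why the equality holds for every variant.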
Two-System Model
Jim Sanborn stated at the Kryptos dedication that “there are two systems of enciphering the bottom text.” Ed Scheidt stated that he “masked the English language so it’s more of a challenge” and that solvers need to “solve the technique first and then go for the puzzle.” The “two systems” quote is public evidence. Any specific interpretation of that quote is a hypothesis, not a fact.
Pure transposition is independently impossible: a transposition only reorders letters, and the ciphertext contains only 2 E’s while the known cribs require 3. Therefore at least one layer must be substitution. Beyond that, the architecture remains open. Lines of investigation that are still live include:
- Layered classical models: heavily tested in bounded structured families, but not globally exhausted.
- Procedural or physical constructions: still live, but only as explicit, finite, testable procedures.
- W-delimiter segmentation: the five carved W positions explain the old width-21 anomaly. As of April 2026 the single-layer construction is saturated (80+ tested, no signal) and is now admissible only as a multi-layer component; the interpretation is otherwise still open.
- Null extraction: still possible in principle, but the old score-conditioned null-palette result is retired and cannot be cited as evidence.
The April 2026 audit explicitly rejected treating any single two-system story as the project’s established model. The correct public stance is: K4 likely involves a technique beyond straightforward single-layer classical encryption, but the exact composition is unknown.
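The letter-count argument against pure transposition is easy to check. A transposition leaves letter counts unchanged, so every letter the cribs need must be available in the ciphertext at least as often. The sketch below assumes the commonly published K4 transcription; `missing_letters` is a hypothetical helper name.

```python
from collections import Counter

# Commonly published K4 transcription (assumed here; 97 characters).
K4 = (
    "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFB"
    "NYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
)
CRIB_TEXT = "EASTNORTHEAST" + "BERLINCLOCK"


def missing_letters(ciphertext: str, required: str) -> dict[str, int]:
    """Letters the required text needs more often than the ciphertext supplies,
    mapped to the size of the shortfall."""
    have = Counter(ciphertext)
    need = Counter(required)
    return {ch: n - have[ch] for ch, n in need.items() if n > have[ch]}


if __name__ == "__main__":
    print("E in ciphertext:", K4.count("E"))
    print("E in cribs:", CRIB_TEXT.count("E"))
    print("shortfall:", missing_letters(K4, CRIB_TEXT))
```

Any non-empty shortfall rules out pure transposition for that ciphertext and crib set, independent of which transposition is tried.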
What Has Been Exhaustively Tested
More than 502 experiments, spanning over 671 billion individual hypothesis evaluations (with overlaps across experiments), have been run, eliminating:
- All repeating-key ciphers (every variant, both alphabets, all key lengths): proven impossible under direct positional correspondence and additive-key assumptions (Level A)
- All self-keying ciphers (every variant, all primers): proven structurally impossible under additive-key assumptions (Level A)
- Many structured substitution + rearrangement combinations (~1.2 billion evaluations): 14 rearrangement families under the registered bounded search programs
- All Cold War-era cipher models (VIC family): extensively tested across multiple variants
- All letter-pair ciphers (Four-Square, Playfair, Two-Square): apparent high scores are overfitting artifacts
- Every specialized cipher we could find: Gromark (8.74 billion keys), plus Feistel, Gronsfeld, Porta, and more
- Running key from 60,000+ public texts (106 billion position-checks): zero signal. A narrower follow-up (April 2026) found that running-key × columnar widths 6/8/9 is blocked by the current 242-inequality Bean constraint set regardless of source text; other transposition families and non-English sources are not covered by that result.
- Two-layer compositions tested (105,692 branches): additive × transposition, transposition × periodic substitution, 6 stateful families — zero Bean passes, max crib score 6/24 within the registered layer families and default keyword sets
- Three-layer non-columnar compositions tested (838,350 branches, April 2026): {additive, Vig, Beau} × {myszkowski, rail fence, route, block transposition} × {additive, Vig, Beau} — max crib score 7/24. This covers the enumerated layer registry with default parameter generators; non-registered outer families (e.g., homophonic, bifid, four-square as composition outer) are not included.
- Rearrangements of the 73-character text: 4.5 million rearrangements tested with each cipher variant
- Sculpture reading paths as keys: 10,777 paths tested, all noise
- Grid-position-based keys: key derived from position on the grid, zero signal
- Morse code hypotheses: multiple Morse-based encoding schemes, all noise
73-Character Hypothesis
The carved text has 97 characters. One working model proposes that 24 characters are nulls (97 − 24 = 73 real ciphertext characters). This is a hypothesis, not a proven fact.
The strongest current public support is not the old null-palette result; that evidence was retired in April 2026. The cleaner live structural observation is that the five carved Ws create a bounded segmentation hypothesis. Even that does not prove nulls. The 73-character idea remains open, but it currently lacks independent, model-neutral quantitative support.
Note: The number 24 also appears in other K4 contexts (24 crib characters, Berlin Clock has 24 facets, K3 chart has 24 rows). These coincidences are not evidence; many small integers recur naturally. They are documented here only because they are frequently mentioned in community discussions.
The Null Palette (Retired April 2026)
A score-conditioned null-palette result was once treated as a key observation. It is retained in the repo only as a cautionary historical case.
Matched controls disproved the palette’s specificity. The apparent convergence advantage was generic to restrictive palette-constrained search and did not justify treating that letter set as real evidence about K4.
Palette constraints remain useful as a computational technique in some search programs, but the site no longer treats any retired palette identity as a cryptographic clue.
What Remains
Null mask + periodic substitution is proven impossible for ANY choice of 24 null positions at ANY period (algebraic proof extended to CT73 via Bean inequalities across all 11,440 candidate masks). If the 73-character model is correct, the cipher operating on the extracted characters must use a non-periodic key: a running key from an unknown source, a bespoke procedural method, or a one-time key.
On 2026-04-08 we ran an adversarial internal audit of the scope of our own testing and reclassified the frontier into “testable now” (bounded, reproducible campaigns we can run), “weakly testable” (requires better detection apparatus, not more compute), and “untestable under current clues” (requires new primary-source evidence). We are aware that classical cipher space is infinite and cannot be literally exhausted; this classification describes the scope of what we have tested under our specific assumptions, not a claim about K4 as a mathematical object. Full audit and record are in the internal status audit.
If you have an idea we have not tested, the Submit a Theory page is the direct path. We want to be wrong about anything we have classified as ruled out.
Validation Criteria
A candidate solution is not accepted unless it passes all of the following simultaneously:
- Crib score: 24/24
- Bean constraints: PASS
- Text quality: letter patterns must match normal English (measured by how common its 4-letter sequences are)
- Letter frequency: must match the statistical profile of English text
- Readability: must produce meaningful English with recognizable words (human review required)
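The statistical gates in this list are not specified in detail here. As a rough stand-in for the letter-frequency check, the sketch below computes the index of coincidence (IC), a standard statistic that is about 0.067 for typical English text and about 0.038 (1/26) for uniformly random letters. Using IC for this purpose is an assumption of this example, not the project's stated method, and the function name is hypothetical.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn without replacement are equal."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


if __name__ == "__main__":
    sample = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG"
    print(round(index_of_coincidence(sample), 4))
```

On a text as short as 97 characters the IC is noisy, so it can flag a candidate as implausible but cannot confirm one; the crib, Bean, and human-review gates still carry the weight.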
How to Read Claims on This Site
We classify every claim by its evidence strength. When reading results on this site:
- Level A: Proven within stated assumptions. Mathematical proof or complete enumeration. Always conditioned on explicit assumptions (e.g., correct cribs, additive key model). If you disagree with the assumptions, the proof does not apply.
- Level B: Exhaustively negative within tested scope. Every configuration in a defined parameter space was tested and produced noise. Does not extend to untested variants or multi-layer combinations. “Ruled out within tested scope” means exactly that.
- Level C: Descriptive anomaly. A pattern discovered post-hoc (from the data, not predicted in advance). P-values are uncorrected for search breadth unless stated otherwise. Does not prove how K4 was encrypted.
- Level D: Hypothesis. A plausible conjecture without quantitative support. Labeled “hypothesis” or “open question.”
All p-values on this site are uncorrected for the project’s full search breadth (~1000 experiments) unless explicitly stated. Over that many tests, individually “significant” results are expected by chance. We document them for transparency, not as proof.
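The effect of search breadth can be made concrete with a little arithmetic. The sketch below assumes independent tests and a conventional per-test threshold of 0.05; both are assumptions of this example, and the function names are hypothetical. The Bonferroni correction shown is one standard, conservative way to adjust a threshold, not necessarily the one used by the project.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests that look significant when no effect exists."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float) -> float:
    """Per-test threshold that keeps the family-wise error rate near alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    n, alpha = 1000, 0.05  # roughly the project's search breadth; alpha assumed
    print(expected_false_positives(n, alpha))
    print(prob_at_least_one(n, alpha))
    print(bonferroni_threshold(n, alpha))
```

Under these assumptions, about 50 of 1000 null experiments would cross p < 0.05 by chance alone, which is why an isolated "significant" result is recorded but not treated as proof.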
Truth Taxonomy
Every claim in our database is classified by its evidence level:
- [PUBLIC FACT]: Verified by reputable public reporting or primary-source statements.
- [DERIVED FACT]: Deterministic consequence of public facts, reproducible by a provided command.
- [INTERNAL RESULT]: Empirical result from this project. Includes artifact pointers and a reproduction command.
- [HYPOTHESIS]: Plausible claim not yet proven. Includes a test plan.
Confidence Tiers
- Tier 1: Mathematical Proof. Algebraic proof that the method cannot produce the known plaintext. Permanently valid unless crib positions or ciphertext transcription are wrong.
- Tier 2: Exhaustive Search. Every possible configuration was tested under stated assumptions. Solid for the specific model tested, but does not eliminate multi-layer variants.
- Tier 3: Partial/Statistical. Sampling-based or incomplete coverage. May warrant re-testing under different assumptions.
- Tier 4: Untested. Never properly tested. Fully open.
Reproducibility
Every elimination includes a reproduction command you can run yourself. The entire codebase is open source. Clone the repo, install Python 3.11+, and run any experiment with PYTHONPATH=src.