From 9ef0d71698c2e1c1d698ad6dbfd81e2c2642c25d Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Mon, 17 Nov 2025 11:27:58 +0000
Subject: [PATCH 1/2] Initial plan

From 55cbf984415dc2208e948a71f279ba72e1f0b1c6 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Mon, 17 Nov 2025 11:34:21 +0000
Subject: [PATCH 2/2] Add comprehensive documentation on failure message construction

- Explain two-phase matching process for error messages
- Detail how rightmost failure position is tracked
- Document role of memoization table in storing failures
- Provide example illustrating the process
- Explain fluffy failures concept

Co-authored-by: pdubroy <17293+pdubroy@users.noreply.github.com>
---
 doc/api-reference.md | 50 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/doc/api-reference.md b/doc/api-reference.md
index 0535e210..dfbcbbeb 100644
--- a/doc/api-reference.md
+++ b/doc/api-reference.md
@@ -150,6 +150,56 @@ Return the index in the input stream at which the match failed.
 Return an array of Failure objects describing the failures the occurred at the rightmost failure position.
+#### How Failure Messages are Constructed
+
+When a match fails, Ohm constructs an error message by performing a two-phase matching process:
+
+**Phase 1: Find the rightmost failure position**
+
+During the initial match attempt, Ohm tracks the _rightmost failure position_: the position in the input where the parser made the most progress before failing. This position is maintained in `MatchState.rightmostFailurePosition`:
+
+- Every time a parsing expression fails (e.g., a terminal doesn't match, a range check fails), `processFailure()` is called, which updates `rightmostFailurePosition` to be the maximum of its current value and the failure position.
+- When entering and exiting rule applications, the rightmost failure position is managed using a stack (`_rightmostFailurePositionStack`), allowing nested rules to track their own failures independently while propagating the overall rightmost position upward.
+- After matching completes (successfully or not), the `MatchResult` object exposes this rightmost failure position via `getRightmostFailurePosition()`.
+
+**Phase 2: Collect detailed failures at the rightmost position**
+
+If the match failed and the user accesses `message`, `shortMessage`, or calls `getRightmostFailures()`, Ohm performs a _second_ match with failure recording enabled:
+
+1. The matcher is re-run with `positionToRecordFailures` set to the rightmost failure position found in Phase 1.
+2. During this second pass, whenever a parsing expression fails at exactly `positionToRecordFailures`, the failure is recorded in `MatchState.recordedFailures` (a dictionary keyed by failure description).
+3. These recorded failures are also stored in the memoization table alongside successful match results.
+
+**The Role of the Memoization Table**
+
+Ohm uses a memoization table (also called a "memo table" or "packrat parsing table") to avoid re-parsing the same input at the same position. Each entry in the table is indexed by position and rule, and stores:
+
+- The match result (success/failure)
+- The matched length
+- The examined length (how far ahead the parser looked)
+- **`rightmostFailureOffset`**: The offset from the current position to the rightmost failure that occurred during this rule application
+- **`failuresAtRightmostPosition`**: A snapshot of all the specific failures that occurred at the rightmost failure position during this rule application
+
+**Why store failures in the memo table?**
+
+When a memoized result is reused (in `useMemoizedResult()`), Ohm needs to:
+
+1. Update the current `rightmostFailurePosition` based on the memoized `rightmostFailureOffset`
+2. If recording failures and the memoized rightmost position matches `positionToRecordFailures`, merge the memoized failures into the current `recordedFailures`
+
+This is critical for correctness in the second phase: without storing failures in the memo table, the second match pass would miss failures that occurred in memoized rule applications, leading to incomplete error messages.
+
+**Example:**
+
+Consider the grammar `G { start = "a" "b" | "a" "c" }` with input `"ad"`:
+
+- Phase 1: Parser tries `"a" "b"`, succeeds on `"a"`, fails on `"b"` at position 1. Then tries `"a" "c"`, succeeds on `"a"` again, fails on `"c"` at position 1. Rightmost failure position is 1.
+- Phase 2: Re-run with `positionToRecordFailures = 1`. Both `"b"` and `"c"` fail at position 1, so both are recorded. The error message will say: `Expected "b" or "c"`.
+
+**Fluffy Failures:**
+
+Some failures are marked as "fluffy" (e.g., implicit whitespace from syntactic rules). These are filtered out when constructing the final error message to make it more useful for end users.
+
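
The two-phase process documented in this patch can be illustrated with a small, self-contained toy matcher. This is only a sketch under stated assumptions, not Ohm's implementation: it has no rules, memo table, or fluffy-failure handling, and the helper names (`makeState`, `terminal`, `seq`, `alt`, `run`, `matchWithFailures`) are hypothetical. Only `rightmostFailurePosition`, `positionToRecordFailures`, `recordedFailures`, and `processFailure` borrow their names from the text above. It encodes the example grammar `start = "a" "b" | "a" "c"` and matches it against `"ad"`.

```javascript
// Toy illustration only: NOT Ohm's implementation. Helper names are hypothetical.

function makeState(input, positionToRecordFailures = -1) {
  return {
    input,
    rightmostFailurePosition: -1,
    positionToRecordFailures, // -1 means "do not record" (phase 1)
    recordedFailures: {}, // keyed by failure description
  };
}

// Called whenever an expression fails at `pos`.
function processFailure(state, pos, description) {
  state.rightmostFailurePosition = Math.max(state.rightmostFailurePosition, pos);
  if (pos === state.positionToRecordFailures) {
    state.recordedFailures[description] = true;
  }
}

// Each parser takes (state, pos) and returns the new position, or -1 on failure.
const terminal = (str) => (state, pos) => {
  if (state.input.startsWith(str, pos)) return pos + str.length;
  processFailure(state, pos, JSON.stringify(str));
  return -1;
};

const seq = (...parsers) => (state, pos) => {
  for (const p of parsers) {
    pos = p(state, pos);
    if (pos < 0) return -1;
  }
  return pos;
};

const alt = (...parsers) => (state, pos) => {
  for (const p of parsers) {
    const next = p(state, pos);
    if (next >= 0) return next;
  }
  return -1;
};

// Runs the parser from position 0 and requires it to consume all input.
function run(parser, state) {
  const end = parser(state, 0);
  if (end >= 0 && end < state.input.length) {
    processFailure(state, end, 'end of input');
    return false;
  }
  return end >= 0;
}

function matchWithFailures(parser, input) {
  // Phase 1: only track the rightmost failure position.
  const phase1 = makeState(input);
  if (run(parser, phase1)) return { succeeded: true, expected: [] };

  // Phase 2: re-run, recording failures at exactly that position.
  const phase2 = makeState(input, phase1.rightmostFailurePosition);
  run(parser, phase2);
  return {
    succeeded: false,
    rightmostFailurePosition: phase1.rightmostFailurePosition,
    expected: Object.keys(phase2.recordedFailures),
  };
}

// start = "a" "b" | "a" "c"
const start = alt(
  seq(terminal('a'), terminal('b')),
  seq(terminal('a'), terminal('c'))
);

const result = matchWithFailures(start, 'ad');
console.log(
  `Expected ${result.expected.join(' or ')} at position ${result.rightmostFailurePosition}`
);
// prints: Expected "b" or "c" at position 1
```

Splitting the work into two passes keeps the common, successful case cheap: the first pass only maintains a single integer, and the failure dictionary is populated only when an error message is actually requested.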