7 changes: 7 additions & 0 deletions CLAUDE.md
@@ -267,6 +267,13 @@ Golden tests are the primary integration testing mechanism:
- `golden.lua` - Generated Lua code
- `eval/golden.txt` - Execution output (if module has a `main` function)

**The golden harness re-implements the IR pipeline.** `compileCorefn` in
`test/Language/PureScript/Backend/Lua/Golden/Spec.hs` calls
`makeUberModule >>> optimizedUberModule` directly — it does *not* go through
`Backend.compileModules`. Any new IR pipeline pass must live inside
`optimizedUberModule` (the shared function), or the golden tests will silently
bypass it.

To add a new golden test:
1. Create `test/ps/golden/Golden/NewTest/Test.purs`
2. Run `cabal test` - it will fail and create `actual.*` files
137 changes: 137 additions & 0 deletions docs/GOLDEN_TESTING.md
@@ -0,0 +1,137 @@
# Golden testing in pslua

Golden tests are the primary **integration** test for the compiler: they take a
real PureScript module, run it through the whole pipeline, and pin the results
(the IR, the generated Lua, and — when the module is runnable — its actual
output) against checked-in expected files. A change anywhere in the pipeline
shows up as a diff in these files.

- Harness: `test/Language/PureScript/Backend/Lua/Golden/Spec.hs`
- Golden primitive (vendored, customised): `test/Test/Hspec/Golden.hs`
- Test modules: `test/ps/golden/Golden/<Name>/Test.purs`
- Generated artifacts: `test/ps/output/Golden.<Name>.Test/`
- Reset script: `scripts/golden_reset`

## The artifacts

Each test module `Golden.<Name>.Test` maps to a directory
`test/ps/output/Golden.<Name>.Test/`:

| File | Committed? | What it is |
|---|---|---|
| `corefn.json` | yes | CoreFn emitted by `purs` (`spago build -u '-g corefn'`). Its `builtWith` stamp tracks the `purs` version. |
| `golden.ir` | yes | Pretty-printed IR `UberModule` — **structural**, a pure function of the code. |
| `golden.lua` | yes | Generated Lua — **structural**. |
| `eval/golden.txt` | yes | Program stdout, normalised. The **semantic oracle** — hand-verified. Present only for runnable modules. |
| `actual.ir` / `actual.lua` / `eval/actual.txt` | no (git-ignored) | What the current run produced; written on every run for diffing. |
| `externs.cbor` | no (git-ignored) | `purs` build artifact. |

A module is treated as an **application** (entry `main` is called) when
`eval/golden.txt` exists, and as a **module** otherwise. That is the only thing
the eval file's presence controls, beyond enabling the eval check.
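
The rule above amounts to a file-existence check. A minimal shell sketch, assuming the directory layout from the table; `golden_mode` is a hypothetical helper name and not something the repo provides:

```bash
# Hypothetical helper: report how the harness would treat a module's output
# directory, based only on whether eval/golden.txt exists in it.
golden_mode() {
  local module_dir=$1
  if [ -f "$module_dir/eval/golden.txt" ]; then
    echo application   # `main` is called and the eval check runs
  else
    echo module        # no entry point is called, no eval check
  fi
}
```

For example, `golden_mode test/ps/output/Golden.<Name>.Test` reports which of the two modes applies to that test.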

## What a run does

`cabal test spec` runs the golden spec (selectable with hspec's `-m Goldens`) as a setup step followed by three test groups, in order:

1. **compile** (`beforeAll_ compilePs`): `spago build -u '-g corefn'` in
   `test/ps`, producing `corefn.json` for every module.
2. **`compiles corefn files to lua`**, for each module:
   - `compileCorefn` reads CoreFn → builds the IR `UberModule` → compares the
     pretty-printed IR against `golden.ir`.
   - `compileIr` lowers IR → Lua → compares against `golden.lua`.
3. **`golden files should evaluate`**, for each `eval/golden.txt`: runs
   `lua <golden.lua>`, asserts exit 0, normalises stdout (strip/trim/drop blank
   lines), and compares against `eval/golden.txt`.
4. **`golden files should typecheck`**: runs `luacheck --quiet --std min` over
   the generated `golden.lua` files.
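
The stdout normalisation in step 3 can be approximated in shell. This is a sketch under the assumption that "strip/trim/drop blank lines" means trimming whitespace on each line and discarding empty lines; the Haskell harness may differ in detail, and `normalise_stdout` is a hypothetical name:

```bash
# Approximate the harness's stdout normalisation (assumed behaviour):
# trim leading and trailing whitespace on every line, then drop empty lines.
normalise_stdout() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Example: pipe a program's output through the filter before comparing it
# with an oracle by hand.
printf '  hello  \n\n' | normalise_stdout
```

This is handy when writing an `eval/golden.txt` by hand, since the oracle should already be in the normalised form.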

> **Important — the harness re-implements the pipeline.** `compileCorefn`
> (Spec.hs) calls `Linker.makeUberModule >>> optimizedUberModule` and
> `compileIr` calls `Lua.fromUberModule >>> optimizeChunk` **directly** — they
> do *not* go through `Backend.compileModules`. So any new IR pipeline pass must
> live inside `optimizedUberModule` (the shared function), or the golden tests
> will silently bypass it — a pass wired only into `Backend.compileModules`
> would run in a real build but stay invisible to these tests.
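
To make the bypass concrete, here is a toy model in shell, where a pipe stands in for `>>>`. All function bodies are placeholders, and `new_pass` and the shape of `compile_modules` are assumptions for illustration only; the real functions live in the Haskell sources:

```bash
# Toy stand-ins for the real passes: each copies its input and appends a line
# naming itself, so the output records which passes actually ran.
make_uber_module()      { cat; echo "makeUberModule"; }
optimized_uber_module() { cat; echo "optimizedUberModule"; }
new_pass()              { cat; echo "newPass"; }   # hypothetical new IR pass

# Assumed shape of a real build if the new pass is wired in here only.
compile_modules() { make_uber_module | optimized_uber_module | new_pass; }

# What the golden harness does: it composes the shared functions itself.
golden_compile_corefn() { make_uber_module | optimized_uber_module; }

echo "corefn" | compile_modules         # trace includes newPass
echo "corefn" | golden_compile_corefn   # trace does not: the pass is bypassed
```

Moving the `new_pass` call inside `optimized_uber_module` makes both traces include it, which is the fix the note recommends.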

## The eval golden is the semantic oracle

`golden.ir` and `golden.lua` are *derived*: regenerating them from the current
compiler is always "correct" by construction. `eval/golden.txt` is different —
it is the **hand-verified expected behaviour**. A passing eval test is the only
thing that shows a codegen change preserved semantics; a luacheck pass alone
does not, because it only checks that the generated Lua parses and lints cleanly.

Therefore: **never regenerate `eval/golden.txt` from the compiler's current
output unless the program's behaviour legitimately changed and you have
re-verified it by hand.** Doing so silently bakes a regression in as the new
"expected" value.

## Updating goldens

### After a codegen/optimizer change (output moved, behaviour unchanged)

Accept the **structural** goldens in place; the eval oracle is left untouched:

```bash
PSLUA_GOLDEN_ACCEPT=1 cabal test spec
```

`PSLUA_GOLDEN_ACCEPT` rewrites a mismatching golden with the actual output and
passes — but **only** for goldens marked `acceptable` (`golden.ir` /
`golden.lua`, built with `acceptableGolden`). `eval/golden.txt` is built with
`defaultGolden` (`acceptable = False`) and is **never** auto-accepted, so it
keeps failing until you fix the regression or update it deliberately. Review the
resulting `git diff` before committing.

This replaces deleting goldens by hand. The equivalent manual workflow:

```bash
find test/ps/output -name golden.ir -delete
find test/ps/output -name golden.lua -delete
cabal test spec # recreates ir/lua; eval runs against the preserved oracle
```
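
The accept rule in this section can be modelled as a two-input decision. The sketch below is a hypothetical shell model, not the harness itself; it mirrors the `accept && acceptable` condition in `runGolden`, which checks only whether the variable is set:

```bash
# Hypothetical model of what happens to a *mismatching* golden.
#   $1: "yes" for goldens marked acceptable (golden.ir / golden.lua),
#       "no" for the eval oracle (eval/golden.txt)
on_mismatch() {
  local acceptable=$1
  # ${VAR+x} expands to "x" whenever the variable is set, whatever its value,
  # mirroring the presence-only check in the Haskell code.
  if [ -n "${PSLUA_GOLDEN_ACCEPT+x}" ] && [ "$acceptable" = yes ]; then
    echo "accept: rewrite golden, test passes"
  else
    echo "fail: report mismatch"
  fi
}
```

One consequence worth knowing: because only presence is checked, `PSLUA_GOLDEN_ACCEPT=0` still enables acceptance. Unset the variable to turn it off.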

### When the program output itself legitimately changed

Re-verify the new behaviour by hand, then update `eval/golden.txt` for the
affected module(s) explicitly (edit it, or delete just that file and let the run
recreate it after you've confirmed the program is correct).

### `scripts/golden_reset` — use with care

`golden_reset` deletes **every** file named `golden.*` in `test/ps/output`
(including `eval/golden.txt`) and reruns the suite to recreate them. Because it
nukes the oracle, only use it when the expected program output itself changed
across the board and you have re-verified it. For ordinary codegen churn, prefer
`PSLUA_GOLDEN_ACCEPT=1` above.
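
Before running the reset it can help to list what would be removed. A minimal sketch, assuming the script matches files named `golden.*` under the output directory as described above; `golden_reset_preview` is a hypothetical helper and does not reproduce the script's actual contents:

```bash
# List every file a golden.* sweep would remove, without deleting anything.
# $1: directory to scan (defaults to the output directory named in this doc).
golden_reset_preview() {
  find "${1:-test/ps/output}" -type f -name 'golden.*' -print | sort
}
```

Any `eval/golden.txt` in the listing is a hand-verified oracle that will need re-verifying after the reset.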

## Debugging a mismatch

- By default a mismatch prints a **bounded** summary: the first differing line
plus a small window of each side, the line counts, and the path to the full
`actual.*` file. This keeps a run with many mismatches cheap (it does not hold
two full blobs and their diff per failure).
- For the **full** expected/actual diff: `PSLUA_GOLDEN_FULL_DIFF=1 cabal test spec`.
- The complete actual output is always on disk next to the golden
(`actual.ir` / `actual.lua` / `eval/actual.txt`) for manual diffing.
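
The idea behind the bounded summary can be sketched in shell for manual use on a golden and its `actual.*` file. This is an illustration of the technique only: the function names, window size, and output format are assumptions, and the real `boundedSummary` in `test/Test/Hspec/Golden.hs` may differ. It assumes both files are newline-terminated:

```bash
# Print the 1-based number of the first line where two files differ.
# Returns 1 (and prints nothing) when the files are identical.
first_diff_line() {
  local n=0 e a e_more a_more
  while :; do
    n=$((n + 1))
    e_more=1; a_more=1
    IFS= read -r e <&3 || e_more=0
    IFS= read -r a <&4 || a_more=0
    if [ "$e_more" = 0 ] && [ "$a_more" = 0 ]; then return 1; fi
    if [ "$e_more" != "$a_more" ] || [ "$e" != "$a" ]; then
      echo "$n"; return 0
    fi
  done 3<"$1" 4<"$2"
}

# Print a small, fixed-size report instead of a full diff.
# $1: golden file, $2: actual file, $3: window size in lines (default 3).
bounded_summary() {
  local expected=$1 actual=$2 window=${3:-3} line
  line=$(first_diff_line "$expected" "$actual") || { echo "no difference"; return 0; }
  echo "first difference at line $line"
  echo "expected: $(($(wc -l < "$expected"))) lines, actual: $(($(wc -l < "$actual"))) lines"
  echo "--- expected"
  sed -n "${line},$((line + window - 1))p" "$expected"
  echo "+++ actual"
  sed -n "${line},$((line + window - 1))p" "$actual"
  echo "full actual output: $actual"
}
```

The report size depends on the window, not on the file size, which is what keeps a run with many large mismatches cheap.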

The test runner's RTS is capped in `pslua.cabal`
(`-with-rtsopts=-N4 -c -M16g`): bounding the parallel-GC core count and heap
turns a pathological run (many large mismatches) into a clean heap-overflow
rather than an OOM that takes the machine down. The bounded summary above is the
other half of keeping failures affordable.

## Adding a new golden test

1. Create `test/ps/golden/Golden/<Name>/Test.purs` (module
`Golden.<Name>.Test`). For a runnable test, give it `main :: Effect Unit`.
2. `cd test/ps && spago build -u '-g corefn' && cd ../..` to emit `corefn.json`.
3. For a runnable test, create the oracle by hand:
`test/ps/output/Golden.<Name>.Test/eval/golden.txt` with the expected stdout,
plus `eval/.gitignore` containing `actual.txt`.
4. Run `cabal test spec`. `golden.ir` / `golden.lua` are created on first run
(they pass — "first time execution"); the eval test compares the program's
output against your oracle.
5. Review the generated `golden.ir` / `golden.lua`, then commit `Test.purs`,
`corefn.json`, `golden.ir`, `golden.lua`, `eval/golden.txt`, `eval/.gitignore`.
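
Step 3 can be scripted. A minimal sketch, assuming the paths given in the steps above; `new_eval_oracle` is a hypothetical helper, and the expected output you pass in must still be verified by hand:

```bash
# Hypothetical helper for step 3: write the hand-verified oracle and the
# .gitignore that keeps actual.txt out of version control.
#   $1: test name (the <Name> in Golden.<Name>.Test)
#   $2: expected stdout, already normalised
#   $3: output root (defaults to the directory named in this doc)
new_eval_oracle() {
  local name=$1 expected=$2 root=${3:-test/ps/output}
  local eval_dir="$root/Golden.$name.Test/eval"
  mkdir -p "$eval_dir"
  printf '%s\n' "$expected" > "$eval_dir/golden.txt"
  printf 'actual.txt\n' > "$eval_dir/.gitignore"
}
```

Run it from the repository root after step 2, then continue with step 4.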
6 changes: 3 additions & 3 deletions test/Language/PureScript/Backend/Lua/Golden/Spec.hs
@@ -64,7 +64,7 @@ import Test.Hspec
, shouldNotBe
)
import Test.Hspec.Extra (annotatingWith)
- import Test.Hspec.Golden (defaultGolden)
+ import Test.Hspec.Golden (acceptableGolden, defaultGolden)
import Text.Pretty.Simple
( OutputOptions (..)
, defaultOutputOptionsNoColor
@@ -100,7 +100,7 @@ spec = do
irTestName ← runIO do
toFilePath <$> makeRelativeToCurrentDir irGolden
it irTestName do
- defaultGolden irGolden (Just irActual) do
+ acceptableGolden irGolden (Just irActual) do
uberModule ← compileCorefn (Tagged (Rel psOutputPath)) moduleName
pure . toStrict $
pShowOpt
@@ -118,7 +118,7 @@ spec = do
luaTestName ← runIO do
toFilePath <$> makeRelativeToCurrentDir luaGolden
it luaTestName do
- defaultGolden luaGolden (Just luaActual) do
+ acceptableGolden luaGolden (Just luaActual) do
appOrModule ←
(doesFileExist evalGolden) <&> \case
True → AsApplication moduleName (PS.Ident "main")
74 changes: 59 additions & 15 deletions test/Test/Hspec/Golden.hs
@@ -5,6 +5,7 @@
module Test.Hspec.Golden
( Golden (..)
, defaultGolden
, acceptableGolden
)
where

@@ -27,6 +28,20 @@ hold two full pretty-printed blobs — and their diff — per failure. Set this
fullDiffEnvVar ∷ String
fullDiffEnvVar = "PSLUA_GOLDEN_FULL_DIFF"

{- | Env var that, when set, makes a mismatching golden be /accepted/: the
golden file is rewritten in place with the actual output and the test passes,
à la tasty-golden's @--accept@. This avoids deleting goldens by hand (or with
@scripts/golden_reset@) when a codegen/optimizer change legitimately moves the
output.

Acceptance only applies to goldens marked 'acceptable' (the structural
@golden.ir@ / @golden.lua@). The @eval/golden.txt@ oracle is built with
'defaultGolden' (@acceptable = False@) and is therefore never auto-accepted —
its hand-verified program output must change only by deliberate review.
-}
acceptEnvVar ∷ String
acceptEnvVar = "PSLUA_GOLDEN_ACCEPT"

{- | Golden tests parameters

@
@@ -62,10 +77,15 @@ data Golden str = Golden
, goldenFile ∷ Path Abs File
-- ^ Where to read/write the golden file for this test.
, actualFile ∷ Maybe (Path Abs File)
- -- ^ Where to save the actual file for this test.
- -- If it is @Nothing@ then no file is written.
+ {- ^ Where to save the actual file for this test.
+ If it is @Nothing@ then no file is written.
+ -}
, failFirstTime ∷ Bool
-- ^ Whether to record a failure the first time this test is run
, acceptable ∷ Bool
{- ^ Whether a mismatch may be accepted (golden rewritten in place) when
'acceptEnvVar' is set. Keep 'False' for hand-verified oracles.
-}
}

instance Eq str ⇒ Example (Golden str) where
@@ -86,6 +106,8 @@ fromGoldenResult ∷ GoldenResult → Result
fromGoldenResult = \case
SameOutput →
Result "Golden and Actual output hasn't changed" Success
Accepted →
Result "Golden file accepted (PSLUA_GOLDEN_ACCEPT)" Success
FirstExecutionSucceed →
Result "First time execution. Golden file created." Success
FirstExecutionFail →
@@ -115,14 +137,30 @@ defaultGolden goldenFile actualFile produceOutput =
, goldenFile
, actualFile
, failFirstTime = False
, acceptable = False
}

{- | Like 'defaultGolden', but the golden may be accepted in place when
'acceptEnvVar' is set (see there). Use for derived/structural goldens whose
content is a pure function of the code under test (e.g. generated IR or Lua),
NOT for hand-verified oracles.
-}
acceptableGolden
∷ Path Abs File
→ Maybe (Path Abs File)
→ IO Text
→ Golden Text
acceptableGolden goldenFile actualFile produceOutput =
(defaultGolden goldenFile actualFile produceOutput) {acceptable = True}

-- | Possible results from a golden test execution
data GoldenResult
= MissmatchOutput String String
| -- | A bounded, line-oriented mismatch summary (the default).
MissmatchSummary String
| SameOutput
| -- | A mismatch that was accepted: the golden file was rewritten in place.
Accepted
| FirstExecutionSucceed
| FirstExecutionFail

@@ -189,16 +227,22 @@ runGolden Golden {..} = do
if contentGolden == output
then pure SameOutput
else do
- wantFull ← isJust <$> lookupEnv fullDiffEnvVar
- pure
- if wantFull
- then
- MissmatchOutput
- (encodePretty contentGolden)
- (encodePretty output)
- else
- MissmatchSummary $
- boundedSummary
- actualFile
- (encodePretty contentGolden)
- (encodePretty output)
+ accept ← isJust <$> lookupEnv acceptEnvVar
+ if accept && acceptable
+ then do
+ writeToFile goldenFile output
+ pure Accepted
+ else do
+ wantFull ← isJust <$> lookupEnv fullDiffEnvVar
+ pure
+ if wantFull
+ then
+ MissmatchOutput
+ (encodePretty contentGolden)
+ (encodePretty output)
+ else
+ MissmatchSummary $
+ boundedSummary
+ actualFile
+ (encodePretty contentGolden)
+ (encodePretty output)