purescript-lua · Unisay · Jun 21, 2026 · Jun 21, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -267,6 +267,13 @@ Golden tests are the primary integration testing mechanism:
    - `golden.lua` - Generated Lua code
    - `eval/golden.txt` - Execution output (if module has a `main` function)
 
+**The golden harness re-implements the IR pipeline.** `compileCorefn` in
+`test/Language/PureScript/Backend/Lua/Golden/Spec.hs` calls
+`makeUberModule >>> optimizedUberModule` directly — it does *not* go through
+`Backend.compileModules`. Any new IR pipeline pass must live inside
+`optimizedUberModule` (the shared function), or the golden tests will silently
+bypass it.
+
 To add a new golden test:
 1. Create `test/ps/golden/Golden/NewTest/Test.purs`
 2. Run `cabal test` - it will fail and create `actual.*` files

diff --git a/docs/GOLDEN_TESTING.md b/docs/GOLDEN_TESTING.md
@@ -0,0 +1,137 @@
+# Golden testing in pslua
+
+Golden tests are the primary **integration** test for the compiler: they take a
+real PureScript module, run it through the whole pipeline, and pin the results
+(the IR, the generated Lua, and — when the module is runnable — its actual
+output) against checked-in expected files. A change anywhere in the pipeline
+shows up as a diff in these files.
+
+- Harness: `test/Language/PureScript/Backend/Lua/Golden/Spec.hs`
+- Golden primitive (vendored, customised): `test/Test/Hspec/Golden.hs`
+- Test modules: `test/ps/golden/Golden/<Name>/Test.purs`
+- Generated artifacts: `test/ps/output/Golden.<Name>.Test/`
+- Reset script: `scripts/golden_reset`
+
+## The artifacts
+
+Each test module `Golden.<Name>.Test` maps to a directory
+`test/ps/output/Golden.<Name>.Test/`:
+
+| File | Committed? | What it is |
+|---|---|---|
+| `corefn.json` | yes | CoreFn emitted by `purs` (`spago build -u '-g corefn'`). Its `builtWith` stamp tracks the `purs` version. |
+| `golden.ir` | yes | Pretty-printed IR `UberModule` — **structural**, a pure function of the code. |
+| `golden.lua` | yes | Generated Lua — **structural**. |
+| `eval/golden.txt` | yes | Program stdout, normalised. The **semantic oracle** — hand-verified. Present only for runnable modules. |
+| `actual.ir` / `actual.lua` / `eval/actual.txt` | no (git-ignored) | What the current run produced; written on every run for diffing. |
+| `externs.cbor` | no (git-ignored) | `purs` build artifact. |
+
+The harness compiles a module as an **application** (its `main` entry point is
+called) when `eval/golden.txt` exists, and as a plain **module** otherwise.
+Apart from enabling the eval check, that is all the file's presence controls.
+
+## What a run does
+
+`cabal test spec` (matched by `-m Goldens`) runs four groups, in order:
+
+1. **compile** (`beforeAll_ compilePs`) — `spago build -u '-g corefn'` in
+   `test/ps`, producing `corefn.json` for every module.
+2. **`compiles corefn files to lua`** — for each module:
+   - `compileCorefn` reads CoreFn → builds the IR `UberModule` → compares the
+     pretty-printed IR against `golden.ir`.
+   - `compileIr` lowers IR → Lua → compares against `golden.lua`.
+3. **`golden files should evaluate`** — for each `eval/golden.txt`: runs
+   `lua <golden.lua>`, asserts exit 0, normalises stdout (strip/trim/drop blank
+   lines), and compares against `eval/golden.txt`.
+4. **`golden files should typecheck`** — runs `luacheck --quiet --std min` over
+   the generated `golden.lua` files.
+
+> **Important — the harness re-implements the pipeline.** `compileCorefn`
+> (Spec.hs) calls `Linker.makeUberModule >>> optimizedUberModule` and
+> `compileIr` calls `Lua.fromUberModule >>> optimizeChunk` **directly** — they
+> do *not* go through `Backend.compileModules`. So any new IR pipeline pass must
+> live inside `optimizedUberModule` (the shared function), or the golden tests
+> will silently bypass it — a pass wired only into `Backend.compileModules`
+> would run in a real build but stay invisible to these tests.
+
+## The eval golden is the semantic oracle
+
+`golden.ir` and `golden.lua` are *derived*: regenerating them from the current
+compiler is always "correct" by construction. `eval/golden.txt` is different —
+it is the **hand-verified expected behaviour**. A passing eval test is the only
+check here that shows a codegen change preserved semantics; a luacheck pass
+alone does not (it only checks that the Lua parses and lints cleanly).
+
+Therefore: **never regenerate `eval/golden.txt` from the compiler's current
+output unless the program's behaviour legitimately changed and you have
+re-verified it by hand.** Doing so silently bakes a regression in as the new
+"expected" value.
+
+## Updating goldens
+
+### After a codegen/optimizer change (output moved, behaviour unchanged)
+
+Accept the **structural** goldens in place; the eval oracle is left untouched:
+
+```bash
+PSLUA_GOLDEN_ACCEPT=1 cabal test spec
+```
+
+Setting `PSLUA_GOLDEN_ACCEPT` (to any value) rewrites a mismatching golden with
+the actual output and passes the test, but **only** for goldens marked
+`acceptable` (`golden.ir` / `golden.lua`, built with `acceptableGolden`).
+`eval/golden.txt` is built with `defaultGolden` (`acceptable = False`) and is
+**never** auto-accepted, so it keeps failing until you fix the regression or
+update it deliberately. Review the resulting `git diff` before committing.
+
+This replaces deleting goldens by hand. The equivalent manual workflow:
+
+```bash
+find test/ps/output -name golden.ir  -delete
+find test/ps/output -name golden.lua -delete
+cabal test spec            # recreates ir/lua; eval runs against the preserved oracle
+```
+
+### When the program output itself legitimately changed
+
+Re-verify the new behaviour by hand, then update `eval/golden.txt` for the
+affected module(s) explicitly (edit it, or delete just that file and let the run
+recreate it after you've confirmed the program is correct).
+
+### `scripts/golden_reset` — use with care
+
+`golden_reset` deletes **every** file named `golden.*` in `test/ps/output`
+(including `eval/golden.txt`) and reruns the suite to recreate them. Because it
+nukes the oracle, only use it when the expected program output itself changed
+across the board and you have re-verified it. For ordinary codegen churn, prefer
+`PSLUA_GOLDEN_ACCEPT=1` above.
+
+## Debugging a mismatch
+
+- By default a mismatch prints a **bounded** summary: the first differing line
+  plus a small window of each side, the line counts, and the path to the full
+  `actual.*` file. This keeps a run with many mismatches cheap (it does not hold
+  two full blobs and their diff per failure).
+- For the **full** expected/actual diff: `PSLUA_GOLDEN_FULL_DIFF=1 cabal test spec`.
+- The complete actual output is always on disk next to the golden
+  (`actual.ir` / `actual.lua` / `eval/actual.txt`) for manual diffing.
+
+The test runner's RTS is capped in `pslua.cabal`
+(`-with-rtsopts=-N4 -c -M16g`): bounding the parallel-GC core count and heap
+turns a pathological run (many large mismatches) into a clean heap-overflow
+rather than an OOM that takes the machine down. The bounded summary above is the
+other half of keeping failures affordable.
+
+## Adding a new golden test
+
+1. Create `test/ps/golden/Golden/<Name>/Test.purs` (module
+   `Golden.<Name>.Test`). For a runnable test, give it `main :: Effect Unit`.
+2. `cd test/ps && spago build -u '-g corefn' && cd ../..` to emit `corefn.json`.
+3. For a runnable test, create the oracle by hand:
+   `test/ps/output/Golden.<Name>.Test/eval/golden.txt` with the expected stdout,
+   plus `eval/.gitignore` containing `actual.txt`.
+4. Run `cabal test spec`. `golden.ir` / `golden.lua` are created on first run
+   (they pass — "first time execution"); the eval test compares the program's
+   output against your oracle.
+5. Review the generated `golden.ir` / `golden.lua`, then commit `Test.purs`,
+   `corefn.json`, `golden.ir`, `golden.lua`, `eval/golden.txt`, `eval/.gitignore`.
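The eval step in `docs/GOLDEN_TESTING.md` normalises stdout ("strip/trim/drop blank lines") before comparing it with `eval/golden.txt`. The sketch below shows one plausible reading of that normalisation, assuming it trims whitespace from each line and then drops empty lines. `normaliseStdout` is a hypothetical name used for illustration only; the real logic lives in `Spec.hs` and may differ in detail.

```haskell
module Main (main) where

import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical helper (not the harness's own function): trim surrounding
-- whitespace from every line, then drop the lines that end up empty.
normaliseStdout :: String -> [String]
normaliseStdout = filter (not . null) . map trim . lines
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace

main :: IO ()
main =
  -- Padded and blank lines collapse to the two meaningful lines.
  mapM_ putStrLn (normaliseStdout "  hello \n\n\tworld\n   \n")
```

Under this assumption, a hand-written oracle only needs to match the program's meaningful lines, not its exact padding or blank-line layout.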
diff --git a/test/Language/PureScript/Backend/Lua/Golden/Spec.hs b/test/Language/PureScript/Backend/Lua/Golden/Spec.hs
@@ -64,7 +64,7 @@ import Test.Hspec
   , shouldNotBe
   )
 import Test.Hspec.Extra (annotatingWith)
-import Test.Hspec.Golden (defaultGolden)
+import Test.Hspec.Golden (acceptableGolden, defaultGolden)
 import Text.Pretty.Simple
   ( OutputOptions (..)
   , defaultOutputOptionsNoColor
@@ -100,7 +100,7 @@ spec = do
         irTestName ← runIO do
           toFilePath <$> makeRelativeToCurrentDir irGolden
         it irTestName do
-          defaultGolden irGolden (Just irActual) do
+          acceptableGolden irGolden (Just irActual) do
             uberModule ← compileCorefn (Tagged (Rel psOutputPath)) moduleName
             pure . toStrict $
               pShowOpt
@@ -118,7 +118,7 @@ spec = do
         luaTestName ← runIO do
           toFilePath <$> makeRelativeToCurrentDir luaGolden
         it luaTestName do
-          defaultGolden luaGolden (Just luaActual) do
+          acceptableGolden luaGolden (Just luaActual) do
             appOrModule ←
               (doesFileExist evalGolden) <&> \case
                 True → AsApplication moduleName (PS.Ident "main")

diff --git a/test/Test/Hspec/Golden.hs b/test/Test/Hspec/Golden.hs
@@ -5,6 +5,7 @@
 module Test.Hspec.Golden
   ( Golden (..)
   , defaultGolden
+  , acceptableGolden
   )
 where
 
@@ -27,6 +28,20 @@ hold two full pretty-printed blobs — and their diff — per failure. Set this
 fullDiffEnvVar ∷ String
 fullDiffEnvVar = "PSLUA_GOLDEN_FULL_DIFF"
 
+{- | Env var that, when set, causes a mismatching golden to be /accepted/: the
+golden file is rewritten in place with the actual output and the test passes,
+à la tasty-golden's @--accept@. This avoids deleting goldens by hand (or with
+@scripts/golden_reset@) when a codegen/optimizer change legitimately moves the
+output.
+
+Acceptance only applies to goldens marked 'acceptable' (the structural
+@golden.ir@ / @golden.lua@). The @eval/golden.txt@ oracle is built with
+'defaultGolden' (@acceptable = False@) and is therefore never auto-accepted —
+its hand-verified program output must change only by deliberate review.
+-}
+acceptEnvVar ∷ String
+acceptEnvVar = "PSLUA_GOLDEN_ACCEPT"
+
 {- | Golden tests parameters
 
  @
@@ -62,10 +77,15 @@ data Golden str = Golden
   , goldenFile ∷ Path Abs File
   -- ^ Where to read/write the golden file for this test.
   , actualFile ∷ Maybe (Path Abs File)
-  -- ^ Where to save the actual file for this test.
-  -- If it is @Nothing@ then no file is written.
+  {- ^ Where to save the actual file for this test.
+  If it is @Nothing@ then no file is written.
+  -}
   , failFirstTime ∷ Bool
   -- ^ Whether to record a failure the first time this test is run
+  , acceptable ∷ Bool
+  {- ^ Whether a mismatch may be accepted (golden rewritten in place) when
+  'acceptEnvVar' is set. Keep 'False' for hand-verified oracles.
+  -}
   }
 
 instance Eq str ⇒ Example (Golden str) where
@@ -86,6 +106,8 @@ fromGoldenResult ∷ GoldenResult → Result
 fromGoldenResult = \case
   SameOutput →
     Result "Golden and Actual output hasn't changed" Success
+  Accepted →
+    Result "Golden file accepted (PSLUA_GOLDEN_ACCEPT)" Success
   FirstExecutionSucceed →
     Result "First time execution. Golden file created." Success
   FirstExecutionFail →
@@ -115,14 +137,30 @@ defaultGolden goldenFile actualFile produceOutput =
     , goldenFile
     , actualFile
     , failFirstTime = False
+    , acceptable = False
     }
 
+{- | Like 'defaultGolden', but the golden may be accepted in place when
+'acceptEnvVar' is set (see there). Use for derived/structural goldens whose
+content is a pure function of the code under test (e.g. generated IR or Lua),
+NOT for hand-verified oracles.
+-}
+acceptableGolden
+  ∷ Path Abs File
+  → Maybe (Path Abs File)
+  → IO Text
+  → Golden Text
+acceptableGolden goldenFile actualFile produceOutput =
+  (defaultGolden goldenFile actualFile produceOutput) {acceptable = True}
+
 -- | Possible results from a golden test execution
 data GoldenResult
   = MissmatchOutput String String
   | -- | A bounded, line-oriented mismatch summary (the default).
     MissmatchSummary String
   | SameOutput
+  | -- | A mismatch that was accepted: the golden file was rewritten in place.
+    Accepted
   | FirstExecutionSucceed
   | FirstExecutionFail
 
@@ -189,16 +227,22 @@ runGolden Golden {..} = do
       if contentGolden == output
         then pure SameOutput
         else do
-          wantFull ← isJust <$> lookupEnv fullDiffEnvVar
-          pure
-            if wantFull
-              then
-                MissmatchOutput
-                  (encodePretty contentGolden)
-                  (encodePretty output)
-              else
-                MissmatchSummary $
-                  boundedSummary
-                    actualFile
-                    (encodePretty contentGolden)
-                    (encodePretty output)
+          accept ← isJust <$> lookupEnv acceptEnvVar
+          if accept && acceptable
+            then do
+              writeToFile goldenFile output
+              pure Accepted
+            else do
+              wantFull ← isJust <$> lookupEnv fullDiffEnvVar
+              pure
+                if wantFull
+                  then
+                    MissmatchOutput
+                      (encodePretty contentGolden)
+                      (encodePretty output)
+                  else
+                    MissmatchSummary $
+                      boundedSummary
+                        actualFile
+                        (encodePretty contentGolden)
+                        (encodePretty output)
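The branch added to `runGolden` above amounts to a small decision table. The sketch below models that decision as a pure function. `Outcome` and `decide` are hypothetical names for illustration; the PR's real code uses `GoldenResult` and `runGolden`, which also performs the file write. The environment lookup mirrors the `isJust <$> lookupEnv` check, so any value counts as set, including `0`.

```haskell
module Main (main) where

import Data.Maybe (isJust)
import System.Environment (lookupEnv, setEnv)

-- Hypothetical, simplified stand-in for the PR's GoldenResult.
data Outcome = Same | Accepted | FullDiff | Summary
  deriving (Eq, Show)

-- Pure model of the mismatch handling: acceptance takes priority over diff
-- reporting, but only when the golden itself is marked acceptable.
decide :: Bool -> Bool -> Bool -> Bool -> Outcome
decide matches acceptSet acceptable fullDiffSet
  | matches = Same
  | acceptSet && acceptable = Accepted
  | fullDiffSet = FullDiff
  | otherwise = Summary

main :: IO ()
main = do
  -- Only presence is checked, so "0" still enables acceptance.
  setEnv "PSLUA_GOLDEN_ACCEPT" "0"
  acceptSet <- isJust <$> lookupEnv "PSLUA_GOLDEN_ACCEPT"
  print (decide False acceptSet True False)  -- structural golden: Accepted
  print (decide False acceptSet False False) -- eval oracle: Summary
```

To turn acceptance off, the variable has to be unset; setting it to `0` does not disable it.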