test: accept structural goldens in place with PSLUA_GOLDEN_ACCEPT #106
Merged
Adds `acceptableGolden`, a variant of `defaultGolden` whose mismatch may be accepted (the golden file is rewritten in place and the test passes) when the env var PSLUA_GOLDEN_ACCEPT is set, à la tasty-golden's --accept.

Only the derived/structural goldens (golden.ir, golden.lua) are marked acceptable. The hand-verified eval/golden.txt oracle stays on defaultGolden (acceptable = False) and is therefore never auto-accepted: its program output must change only by deliberate review. This is the safety asymmetry that scripts/golden_reset lacks (it deletes the eval oracle too).

Replaces hand-deleting goldens for ordinary codegen/optimizer churn:

    PSLUA_GOLDEN_ACCEPT=1 cabal test spec

Also adds docs/GOLDEN_TESTING.md (workflow, artifacts, the eval oracle, and the accept/reset/debug knobs) and a CLAUDE.md note that the golden harness re-implements the IR pipeline (passes must live in optimizedUberModule).
What this does

Adds an `--accept`-style escape hatch to the golden harness. When a codegen or optimizer change legitimately moves the generated IR or Lua, you can rewrite the affected goldens in place instead of deleting them by hand:

    PSLUA_GOLDEN_ACCEPT=1 cabal test spec

A mismatching golden is overwritten with the actual output and the test passes, the same way `tasty-golden --accept` does.

The safety asymmetry
Acceptance is opt-in per golden. The harness gains `acceptableGolden`, a variant of `defaultGolden` that carries an `acceptable :: Bool` flag:

- `golden.ir` and `golden.lua` are derived: a pure function of the code under test. They use `acceptableGolden`, so `PSLUA_GOLDEN_ACCEPT` may rewrite them.
- `eval/golden.txt` is the hand-verified semantic oracle. It stays on `defaultGolden` (`acceptable = False`), so it is never auto-accepted. A program-output change still fails the run until you fix the regression or update the oracle on purpose.
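
The asymmetry described above can be illustrated with a minimal, standalone sketch. It is not the harness code from this PR: `Outcome`, `checkGolden`, `readFileStrict`, the `...Sketch` wrappers, and the file names are hypothetical, and it assumes that any set value of `PSLUA_GOLDEN_ACCEPT` counts as opting in.

```haskell
module Main (main) where

import Control.Exception (evaluate)
import Data.Maybe (isJust)
import System.Environment (lookupEnv, setEnv)

-- Hypothetical result type; the real harness reports through its test framework.
data Outcome = Matched | Accepted | Mismatch
  deriving (Eq, Show)

-- Read the whole file before returning, so the lazy read handle is closed
-- and the same path can safely be rewritten afterwards.
readFileStrict :: FilePath -> IO String
readFileStrict path = do
  contents <- readFile path
  _ <- evaluate (length contents)
  pure contents

-- Compare actual output with the golden file. On a mismatch, rewrite the
-- golden only if it is marked acceptable AND the env var is set.
-- Assumption: any set value of PSLUA_GOLDEN_ACCEPT opts in.
checkGolden :: Bool -> FilePath -> String -> IO Outcome
checkGolden acceptable path actual = do
  expected <- readFileStrict path
  acceptRequested <- isJust <$> lookupEnv "PSLUA_GOLDEN_ACCEPT"
  if expected == actual
    then pure Matched
    else
      if acceptable && acceptRequested
        then writeFile path actual >> pure Accepted
        else pure Mismatch

-- Hypothetical stand-ins for the two harness entry points.
acceptableGoldenSketch, defaultGoldenSketch :: FilePath -> String -> IO Outcome
acceptableGoldenSketch = checkGolden True
defaultGoldenSketch = checkGolden False

main :: IO ()
main = do
  setEnv "PSLUA_GOLDEN_ACCEPT" "1"
  writeFile "sketch-golden.ir" "old ir\n"
  writeFile "sketch-golden.txt" "old output\n"
  -- Structural golden: mismatch is accepted and the file is rewritten.
  acceptableGoldenSketch "sketch-golden.ir" "new ir\n" >>= print
  -- Eval oracle: mismatch still fails and the file is left untouched.
  defaultGoldenSketch "sketch-golden.txt" "new output\n" >>= print
  readFileStrict "sketch-golden.txt" >>= putStr
```

Run with the env var set, the sketch reports `Accepted` for the structural golden and `Mismatch` for the oracle, whose file still holds the old output. Keeping the flag on the golden rather than on the env var means the opt-in is decided where the test is declared, not where it is run.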

This is the guard `scripts/golden_reset` lacks: that script deletes every `golden.*`, the eval oracle included, which is how a past regression slipped through unnoticed. `PSLUA_GOLDEN_ACCEPT` only touches the structural goldens.

Docs

Adds `docs/GOLDEN_TESTING.md`: what each artifact is, why the eval golden is the oracle, the accept / reset / full-diff knobs, and how to add a new golden test.

Also adds a `CLAUDE.md` note that the golden harness re-implements the IR pipeline in `compileCorefn`, so any new IR pass must live inside `optimizedUberModule` or the goldens silently bypass it.

Scope

Test infrastructure only, no compiler or runtime change. Independent of #105 (the magic-do fix for #46); this branch is cut from `main`.

Verification

- `cabal test spec`: 233/0 on `main` plus this branch.
- Modified `golden.ir` and `eval/golden.txt` for one module, then ran with `PSLUA_GOLDEN_ACCEPT=1`. The `golden.ir` was accepted and rewritten back to the regenerated output (clean `git diff`), while the `eval/golden.txt` test still failed and the file was left untouched.
- `fourmolu`, `hlint`, `nix fmt` clean on the touched files.