Testing Strategy
The project uses focused unit tests, backend integration tests against PostgreSQL, frontend component tests, frontend production build checks, and AI eval harnesses.
Backend Tests
Run from server/:
uv run pytest
uv run ruff check .
Backend tests cover auth, schemas, JWTs, uploads, combat, encounters, generation helpers, retrieval/chunking, eval metrics, recap extraction, and integration routes.
Integration tests use a real PostgreSQL database. Keep them scoped and avoid network calls unless the test explicitly targets an integration requiring them.
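One way to enforce the "no network calls" rule is a guard that rejects outbound connections to non-local hosts. The sketch below is illustrative only: `block_external_network`, `ALLOWED_HOSTS`, and the assumption that PostgreSQL runs on localhost are not taken from this project. It only intercepts connections made through Python's `socket` module, so drivers that open sockets in C code are unaffected.

```python
import socket
from contextlib import contextmanager
from unittest import mock

# Assumption: the test PostgreSQL instance listens on a local address.
ALLOWED_HOSTS = {"127.0.0.1", "::1", "localhost"}

_real_connect = socket.socket.connect


def _guarded_connect(self, address):
    # Unix-domain socket addresses are plain strings/bytes; let them through.
    if isinstance(address, tuple) and address[0] not in ALLOWED_HOSTS:
        raise RuntimeError(f"external network call blocked: {address[0]}")
    return _real_connect(self, address)


@contextmanager
def block_external_network():
    """Reject outbound connections to anything but local hosts."""
    with mock.patch.object(socket.socket, "connect", _guarded_connect):
        yield
```

A pytest fixture could wrap this context manager and be applied to every test except those that deliberately target an external integration.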
Frontend Checks
Run from web/:
npm run build
The build catches TypeScript and bundling errors. Component tests live beside components and screens where present.
Docs Checks
Run from the root:
npm run docs:build
This catches broken Docusaurus config, malformed MDX, and broken links under strict link checking.
AI Evals
Eval harnesses live in server/evals/. Use them when changing retrieval, generation prompts, citation logic, encounter math, or NPC quality scoring.
Judge-free metrics should stay deterministic. LLM-judge rubrics should be versioned and validated against golden examples.
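As a minimal sketch of a deterministic, judge-free metric checked against golden examples, consider recall@k over retrieved chunk IDs. The function name `recall_at_k` and the `GOLDEN` cases are hypothetical and not taken from server/evals/.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant chunk IDs found in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


# Golden examples: (retrieved, relevant, k, expected score).
GOLDEN = [
    (["a", "b", "c"], {"a", "c"}, 2, 0.5),
    (["a", "b", "c"], {"a", "c"}, 3, 1.0),
    (["x", "y"], {"a"}, 2, 0.0),
]


def test_recall_at_k_matches_golden():
    for retrieved, relevant, k, expected in GOLDEN:
        assert recall_at_k(retrieved, relevant, k) == expected
```

The metric is a pure function with no randomness, clock, or I/O, so the same inputs always produce the same score and the golden cases can run in ordinary unit tests.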
What to Test for a Change
| Change | Minimum verification |
|---|---|
| Backend route | Targeted pytest run, plus the full backend test suite when practical. |
| Pydantic schema | Schema tests and affected route tests. |
| Serializer/DTO | Backend serializer tests or route tests plus frontend build. |
| Frontend component | Component test if behavior is non-trivial plus frontend build. |
| Database model | Migration upgrade plus affected integration tests. |
| RAG/generation | Unit tests with fake providers plus relevant eval when useful. |
| Documentation | npm run docs:build. |
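
For the RAG/generation row, a unit test with a fake provider might look like the sketch below. The `CompletionProvider` interface, `FakeProvider`, and `summarize_session` are hypothetical names used for illustration; the project's real provider interface may differ.

```python
from typing import Protocol


class CompletionProvider(Protocol):  # hypothetical interface
    def complete(self, prompt: str) -> str: ...


class FakeProvider:
    """Returns canned text and records prompts; no network, no randomness."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts: list[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.reply


def summarize_session(notes: str, provider: CompletionProvider) -> str:
    # Hypothetical generation helper under test.
    return provider.complete(f"Summarize this session:\n{notes}").strip()


def test_summarize_session_uses_provider():
    fake = FakeProvider("  The party reached the keep.  ")
    assert summarize_session("long notes", fake) == "The party reached the keep."
    assert fake.prompts == ["Summarize this session:\nlong notes"]
```

Because the fake records every prompt it receives, the test can assert on both the post-processed output and the exact prompt sent, without calling a real model.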