legitnyhr App
The Challenge
Build a single-team consultancy tool that walks one job-evaluation engagement end-to-end — methodology to PDF evidence package — with regulatory accuracy under the EU Pay Transparency Directive and Polish Labour Code Art. 18³c. Clients never log in; the deliverable is the evidence package. No multi-tenant SaaS complexity, but every output has to be legally defensible and reproducible from immutable snapshots.
The Approach
Engagement-shaped architecture: filesystem plus git as storage (no database), one repo per client engagement, immutable methodology locks for Article 9 GPG reproducibility, and AI that assists drafting and analysis while humans make every binding decision. The workspace methodology library ships Hay 4-pillar, IPE 5-factor, and Mercer 3-pillar templates. M6 added the compa-ratio diagnostic and prescriptive scenarios. The M7 wave (May 2026) added the Categories module with deterministic k-means, the Article 9 GPG report with its 9-row readiness checklist, the 8-phase Harmonogram, and a 7-sheet Excel export. Gold fixtures built from Liksza's published paper reproduce her gap percentages exactly; that is the regulatory floor.
Results
- Phase 1 deployed 2026-04-29; M7 wave shipped 2026-05-04 → 2026-05-09
- Full engagement flow: methodology → JD scoring → grading → categories → compa-ratio → GPG report → Harmonogram → PDF evidence package
- Gap percentages from Tabela 1 and Tabela 4 of Liksza (LEX/el. 2024) reproduced exactly from gold fixtures
- Versioned, immediately-locked Article 9 reports, with a 9-row readiness checklist gating the Generuj (Generate) button
- PDF evidence packages with Załączniki (appendices) A (compa-ratio), B (scenarios), C (categories), and D (GPG report)
- Three methodology templates shipped: Hay 4-pillar, IPE 5-factor, Mercer 3-pillar
- Provider-abstracted AI client (Anthropic | OpenAI | stub), switchable per deployment
The legitnyhr app is the internal consultant tool that runs every legitnyhr engagement end-to-end. It is built for a single team — the four founders of legitnyhr — and the deliverable is never the application itself. The deliverable is the PDF evidence package the client takes home: methodology document, JD scoring matrix, grading bands, worker categories, Article 9 gender pay-gap report, Harmonogram, and the supporting appendices that make the work defensible to a labour inspectorate (PIP).
What It Does
A consultant signs in (Google OAuth, email allowlist), creates an engagement, and walks it through a fixed sequence:
- pick a methodology template from the workspace library (Hay 4-pillar, IPE 5-factor, or Mercer 3-pillar)
- author role descriptions and score each role across the methodology pillars, with mandatory rationale on every score
- lock the methodology so subsequent reports compute against a stable target
- group roles into worker categories, with optional AI k-means suggestions
- run the Article 9 GPG report against locked categories and a salary snapshot
- build the 8-phase Harmonogram for the engagement
- export the full PDF evidence package along with multi-sheet Excel exports
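A minimal sketch of how that fixed sequence can be encoded as gated stages; the stage identifiers here are assumptions, not the app's actual names:

```typescript
// Hypothetical stage identifiers; the app's real names may differ.
const STAGES = [
  "methodology",
  "jd_scoring",
  "grading",
  "categories",
  "compa_ratio",
  "gpg_report",
  "harmonogram",
  "evidence_package",
] as const;

type Stage = (typeof STAGES)[number];

// A stage can begin only once every earlier stage is complete.
function canEnter(stage: Stage, completed: ReadonlySet<Stage>): boolean {
  const idx = STAGES.indexOf(stage);
  return STAGES.slice(0, idx).every((s) => completed.has(s));
}
```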
AI assists in three places: neutral-language rewrites on JD prose to remove gendered or biased phrasing, structured section-by-section JD drafting suggestions, and deterministic k-means seed suggestions for category grouping. The neutrality rewrites and JD suggestions go through an audit log; every accept/edit/reject decision is captured. Humans make every binding decision — the AI never auto-locks methodology, never auto-publishes reports, never auto-signs evidence packages.
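A sketch of what one such audit-log entry might look like; the field names and the JSONL storage are assumptions for illustration, not the app's actual schema:

```typescript
import { appendFileSync } from "node:fs";

// Illustrative shape of one audit-log entry for an AI-assisted edit.
interface AiAuditEntry {
  engagementId: string;
  timestamp: string;                              // ISO 8601
  feature: "neutrality_rewrite" | "jd_suggestion";
  suggestion: string;                             // what the AI proposed
  decision: "accepted" | "edited" | "rejected";
  finalText?: string;                             // what the consultant kept
  decidedBy: string;                              // the human who made the binding call
}

// Append-only JSONL keeps every accept/edit/reject decision replayable.
function logDecision(path: string, entry: AiAuditEntry): void {
  appendFileSync(path, JSON.stringify(entry) + "\n");
}
```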
Architecture Decisions
Single-team usage means no multi-tenant SaaS layer, no per-tenant isolation logic, no billing or rate-limiting infrastructure. Storage is filesystem and git: one repository per client engagement, with every score, methodology choice, lock event, and report commit reproducible from history. This matters because Article 9 reports must be defensible months after filing — when a regulator asks how a particular gap percentage was computed, the audit trail has to regenerate the exact same numbers.
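A minimal sketch of the event-to-commit path under that model; the file layout and commit-message convention here are assumptions:

```typescript
import { execFileSync } from "node:child_process";
import { writeFileSync } from "node:fs";
import { join } from "node:path";

// Write a domain event (score, lock, report) as a file in the engagement
// repo, then commit it so history can regenerate the exact inputs.
function recordEvent(repoDir: string, relPath: string, payload: unknown, message: string): void {
  writeFileSync(join(repoDir, relPath), JSON.stringify(payload, null, 2) + "\n");
  execFileSync("git", ["add", relPath], { cwd: repoDir });
  execFileSync("git", ["commit", "-m", message], { cwd: repoDir });
}
```

Regenerating a report months later then means checking out the commit the report references and recomputing from those exact inputs.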
Methodology locks are immutable. Once a methodology version is locked, scores against it freeze; subsequent edits fork to a new version (v2, v3) so the historical record stays intact. Worker-category snapshots freeze pillar totals at lock time, which is why the same GPG report can be regenerated and produce identical numbers regardless of subsequent methodology iterations. The app deploys as a single Docker container behind Cloudflare on a Hetzner host, sharing infrastructure with my other Polish-language projects.
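A sketch of the lock-and-fork rule described above, with illustrative types:

```typescript
// Illustrative model of a methodology version; not the app's actual types.
interface MethodologyVersion {
  version: number;                  // 1, 2, 3, ...
  lockedAt?: string;                // ISO timestamp, present once locked
  pillars: Record<string, number>;  // pillar name -> weight or max points
}

// Edits to an unlocked version update it in place (returned as a copy here);
// edits to a locked version fork to version + 1 and leave history intact.
function applyEdit(
  current: MethodologyVersion,
  pillarChanges: Record<string, number>,
): MethodologyVersion {
  if (current.lockedAt === undefined) {
    return { ...current, pillars: { ...current.pillars, ...pillarChanges } };
  }
  return {
    version: current.version + 1,
    pillars: { ...current.pillars, ...pillarChanges },
  };
}
```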
For the engagement model the app serves, see legitnyhr.
What Makes It Regulatory-Grade
The hard part of pay-transparency tooling is not the UI — it’s getting the math right under regulatory scrutiny. Liksza’s published 2024 LEX/el. paper became the gold-fixture target: her Tabela 1 (10.4% mean gap) and Tabela 4 (3.3% adjusted gap, 96.7% median) reproduce exactly from her example dataset, which is the regulatory floor. Anything that doesn’t match Liksza is wrong by definition until proven otherwise.
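A sketch of what a gold-fixture test could look like, assuming the directive's usual convention of expressing the mean gap relative to the male mean; the fixture loader and file layout are hypothetical:

```typescript
import test from "node:test";
import assert from "node:assert/strict";
import { readFileSync } from "node:fs";

// Mean gap as the female shortfall relative to the male mean, in percent.
function meanGapPct(male: number[], female: number[]): number {
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  return ((mean(male) - mean(female)) / mean(male)) * 100;
}

// Hypothetical fixture location; the repo's real layout may differ.
function loadFixture(name: string): { male: number[]; female: number[] } {
  return JSON.parse(readFileSync(`fixtures/${name}.json`, "utf8"));
}

test("Tabela 1 mean gap reproduces exactly", () => {
  const { male, female } = loadFixture("liksza-tabela-1");
  assert.equal(meanGapPct(male, female).toFixed(1), "10.4");
});
```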
Missing-sex policy follows the directive's logic precisely: when more than 10% of records lack sex data, report generation hard-blocks; below 10%, the missing-sex records are excluded from the gap math; between 5% and 10%, a warning is additionally surfaced in the readiness checklist before the report can be generated. The 9-row readiness checklist gates the GPG Generuj (Generate) button: confidentiality acknowledgement, JDs published and scored, grading bands locked, categories locked, ≥30 members per reporting unit, snapshot locked, sex coverage ≥90%, variable pay declared, and a notes editor surfaced on every breach row.
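A sketch of the missing-sex validator implied by those thresholds; the function shape is illustrative, and the behaviour at exactly 10% is an assumption:

```typescript
// Verdicts mirror the policy above: block, warn, or silently exclude.
type SexCoverageVerdict =
  | { kind: "block"; missingPct: number }  // > 10% missing: report cannot generate
  | { kind: "warn"; missingPct: number }   // 5-10% missing: surfaced in the checklist
  | { kind: "ok"; missingPct: number };    // < 5% missing

function checkSexCoverage(totalRecords: number, missingSex: number): SexCoverageVerdict {
  const missingPct = (missingSex / totalRecords) * 100;
  if (missingPct > 10) return { kind: "block", missingPct };
  if (missingPct >= 5) return { kind: "warn", missingPct };
  return { kind: "ok", missingPct };
}
// In the warn and ok cases, the missing-sex records are excluded from the
// gap math rather than imputed.
```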
UC 127 — the 2026-05-08 Polish MRPiPS consultation response — reshaped the reference frames. Article 9 GPG runs on the previous calendar year (so a 2028 filing reports on 2027 data). Article 8 individual right-to-info runs on a rolling 12 months preceding the request month. Headcount is RJR (annual labour units), not raw FTE — 100 temps × 1 month = 8.33 RJR, not 100. No proration in the GPG calculation itself: gap math runs on raw amounts, not annualised. Each of these has its own validator and warning surface in the tool.
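A sketch of the RJR headcount rule with the worked example from the text; the field names are illustrative:

```typescript
// RJR (annual labour units): each worker counts as the fraction of a full
// reference year they worked, not as one head.
interface EmploymentSpell {
  monthsWorked: number; // months inside the reference year
  fte: number;          // contracted fraction of full time, 0..1
}

function rjrHeadcount(spells: EmploymentSpell[]): number {
  return spells.reduce((sum, s) => sum + (s.monthsWorked / 12) * s.fte, 0);
}

// The worked example from the text: 100 full-time temps, one month each.
const temps = Array.from({ length: 100 }, () => ({ monthsWorked: 1, fte: 1 }));
console.log(rjrHeadcount(temps).toFixed(2)); // "8.33"
```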
For a sibling regulatory-tooling project, see Prawomat. For the consultancy behind the app, see legitnyhr.