Building a Custom Diagnostic Panel for a Forensic-Psychiatry Clinic

A forensic-psychiatry clinic doesn’t run a test. It runs a shifting battery of them — anxiety and depression screens, substance-use measures, psychosis and delusion scales, a clinician-rated symptom inventory, suicide-risk triage, an aggression-syndrome inventory — and it runs them repeatedly, on the same people, over months. The clinic I built for, Klinika Psychiatrii Sądowej (KPS), needed all of that as routine work: administer each instrument, score it correctly, store the result, and see whether a patient is actually changing between assessments.

The obvious way to build this is one tool per test. It’s also the wrong way. Ten instruments built as ten tools means ten places to maintain scoring logic, ten security surfaces, ten report formats, and a fresh development project every time the clinic wants to add the eleventh. So I didn’t build ten tools. I built one engine. This case study covers how it works and why it generalizes.

One engine, many instruments

The core decision was to make a test a definition, not code. Each instrument in the panel is a version-controlled file that declares everything the engine needs: the items, the scales, which items are reverse-keyed, the norms, the cutoffs, and how results should be displayed. A single pure scoring function reads that definition and produces a result. Adding an instrument means writing a definition and importing it — not touching the scoring engine, the security layer, or the reporting code.
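
To make the idea concrete, here is a minimal sketch of a definition plus a pure scoring function. It is not the KPS code: the class names, the field names, and the three-item DEMO-3 instrument are all hypothetical, and norms, cutoffs, and display rules are left out to keep it short.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Scale:
    name: str
    items: tuple[str, ...]                      # item ids that feed this scale
    reverse_keyed: frozenset[str] = frozenset()  # item ids scored in reverse


@dataclass(frozen=True)
class InstrumentDefinition:
    code: str
    min_response: int   # lowest valid answer on the response scale
    max_response: int   # highest valid answer on the response scale
    scales: tuple[Scale, ...]


def score(definition: InstrumentDefinition,
          responses: Mapping[str, int]) -> dict[str, int]:
    """Pure function: same definition and responses always give the same result."""
    results: dict[str, int] = {}
    for scale in definition.scales:
        total = 0
        for item in scale.items:
            value = responses[item]
            if not definition.min_response <= value <= definition.max_response:
                raise ValueError(f"{item}: response {value} is out of range")
            if item in scale.reverse_keyed:
                # Mirror the answer around the middle of the response scale.
                value = definition.min_response + definition.max_response - value
            total += value
        results[scale.name] = total
    return results


# Hypothetical toy instrument: three items answered 0-3, the last one reverse-keyed.
DEMO_3 = InstrumentDefinition(
    code="DEMO-3",
    min_response=0,
    max_response=3,
    scales=(Scale("total", ("q1", "q2", "q3"), frozenset({"q3"})),),
)

demo_result = score(DEMO_3, {"q1": 2, "q2": 1, "q3": 0})
print(demo_result)  # {'total': 6}
```

Because the function reads only its arguments and writes nothing, a definition can be unit-tested against hand-scored protocols before it goes anywhere near a patient record.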

That declarative design is what made the battery cheap to grow. Standard scales — GAD-7, PHQ-9, AUDIT, DAST-10, PQ-16, CAPE-42, PDI-21, BPRS-18 — slot straight in. The awkward ones get a custom-scoring escape hatch: the C-SSRS suicide-risk triage and IPSA’s sex-specific sten norms don’t fit the standard scoring path, so they bypass it without forcing a redesign for everything else. One engine, roughly ten instruments live, and the marginal cost of the next one is an afternoon of authoring rather than a sprint of engineering.
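
One way to picture the escape hatch is an optional scorer attached to the definition, with the standard path as the default. This is a sketch under assumptions: `custom_scorer`, `sum_scorer`, and `highest_endorsed` are hypothetical names, and the triage logic shown is a stand-in, not the actual C-SSRS or IPSA scoring rules.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

Responses = Mapping[str, int]
Scorer = Callable[[Responses], dict]


def sum_scorer(responses: Responses) -> dict:
    """Standard path: a simple sum of item responses."""
    return {"total": sum(responses.values())}


def highest_endorsed(responses: Responses) -> dict:
    """Illustrative custom path: report the highest-numbered item answered 1 (yes).

    Item ids are assumed to be numeric strings such as "1", "2", "3".
    """
    endorsed = [int(item) for item, answer in responses.items() if answer == 1]
    return {"highest_level": max(endorsed, default=0)}


@dataclass(frozen=True)
class Instrument:
    code: str
    custom_scorer: Optional[Scorer] = None  # None means "use the standard path"


def score_instrument(instrument: Instrument, responses: Responses) -> dict:
    # The engine never branches on instrument codes; it only asks whether
    # the definition supplied its own scorer.
    scorer = instrument.custom_scorer or sum_scorer
    return scorer(responses)


standard = score_instrument(Instrument("STANDARD-DEMO"), {"a": 1, "b": 2})
triage = score_instrument(
    Instrument("TRIAGE-DEMO", custom_scorer=highest_endorsed),
    {"1": 1, "2": 1, "3": 0},
)
print(standard)  # {'total': 3}
print(triage)    # {'highest_level': 2}
```

The point of the design is that the exception lives in the definition, so the engine’s standard path stays untouched when an awkward instrument arrives.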

Scoring it right: norms, cutoffs, and real change

A raw score is meaningless until it’s placed against a reference. The engine normalizes every result the way the instrument’s manual specifies — sten, T-score, or percentile — and applies the published clinical cutoffs, so a clinician reads “above threshold for X,” not an uninterpreted number.
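
The three conversions can be sketched as follows. The sketch assumes a linear transformation from a norm-group mean and standard deviation and an approximately normal reference distribution; many manuals publish lookup tables instead, in which case the table should be used. The function names and the norm values in the example are invented for illustration.

```python
import math
from statistics import NormalDist


def z_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Standardize a raw score against the norm group."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (raw - norm_mean) / norm_sd


def to_t_score(z: float) -> float:
    """T-scores have mean 50 and standard deviation 10."""
    return 50 + 10 * z


def to_sten(z: float) -> int:
    """Stens have mean 5.5 and standard deviation 2, reported as integers 1-10.

    floor(2z + 6) puts z in [0, 0.5) into sten 6 and z in [-0.5, 0) into sten 5.
    """
    return min(10, max(1, math.floor(2 * z + 6)))


def to_percentile(z: float) -> float:
    """Percentile rank under a normal reference distribution (an assumption)."""
    return NormalDist().cdf(z) * 100


def above_cutoff(raw: float, cutoff: float) -> bool:
    """Clinical cutoffs are applied to the raw score, as published."""
    return raw >= cutoff


# Hypothetical norms: mean 20, SD 5, cutoff 28.
z = z_score(30, norm_mean=20, norm_sd=5)
print(to_t_score(z))            # 70.0
print(to_sten(z))               # 10
print(above_cutoff(30, 28))     # True
```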

The harder problem is change over time. Re-test a patient and their score will move, but part of that movement, sometimes all of it, is measurement error, not progress. Treating a raw before-and-after difference as if it were real is one of the most common mistakes in repeat testing. So the panel reports a Reliable Change Index: it divides the difference between two administrations by the standard error of that difference, which follows from the instrument’s reliability, and tells the clinician whether the change is larger than measurement error alone would plausibly produce. This is just Classical Test Theory taken seriously: reliability isn’t a number you report once at validation and forget, it’s what decides whether this patient actually improved.
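
As a worked example, here is the index in its common Jacobson-Truax form: RCI = (after - before) / S_diff, where S_diff = sqrt(2) * SEM and SEM = SD * sqrt(1 - reliability). The text above does not say which variant the panel uses, so treat the formulation, the function names, and the numbers below as assumptions.

```python
import math


def reliable_change_index(score_before: float, score_after: float,
                          norm_sd: float, reliability: float) -> float:
    """Difference between two administrations in units of its standard error."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    if not 0 <= reliability < 1:
        raise ValueError("reliability must be in [0, 1)")
    sem = norm_sd * math.sqrt(1 - reliability)   # standard error of measurement
    s_diff = math.sqrt(2) * sem                  # standard error of the difference
    return (score_after - score_before) / s_diff


def is_reliable_change(rci: float, critical: float = 1.96) -> bool:
    """|RCI| >= 1.96 is the conventional two-sided 95% criterion."""
    return abs(rci) >= critical


# Hypothetical instrument: SD 5, reliability 0.84, so SEM is 2 and S_diff is about 2.83.
small_drop = reliable_change_index(20, 16, norm_sd=5, reliability=0.84)
large_drop = reliable_change_index(20, 12, norm_sd=5, reliability=0.84)
print(round(small_drop, 2), is_reliable_change(small_drop))  # -1.41 False
print(round(large_drop, 2), is_reliable_change(large_drop))  # -2.83 True
```

In this example a four-point drop looks like improvement on paper but stays inside the band that measurement error alone could produce; only the eight-point drop clears it.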

Special-category data, secured from the first commit

This is forensic psychological assessment on some of the most sensitive data there is: special-category health records under Article 9 of RODO (the Polish name for the GDPR), in a setting where every result may have to be defended. Security here was not a later phase. Field-level encryption at rest, an append-only audit log, passwordless authentication, and need-to-know access control were in from the start, because a forensic record that leaks, or that can’t account for who viewed it, is a failure no matter how good the psychometrics are. Building the security in first is cheaper and far safer than retrofitting it onto a system that already holds real patient data.
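
To illustrate one of those pieces, here is a sketch of an append-only audit log that is also tamper-evident. Hash chaining is my illustrative choice, not a description of the KPS implementation, and the class, field names, and in-memory storage are hypothetical; a real deployment would persist entries in a store the application can only append to.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry
PAYLOAD_FIELDS = ("actor", "action", "record_id", "timestamp")


def _entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes to the same value.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditLog:
    """Each entry's hash covers the previous entry's hash, forming a chain."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, actor: str, action: str, record_id: str, timestamp: str) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        payload = {"actor": actor, "action": action,
                   "record_id": record_id, "timestamp": timestamp}
        entry = {**payload, "prev_hash": prev_hash,
                 "hash": _entry_hash(prev_hash, payload)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or removed entry breaks it."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            payload = {key: entry[key] for key in PAYLOAD_FIELDS}
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _entry_hash(prev_hash, payload):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("clinician-01", "view_result", "assessment-123", "2024-01-01T09:00:00Z")
log.append("clinician-02", "export_report", "assessment-123", "2024-01-01T09:05:00Z")
print(log.verify())  # True
```

The chain does not stop someone with write access from altering history, but it makes the alteration detectable, which is the property that matters when a record has to be defended.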

What this generalizes to

KPS is the first engagement of badania.org, my practice for custom psychometric test development — and it’s a clean illustration of the whole offer. Most measurement problems worth paying for aren’t “pick a questionnaire.” They’re “we need to measure this specific thing, in this population, under these constraints, and report it defensibly.” That’s a methodology problem and a software problem at once, and the value comes from the same person owning both — so the construct definition, the scoring, and the deployed panel never drift out of sync.

Sometimes the right answer is still an off-the-shelf instrument; I’ve written about when custom earns its longer build cycle and when it doesn’t. But when the tool you need doesn’t exist yet — clinical, research, or organizational — building it properly is exactly the work badania.org takes on. If that’s your problem, get in touch or see my CV for the broader context.

Related reading

Custom or Off-the-Shelf Psychometric Instrument?

HiPo Programs: Why Tenure Predicts Only Two Traits

Four Workplace Personality Types: A Two-Study Replication

Measuring Employee Engagement Beyond the Gallup Q12