EXAMPLE REPORT
This is what a Design Pilot delivers.
This is the format of an actual ForskAI design-risk report for one planned study: the verdict, the pattern across plausible worlds, the assumptions behind it, and the ranked fixes. The study itself is illustrative, not a real cohort.
forskai · design-risk report · stage: report
Can the planned short-form palliative-care attitudes survey detect a group difference between concurrent- and non-concurrent-framing respondents?
forskai-2026-0001 · v1 · psychometric · standardised_mean_difference 0.4 SD · N=96 · mann_whitney
Illustrative. Figures here are illustrative/demo data under stated assumptions, not a measured analysis.
The Likert short form recovers the 0.40 SD group difference in only ~39% of expected-scenario runs; the ceiling structurally caps detectable variation.
This verdict is shown alongside, never instead of, the metrics below.
Scenario grid — the pattern across plausible worlds
Bars show recovered power per scenario (the null bar shows the false-positive rate). The marker is the target — 80% power, or the 5% false-positive line for the null.
Recovery metrics
| Scenario | Power | Type I error | Relative bias | CI coverage | Convergence | Identifiability |
|---|---|---|---|---|---|---|
| Expected | 39% | 5% | 6% | 94% | 100% | clear |
| Optimistic | 71% | — | 5% | 95% | 100% | clear |
| Conservative | 16% | — | 9% | 93% | 100% | clear |
| Null (false-positive check) | 5% | 5% | — | 95% | 100% | clear |
| Measurement stress | 28% | — | 12% | 92% | 100% | clear |
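The metrics table reports four things for each scenario: how often the planned test rejects (power, or the false-positive rate under the null), the relative bias of the effect estimate, and confidence-interval coverage. The sketch below shows one way such numbers can come out of a simulation. It rests on several assumptions: an even 48/48 split of N=96, normally distributed continuous scores with no ceiling, Cohen's d as the estimator with its large-sample standard error, and a normal-approximation Mann–Whitney test. The helper names are hypothetical, and this is not ForskAI's engine.

```python
import math
import random
from statistics import NormalDist

# Assumptions (hypothetical, not taken from the report's engine):
N_PER_GROUP = 48   # N=96 split evenly across the two framing groups
SIMS = 1500        # matches "sims/scenario 1500" in the report footer
ALPHA = 0.05
SEED = 20260620    # the seed printed in the report footer


def average_ranks(values):
    """Return 1-based average ranks and the size of each tie group."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        # Extend j over every value equal to the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes)
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # every observation tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def cohens_d(x, y):
    """Standardised mean difference (y minus x) using the pooled SD."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    pooled_sd = math.sqrt(ss / (len(x) + len(y) - 2))
    return (my - mx) / pooled_sd


def recovery_metrics(true_d, rng):
    """Rejection rate, relative bias and CI coverage for one scenario."""
    z = NormalDist().inv_cdf(1 - ALPHA / 2)
    rejections, covered, estimates = 0, 0, []
    for _ in range(SIMS):
        x = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
        y = [rng.gauss(true_d, 1.0) for _ in range(N_PER_GROUP)]
        rejections += (mann_whitney_p(x, y) < ALPHA)
        d = cohens_d(x, y)
        # Large-sample SE of d for two equal groups of size N_PER_GROUP.
        se = math.sqrt(2 / N_PER_GROUP + d ** 2 / (4 * N_PER_GROUP))
        covered += (d - z * se <= true_d <= d + z * se)
        estimates.append(d)
    mean_d = sum(estimates) / SIMS
    return {
        "rejection_rate": rejections / SIMS,  # power, or Type I when true_d == 0
        "rel_bias": (mean_d - true_d) / true_d if true_d else None,
        "coverage": covered / SIMS,
    }


if __name__ == "__main__":
    rng = random.Random(SEED)
    for name, d in [("expected", 0.4), ("null", 0.0)]:
        print(name, recovery_metrics(d, rng))
```

These scores have no ceiling, so expect the sketch's power to come out somewhat above the report's 39%. The sketch deliberately leaves out the Likert measurement layer that the report's scenarios include.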
Assumptions
| Assumption | Value | Source | Grade | Sensitivity |
|---|---|---|---|---|
| group_difference | 0.4 | Calibrated to published item means/SDs (Astrom et al., 2026) | C | 0.2–0.6 (high) |
| short_form_reliability | 0.88 | Univariate screen of published item statistics | C | 0.8–0.92 (medium) |
| likert_ceiling | 27 | Reported item means/SDs; high ceiling on values items | B | 17–40 (high) |
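The short_form_reliability row matters because measurement error in the outcome attenuates the standardized difference that any test can detect. The calculation below assumes classical test theory, with error entering only the outcome score (group membership is known exactly). Under those assumptions, the expected observed SMD is the true SMD scaled by the square root of the reliability $\rho_{yy'}$:

```latex
d_{\text{obs}} = d_{\text{true}}\,\sqrt{\rho_{yy'}}, \qquad
0.4\sqrt{0.88} \approx 0.375, \qquad
0.4\sqrt{0.80} \approx 0.358, \qquad
0.4\sqrt{0.92} \approx 0.384
```

Across the stated 0.80–0.92 range, attenuation costs only a few hundredths of an SD. That is consistent with reliability carrying a medium sensitivity rating while the ceiling is rated high.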
Design-risk diagnosis
- fatal · Ceiling effect compresses detectable variation · measurement · under expected, conservative, measurement_stress
- major · Planned N too small for the recoverable effect on this instrument · sampling · under expected, conservative
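The fatal finding says a ceiling compresses detectable variation. The sketch below isolates that mechanism for a single 5-point item, generated by cutting a latent normal score at hypothetical cutpoints. These cutpoints are not the report's calibrated values, and the real short form combines several items. The code computes the exact standardized difference on the observed item scores when the latent groups differ by 0.4 SD.

```python
from statistics import NormalDist

# Hypothetical cutpoints on a latent N(mu, 1) scale for a 5-point item.
# The top category collects every latent score above the last cutpoint,
# so moving the cutpoints down pushes more respondents onto the ceiling.
MILD_CEILING = (-2.0, -1.0, 0.0, 0.5)
HEAVY_CEILING = (-2.0, -1.5, -1.0, -0.5)


def category_probs(mu, cutpoints):
    """P(response = 1..K) for a latent N(mu, 1) score cut at the cutpoints."""
    cdf = [NormalDist(mu, 1.0).cdf(c) for c in cutpoints]
    bounds = [0.0] + cdf + [1.0]
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


def observed_smd(true_d, cutpoints):
    """Exact SMD on the 1..K item scores when latent groups differ by true_d."""
    moments = []
    for mu in (0.0, true_d):
        probs = category_probs(mu, cutpoints)
        mean = sum(k * p for k, p in enumerate(probs, start=1))
        var = sum((k - mean) ** 2 * p for k, p in enumerate(probs, start=1))
        moments.append((mean, var))
    (m0, v0), (m1, v1) = moments
    return (m1 - m0) / ((v0 + v1) / 2) ** 0.5


def top_share(mu, cutpoints):
    """Share of a group with latent mean mu that lands in the top category."""
    return category_probs(mu, cutpoints)[-1]


if __name__ == "__main__":
    for name, cuts in [("mild", MILD_CEILING), ("heavy", HEAVY_CEILING)]:
        print(f"{name}: {top_share(0.0, cuts):.0%} of reference group at ceiling, "
              f"observed SMD {observed_smd(0.4, cuts):.2f}")
```

With these made-up cutpoints, about 31% of the reference group sitting at the top category leaves an observed SMD near 0.37. About 69% at the top shrinks it to roughly 0.29. At a fixed N, a smaller observed effect means lower power, which is the pathway the diagnosis describes.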
Recommended fixes — the Fix Ladder
1. Make the free-text concept-coverage score (0–25) the primary measure; treat the Likert short form as secondary/descriptive.
   The coverage measure separates framing groups at near-certain power, whereas the Likert items recover the same effect at only ~0.15 power.
   low cost · measurement
2. Pilot the free-text prompt wording (three variants) before the confirmatory study.
   Prompt wording shifts measured coverage and the non-concurrent-framing rate; choosing the least-biasing wording protects the estimand.
   medium cost · measurement
3. Increase N if the Likert comparison must remain primary.
   This raises power but cannot overcome the ceiling on the values items.
   high cost · sample
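The sampling diagnosis and fix 3 both turn on how large N the planned comparison needs. The back-of-the-envelope sketch below makes three assumptions: an even split of N=96, normally distributed scores with no ceiling, and the textbook shortcut of scaling the t-test sample size by the Mann–Whitney asymptotic relative efficiency (3/π under normality). The function names are hypothetical.

```python
import math
from statistics import NormalDist

# Asymptotic relative efficiency of Mann-Whitney vs the t-test when the
# data are normal: 3/pi, about 0.955.
ARE_MANN_WHITNEY = 3 / math.pi
STD_NORMAL = NormalDist()


def approx_power(d, n_per_group, alpha=0.05, efficiency=ARE_MANN_WHITNEY):
    """Normal-approximation power of a two-sided two-sample test."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Noncentrality for two equal groups, shrunk by the test's efficiency.
    ncp = d * math.sqrt(efficiency * n_per_group / 2)
    return STD_NORMAL.cdf(ncp - z_crit) + STD_NORMAL.cdf(-ncp - z_crit)


def n_per_group_for_power(d, power=0.80, alpha=0.05, efficiency=ARE_MANN_WHITNEY):
    """Smallest per-group n reaching the target power (same approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / (efficiency * d ** 2))


if __name__ == "__main__":
    print(f"power at 48 per group: {approx_power(0.4, 48):.2f}")
    print(f"per-group n for 80% power: {n_per_group_for_power(0.4)}")
```

Under these assumptions, 48 per group gives roughly 48% power. Reaching 80% power needs about 103 per group, or about 206 in total. The sketch ignores the ceiling, so it is likely optimistic for the Likert comparison. That is consistent with the ladder's note that added N cannot overcome the ceiling on the values items.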
Signed off by: methodologist
What this report is — and is not. This report tests whether the planned design has a plausible path to recovering the intended signal under stated assumptions. It does not validate the construct, guarantee study success, replace ethical or statistical review, or provide clinical advice. A PASS is design evidence under the stated assumptions — never a validity claim about people, and never a promise of results.
engine 0.0.0-illustrative · seed 20260620 · sims/scenario 1500 · thresholds forskai-default-2026
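The footer says each scenario was simulated 1,500 times, so every rate in the metrics table carries Monte Carlo noise. The sketch below uses the usual binomial standard error, sqrt(p(1 − p)/S), for a rate estimated from S independent runs.

```python
import math


def mc_standard_error(p, sims=1500):
    """Binomial Monte Carlo standard error of a simulated rate p."""
    return math.sqrt(p * (1 - p) / sims)


def mc_interval(p, sims=1500, z=1.96):
    """Approximate 95% Monte Carlo interval around a simulated rate."""
    half_width = z * mc_standard_error(p, sims)
    return p - half_width, p + half_width


if __name__ == "__main__":
    for label, p in [("expected power", 0.39), ("null false-positive rate", 0.05)]:
        lo, hi = mc_interval(p)
        print(f"{label}: {p:.0%} (MC interval {lo:.1%} to {hi:.1%})")
```

At 1,500 runs, the expected-scenario power of 39% is pinned down to about ±2.5 percentage points, and the 5% false-positive rate to about ±1.1. A one-point gap such as 94% versus 95% coverage is therefore within simulation noise.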
Bring us one planned study.
We will show where it can recover, where it is at risk, and what has to change.
Request a design review