EXAMPLE REPORT

This is what a Design Pilot delivers.

Below is an actual ForskAI design-risk report for one planned study: the verdict, the pattern across plausible worlds, the assumptions behind it, and the ranked fixes. The study itself is illustrative, not a real cohort.


forskai · design-risk report · stage: report

Can the planned short-form palliative-care attitudes survey detect a group difference between concurrent- and non-concurrent-framing respondents?

forskai-2026-0001 · v1 · psychometric · standardised_mean_difference 0.4 SD · N=96 · mann_whitney

Illustrative. Figures here are illustrative/demo data under stated assumptions, not a measured analysis.

FAIL

The Likert short form recovers the 0.40 SD group difference in only ~39% of expected-scenario runs; the ceiling structurally caps detectable variation.
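
A verdict of this kind can be approximated with a small Monte Carlo check. The sketch below is an illustration, not the ForskAI engine. It assumes equal allocation (48 per group out of N=96), normally distributed latent scores, and a ceiling that censors the top share of the comparison group. Reading `likert_ceiling` 27 as "27% of respondents at the ceiling" is an assumption, and `mann_whitney_p`, `simulated_power`, and `ceiling_share` are hypothetical names. It is not expected to reproduce the ~39% figure.

```python
import math
import random
from statistics import NormalDist


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney p-value (normal approximation, tie-corrected,
    no continuity correction)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                          # size of this tie group
        avg_rank = (i + 1 + j) / 2.0       # mean of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0                         # all values tied: no evidence
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))


def simulated_power(effect_sd, n_per_group, ceiling_share, sims, seed, alpha=0.05):
    """Share of simulated studies in which the test rejects at level alpha."""
    rng = random.Random(seed)
    # Assumption: the ceiling censors the top `ceiling_share` of the reference group.
    cut = NormalDist().inv_cdf(1.0 - ceiling_share) if ceiling_share > 0 else math.inf
    hits = 0
    for _ in range(sims):
        a = [min(rng.gauss(0.0, 1.0), cut) for _ in range(n_per_group)]
        b = [min(rng.gauss(effect_sd, 1.0), cut) for _ in range(n_per_group)]
        if mann_whitney_p(a, b) < alpha:
            hits += 1
    return hits / sims


if __name__ == "__main__":
    # Seed and simulation count taken from the report footer.
    for label, effect in (("null", 0.0), ("expected", 0.4)):
        rate = simulated_power(effect, 48, 0.27, sims=1500, seed=20260620)
        print(f"{label}: rejection rate {rate:.3f}")
```

Censoring is only one way to model a ceiling; a discretised Likert response model would compress variation further and could give noticeably different rates.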

Signal Recovery Index: 41/100

Shown alongside, never instead of, the metrics below.

recovery probability
bias control
coverage
estimator stability
measurement adequacy
missingness robustness
pipeline integrity

Scenario grid — the pattern across plausible worlds

Expected: 39%
Optimistic: 71%
Conservative: 16%
Null (false-positive check): 5%
Measurement stress: 28%

Bars show recovered power per scenario (the null bar shows the false-positive rate). The marker is the target — 80% power, or the 5% false-positive line for the null.
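
The comparison against the markers can be written out directly. This is a minimal sketch using the rates from the grid above; the pass rule (at or above 80% power, at or below the 5% false-positive line for the null) is read from the caption, and the function names are hypothetical.

```python
# Recovered rates copied from the scenario grid.
SCENARIO_RATES = {
    "expected": 0.39,
    "optimistic": 0.71,
    "conservative": 0.16,
    "null": 0.05,
    "measurement_stress": 0.28,
}

POWER_TARGET = 0.80          # marker for the non-null scenarios
FALSE_POSITIVE_LINE = 0.05   # marker for the null scenario


def meets_target(scenario, rate):
    """True when a scenario's bar reaches its marker."""
    if scenario == "null":
        return rate <= FALSE_POSITIVE_LINE
    return rate >= POWER_TARGET


def power_shortfalls(rates=SCENARIO_RATES):
    """Gap to the 80% target for each non-null scenario that misses it."""
    return {
        name: round(POWER_TARGET - rate, 2)
        for name, rate in rates.items()
        if name != "null" and rate < POWER_TARGET
    }


if __name__ == "__main__":
    for name, rate in SCENARIO_RATES.items():
        print(name, "meets target" if meets_target(name, rate) else "misses target")
    print(power_shortfalls())
```

Under this reading, the null scenario holds its false-positive line while every non-null scenario, including the optimistic one, falls short of 80%.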

Recovery metrics

Scenario                     Power  Type I  Rel. bias  Coverage  Conv.  Identif.
Expected                     39%    5%      6%         94%       100%   clear
Optimistic                   71%    -       5%         95%       100%   clear
Conservative                 16%    -       9%         93%       100%   clear
Null (false-positive check)  5%     5%      -          95%       100%   clear
Measurement stress           28%    -       12%        92%       100%   clear
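
Each rate in this table is an estimate from a finite number of simulated runs (1500 per scenario, per the report footer), so it carries Monte Carlo uncertainty. The sketch below uses the standard binomial standard error with a normal-approximation interval; `monte_carlo_interval` is a hypothetical helper, not part of any ForskAI API.

```python
import math

SIMS_PER_SCENARIO = 1500  # from the report footer


def monte_carlo_interval(rate, sims=SIMS_PER_SCENARIO, z=1.96):
    """Approximate 95% interval for a simulated proportion."""
    se = math.sqrt(rate * (1.0 - rate) / sims)
    return max(0.0, rate - z * se), min(1.0, rate + z * se)


if __name__ == "__main__":
    for name, rate in (("expected", 0.39), ("optimistic", 0.71), ("null", 0.05)):
        lo, hi = monte_carlo_interval(rate)
        print(f"{name}: {rate:.2f} (approx. 95% interval {lo:.3f} to {hi:.3f})")
```

At 1500 runs the interval around 39% is only a few percentage points wide, so simulation noise alone does not bring the expected-scenario power near the 80% target.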

Assumptions

Assumption              Value  Source                                                        Grade  Sensitivity
group_difference        0.4    Calibrated to published item means/SDs (Astrom et al., 2026)  C      0.2–0.6 (high)
short_form_reliability  0.88   Univariate screen of published item statistics                C      0.8–0.92 (medium)
likert_ceiling          27     Reported item means/SDs; high ceiling on values items         B      17–40 (high)
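
To see why the sensitivity grades differ, the ranges can be swept through a rough analytic approximation. The sketch below is an assumption-laden illustration, not the engine's method: it treats `group_difference` as a latent-scale effect attenuated by the square root of reliability (the classical test theory result), uses a two-sample normal approximation with 48 per group, and ignores the ceiling and the rank-based test entirely. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()


def approx_power(effect_sd, n_per_group, alpha=0.05):
    """Two-sided, two-sample normal-approximation power."""
    z_crit = _STD.inv_cdf(1.0 - alpha / 2.0)
    ncp = effect_sd * sqrt(n_per_group / 2.0)
    return _STD.cdf(ncp - z_crit) + _STD.cdf(-ncp - z_crit)


def sensitivity_grid(effects=(0.2, 0.4, 0.6),
                     reliabilities=(0.80, 0.88, 0.92),
                     n_per_group=48):
    """Approximate power for each (effect, reliability) pair."""
    grid = {}
    for d in effects:
        for r in reliabilities:
            observed = d * sqrt(r)  # assumed attenuation of the latent effect
            grid[(d, r)] = approx_power(observed, n_per_group)
    return grid


if __name__ == "__main__":
    for (d, r), power in sensitivity_grid().items():
        print(f"effect={d:.1f} reliability={r:.2f} approx. power={power:.2f}")
```

In this approximation, moving the effect across its 0.2–0.6 range changes power far more than moving reliability across 0.8–0.92, which is consistent with the high and medium grades in the table.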

Design-risk diagnosis

  • fatal · Ceiling effect compresses detectable variation · measurement · under expected, conservative, measurement_stress
  • major · Planned N too small for the recoverable effect on this instrument · sampling · under expected, conservative

Recommended fixes — the Fix Ladder

  1. Make the free-text concept-coverage score (0–25) the primary measure; treat the Likert short form as secondary/descriptive.

    The coverage measure separates framing groups at near-certain power where the Likert items recover the same effect at ~0.15.

    low cost · measurement
  2. Pilot the free-text prompt wording (three variants) before the confirmatory study.

    Prompt wording shifts measured coverage and the non-concurrent-framing rate; choosing the least biasing wording protects the estimand.

    medium cost · measurement
  3. Increase N if the Likert comparison must remain primary.

    Raises power, but cannot overcome the ceiling on the values items.

    high cost · sample
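
The cost of the third fix can be gauged with the textbook sample-size formula for a two-sample comparison. This is a rough sketch under a normal approximation that ignores the rank-based test and the ceiling; the smaller effect values are hypothetical stand-ins for what a compressed scale might leave observable, not figures from the report.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(effect_sd, power=0.80, alpha=0.05):
    """Per-group n for a two-sided, two-sample normal-approximation test."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1.0 - alpha / 2.0)
    z_power = std.inv_cdf(power)
    return ceil(2.0 * (z_alpha + z_power) ** 2 / effect_sd ** 2)


if __name__ == "__main__":
    # 0.4 is the planned effect; 0.3 and 0.2 are hypothetical attenuated effects.
    for effect in (0.4, 0.3, 0.2):
        print(f"effect {effect:.1f} SD: about {required_n_per_group(effect)} per group")
```

Required N grows with the inverse square of the observable effect, so halving the effect roughly quadruples the sample. Even with no attenuation, the unattenuated 0.4 SD effect calls for roughly 100 per group under this approximation, about double a 48-per-group split of the planned N=96.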

signed off · methodologist

What this report is — and is not. This report tests whether the planned design has a plausible path to recovering the intended signal under stated assumptions. It does not validate the construct, guarantee study success, replace ethical or statistical review, or provide clinical advice. A PASS is design evidence under the stated assumptions — never a validity claim about people, and never a promise of results.

engine 0.0.0-illustrative · seed 20260620 · sims/scenario 1500 · thresholds forskai-default-2026
