Sample size calculation

with exercises in GPower

Wilfried Cools & Tim Pauwels

April 22, 2024

Workshop

  • to introduce the key ideas
  • to help you see the bigger picture
  • to offer first practical experience: GPower

 

Feedback

  • Help us improve this document

    wilfried.cools@vub.be

at SQUARE

  • Ask us for help

    we offer consultancy
    on methodology, statistics, and their communication

    square.research.vub.be

Program

 

  • Part I:
    understand the reasoning
    • introduce building blocks
    • highlight how they relate
    • focus on t-test only
    • a few exercises in GPower

 

  • Part II:
    explore more complex situations
    • go beyond the t-test
    • simple but common
    • many exercises in GPower
    • not one formula for all

Sample size calculation: demarcation

 

  • How many observations will be sufficient?
    • avoid too many, because observations typically imply a cost
      • money / time → limited resources
      • risk / harm / patient burden → ethical constraints
    • have enough
  • To offer strong enough statistical inference!
    • linked to standard error
      • testing → power [probability to detect effect]
      • estimation → accuracy [size of confidence interval]

Sample size calculation: a difficult design issue

 

  • Part of the design of a study
    • before data collection
    • requires understanding:
      • parameters: effect size of interest
      • data: future data properties
      • model: relation between the outcome and the conditions under which it is observed
    • decision based on (highly) incomplete information, thus based on (strong) assumptions

 

  • Not always possible nor meaningful!
    • confirmatory studies easier than exploratory
    • experiments (control) easier than observational
    • not obvious for complex models
      → simulation
    • not obvious for predictive models
      → no standard error

Sample size calculation: if not possible

 

  • If not possible in a meaningful way
    use alternative justification

    • common practice
    • feasibility

  • Avoid retrospective power analyses
    → power calculations are meaningful for a future study only

    Hoenig, J., & Heisey, D. (2001). The Abuse of Power:
    The Pervasive Fallacy of Power Calculations for Data Analysis. The American Statistician, 55, 19–24.

 

  • Because such justifications are less strong,
    put more weight on non-statistical arguments
    • low cost
    • importance

Simple example confirmatory experiment

 

  • Does my radiotherapy work?
    • aim: show my radiotherapy reduces tumor size
    • method: compare treatment and control group
      • tumor induced in N mice
      • random assignment of mice to groups
    • data: tumor sizes after 20 days
    • analysis: unpaired t-test

 

  • Sample size question
    • how many mice are required
    • to show treatment reduces tumor size more
    • assuming effect size:
      • my radiotherapy works if the treatment group shows 25% more reduction
      • 4mm (control) versus 5mm (treatment), i.e., 20% less in the control group
    • with 80% probability (+ type I error probability .05)

Reference example

 

  • A priori specifications
    • intend to perform a statistical test
    • comparing 2 equally sized groups
    • to detect difference of at least 2
    • assuming an uncertainty of 4 SD on each average
    • which results in an effect size of .5
    • evaluated on a Student t-distribution
    • allowing for a type I error prob. of .05 \((\alpha)\)
    • allowing for a type II error prob. of .2 \((\beta)\)
  • Sample size
    conditional on specifications being true

Difference detected approximately 80% of the time.

Note

  • This reference example is used throughout the workshop!

Formula you could use

 

  • Specifications for this particular case:
    • sample size (n → ?)
    • difference ( \(\Delta\) = signal → 2)
    • uncertainty ( \(\sigma\) = noise → 4)
    • type I errors ( \(\alpha\) = .05, so \(Z_{\alpha/2}\) → -1.96)
    • type II errors ( \(\beta\) = .2, so \(Z_\beta\) → -0.84)
  • Sample size = 2 groups x 63 observations = 126

 

\(n = \frac{(Z_{\alpha/2}+Z_\beta)^2 * 2 * \sigma^2}{ \Delta^2} = \frac{(-1.96-0.84)^2 * 2 * 4^2}{2^2} = 62.79\)

  • Formulas are test- and statistic-specific, but the logic remains the same
  • This and other formulas are implemented in various tools; our focus: GPower
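As a cross-check of the arithmetic above, the formula can be evaluated directly; a minimal sketch in Python (standard library only; the function name is our own):

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided
    comparison of two independent means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(alpha / 2)   # -1.96 for alpha = .05, two-tailed
    z_beta = z.inv_cdf(1 - power)    # -0.84 for beta = .20
    return (z_alpha + z_beta) ** 2 * 2 * sigma ** 2 / delta ** 2

n = n_per_group(delta=2, sigma=4)
print(round(n, 2), "->", 2 * math.ceil(n), "observations in total")
# 62.79 -> 126 observations in total
```

Rounding up to whole observations gives the 2 x 63 = 126 from the slide.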

GPower: a useful tool

 

  • Use it
    • implements wide variety of tests
    • free @ http://www.gpower.hhu.de/
    • popular and well established
    • implements various visualizations
    • fairly well documented
  • Maybe not use it
    • not all tests are included, only the simpler ones
    • not without flaws
    • other tools exist (some paid)

 

GPower statistical tests

 

  • Test family - statistical tests [in window]
    • Exact Tests (8)
    • \(t\)-tests (11) → reference
    • \(z\)-tests (2)
    • \(\chi^2\)-tests (7)
    • \(F\)-tests (16)
  • Focus on the density functions

 

  • Tests [in menu]
    • correlation & regression (15)
    • means (19) → reference
    • proportions (8)
    • variances (2)
  • Focus on the type of parameters

GPower input

 

  • ~ reference example input
    • t-test: difference between two independent averages
    • apriori: calculate sample size
    • effect size = standardized difference (Cohen’s \(d\))
      • Determine =>
        • \(d\) = |difference| / SD_pooled
        • \(d\) = |0-2| / 4 = .5
    • \(\alpha\) = .05; two-tailed ( \(\alpha\) /2 → .025 & .975 )
    • \(power = 1-\beta\) = .8
    • allocation ratio N2/N1 = 1 (equally sized groups)
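The effect size behind the Determine => button can be written out by hand; a minimal sketch, assuming two equal-sized groups with equal SDs as in the reference example:

```python
import math

def cohens_d(mean1, mean2, sd1, sd2):
    """Cohen's d for two equal-sized groups: |difference| / pooled SD."""
    sd_pooled = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return abs(mean1 - mean2) / sd_pooled

print(cohens_d(0, 2, 4, 4))  # reference example: |0-2| / 4 = 0.5
```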

GPower output

 

  • ~ reference example output
    • sample size \((n)\) = 2 x 64 = 128
    • degrees of freedom \((df)\) = 126 (128-2)
    • power ≥ .80 (1- \(\beta\)) = 0.8015
    • distributions: central + non-central
    • critical t = 1.979
      • decision boundary given \(\alpha\) and \(df\)
        qt(.975,126)
    • non-centrality parameter ( \(\delta\) ) = 2.8284
      • shift Ha (true) away from Ho (null)
        2/(4*sqrt(2))*sqrt(64)
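Both output values can be reproduced outside GPower; a sketch, assuming SciPy is available (its `t` and `nct` distributions play the role of the central and non-central t above):

```python
import math
from scipy.stats import t, nct

df = 126                                        # 128 observations - 2 groups
t_crit = t.ppf(0.975, df)                       # decision boundary, alpha = .05 two-tailed
ncp = 2 / (4 * math.sqrt(2)) * math.sqrt(64)    # delta = 2.8284

# power = probability beyond the cut-off, evaluated on Ha ~ t(ncp, df)
power = nct.sf(t_crit, df, ncp)
print(round(t_crit, 3), round(power, 4))
```

This recovers the critical t of 1.979 and power of about .8015 from the GPower output (the tiny rejection probability in the opposite tail is ignored here).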

GPower protocol

 

  • Summary for future reference or communication
    • central and non-central distributions (figure)
    • protocol of power analysis (text)

 

  • File/Edit save or print file (copy-paste)

 

Non-centrality parameter ( \(\delta\) ), shift Ha from Ho

 

  • non-centrality parameter \(\delta\) combines SIZES
    • assumed effect size ((standardized) signal)
    • conditional on sample size (information)
  • \(\delta\) determines overlap of Ho and Ha: the bigger the ncp, the less overlap
    • \(\delta\) as violation of Ho → shift (location/shape)
    • power = probability beyond \(\color{green}{cut off}\) at Ho evaluated on Ha
    • push with sample size
  • Ha acts as \(\color{blue}{truth}\): assumed difference of e.g. .5 SD
    • Ha ~ t(ncp=2.828,df)
  • Ho acts as \(\color{red}{benchmark}\): typically no difference, no relation
    • set \(\color{green}{cutoff}\) on Ho ~ t(ncp=0,df) using \(\alpha\)

 

Alternative: divide by N

 

  • Sample sizes determine shape, not location
    • divide by n: sample size ~ standard error
      • peakedness of both distributions
      • often preferred didactically
    • non-centrality parameter: sample size ~ location
      • standardized distributions
      • often preferred in software / algorithms
  • Formula’s same (DIY: two equations for critical value)

 

 

\(n = \frac{(Z_{\alpha/2}+Z_\beta)^2 * 2 * \sigma^2}{\Delta^2} = \frac{(Z_{\alpha/2}+Z_\beta)^2 * 2}{d^2}\)

Type I/II error probability

 

  • Inference (test) based on cut-offs (density → AUC=1)

  • Type I error: incorrectly reject Ho (false positive):

    • cut-off at Ho, error prob. \(\alpha\) controlled
    • one/two tailed → one/both sides informative?
  • Type II error: incorrectly fail to reject Ho (false negative):

    • cut-off at Ho, error prob. \(\beta\) obtained from Ha
    • Ha assumed known in a power analysis
  • power = 1 - \(\beta\) = probability correct rejection (true positive)

 

  • Inference versus truth
    • infer: effect exists vs. unsure
    • truth: effect exists vs. does not

                 infer=Ha       infer=Ho       sum
    truth=Ho     \(\alpha\)     1-\(\alpha\)   1
    truth=Ha     1-\(\beta\)    \(\beta\)      1
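The bottom row of the table can be checked by simulation: draw data under Ha from the reference example, test with the critical value 1.979 reported by GPower, and count rejections. A sketch (10,000 simulated studies; the seed is chosen arbitrarily):

```python
import math
import random

random.seed(1)
n, reps, t_crit = 64, 10_000, 1.979
rejections = 0
for _ in range(reps):
    g1 = [random.gauss(0, 4) for _ in range(n)]   # control
    g2 = [random.gauss(2, 4) for _ in range(n)]   # treatment, true difference 2
    m1, m2 = sum(g1) / n, sum(g2) / n
    v1 = sum((x - m1) ** 2 for x in g1) / (n - 1)
    v2 = sum((x - m2) ** 2 for x in g2) / (n - 1)
    se = math.sqrt((v1 + v2) / n)                 # pooled SE, equal group sizes
    t_stat = (m2 - m1) / se
    if abs(t_stat) > t_crit:                      # correct rejection of Ho
        rejections += 1
print(rejections / reps)   # close to the theoretical power of .80
```

Drawing both groups from the same distribution instead would make the rejection rate approximate \(\alpha\) = .05, the top row of the table.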

Create plot

 

  • Create a plot
    • X-Y plot for range of values
    • assumes calculated analysis
      • ~ reference example
    • specify Y-axis / X-axis / curves and constant
      • beware of the order!
  • Plot sample size (y-axis)
    • by type I error \(\alpha\) (x-axis) → from .01 to .2 in steps of .01
    • for 4 values of power (curves) → from .8 in steps of .05
    • assume effect size (constant) → .5 from reference example

 

  • Notice Table option
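The plotted curves (and the Table option) can be mimicked with the normal-approximation formula from before; a sketch tabulating per-group sample size for a few \(\alpha\) values at power .8 and effect size .5:

```python
import math
from statistics import NormalDist

z = NormalDist()
d, power = 0.5, 0.80
sizes = {}
for alpha in (0.01, 0.05, 0.10, 0.20):
    n = (z.inv_cdf(alpha / 2) + z.inv_cdf(1 - power)) ** 2 * 2 / d ** 2
    sizes[alpha] = math.ceil(n)   # per-group size shrinks as alpha grows
print(sizes)
```

At \(\alpha\) = .05 this returns the 63 per group of the reference example; GPower's exact t-based values differ by one observation or so.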

Errors: exercises