with exercises in GPower
April 22, 2024
Feedback: help us improve this document → wilfried.cools@vub.be
Ask us for help at SQUARE: we offer consultancy on methodology, statistics, and its communication
square.research.vub.be
If a power analysis is not possible in a meaningful way, use an alternative justification
Avoid retrospective power analyses → OK for a future study only
Hoenig, J., & Heisey, D. (2001). The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis. The American Statistician, 55, 19–24.
A true difference is detected approximately 80% of the time.
Note: with \(\Delta\) = 2, \(\sigma\) = 4, \(\alpha\) = .05 (so \(Z_{\alpha/2}\) → -1.96) and \(\beta\) = .2 (so \(Z_\beta\) → -0.84):
\(n = \frac{(Z_{\alpha/2}+Z_\beta)^2 * 2 * \sigma^2}{\Delta^2} = \frac{(-1.96-0.84)^2 * 2 * 4^2}{2^2} = 62.79\) per group
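The calculation can be checked with a short script; a minimal Python sketch (scipy assumed, my own illustration, not part of the slides):

```python
# Sketch, not from the slides: sample size per group for two independent
# means via the normal approximation used above.
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, beta=0.20):
    z_a = norm.ppf(alpha / 2)   # -1.96 for alpha = .05, two-tailed
    z_b = norm.ppf(beta)        # -0.84 for power = .80
    return (z_a + z_b) ** 2 * 2 * sigma ** 2 / delta ** 2

print(round(n_per_group(delta=2, sigma=4), 2))  # 62.79 -> round up to 63
```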
GPower
~ reference example input
Determine => effect size d = |0 - 2| / 4 = .5
\(\alpha\) = .05; two-tailed
power = .8
allocation ratio N2/N1 = 1 (equally sized groups)
~ reference example output
total sample size = 128
critical t = qt(.975, 126) = 1.979
Ha (true) is pushed away from Ho (null) by ncp = 2/(4*sqrt(2))*sqrt(64) = 2.828
effect size ((standardized) signal) and sample size (information) determine the overlap of Ho and Ha: bigger ncp, less overlap
ncp → shift (location/shape) of Ho, evaluated on Ha: pushed further with bigger effect size and sample size
Ha acts as \(\color{blue}{truth}\): assumed difference of e.g. .5 SD, Ha ~ t(ncp=2.828, df)
Ho acts as \(\color{red}{benchmark}\): typically no difference, no relation, Ho ~ t(ncp=0, df), cut-offs using \(\alpha\)
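The interplay of ncp, cut-off, and power can be made concrete; a minimal Python sketch (scipy assumed, not from the slides) reproducing the reference example's ncp and power for n = 64 per group:

```python
# Sketch, not from the slides: power of the two-sample t-test from the
# noncentral t distribution (Ha ~ t(ncp, df), Ho ~ t(0, df)).
from scipy.stats import nct, t

n, d = 64, 0.5                     # per-group n; d = 2 / 4
df = 2 * n - 2                     # 126
ncp = d * (n / 2) ** 0.5           # 2.828, the slide's 2/(4*sqrt(2))*sqrt(64)
t_crit = t.ppf(0.975, df)          # 1.979, cf. qt(.975, 126)
power = (1 - nct.cdf(t_crit, df, ncp)) + nct.cdf(-t_crit, df, ncp)
print(round(ncp, 3), round(power, 2))  # 2.828 0.8
```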
\(n = \frac{(Z_{\alpha/2}+Z_\beta)^2 * 2 * \sigma^2}{\Delta^2}\)
Inference (test) based on cut-offs (density → AUC = 1)
Type I error: incorrectly reject Ho (false positive), error prob. \(\alpha\), controlled under Ho
Type II error: incorrectly fail to reject Ho (false negative), error prob. \(\beta\), obtained from Ha
Ha assumed known in a power analysis
power = 1 - \(\beta\) = probability of a correct rejection (true positive)
|          | infer=Ha    | infer=Ho     | sum |
| truth=Ho | \(\alpha\)  | 1-\(\alpha\) | 1   |
| truth=Ha | 1-\(\beta\) | \(\beta\)    | 1   |
X-Y plot for range of values
~ reference example
Plot power instead of sample size
What is the relation between type I and type II errors?
What would be the difference between the curves for \(\alpha\) = 0?
Comparing the control group and two treatments
Pairwise comparisons, typically not an omnibus test
Use the reference example (C = 0, T1 = 2), and extend with a third group, T2 = 4 (same sd)
Estimate / guesstimate of the minimal magnitude of interest
Typically standardized: signal to noise ratio
Part of the non-centrality (as is sample size) → pushing Ha away from Ho
~ Practical relevance
d-family (differences) and r-family (associations)
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.).
Famous Cohen conventions
Ellis, P. D. (2010). The essential guide to effect sizes: statistical power, meta-analysis, and the interpretation of research results.
more than 70 different effect sizes… most of them related
Determine the effect size from (simulated) data:
Call:
lm(formula = y ~ factor(group), data = .dta)
Residuals:
Min 1Q Median 3Q Max
-10.6795 -2.6556 0.5043 2.6463 8.8380
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.421e-16 5.000e-01 0.000 1.00000
factor(group)2 2.000e+00 7.071e-01 2.828 0.00544 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4 on 126 degrees of freedom
Multiple R-squared: 0.0597, Adjusted R-squared: 0.05224
F-statistic: 8 on 1 and 126 DF, p-value: 0.005444
Df Sum Sq Mean Sq F value Pr(>F)
group 1 128 128 8 0.00544 **
Residuals 126 2016 16
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] 0.00544392
t
-2.828427
reference example:
Choice of effect size matters → justify your choice!
Choice of effect size depends on the aim of the study
Choice of effect size depends on the statistical test of interest
...
reference example:
a priori: \(n\) ~ \(\alpha\), power, \(\Delta\)
post hoc: power ~ \(\alpha\), \(n\), \(\Delta\)
compromise: power, \(\alpha\) ~ \(\beta\:/\:\alpha\), \(\Delta\), \(n\)
criterion: \(\alpha\) ~ power, \(\Delta\), \(n\)
sensitivity: \(\Delta\) ~ \(\alpha\), power, \(n\)
reference example:
G*Power
So far, comparing two independent means
From now on, selected topics beyond independent t-test
with small exercises
When comparing 2 dependent groups (e.g., before/after treatment) → account for the correlation
Correlations are typically obtained from pilot data or earlier research
GPower: matched pairs (t-test / means, difference 2 dependent means)
Use the reference example with sample size 2 x 64, but correlation .5 ?
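As a rough check of how the correlation helps, the normal-approximation formula from the reference example carries over with the variance of a pair difference shrunk by (1 - ρ); a Python sketch (scipy assumed, my own illustration):

```python
# Sketch, not from the slides: normal-approximation number of pairs for
# a matched-pairs design; var of a pair difference is 2*sigma^2*(1-rho).
from scipy.stats import norm

def n_pairs(delta, sigma, rho, alpha=0.05, beta=0.20):
    var_diff = 2 * sigma ** 2 * (1 - rho)       # within-pair difference
    z = norm.ppf(alpha / 2) + norm.ppf(beta)
    return z ** 2 * var_diff / delta ** 2

# rho = .5 halves the variance: ~31.4 pairs vs ~62.8 per independent group
print(round(n_pairs(delta=2, sigma=4, rho=0.5), 1))  # 31.4
```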
Test the difference between two independent proportions → [0..1]
Simplest version of a logistic regression on two groups
GPower: Fisher Exact Test (exact / proportions, difference 2 independent proportions)
Testing whether two proportions are the same
GPower: Fisher Exact Test (exact / proportions, difference 2 independent proportions)
Plot 5 power curves
Explain the curve minimum; what is the relation with sample size?
Repeat for one-tailed; what is the difference?
Test the difference between two dependent proportions → [0..1], a categorical shift
GPower: McNemar test (exact / proportions, difference 2 dependent proportions)
Testing whether proportions of discordance are the same
When non-normally distributed residuals are expected and cannot be circumvented (e.g., by transformations)
Only considers ranks or uses permutations → the price is efficiency and flexibility
Requires a parent distribution (alternative hypothesis); 'min ARE' is a safe default
GPower: two groups → Wilcoxon-Mann-Whitney (t-test / means, diff. 2 indep. means)
Use reference example
Differences between groups ~ relation with grouping (categorization)
Example: d = .5 ~ r = .243 (note: slope \(\beta = {r*\sigma_y} / {\sigma_x}\))
GPower: regression coefficient (t-test / regression, one group size of slope)
A relation as ratio between and within group variance ~ explained variance R2
Different but related effect sizes \(f^2\) = \({R^2/{(1-R^2)}}\)
GPower: regression coefficient (t-test / regression, fixed model single regression coef)
Use reference example
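The conversions mentioned here can be verified numerically; a Python sketch (equal group sizes assumed, not from the slides):

```python
# Sketch, not from the slides: d -> point-biserial r (equal groups) and
# f^2 = R^2 / (1 - R^2), for the reference effect d = .5.
d = 0.5
r = d / (d ** 2 + 4) ** 0.5      # 0.243, as quoted above
R2 = r ** 2                      # explained variance
f2 = R2 / (1 - R2)               # 0.0625 = (d/2)^2 for two equal groups
print(round(r, 3), round(f2, 4)) # 0.243 0.0625
```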
Multiple groups, at least two of which differ → not a single effect size d
F-test statistic & effect size \(f = \sqrt{\sigma_{between}^2 / \sigma_{within}^2}\)
GPower: one-way Anova (F-test / Means, ANOVA - fixed effects, omnibus, one way)
use the reference example (group 1 = 0, group 2 = 2), and include a third group
\(\sigma_{contrast} = \frac{|\sum{\mu_i * c_i}|}{\sqrt{N \sum_i^k c_i^2 / n_i}}\)
\(f = \sqrt{\frac{\sigma_{contrast}^2}{\sigma_{error}^2}}\)
GPower: one-way ANOVA (F-test / Means, ANOVA-fixed effects,special,main,interaction)
For the reference example
extended, with contrasts \(f_{T1-C}\)=.25, \(f_{T2-C}\)=.50 and \(f_{(T2+T1)/2-C}\)=.3535
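The two formulas above can be combined into a small helper; a Python sketch (my own reading of the quoted f values: N sums only the groups with nonzero contrast weight):

```python
# Sketch, not from the slides: contrast effect size f from the formulas
# above; assumption: N counts only groups with nonzero contrast weight.
def contrast_f(mu, c, n_i, sigma):
    N = sum(n for n, w in zip(n_i, c) if w != 0)
    num = abs(sum(m * w for m, w in zip(mu, c)))
    den = (N * sum(w ** 2 / n for w, n in zip(c, n_i))) ** 0.5
    return (num / den) / sigma

mu, n_i, sigma = [0, 2, 4], [64, 64, 64], 4   # C, T1, T2
print(round(contrast_f(mu, [-1, 1, 0], n_i, sigma), 4))    # 0.25
print(round(contrast_f(mu, [-1, 0, 1], n_i, sigma), 4))    # 0.5
print(round(contrast_f(mu, [-1, .5, .5], n_i, sigma), 4))  # 0.3536
```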
When the factor is time: repeated measures
Beware: effect sizes obtained from the literature may or may not include the correlation
GPower: repeated measures (F-test / Means, repeated measures within factors)
For reference example
with effect size f = .25 (1/16 explained versus unexplained)
Get effect size: in-house shiny app
Use reference example
for treatment (C-T1-T2) and add type (B1-B2)
GPower: multiway ANOVA (F-test / Means, ANOVA-fixed effects,special,main,interaction)
Use reference example
for treatment (C-T1-T2) and add type (B1-B2)
What are the sample sizes?
When repeated measures are obtained for different groups
GPower: repeated measures (F-test / Means, repeated measures between factors)
For reference example
When differences between groups depend on time
Get effect size: in-house shiny app
Use reference example
for control-treatment (C-T1), and 2 or 4 time points
Options: different effect sizes are possible
GPower: repeated measures (F-test / Means, repeated measures within-between factors)
Use the effect sizes from the previous exercise, parts 2 and 3
Test the difference between two independent correlations → [-1..1]
Use the Fisher Z transformation to normalize
Correlations are easier to differentiate the further they are from 0
GPower: z-tests / correlation & regressions: 2 indep. Pearson r’s
Testing whether two correlations are the same
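A Python sketch of the Fisher Z approach (scipy assumed; the correlations, sample sizes, and function name are hypothetical, chosen only for illustration):

```python
# Sketch with hypothetical numbers (r's and n's are not from the slides):
# power for comparing two independent Pearson correlations via Fisher Z.
from math import atanh, sqrt
from scipy.stats import norm

def power_two_r(r1, r2, n1, n2, alpha=0.05):
    q = abs(atanh(r1) - atanh(r2))          # effect on the Fisher Z scale
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # SE of the z difference
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.sf(z_crit - q / se) + norm.cdf(-z_crit - q / se)

print(round(power_two_r(0.2, 0.5, 100, 100), 2))
```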
G*Power is not always sufficient:
Tests too difficult to specify in GPower
Tests not included in GPower
Tests without formula
gr <- rep(c('T','C'),64)                # 2 groups of 64 (N = 128)
y <- ifelse(gr=='C',0,2)                # true means: C = 0, T = 2
dta <- data.frame(y=y,X=gr)
cutoff <- qt(.025,nrow(dta)-2)          # critical t, df = 126
my_sim_function <- function(){
dta$y <- dta$y+rnorm(length(dta$y),0,4) # generate (with sd=4)
res <- t.test(data=dta,y~X) # analyze
c(res$estimate %*% c(-1,1),res$statistic,res$p.value)
}
sims <- replicate(10000,my_sim_function()) # many iterations
dimnames(sims)[[1]] <- c('diff','t.stat','p.val')
mean(sims['p.val',] < .05) # p-values 0.8029
mean(sims['t.stat',] < cutoff) # t-statistics 0.8029
mean(sims['diff',] > sd(sims['diff',])*cutoff*(-1)) # differences 0.8024
Sample size calculation is a design issue, not a statistical one
It typically focuses on ensuring sufficient data for sufficiently strong statistical inference
Sample size depends on effect size, type I & II errors, and the statistical test of interest
Effect sizes express the amount of signal compared to the background noise
G*Power deals with models that are not too complex
Thank you for your attention.