1 + 1:30: introduce researchers to key ideas (know-how), help them reason about it (why), and make sure they are able to apply it (get it done)
Wilfried @ SQUARE
square.research.vub.be
April 02, 2023
Goal
Target audience
Feedback
1 + 1:30: introduce researchers to key ideas (know-how), help them reason about it (why), and make sure they are able to apply it (get it done)
Part I: understand the reasoning
Part II: explore more complex situations
GPower
1:30
first focus on essence with simple example then extend and exercise
How many observations will be sufficient?
Linked to statistical inference (using standard error)
5:00 if going slow
It is about answering your research question while avoiding avoidable costs. It only works when focused on inference, because of the standard error.
Before data collection, during design of study
Not always possible, nor meaningful!
Alternative justifications often more realistic:
8:00
What do you want!?! Because this is about how to ensure you get it! And what will the data look like in practice? Not easy, because unknown: voodoo. Maybe not always so important, because often it is not possible nor meaningful. Then focus on what you can do: explain, convince, show you have given it careful thought.
Example: does this method work for reducing tumor size?
evaluation of radiotherapy to reduce a tumor in mice
comparing treatment group with control (=conditions)
intended analysis: unpaired t-test to compare averages for treatment and control
SAMPLE SIZE CALCULATION:
2:00
Just a first possible example where all is straightforward. It considers the goal, the statistical test.
A priori specifications
Sample size
conditional on specifications being true
2:40
Another example, with values used throughout the workshop. WRITE: delta 2, sigma 4, so effect size .5; alpha .05, beta .2, thus power .8; n = ?
For this particular case:
n: ?
Δ: 2
σ: 4
α: .05 (so Zα/2 → -1.96)
β: .2 (so Zβ → -0.84)
Sample size = 2 groups × 63 observations = 126
Note: formulas are test and statistic specific
the logic remains the same
These and other formulas are implemented in various tools
our focus: GPower
n = (Zα/2 + Zβ)² ∗ 2σ² / Δ²
n = (−1.96 − 0.84)² ∗ 2 ∗ 4² / 2² = 62.79
2:00
It is simple to extract the sample size using only these numbers. Alpha and beta are interpreted on a normal distribution, as cut-off values for probabilities given by quantiles. This is the simplest case, not the t-distribution, which depends on the degrees of freedom.
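A minimal R check of this normal-approximation formula, with the reference values (Δ = 2, σ = 4, α = .05, power = .8):

```r
# normal-approximation sample size per group (two-sided test)
delta <- 2; sigma <- 4
alpha <- .05; power <- .8
n <- (qnorm(1 - alpha/2) + qnorm(power))^2 * 2 * sigma^2 / delta^2
n          # 62.79
ceiling(n) # 63 per group, 126 in total
```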
4 components and 2 distributions
Calculate sample size based on effect size, and type I / II error
2:00
One of the distributions reflects the absence of an effect; the other combines the size of the effect and the information available to try and detect that effect. The actual distributions depend on the statistical test of interest. The shift depends on both: effect size and sample size. The shift has consequences for how much of the Ha distribution is beyond the cut-off set at the Ho distribution. The only issue is how far the distribution shifts...
Use it
Maybe not use it
2:30
GPower because it offers calculations for different tests, no need to study formulas. There are good reasons to use it, but... not all is perfect.
reference
reference
1:30
Before, we focus on one of the 11 t-tests, or one of the 19 means comparisons, however you want to look at it. Various other tests exist, categorized in one of two ways.
~ reference example input
d = |0 − 2| / 4 = .5
α = .05
power = .8
allocation ratio = 1
2:00
For the reference example the input is given; the effect sizes are specified with 'determine'.
We choose a test and type to get the sample size; we use effect size 2/4, alpha .05 and beta .2.
SHOW MARKER
~ reference example output
Total sample size: 128
Critical t: qt(.975, 126)
Non-centrality: Ha (true) shifted away from Ho (null) by 2/(4*sqrt(2))*sqrt(64) = 2.83
2:30
The result is 'almost' the same as before, with the normal distribution, but slightly less efficient. The critical t depends on the degrees of freedom (or sample size). The resulting non-centrality parameter (shift) combines effect size and sample size.
SHOW MARKER
t tests - Means: Difference between two independent means (two groups)
Analysis: A priori: Compute required sample size
Input:
Tail(s) = Two
Effect size d = 0.5000000
α err prob = 0.05
Power (1-β err prob) = .8
Allocation ratio N2/N1 = 1
Output:
Noncentrality parameter δ = 2.8284271
Critical t = 1.9789706
Df = 126
Sample size group 1 = 64
Sample size group 2 = 64
Total sample size = 128
Actual power = 0.8014596
00:30
Conveniently, you can copy-paste the resulting output (and input) into a text file, to communicate it to others or to yourself later.
Ha shifts away from Ho
Ho acts as benchmark → eg., no difference
Ho ~ t(ncp=0, df)
using α, Ho is rejected if the test returns an implausible value
Ha acts as truth → eg., difference of .5 SD
Ha ~ t(ncp≠0, df), shifted away from Ho → shift (location/shape)
δ, the non-centrality parameter, combines:
effect size (target or signal)
sample size (information)
cut-off set at Ho, evaluated on Ha
4:00
All depends on the difference between the distribution assuming no effect, and the one representing the effect of interest. The shift is quantified by the non-centrality parameter, which combines sample and effect size.
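A small R sketch of this: keep the effect size d = .5 fixed and let n grow; the shift δ = d·√(n/2) pushes Ha away from Ho, so more of Ha falls beyond the cut-off (the n values here are illustrative, not from the slides):

```r
# mass of Ha beyond the two-sided cut-off set on Ho, for growing n per group
d <- .5
for (n in c(16, 32, 64)) {
  df  <- 2 * n - 2
  ncp <- d * sqrt(n / 2)   # the shift, combining effect and sample size
  pow <- 1 - pt(qt(.975, df), df, ncp) + pt(qt(.025, df), df, ncp)
  cat("n =", n, "-> power ~", round(pow, 2), "\n")
}
```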
Ha is NOT interchangeable with Ho
Cut-off at Ho using α (Ha unknown)
If fail to reject, then remain in doubt
P(statistic | Ho) != P(Ho | statistic)
Equivalence testing → Ha for 'no effect'
reject Ho that effect is smaller than 0 − |Δ| AND reject Ho that effect is bigger than 0 + |Δ|
07:00
While it is simply the difference between Ho and Ha that matters, in statistics they are not interchangeable. The alternative is just an assumed effect.
n = (Zα/2 + Zβ)² ∗ 2σ² / d²
n = (−1.96 − 0.84)² ∗ 2 ∗ 4² / 2²
n = 62.79
2:30
The non-centrality parameter combines effect and sample size, alternatively sample size could be looked at separately. Here the shape changes with growing sample size.
Inference test based on cut-offs (density → AUC=1)
Type I error: incorrectly reject Ho (false positive): error prob. α, controlled on Ho
Type II error: incorrectly fail to reject Ho (false negative): error prob. β, obtained from Ha
Ha assumed known in a power analysis
power = 1 − β = probability of a correct rejection (true positive)
Inference versus truth
| | infer=Ha | infer=Ho | sum |
| truth=Ho | `α` | 1 − `α` | 1 |
| truth=Ha | 1 − `β` | `β` | 1 |
3:00
Inference is based on the cut-off values, and so errors are possible:
either the statistic incorrectly falls beyond the cut-off, considered from Ho,
or it incorrectly falls before it.
Moving the cut-off makes one error bigger and the other smaller,
but not by equal amounts!
Given a 'truth', the probabilities sum to one: you are either right or wrong.
~ reference example
2:00 + 3:00
order is important; do it yourself after I did
1:00 + 3:00
Red is power .8, so type II is .2; divided by 4 gives .05 for alpha. The sample size range changes: change one building block and the Y-axis responds, same curves. If the type I error goes up, power goes up, so type II goes down, given the rest, but not all with the same strength of change. If you do not allow for any type I error, then power is 0, because of infinity on the t-distribution.
Popular choices
α & β inversely related
2:00
Popular choices, in ratios / percentages. Inversely related, so make a choice about which error you want to avoid most. Look at the surfaces: α .025 × 8 gives β .2.
Defined on the Ho, known
Multiple testing
Interim analysis
5:00 + 1:00
Alpha is defined on Ho, so it is under control; it assumes variation only due to sampling. With multiple tests there is each time a possible error, so the probability of at least one error increases. Compensate for multiple testing: 1 minus the probability of being correct each time; Bonferroni is a simple way to get that. With interim analyses there is also multiple testing: decisions are made, not only sampling, so account for that with alpha spending, different boundaries with adjusted alphas such that the total alpha is preserved. Not in GPower; have a look at susanne.
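A quick R illustration of that multiple-testing logic, assuming independent tests:

```r
# familywise error: probability of at least one false positive over k tests
alpha <- .05; k <- 5
1 - (1 - alpha)^k     # ~ 0.23, well above .05
# Bonferroni: test each at alpha/k to keep the familywise rate near alpha
1 - (1 - alpha/k)^k   # ~ 0.049
```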
5:00
Estimate/guestimate of minimal magnitude of interest
Typically standardized: signal to noise ratio (noise provides scale)
Part of non-centrality (as is sample size) → pushing away Ha
~ practical relevance (not statistical significance)
2 main families of effect sizes (test specific)
d-family (differences) and r-family (associations)
4:00
Third building block: the effect, a magnitude standardized so that it is meaningful to interpret and compare. Signal to noise ratio: 2/4 = .5 really is .5 standard deviations between means, a difference on the scale of the pooled SDs. While part of the non-centrality, it does not include sample size; statistical significance does include sample size.
Vd = 4∗Vr / (1 − r²)³;  Vr = 4²∗Vd / (d² + 4)³;  Vd = Vln(OR) ∗ 3/π²
1:00
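These variance conversions have point-estimate counterparts; a hedged R sketch of the standard conversions, consistent with the r = .243 used later on the regression slide:

```r
# point conversions between the d- and r-families (and log odds ratio)
d_from_r  <- function(r)  2 * r / sqrt(1 - r^2)
r_from_d  <- function(d)  d / sqrt(d^2 + 4)
d_from_or <- function(or) log(or) * sqrt(3) / pi
r_from_d(.5)   # 0.243
```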
Effect sizes are test specific
GPower helps with Determine
GPOWER t-f-r-... the Determine button opens a window to help specify the effect size given certain values; the others are calculated and transferred to the main window.
reference example
: d = standardized difference. Less noise means a better signal to noise ratio: the effect size is bigger, so less sample size is needed, THUS a slightly larger t cut-off. No clear relation with the ncp, because effect size goes up while sample size goes down. More noise: the opposite. With a difference in SDs, the bigger one has more impact: a much lower one compensates a bit for a higher one.
reference example
: power by effect size; beware of changes after including the 6 power curves. Effect sizes .2 to 1.2, in steps of whatever, maybe .1. The sd 4 situation comes with 64 observations (blue, effect size .5); the sd 2 situation is effect size 1 with 34 (red). Doubling the effect size shows an increase in power, but not the same increase for all.
For the reference example:
? no idea why n1 ≠ n2
after 'calculate plot', to change the allocation ratio
An allocation of /2 or *2 is the same; just the largest group differs, and it can differ if the standard deviations differ, but that does not show, so maybe not OK. The effect size does not influence the increase much (multiplication).
2 10 18 28 50
38 98 160 238 412
144 382 632 955 1638
Choice of effect size matters → justify the choice!!
Choice of effect size depends on aim of the study
Choice of effect size dependent on statistical test of interest
The most important is importance: if you know what matters, you can power your study to detect that. Usually you just go to the literature and find what was already found; this ensures realistic values, but not necessarily relevant ones. Never use significance itself: it is meaningless, it depends on the sample size and is therefore not an effect size.
Also here, one effect size can be transformed into the next: d to f, to f². Many more transformations at Psychometrica.
Experts / patients → use if possible → importance
minimally clinically relevant effect
Literature (earlier study / systematic review) → beware of publication bias → realistic
Pilot → guestimate dispersion estimate (not effect size → small sample)
Internal pilot → conditional power (sequential)
Guestimate uncertainty...
...
Turn to Cohen → use if everything else fails (rules of thumb)
Easier said than done; often it is and remains difficult. You can ask experts, or even patients, for example to get a pain threshold. Literature is OK if it is relevant, but maybe a bit over-optimistic. A pilot can help to get an idea of the dispersion, not the effect, because there are too few data. An internal pilot is possible: maybe get an estimate of the sd along the way to re-calibrate. Or just try your best to guess, maybe from an assumed range? Avoid the rules of thumb of Cohen.
Building blocks: each parameter conditional on the others
n ~ α, power, Δ (a priori)
power ~ α, n, Δ (post hoc)
power, α ~ β/α, Δ, n (compromise)
α ~ power, Δ, n (criterion)
Δ ~ α, power, n (sensitivity)
All four building blocks combined, and one obtained based on the others. So far we worked a priori, to get the sample size. But it is also popular to get the power: then you need alpha, n and delta (post hoc), OR the relation between alpha and beta (compromise). Not sure why you would extract alpha, as it is typically under control. But you could use delta, which is often done but maybe not always OK: see what effect size is possible with the available data.
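In R, power.t.test follows the same logic: specify all building blocks but one and the missing one is solved for. A minimal sketch with the reference values:

```r
# leave exactly one of n, delta, power (or sig.level) unspecified
power.t.test(n = 64, delta = 2, sd = 4, sig.level = .05)     # -> power ~ .80
power.t.test(power = .8, delta = 2, sd = 4, sig.level = .05) # -> n ~ 64
power.t.test(n = 64, power = .8, sd = 4, sig.level = .05)    # -> delta ~ 2
```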
reference example:
Power for the reference was .8; we find it as such. With half the sample size, the effect size goes up a bit, from .5 to .7714. When using a ratio (β/α = 4), it is .1 and .4, or .05 and .2.
reference example: reference case

# calculator
m1=0; m2=2; s1=4; s2=4
alpha=.025; N=128
var=.5*s1^2+.5*s2^2                  # pooled variance
d=abs(m1-m2)/sqrt(2*var)*sqrt(N/2)   # non-centrality parameter (2.83)
tc=tinv(1-alpha,N-2)                 # critical t on Ho (df = N-2 = 126)
power=1-nctcdf(tc,N-2,d)             # probability beyond the cut-off under Ha
R
cut-off on Ho ( Z1−α/2 ), probability on Ha (non-central)
.n <- 64
.df <- 2*.n - 2
.ncp <- 2 / (4 * sqrt(2)) * sqrt(.n)
.power <- 1 - pt( qt(.975, df=.df), df=.df, ncp=.ncp ) + pt( qt(.025, df=.df), df=.df, ncp=.ncp )   # lower tail ~ 0
round(.power, 4)
## [1] 0.8015
You can calculate it in GPower, but why would you do that? In R: get the cut-off on Ho, get the probability on Ha; simple. The two-sided case has one side that is almost 0.
So far, comparing two independent means
From now on, selected topics beyond independent t-test
with small exercises
Look into GPower manual
27 tests → effect size, non-centrality parameter and example!!
If 2 dependent groups (eg., before/after treatment) → account for correlations
Correlation typically obtained from pilot data, earlier research
GPower: matched pairs (t-test / means, difference 2 dependent means)
reference example
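For the dependent case the correlation enters through the SD of the difference scores; a minimal sketch, assuming for illustration a correlation of .5 (eg., from pilot data):

```r
# paired design: rho shrinks the sd of the differences, dz = delta / sd_diff
delta <- 2; sigma <- 4; rho <- .5            # rho = .5 is an assumed value
dz <- delta / (sigma * sqrt(2 * (1 - rho)))  # = .5 here
power.t.test(delta = dz, sd = 1, sig.level = .05, power = .8, type = "paired")
# -> ~ 34 pairs instead of 2 x 64 observations
```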
Expect non-normally distributed residuals, not possible to avoid (eg., by transformations)
Only considers ranks or uses permutations → price is efficiency and flexibility
Requires parent distribution (alternative hypothesis), 'min ARE' should be default
GPower: two groups → Wilcoxon-Mann-Whitney (t-test / means, diff. 2 indep. means)
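A rough R approximation of that efficiency price under a normal parent, using the asymptotic relative efficiency of the Wilcoxon versus the t-test (ARE = 3/π ≈ .955; GPower's own A.R.E. computation may differ slightly):

```r
# inflate the t-test sample size by 1/ARE to approximate the Wilcoxon n
n_t <- 64
ceiling(n_t / (3 / pi))   # ~ 68 per group, a few observations more than 64
```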
reference example
reference example, with normal parent distribution: how much efficiency is lost?
Differences between groups → relation observations & grouping (categorization)
Example → d = .5 → r = .243 (note: slope β=r∗σy/σx)
Difference between groups or relation → ratio between and within group variance
GPower: regression coefficient (t-test / regression, fixed model single regression coef)
reference example, regression style (sd of effect and error, but squared)
Note:
Multiple groups → not one effect size d
F-test statistic & effect size f, from the ratio of variances σ²between / σ²within
σ²between = variance between the group means
σ²within = variance within the groups
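For the three-group example used below (means 0, 2, 4, common SD 4, equal group sizes), f can be computed directly; a minimal sketch:

```r
# Cohen's f = (population) sd of the group means / within-group sd
m <- c(0, 2, 4); s <- 4; k <- length(m)
f <- sqrt(sum((m - mean(m))^2) / k) / s
f   # ~ 0.408
```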
Example: one control and two treatments
reference example + 1 group
Difference between some groups → at least two differ
GPower: one-way ANOVA (F-test / Means, ANOVA - fixed effects, omnibus, one way)
reference example, just 2 groups C and T1 (size=64)!
Assume one control, and two treatments
Apply Bonferroni correction for original 3 group example (0, 2, 4)
Contrasts are linear combinations → planned comparison
Effect sizes for planned comparisons must be calculated!!
Each contrast is a separate test
Multiple testing correction may be appropriate
group means μi
pre-specified coefficients ci
sample sizes ni
total sample size N
σcontrast = |∑ μi ∗ ci| / √( N ∗ ∑i ci² / ni )
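Plugging the three-group example into this formula, for one hypothetical contrast (control versus the average of the two treatments, 64 per group):

```r
# sigma_contrast for coefficients c = (-1, .5, .5) and means (0, 2, 4)
mu <- c(0, 2, 4); ci <- c(-1, .5, .5); ni <- rep(64, 3); N <- sum(ni)
sigma_c <- abs(sum(mu * ci)) / sqrt(N * sum(ci^2 / ni))
f <- sigma_c / 4   # divide by the common sd to get the effect size f (~ .35)
```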
GPower: one-way ANOVA (F-test / Means, ANOVA-fixed effects,special,main,interaction)
Obtain effect sizes for contrasts (assume equally sized for convenience)
Sample size for each contrast, each 1 df
Multiple main effects and possibly interaction effects (eg., treatment and type)
GPower: multiway ANOVA (F-test / Means, ANOVA-fixed effects,special,main,interaction)
reference example
If repeated measures → account for correlations within
Possible to focus on:
Correlation within unit (eg., within subject)
Beware: effect size could include or exclude correlation
GPower: repeated measures (F-test / Means, repeated measures...)
GPower: repeated measures (F-test / Means, repeated measures within factors)
Use effect size f = .25 (1/16 explained versus unexplained)
GPower: repeated measures (F-test / Means, repeated measures between factors)
Use effect size f = .25 (1/16 explained versus unexplained)
GPower: repeated measures (F-test / Means, repeated measures within-between factors)
Option: calculate effect sizes: http://apps.icds.be/effectSizes/
If comparing two independent correlations
Use Fisher Z transformations to normalize first
GPower: z-tests / correlation & regressions: 2 indep. Pearson r's
Note that dependent correlations are more difficult, see manual
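A minimal sketch of that Fisher Z logic, with made-up correlations and sample sizes:

```r
# atanh normalizes r; the z-difference has se = sqrt(1/(n1-3) + 1/(n2-3))
r1 <- .5; r2 <- .3; n1 <- n2 <- 100   # hypothetical inputs
z  <- atanh(c(r1, r2))
se <- sqrt(1/(n1 - 3) + 1/(n2 - 3))
2 * pnorm(-abs(diff(z) / se))         # two-sided p for the difference
```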
If comparing two independent proportions → bounded between 0 and 1
GPower: Fisher Exact Test (exact / proportions, difference 2 independent proportions)
Effect sizes in odds ratio, relative risk, difference proportion
For odds ratio = 2, with p2 reference probability .6
Plot power over proportions .5 to 1
Include 5 curves, sample sizes 328, 428, 528...
With type I error .05
Explain the curve minimum; what is the relation to sample size?
Repeat for one-tailed; what is the difference?
If comparing two dependent proportions → categorical shift
GPower: McNemar test (exact / proportions, difference 2 dependent proportions)
Various statistical tests difficult to specify in GPower
Various statistical tests not included in GPower
Various statistical tests offer no formula to obtain a sample size
gr <- rep(c('T','C'), 64)
y <- ifelse(gr=='C', 0, 2)
dta <- data.frame(y=y, X=gr)
cutoff <- qt(.025, nrow(dta)-2)                       # critical t (df = N-2)
my_sim_function <- function(){
  dta$y <- dta$y + rnorm(length(dta$X), 0, 4)         # generate (with sd=4)
  res <- t.test(data=dta, y~X)                        # analyze
  c(res$estimate %*% c(-1,1), res$statistic, res$p.value)
}
sims <- replicate(10000, my_sim_function())           # many iterations
dimnames(sims)[[1]] <- c('diff','t.stat','p.val')
mean(sims['p.val',] < .05)                            # p-values     0.8029
mean(sims['t.stat',] < cutoff)                        # t-statistics 0.8029
mean(sims['diff',] > sd(sims['diff',])*cutoff*(-1))   # differences  0.8024
Complex statistical models
Sample size calculations (design) for a simpler research aim
Example:
Sample size calculation is a design issue, not a statistical one
Building blocks: sample & effect sizes, type I & II errors
Effect sizes express the amount of signal compared to the background noise
GPower deals with not too complex models
Methodological and statistical support to help make a difference
SQUARE provides complementary support in methodology and statistics to our research community, for both individual researchers and research groups, in order to get the best out of them
SQUARE aims to address all questions related to quantitative research, and to further enhance the quality of both the research and how it is communicated
website: https://square.research.vub.be/ includes information on who we serve, and how
booking: https://square.research.vub.be/bookings for individual consultations