Chapter 3
Oneway ANOVA

1. Review of previous tests
2. What is ANOVA?
3. Terminology in ANOVA
4. Understanding the F-distribution
   Sampling distribution of the variance
   Confidence intervals for the variance
   F-test for the equality of two variances
5. Three ways to understand ANOVA
   ANOVA as a generalization of the t-test
   The structural model approach
   The variance partitioning approach
6. A recap of the F-test
7. The ANOVA table
8. Confidence intervals in ANOVA
9. Effect sizes in ANOVA
10. ANOVA in SPSS
11. Examples
12. Power
   Power for t-tests
   Power for ANOVA
Appendix A. Post-Hoc Power Analyses

© 2006 A. Karpinski

Oneway ANOVA

1. Review of the tests we have covered so far

• One sample with interval scale DV
  o One sample z-test
    Used to compare a sample mean to a hypothesized value when the population is normally distributed with a known variance.
  o One sample t-test
    Used to compare a sample mean to a hypothesized value when the population is normally distributed (or the sample is large) with unknown variance.

• Two independent samples with interval scale DV
  o Two independent samples t-test
    Used to compare the difference of two sample means to a hypothesized value (usually zero) when both populations are normally distributed with unknown but equal variances.
  o Welch's two independent samples t-test
    Used to compare the difference of two sample means to a hypothesized value (usually zero) when both populations are normally distributed with unknown variances that may or may not be equal.

• Two independent samples with ordinal DV
  o Mann-Whitney U test
    A non-parametric test used to measure the separation between two sets of sample scores (using the ranks of the observations). Can also be used in place of the two independent samples t-test when the data do not satisfy the t-test assumptions.
• Two (or more) nominal variables
  o Pearson Chi-square test of independence
    A non-parametric test used to test the independence of (or association between) two or more variables.

2. What is an Analysis of Variance (ANOVA)?

Because sometimes, two groups are just not enough . . .

• An Advertising Example: What makes an advertisement more memorable?
  Three conditions:
  o Color Picture Ad
  o Black and White Picture Ad
  o No Picture Ad
  o DV was preference for the ad on an 11 point scale

[Figure: Preference for Ad by Type of Ad (Color Picture, Black & White Picture, No Picture); N = 7 per condition]

ANOVA: Preference for Ad

                  Sum of Squares   df   Mean Square   F       Sig.
  Between Groups   25.810           2    12.905        2.765   .090
  Within Groups    84.000          18     4.667
  Total           109.810          20

3. Terminology in ANOVA/Experimental Design

• Overview of Experimental Design

• Terminology
  o Factor = Independent variable
  o Level = Different amounts/aspects of the IV
  o Cell = A specific combination of levels of the IVs

• A one-way ANOVA is a design with only one factor

                                Factor A
           Level 1         Level 2         Level 3         Level 4         Level 5
           $x_{11}$        $x_{12}$        $x_{13}$        $x_{14}$        $x_{15}$
           $x_{21}$        $x_{22}$        $x_{23}$        $x_{24}$        $x_{25}$
           $x_{31}$        $x_{32}$        $x_{33}$        $x_{34}$        $x_{35}$
           $x_{41}$        $x_{42}$        $x_{43}$        $x_{44}$        $x_{45}$
           $x_{51}$        $x_{52}$        $x_{53}$        $x_{54}$
                           $x_{62}$
  Mean     $\bar{X}_{.1}$  $\bar{X}_{.2}$  $\bar{X}_{.3}$  $\bar{X}_{.4}$  $\bar{X}_{.5}$
  n        $n_1$           $n_2$           $n_3$           $n_4$           $n_5$

  $x_{ij}$:  i = indicator for subject within level j
             j = indicator for level of factor A

  Note that the null hypothesis is now a bit less intuitive:

  $$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$$
  $$H_1: \text{Not all } \mu_i\text{s are equal}$$

  The alternative hypothesis is NOT $\mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4 \neq \mu_5$

  The null and alternative hypotheses must be:
  • mutually exclusive
  • exhaustive

  The overall test of this null hypothesis is referred to as the omnibus F-test.

• A two way ANOVA has two factors.
  It is usually specified as an A*B design
    A = the number of levels of the first factor
    B = the number of levels of the second factor

  $x_{ijk}$:  i = indicator for subject within level jk
              j = indicator for level of factor A
              k = indicator for level of factor B

  o Example of a 4x3 design

                                    Factor A
               Level A1         Level A2         Level A3         Level A4         Mean
    Level B1   $\bar{X}_{.11}$  $\bar{X}_{.21}$  $\bar{X}_{.31}$  $\bar{X}_{.41}$  $\bar{X}_{..1}$
    Level B2   $\bar{X}_{.12}$  $\bar{X}_{.22}$  $\bar{X}_{.32}$  $\bar{X}_{.42}$  $\bar{X}_{..2}$
    Level B3   $\bar{X}_{.13}$  $\bar{X}_{.23}$  $\bar{X}_{.33}$  $\bar{X}_{.43}$  $\bar{X}_{..3}$
    Mean       $\bar{X}_{.1.}$  $\bar{X}_{.2.}$  $\bar{X}_{.3.}$  $\bar{X}_{.4.}$  $\bar{X}_{...}$

  o Let's take a closer look at cell 23

    Observations: $x_{123},\ x_{223},\ x_{323},\ \ldots,\ x_{n23}$
    Cell mean:    $\bar{X}_{.23}$

  o And now there are multiple effects to test
    The effect of Factor A
      $$H_0: \mu_{.1.} = \mu_{.2.} = \mu_{.3.} = \mu_{.4.}$$
    The effect of Factor B
      $$H_0: \mu_{..1} = \mu_{..2} = \mu_{..3}$$
    The effect of the combination of Factor A and Factor B

  To keep things simple, we will stick to the one-way ANOVA design for as long as possible!

4. Understanding the F-distribution

• Let's take a step back and examine the sampling distribution of $s^2$

  o We'll start by making no assumptions.

    $$
    \begin{aligned}
    x_i - \mu &= x_i - \mu + (\bar{X} - \bar{X}) && \text{Add and subtract } \bar{X} \\
              &= x_i - \bar{X} + \bar{X} - \mu   && \text{Re-arrange terms} \\
              &= (x_i - \bar{X}) + (\bar{X} - \mu)
    \end{aligned}
    $$

  o Now square both sides of the equation:
    [And remember from high school algebra: $(a+b)^2 = a^2 + b^2 + 2ab$]

    $$
    \begin{aligned}
    (x_i - \mu)^2 &= \left[(x_i - \bar{X}) + (\bar{X} - \mu)\right]^2 \\
                  &= (x_i - \bar{X})^2 + (\bar{X} - \mu)^2 + 2(x_i - \bar{X})(\bar{X} - \mu)
    \end{aligned}
    $$

  o This equation is true for each of the n observations in the sample. Next, let's add all n equations and simplify:

    $$
    \begin{aligned}
    \sum_{i=1}^{n}(x_i - \mu)^2 &= \sum_{i=1}^{n}\left[(x_i - \bar{X})^2 + (\bar{X} - \mu)^2 + 2(x_i - \bar{X})(\bar{X} - \mu)\right] \\
    &= \sum_{i=1}^{n}(x_i - \bar{X})^2 + \sum_{i=1}^{n}(\bar{X} - \mu)^2 + \sum_{i=1}^{n}2(x_i - \bar{X})(\bar{X} - \mu)
    \end{aligned}
    $$

  o Note that 2 and $(\bar{X} - \mu)$ are constants with respect to summation over i. Constants can be moved outside of the summation:

    $$
    \sum_{i=1}^{n}(x_i - \mu)^2 = \sum_{i=1}^{n}(x_i - \bar{X})^2 + (\bar{X} - \mu)^2\sum_{i=1}^{n}1 + 2(\bar{X} - \mu)\sum_{i=1}^{n}(x_i - \bar{X})
    $$

  o We can use two facts to simplify this equation:

    $$\sum_{i=1}^{n}1 = n \qquad\qquad \sum_{i=1}^{n}(x_i - \bar{X}) = 0$$

    $$\sum_{i=1}^{n}(x_i - \mu)^2 = \sum_{i=1}^{n}(x_i - \bar{X})^2 + n(\bar{X} - \mu)^2 + 0$$

  o Next, let's divide both sides of the equation by $\sigma^2$

    $$\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(x_i - \bar{X})^2}{\sigma^2} + \frac{n(\bar{X} - \mu)^2}{\sigma^2}$$

  o And then rearrange the terms

    $$\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \bar{X})^2 + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 \qquad \text{(eq. 3-1)}$$

  o Up to this point, we have made no assumptions about X. To make additional progress, we now have to make a few assumptions
    • X is normally distributed. That is, $X \sim N(\mu, \sigma)$
    • Each $x_i$ in the sample is independently sampled

  o First, let's consider the left side of eq. 3-1: $\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2$

    $\frac{x_i - \mu}{\sigma}$ is the familiar form of a z-score

    Hence $\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2$ is the sum of n squared z-scores

    • From our review of the Chi-square distribution we know that
      o One squared z-score has a chi-square distribution with 1 df
      o The sum of N independent squared z-scores has a chi-square distribution with N degrees of freedom

    • Now we can say the left hand side of eq. 3-1 has a Chi-square distribution

      $$\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2 = \sum_{i=1}^{n}(z_i)^2 \sim \chi^2_n$$

  o Next, let's consider $\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$

    • We know that the sampling distribution of the mean for data sampled from a normal distribution is also normally distributed:

      $$\bar{X} \sim N\!\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

    • Hence, $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is also a z-score

    • $\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$ is a single squared z-score. But squared z-scores follow a chi-square distribution. So we know that

      $$\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 \sim \chi^2_1$$

  o Putting the pieces together, we can rewrite eq. 3-1 as

    $$\chi^2_n \sim \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \bar{X})^2 + \chi^2_1$$

    $$\chi^2_n - \chi^2_1 \sim \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \bar{X})^2$$

  o Because of the additivity of independent chi-squared variables (for normally distributed data, the sample mean and the sum of squared deviations from it are independent), this equation simplifies to:

    $$\chi^2_{n-1} \sim \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \bar{X})^2$$

  o Now let's divide both sides of the equation by n-1

    $$\frac{\chi^2_{n-1}}{n-1} \sim \frac{1}{\sigma^2}\cdot\frac{\sum_{i=1}^{n}(x_i - \bar{X})^2}{n-1}$$

  o We recognize $\hat{\sigma}^2$ and substitute it into the equation

    $$\frac{\chi^2_{n-1}}{n-1} \sim \frac{1}{\sigma^2}\,\hat{\sigma}^2$$

  o Rearranging, we finally obtain:

    $$\hat{\sigma}^2 \sim \frac{\sigma^2\chi^2_{n-1}}{n-1}$$

• In other words, with the assumptions of normality and independence, $\hat{\sigma}^2$ follows a scaled chi-squared distribution. (But notice, $\sigma^2$ must be known!)

• Sampling Distribution of the Variance

  o Assumption: X is drawn from a normally distributed population: $X \sim N(\mu_X, \sigma_X)$
    Then for a sample of size n:

    $$\hat{\sigma}^2 \sim \frac{\sigma_X^2\,\chi^2_{n-1}}{n-1}$$

  o Facts about the Chi-square distribution:

    $$E\left(\chi^2_n\right) = n \qquad\qquad Var\left(\chi^2_n\right) = 2n$$

  o We can use these facts to check if $\hat{\sigma}^2$ is an unbiased and consistent estimator of the population variance.

    • What is the expected value of $\hat{\sigma}^2$?

      $$
      \begin{aligned}
      E(\hat{\sigma}^2) &= E\!\left(\frac{\sigma^2\chi^2_{n-1}}{n-1}\right) \\
                        &= \frac{\sigma^2}{n-1}\,E\left(\chi^2_{n-1}\right) \\
                        &= \frac{\sigma^2}{n-1}\,(n-1) \\
                        &= \sigma^2
      \end{aligned}
      $$

      $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$

    • What is the variance of the sampling distribution of $\hat{\sigma}^2$?

      $$
      \begin{aligned}
      Var(\hat{\sigma}^2) &= Var\!\left(\frac{\sigma^2\chi^2_{n-1}}{n-1}\right) \\
                          &= \left(\frac{\sigma^2}{n-1}\right)^2 Var\left(\chi^2_{n-1}\right) \\
                          &= \frac{\sigma^4}{(n-1)^2}\,2(n-1) \\
                          &= \frac{2\sigma^4}{n-1}
      \end{aligned}
      $$

      This variance shrinks to 0 as n grows, so $\hat{\sigma}^2$ is a consistent estimator of $\sigma^2$

  o Example #1: Suppose we have a sample of n=10 from $X \sim N(0, 4)$ [$\sigma^2 = 16$]

    $$\hat{\sigma}^2 \sim \frac{\sigma_X^2\,\chi^2_{n-1}}{n-1} = \frac{16\,\chi^2_9}{9} = 1.778 \cdot \chi^2_9$$

    $$E(\hat{\sigma}^2) = \sigma^2 = 16 \qquad\qquad Var(\hat{\sigma}^2) = \frac{2\sigma^4}{n-1} = \frac{2 \cdot 256}{9} = 56.889$$

    [Figure: Simulated Sampling Distribution of the Variance (n=10); x-axis: Sample Variance]
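Example #1 can be checked with a small simulation. The sketch below is an illustration only and is not taken from the original notes: the function names, the number of replications (20,000), and the random seed are all assumptions chosen for convenience. It draws repeated samples of n=10 from a normal population with mean 0 and standard deviation 4, computes the unbiased sample variance of each, and compares the mean and variance of those sample variances with the theoretical values 16 and 56.889.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance: sum of squared deviations divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def simulate_sample_variances(n=10, mu=0.0, sigma=4.0, reps=20000, seed=1):
    """Draw `reps` samples of size n from N(mu, sigma) and return their variances.

    `reps` and `seed` are arbitrary illustrative choices, not values from the notes.
    """
    rng = random.Random(seed)
    return [
        sample_variance([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]


variances = simulate_sample_variances()

# Empirical moments of the simulated sampling distribution
mean_of_variances = sum(variances) / len(variances)
variance_of_variances = sample_variance(variances)

# Theoretical values from the derivation: E = sigma^2, Var = 2*sigma^4/(n-1)
theoretical_mean = 4.0 ** 2
theoretical_variance = 2 * 4.0 ** 4 / (10 - 1)

print(f"mean of sample variances:     {mean_of_variances:.3f} (theory {theoretical_mean:.3f})")
print(f"variance of sample variances: {variance_of_variances:.3f} (theory {theoretical_variance:.3f})")
```

With a large number of replications, the two empirical values should land close to the theoretical ones, and a histogram of `variances` should show the right-skewed shape expected of a scaled chi-square distribution with 9 degrees of freedom.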