**8.3 Two-Way Balanced ANOVA**

Class **TwoWayAnova** performs a balanced
two-way analysis of variance. Two-way analysis
of variance is a direct extension of one-way analysis of variance (Section 8.1).
In this case, data are grouped according to two factors—for example,
*sex* and *age
group*—rather than a single factor. The total variability
is partitioned into components associated with each of the two factors,
their interaction, and the residual (or error).
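The partition can be made concrete with a short sketch. The C# below is not the NMath Stats implementation. It is a minimal, self-contained illustration that assumes a balanced design stored in a hypothetical `double[a, b, n]` array, where `a` and `b` are the numbers of levels of the two factors and `n` is the number of observations per cell. It computes each sum of squares directly from the cell means, the factor-level means, and the grand mean.

```csharp
using System;

// Illustrative sketch only: partitions the variability of a balanced two-way
// layout. data[i, j, r] holds replicate r of the cell at level i of the first
// factor and level j of the second factor (a hypothetical storage layout).
public static class TwoWaySumsOfSquares
{
    public static (double A, double B, double Interaction, double Error, double Total)
        Partition(double[,,] data)
    {
        int a = data.GetLength(0), b = data.GetLength(1), n = data.GetLength(2);
        var cellMean = new double[a, b];
        var meanA = new double[a];
        var meanB = new double[b];
        double grand = 0.0;

        // Cell means; in a balanced design the factor-level means and the
        // grand mean are simple averages of the cell means.
        for (int i = 0; i < a; i++)
            for (int j = 0; j < b; j++)
            {
                double sum = 0.0;
                for (int r = 0; r < n; r++) sum += data[i, j, r];
                cellMean[i, j] = sum / n;
                meanA[i] += cellMean[i, j] / b;
                meanB[j] += cellMean[i, j] / a;
                grand += cellMean[i, j] / (a * b);
            }

        double ssA = 0.0, ssB = 0.0, ssAB = 0.0, ssError = 0.0, ssTotal = 0.0;
        for (int i = 0; i < a; i++) ssA += b * n * Math.Pow(meanA[i] - grand, 2);
        for (int j = 0; j < b; j++) ssB += a * n * Math.Pow(meanB[j] - grand, 2);
        for (int i = 0; i < a; i++)
            for (int j = 0; j < b; j++)
            {
                // Interaction: what remains of the cell mean after removing both main effects.
                double inter = cellMean[i, j] - meanA[i] - meanB[j] + grand;
                ssAB += n * inter * inter;
                for (int r = 0; r < n; r++)
                {
                    ssError += Math.Pow(data[i, j, r] - cellMean[i, j], 2); // within-cell (residual)
                    ssTotal += Math.Pow(data[i, j, r] - grand, 2);
                }
            }
        return (ssA, ssB, ssAB, ssError, ssTotal);
    }
}
```

For a balanced design the four components add up to the total. For example, the 2 x 2 data `{ { { 1, 3 }, { 5, 7 } }, { { 2, 4 }, { 10, 12 } } }` gives 18 + 72 + 8 + 8 = 106.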

**Creating Two-Way ANOVA Objects**

A **TwoWayAnova**
instance is constructed from data in a data frame. Three column indices
are specified in the data frame: the column containing the first factor,
the column containing the second factor, and the column containing the
numeric data. For example, this code groups the numeric data in column
3 of **DataFrame**
df by factors constructed from columns 0 and 4:

Code Example – C# ANOVA

var anova = new TwoWayAnova( df, 0, 4, 3 );

**Factor** objects
are constructed from the factor columns using the **DataFrame**
method GetFactor(), which creates a sorted
array of the unique values (Section 2.10).
The indicated data column must be of type **DFNumericColumn**.
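As a rough sketch of what "a sorted array of the unique values" means, the C# below derives factor levels from a plain string column with LINQ and maps each observation to the index of its level. The class and method names are hypothetical, and the ordinal string ordering is an assumption; GetFactor() may compare values differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers: derive factor levels as a sorted array of the distinct
// values in a column (ordinal ordering is an assumption), and map each
// observation to the index of its level.
public static class FactorLevels
{
    public static string[] Levels(IEnumerable<string> column) =>
        column.Distinct().OrderBy(s => s, StringComparer.Ordinal).ToArray();

    public static int[] Codes(IEnumerable<string> column, string[] levels) =>
        column.Select(v => Array.IndexOf(levels, v)).ToArray();
}
```

For the column { "Senior", "Child", "Adult", "Child" }, Levels() returns { "Adult", "Child", "Senior" } and Codes() returns { 2, 1, 0, 1 }.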

**NOTE:** Class TwoWayAnova throws an InvalidArgumentException if the data contain missing values (NaNs).

Once you've constructed a **TwoWayAnova**,
you can display the complete ANOVA table:

Code Example – C# ANOVA

Console.WriteLine( anova );

For example:

Source        Deg of Freedom   SumOfSq     Mean Square   F         P
FactorA       1                1782.0450   1782.0450     14.2121   0.0008
FactorB       1                2838.8113   2838.8113     22.6399   0.0001
Interaction   1                108.0450    108.0450      0.8617    0.3612
Error         28               3510.9075   125.3896      .         .
Total         31               8239.8088   .             .         .
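The columns of this table are linked by simple arithmetic: each mean square is the sum of squares divided by its degrees of freedom, and each F statistic is an effect's mean square divided by the error mean square. The hypothetical helper below (plain C#, not part of NMath Stats) reproduces those columns from the example table. Computing the P column would additionally require the cumulative distribution function of the F distribution, which the .NET base class library does not provide.

```csharp
using System;

// Hypothetical helpers reproducing the arithmetic of the ANOVA table columns.
public static class AnovaTableArithmetic
{
    // Mean Square = SumOfSq / Deg of Freedom.
    public static double MeanSquare(double sumOfSquares, int degreesOfFreedom) =>
        sumOfSquares / degreesOfFreedom;

    // F = (effect mean square) / (error mean square).
    public static double F(double effectSumOfSquares, int effectDf,
                           double errorSumOfSquares, int errorDf) =>
        MeanSquare(effectSumOfSquares, effectDf) / MeanSquare(errorSumOfSquares, errorDf);
}
```

For instance, `AnovaTableArithmetic.F( 1782.0450, 1, 3510.9075, 28 )` evaluates to approximately 14.2121, matching the FactorA row.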

Class **TwoWayAnovaTable** is provided
for summarizing the information in a traditional two-way ANOVA table.
Class **TwoWayAnovaTable** derives
from **DataFrame**. An instance of
**TwoWayAnovaTable** can be obtained
from a **TwoWayAnova** object using
the AnovaTable property. For example:

Code Example – C# ANOVA

TwoWayAnovaTable myTable = anova.AnovaTable;

Class **TwoWayAnovaTable**
provides the following member functions and read-only properties:

● DegreesOfFreedom() gets the degrees of freedom for a specified factor.

● ErrorDegreesOfFreedom gets the number of degrees of freedom for the error.

● InteractionDegreesOfFreedom gets the number of degrees of freedom for the interaction.

● TotalDegreesOfFreedom gets the total number of degrees of freedom.

● SumOfSquares() gets the sum of squares for a specified factor.

● InteractionSumOfSquares gets the sum of squares for the interaction.

● ErrorSumOfSquares gets the sum of squares for the error.

● TotalSumOfSquares gets the total sum of squares.

● MeanSquare() gets the mean square for a specified factor.

● InteractionMeanSquare gets the mean square for the interaction.

● ErrorMeanSquare gets the mean square for the error.

● Fstatistic() gets the *F*
statistic for a specified factor.

● InteractionFstatistic gets the *F*
statistic for the interaction.

● FstatisticPvalue() gets the *p*-value
for the *F* statistic for a specified
factor.

● InteractionFstatisticPvalue gets the *p*-value for the *F*
statistic for the interaction.

Factors are identified to accessor methods by name,
which corresponds to the name of the column in the original data frame
that was used to create the **Factor**.
For instance, if one factor in the ANOVA is named Dosage,
this code gets the *F* statistic
and *p*-value for that factor:

Code Example – C# ANOVA

double Fstatistic = anova.AnovaTable.Fstatistic( "Dosage" );
double Pvalue = anova.AnovaTable.FstatisticPvalue( "Dosage" );

Class **TwoWayAnova**
provides the GetCellData() method for accessing
the data in a cell, as defined by a specified level of each of the factors
in the ANOVA. For example, if anova has
factor Sex with levels Male
and Female, and factor AgeGroup
with levels Child, Adult,
and Senior, this code gets the data for
adult females:

Code Example – C# ANOVA

DFNumericColumn data = anova.GetCellData( "Sex", "Female", "AgeGroup", "Adult" );

A copy of the data is returned as a **DFNumericColumn** object.
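Conceptually, the cell data are simply the observations whose two factor values match the requested levels. The sketch below illustrates that selection with plain parallel arrays standing in for the data frame columns. The names are hypothetical, and, like GetCellData(), it returns a new array rather than a view of the original data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: select the observations in one cell, i.e. those whose
// first-factor value equals levelA and second-factor value equals levelB.
public static class CellSelection
{
    public static double[] CellData(
        IReadOnlyList<string> factorA, IReadOnlyList<string> factorB,
        IReadOnlyList<double> values, string levelA, string levelB) =>
        Enumerable.Range(0, values.Count)
            .Where(i => factorA[i] == levelA && factorB[i] == levelB)
            .Select(i => values[i])
            .ToArray(); // a copy, independent of the input arrays
}
```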

**Grand Mean, Cell Means, and Group Means**

Class **TwoWayAnova**
provides the following properties and member functions for accessing
the grand mean, cell means, and group means:

● GrandMean gets the grand mean. The grand mean is the mean of all the data.

● GetMeanForCell() returns the mean for a specified cell.

● GetMeanForFactorLevel() returns the mean for a specified factor level.

Again, factors and factor levels are identified to accessor methods by name. For example, if anova has factor Sex with levels Male and Female, and factor AgeGroup with levels Child, Adult, and Senior, this code gets the mean for all males:

Code Example – C# ANOVA

double meanM = anova.GetMeanForFactorLevel( "Sex", "Male" );

This code gets the mean for male children:

Code Example – C# ANOVA

double meanMChild = anova.GetMeanForCell( "Sex", "Male", "AgeGroup", "Child" );
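The three kinds of means can be sketched the same way, again with hypothetical parallel arrays rather than NMath Stats types. In a balanced design each factor-level mean also equals the average of the corresponding cell means. Note that LINQ's Average() throws an InvalidOperationException if no observation matches the requested level or cell.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of grand, factor-level, and cell means over parallel arrays.
public static class AnovaMeans
{
    // Mean of all the data.
    public static double GrandMean(IReadOnlyList<double> values) => values.Average();

    // Mean of the observations at one level of one factor.
    public static double FactorLevelMean(
        IReadOnlyList<string> factor, IReadOnlyList<double> values, string level) =>
        Enumerable.Range(0, values.Count)
            .Where(i => factor[i] == level)
            .Average(i => values[i]);

    // Mean of the observations in one cell (one level of each factor).
    public static double CellMean(
        IReadOnlyList<string> factorA, IReadOnlyList<string> factorB,
        IReadOnlyList<double> values, string levelA, string levelB) =>
        Enumerable.Range(0, values.Count)
            .Where(i => factorA[i] == levelA && factorB[i] == levelB)
            .Average(i => values[i]);
}
```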

**NMath Stats**
solves the two-way ANOVA problem using multiple linear regression. If
all you wish to know is the information in the standard ANOVA table,
you can safely ignore the regression details, but properties and member
functions are provided for retrieving information about the underlying
regression parameters.

To solve the two-way ANOVA problem using multiple linear
regression, **NMath Stats**
creates a series of *dummy variables*
to encode the different levels of each of the two factors. The specific
encoding used, known as *effects encoding*, encodes dummy
variables so that the coefficients of the dummy variables in the regression
model quantify deviations of each group from the grand mean.^{1}

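In the effects encoding, $k-1$ dummy variables are defined to encode the $k$ levels of a factor, like so:

$$D_1 = \begin{cases} 1 & \text{if group } 1 \\ 0 & \text{if any other group except group } k \\ -1 & \text{if group } k \end{cases}$$

$$D_2 = \begin{cases} 1 & \text{if group } 2 \\ 0 & \text{if any other group except group } k \\ -1 & \text{if group } k \end{cases}$$

and so on, up to $D_{k-1}$ for group $k-1$.

For example, suppose we have an experimental design
with two factors: FactorA and FactorB. FactorA
has two levels, labelled A1 and A2. Effects encoding defines *one*
dummy variable for FactorA:

$$A_1 = \begin{cases} 1 & \text{if level A1} \\ -1 & \text{if level A2} \end{cases}$$

FactorB has three levels,
labelled B1, B2,
and B3. Effects encoding defines *two* dummy variables for FactorB:

$$B_1 = \begin{cases} 1 & \text{if level B1} \\ 0 & \text{if level B2} \\ -1 & \text{if level B3} \end{cases} \qquad B_2 = \begin{cases} 0 & \text{if level B1} \\ 1 & \text{if level B2} \\ -1 & \text{if level B3} \end{cases}$$

Combined, these three dummy variables completely identify all the combinations of FactorA and FactorB. The multiple regression model, with the interaction terms formed as products of the dummy variables, is then:

$$\hat{y} = b_0 + b_{A_1} A_1 + b_{B_1} B_1 + b_{B_2} B_2 + b_{A_1 B_1} A_1 B_1 + b_{A_1 B_2} A_1 B_2$$

where

● $b_0$, the intercept, is an estimate of the grand mean

● $b_{A_1}$ estimates the difference between the mean of A1 and the grand mean

● $-b_{A_1}$ is the difference between the mean of A2 and the grand mean

● $b_{B_1}$ estimates the difference between the mean of B1 and the grand mean

● $b_{B_2}$ estimates the difference between the mean of B2 and the grand mean

● $-(b_{B_1} + b_{B_2})$ estimates the difference between the mean of B3 and the grand mean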
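Effects encoding of a factor with k levels can be sketched in a few lines of C#. This is an illustrative, hypothetical helper, not the NMath Stats internals. Levels are identified by zero-based index: level j (for j < k - 1) gets a 1 in dummy column j and 0 elsewhere, and the last level is coded -1 in every column. DesignRow() assembles one row of the regression design matrix: the intercept, the dummy variables for each factor, and the interaction columns formed as products of those dummies.

```csharp
using System;

// Hypothetical sketch of effects encoding (levels indexed 0..k-1).
public static class EffectsEncoding
{
    // Returns the k-1 dummy values for one observation at the given level.
    public static double[] Encode(int level, int k)
    {
        if (k < 2) throw new ArgumentException("A factor needs at least two levels.", nameof(k));
        if (level < 0 || level >= k) throw new ArgumentOutOfRangeException(nameof(level));
        var row = new double[k - 1];
        for (int d = 0; d < k - 1; d++)
            row[d] = level == k - 1 ? -1.0 : (d == level ? 1.0 : 0.0);
        return row;
    }

    // Design row: intercept, A dummies, B dummies, then A_i * B_j interaction products.
    public static double[] DesignRow(int levelA, int a, int levelB, int b)
    {
        double[] ea = Encode(levelA, a), eb = Encode(levelB, b);
        var row = new double[1 + ea.Length + eb.Length + ea.Length * eb.Length];
        int pos = 0;
        row[pos++] = 1.0;
        foreach (double x in ea) row[pos++] = x;
        foreach (double y in eb) row[pos++] = y;
        foreach (double x in ea)
            foreach (double y in eb) row[pos++] = x * y;
        return row;
    }
}
```

For the FactorA/FactorB example, `DesignRow( 1, 2, 2, 3 )` (an observation at levels A2 and B3) yields { 1, -1, -1, -1, 1, 1 }. The row always has 1 + (a - 1) + (b - 1) + (a - 1)(b - 1) = ab columns, one regression parameter per cell of the design.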

**NMath Stats**
includes several classes that derive from **LinearRegressionParameter**
and provide access to the dummy-variable regression parameters in an
analysis of variance:

● Class **AnovaRegressionParameter** provides
a SumOfSquares property that gets the sum
of squares due to a parameter.

● Class **AnovaRegressionFactorParam** derives
from **AnovaRegressionParameter** and
provides the additional properties FactorName,
which gets the name of the ANOVA factor encoded by a dummy variable,
FactorLevel, which gets the level of the
ANOVA factor encoded by a dummy variable, and Encoding,
which gets the actual encoding. The encoding is the value the dummy variable
assumes when an ANOVA observation is made with the factor at that level.

● Class **AnovaRegressionInteractionParam**
also derives from **AnovaRegressionParameter**
and provides the additional properties FactorAName
and FactorALevel, which get the name and
level of the first factor in the interaction, and FactorBName
and FactorBLevel, which get the name and
level of the second factor in the interaction.

Of course, these classes also inherit methods from **LinearRegressionParameter**,
such as TStatisticPValue(), TStatistic(), TStatisticCriticalValue(),
and ConfidenceInterval(), for testing statistical
hypotheses regarding parameter values in a linear regression (Section 6.5).
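The t statistic and confidence interval reported for each parameter follow from its value and standard error. The sketch below (plain C#, hypothetical names) shows that relationship. Because the .NET base class library has no Student's t quantile function, the critical value is passed in rather than computed.

```csharp
using System;

// Hypothetical sketch relating a regression parameter's value and standard
// error to its t statistic and two-sided confidence interval.
public static class ParameterInference
{
    // t statistic for the hypothesis that the parameter equals zero.
    public static double TStatistic(double value, double standardError) =>
        value / standardError;

    // value +/- tCritical * standardError; tCritical must be supplied by the caller.
    public static (double Lower, double Upper) ConfidenceInterval(
        double value, double standardError, double tCritical) =>
        (value - tCritical * standardError, value + tCritical * standardError);
}
```

With the A1 output shown later in this section (value 4.375, standard error 1.63741694728596), TStatistic() returns about 2.6719, and a critical value of about 2.1009 reproduces the printed interval [9.3491E-001, 7.8151E+000]. That critical value corresponds to Student's t with 18 degrees of freedom at the 5% level; the degrees of freedom are inferred from the printed interval, not stated in the example.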

Instances of these classes cannot be constructed independently.
Instead, they are returned by properties and member functions on class
**TwoWayAnova**:

● RegressionInterceptParameter gets the intercept
parameter in the linear regression as an **AnovaRegressionParameter**.

● GetRegressionFactorParameter() returns the **AnovaRegressionFactorParam** associated
with a specified factor level.

● RegressionFactorParameters gets a complete array
of **AnovaRegressionFactorParam** estimates
for the different factor levels.

● GetRegressionInteractionParameter() returns the
**AnovaRegressionInteractionParam**
associated with the specified interaction.

● RegressionInteractionParameters gets a complete
array of **AnovaRegressionInteractionParam**
estimates for the interactions.

For example, this code gets the regression parameter for FactorA at level A1:

Code Example – C# ANOVA

AnovaRegressionFactorParam param = anova.GetRegressionFactorParameter( "FactorA", "A1" );
Console.WriteLine( param );

Example output:

Value : 4.375
Standard Error : 1.63741694728596
t-Statistic for parameter = 0 : 2.67189124141632
p-value for t-Statistic : 0.0155516784650136
0.05 confidence interval : [9.3491E-001, 7.8151E+000]

Note that method GetRegressionFactorParameter() may return null. In the effects encoding, there are $k-1$ dummy variables defined to encode the $k$ levels of a factor. Hence, one level does not have a dummy variable associated with it in the linear regression, and a null reference may be returned even though a valid factor level is specified. Thus:

Code Example – C# ANOVA

AnovaRegressionFactorParam param = anova.GetRegressionFactorParameter( "FactorA", "A2" ); // param == null
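Although no parameter is returned for that level, its effect is still determined: under effects encoding, the deviation of the level without a dummy variable is minus the sum of the other levels' coefficients, so the deviations of all levels from the grand mean sum to zero. The hypothetical helper below applies this to coefficients taken from the example output shown later in this section; it does not call NMath Stats.

```csharp
using System;
using System.Linq;

// Hypothetical sketch: recover the effect of the level that has no dummy
// variable, and turn an effect into a level mean.
public static class ReferenceLevelEffect
{
    // Effect of the last level = -(sum of the other levels' coefficients).
    public static double LastLevelEffect(params double[] otherCoefficients) =>
        -otherCoefficients.Sum();

    // Level mean = grand mean (intercept) + level effect.
    public static double LevelMean(double intercept, double effect) => intercept + effect;
}
```

With the intercept 28.875 and the A1 coefficient 4.375, `LastLevelEffect( 4.375 )` gives -4.375, so the mean of A2 is estimated as 24.5. For FactorB, `LastLevelEffect( 25.5, -7.25 )` gives -18.25 as the B3 effect.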

Similarly, method GetRegressionInteractionParameter() may return null. If there are $a$ levels for the first factor and $b$ levels for the second factor, there are only $(a-1)(b-1)$ dummy variables corresponding to the interactions, fewer than the $ab$ combinations of levels. Hence, some interactions do not have a dummy variable associated with them in the linear regression, and a null reference may be returned even though valid interactions are specified.
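Which interactions return null can be predicted. The sketch below assumes, as the example output suggests, that the last level of each factor (in sorted order) is the one without a dummy variable; under that assumption it lists the (a - 1)(b - 1) level pairs that do get interaction parameters. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: list the (levelA, levelB) pairs that get their own
// interaction dummy variable, assuming the last level of each factor is the
// reference level coded -1.
public static class InteractionParameters
{
    public static List<(string A, string B)> WithParameters(string[] levelsA, string[] levelsB)
    {
        var pairs = new List<(string A, string B)>();
        for (int i = 0; i < levelsA.Length - 1; i++)      // skip the last level of A
            for (int j = 0; j < levelsB.Length - 1; j++)  // skip the last level of B
                pairs.Add((levelsA[i], levelsB[j]));
        return pairs;
    }
}
```

For levels { "A1", "A2" } and { "B1", "B2", "B3" } it returns (A1, B1) and (A1, B2), the two interactions printed in the example output; any other pair, such as A2 x B3, would be expected to return null.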

This code prints out the intercept regression parameter, all factor regression parameters, and all interaction regression parameters:

Code Example – C# ANOVA

Console.WriteLine( "Intercept" );
Console.WriteLine( anova.RegressionInterceptParameter );
Console.WriteLine();
AnovaRegressionFactorParam[] factorParams = anova.RegressionFactorParameters;
for ( int i = 0; i < factorParams.Length; i++ )
{
  Console.WriteLine( factorParams[i].FactorLevel );
  Console.WriteLine( factorParams[i] );
  Console.WriteLine();
}
AnovaRegressionInteractionParam[] interactionParams = anova.RegressionInteractionParameters;
for ( int i = 0; i < interactionParams.Length; i++ )
{
  Console.WriteLine( interactionParams[i].FactorALevel + " x " + interactionParams[i].FactorBLevel );
  Console.WriteLine( interactionParams[i] );
  Console.WriteLine();
}

Example output:

Intercept
Value : 28.875
Standard Error : 1.63741694728596
t-Statistic for parameter = 0: 17.6344821933477
p-value for t-Statistic : 8.35997937542743E-13
0.05 confidence interval : [2.5435E+001, 3.2315E+001]

A1
Value : 4.375
Standard Error : 1.63741694728596
t-Statistic for parameter = 0: 2.67189124141632
p-value for t-Statistic : 0.0155516784650136
0.05 confidence interval : [9.3491E-001, 7.8151E+000]

B1
Value : 25.5
Standard Error : 2.31565725411135
t-Statistic for parameter = 0: 11.0119923640365
p-value for t-Statistic : 1.98637151171965E-09
0.05 confidence interval : [2.0635E+001, 3.0365E+001]

B2
Value : -7.25
Standard Error : 2.31565725411135
t-Statistic for parameter = 0: -3.13086057408882
p-value for t-Statistic : 0.00577563474636933
0.05 confidence interval : [-1.2115E+001, -2.3850E+000]

A1 x B1
Value : 6
Standard Error : 2.31565725411135
t-Statistic for parameter = 0: 2.59105702683213
p-value for t-Statistic : 0.0184427158909004
0.05 confidence interval : [1.1350E+000, 1.0865E+001]

A1 x B2
Value : -0.999999999999999
Standard Error : 2.31565725411135
t-Statistic for parameter = 0: -0.431842837805354
p-value for t-Statistic : 0.670984111233603
0.05 confidence interval : [-5.8650E+000, 3.8650E+000]

1. S. A. Glantz and B. K. Slinker, *Primer of Applied Regression & Analysis of Variance* (2nd ed.), New York: McGraw-Hill, 2001, pp. 357-358.