Given an input contingency table, fun.chisq.test() offers three quantities
to evaluate the non-parametric functional dependency of the column variable
Y on the row variable X: the functional chi-squared test statistic
($\chi^2_f$), its statistical significance (p-value), and an effect size
(the function index $\xi_f$).
We explain their differences by analogy with the statistics returned by
cor.test(), the R function for testing correlation, and by the t-test.
We chose these two tests because they are widely used and well understood.
Another choice would be Pearson's chi-squared test together with Cramér's V,
a statistic analogous to the correlation coefficient, but these are less
widely used. The table below summarizes the differences among the three
quantities and their counterparts in the correlation test and the t-test.

Quantity | Measures functional dependency? | Affected by sample size? | Affected by table size? | Measures statistical significance? | Counterpart in correlation test | Counterpart in two-sample t-test |
---|---|---|---|---|---|---|
$\chi^2_f$ | Yes | Yes | Yes | No | t-statistic | t-statistic |
p-value | Yes | Yes | Yes | Yes | p-value | p-value |
$\xi_f$ | Yes | No | No | No | correlation coefficient | mean difference |

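
The correspondence in the table can be seen by pulling the same three fields out of both test results. The sketch below uses made-up data and assumes that the FunChisq package is installed and that fun.chisq.test() returns a list with `statistic`, `p.value`, and `estimate` components in the same way cor.test() does. The FunChisq call is skipped if the package is not available.

```r
# Toy data (hypothetical): y depends on x in a roughly linear way.
set.seed(1)
x <- rnorm(30)
y <- 2 * x + rnorm(30)

# Correlation test: t statistic, p-value, and correlation coefficient.
ct <- cor.test(x, y)
cor_summary <- c(statistic = unname(ct$statistic),
                 p.value   = ct$p.value,
                 estimate  = unname(ct$estimate))
print(cor_summary)

# Toy 3x3 contingency table (hypothetical): rows are X, columns are Y.
tab <- matrix(c(8, 2, 2,
                2, 8, 2,
                2, 2, 8), nrow = 3, byrow = TRUE)

# Functional chi-squared test, run only if FunChisq is available.
# Assumption: the result exposes statistic, p.value, and estimate.
if (requireNamespace("FunChisq", quietly = TRUE)) {
  ft <- FunChisq::fun.chisq.test(tab)
  print(c(statistic = unname(ft$statistic),
          p.value   = ft$p.value,
          estimate  = unname(ft$estimate)))
}
```
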
The test statistic $\chi^2_f$ measures the deviation of Y from a uniform
distribution that is contributed by X. It is maximized when there is a
functional relationship from X to Y. The statistic is also affected by the
sample size and by the size of the contingency table, so it summarizes both
the strength of the functional dependency and the support from the sample:
a strong function supported by few samples may have the same $\chi^2_f$ as
a weak function supported by many samples. It is analogous to the test
statistic (not to be confused with the correlation coefficient) in
cor.test(), or to the t statistic from the t-test.
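
The trade-off between strength and sample support can be checked numerically. The following is a minimal base-R sketch, not the package's implementation: it assumes the statistic is the sum over rows of the chi-squared deviation of each row from a uniform distribution, minus the chi-squared deviation of the column sums from a uniform distribution. The helper name `fchisq_stat` and both tables are hypothetical.

```r
# Sketch of the functional chi-squared statistic (rows = X, columns = Y).
# Assumed definition: row-wise deviation from uniform minus the deviation
# of the column marginal from uniform.
fchisq_stat <- function(tab) {
  n <- sum(tab)
  s <- ncol(tab)
  row_n <- rowSums(tab)
  keep <- row_n > 0                       # skip empty rows to avoid 0/0
  expected <- row_n[keep] / s             # uniform expectation within each row
  row_part <- sum((tab[keep, , drop = FALSE] - expected)^2 / expected)
  col_part <- sum((colSums(tab) - n / s)^2 / (n / s))
  row_part - col_part
}

# Strong function, few samples: Y is fully determined by X (n = 9).
strong <- diag(c(3, 3, 3))

# Weak function, many samples: noisy diagonal pattern (n = 36).
weak <- 2 * matrix(c(4, 1, 1,
                     1, 4, 1,
                     1, 1, 4), nrow = 3, byrow = TRUE)

stat_strong <- fchisq_stat(strong)  # 18
stat_weak   <- fchisq_stat(weak)    # 18
print(c(strong = stat_strong, weak = stat_weak))
```

Both tables give the same statistic even though only the first encodes an exact function, which is why the statistic alone cannot separate strength from sample support.
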
The p-value of $\chi^2_f$ removes the table size factor, making tables of
different sizes or sample sizes comparable. However, its null distribution
(chi-squared or normalized) holds only asymptotically. It plays a role
analogous to that of the p-value of cor.test().
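
To illustrate how the p-value accounts for table size, the sketch below converts one statistic value into p-values for two table sizes. It assumes the asymptotic chi-squared null distribution with (r - 1)(s - 1) degrees of freedom for an r-by-s table; the statistic value of 18 is hypothetical.

```r
# Hypothetical: the same statistic value observed on tables of two sizes.
stat <- 18

# Assumed degrees of freedom: (rows - 1) * (columns - 1).
df_small <- (3 - 1) * (3 - 1)   # 3x3 table
df_large <- (5 - 1) * (5 - 1)   # 5x5 table

# Upper-tail probability under the asymptotic chi-squared null.
p_small <- pchisq(stat, df = df_small, lower.tail = FALSE)
p_large <- pchisq(stat, df = df_large, lower.tail = FALSE)

print(c(p_3x3 = p_small, p_5x5 = p_large))
```

The same statistic is strong evidence on the 3x3 table but unremarkable on the 5x5 table, so p-values rather than raw statistics should be compared across table sizes.
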
The function index $\xi_f$ measures only the strength of functional
dependency, normalized by sample and table sizes, without considering
statistical significance. When the sample size is small, the index can be
unreliable; when the sample size is large, it is a direct measure of
functional dependency and is comparable across tables. It is analogous to
the correlation coefficient in cor.test(), or to the fold change in a
t-test for differential gene expression analysis.
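
As a rough illustration of this normalization, the sketch below assumes the index is the square root of the statistic divided by n(s - 1), where n is the sample size and s the number of columns. The package may apply a different normalization depending on its options, so treat the formula, the helper name `function_index`, and the input values as assumptions.

```r
# Assumed normalization: statistic divided by n * (s - 1), then square root.
function_index <- function(stat, n, s) {
  sqrt(stat / (n * (s - 1)))
}

# Hypothetical: two 3-column tables with the same statistic of 18,
# one built from 9 samples and one from 36 samples.
xi_few  <- function_index(18, n = 9,  s = 3)   # 1
xi_many <- function_index(18, n = 36, s = 3)   # 0.5

print(c(few_samples = xi_few, many_samples = xi_many))
```

Under this assumption the index separates the two cases that the raw statistic could not: the small table reaches the maximum of 1, while the larger table scores 0.5.
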