Measuring functional dependency model-free

Given an input contingency table, fun.chisq.test() offers three quantities for evaluating the non-parametric functional dependency of the column variable Y on the row variable X: the functional chi-squared test statistic (χ_f²), its statistical significance (p-value), and an effect size (the function index ξ_f).

We explain their differences by analogy to the statistics returned by cor.test(), the R function for testing correlation, and by the t-test. We chose these two tests because they are widely used and well understood. Another choice could be Pearson’s chi-squared test together with Cramér’s V, a statistic analogous to the correlation coefficient, but that pair is less commonly used. The table below summarizes the differences among the three quantities and their counterparts in the correlation test and the t-test.

**Comparison of the three quantities returned from `fun.chisq.test()`.**
| Quantity | Measures functional dependency? | Affected by sample size? | Affected by table size? | Measures statistical significance? | Counterpart in correlation test | Counterpart in two-sample t-test |
|---|---|---|---|---|---|---|
| χ_f² | Yes | Yes | Yes | No | t statistic | t statistic |
| p-value | Yes | Yes | No | Yes | p-value | p-value |
| ξ_f | Yes | No | No | No | correlation coefficient | mean difference |

The test statistic χ_f² measures the deviation of Y from a uniform distribution that is contributed by X. It is maximized when Y is a function of X. For a fixed pattern of dependency, the statistic also grows with the sample size, and it depends on the size of the contingency table. It therefore summarizes both the strength of the functional dependency and the amount of support from the sample: a strong function supported by few samples may have the same χ_f² as a weak function supported by many samples. It is analogous to the test statistic in cor.test() (not to be confused with the correlation coefficient) or to the t statistic of the t-test.
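The trade-off between strength and sample size can be made concrete with a minimal Python sketch. It assumes χ_f² is computed as the sum over rows of each row's Pearson chi-squared deviation from a uniform distribution over the s levels of Y, minus the same quantity for the column totals of Y. The name fun_chisq is hypothetical; it computes the statistic only and is not the R function fun.chisq.test().

```python
def fun_chisq(table):
    """Functional chi-squared statistic of column variable Y on row variable X.

    Assumed definition: sum over rows of chi^2(Y | X = i) minus chi^2(Y),
    where each chi^2 measures deviation from a uniform distribution over Y.
    """
    s = len(table[0])                                    # number of levels of Y
    col_sums = [sum(row[j] for row in table) for j in range(s)]

    def chisq_uniform(counts):
        # Pearson chi-squared of counts against a uniform expectation over s levels
        total = sum(counts)
        if total == 0:
            return 0.0                                   # an empty row contributes nothing
        expected = total / s
        return sum((c - expected) ** 2 / expected for c in counts)

    return sum(chisq_uniform(row) for row in table) - chisq_uniform(col_sums)


strong = [[5, 0], [0, 5]]       # Y is a perfect function of X, n = 10
weak = [[75, 50], [50, 75]]     # Y depends only slightly on X, n = 250
print(fun_chisq(strong), fun_chisq(weak))  # → 10.0 10.0
```

Doubling every count in a table doubles χ_f², which is why the statistic alone cannot separate the strength of a dependency from the amount of data supporting it.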

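The p-value of χ_f² accounts for the table size through the degrees of freedom of its null distribution, making tables of different sizes, or with different sample sizes, comparable on a common probability scale. However, its null distribution (chi-squared for χ_f², or standard normal for the normalized statistic) holds only asymptotically, that is, approximately for large samples. It plays the same role as the p-value of cor.test().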
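To see how the p-value absorbs table size, the sketch below converts one value of χ_f² into p-values for a 2×2 and a 4×4 table. It assumes (r − 1)(s − 1) degrees of freedom for an r × s table and, for the normalized statistic, the standardization (χ_f² − df)/√(2·df) compared with a standard normal. The helper names chisq_sf and fun_chisq_pvalues are hypothetical, and the chi-squared tail uses closed forms valid for integer degrees of freedom so that only Python's standard library is needed.

```python
import math


def chisq_sf(x, k):
    """Upper-tail probability P(X > x) for X ~ chi-squared with integer df k >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if k % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{k/2-1} (x/2)^i / i!
        term, series = 1.0, 1.0
        for i in range(1, k // 2):
            term *= half / i
            series += term
        return math.exp(-half) * series
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(k-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    term = math.sqrt(half) / math.gamma(1.5)
    series = 0.0
    for i in range(1, (k - 1) // 2 + 1):
        series += term
        term *= half / (i + 0.5)            # Gamma(i + 3/2) = (i + 1/2) * Gamma(i + 1/2)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


def fun_chisq_pvalues(stat, n_rows, n_cols):
    """Hypothetical helper: p-values of a chi_f^2 value for an n_rows x n_cols table."""
    df = (n_rows - 1) * (n_cols - 1)
    p_chisq = chisq_sf(stat, df)                  # chi-squared null with df degrees of freedom
    z = (stat - df) / math.sqrt(2 * df)           # assumed normalization of chi_f^2
    p_normal = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
    return p_chisq, p_normal


for dims in [(2, 2), (4, 4)]:
    print(dims, fun_chisq_pvalues(10.0, *dims))
```

The same χ_f² = 10 yields a chi-squared p-value of roughly 0.0016 on a 2×2 table but roughly 0.35 on a 4×4 table. Where SciPy is available, scipy.stats.chi2.sf(x, df) gives the same tail probability.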

The function index ξ_f measures only the strength of the functional dependency, normalized for sample size and table size, without considering statistical significance. When the sample size is small, the index can be unreliable; when the sample size is large, it directly measures functional dependency and is comparable across tables. It plays the role of the correlation coefficient in cor.test(), or of the fold change that accompanies a t-test in differential gene expression analysis.
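The sample-size invariance of ξ_f can be illustrated with a sketch in which both χ_f² and its normalizer grow linearly when every count is multiplied by the same factor, so their ratio does not change. We assume here that ξ_f = √(χ_f² / (n(s − 1) − χ²(Y))), where n(s − 1) − χ²(Y) is an upper bound on χ_f² given the sample size n, the number of columns s, and the column totals of Y. This normalization is our assumption, and fun.chisq.test() may define the index differently; the name function_index is hypothetical.

```python
import math


def function_index(table):
    """Hypothetical sketch of a function index xi_f for Y (columns) given X (rows)."""
    s = len(table[0])                                   # number of levels of Y
    n = sum(sum(row) for row in table)                  # total sample size
    col_sums = [sum(row[j] for row in table) for j in range(s)]

    def chisq_uniform(counts):
        # Pearson chi-squared of counts against a uniform expectation over s levels
        total = sum(counts)
        if total == 0:
            return 0.0
        expected = total / s
        return sum((c - expected) ** 2 / expected for c in counts)

    chisq_y = chisq_uniform(col_sums)
    chisq_f = sum(chisq_uniform(row) for row in table) - chisq_y
    bound = n * (s - 1) - chisq_y                       # assumed normalizer: upper bound on chi_f^2
    if bound <= 0:
        return 0.0                                      # Y is constant: no dependency to measure
    return math.sqrt(max(chisq_f, 0.0) / bound)         # clamp tiny negative rounding error


strong = [[5, 0], [0, 5]]
print(function_index(strong))                                     # 1.0: perfect function
print(function_index([[10 * c for c in row] for row in strong]))  # 1.0: unchanged by 10x more samples
print(function_index([[30, 10], [30, 10]]))                       # 0.0: Y is skewed but ignores X
```

Under this normalization, the weak table [[75, 50], [50, 75]], which shares χ_f² = 10 with the strong 2×2 table, receives ξ_f = 0.2 instead of 1.0, separating strength from sample support.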