Title: | Pattern Analysis Across Contingency Tables |
---|---|
Description: | Statistical hypothesis testing of pattern heterogeneity via differences in underlying distributions across multiple contingency tables. Five tests are included: the comparative chi-squared test (Song et al. 2014) <doi:10.1093/nar/gku086> (Zhang et al. 2015) <doi:10.1093/nar/gkv358>, the Sharma-Song test (Sharma et al. 2021) <doi:10.1093/bioinformatics/btab240>, the heterogeneity test, the marginal-change test (Sharma et al. 2020) <doi:10.1145/3388440.3412485>, and the strength test (Sharma et al. 2020) <doi:10.1145/3388440.3412485>. Under the null hypothesis that row and column variables are statistically independent and joint distributions are equal, their test statistics all asymptotically follow a chi-squared distribution. A comprehensive type analysis categorizes the relation among the contingency tables into types null, 0, 1, and 2 (Sharma et al. 2020) <doi:10.1145/3388440.3412485>. The tests can identify heterogeneous patterns that differ in either the first order (marginal) or the second order (differential departure from independence). Second-order differences reveal more fundamental changes than first-order differences across heterogeneous patterns. |
Authors: | Ruby Sharma [aut] |
Maintainer: | Joe Song <[email protected]> |
License: | LGPL (>= 3) |
Version: | 0.1.3 |
Built: | 2025-01-25 03:18:58 UTC |
Source: | https://github.com/cran/DiffXTables |
Across given contingency tables, the test admits any type of differences in either the joint or marginal distributions underlying the tables.
cp.chisq.test( tables, method=c("chisq", "nchisq", "default", "normalized"), log.p = FALSE )
tables |
a list of at least two nonnegative matrices or data frames representing contingency tables of the same dimensions. |
method |
a character string to specify the method to compute the chi-squared statistic and its p-value. The default is Note: |
log.p |
logical; if |
The comparative chi-squared test determines whether the patterns underlying multiple contingency tables are heterogeneous. Its null test statistic is proved to asymptotically follow the chi-squared distribution (Song et al. 2014; Zhang et al. 2015). This test is different from the heterogeneity test (Zar 2009).
Two methods are provided to compute the chi-squared statistic and its p-value. When method = "chisq" (or "default"), the p-value is computed using the chi-squared distribution; when method = "nchisq" (or "normalized"), a normalized statistic is obtained by shifting and scaling the original chi-squared test statistic, and a p-value is computed using the standard normal distribution (George et al. 2005). The normalized test is more conservative on the degrees of freedom.
Either test statistic is minimized to zero if and only if the input tables are linearly scaled versions of each other.
The test is recommended to determine whether multiple contingency tables have the same distributions, regardless of independence of row and column variables in each table.
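The relationship between the two methods can be illustrated in base R. The sketch below is not the package's implementation. It assumes a homogeneity-style statistic in which the expected counts of each table come from the pooled cell proportions, with (K - 1)(rc - 1) degrees of freedom for K tables of r rows and c columns, and it assumes the normalization (statistic - df) / sqrt(2 * df). The helper name `cp_sketch` is hypothetical.

```r
# Hypothetical helper: a homogeneity-style chi-squared statistic across tables.
# Assumed formulas for illustration; not taken from the package source.
cp_sketch <- function(tables) {
  pooled <- Reduce(`+`, tables)          # cell-wise sum of all tables
  p <- pooled / sum(pooled)              # pooled cell proportions
  stat <- 0
  for (x in tables) {
    e <- sum(x) * p                      # expected counts for this table
    keep <- e > 0                        # skip cells that are empty in every table
    stat <- stat + sum((x[keep] - e[keep])^2 / e[keep])
  }
  df <- (length(tables) - 1) * (sum(p > 0) - 1)   # assumed degrees of freedom
  z <- (stat - df) / sqrt(2 * df)        # shift by the mean, scale by the sd
  list(statistic = stat, parameter = df,
       p.chisq = pchisq(stat, df, lower.tail = FALSE),
       p.normal = pnorm(z, lower.tail = FALSE))
}

a <- matrix(c(2, 4, 6, 8, 3, 6, 9, 12, 4, 8, 12, 16), nrow = 4)
scaled <- cp_sketch(list(a, 3 * a))                      # linearly scaled copies
differ <- cp_sketch(list(diag(4, 3), 4 - diag(4, 3)))    # opposite patterns
```

Under these assumptions, the statistic for `scaled` vanishes because each table matches its expected counts exactly, while `differ` yields a large statistic.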
A list with class "htest" containing the following components:
statistic |
chi-squared test statistic if |
parameter |
degrees of freedom of the chi-squared statistic. |
p.value |
p-value of the comparative chi-squared test. By default, it is computed by the chi-squared distribution ( |
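Objects of class "htest" are ordinary lists, so the components above are read with `$`. As a base-R illustration that does not depend on this package, `chisq.test` from the stats package returns an object of the same class with components of the same names. The table `x` is made up for the example.

```r
# Base-R illustration: chisq.test() also returns an object of class "htest".
x <- matrix(c(12, 5, 7, 9, 14, 6, 8, 10, 15), nrow = 3)   # made-up counts
res <- chisq.test(x)

class(res)                 # "htest"
unname(res$statistic)      # test statistic as a plain number
unname(res$parameter)      # degrees of freedom
res$p.value                # p-value
```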
Joe Song
George EP, Hunter JS, Hunter WG, Bins R, Kirlin IV K, Carroll D (2005).
Statistics for Experimenters: Design, Innovation, and Discovery.
Wiley, New York.
Song M, Zhang Y, Katzaroff AJ, Edgar BA, Buttitta L (2014).
“Hunting complex differential gene interaction patterns across molecular contexts.”
Nucleic Acids Research, 42(7), e57.
doi:10.1093/nar/gku086.
Zar JH (2009).
Biostatistical Analysis, 5th edition.
Prentice Hall, New Jersey.
Zhang Y, Liu ZL, Song M (2015).
“ChiNet uncovers rewired transcription subnetworks in tolerant yeast for advanced biofuels conversion.”
Nucleic Acids Research, 43(9), 4393–4407.
doi:10.1093/nar/gkv358.
The Sharma-Song test sharma.song.test.
The heterogeneity test heterogeneity.test.
library(DiffXTables)

# Two second-order differential tables:
tables <- list(
  matrix(c(4,0,0, 0,4,0, 0,0,4), nrow=3),
  matrix(c(0,4,4, 4,0,4, 4,4,0), nrow=3)
)
cp.chisq.test(tables)

# Three tables differ in the first-order but not second-order:
tables <- list(
  matrix(c(2, 4, 6, 8, 3, 6, 9, 12, 4, 8, 12, 16), nrow=4),
  matrix(c( 2, 1, 3, 7, 2, 1, 3, 7, 10, 5, 15, 35), nrow=4),
  matrix(c(40, 16, 72, 16, 45, 18, 81, 18, 25, 10, 45, 10), nrow=4)
)
cp.chisq.test(tables)
Across given contingency tables, the test admits any type of differences in either the joint or marginal distributions of the tables.
heterogeneity.test(tables)
tables |
a list of at least two non-negative matrices or data frames representing contingency tables of the same dimensions. |
The heterogeneity test determines whether the patterns underlying multiple contingency tables are heterogeneous or differential. The chi-squared distribution is used for the null distribution of its test statistic (Zar, 2010).
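A minimal base-R sketch of a heterogeneity chi-squared follows, assuming the classical formulation in which the statistic is the total of the per-table Pearson chi-squared statistics minus the statistic of the pooled table, with (K - 1)(r - 1)(c - 1) degrees of freedom. The package may compute the statistic differently, and the helper names `pearson` and `heterogeneity_sketch` are hypothetical.

```r
# Pearson chi-squared statistic for independence in a single table.
# Assumes no row or column of the table is entirely zero.
pearson <- function(x) {
  e <- outer(rowSums(x), colSums(x)) / sum(x)   # expected counts
  sum((x - e)^2 / e)
}

# Hypothetical helper following the classical heterogeneity chi-squared:
# total of per-table statistics minus the statistic of the pooled table.
heterogeneity_sketch <- function(tables) {
  pooled <- Reduce(`+`, tables)
  stat <- sum(sapply(tables, pearson)) - pearson(pooled)
  df <- (length(tables) - 1) * (nrow(pooled) - 1) * (ncol(pooled) - 1)
  list(statistic = stat, parameter = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

a <- diag(4, 3) + 1
same <- heterogeneity_sketch(list(a, a))                        # identical tables
second <- heterogeneity_sketch(list(diag(4, 3), 4 - diag(4, 3)))  # opposite patterns
```

With identical tables the per-table total equals the pooled statistic, so the sketch returns a statistic of zero up to rounding error.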
A list with class "htest" containing the following components:
statistic |
heterogeneity test statistic. |
parameter |
degrees of freedom used for the null distribution of the heterogeneity test statistic. |
p.value |
p-value of the heterogeneity test, computed using the chi-squared distribution. |
Zar, J. H. (2010) Biostatistical Analysis, 5th Ed., New Jersey: Prentice Hall.
The comparative chi-squared test cp.chisq.test.
The Sharma-Song test sharma.song.test.
library(DiffXTables)

# Two second-order differential tables:
tables <- list(
  matrix(c(4,0,0, 0,4,0, 0,0,4), nrow=3),
  matrix(c(0,4,4, 4,0,4, 4,4,0), nrow=3)
)
heterogeneity.test(tables)

# Three tables differ in the first-order but not second-order:
tables <- list(
  matrix(c(2, 4, 6, 8, 3, 6, 9, 12, 4, 8, 12, 16), nrow=4),
  matrix(c( 2, 1, 3, 7, 2, 1, 3, 7, 10, 5, 15, 35), nrow=4),
  matrix(c(40, 16, 72, 16, 45, 18, 81, 18, 25, 10, 45, 10), nrow=4)
)
heterogeneity.test(tables)
The test detects change in either row or column marginal distributions across given contingency tables.
marginal.change.test(tables)
tables |
a list of at least two non-negative matrices or data frames representing contingency tables of the same dimensions. |
The marginal change test determines whether the patterns underlying multiple input contingency tables have changed row or column marginal distributions. Its test statistic is proved to asymptotically follow the chi-squared distribution under the null hypothesis of the same row and column marginal distributions across tables (Sharma et al. 2020).
The test statistic is minimized to zero if and only if the observed row marginal distributions are the same across tables and so are the observed column marginal distributions.
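The zero condition above concerns observed marginal distributions only. The base-R sketch below computes those marginals for two pairs of tables; it does not compute the test statistic, and the helper name `marginals` is hypothetical.

```r
# Observed row and column marginal distributions of one table.
marginals <- function(x) {
  list(row = rowSums(x) / sum(x), col = colSums(x) / sum(x))
}

# First-order differential pair: the marginals change between tables.
first <- list(
  matrix(c(30,0,0, 0,10,0, 0,0,20), nrow = 3),
  matrix(c(10,0,0, 0,20,0, 0,0,30), nrow = 3)
)

# Second-order differential pair: the marginals stay the same.
second <- list(
  matrix(c(4,0,0, 0,4,0, 0,0,4), nrow = 3),
  matrix(c(0,0,4, 0,4,0, 4,0,0), nrow = 3)
)

lapply(first, marginals)
lapply(second, marginals)
```

The second pair rearranges counts within the table without moving any row or column total, so a test that looks only at marginals has nothing to detect there.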
A list with class "htest" containing the following components:
statistic |
the chi-squared test statistic. |
parameter |
the degrees of freedom of the null chi-squared distribution. |
p.value |
the p-value for the test, computed using the chi-squared distribution. |
Ruby Sharma and Joe Song
Sharma R, Luo X, Kumar S, Song M (2020). “Three Co-Expression Pattern Types across Microbial Transcriptional Networks of Plankton in Two Oceanic Waters.” In Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB '20. ISBN 9781450379649, doi:10.1145/3388440.3412485.
sharma.song.test, strength.test, and type.analysis.
library(DiffXTables)

# Two first-order differential tables:
tables <- list(
  matrix(c(30,0,0, 0,10,0, 0,0,20), nrow=3),
  matrix(c(10,0,0, 0,20,0, 0,0,30), nrow=3)
)
marginal.change.test(tables)

# Tables differ in the second-order but not first-order:
tables <- list(
  matrix(c(4,0,0, 0,4,0, 0,0,4), nrow=3),
  matrix(c(0,0,4, 0,4,0, 4,0,0), nrow=3)
)
marginal.change.test(tables)
The test detects differential departure from independence via second-order difference in joint distributions underlying two or more contingency tables.
sharma.song.test( tables, null.table.marginal = c("observed", "uniform"), compensated = FALSE )
tables |
a list of at least two non-negative matrices or data frames representing contingency tables of the same dimensions. |
null.table.marginal |
a character string to specify marginal distributions of null tables. The options are "observed" and "uniform". |
compensated |
a logical value to compensate for the Cochran's condition. It is only used if |
The Sharma-Song test determines whether the patterns underlying multiple input contingency tables are second-order differential. The test statistic measures differential departure from independence. Its null test statistic is proved to asymptotically follow the chi-squared distribution. For full details of the test, see Sharma et al. (2021).
If null.table.marginal is set to "observed", the null hypothesis uses the observed marginals. The compensated parameter, if set to TRUE, adds a small constant to each entry of the tables to compensate for violations of Cochran's condition, which occur when the expected count in a table entry is 5 or less.
If null.table.marginal is set to "uniform", the null tables are set to have uniform marginals. No longer constrained by Cochran's condition, the test detects second-order differential patterns with additional robustness.
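To make "departure from independence" and the small expected counts concrete, the base-R sketch below compares the observed joint distribution of each table with the product of its marginals and lists the expected counts under independence. It is not the Sharma-Song statistic, and the helper names `departure` and `expected` are hypothetical.

```r
# Departure of the observed joint distribution from the product of marginals.
departure <- function(x) {
  p <- x / sum(x)
  p - outer(rowSums(p), colSums(p))
}

# Expected counts under independence with the observed marginals.
expected <- function(x) outer(rowSums(x), colSums(x)) / sum(x)

a <- matrix(c(4,0,0, 0,4,0, 0,0,4), nrow = 3)
b <- matrix(c(0,4,4, 4,0,4, 4,4,0), nrow = 3)

departure(a)      # positive on the diagonal
departure(b)      # negative on the diagonal
expected(a)       # every expected count is below 5
```

Both tables have uniform observed marginals, yet their departures point in opposite directions, which is the kind of second-order difference the text describes. A table whose rows are proportional to each other has zero departure everywhere.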
A list with class "htest" containing the following components:
statistic |
the Sharma-Song chi-squared test statistic. |
parameter |
degrees of freedom of the chi-squared test statistic. |
p.value |
p-value of the Sharma-Song test, computed using the chi-squared distribution. |
Ruby Sharma and Joe Song
Sharma R, Kumar S, Song M (2021). “Fundamental gene network rewiring at the second order within and across mammalian systems.” Bioinformatics. doi:10.1093/bioinformatics/btab240.
cp.chisq.test, heterogeneity.test, strength.test, marginal.change.test, and type.analysis.
library(DiffXTables)

# Two second-order differential tables:
tables <- list(
  matrix(c(4,0,0, 0,4,0, 0,0,4), nrow=3),
  matrix(c(0,4,4, 4,0,4, 4,4,0), nrow=3)
)
sharma.song.test(tables)

# Three tables differ in the first-order but not second-order:
tables <- list(
  matrix(c(2, 4, 6, 8, 3, 6, 9, 12, 4, 8, 12, 16), nrow=4),
  matrix(c( 2, 1, 3, 7, 2, 1, 3, 7, 10, 5, 15, 35), nrow=4),
  matrix(c(40, 16, 72, 16, 45, 18, 81, 18, 25, 10, 45, 10), nrow=4)
)
sharma.song.test(tables, null.table.marginal = "uniform")
Generate contingency tables that are first-order, second-order or full-order differential in the joint distribution of row and column variables.
simulate_diff_tables( K = 2, nrow = 3, ncol = 3, n = 100, B = 100, type = c("second-order", "first-order", "full-order") )
K |
the number of tables that are differential. It must be an integer greater than one. |
nrow |
the number of rows for all tables to be generated. It must be an integer greater than one. |
ncol |
the number of columns for all tables to be generated. It must be an integer greater than one. |
n |
the sample size for each table to be generated. It must be a positive integer. |
B |
the number of iterations indicating the level of differentiality. It must be a positive integer. The greater the value, the stronger the differentiality across tables. |
type |
the type of differential tables to be generated. Options are "second-order", "first-order", and "full-order". |
The function randomly generates contingency tables differential in the joint distribution of the row and column variables. Specifically, three types of differential contingency tables can be simulated:
First-order differential contingency tables only differ in row or column marginal distribution. Such tables are differential in joint distribution, but different from second-order differential tables.
Second-order differential contingency tables differ in joint distribution. The difference is not attributed to row or column marginal distributions.
Full-order differential contingency tables are tables that are both first-order and second-order differential.
The simulation starts with randomly generated probability tables where row and column variables are independent. The probability tables are modified to K tables such that they represent specific distributions that strictly satisfy the type requirement. Finally, contingency tables are generated from multinomial distributions, using these probability tables as parameters and the required sample size.
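The first and last steps described above can be sketched in base R: build a probability table with independent row and column variables, then draw a contingency table of a given sample size from it. The sketch omits the intermediate step that modifies the probability tables to satisfy a differential type, and all variable names and values are illustrative assumptions rather than the package's internals.

```r
set.seed(1)                       # reproducible illustration only

nr <- 3; nc <- 4; n <- 100        # illustrative dimensions and sample size

# Random marginals, then an independent probability table as their product.
pr <- runif(nr); pr <- pr / sum(pr)
pc <- runif(nc); pc <- pc / sum(pc)
prob <- outer(pr, pc)

# Draw one contingency table of sample size n from the probability table.
counts <- matrix(rmultinom(1, size = n, prob = as.vector(prob)),
                 nrow = nr, ncol = nc)
```

Because `rmultinom` returns the cell counts in the same column-major order as `as.vector(prob)`, reshaping with `matrix` restores the original row and column layout.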
A list containing the following components:
contingency.tables |
a list of |
probability.tables |
a list of |
method |
a string that specifies the type of the differential tables. |
Ruby Sharma and Joe Song
Differential tables are simulated to evaluate the following tests for comparing contingency tables:
The Sharma-Song test sharma.song.test
The comparative chi-squared test cp.chisq.test
The heterogeneity test heterogeneity.test
The simulate_tables function in package FunChisq can generate a variety of tables.
library(DiffXTables)

# Three first-order differential tables:
res <- simulate_diff_tables(
  K = 3, nrow = 4, ncol = 3, n = 150, B = 200, type = "first-order"
)
print(res)

# Two second-order differential tables:
res <- simulate_diff_tables(
  K = 2, nrow = 2, ncol = 5, n = 100, B = 100, type = "second-order"
)
print(res)

# Four full-order differential tables:
res <- simulate_diff_tables(
  K = 4, nrow = 3, ncol = 4, n = 250, B = 200, type = "full-order"
)
print(res)
The test determines the total strength of association in multiple contingency tables.
strength.test(tables)
tables |
a list of at least two non-negative matrices or data frames representing contingency tables. |
The strength test determines the total amount of association in multiple input contingency tables. Its test statistic asymptotically follows the chi-squared distribution under the null hypothesis of each table having independent row and column variables (Sharma et al. 2020).
The test statistic is minimized to zero if and only if row and column variables are empirically independent of each other in every table.
This test is considered a zeroth-order test in the function type.analysis that characterizes the difference across multiple contingency tables.
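One statistic with the zero property stated above is the sum of per-table Pearson chi-squared statistics, with the per-table degrees of freedom added up. The base-R sketch below uses that sum as an assumption for illustration; the source does not confirm that the package computes the strength statistic this way, and the helper name `strength_sketch` is hypothetical.

```r
# Hypothetical helper: add up per-table Pearson chi-squared statistics.
# Warnings about small expected counts are suppressed for the illustration.
strength_sketch <- function(tables) {
  fits <- lapply(tables, function(x) {
    suppressWarnings(chisq.test(x, correct = FALSE))
  })
  stat <- sum(sapply(fits, function(f) f$statistic))
  df <- sum(sapply(fits, function(f) f$parameter))
  list(statistic = stat, parameter = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Both tables have strong association.
strong <- strength_sketch(list(diag(c(30, 10, 20)), diag(c(10, 20, 30))))

# Both tables are exactly independent (each is an outer product).
indep <- strength_sketch(list(outer(1:3, 1:3), outer(3:1, c(2, 2, 2))))
```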
A list with class "htest" containing the following components:
statistic |
the strength test statistic. |
parameter |
the degrees of freedom of the null chi-squared distribution. |
p.value |
the p-value for the test, computed using the null chi-squared distribution. |
Ruby Sharma and Joe Song
Sharma R, Luo X, Kumar S, Song M (2020). “Three Co-Expression Pattern Types across Microbial Transcriptional Networks of Plankton in Two Oceanic Waters.” In Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB '20. ISBN 9781450379649, doi:10.1145/3388440.3412485.
A second-order difference test sharma.song.test. A first-order difference test marginal.change.test. The comparative type of differences across contingency tables type.analysis.
library(DiffXTables)

# Both tables have strong association:
tables <- list(
  matrix(c(30,0,0, 0,10,0, 0,0,20), nrow=3),
  matrix(c(10,0,0, 0,20,0, 0,0,30), nrow=3)
)
strength.test(tables)

# One table has strong association:
tables <- list(
  matrix(c(4,0,0, 0,4,0, 0,0,4), nrow=3),
  matrix(c(4,0,4, 8,4,8, 4,0,4), nrow=3)
)
strength.test(tables)

# Both tables have no association:
tables <- list(
  matrix(c(4,0,4, 8,4,8, 4,0,4), nrow=3),
  matrix(c(4,0,4, 8,4,8, 4,0,4), nrow=3)
)
strength.test(tables)
Four types (0, 1, 2, null) are assigned to a collection of contingency tables to categorize their differences.
type.analysis(tables, alpha = 0.05)
tables |
a list of at least two non-negative matrices or data frames representing contingency tables of the same dimensions. |
alpha |
a numerical value representing the significance level for all involved hypothesis tests. Default is 0.05. |
The function determines whether differences across patterns underlying multiple input contingency tables are type 0, 1, 2, or null. The function calls strength.test, marginal.change.test, and sharma.song.test to obtain three p-values and uses them to decide on the type of difference among input contingency tables (Sharma et al. 2020).
Type null: association absent from the input tables. These tables can still differ in joint distribution, but a strong relationship is lacking between row and column variables. No mechanisms are implied.
Type 0: association present and patterns are conserved. These tables show strong row and column association but have no difference in distribution. Conserved mechanisms with conserved trajectories are implied.
Type 1: association present; the difference is up to the first order. These tables show strong row and column association, differ in marginal distribution, and do not differ in the deviation of the joint distribution from the product of marginals. Conserved mechanisms with differential trajectories are implied. Differences in trajectory can be due to changed stimuli.
Type 2: association present; the difference is up to the second order. These tables show strong row and column association and differ in the deviation of the joint distribution from the product of marginals. Differential mechanisms are implied.
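The four type descriptions suggest a decision rule over the three p-values. The base-R sketch below is one reading of those descriptions, not the package's code: the function name `classify_type`, its argument names, and the order in which the p-values are checked are all assumptions.

```r
# Hypothetical decision rule consistent with the four type descriptions.
# p0, p1, p2: p-values of the zeroth-, first-, and second-order tests.
classify_type <- function(p0, p1, p2, alpha = 0.05) {
  if (p0 >= alpha) return("null")   # no significant association
  if (p2 < alpha)  return("2")      # second-order difference
  if (p1 < alpha)  return("1")      # first-order difference only
  "0"                               # association present, pattern conserved
}

classify_type(0.40, 0.01, 0.01)     # association absent
classify_type(0.001, 0.30, 0.60)    # association present, no difference
classify_type(0.001, 0.01, 0.60)    # marginal difference only
classify_type(0.001, 0.01, 0.001)   # second-order difference
```

Checking the second-order p-value before the first-order one reflects the phrase "up to the second order": under this reading, a type 2 difference may or may not come with a marginal difference.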
A list with class "DiffXTableTypeAnalysis" containing the following components:
Zeroth.order |
a list with class " |
First.order |
a list with class " |
Second.order |
a list with class " |
Type |
the type of differences across the input contingency tables. Possible values are 0, 1, 2, and |
Ruby Sharma and Joe Song
Sharma R, Luo X, Kumar S, Song M (2020). “Three Co-Expression Pattern Types across Microbial Transcriptional Networks of Plankton in Two Oceanic Waters.” In Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB '20. ISBN 9781450379649, doi:10.1145/3388440.3412485.
strength.test, marginal.change.test, and sharma.song.test.
A related function is the comparative chi-squared test cp.chisq.test.
library(DiffXTables)

# Type-null tables:
tables <- list(
  matrix(c(7, 4, 1, 3, 6, 9, 2, 4, 4), nrow=3),
  matrix(c(2, 1, 2, 2, 0, 8, 6, 2, 7), nrow=3)
)
type.analysis(tables, alpha = 0.05)

# Type-0 tables:
tables <- list(
  matrix(c(30,0,0, 0,10,0, 0,0,20), nrow=3),
  matrix(c(30,0,0, 0,10,0, 0,0,20), nrow=3)
)
type.analysis(tables, alpha = 0.05)

# Type-1 differential tables:
tables <- list(
  matrix(c(30,0,0, 0,10,0, 0,0,20), nrow=3),
  matrix(c(10,0,0, 0,20,0, 0,0,30), nrow=3)
)
type.analysis(tables, alpha = 0.05)

# Type-2 differential tables:
tables <- list(
  matrix(c(4,0,0, 0,4,0, 0,0,4), nrow=3),
  matrix(c(0,4,4, 4,0,4, 4,4,0), nrow=3)
)
type.analysis(tables, alpha = 0.05)