Title: | Trajectory Presence and Heterogeneity in Multivariate Data |
---|---|
Description: | Testing for trajectory presence and heterogeneity on multivariate data. Two statistical methods (Tenha & Song 2022) <doi:10.1371/journal.pcbi.1009829> are implemented. The tree dimension test quantifies the statistical evidence for trajectory presence. The subset specificity measure summarizes pattern heterogeneity using the minimum subtree cover. There are no user-tunable parameters for either method. Examples are included to illustrate how to use the methods on single-cell data for studying gene and pathway expression dynamics and pathway expression specificity. |
Authors: | Lovemore Tenha [aut] |
Maintainer: | Joe Song <[email protected]> |
License: | LGPL (>= 3) |
Version: | 0.0.2 |
Built: | 2025-03-07 03:19:19 UTC |
Source: | https://github.com/cran/TreeDimensionTest |
Computes the tree dimension measure, the tree dimension test effect size, the number of leaves, and the tree diameter from the MST of a given dataset.
compute.stats(x, MST = c("boruvka", "exact"), dim.reduction = c("pca", "none"))
x |
matrix of input data. Rows as observations and columns as features |
MST |
name of the MST algorithm to be used in the test. There are two options: "exact" and "boruvka", which is faster for large samples |
dim.reduction |
string parameter with value "pca" to perform dimensionality reduction or "none" to not perform dimensionality reduction |
A list with the following components:
tdt_measure The tree dimension value for the given input data
tdt_effect Effect size for the tree dimension
leaves Number of leaf (degree-1) vertices in the MST of the data
diameter The tree diameter of the MST, where each edge is of unit length
original_dimension If "pca" is selected, the number of dimensions in the original dataset
pca_components If "pca" is selected, the number of PCA components selected after dimensionality reduction
mst A vector of edges of the MST computed on x. The length of the vector is always even.
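To make the `leaves` and `diameter` components concrete, here is a minimal base-R sketch that builds an exact Euclidean MST with Prim's algorithm, counts degree-1 vertices, and measures the unit-length diameter with two breadth-first searches. It is an illustration only, not the package's implementation: `prim_mst`, `bfs_depth`, and `tree_stats` are hypothetical helper names, and the sketch uses its own two-column edge matrix because the source does not say how the `mst` vector pairs its endpoints.

```r
# Illustrative sketch only; not the TreeDimensionTest implementation.
# Exact Euclidean MST via Prim's algorithm; returns an (n - 1) x 2 edge matrix.
prim_mst <- function(x) {
  d <- as.matrix(dist(x))                 # pairwise Euclidean distances
  n <- nrow(d)
  in_tree <- c(TRUE, rep(FALSE, n - 1))   # start the tree at vertex 1
  best_dist <- d[1, ]                     # cheapest known link from the tree to each vertex
  best_from <- rep(1L, n)                 # tree vertex that provides that link
  edges <- matrix(0L, nrow = n - 1, ncol = 2)
  for (k in seq_len(n - 1)) {
    cand <- which(!in_tree)
    v <- cand[which.min(best_dist[cand])] # closest vertex outside the tree
    edges[k, ] <- c(best_from[v], v)
    in_tree[v] <- TRUE
    closer <- (!in_tree) & (d[v, ] < best_dist)
    best_dist[closer] <- d[v, closer]
    best_from[closer] <- v
  }
  edges
}

# Breadth-first search depths (in edges) from a start vertex of a tree.
bfs_depth <- function(edges, n, start) {
  depth <- rep(NA_integer_, n)
  depth[start] <- 0L
  frontier <- start
  while (length(frontier) > 0) {
    nxt <- integer(0)
    for (u in frontier) {
      nb <- c(edges[edges[, 1] == u, 2], edges[edges[, 2] == u, 1])
      nb <- nb[is.na(depth[nb])]          # keep unvisited neighbours only
      depth[nb] <- depth[u] + 1L
      nxt <- c(nxt, nb)
    }
    frontier <- nxt
  }
  depth
}

# Number of leaves and unit-length diameter of a tree given as an edge matrix.
tree_stats <- function(edges) {
  n <- nrow(edges) + 1L
  degree <- tabulate(as.vector(edges), nbins = n)
  far <- which.max(bfs_depth(edges, n, 1L))   # one end of a longest path
  list(leaves = sum(degree == 1L),
       diameter = max(bfs_depth(edges, n, far)))
}

# Five collinear points: the MST is a path.
path_stats <- tree_stats(prim_mst(cbind(1:5, 0)))
# A centre with four unit-distance neighbours: the MST is a star.
star_stats <- tree_stats(prim_mst(rbind(c(0, 0), c(1, 0), c(-1, 0), c(0, 1), c(0, -1))))
print(path_stats)
print(star_stats)
```

The two toy inputs sit at opposite extremes: a path has the fewest leaves and the largest diameter a tree on its vertices can have, while a star has the most leaves and a small diameter.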
Computes the empirical null distribution of the S statistic, together with the parameters of its lognormal approximation, for input of size rows × cols using multivariate normal randomization.
empirical.distributions(rows, cols, perm = 100, MST = c("boruvka", "exact"))
rows |
number of rows for data representing null case. Rows represent sample size. |
cols |
number of columns for data representing null case. Columns represent variables. |
perm |
number of simulations to compute null distribution. Default is 100. |
MST |
name of the MST algorithm to be used in computing the distribution. There are two options: "exact" and "boruvka", which is faster for large samples |
A list with the following components:
dist A vector with the empirical null distribution of the S statistic
meanlog The estimated meanlog parameter of the lognormal distribution fitted to the empirical null distribution of S
sdlog The estimated sdlog parameter of the lognormal distribution fitted to the empirical null distribution of S
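The `meanlog` and `sdlog` components can be understood through the closed-form maximum likelihood fit of a lognormal distribution, sketched below in base R. This is a hedged illustration: the package documentation states that the parameters are obtained by maximum likelihood, but whether it uses exactly this closed form is an assumption. `fit_lognormal` is a hypothetical helper, and `null_s` is synthetic data standing in for a simulated null sample.

```r
# Sketch: closed-form maximum likelihood fit of a lognormal distribution.
# Assumption: this mirrors, but is not taken from, the package's estimator.
fit_lognormal <- function(s) {
  stopifnot(all(s > 0))                      # lognormal support is positive
  log_s <- log(s)
  meanlog <- mean(log_s)
  sdlog <- sqrt(mean((log_s - meanlog)^2))   # MLE divides by n, not n - 1
  list(meanlog = meanlog, sdlog = sdlog)
}

set.seed(1)
# Synthetic stand-in for a vector of null S statistics (hypothetical values).
null_s <- rlnorm(5000, meanlog = 0.5, sdlog = 0.25)
fit <- fit_lognormal(null_s)
print(fit)
```

With a fitted pair in hand, base R's `plnorm` gives tail probabilities of the approximating distribution for an observed statistic.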
Plots a Euclidean minimum spanning tree from given input data.
## S3 method for class 'treedim'
plot(x, ..., node.col = "orange", node.size = 5, main = "MST plot", legend.cord = c(-1.2, 1.1))
x |
An object of type "treedim"; returned from test.trajectory, compute.stats or separability |
... |
ignored |
node.col |
vector of colors for the observations in x (vertices) |
node.size |
numerical value to represent size of nodes in the plot |
main |
title for the plot |
legend.cord |
vector of the xy coordinates for the legend c(x,y) |
Plots the minimum spanning tree of the input data x.
Computes homogeneity of labeled observations with multiple label types.
separability(x, labels)
x |
input data matrix, with rows as observations and columns as features |
labels |
a vector of labels for the observations. A label could be a type of the observation, e.g., cell type in single-cell data |
A list with the following components:
label_separability A vector of separability scores for each of the label types. A high score denotes high separability
overall_separability Overall average separability score for all the labels
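A short usage sketch for `separability`, based only on the signature and return components listed above. The data and labels are synthetic and hypothetical, and the call is guarded so the snippet still runs when the package is not installed. No expected scores are shown because they depend on the data.

```r
set.seed(42)
# Two synthetic, well-separated groups of 20 observations each (hypothetical data).
x <- rbind(matrix(rnorm(20 * 5, mean = 0), ncol = 5),
           matrix(rnorm(20 * 5, mean = 6), ncol = 5))
labels <- rep(c("typeA", "typeB"), each = 20)

# Run only if the package is available.
if (requireNamespace("TreeDimensionTest", quietly = TRUE)) {
  res <- TreeDimensionTest::separability(x, labels)
  print(res$label_separability)    # one score per label type
  print(res$overall_separability)  # average over all labels
}
```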
Computes the statistical significance for the presence of trajectory in multivariate data.
test.trajectory(x, perm = 100, MST = c("boruvka", "exact"), dim.reduction = c("pca", "none"))
x |
matrix of input data. Rows as observations and columns as features. |
perm |
number of simulations to compute null distribution parameters by maximum likelihood estimation. |
MST |
the MST algorithm to be used in test. There are two options: "exact" MST and "boruvka" which is approximate but faster for large samples. |
dim.reduction |
string parameter with value "pca" to perform dimensionality reduction or "none" to not perform dimensionality reduction before the test. |
If the input data has already undergone dimension reduction, use dim.reduction="none". The method is described in (Tenha and Song 2022).
A list with the following components:
tdt_measure The tree dimension value for the given input data
statistic The S statistic calculated on the input data. The S statistic is derived from the tree dimension
tdt_effect Effect size for the tree dimension
leaves Number of leaf (degree-1) vertices in the MST of the data
diameter The tree diameter of the MST, where each edge is of unit length
p.value The p-value for the S statistic, which quantifies the statistical evidence for trajectory presence in input x
original_dimension If "pca" is selected, the number of dimensions in the original dataset
pca_components If "pca" is selected, the number of PCA components selected after dimensionality reduction
mst A vector of edges of the MST computed on x. The length of the vector is always even.
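A short usage sketch for `test.trajectory`, using only the arguments and return components documented above. The input is a synthetic noisy curve in ten dimensions (hypothetical data), and the call is guarded so the snippet runs whether or not the package is installed. No p-value is shown because it depends on the simulated null distribution.

```r
set.seed(7)
# Hypothetical trajectory-like data: 100 points along a noisy curve in 10 dimensions.
u <- seq(0, 1, length.out = 100)
x <- sapply(1:10, function(j) sin(j * u) + rnorm(100, sd = 0.05))

# Run only if the package is available.
if (requireNamespace("TreeDimensionTest", quietly = TRUE)) {
  res <- TreeDimensionTest::test.trajectory(x, perm = 100, MST = "exact",
                                            dim.reduction = "pca")
  print(res$tdt_measure)  # tree dimension of the input
  print(res$statistic)    # S statistic
  print(res$p.value)      # evidence for trajectory presence
}
```

If `x` were already a low-dimensional embedding, the documented advice is to pass `dim.reduction = "none"` instead.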
Tenha L, Song M (2022). “Inference of trajectory presence by tree dimension and subset specificity by subtree cover.” PLOS Computational Biology, 18(2), e1009829. doi:10.1371/journal.pcbi.1009829.