Package 'TreeDimensionTest' reference manual

Title:	Trajectory Presence and Heterogeneity in Multivariate Data
Description:	Testing for trajectory presence and heterogeneity on multivariate data. Two statistical methods (Tenha & Song 2022) <doi:10.1371/journal.pcbi.1009829> are implemented. The tree dimension test quantifies the statistical evidence for trajectory presence. The subset specificity measure summarizes pattern heterogeneity using the minimum subtree cover. There are no user-tunable parameters for either method. Examples are included to illustrate how to use the methods on single-cell data for studying gene and pathway expression dynamics and pathway expression specificity.
Authors:	Lovemore Tenha [aut], Joe Song [aut, cre]
Maintainer:	Joe Song <[email protected]>
License:	LGPL (>= 3)
Version:	0.0.2
Built:	2025-03-07 03:19:19 UTC
Source:	https://github.com/cran/TreeDimensionTest

Help Index

Tree Dimension Test Related Statistics

Description

Computes the tree dimension measure, the tree dimension test effect size, the number of leaves, and the tree diameter from the minimum spanning tree (MST) of a given dataset.

Usage

compute.stats(x, MST = c("boruvka", "exact"), dim.reduction = c("pca", "none"))

Arguments

`x`	matrix of input data, with rows as observations and columns as features
`MST`	name of the MST algorithm to be used. There are two options: "exact" and "boruvka", which is faster for large samples
`dim.reduction`	string parameter: "pca" to perform dimensionality reduction, or "none" to skip it

Value

A list with the following components:

`tdt_measure`	The tree dimension value for the given input data.
`tdt_effect`	Effect size for the tree dimension.
`leaves`	Number of leaf (degree-1) vertices in the MST of the data.
`diameter`	The tree diameter of the MST, where each edge has unit length.
`original_dimension`	If "pca" is selected, the number of dimensions in the original dataset.
`pca_components`	If "pca" is selected, the number of PCA components retained after dimensionality reduction.
`mst`	A vector of edges of the MST computed on x. The length of the vector is always even.
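
The leaves and diameter components can be illustrated in base R on a toy tree. This is a minimal sketch, not the package's implementation. It assumes that the edge vector stores each edge as a consecutive pair of vertex indices numbered 1 to n (an assumption consistent with the even length noted above, but not confirmed by this manual). The helper names count_leaves, bfs_dist and tree_diameter are hypothetical.

```r
# Toy edge vector; assumed layout is (from1, to1, from2, to2, ...).
# It encodes the tree with edges 1-2, 2-3, 3-4 and 3-5.
mst <- c(1, 2, 2, 3, 3, 4, 3, 5)
edges <- matrix(mst, ncol = 2, byrow = TRUE)

# Leaves are the vertices of degree 1.
count_leaves <- function(edges) {
  deg <- table(as.vector(edges))
  sum(deg == 1)
}

# Breadth-first search distances from a start vertex, unit edge lengths.
bfs_dist <- function(edges, start) {
  n <- max(edges)
  dist <- rep(NA_integer_, n)
  dist[start] <- 0L
  queue <- start
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    # Neighbours of v, looking at both columns of the edge matrix.
    nb <- c(edges[edges[, 1] == v, 2], edges[edges[, 2] == v, 1])
    for (u in nb) {
      if (is.na(dist[u])) {
        dist[u] <- dist[v] + 1L
        queue <- c(queue, u)
      }
    }
  }
  dist
}

# Diameter of a tree by two BFS passes: find the vertex farthest from
# vertex 1, then take the largest distance from that vertex.
tree_diameter <- function(edges) {
  far <- which.max(bfs_dist(edges, 1))
  max(bfs_dist(edges, far))
}

n_leaves <- count_leaves(edges)  # 3 leaves: vertices 1, 4 and 5
diam <- tree_diameter(edges)     # 3 edges: path 1-2-3-4
```

The two-pass search relies on the graph being a tree, which holds for any MST of a connected dataset.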

Empirical Null Distribution of Tree Dimension Test

Description

Computes the empirical null distribution of the S statistic, and the parameters of its lognormal approximation, for input of size rows x cols using multivariate normal randomization.

Usage

empirical.distributions(rows, cols, perm = 100, MST = c("boruvka", "exact"))

Arguments

`rows`	number of rows in the data representing the null case. Rows represent the sample size.
`cols`	number of columns in the data representing the null case. Columns represent variables.
`perm`	number of simulations used to compute the null distribution. Default is 100.
`MST`	name of the MST algorithm to be used in computing the distribution. There are two options: "exact" and "boruvka", which is faster for large samples

Value

A list with the following components:

`dist`	A vector containing the empirical null distribution of the S statistic.
`meanlog`	The estimated meanlog parameter of the lognormal distribution fitted to the empirical null distribution of S.
`sdlog`	The estimated sdlog parameter of the lognormal distribution fitted to the empirical null distribution of S.
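
The meaning of meanlog and sdlog can be illustrated in base R. This is a minimal sketch of a maximum likelihood lognormal fit, assuming a stand-in sample drawn with rlnorm in place of a real null distribution of S. Whether the package computes its estimates in exactly this way is not stated in this manual.

```r
set.seed(42)
# Hypothetical stand-in for an empirical null sample of S.
s_null <- rlnorm(100, meanlog = 0.5, sdlog = 0.2)

# Maximum likelihood estimates for a lognormal sample are the mean and
# the standard deviation of the log-transformed values.
log_s <- log(s_null)
meanlog_hat <- mean(log_s)
# The maximum likelihood estimate of sdlog divides by n, not n - 1.
sdlog_hat <- sqrt(mean((log_s - meanlog_hat)^2))

print(c(meanlog = meanlog_hat, sdlog = sdlog_hat))
```

With 100 simulated values the estimates should land close to the generating parameters 0.5 and 0.2.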

Visualizing Euclidean Minimum Spanning Trees

Description

Plots a Euclidean minimum spanning tree from the given input data.

Usage

## S3 method for class 'treedim'
plot(
  x,
  ...,
  node.col = "orange",
  node.size = 5,
  main = "MST plot",
  legend.cord = c(-1.2, 1.1)
)

Arguments

`x`	An object of class "treedim", as returned by test.trajectory, compute.stats or separability
`...`	ignored
`node.col`	vector of colors for the observations in x (vertices)
`node.size`	numerical value to represent size of nodes in the plot
`main`	title for the plot
`legend.cord`	vector of the xy coordinates for the legend c(x,y)

Value

Plots a minimum spanning tree for the input x.
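
A minimal usage sketch of the plot method, using only the arguments listed above. The simulated data, the colour, and the choice of MST = "exact" with dim.reduction = "none" are illustrative assumptions. The package call is skipped when TreeDimensionTest is not installed.

```r
set.seed(3)
# Simulated noisy curve in two dimensions (illustrative data only).
u <- seq(0, 1, length.out = 50)
x <- cbind(u, sin(2 * u)) + matrix(rnorm(100, sd = 0.01), ncol = 2)

if (requireNamespace("TreeDimensionTest", quietly = TRUE)) {
  res <- TreeDimensionTest::compute.stats(x, MST = "exact",
                                          dim.reduction = "none")
  # Dispatches to the plot method for class 'treedim'.
  plot(res, node.col = "steelblue", node.size = 3,
       main = "MST of a simulated curve")
}
```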

Separability of Labeled Data Points

Description

Computes homogeneity of labeled observations with multiple label types.

Usage

separability(x, labels)

Arguments

`x`	input data matrix, with rows as observations and columns as features
`labels`	a vector of labels for the observations. A label could be a type of observation, e.g., cell type in single-cell data

Value

A list with the following components:

`label_separability`	A vector of separability scores, one for each label type. A high score denotes high separability.
`overall_separability`	Overall average separability score across all labels.
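
A minimal usage sketch of separability on simulated data, assuming two well-separated groups with hypothetical labels "A" and "B". The package call is skipped when TreeDimensionTest is not installed, and no particular score values are implied.

```r
set.seed(7)
# Two groups of 30 points each in two dimensions, centred far apart.
x <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 10), ncol = 2)
)
# One label per row of x.
labels <- rep(c("A", "B"), each = 30)

if (requireNamespace("TreeDimensionTest", quietly = TRUE)) {
  res <- TreeDimensionTest::separability(x, labels)
  print(res$label_separability)    # one score per label type
  print(res$overall_separability)  # average over the labels
}
```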

Tree Dimension Test

Description

Computes the statistical significance of the presence of a trajectory in multivariate data.

Usage

test.trajectory(
  x,
  perm = 100,
  MST = c("boruvka", "exact"),
  dim.reduction = c("pca", "none")
)

Arguments

`x`	matrix of input data. Rows as observations and columns as features.
`perm`	number of simulations to compute null distribution parameters by maximum likelihood estimation.
`MST`	the MST algorithm to be used in test. There are two options: "exact" MST and "boruvka" which is approximate but faster for large samples.
`dim.reduction`	string parameter with value "pca" to perform dimensionality reduction or "none" to not perform dimensionality reduction before the test.

Details

If the input data have already undergone dimension reduction, use dim.reduction = "none". The method is described in (Tenha and Song 2022).

Value

A list with the following components:

`tdt_measure`	The tree dimension value for the given input data.
`statistic`	The S statistic calculated on the input data. The S statistic is derived from the tree dimension.
`tdt_effect`	Effect size for the tree dimension.
`leaves`	Number of leaf (degree-1) vertices in the MST of the data.
`diameter`	The tree diameter of the MST, where each edge has unit length.
`p.value`	The p-value for the S statistic, which measures the evidence for the presence of a trajectory in the input x.
`original_dimension`	If "pca" is selected, the number of dimensions in the original dataset.
`pca_components`	If "pca" is selected, the number of PCA components retained after dimensionality reduction.
`mst`	A vector of edges of the MST computed on x. The length of the vector is always even.
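
A minimal usage sketch of test.trajectory on simulated data. The noisy curve is an illustrative assumption. Because the data are already two-dimensional, dim.reduction = "none" is used as advised in Details. The package call is skipped when TreeDimensionTest is not installed, and no particular p-value is implied.

```r
set.seed(1)
# 80 points along a noisy one-dimensional curve embedded in two dimensions.
u <- seq(0, 1, length.out = 80)
x <- cbind(u, u^2) + matrix(rnorm(160, sd = 0.01), ncol = 2)

if (requireNamespace("TreeDimensionTest", quietly = TRUE)) {
  res <- TreeDimensionTest::test.trajectory(
    x,
    perm = 100,
    MST = "exact",
    dim.reduction = "none"
  )
  print(res$tdt_measure)  # tree dimension of the input
  print(res$statistic)    # S statistic
  print(res$p.value)      # p-value for the S statistic
}
```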

References

Tenha L, Song M (2022). “Inference of trajectory presence by tree dimension and subset specificity by subtree cover.” PLOS Computational Biology, 18(2), e1009829. doi:10.1371/journal.pcbi.1009829.