High-dimensional data in phylogenetics

What is high-dimensional data?

The concept of high-dimensionality applies to data sets that contain a large number of variables, often more variables than there are samples.

For example, a data set is high-dimensional when the number of genes affecting a phenotype is higher than the number of people the genetic data was drawn from.

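This is a statistical problem: with more variables than samples, a model has more parameters to estimate than there are observations to estimate them from.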
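To make the problem concrete, here is a minimal sketch in plain Python. It assumes a simple linear model, which the post does not specify, and every name in it (`solve_square`, `fit_noise`, the sample and variable counts) is made up for illustration. It draws a response that is pure random noise and shows that, once there are at least as many variables as samples, a subset of the variables can reproduce that noise essentially exactly.

```python
import random


def solve_square(A, b):
    """Solve the square linear system A x = b by Gaussian elimination."""
    n = len(A)
    # Build the augmented matrix [A | b]; copy rows so inputs stay untouched.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_noise(n_samples=5, n_variables=20, seed=1):
    """Fit random noise with random predictors; return the largest residual."""
    rng = random.Random(seed)
    # Hypothetical data: more variables (columns) than samples (rows).
    X = [[rng.gauss(0, 1) for _ in range(n_variables)] for _ in range(n_samples)]
    y = [rng.gauss(0, 1) for _ in range(n_samples)]  # response is pure noise
    # Keep only the first n_samples variables (all other coefficients are
    # zero); that square system can already be solved exactly.
    X_sub = [row[:n_samples] for row in X]
    coef = solve_square(X_sub, y)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in X_sub]
    return max(abs(f - t) for f, t in zip(fitted, y))


max_residual = fit_noise()
print("largest residual when fitting noise:", max_residual)
```

The residual is at the level of floating-point error even though the variables carry no information about the response. A perfect fit on such data says nothing about predictive power.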

See: https://www.statisticshowto.com/dimensionality/#:~:text=High%20Dimensional%20means%20that%20the,tens%20of%20hundreds%20of%20samples.

From the previous post: «Each added variable results in an exponential decrease in predictive power.»

Book: https://www.springer.com/gp/book/9783642201912

When is high-dimensionality a problem in phylogenetics?

In phylogenetics, the sample size is usually the number of tips on the phylogeny. Should we also be using the number of nodes?
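As a small aid for thinking about that question, the sketch below counts tips and internal nodes directly from a Newick string. The function name and the example tree are hypothetical, and the counting shortcut assumes plain Newick with no quoted labels or comments that contain parentheses or commas.

```python
def count_tips_and_nodes(newick):
    """Return (number of tips, number of internal nodes) for a Newick tree.

    Assumes plain Newick: no quoted labels or comments containing '(' or ','.
    """
    # Every opening parenthesis starts one internal node (root included).
    n_internal = newick.count("(")
    # Each comma separates two sibling subtrees, so tips = commas + 1.
    n_tips = newick.count(",") + 1
    return n_tips, n_internal


# Hypothetical four-tip, fully bifurcating rooted tree.
tips, internal = count_tips_and_nodes("((A,B),(C,D));")
print("tips:", tips, "internal nodes:", internal)
```

For a fully bifurcating rooted tree with n tips there are n - 1 internal nodes, so counting nodes as well as tips would roughly double the nominal sample size. The sketch only does the counting; it does not settle whether nodes should be treated as samples.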

Luna L. Sánchez Reyes
Postdoctoral Research Scholar
University of California, Merced
