High-dimensional data in phylogenetics
What is high-dimensional data?
High-dimensional data sets contain a large number of variables, often more variables than there are samples.
For example, the number of genes affecting a phenotype may be higher than the number of people the genetic data were drawn from.
This is a statistical problem: with more variables than observations, there is not enough data to estimate the effect of every variable reliably.
From the previous post: «Each added variable results in an exponential decrease in predictive power.»
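To make the problem concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not an analysis of real data: the predictors and the response are simulated random noise, and the function names (`solve`, `training_r2`) are made up for this example. It fits ordinary least squares through the normal equations and reports the in-sample R². When the number of fitted parameters reaches the sample size, the model reproduces pure noise almost perfectly, even though there is nothing to predict. With even more variables than samples, the least-squares solution is no longer unique, a case this sketch does not cover.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def training_r2(n_samples, n_variables, seed=1):
    """In-sample R^2 of least squares when the response is pure noise."""
    rng = random.Random(seed)
    # Design matrix: an intercept column plus random, uninformative predictors.
    X = [[1.0] + [rng.gauss(0, 1) for _ in range(n_variables)]
         for _ in range(n_samples)]
    y = [rng.gauss(0, 1) for _ in range(n_samples)]  # unrelated to X
    k = n_variables + 1  # fitted parameters; this sketch needs k <= n_samples
    # Normal equations: (X'X) b = X'y
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    b = solve(XtX, Xty)
    fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    mean_y = sum(y) / n_samples
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Ten samples in both cases; only the number of variables changes.
r2_few = training_r2(n_samples=10, n_variables=2)
r2_saturated = training_r2(n_samples=10, n_variables=9)
print(r2_few, r2_saturated)
```

The second value is close to 1 only because the model has as many free parameters as data points. A fit like that says nothing about how the model would predict new samples.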
When is high-dimensionality a problem in phylogenetics?
In phylogenetics, the sample size is usually taken to be the number of tips on the phylogeny. Should we also be counting the nodes?
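To see how much the answer could matter, the sketch below counts tips and internal nodes on a small tree. The tree is hypothetical and written as nested tuples purely for illustration; the function name is likewise made up for this example. A rooted, fully bifurcating tree with n tips has n - 1 internal nodes, so counting nodes as well would nearly double the nominal sample size. Whether internal nodes should count as samples at all is exactly the open question above.

```python
def count_tips_and_internal_nodes(tree):
    """Return (tips, internal_nodes) for a tree written as nested tuples."""
    if not isinstance(tree, tuple):
        return 1, 0  # a leaf label is one tip
    tips, internal = 0, 1  # this tuple is itself one internal node
    for child in tree:
        t, i = count_tips_and_internal_nodes(child)
        tips += t
        internal += i
    return tips, internal

# Hypothetical rooted, bifurcating tree with five tips.
example_tree = ((("A", "B"), "C"), ("D", "E"))
tips, internal = count_tips_and_internal_nodes(example_tree)
print(tips, internal)
```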