# High-dimensional data in phylogenetics

What is high-dimensional data?

High-dimensionality describes data sets that contain a large number of variables, often so many that they exceed the sample size (the number of observations).

For example, a genetic data set is high-dimensional when the number of genes that may affect a phenotype is higher than the number of people the genetic data were drawn from.

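This is a statistical problem: with more variables than observations, a model can fit the sample perfectly, even when the response is pure noise, and the effect of each variable cannot be estimated uniquely.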
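
A minimal sketch of the problem, assuming an ordinary linear regression setting (the post does not name a specific model). The response `y` is random noise unrelated to any variable, yet with 50 variables and 10 samples a least-squares fit reproduces it exactly, while with 3 variables it cannot. The helpers (`solve`, `fit`, `max_residual`) are hypothetical names written in plain Python so the example stays self-contained.

```python
import random

def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - s) / m[r][r]
    return sol

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit(x, y):
    """Least-squares coefficients for rows x (n samples, p variables) and responses y.

    p <= n: ordinary least squares via the normal equations (X^T X) beta = X^T y.
    p > n: minimum-norm solution beta = X^T (X X^T)^{-1} y, which reproduces y exactly.
    """
    n, p = len(x), len(x[0])
    cols = [[row[k] for row in x] for k in range(p)]
    if p <= n:
        xtx = [[dot(cols[i], cols[j]) for j in range(p)] for i in range(p)]
        return solve(xtx, [dot(c, y) for c in cols])
    gram = [[dot(x[i], x[j]) for j in range(n)] for i in range(n)]
    alpha = solve(gram, y)
    return [dot(c, alpha) for c in cols]

def max_residual(x, y, beta):
    """Largest absolute difference between observed and fitted responses."""
    return max(abs(yi - dot(row, beta)) for row, yi in zip(x, y))

rng = random.Random(1)
n_samples = 10
y = [rng.gauss(0, 1) for _ in range(n_samples)]  # pure noise, unrelated to any variable
for n_variables in (3, 50):
    x = [[rng.gauss(0, 1) for _ in range(n_variables)] for _ in range(n_samples)]
    beta = fit(x, y)
    print(n_variables, "variables, max residual:", max_residual(x, y, beta))
```

With 3 variables the residual should be clearly non-zero; with 50 it should be on the order of floating-point rounding error. A perfect fit to noise says nothing about how the variables relate to the outcome, which is the overfitting problem in a nutshell.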

From the previous post: «Each added variable results in an exponential decrease in predictive power.»
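
The quote does not spell out where the exponential comes from. One standard way to see an exponential effect of this kind (often called the curse of dimensionality) is to cut every variable into a fixed number of bins: the number of cells grows as `bins ** d`, so a fixed sample spreads exponentially thinner and most observations end up with no close neighbour to learn from. The sketch below assumes 1000 uniformly distributed samples and 10 bins per variable, both arbitrary choices for illustration; the function names are hypothetical.

```python
import random
from collections import Counter

def expected_per_cell(n_samples, bins, n_variables):
    """Average number of samples per cell when each of n_variables axes is cut into `bins` equal bins."""
    return n_samples / bins ** n_variables

def fraction_with_neighbour(n_samples, bins, n_variables, seed=0):
    """Simulated share of uniform random samples that share their cell with at least one other sample."""
    rng = random.Random(seed)
    cells = [tuple(int(rng.random() * bins) for _ in range(n_variables)) for _ in range(n_samples)]
    counts = Counter(cells)
    return sum(1 for c in cells if counts[c] > 1) / n_samples

for d in range(1, 7):
    print(d, expected_per_cell(1000, 10, d), fraction_with_neighbour(1000, 10, d))
```

The expected count per cell drops tenfold with every added variable (100.0, 10.0, 1.0, 0.1, ...), and the simulated share of samples with a neighbour collapses accordingly.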

When is high-dimensionality a problem in phylogenetics?

In phylogenetics, the sample size is usually taken to be the number of tips on the phylogeny. Should we also be counting the number of nodes?
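
To make the tips-versus-nodes question concrete: a rooted, fully bifurcating tree with n tips has n − 1 internal nodes and 2n − 2 branches, so counting all nodes roughly doubles the "sample size". By the same arithmetic, a hypothetical model with one parameter per branch would already have more parameters than tips once n > 2. The sketch below counts tips and internal nodes in a Newick string. It assumes a single tree in plain Newick with no quoted labels, comments or commas inside names, and the function name is hypothetical. It relies on two facts: each `(` opens exactly one internal node, and the commas always number one fewer than the tips.

```python
def count_tips_and_nodes(newick):
    """Count tips and internal nodes in a simple Newick string (single tree, no quoted labels or comments)."""
    internal = newick.count("(")  # every '(' opens one internal node
    tips = newick.count(",") + 1  # commas = (children - 1) summed over internal nodes = tips - 1
    return tips, internal

tree = "((A:0.1,B:0.2):0.05,(C:0.3,D:0.4):0.1);"
tips, internal = count_tips_and_nodes(tree)
print(f"tips={tips}, internal nodes={internal}, all nodes={tips + internal}, branches={tips + internal - 1}")
# prints: tips=4, internal nodes=3, all nodes=7, branches=6
```

For a rooted binary tree, `internal == tips - 1` should hold; for an unrooted tree written with a basal trifurcation, such as `(A,B,(C,D));`, it is `tips - 2`.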