AI MACHINE LEARNING: CLASSIFICATION WITH PHYLOGENETIC TREES & HIERARCHICAL CLUSTERING (UNSUPERVISED)

Related AI/ML Methods: Segmentation Clustering, K-Nearest Neighbor
Related Traditional Methods: Segmentation Clustering

In this method, the algorithm builds a phylogenetic tree to classify data by applying hierarchical clustering. Phylogenetic trees are typically used in biomedical and genetic research, for example to compare DNA sequences. The method is unsupervised: the algorithm determines how to cluster an unordered dataset without being given any training data with known correct responses. The result is a hierarchy of fully nested sets, where the smallest sets are the individual elements and the largest set is the entire dataset.
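The nested-set structure can be illustrated with a minimal sketch of bottom-up (agglomerative) clustering in Python. The text does not say which linkage rule the method uses, so the average-linkage rule here is an assumption, as are the function names and the toy one-dimensional data.

```python
from itertools import combinations

def avg_dist(points, a, b):
    """Average pairwise distance between two clusters of point indices."""
    return sum(abs(points[i] - points[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerate(points):
    """Bottom-up clustering with average linkage (an assumed linkage rule).

    Returns the merges in order, each as (left_members, right_members, distance).
    """
    # Start with every element in its own cluster: the smallest nested sets.
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Pick the two clusters that are closest on average and merge them.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: avg_dist(points, *pair))
        merges.append((a, b, avg_dist(points, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    # The final merge produces the largest set: the entire dataset.
    return merges

# Toy data invented for illustration only.
points = [1.0, 1.5, 5.0, 5.25, 10.0]
merges = agglomerate(points)
for left, right, d in merges:
    print(left, "+", right, "merged at distance", d)
```

Each merge joins two existing sets into a larger one, so every set produced is fully contained in a later set, and the last merge contains all elements.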

To build a phylogenetic tree using hierarchical clustering, the dataset is typically a set of sequences or a distance matrix. Simply enter the variables you need to classify and the number of clusters desired. In this example, the input variable is a sample genetic DNA sequence, and the results show a hierarchical 5-level phylogenetic tree (Figure 9.61). In the tree, vertical lines indicate branching events, and there are 5 vertical lines along the longest path, counting from the first branch. The required model inputs look like the following:

Figure 9.61: AI/ML Phylogenetic Tree
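
As a rough illustration of the workflow described in this section (sequences in, a desired number of clusters chosen, groups out), the following Python sketch converts aligned DNA sequences into a pairwise distance matrix and merges the closest groups until the requested number remains. It is not the procedure used to produce Figure 9.61. The distance measure (proportion of differing positions), the single-linkage rule, the function names, and the sequences themselves are all assumptions made for illustration.

```python
from itertools import combinations

def p_distance(s, t):
    """Proportion of aligned positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t)) / len(s)

def cluster_sequences(seqs, k):
    """Merge the closest clusters (single linkage, assumed) until only k remain."""
    names = list(seqs)
    # Pairwise distance matrix, stored as a dict keyed by unordered name pairs.
    dist = {frozenset((a, b)): p_distance(seqs[a], seqs[b])
            for a, b in combinations(names, 2)}

    def linkage(pair):
        # Single linkage: distance between the closest members of two clusters.
        x, y = pair
        return min(dist[frozenset((a, b))] for a in x for b in y)

    clusters = [frozenset([n]) for n in names]
    while len(clusters) > k:
        x, y = min(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c not in (x, y)] + [x | y]
    return clusters

# Hypothetical toy sequences, invented for illustration only.
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGTTCGTAA",
    "D": "TTGCAAGGCC",
    "E": "TTGCAAGGCA",
}
groups = cluster_sequences(seqs, k=2)
print(sorted(sorted(g) for g in groups))
```

Stopping the merging at k clusters corresponds to cutting the tree at the level where k branches remain. Letting the loop run down to a single cluster would give the full tree.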