UI/UX Designer and Prototypist
Jellyfish S18 rDNA: Phylogenetic Analysis

A case study exploring bioinformatics, created as my Information Design final project in December 2016.

Introduction

The S18 rDNA gene (conventionally written 18S rDNA) encodes the RNA component of the small ribosomal subunit.

This component is essential for cellular function.  Without it, ribosomes could not function normally and polypeptides could not be produced.  The gene is therefore expected in all eukaryotic species, and it should be highly conserved.  Jellyfish are an early branch of life and a relatively undeveloped example of complex life.  They are extremely diverse, and their simplicity allows for high degrees of divergence; they are genetically robust.  Their lack of specialized tissues and their radial body plans yield a high degree of genetic flexibility.  Jellyfish S18, by contrast, should remain highly conserved over time, which gives a higher-fidelity phylogenetic analysis.  My goal is to analyze the phylogenetic relationships and genetic similarities among jellyfish in order to organize and understand how they have changed and diversified over time.

The following data study contains graphical, hierarchical data visualizations of jellyfish species based on the genetic data of their S18 rDNA gene.  

Species are given hexagonal shapes allowing for easy readability and automatic scaling through geometric alignment.  Genetic difference is expressed through color change.  

Current informational user interfaces do not show grouping or clustering.  As a result, large-scale relationships are poorly represented, while discrete pieces become too insignificant to compare.  These visualizations prioritize easy readability with strong multi-dimensional group clustering.

Readability, Scaling & Compression

Each species name is parsed from a FASTA file to allow for direct identification of species within the tree structure.  The hexagons can be joined flexibly and fall into natural groupings.
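
The name parsing described above can be sketched roughly as follows.  This is a minimal illustration, not the script used for the project: the function names `read_fasta` and `species_name` are hypothetical, and the header layout (an accession followed by genus and species) is an assumption that may not match every sourced file.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file.

    `lines` can be an open file handle or any iterable of strings.
    """
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def species_name(header: str) -> str:
    """Return 'Genus species' from a header.

    Assumed layout: '<accession> <Genus> <species> <description...>'.
    Headers that do not fit this layout are returned unchanged.
    """
    parts = header.split()
    return " ".join(parts[1:3]) if len(parts) >= 3 else header
```

With a header such as `>ACC0001 Aurelia aurita 18S ribosomal RNA gene` (a made-up accession), `species_name` would return `Aurelia aurita`, which is short enough to fit inside a hexagon.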

Shorter names and flexible positioning allow the top-down tree to be made more compact while remaining comprehensible.  The tree above displays both the phylogenetic data, represented in the structure, and the percent genetic difference between each species and its closest ancestor, represented in color.  The history of a species can be tracked visually through this color difference.  The stacked-cladogram style of positioning also allows the length of the gene and its nucleotide breakdown to be displayed.

Percent Identity

This alternative, hierarchical view allows the user to focus on the genetic timeline as it pertains to the percent identity change over time. 

The topmost species is the oldest, or the most closely related to the oldest common ancestor.  Percent identity is shown as a change in hue; the smaller the change, the more genetic material the two species share.

A custom Python script was implemented to compare the content of two genes.  It reads the aligned genes of multiple species and compares each species to the previous one, computing the percentage of positions that hold identical nucleotides.
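
A comparison of this kind might be sketched as below.  This is an illustration under stated assumptions rather than the original script: the names `percent_identity` and `chain_identity` are hypothetical, the sequences are assumed to be already aligned to equal length with `-` marking gaps, and the gap rule (skip columns where both sequences have a gap, count a gap against a nucleotide as a mismatch) is one possible choice among several.

```python
from typing import List, Tuple


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions holding the same character."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # assumption: ignore columns that are gaps in both
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def chain_identity(
    aligned: List[Tuple[str, str]]
) -> List[Tuple[str, str, float]]:
    """Compare each (name, sequence) record to the record before it.

    Returns (name, previous_name, percent_identity) for every record
    except the first, which has no predecessor.
    """
    return [
        (name, prev_name, percent_identity(prev_seq, seq))
        for (prev_name, prev_seq), (name, seq) in zip(aligned, aligned[1:])
    ]
```

The order of the input list matters here: each value describes a species relative to the one listed immediately before it, so the list would need to follow the order used in the hierarchical view.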

Size Similarity

The chart above is a sequence length similarity graph that uses dynamically sized hexagons to illustrate how the number of nucleotides in the S18 gene fluctuates from species to species.

The colors correspond to both the Percent Identity chart and the Phylogenetic Tree views of this data set.  A similar custom Python script was implemented to calculate the number of nucleotides, or length, of the S18 gene in each species.

Nucleotide Percentage Area Line Graph & Isotype Nucleotides

In an alternative view, the area line graph shows how nucleotide composition changes across species, ordered by the projected age of each species.

From this perspective, the user can see how the nucleotide makeup of each species shifts and in which direction mutations are taking place.

The isolated visualization of each species can be used to illustrate the amount of each nucleotide present.  Each color grouping of hexagons represents a scaled percentage of a given nucleotide.  This simplified, Isotype-esque viewpoint can give the user an idea of the distribution of nucleotides at a quick glance.
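
One way to turn percentages into a fixed number of hexagons is sketched below.  The scaling method is an assumption, since it is not described here: this version uses largest-remainder rounding so that the hexagon counts always add up to the chosen total.  The function name `hexagon_counts` and the default of 20 hexagons per species are hypothetical.

```python
from typing import Dict


def hexagon_counts(
    percentages: Dict[str, float], total_hexagons: int = 20
) -> Dict[str, int]:
    """Split `total_hexagons` among nucleotides in proportion to their share.

    Largest-remainder rounding: give every nucleotide the floor of its
    exact share, then hand the leftover hexagons to the nucleotides with
    the largest fractional parts.
    """
    total = sum(percentages.values())
    if total <= 0:
        raise ValueError("percentages must sum to a positive value")
    exact = {
        base: value / total * total_hexagons
        for base, value in percentages.items()
    }
    counts = {base: int(share) for base, share in exact.items()}
    leftover = total_hexagons - sum(counts.values())
    by_remainder = sorted(
        exact, key=lambda base: exact[base] - counts[base], reverse=True
    )
    for base in by_remainder[:leftover]:
        counts[base] += 1
    return counts
```

Rounding each percentage independently can leave a species with one hexagon too many or too few; distributing the leftovers by remainder keeps every species at the same total, so the isolated views stay visually comparable.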

Nucleotide Percentages Bar Graph

This graph illustrates the change in nucleotide composition across species by projected age, highlighting both the relative sequence length and the percentage breakdown of each nucleotide.

A custom Python script was written to read each sourced FASTA file and output a text file containing each species' gene length and nucleotide percentages.
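
The length and percentage calculation could look something like the sketch below.  It is not the original script: the names `composition` and `format_report` and the tab-separated layout are hypothetical, and it assumes that only A, C, G and T count toward the length, so gaps and ambiguity codes such as N are ignored.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def composition(sequence: str) -> Dict[str, float]:
    """Return the gene length and the percentage of each nucleotide.

    Assumption: characters other than A, C, G, T are not counted.
    """
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    length = sum(counts.values())
    result: Dict[str, float] = {"length": length}
    for base in "ACGT":
        result[base] = 100.0 * counts[base] / length if length else 0.0
    return result


def format_report(records: Iterable[Tuple[str, str]]) -> str:
    """Build a tab-separated report from (species name, sequence) pairs."""
    lines = ["species\tlength\tA%\tC%\tG%\tT%"]
    for name, sequence in records:
        c = composition(sequence)
        lines.append(
            f"{name}\t{c['length']}\t{c['A']:.1f}\t{c['C']:.1f}"
            f"\t{c['G']:.1f}\t{c['T']:.1f}"
        )
    return "\n".join(lines) + "\n"
```

The returned string can then be written to a text file, for example with `pathlib.Path("report.txt").write_text(...)`, where the file name is only a placeholder.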

In Action

The Stauromedusae family tree demonstrates the two types of grouping conclusions one can draw: diversity grouping and taxonomy grouping. 

The color indicates two local groupings, each with low internal diversity, separated by a large amount of genetic change as well as significant changes in gene size.

The tree diverges into two distinct branches, one much longer than the other.  These two groups are significantly different, a quality that is difficult to analyze with current software.