Earle, Denise (2010) Dendrogram seriation in data visualisation: algorithms and applications. PhD thesis, National University of Ireland Maynooth.
PDF
Denise_Earle_PhD_Thesis_2010.pdf
Download (85MB)
Abstract
Seriation is a data analytic tool for obtaining a permutation of a set of objects
with the goal of revealing structural information within the set of objects. The
purpose of this thesis is to investigate and develop tools for seriation with the
goal of using these tools to enhance data visualisation.
The particular focus of this thesis is on dendrogram seriation algorithms.
A dendrogram is a tree-like structure used for visualising the results of a hierarchical
clustering, and the order of the leaves in a dendrogram provides a
permutation of a set of objects. Dendrogram seriation algorithms rearrange
the leaves of a dendrogram in order to find a permutation that optimises a
given criterion.
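
As a rough illustration of the idea (not the algorithm proposed in the thesis), the sketch below enumerates every leaf order that can be reached by flipping the two branches at internal nodes of a small binary dendrogram and keeps the order with the smallest path length. The nested-tuple tree encoding, the function names `leaf_orders`, `path_length` and `best_order`, and the toy data are all assumptions made for this example; the exhaustive search is only feasible for very small trees, since a dendrogram with n leaves admits 2^(n-1) such orders.

```python
from itertools import product

def leaf_orders(tree):
    """Yield every leaf order obtainable by flipping internal nodes.

    Hypothetical encoding: a leaf is an int, an internal node is a
    (left, right) tuple.
    """
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for lo, ro in product(list(leaf_orders(left)), list(leaf_orders(right))):
        yield lo + ro  # branches in their original orientation
        yield ro + lo  # branches flipped at this node

def path_length(order, dist):
    """Sum of dissimilarities between neighbouring objects in the order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))

def best_order(tree, dist):
    """Brute-force search over all flips; ties go to the first order found."""
    return min(leaf_orders(tree), key=lambda order: path_length(order, dist))

# Toy data (assumed): four objects placed on a line.
positions = [0.0, 1.0, 3.0, 4.5]
dist = [[abs(p - q) for q in positions] for p in positions]

# A dendrogram whose initial leaf order is 1, 0, 2, 3.
tree = ((1, 0), (2, 3))
best = best_order(tree, dist)
print(best, path_length(best, dist))
```

Practical dendrogram seriation algorithms avoid this exhaustive enumeration; the sketch is only meant to make the search space and the role of the criterion concrete.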
Dendrogram seriation algorithms are widely used; however, the research in
this area is often confusing because of inconsistent or inadequate terminology.
This thesis proposes new notation and terminology with the goal of better
understanding and comparing dendrogram seriation algorithms.
Seriation criteria measure the goodness of a permutation of a set of objects.
Popular seriation criteria include the path length of a permutation and measures of
anti-Robinson form in a symmetric matrix. This thesis proposes two
new seriation criteria, lazy path length and banded anti-Robinson form,
and demonstrates their effectiveness in improving a variety of visualisations.
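
To make the two established criteria concrete, here is a minimal sketch assuming their common textbook formulations: path length as the sum of dissimilarities between adjacent objects, and anti-Robinson form as the requirement that entries of the permuted dissimilarity matrix never decrease when moving away from the diagonal, scored here by counting violations. The function names and toy data are assumptions for illustration, and the sketch does not attempt the lazy path length or banded anti-Robinson criteria, whose definitions are given in the thesis itself.

```python
def permute(dist, order):
    """Reorder rows and columns of a symmetric matrix by the given order."""
    return [[dist[a][b] for b in order] for a in order]

def path_length(dist, order):
    """Sum of dissimilarities between neighbouring objects in the order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))

def anti_robinson_violations(matrix):
    """Count triples i < j < k that break the anti-Robinson conditions.

    Anti-Robinson form requires matrix[i][j] <= matrix[i][k] and
    matrix[j][k] <= matrix[i][k] for all i < j < k.
    """
    n = len(matrix)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if matrix[i][j] > matrix[i][k]:
                    count += 1
                if matrix[j][k] > matrix[i][k]:
                    count += 1
    return count

# Toy data (assumed): four objects placed on a line.
positions = [0.0, 1.0, 3.0, 4.5]
dist = [[abs(p - q) for q in positions] for p in positions]

for order in ([0, 1, 2, 3], [2, 0, 3, 1]):
    score_path = path_length(dist, order)
    score_ar = anti_robinson_violations(permute(dist, order))
    print(order, score_path, score_ar)
```

Both criteria are lower-is-better in this form, so a seriation algorithm can use either as the objective when comparing candidate permutations.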
The main contribution of this thesis is a new dendrogram seriation algorithm.
This algorithm improves on other dendrogram seriation algorithms and
is also flexible because it allows the user either to choose from a variety of seriation
criteria, including the new criteria mentioned above, or to input their
own criteria.
Finally, this thesis performs a comparison of several seriation algorithms,
the results of which show that the proposed algorithm performs competitively
against other algorithms. This leads to a set of general guidelines for choosing
the most appropriate seriation algorithm for different seriation interests and
visualisation settings.
Item Type: | Thesis (PhD) |
---|---|
Keywords: | Dendrogram seriation; data visualisation; algorithms and applications |
Academic Unit: | Faculty of Science and Engineering > Mathematics and Statistics |
Item ID: | 2442 |
Depositing User: | IR eTheses |
Date Deposited: | 14 Feb 2011 15:11 |
URI: | https://mu.eprints-hosting.org/id/eprint/2442 |
Use Licence: | This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA). |