Abstract
The talk will start with an introduction to data visualisation for
high-dimensional data and the use of parallel coordinates. This motivates the
use of information-theoretic ideas to understand multivariate data, and a new
histogramming algorithm based on the Shannon entropy of the data, which
quantifies the meaning of the expressions "underbinned" and "overbinned". A
data-driven approach is then used to answer key questions: how are the
variables related; how many dimensions are needed to describe a dataset;
which variables are key to a particular class of events, and why is this
relevant to data mining algorithms; and is there a fundamental limit to how
well two classes of events can be separated?
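
As a rough illustration of how Shannon entropy relates to binning, the sketch
below computes the entropy of the bin occupancies of an equal-width histogram
for several bin counts. It is not the algorithm presented in the talk: the
function name histogram_entropy, the equal-width binning and the Gaussian
test sample are all assumptions made for the example. It only shows the two
extremes, where a single bin carries no information (entropy 0) and a very
large number of bins pushes the entropy towards its ceiling of log2(N) for N
data points.

```python
import math
import random


def histogram_entropy(values, n_bins):
    """Shannon entropy (in bits) of the bin occupancies of an
    equal-width histogram. Hypothetical helper for illustration only."""
    lo, hi = min(values), max(values)
    if hi == lo or n_bins == 1:
        # Everything falls in one bin, so the occupancy is certain.
        return 0.0
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Assumed test data: 1000 draws from a standard normal distribution.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]

for n_bins in (1, 4, 16, 64, 256, 4096):
    print(n_bins, round(histogram_entropy(sample, n_bins), 3))
```

With too few bins the entropy is small because the histogram hides the shape
of the distribution; with too many bins it saturates near log2(N) because
almost every point sits alone in its own bin. A criterion for "underbinned"
and "overbinned" would have to sit between these limits.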