Master's project of A.R. Offringa

Groningen, 2008

The project results are summarised below.

Two related subjects are discussed: the classification of astronomical data sets using several methods, and a new technique for visualising a data set.

The construction of the new visualisation technique, which visualises the information produced by the Learning Vector Quantisation method, is discussed and several examples are given. The technique offers a new way of presenting the information in a data set in terms of prototypes, and is effective for finding outliers and relations in a data set. Improvements are proposed, and further research is required before conclusions about its usefulness can be drawn.
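The notion of prototypes that the visualisation builds on can be made concrete with the standard LVQ1 training rule. The sketch below is a minimal pure-Python illustration, assuming squared Euclidean distance, one labelled prototype per class and a linearly decaying learning rate; the function names are hypothetical, and this is plain LVQ1, not the modified LVQ variant used in the project.

```python
import random


def nearest(prototypes, x):
    """Return the index of the prototype closest to x (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(prototypes[i][0], x)),
    )


def train_lvq1(data, prototypes, learning_rate=0.1, epochs=20, seed=0):
    """Train prototypes with the standard LVQ1 rule.

    data:       list of (feature_vector, label) pairs
    prototypes: list of (initial_vector, label) pairs; copied, not modified
    The winning prototype moves towards a sample of its own class and away
    from a sample of another class; the learning rate decays linearly.
    """
    rng = random.Random(seed)
    protos = [(list(w), label) for w, label in prototypes]
    samples = list(data)
    for epoch in range(epochs):
        rng.shuffle(samples)
        alpha = learning_rate * (1.0 - epoch / epochs)
        for x, label in samples:
            w, proto_label = protos[nearest(protos, x)]
            sign = 1.0 if proto_label == label else -1.0
            for k in range(len(w)):
                w[k] += sign * alpha * (x[k] - w[k])
    return protos


def classify(protos, x):
    """Assign x the label of its nearest prototype."""
    return protos[nearest(protos, x)][1]


# Toy two-class example with one prototype per class.
data = [([0.0, 0.0], "A"), ([0.2, 0.1], "A"), ([1.0, 1.0], "B"), ([0.9, 1.1], "B")]
protos = train_lvq1(data, [([0.3, 0.3], "A"), ([0.7, 0.7], "B")])
print(classify(protos, [0.1, 0.0]), classify(protos, [1.0, 0.9]))  # prints: A B
```

After training, each prototype is still a labelled point in feature space, which is what makes prototype-based output suitable for visual inspection.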

C4.5 decision trees, Learning Vector Quantisation (LVQ) and Self-Organising Maps (SOM) are used as classification methods, and their performance and properties are compared. The classification and visualisation methods are used to gain better insight into two astronomical data sets: the ESO-LV data set, consisting of over 15,000 galaxies labelled by morphological type, and an unpublished data set of 4,500 galaxies from 2dF-WFI observations, of which the morphological types were determined a priori for 100 galaxies that were easy to classify by hand. The performance of LVQ is significantly improved by optimising its parameters with simulated annealing. Issues caused by unbalanced class priors in the data set are solved by modifying the LVQ method. When optimised, LVQ and C4.5 achieve nearly equal performance. The relevance of the features in the data set is examined and compared between the methods, and the methods show some remarkable differences.
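Simulated annealing can tune LVQ parameters by treating a validation error as a cost to minimise. The sketch below is a generic simulated-annealing loop over a box-bounded real parameter vector; the Gaussian neighbourhood, the geometric cooling schedule, the parameter names and the toy objective are all assumptions. In practice the objective would train and evaluate LVQ, which is replaced here by a hypothetical quadratic stand-in.

```python
import math
import random


def simulated_annealing(objective, initial, step_sizes, bounds,
                        t_start=1.0, t_end=1e-3, iterations=2000, seed=0):
    """Minimise objective(params) over a box-bounded real parameter vector.

    A random neighbour is always accepted if it is not worse, and otherwise
    with probability exp(-delta / T); T decays geometrically from t_start
    to t_end. Returns the best parameters seen and their objective value.
    """
    rng = random.Random(seed)
    current = list(initial)
    current_cost = objective(current)
    best, best_cost = list(current), current_cost
    decay = (t_end / t_start) ** (1.0 / max(1, iterations - 1))
    temperature = t_start
    for _ in range(iterations):
        # Perturb each parameter with Gaussian noise, clipped to its bounds.
        candidate = [min(hi, max(lo, p + rng.gauss(0.0, s)))
                     for p, s, (lo, hi) in zip(current, step_sizes, bounds)]
        cost = objective(candidate)
        delta = cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = list(candidate), cost
        temperature *= decay
    return best, best_cost


# Hypothetical stand-in for "validation error of LVQ trained with
# (learning_rate, prototype_spread)"; its minimum lies at (0.05, 0.3).
def toy_error(params):
    lr, spread = params
    return (lr - 0.05) ** 2 + (spread - 0.3) ** 2


best, cost = simulated_annealing(toy_error, [0.5, 1.0], [0.02, 0.05],
                                 [(0.0, 1.0), (0.0, 2.0)],
                                 t_start=0.1, t_end=1e-5, iterations=3000)
print(best, cost)
```

The starting temperature should be on the scale of typical cost differences; with a real LVQ objective each evaluation involves a training run, so the iteration budget would usually be much smaller.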

Several new features are proposed for identifying calibration errors in radio interferometry, aimed at finding the calibration errors that cause artefacts in images produced by the Low Frequency Array (LOFAR) telescope. Features based on shapelet decomposition, and variants applied to the images in Fourier space, were tested but were found to be insufficiently discriminating.
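A shapelet decomposition expands an image in a set of orthonormal basis functions. The sketch below assumes the standard Cartesian Gauss-Hermite shapelet basis with a user-chosen scale `beta`, a uniform sampling grid, and truncation at n1 + n2 <= n_max; the function names are hypothetical, and which coefficients (or Fourier-space variants) were used as features in the project is not shown here.

```python
import math


def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via the three-term recurrence."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h


def shapelet_1d(n, x, beta):
    """1-D Cartesian shapelet B_n(x; beta), orthonormal over the real line."""
    norm = 1.0 / math.sqrt(beta * (2 ** n) * math.sqrt(math.pi) * math.factorial(n))
    u = x / beta
    return norm * hermite(n, u) * math.exp(-0.5 * u * u)


def decompose(image, xs, ys, beta, n_max):
    """Shapelet coefficients f[(n1, n2)] of image[j][i] sampled at (xs[i], ys[j]).

    The overlap integral with B_n1(x) * B_n2(y) is approximated by a Riemann
    sum, assuming a uniform grid that covers the object.
    """
    dx = xs[1] - xs[0]
    dy = ys[1] - ys[0]
    coeffs = {}
    for n1 in range(n_max + 1):
        bx = [shapelet_1d(n1, x, beta) for x in xs]
        for n2 in range(n_max + 1 - n1):
            by = [shapelet_1d(n2, y, beta) for y in ys]
            total = 0.0
            for j, y_val in enumerate(by):
                row = image[j]
                for i, x_val in enumerate(bx):
                    total += row[i] * x_val * y_val
            coeffs[(n1, n2)] = total * dx * dy
    return coeffs


# Test image: an isotropic Gaussian identical to the (0, 0) basis function.
step = 0.1
grid = [k * step for k in range(-80, 81)]
beta = 1.5
img = [[shapelet_1d(0, x, beta) * shapelet_1d(0, y, beta) for x in grid] for y in grid]
c = decompose(img, grid, grid, beta, n_max=4)
print(round(c[(0, 0)], 6))  # ≈ 1.0; all other coefficients ≈ 0
```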