John van Geest Cancer Research Centre
Bioinformatics
Post-genomic technologies generate large amounts of complex biological data. This data commonly contains large numbers of components (genes, proteins or single nucleotide polymorphisms) that are non-linear, interact with one another and carry a significant amount of noise. Robust analysis of this type of data, to identify biomarkers that predict the presence of disease, disease outcome or response to therapy, requires advanced data mining algorithms that can cope with these features. The bioinformatics group in the JvGCRC have developed, and continue to optimise, bioinformatics approaches based on Artificial Neural Networks (ANNs) for such analysis.
Artificial Neural Networks (ANNs) are a form of machine learning capable of accurately mining complex biological data containing large numbers of interacting, non-linear components (Kothari and Heekuck, 1993). The data mining produces a panel of biomarkers incorporated into a model that allows accurate classification of clinical groups within a population. A number of studies have indicated that the ANN approach can produce generalised models with greater accuracy than conventional statistical techniques in medical diagnostics (Tafeit and Reibnegger, 1999; Reckwitz et al., 2000).
Algorithms for Biomarker Discovery
The Bioinformatics group have developed the ANN-based approaches applied within this project over the last 12 years. These approaches are used to identify an optimised panel of markers from high-dimensional, complex post-genomic data. A method (the subject of a patent application) has been developed by the group that determines the smallest possible subset of biomarkers, from a high-dimensional data set, that explains a particular biomedical question. Keeping the panel small optimises the analysis so that the resulting model predicts well for new cases (unseen by the model).
The methods developed have been applied successfully to the identification and modelling of biomarkers addressing a wide range of biomedical questions. They have been used to analyse data from a wide range of sources, including proteomic, genomic, microRNA, immunohistochemical and cytokine data, and have been applied to problems such as the prediction of response to therapy, prediction of disease outcome and discrimination between disease and control across a number of biomedical systems. The application of ANNs to our research is given in the individual projects outlined in the review report.
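The general idea of searching for a small marker panel that still predicts well on unseen cases can be illustrated with a minimal sketch. This is not the group's patented method, whose details are not given here. It is a generic forward stepwise selection, assuming a single sigmoid neuron as a stand-in for a full ANN, synthetic data in which only markers 0 and 3 carry signal, and hypothetical names (`train_logistic_unit`, `forward_select`, `min_gain`) chosen for illustration.

```python
import math
import random

def train_logistic_unit(rows, labels, epochs=200, lr=0.5):
    """Fit one sigmoid neuron by batch gradient descent; returns (weights, bias)."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # prediction minus target
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class matches the label."""
    weights, bias = model
    hits = 0
    for x, y in zip(rows, labels):
        z = bias + sum(w * v for w, v in zip(weights, x))
        hits += int((z > 0) == (y == 1))
    return hits / len(rows)

def project(rows, columns):
    """Keep only the chosen marker columns."""
    return [[row[c] for c in columns] for row in rows]

def forward_select(train_x, train_y, val_x, val_y, min_gain=0.02):
    """Add one marker at a time; stop when held-out accuracy stops improving."""
    selected, best_acc = [], 0.0  # starting from 0.0, the first marker is always added
    remaining = list(range(len(train_x[0])))
    while remaining:
        scored = []
        for c in remaining:
            cols = selected + [c]
            model = train_logistic_unit(project(train_x, cols), train_y)
            scored.append((accuracy(model, project(val_x, cols), val_y), c))
        acc, c = max(scored, key=lambda s: s[0])
        if acc - best_acc < min_gain:
            break
        selected.append(c)
        remaining.remove(c)
        best_acc = acc
    return selected, best_acc

# Synthetic example (an assumption): six markers, class depends on markers 0 and 3 only.
rng = random.Random(0)
data = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(300)]
labels = [1 if row[0] + row[3] > 0 else 0 for row in data]
train_x, val_x = data[:200], data[200:]
train_y, val_y = labels[:200], labels[200:]

selected, val_acc = forward_select(train_x, train_y, val_x, val_y)
print("selected markers:", selected, "held-out accuracy:", val_acc)
```

Scoring each candidate panel on cases held out from training is what ties the panel size to performance on new cases: a marker is kept only if it improves prediction for data the model has not seen.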
Systems Biology Algorithms
A further approach developed by the group focuses on modelling pathways within post-genomic data. This approach, based on ANN technologies, allows a set of biomarkers to be studied to identify potential interactions and pathways within the context of cancer, and is complementary to other systems biology tools such as Ingenuity Pathways Analysis. The approach has been used in the group to identify gene interactions associated with progression in breast cancer (Lemetre et al., 2008) and interactions among cytokines associated with response to therapy in prostate cancer immunotherapy. The bioinformatic tools are important for the work outlined here that uses gene silencing approaches to study signalling pathways, and for the analysis of LC-MALDI sequence data.

