Raivo Kolde PhD defence

-  Methods for re-using public gene expression data

Clip produced by: Maria Gaiduk, 16.06.2014. Category: Computer Science


Public gene expression databases contain data on more than a million biological samples from hundreds of tissues and diseases. In principle, we know the expression pattern of every gene in these samples, so biological questions can be studied without performing new experiments. The size of the datasets, however, poses several challenges: appropriate analysis requires specific statistical skills, useful information is well hidden in the data, and the analysis itself is time-consuming. These obstacles prevent wider use of public gene expression data. The goal of this thesis is to facilitate the re-use of expression data by developing analysis methods and tools.

One of the biggest obstacles to re-using expression data is its accessibility. For that reason, we have created two web environments that allow users to run complex analysis pipelines on public gene expression data. The first visualises embryonic stem cell data from the FunGenES consortium. The second allows users to search for genes with similar behaviour across hundreds of public datasets.

When analyses span multiple datasets, their results eventually need to be integrated. For this task, we created a rank aggregation algorithm designed specifically for lists of genes. When studying multiple datasets, it is also important to have a good overview of their contents. To allow rapid functional characterisation of datasets, we have created a visualisation method that produces compact but informative visual summaries of the data. The methods and tools described here have been created with practical considerations in mind and have already been used in various studies.
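The abstract does not describe how the rank aggregation algorithm for gene lists works. As a rough illustration only, the Python sketch below shows one order-statistics approach to the problem: each gene's normalized ranks across lists are sorted, and for each position k the sketch computes the probability that the k-th best of n independent uniform random ranks would be at least as good as the observed one; the gene's score is the minimum of these probabilities. The function names, the normalization by the size of the shared gene universe, and the rule that a gene missing from a list receives the worst rank (1.0) are assumptions made for this example, not details taken from the thesis.

```python
import math
from typing import Dict, List, Tuple


def order_stat_cdf(k: int, n: int, x: float) -> float:
    """P(U_(k) <= x): probability that the k-th smallest of n independent
    Uniform(0, 1) values is at most x, i.e. P(Binomial(n, x) >= k)."""
    return sum(math.comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(k, n + 1))


def rho_score(normalized_ranks: List[float]) -> float:
    """Minimum order-statistic probability over all k. A low value means the
    gene is ranked better than chance would predict in several lists."""
    r = sorted(normalized_ranks)
    n = len(r)
    return min(order_stat_cdf(k, n, r[k - 1]) for k in range(1, n + 1))


def aggregate_ranks(ranked_lists: List[List[str]]) -> List[Tuple[str, float]]:
    """Merge several ranked gene lists into one list ordered by rho score.

    Illustrative assumptions: all lists rank genes from one shared universe,
    ranks are divided by the universe size, and a gene absent from a list
    gets the worst normalized rank, 1.0.
    """
    universe = sorted({g for lst in ranked_lists for g in lst})
    size = len(universe)
    # One lookup table per list: gene -> 1-based rank in that list.
    positions = [{g: i + 1 for i, g in enumerate(lst)} for lst in ranked_lists]
    scores: Dict[str, float] = {}
    for gene in universe:
        ranks = [pos[gene] / size if gene in pos else 1.0 for pos in positions]
        scores[gene] = rho_score(ranks)
    # Smaller score = stronger support across lists; ties broken by gene name.
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical toy input: three rankings of the same four genes.
    lists = [["A", "B", "C", "D"], ["A", "C", "B", "D"], ["B", "A", "D", "C"]]
    for gene, rho in aggregate_ranks(lists):
        print(gene, rho)
```

Using the minimum over all k means a gene can score well either by being near the top of a few lists or moderately high in many, without fixing in advance how many lists must support it. The score is not a corrected p-value; a real analysis would still need a multiple-testing adjustment.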