Research projects

I have focused on these projects over the last five years of my academic research.

Inferring pathway activities from gene expression data

High-throughput transcriptomic measurements (microarray or RNA-seq) have become one of the key data sources in systems biology. To reduce noise and aid functional interpretation, analysing gene sets (e.g. pathway members or genes with similar function) is a standard approach in this field. However, several recent analyses have shown that “data-driven” gene sets (e.g. genes whose expression is regulated by the same pathway) outperform conventional “knowledge-based” gene sets (e.g. pathway member genes) in phenotype association. By creating data-driven gene sets from large-scale perturbation gene expression studies (such as LINCS-L1000), we try to infer pathway activities in cancer models and patient samples, and associate them with phenotypes (drug sensitivity, survival, etc.) to identify potential biomarkers and gain mechanistic insight. We concentrate especially on pathways associated with G protein-coupled receptor (GPCR) signalling.
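
As a rough illustration of the idea, a data-driven gene set can be treated as a signature of per-gene weights (for example, derived from perturbation experiments), and a pathway activity score can be computed as the weighted sum of standardised expression values. The sketch below is a minimal toy version under these assumptions; the gene names, weights, and the function names `zscore_genes` and `pathway_activity` are hypothetical and not taken from any specific tool or from our actual pipeline.

```python
from statistics import mean, pstdev


def zscore_genes(expression):
    """Z-score each gene across samples.

    expression: dict mapping gene name -> list of expression values,
    one value per sample (all lists have the same length).
    """
    scaled = {}
    for gene, values in expression.items():
        mu = mean(values)
        sd = pstdev(values)
        # Genes with no variation carry no information; set them to zero.
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def pathway_activity(expression, signature):
    """Weighted sum of z-scored expression over the signature genes.

    signature: dict mapping gene name -> weight (hypothetical values here;
    in practice such weights would be learned from perturbation data).
    Signature genes that were not measured are skipped.
    """
    scaled = zscore_genes(expression)
    n_samples = len(next(iter(expression.values())))
    scores = [0.0] * n_samples
    for gene, weight in signature.items():
        if gene not in scaled:
            continue
        for i, z in enumerate(scaled[gene]):
            scores[i] += weight * z
    return scores


# Toy data: three samples, three measured genes (all names are made up).
expression = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [3.0, 2.0, 1.0],
    "GENE_C": [5.0, 5.0, 5.0],
}
# GENE_A is assumed to be induced and GENE_B repressed by the pathway;
# GENE_X is in the signature but not measured in this dataset.
signature = {"GENE_A": 1.0, "GENE_B": -1.0, "GENE_X": 0.5}

scores = pathway_activity(expression, signature)
print(scores)
```

In this toy example the score increases from the first to the third sample, because the induced gene goes up while the repressed gene goes down. The resulting per-sample scores are what would then be associated with phenotypes such as drug sensitivity or survival.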

Machine learning-based prediction of drug sensitivity in cancer

Cancer cell lines are valuable model systems in oncology. Large pharmacogenomic datasets (such as GDSC or CTRP) measure the sensitivity of hundreds of cancer cell lines to hundreds of approved or experimental drugs. Multi-task machine learning models (which predict drug sensitivity for multiple drugs and cell lines with a single model) that use cell line-specific and drug-specific features make it possible to predict sensitivity for unseen cell lines and/or drugs, and can thus significantly improve computational drug discovery. However, the interpretation of these multi-task models is confounded by factors on both the cell line and the drug side. We are working on identifying and removing these confounding factors to improve the performance of machine learning models.
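
To make the multi-task setup more concrete, the sketch below shows one possible way to arrange the data: each (cell line, drug) pair becomes one training row whose input is the concatenation of cell line features and drug features, and evaluation on "unseen" cell lines is done by holding out all rows of selected cell lines. This is only a toy illustration under assumed names; `build_pairs`, `split_by_cell_line`, and all feature values and sensitivity numbers are hypothetical, and no particular model is implied.

```python
def build_pairs(cell_features, drug_features, sensitivity):
    """Create one row per measured (cell line, drug) pair.

    cell_features: dict cell line -> list of numeric features
    drug_features: dict drug -> list of numeric features
    sensitivity:   dict (cell line, drug) -> measured response
    """
    rows = []
    for (cell, drug), response in sensitivity.items():
        # A single model sees both feature blocks side by side.
        x = cell_features[cell] + drug_features[drug]
        rows.append({"cell": cell, "drug": drug, "x": x, "y": response})
    return rows


def split_by_cell_line(rows, held_out_cells):
    """Hold out every row of the given cell lines.

    A random split over pairs would leak cell line information into the
    test set; holding out whole cell lines mimics prediction for cell
    lines the model has never seen.
    """
    train = [r for r in rows if r["cell"] not in held_out_cells]
    test = [r for r in rows if r["cell"] in held_out_cells]
    return train, test


# Toy inputs (all names and numbers are made up for illustration).
cell_features = {"CELL_1": [0.2, 1.3], "CELL_2": [0.9, -0.4]}
drug_features = {"DRUG_A": [1.0, 0.0], "DRUG_B": [0.0, 1.0]}
sensitivity = {
    ("CELL_1", "DRUG_A"): 0.71,
    ("CELL_1", "DRUG_B"): 0.35,
    ("CELL_2", "DRUG_A"): 0.64,
    ("CELL_2", "DRUG_B"): 0.12,
}

rows = build_pairs(cell_features, drug_features, sensitivity)
train, test = split_by_cell_line(rows, held_out_cells={"CELL_2"})
print(len(train), len(test))
```

The same grouping idea applies to holding out drugs instead of cell lines. Which split is used matters for interpretation, since a model can appear to perform well simply by learning that some drugs are generally more potent than others.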

DREAM Challenges

Correct benchmarking of methods in computational biology is essential for selecting gold-standard methods and for correct biological interpretation. While benchmarking is an essential step of any computational biology study, it can easily (even unintentionally) be biased. The DREAM Challenges, an open science effort, run collaborative computational biology challenges in which the models developed by participants are tested against unseen data, providing one of the best current benchmarking schemes. We frequently participate in DREAM Challenges and have achieved top-performing results several times.