Abstract
Modeling the relationship between genomic features and therapeutic response is of central interest in pharmacogenomics [Musumarra et al., 2001]. The NCI-60 cancer data set, which includes both gene expression and drug activity measurements, provides an excellent opportunity for this modeling exercise. To correlate gene expression profiles with drug activity patterns, we used a soft modeling technique called Partial Least Squares (PLS) [Tobias, 2000]. Soft modeling requires less stringent assumptions about the data than many other modeling techniques [Falk et al., 1992]. The high degree of collinearity in multidimensional gene expression profiles motivated us to adopt the PLS approach, which not only reduces data redundancy but also exposes the underlying functional units as latent features. These functional gene groups are believed to play a key role in determining the efficacy of cancer drugs against different cell lines (types of cancer). We demonstrate the effectiveness of PLS in identifying drug-resistant and drug-sensitive genes. We also investigate techniques that exploit the non-linear dependence between individual gene expressions to explain variations in the drug activity pattern. This is achieved with a kernel function that implicitly carries out the regression in a higher-dimensional feature space, where the relationship can be modeled linearly [Cristianini et al., 2000]. The kernel-based non-linear approach is shown to capture the correlation between drug response and gene expression more effectively than the linear approach. As implemented here, the PLS approach could be used to distinguish between cancer cell lines, for example renal cancer and melanoma, or between drug groups such as alkylating agents and tubulin-active antimitotic agents.
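To illustrate how a kernel function lets PLS carry out the regression in an implicit feature space, the following is a minimal sketch of a textbook-style kernel PLS for a single response vector (for instance, one drug's activity across cell lines). It is not the implementation used in this work. The synthetic data, the Gaussian (RBF) kernel, the `gamma` heuristic and all names (`KernelPLS1`, `rbf_kernel`, `center_test_kernel`, ...) are illustrative assumptions. With the linear kernel the procedure reduces to ordinary linear PLS. Swapping in the RBF kernel performs the same latent-variable regression without ever forming the higher-dimensional features explicitly.

```python
import numpy as np


def linear_kernel(A, B):
    """Linear kernel: kernel PLS with this kernel reduces to ordinary linear PLS."""
    return A @ B.T


def rbf_kernel(A, B, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2) between the rows of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def center_train_kernel(K):
    """Double-centre an n x n kernel, i.e. mean-centre the implicit feature vectors."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J


def center_test_kernel(K_test, K_train):
    """Centre an m x n test kernel using the uncentred training kernel's statistics."""
    m, n = K_test.shape
    J = np.eye(n) - np.ones((n, n)) / n
    return (K_test - (np.ones((m, n)) / n) @ K_train) @ J


class KernelPLS1:
    """Hypothetical kernel PLS for one response vector."""

    def __init__(self, n_components, kernel):
        self.n_components = n_components
        self.kernel = kernel

    def fit(self, X, y):
        self.X_train = X
        self.K_raw = self.kernel(X, X)
        K = center_train_kernel(self.K_raw)
        self.y_mean = y.mean()
        yc = y - self.y_mean
        n = K.shape[0]
        Kd, y_res = K.copy(), yc.copy()  # deflated kernel and deflated response
        T, U = [], []
        for _ in range(self.n_components):
            u = y_res / np.linalg.norm(y_res)  # single response: no inner iteration needed
            t = Kd @ u                          # latent score in the implicit feature space
            t /= np.linalg.norm(t)
            P = np.eye(n) - np.outer(t, t)
            Kd = P @ Kd @ P                     # deflate the kernel ...
            y_res = y_res - t * (t @ y_res)     # ... and the response
            T.append(t)
            U.append(u)
        self.T, self.U = np.column_stack(T), np.column_stack(U)
        # Dual coefficients alpha = U (T'KU)^{-1} T'y, so predictions are K_centred @ alpha.
        self.alpha = self.U @ np.linalg.solve(self.T.T @ K @ self.U, self.T.T @ yc)
        self.fitted_ = self.T @ (self.T.T @ yc) + self.y_mean
        return self

    def predict(self, X_new):
        K_new = center_test_kernel(self.kernel(X_new, self.X_train), self.K_raw)
        return K_new @ self.alpha + self.y_mean


# Synthetic stand-in for an expression matrix: 200 collinear "genes" driven by
# 3 hidden factors, and a response that depends non-linearly on those factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 3))
X = latent @ rng.normal(size=(3, 200)) + 0.1 * rng.normal(size=(60, 200))
y = np.sin(latent[:, 0]) + latent[:, 1] ** 2 + 0.1 * rng.normal(size=60)

gamma = 1.0 / (X.shape[1] * X.var())  # assumed heuristic bandwidth
models = {
    "linear": KernelPLS1(3, linear_kernel).fit(X, y),
    "rbf": KernelPLS1(3, lambda A, B: rbf_kernel(A, B, gamma)).fit(X, y),
}
for name, model in models.items():
    rss = np.sum((y - model.fitted_) ** 2)
    print(f"{name}: training RSS with 3 components = {rss:.3f}")
```

The latent scores `T` play the role of the "functional units" described above: each one summarizes many collinear columns of `X`. Only the kernel matrix changes between the linear and non-linear variants, which is why the kernel extension adds little code. In practice, the number of components and the kernel bandwidth would be chosen by cross-validation rather than fixed as here.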