Chapter 4 Application of Independent Component Analysis to Tumor Transcriptomes Reveals Specific and Reproducible Immune-Related Signals

Urszula Czerwinska, Laura Cantini, Ulykbek Kairov, Emmanuel Barillot, Andrei Zinovyev

Published in the Lecture Notes in Computer Science book series (LNCS, volume 10891) for the conference LVA/ICA 2018: Latent Variable Analysis and Signal Separation, 6 June 2018

4.1 Context

The LVA/ICA conference is an interdisciplinary conference that gathers researchers working on Latent Variable Analysis and Signal Separation in different fields of application. Works presented at the LVA/ICA conference can be either methodological or applied. Submitted papers are reviewed by at least three members of the Technical Program Committee (TPC) or by competent additional reviewers assigned by the TPC members.

I decided to present my work on immune-related signals obtained with the fastICA algorithm to the signal deconvolution community in order to receive feedback from experts in the field. It was a great chance to systematize my findings on the overdecomposition of breast cancer transcriptomes and to describe them in detail. Given the technical character of the conference, I did not aim to expand the biological interpretation in this work.

4.2 Description

In this work, I applied ICA to six breast cancer datasets so as to obtain a large number of sources (\(k \gg MSTD\), as described in the previous chapter). I then identified the sources related to immune signals in all of the datasets. Finally, I concluded that three cell types could be identified in most of the datasets: T cells, B cells and myeloid cells.

I used the decomposition protocol defined in the previous chapter:

  1. Compute the MSTD for each dataset
  2. If the number of samples is > 100, decompose into \(k = 100\) components; otherwise, decompose into the maximal possible number of components (\(k = m\)). This assumes that 100 \(\gg\) 25, the average MSTD.
  3. Interpret the components with gene enrichment methods
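
The rule in step 2 can be written as a small helper. This is a minimal sketch rather than the code used in the thesis: the function name `choose_n_components` and the parameter `k_max` are hypothetical, and \(m\) is assumed here to denote the number of samples in the dataset.

```python
def choose_n_components(n_samples: int, k_max: int = 100) -> int:
    """Pick the number of ICA components k for one dataset.

    Assumption: the maximal possible number of components (k = m)
    is taken to be the number of samples in the dataset.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be a positive integer")
    # More than k_max samples: overdecompose to k = k_max (100 in the text).
    if n_samples > k_max:
        return k_max
    # Otherwise use as many components as there are samples.
    return n_samples
```

For example, a dataset with 250 samples would be decomposed into 100 components, while a dataset with 60 samples would be decomposed into 60.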

I also extended the above-mentioned protocol with additional steps that were later included in the DeconICA R package described in chapter 6.

  1. The components of the \(S\) matrix are oriented in the direction of the heavy tail (the side of an ICA component with the higher absolute weights), so that the top genes are always on the positive side.

  2. The components are interpreted through correlations with two panels:

    • reference metagenes published in (Biton et al. 2014): the factors present in most tumor transcriptomes
    • immune cell-type signatures used by CIBERSORT (Newman et al. 2015) as pure cell-type profiles

     A component was labeled with a reference metagene only when the correlation was reciprocal, and with an immune cell type according to its maximal correlation.
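
The two additional steps can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the DeconICA implementation: all function names are hypothetical, the heavy tail is approximated by the sign of the single largest absolute weight, "reciprocity" is read as a reciprocal best match (component and reference are each other's top correlation), and the maximal-correlation rule is read as assigning each component the signature it correlates with most. The matrices are assumed to share the same genes in the same row order.

```python
import numpy as np


def orient_components(S, A):
    """Flip components so that the heavy tail lies on the positive side.

    S: genes x k matrix of component weights.
    A: k x samples mixing matrix, so that X is approximated by S @ A.
    Assumption: the heavy tail is the side holding the largest absolute weight.
    """
    S = np.asarray(S, dtype=float)
    A = np.asarray(A, dtype=float)
    # Row index of the largest absolute weight in each component (column).
    top = np.argmax(np.abs(S), axis=0)
    signs = np.sign(S[top, np.arange(S.shape[1])])
    signs[signs == 0] = 1.0  # leave all-zero components untouched
    # Flipping column j of S and row j of A together keeps S @ A unchanged.
    return S * signs, A * signs[:, None]


def correlation_matrix(S, R):
    """Pearson correlation between columns of S (genes x k) and R (genes x r).

    Columns are assumed to have non-zero variance.
    """
    S = np.asarray(S, dtype=float)
    R = np.asarray(R, dtype=float)
    Sz = (S - S.mean(axis=0)) / S.std(axis=0)
    Rz = (R - R.mean(axis=0)) / R.std(axis=0)
    return Sz.T @ Rz / S.shape[0]


def reciprocal_labels(corr):
    """Map component index -> reference index for reciprocal best matches."""
    best_ref = corr.argmax(axis=1)   # best reference for each component
    best_comp = corr.argmax(axis=0)  # best component for each reference
    return {i: int(j) for i, j in enumerate(best_ref) if best_comp[j] == i}


def max_correlation_labels(corr):
    """Map component index -> index of its most correlated signature."""
    return {i: int(j) for i, j in enumerate(corr.argmax(axis=1))}
```

In this sketch, `reciprocal_labels` would be applied to the correlations with the reference metagenes and `max_correlation_labels` to the correlations with the immune cell-type signatures, after the components have been oriented.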

4.3 Impact on the further work

With six independent datasets, I validated the hypothesis that, using decompositions with \(k \gg MSTD\), it is possible to compute signals corresponding to immune cell types in tumor transcriptomes.

I was also able to test different ways of characterizing the components and chose the one that gave the most consistent results across datasets. This work was an important step that enabled me to develop an R package and apply it to over 100 cancer datasets.

In the next chapter, I will show a comparison of this approach with an alternative method: Nonnegative Matrix Factorisation (NMF).

The article is available online: https://link.springer.com/chapter/10.1007%2F978-3-319-93764-9_46

References

Biton, Anne, Isabelle Bernard-Pierrot, Yinjun Lou, Clémentine Krucker, Elodie Chapeaublanc, Carlota Rubio-Pérez, Nuria López-Bigas, et al. 2014. “Independent Component Analysis Uncovers the Landscape of the Bladder Tumor Transcriptome and Reveals Insights into Luminal and Basal Subtypes.” Cell Rep. 9 (4): 1235–45. doi:10.1016/j.celrep.2014.10.035.

Newman, Aaron M, Chih Long Liu, Michael R Green, Andrew J Gentles, Weiguo Feng, Yue Xu, Chuong D Hoang, Maximilian Diehn, and Ash A Alizadeh. 2015. “Robust Enumeration of Cell Subsets from Tissue Expression Profiles.” Nat. Methods 12 (5): 453–57. doi:10.1038/nmeth.3337.