Chapter 10 Conclusions and perspectives

10.1 Conclusions

This thesis described methods and results of applying unsupervised deconvolution to bulk omic data to extract cell-type specific signals.

The first contribution of this thesis is the review of deconvolution tools, including very recent ones, which illustrates the diversity of approaches to the bulk transcriptome deconvolution problem.

The second contribution is the work on methodological aspects of ICA deconvolution. I participated in the definition of the Most Stable Transcriptomic Dimension (MSTD) index, and I redefined how ICA should be applied (overdecomposition) to best extract cell-type-related signals. I demonstrated that ICA-based signals are reproducible in breast cancer and that the interpretability of ICA is higher than that of the Brunet version of NMF.

The third contribution is the DeconICA method for omic data deconvolution through immune components, together with the R package published online. DeconICA allows the detection of immune cell-type signals from tumor bulk data and the quantification of their abundance. The tool is not limited to the interpretation of ICA-based decompositions and can easily be used with other metagene-generating methods. The R package has extensive documentation and tutorials that help users apply the method autonomously. The performance of DeconICA was evaluated on PBMC transcriptomes, where it extracted some of the cell-type signals better than the state-of-the-art published methods.

The fourth contribution is the pan-cancer DeconICA deconvolution study, in which signals from 119 bulk transcriptome datasets covering 30 tumor types, with a total of 26561 samples, and from five additional scRNA-seq datasets from five different tumor types were analyzed. On the one hand, the ongoing analysis highlighted detection limits of immune cell signals in some tumor types. On the other hand, I focused on the analysis of T-cell signals, which revealed heterogeneity among the T-cell signatures extracted from the bulk transcriptome. Further analyses will show whether this diversity is linked with patient survival or impacts other immune populations.

Finally, I contributed to a study of the heterogeneity of NK cells and macrophages based on scRNA-seq transcriptomic data, which illustrated distinct cell states of these cell types, revealed thanks to a new resource: the Innate immune map (of the tumor microenvironment).

10.2 Perspectives

Hopefully, the achievements and findings of the thesis will not end with the Ph.D. project itself. The presented work can be continued in many directions.

First, the DeconICA package can still be improved. The current compatibility of the tool with other BSS/matrix factorization methods should be illustrated with examples, and future adjustments can be integrated into the R package. The reference signatures (for cell types and biological processes) can be extended with new signatures, e.g., ones based on single-cell technology, if they are proven to bring a better interpretation of bulk decompositions. A graphical web-based interface would be a real added value and should be implemented in the near future. The applicability of DeconICA to other data types, for instance the methylome, remains to be demonstrated.

There is a wide array of possibilities for continuing the analysis of the ICA-based deconvoluted immune landscape. The directions I plan to pursue before journal publication of the results are:

incorporation of clinical and survival data (when available): testing for correlations between clinical features and immune cell infiltration, and comparing the survival of patients with high and low infiltration of different immune cell types
a deeper study of signal reproducibility in different tumor types, to better understand why the extracted cell-type signals are closer to the reference in some tumor types than in others
analysis of the diversity of myeloid cells, B cells, CAFs, and mast cells
study of the relationship between immune signatures and the cell cycle using bulk and single-cell data
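
The first item of the list above can be illustrated with a minimal sketch. It runs on simulated data only; the variable names, the median split, and the use of a rank correlation are assumptions made for illustration and are not part of DeconICA. A real analysis would replace the simulated vectors with estimated abundance scores and recorded clinical variables, and would use a dedicated survival test to compare the two groups.

```python
import numpy as np

def spearman(x, y):
    """Spearman correlation for tie-free data: Pearson correlation of ranks."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]

def split_by_median(scores):
    """Boolean mask that is True for samples with an above-median score."""
    return scores > np.median(scores)

rng = np.random.default_rng(0)
# Simulated abundance scores of one immune cell type in 100 samples (assumption).
infiltration = rng.normal(size=100)
# Simulated continuous clinical feature that partly follows the infiltration.
clinical = 0.8 * infiltration + 0.3 * rng.normal(size=100)

rho = spearman(infiltration, clinical)
high = split_by_median(infiltration)  # "high infiltrate" group; ~high is "low"
print(f"Spearman rho = {rho:.2f}, high group size = {high.sum()}")
```

The two groups defined by the mask could then be passed to a survival comparison when follow-up data are available.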

In the long term, possible biological findings resulting from this work, concerning a gene or a set of genes that would be novel in the cancer immunity context, could be validated in vitro through our partnership with the team of Vassili Soumelis.

From a more general point of view, this work could be extended in a multi-omic manner. Many groups have proposed ways to combine multilevel data. Would the analysis of the immune infiltrates be more meaningful if other data types were used simultaneously?

The primary constraints for all algorithms applied to biological data are the amount of data (efficiency of the algorithm) and the curse of dimensionality (large p, small n) (Angerer et al. 2017; Maia et al. 2017). Different data types can have specific difficulties (sparsity, missing values, dropout). Therefore, in multi-omics integration, one needs to cope simultaneously with all the constraints of the different data types and with the integration problem itself.

One possibility is to employ tensor decomposition, which allows the simultaneous decomposition of multidimensional arrays, whose number of dimensions is called the order in tensor jargon (Teschendorff et al. 2018; Taguchi 2017). Also, joint NMF was proposed to factorize omic data (Zhang et al. 2012). Late integration can also be considered: applying algorithms to each omic data type independently and integrating the results a posteriori, through a consensus (Bonnet, Calzone, and Michoel 2015). Although many methods were developed for multi-omics integration (Huang, Chaudhary, and Garmire 2017), little literature is available on the benefit of multi-omics integration for the extraction of immune-related signals from bulk cancer data.
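
The idea behind joint NMF can be sketched in a few lines. Several data blocks measured on the same samples are factorized with one shared sample factor W and one block-specific factor per data type. The sketch below uses standard multiplicative updates on simulated non-negative data; it is not the implementation of Zhang et al. (2012), and the function names, the rank, and the block names are assumptions made for illustration.

```python
import numpy as np

def joint_nmf(blocks, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor each block X_i (samples x features_i) as W @ H_i with a shared W."""
    rng = np.random.default_rng(seed)
    n_samples = blocks[0].shape[0]
    W = rng.random((n_samples, rank))
    Hs = [rng.random((rank, X.shape[1])) for X in blocks]
    for _ in range(n_iter):
        # Update the shared sample factor using all blocks at once.
        numer = sum(X @ H.T for X, H in zip(blocks, Hs))
        denom = W @ sum(H @ H.T for H in Hs) + eps
        W *= numer / denom
        # Update each block-specific loading matrix with the new W.
        WtW = W.T @ W
        for i, X in enumerate(blocks):
            Hs[i] *= (W.T @ X) / (WtW @ Hs[i] + eps)
    return W, Hs

def total_error(blocks, W, Hs):
    """Sum of squared reconstruction errors over all blocks."""
    return sum(np.linalg.norm(X - W @ H) ** 2 for X, H in zip(blocks, Hs))

rng = np.random.default_rng(1)
W_true = rng.random((30, 3))                 # 30 samples, 3 hidden factors
expression = W_true @ rng.random((3, 50))    # simulated omic layer 1
methylation = W_true @ rng.random((3, 40))   # simulated omic layer 2
blocks = [expression, methylation]

W_start, Hs_start = joint_nmf(blocks, rank=3, n_iter=0)  # random initialization
W_fit, Hs_fit = joint_nmf(blocks, rank=3, n_iter=200)
error_start = total_error(blocks, W_start, Hs_start)
error_fit = total_error(blocks, W_fit, Hs_fit)
print(f"error before: {error_start:.2f}, after: {error_fit:.4f}")
```

Because W is shared, each of its columns describes a sample-level pattern that is supported by both data types, which is the property that makes the approach attractive for integration.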

A significant constraint of unsupervised approaches is the need for data with high variability and, therefore, many samples. In theory, it could be possible to compute values for a new single sample given a space established by other samples. In practice, the values predicted for the new sample should be carefully verified for possible biases.
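
A minimal sketch of such a projection is given below, assuming that the component space is available as a genes-by-components metagene matrix and that the new sample has been preprocessed (centered, same gene order) like the reference cohort. The weights of the new sample are obtained by ordinary least squares; the data are simulated, and the function name `project_sample` is hypothetical rather than part of DeconICA.

```python
import numpy as np

def project_sample(metagenes, sample):
    """Least-squares weights of one sample in a fixed component space.

    metagenes: (n_genes, n_components) matrix learned from a reference cohort.
    sample:    (n_genes,) expression vector, preprocessed like the cohort.
    """
    weights, *_ = np.linalg.lstsq(metagenes, sample, rcond=None)
    return weights

rng = np.random.default_rng(0)
n_genes, n_components = 200, 3
# Simulated component space standing in for metagenes from a cohort (assumption).
metagenes = rng.laplace(size=(n_genes, n_components))
true_weights = np.array([0.5, -1.0, 2.0])
new_sample = metagenes @ true_weights + 0.01 * rng.normal(size=n_genes)

estimated = project_sample(metagenes, new_sample)
# A large residual would flag a sample poorly described by the reference space.
residual = np.linalg.norm(new_sample - metagenes @ estimated)
print(estimated, residual)
```

The residual gives a first, crude check for the biases mentioned above: a new sample that differs systematically from the reference cohort (batch, platform, tissue) will tend to be reconstructed poorly.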

Finally, blind deconvolution approaches can be applied to detect different signals from diverse tissues. For their interpretation, adequate reference profiles or known signatures are necessary. For single-cell data as well, blind deconvolution can be a powerful tool to unveil new cell states.

References

Angerer, Philipp, Lukas Simon, Sophie Tritschler, F. Alexander Wolf, David Fischer, and Fabian J. Theis. 2017. “Single cells make big data: New challenges and opportunities in transcriptomics.” Curr. Opin. Syst. Biol. 4. Elsevier: 85–91. doi:10.1016/J.COISB.2017.07.004.

Maia, Ana-Teresa, Stephen-John Sammut, Ana Jacinta-Fernandes, and Suet-Feung Chin. 2017. “Big data in cancer genomics.” Curr. Opin. Syst. Biol. 4. Elsevier: 78–84. doi:10.1016/J.COISB.2017.07.007.

Teschendorff, Andrew E, Han Jing, Dirk S Paul, Joni Virta, and Klaus Nordhausen. 2018. “Tensorial blind source separation for improved analysis of multi-omic data.” Genome Biol. 19: 76. doi:10.1186/s13059-018-1455-8.

Taguchi, Y-h. 2017. “One-class Differential Expression Analysis using Tensor Decomposition-based Unsupervised Feature Extraction Applied to Integrated Analysis of Multiple Omics Data from 26 Lung Adenocarcinoma Cell Lines.” In 2017 IEEE 17th Int. Conf. Bioinforma. Bioeng., 131–38. IEEE. doi:10.1109/BIBE.2017.00-66.

Zhang, Shihua, Chun-Chi Liu, Wenyuan Li, Hui Shen, Peter W. Laird, and Xianghong Jasmine Zhou. 2012. “Discovery of multi-dimensional modules by integrative analysis of cancer genomic data.” Nucleic Acids Res. 40 (19). Oxford University Press: 9379–91. doi:10.1093/nar/gks725.

Bonnet, Eric, Laurence Calzone, and Tom Michoel. 2015. “Integrative Multi-omics Module Network Inference with Lemon-Tree.” Edited by Paul P Gardner. PLOS Comput. Biol. 11 (2): e1003983. doi:10.1371/journal.pcbi.1003983.

Huang, Sijia, Kumardeep Chaudhary, and Lana X Garmire. 2017. “More Is Better: Recent Progress in Multi-Omics Data Integration Methods.” Front. Genet. 8. Frontiers Media SA: 84. doi:10.3389/fgene.2017.00084.