Multivariate analysis in addressing the challenges of results validation in gene expression data processing

| Date | Start Page | End Page |
|---|---|---|
| 2025-08-27 | 58 | 58 |

Gene expression is a quantitative estimate of how much of the information encoded in a gene is converted into a functional product, usually a protein or an RNA molecule. Evaluating gene expression levels can reveal features of cellular processes related to particular pathogenic mechanisms, and biomarkers of certain diseases can be built on gene expression estimates. However, processing, analyzing and generalizing gene expression data is challenging because technical conditions inevitably vary in a sophisticated biochemical process. This variation produces so-called “batch effects” in the estimates: artifactual differences that arise not from the sought genetic differences but from the variety of technical conditions. Normalizing gene expression levels is therefore a crucial step in obtaining reliable results. The expression of so-called “housekeeping” genes, which ensure the basic functions of the cell, can serve as normalizing values in such cases. We propose a Principal Component Analysis [1] approach that concentrates the correlated variation of gene expression values into uncorrelated principal components. In this way, variation of gene expression estimates concentrated in the same principal components as that of the housekeeping genes is regarded as a technical artifact, while the remaining variation reflects the sought epigenetic phenomena. The proposed method was used to validate expression levels of a targeted gene set related to brain tumors of varying complexity. Data were collected by Oxford Nanopore [2] direct RNA sequencing. Potential biomarker genes were identified as those showing statistically significant differences in expression between glioblastoma and low-grade glioma cases.
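
The idea of flagging principal components by their housekeeping-gene content can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `remove_housekeeping_components`, the `hk_share` cut-off, and the simulated data (a balanced design in which a batch shift affects every gene and a group effect affects only non-housekeeping genes) are all assumptions made for illustration. The sketch uses only NumPy and computes the components through a singular value decomposition of the centered samples-by-genes matrix.

```python
import numpy as np

def remove_housekeeping_components(X, hk_idx, hk_share=0.8):
    """Remove principal components that carry most housekeeping-gene variance.

    X        : samples x genes matrix of log-scale expression estimates
    hk_idx   : column indices of the housekeeping genes
    hk_share : hypothetical cut-off, the fraction of housekeeping variance
               that the flagged components must jointly explain
    """
    Xc = X - X.mean(axis=0)                       # center each gene
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Sum of squares of the housekeeping genes carried by each component
    hk_var = (s ** 2) * (Vt[:, hk_idx] ** 2).sum(axis=1)
    order = np.argsort(hk_var)[::-1]              # most housekeeping-laden first
    cum = np.cumsum(hk_var[order]) / hk_var.sum()
    n_flag = int(np.searchsorted(cum, hk_share)) + 1
    flagged = order[:n_flag]
    # Rebuild the part of the data explained by the flagged components
    technical = (U[:, flagged] * s[flagged]) @ Vt[flagged]
    return Xc - technical, flagged

# Simulated data (an assumption, not the study's measurements)
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 30
hk_idx = np.arange(5)                             # genes 0-4 play the housekeeping role
batch = np.tile([0, 1], n_samples // 2)           # technical condition per sample
group = np.repeat([0, 1], n_samples // 2)         # e.g. low-grade glioma vs glioblastoma

X = rng.normal(0.0, 0.2, size=(n_samples, n_genes))   # measurement noise
X += 2.0 * batch[:, None]                         # batch shift hits every gene
effect = np.zeros(n_genes)
effect[5:15] = 1.5 * np.array([1, -1] * 5)        # group effect on genes 5-14 only
X += group[:, None] * effect

X_clean, flagged = remove_housekeeping_components(X, hk_idx)

def mean_gap(M, labels):
    """Per-gene difference between the means of the two label classes."""
    return M[labels == 1].mean(axis=0) - M[labels == 0].mean(axis=0)

print("flagged components:", flagged)
print("largest batch gap before:", np.abs(mean_gap(X, batch)).max())
print("largest batch gap after: ", np.abs(mean_gap(X_clean, batch)).max())
print("group gap of gene 5 after:", mean_gap(X_clean, group)[5])
```

In this idealized setting the batch and group labels are balanced against each other, so the technical and biological variation fall into separate components. When the two are confounded in real data, a component dominated by housekeeping genes may also carry biological signal, and the cut-off would need to be chosen with care.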