Towards a linearly organized embedding space of biological networks

Friday 5th April 2024 12:00 EEST

 

Dr. Alexandros Xenos

ABSTRACT

Low-dimensional embeddings are a cornerstone in the modelling and analysis of complex biological networks. Embedding biological networks is challenging, as it involves capturing both structural (topological) and semantic information of a graph (i.e., node labels). Typically, nodes with the same label are in the same dense subgraph (neighborhood-based similarity), but it has been shown that similarly annotated nodes can be in different network neighbourhoods while having similar wiring patterns (topological similarity).

However, current network embedding algorithms do not preserve both types of similarity, which limits the information preserved in the embedding space. Moreover, existing methods for analyzing the embedding space of molecular networks use the vectors of the biological entities as input to computationally intensive ML models that aid downstream analysis tasks. In contrast, in NLP the word embedding space is mined directly through simple linear operations on the word embedding vectors.

In our work, following the NLP paradigm, we introduce novel random-walk-based embeddings that allow mining biological knowledge directly from the embedding space. Namely, we introduce embeddings that place genes with similar biological functions (either topologically similar or neighborhood-based similar nodes) close together in the space. We exploit this property to predict genes participating in protein complexes and to identify cancer-related genes based on the cosine similarities between the vector representations of the genes. We also go beyond embeddings that preserve one type of similarity by using graphlets (small, connected, induced subgraphs) to represent the network and then generating random walks in the transformed networks.
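
As a rough illustration of mining an embedding space through cosine similarity alone, the sketch below ranks genes by their similarity to a query gene. The gene names, the three-dimensional vectors, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical placeholders chosen for this example; it is not the speaker's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, embeddings):
    """Return (gene, score) pairs sorted from most to least similar to `query`."""
    q = embeddings[query]
    scores = [
        (gene, cosine_similarity(q, vec))
        for gene, vec in embeddings.items()
        if gene != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy, made-up embedding vectors (assumption: real ones would come from
# a trained embedding of a molecular network).
embeddings = {
    "gene_a": [1.0, 0.0, 0.0],
    "gene_b": [0.9, 0.1, 0.0],
    "gene_c": [0.0, 1.0, 0.0],
}

ranking = rank_by_similarity("gene_a", embeddings)
```

In this toy setting `gene_b` ranks above `gene_c` because its vector points in nearly the same direction as that of `gene_a`; no trained model is involved beyond the embedding itself.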

Finally, we analyze whether an intrinsic property of the structure of the data (input matrix representation) yields embedding spaces that enable downstream analysis tasks via simple linear operations. We demonstrate that the more homophilic the input network matrix representation is, the more linearly organized the resulting embedding space is, and hence the less need there is for complex machine learning approaches to perform downstream analysis. We showcase on nine multi-label (biological) and seven single-label networks that our graphlet-based methodologies embed networks in more linear spaces, alleviating the need for computationally expensive ML methods.
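
To make the notion of homophily concrete, the sketch below computes one common measure for a single-label network: the fraction of edges whose endpoints share a label (often called the edge homophily ratio). This is an assumption made for illustration; the abstract does not state which homophily measure the talk uses, and the function name `edge_homophily` and the toy graph are hypothetical.

```python
def edge_homophily(edges, labels):
    """Fraction of edges that connect two nodes carrying the same label."""
    if not edges:
        # Convention for this sketch: an edgeless graph has ratio 0.0.
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy 4-cycle with two labels (made-up data).
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
labels = {"a": "x", "b": "x", "c": "y", "d": "y"}

ratio = edge_homophily(edges, labels)
```

Here two of the four edges join same-label nodes, so the ratio is 0.5. A value near 1 indicates a strongly homophilic network, and a value near 0 a strongly heterophilic one.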

LECTURER SHORT CV

Alexandros Xenos is a Postdoctoral Researcher in the Integrative Computational Network Biology (ICONBI) group led by Prof. Przulj at Barcelona Supercomputing Center. He holds a Ph.D. in Artificial Intelligence (Computer Science) from the Technical University of Catalonia (UPC) and an integrated master’s from the School of Applied Mathematics of the National Technical University of Athens. During his Ph.D., he did a four-month research visit at Harvard Medical School, where he worked on single-cell contextual embeddings under the supervision of Prof. Zitnik. His research interests are at the intersection of network science, machine learning for data fusion and artificial intelligence. His work focuses on designing embedding methods that represent biological networks in spaces that enable downstream analysis tasks with simple linear operations, alleviating the need for computationally intensive ML models.

Lecture Organizers

This lecture is jointly offered by Archimedes – AI and Data Science Research Center (Athena Research Center) and AIDA.

Microsoft Teams meeting

Meeting ID: 320 137 087 421
Passcode: MmC6Ex

ON-SITE PARTICIPATION

Archimedes Amphitheater (1 Artemidos, Marousi, ground floor), Athens, Greece
