Online Mini-symposium on Probabilistic and Topological Methods for Biological Data

as part of the SIAM Conference on Mathematics of Data Science (MDS20)

Topology session

Date: June 11, 2020.
Time: 1:00 PM - 3:00 PM (EST)


Francis C. Motta
Florida Atlantic University

Evidence of an Intrinsic Clock and Host Parasite Coupling in the Intraerythrocytic Developmental Cycle of Plasmodium.

Marilyn Vazquez
The Ohio State University

A Consistent Density-Based Clustering Algorithm and its Application to Image Segmentation.

Maria-Veronica Ciocanel
The Ohio State University

Topological Data Analysis for Ring Channels in Cell Biology

Manuchehr Aminian
Colorado State University

Identifying generators of topological features in real data


Living things did not evolve and do not exist as autonomous elements, but instead represent the nodes of a complex network of interacting dynamical systems, coupled at multiple scales to each other and their environment. For instance, many biological organisms across kingdoms exhibit intrinsic, free-running circadian rhythms that are controlled by gene regulatory networks and are coupled to the rhythmic forcing of the day-night cycle. Although there is evidence that Plasmodium, the causative agent of malaria infection, leverages the intrinsic periodicity of the host for its own purposes, it remains unknown whether these protozoans possess a regulatory network controlling the precise timing of their intraerythrocytic developmental cycle (IDC). In this talk we model the variability of IDC progression rates across a population of parasites, and compare it to the intrinsic period variability seen in free-running circadian oscillators, to show that the IDC exhibits properties that are consistent with those of well-tuned biological clocks. We also discuss a data-driven analysis of a singular ex vivo experiment that captures the transcript dynamics of P. vivax together with the dynamic gene expression of 9 human hosts; this analysis suggests host-parasite coupling at the level of their transcriptional programs.


Data clustering is a fundamental task for discovering patterns in data, and is central to machine learning. Often, a data set is assumed to lie on a manifold and to be sampled according to a probability measure. The clusters can then be defined as peaks in the sampled probability density, and a clustering algorithm needs to identify those peaks to compute the clusters. Challenges in this approach include the non-uniform sampling of the density and the bridges between peaks of the density. To address these problems, we propose a new clustering algorithm that divides the clustering problem into three steps: picking a good threshold on the sample density to separate the peaks, clustering the superlevel set at the chosen threshold, and classifying the remaining points. We explain the key details of these steps and provide theoretical guarantees on the algorithm's performance. As an important application, we show how to apply this method to segment images by treating each image as a point cloud of image patches. We present results on images of various biological systems.
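
The three steps named in this abstract can be illustrated with a minimal sketch. This is not the speaker's algorithm: the k-nearest-neighbor density proxy, the fixed quantile threshold (the talk concerns how to pick a good threshold, which this sketch does not attempt), the eps-neighborhood graph, and the nearest-neighbor classification rule are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def knn_density(points, k):
    """Density proxy (assumption): inverse distance to the k-th nearest neighbor."""
    dens = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        dens.append(1.0 / (dists[k] + 1e-12))  # dists[0] is the point itself
    return dens


def connected_components(points, idx, eps):
    """Label the points in idx by connected component of the eps-neighborhood graph."""
    labels = {}
    current = 0
    for start in idx:
        if start in labels:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in idx:
                if j not in labels and math.dist(points[i], points[j]) <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def density_cluster(points, k=5, quantile=0.5, eps=1.5):
    """Hypothetical three-step density-based clustering sketch."""
    dens = knn_density(points, k)
    # Step 1: threshold the density (assumption: a fixed quantile, not a tuned choice).
    threshold = sorted(dens)[int(quantile * len(dens))]
    core = [i for i, d in enumerate(dens) if d >= threshold]
    # Step 2: cluster the superlevel set {density >= threshold}.
    labels = connected_components(points, core, eps)
    # Step 3: classify each remaining point by its nearest superlevel-set point.
    for i in range(len(points)):
        if i not in labels:
            nearest = min(core, key=lambda j: math.dist(points[i], points[j]))
            labels[i] = labels[nearest]
    return [labels[i] for i in range(len(points))]


if __name__ == "__main__":
    # Two well-separated 5x5 grids of points stand in for two density peaks.
    grid = [(x, y) for x in range(5) for y in range(5)]
    pts = grid + [(x + 100, y + 100) for x, y in grid]
    print(len(set(density_cluster(pts))), "clusters found")
```

For image segmentation as described in the abstract, each point would be an image patch flattened into a vector; the brute-force distance computations here are quadratic in the number of points and are only meant for small examples.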


Contractile rings are cellular structures made of actin filaments and are important in development, wound healing, and cell division. In worm model organisms, ring channels allow nutrient exchange to the developing egg cells and are regulated by forces exerted by myosin motor proteins. I will present an agent-based modeling and data analysis framework for the interactions between actin filaments and myosin motor proteins. This approach provides key insights into the mechanistic differences between two motors that are believed to maintain the rings at a constant diameter. In particular, we propose tools from topological data analysis to understand time-series data of filamentous network interactions. Our proposed methods clearly reveal the impact of certain parameters on significant topological circle formation, thus giving insight into these biological ring channels.


When we work with a synthetic data set in persistent homology, such as the classic "noisy circle", our intuition allows us to tie a computed topological feature (for instance, a birth/death pair) back to the expected "loop" structure of the data. However, we cannot apply such intuition as easily if we do not "know the answer" in advance. This is a problem if we need concrete answers to questions such as "why is this birth/death pair present?" and, more specifically, "what subset of the data points can well-represent this pair?"

I will give my perspective as an applied mathematician on this problem and present my preliminary work building an interface in Python to directly associate computed topological features with their generators, with applications to two separate projects: studying protein structure, and learning about the human immune response to influenza-like illnesses in patients in the first few days after exposure.
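
The idea of tying a birth/death pair back to a subset of data points can be illustrated in the simplest setting, connected components (degree-0 homology), rather than the loops of the "noisy circle". The sketch below is not the interface described in the talk: the function name is hypothetical, and the tie-breaking convention (when two components merge, the smaller one is recorded as dying) is an assumption, since every component of a point cloud is born at scale 0.

```python
import math
from itertools import combinations


def h0_pairs_with_generators(points):
    """Return (birth, death, members) for each finite degree-0 class.

    'death' is the edge length at which a component merges into another;
    'members' lists the indices of the points in the component that dies.
    """
    n = len(points)
    parent = list(range(n))
    members = {i: [i] for i in range(n)}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # this edge closes a cycle; it does not merge components
        # Assumed convention: the smaller component dies (all are born at 0).
        if len(members[ri]) < len(members[rj]):
            ri, rj = rj, ri
        pairs.append((0.0, length, sorted(members[rj])))
        parent[rj] = ri
        members[ri].extend(members.pop(rj))
    return pairs


if __name__ == "__main__":
    # Two tight pairs of points far apart on a line.
    for birth, death, who in h0_pairs_with_generators([(0, 0), (1, 0), (10, 0), (11, 0)]):
        print(f"birth={birth} death={death} points={who}")
```

Here the most persistent finite pair can be read off together with the points responsible for it. Doing the same for loops requires representative cycles from a persistent homology library, which this sketch does not attempt.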