If you don't remember your password, you can reset it by entering your email address and clicking the Reset Password button. You will then receive an email that contains a secure link for resetting your password
If the address matches a valid account an email will be sent to __email__ with instructions for resetting your password
Homogeneous cells appear heterogeneous due to limited sampling and repeatability
Heterogeneity bias increases with attraction/repulsion between cells
Movement in confined environments decreases apparent heterogeneity
Hypothetical applications in neural crest and in vitro cancer systems
Cell population heterogeneity is increasingly a focus of inquiry in biological research. For example, cell migration studies have investigated the heterogeneity of invasiveness and taxis in development, wound healing, and cancer. However, relatively little effort has been devoted to exploring when heterogeneity is mechanistically relevant and how to reliably measure it. Statistical methods from the animal movement literature offer the potential to analyze heterogeneity in collections of cell tracking data. A popular measure of heterogeneity, which we use here as an example, is the distribution of delays in directional cross-correlation. Employing a suitably generic, yet minimal, model of collective cell movement in three dimensions, we show how using such measures to quantify heterogeneity in tracking data can result in the inference of heterogeneity where there is none. Our study highlights a potential pitfall in the statistical analysis of cell population heterogeneity, and we argue that this can be mitigated by the appropriate choice of null models.
Collective migration of cell populations plays an important role in development, regeneration, and disease. Evidence is mounting that population heterogeneity functionally contributes to the collective behavior of cells in many systems. One form of heterogeneity that is frequently studied is that of leader and follower cell states within a population. This has been investigated in the migration of the zebrafish lateral line primordium (
However, leader-follower heterogeneity is not found in all collectively migrating cell populations, and mathematical models are able to produce collective migration of a group of identical agents, calling into question the need for such heterogeneity (see discussion in Box 2 in
), indicates that population heterogeneity may often emerge from the interaction of cells with microenvironmental signals. Recent theoretical work has also shown that (proliferative) heterogeneity can arise from constraints, such as confinement, alone (
). Together these pieces of evidence raise the question of how cell interactions and microenvironmental conditions can induce, promote, or otherwise affect the heterogeneity of a cell population, and our ability to measure it.
Cell-tracking data provide a major source for evidence of heterogeneity of movement. Advances in three-dimensional (3D) imaging and computational tracking of complete cell populations in vivo or in realistic in vitro assays give rise to rich datasets amenable to measuring distributions of statistics of interest. For example, recent efforts by
) to measure cross-correlational delay times of mammalian cell cohorts moving in a 3D extracellular matrix gel. Such methods could be of great interest, for example to characterize the invasiveness of cancer cells from different tumor samples and assess the efficacy of potential metastasis-inhibiting treatments. However, unlike the animal experiments for which such methods were initially developed, cell biology assays are limited in observation time, and typically cannot be repeated with the same cells while retaining individual identification. Both these shortcomings result in smaller datasets, which are more prone to spurious correlations from chance.
As researchers we are faced with the problem of when, and when not to, include heterogeneity in our mechanistic descriptions of collective cell migration. In short, when we observe what appears to be heterogeneity, is this an intrinsic property of the system, an emergent phenomenon, or a statistical artifact? Here, we take inspiration from previous studies to highlight potential pitfalls in the analysis of tracking data from cell collectives. We develop a suitably versatile model for 3D collective cell migration and use this to generate trajectory data which we, in turn, analyze with a commonly used measure of heterogeneity, the distribution of delay times in directional cross-correlation. By showing that, under this protocol, we can measure heterogeneity in the form of leader-follower relationships despite individuals being identical, we demonstrate that care must be taken when using this, or similar, measures of heterogeneity. Our results highlight that appropriate null models must be carefully dedicated to particular experiments to assess the statistical significance of observed correlations. By analyzing the systematic bias of our analysis toward apparent heterogeneity as a function of model parameters and boundary conditions, we propose two hypothetical applications of our work that could be tested using in vivo and in vitro cell-tracking data. We conclude by discussing alternative measures of heterogeneity, further model extensions, and potential avenues for the development of statistical tests.
have previously developed a self-propelled particle (SPP) model for collective migration in two dimensions. We here extend this model to three dimensions, but for simplicity first explain the basic model components in two dimensions. The direction of movement, θ, of cell i at time t+1 is given by
In this equation, three terms independently influence the direction of movement, variously depending on the separation, rij, between cells i and j: (1) alignment with the movement direction of all neighboring cells (within distance re, where is the velocity of cell j at time t, for simplicity assumed to be of fixed speed), scaled by parameter α ≥ 0; (2) intercellular forces, fij, i.e., attraction/repulsion toward/away from neighboring cells (within distance r0), scaled by parameter β ≥ 0; (3) noise, here chosen to be a random vector on the unit sphere, , which scales with the number of neighbors, Ni (within distance r0, including cell i itself), to represent the uncertainty in the forces from neighboring cells (
). The noise term is controlled by parameter η ≥ 0, and without loss of generality we have set η = 1 in the simulations presented here. Interactions between cells are further restricted to nearest Voronoi neighbors only, even if more cells are within the respective distance cut-off. Figure S1 illustrates the various interaction zones.
Extension to Three Dimensions
In the 3D model we have developed here, the direction of movement is parameterized by two angles, the azimuthal angle, θ (from −π to π), and the polar angle, ϕ (from 0 to π), which are updated according to
where the interactions in (Equation 2) are again between nearest neighbors only. Cell positions for cell i are updated via , with fixed speed v0 = 0.05, respecting boundary conditions. For details on the form of the intercellular forces, see Method Details in the STAR Methods. In this paper, we have implemented both free (cells are unconfined) and no-flux (reflective) boundary conditions. A similar 3D model has also been developed by
to restrict interactions to nearest Voronoi neighbors.
In the work shown here, we primarily focus on homogeneous populations and how they can appear heterogeneous. To explicitly include heterogeneity, we repeated a subset of our simulations with a few “informed” cells, or cells in a leader state. The randomly chosen subset of informed cells aligns, not with their neighbors, but with a prescribed direction (here chosen to be the x-axis, without loss of generality). This preferred direction could represent, for example, a chemoattractant gradient or other directional cue. The alignment strength, α, is the same as for the other cells, and other cells still align with the informed cell if they are neighboring them. We chose 10% of cells to be in such a leader state, as similarly small fractions have been reported as sufficient to affect the overall population behavior in migrating cell populations (
The mathematical model has been implemented in MATLAB. Unless stated otherwise, 3D simulations were run without confinement (free boundary conditions), for 100 cells whose initial positions were chosen randomly from within a cube of edge length L = 2. To alleviate the influence of transient behavior dependent on initial conditions, we run simulations for 1,000 time-steps and ignore the first 500 time-steps in our analysis, inspecting the order parameter (see Global Model Behavior in the STAR Methods) plotted against time to validate that this is sufficient to reach a (quasi-)steady state. We find that groups of cells move in an ordered (globally aligned) manner for high values of α and β (Figure S3). Example trajectories (Figure 1) reveal moving streams and slowly translating clusters that break apart as the strength of intercellular forces, β, becomes comparable to the noise strength, η = 1, or lower. At low α and high β, groups of cells are in static, positionally ordered, arrangements (Figure 1).
Delay Correlation Analysis
To measure heterogeneity of movement within a group of moving cells, we measure the extent to which cells are following one another using delay correlation analysis. For two cells i and j the directional cross-correlation at lag τ is calculated as
where ⋅ denotes the scalar product and time-averaging. The peak-delay time, τC, for a pair of cells is that for which is highest in the observed data. By binning the peak-delay times for all cells in the population, we obtain the sample distribution P(τC), so that P(τ1) = 1 would mean that all pairs of cells have their highest directional cross-correlation at lag τ1, and P(τ0) = 0 means no pair of cells has peak correlation at lag τ0 (note that P(τC) is symmetric about τ = 0 since ). Thus, P(τC) constitutes a population-scale measure of heterogeneity.
To only consider cohesively moving populations, we restrict our analysis to model realizations with directional order above a minimum threshold, that we set at . Within those simulation results, we then find peak-delay times for any pairs of cells which have been within interaction radius r0 and have directional cross-correlation of Cmin = 0.5 or greater at at least one time point. An illustrative example of the resulting distribution of peak-delay times is shown in Figure 2A, indicating that the width of the distribution increases with β. To test whether some cells showed consistently non-zero lag times (either leading ahead or following behind others), we also calculate the distribution of peak-delay times for the directional correlation between individual cells and all other cells in the population. An example of the individual distributions is shown in Figure 2B. For a more comprehensive view, we calculate the SD of the peak-delay distribution, σ(τC), as a measure of the width of the distribution, which can be seen to increase consistently with increasing β over a range of values for α (Figure 2C). The SD is also high for low values of β at some values of α, but this trend is not consistent across the range of α values. Increasing the alignment strength, α, for fixed β does not increase the measure of heterogeneity. In summary, stronger attraction/repulsion between cells leads to increased apparent heterogeneity.
When we included heterogeneity in our simulations, we found that the increase in apparent heterogeneity is less strong for high alignment strength (Figure 2D). This can be understood as strong alignment with a prescribed direction suppressing directional fluctuations. The model with heterogeneity fit the experimental data (see Box 1 and Comparison with Experimental Data in the STAR Methods) at lower values of the interaction strength parameter (Figure B1) than the homogeneous model. This illustrates the cohesive effect that even small numbers of cells in a leader state can have on the population.
Neural crest (NC) cells display a wide range of migratory patterns in different organisms and embryonal locations, yet a unified mechanistic understanding has so far eluded the research community. For example, in chick cranial NC migration, data on cell behavior and gene expression, as well as mathematical modeling, have suggested a collective migration mechanism whereby the cell population is divided into leader and follower states (
). Measures of heterogeneity used in that study include the directional correlation of cells with the migratory route, and change of relative position of cells within the group. As only a few tens of cells per embryo were analyzed, an appropriate null model could help to answer how likely one is to see the differences observed in homogeneous populations. Furthermore, as cranial NC cells migrate in streams and trunk NC cells migrate in narrower chains, appropriate models could explore how different microenvironmental conditions with varying degrees of confinement can affect the chances of observing differences in measures of heterogeneity.
In our delay correlation analysis we find that confinement tends to decrease the apparent heterogeneity (Figure 3E) for a given parameter combination and population density. When including a fraction of 10% leader-like cells, confinement still decreases the measure of heterogeneity (compared to no confinement), but increasing confinement has less effect on the ability to detect heterogeneity (Figure 3F). For similar analysis of in vivo tracking data from trunk and cranial NC cells, we can thus predict the following: If decreased heterogeneity was measured in the trunk versus head, the populations may, in fact, both be homogeneous. This would suggest that leader-like cell states may not exist in vivo, as the observations (differences in heterogeneity) can be explained without them. If the measure of heterogeneity was greater in the trunk, then both locations could host heterogeneous populations. Such an observation would support the notion that cells adopt leader-like states, or otherwise respond differentially to guidance cues, which could be of biological importance.
Heterogeneity of cell migration is of great interest to cancer research, for example to identify whether a subpopulation of cells in a tumor is more invasive, and how this invasive behavior can be modulated by drugs or microenvironmental properties. Recent studies of mammalian cell cohorts in 3D environments have used delay correlation analysis to quantify heterogeneity (
). The authors found a range of delay distributions of different cell clusters, indicating potential evidence for heterogeneity of movement between different cell clusters (although making no direct claim as to any underlying heterogeneity of cell states themselves).
To distinguish between transient artifacts of small sample sizes and spatiotemporal heterogeneity as an intrinsic property of a cell collective, one has to compare the delay distributions with those generated from a suitable null model. To illustrate this, we simulated our model with lower cell numbers, representing the size of cell cohorts observed by
(see Comparison with Experimental Data in the STAR Methods). Our results (Figure B1) show that existing data are broadly consistent with results obtained from a model without heterogeneity, as well as a model with a few cells striving to move in a particular direction. Here, heterogeneity cannot be inferred from the width of the peak-delay distribution alone. With heterogeneity, the model is best fit at lower interaction strengths for either dataset. The best fit for each of the datasets is at different simulation parameters, suggesting inter-cluster heterogeneity in the experimental observations, but a greater number of comparable experimental datasets is required to make any such inference robust. However, the point of this illustration is explicitly not to argue that our model is the best description for the data, but that one cannot deduce heterogeneity from the measurements without consideration of an appropriate null model.
Furthermore, we found that stronger attraction-repulsion between cells can increase apparent heterogeneity in a population of identical cells (Figure 2A). This leads to an intriguing hypothesis: When tracking cell clusters that undergo a transition to become more invasive, they may appear less heterogeneous as the cluster loosens and cells start to break free from the attachments to their neighbors.
In this paper, we have used a minimal model of collective cell migration to highlight potential pitfalls in measuring heterogeneity from cell-tracking data. We have presented a model for 3D collective cell migration that can generate trajectories of moving cell populations with a variety of collective behaviors (Figure 1). Focusing on cohesively moving cell populations, we analyzed the heterogeneity of migration using delay correlation analysis, demonstrating that non-zero heterogeneity can be measured in a homogeneous population with interactions (Figure 2A). This apparent heterogeneity stemmed not just from cells with broad, but symmetric, individual delay distributions, but also from cells whose correlation with the rest of the population peaked consistently at non-zero delays (Figure 2B), thus capturing leading and lagging on the timescales of observation. We further found that this bias toward apparent heterogeneity increases with stronger intercellular forces (Figure 2C), but not consistently so with stronger alignment. By applying no-flux boundary conditions, we investigated how confinement affects the appearance of heterogeneity and found that, for a given population density, more narrowly confined populations appear less heterogeneous (Figure 3E). Finally, we suggested two biological applications where our results may be relevant: (A) in the study of NC cell migration in different embryonal microenvironments, and (B) when cancer cells undergo a transition to become more invasive (see Box 1). Based on the insight gained from our modeling study, we considered potential in vitro, as well as in vivo, experiments and hypothesized their outcomes.
Our goal was to present an illustrative example using a suitably generic model of collective movement, rather than to construct the most realistic model of collective cell migration—which will differ with each biological application. Even within this constraint, other modeling choices are possible: For example, the explicit alignment term in our model equations might seem unrealistic. In an alternative SPP model for collective cell migration,
have shown that short-range adhesive forces can be equivalent to an alignment term. We therefore anticipate that the results of our simulations and analysis would not change qualitatively if alignment of movement directions was mediated through another type of interaction, such as short-range adhesion.
To quantify heterogeneity of collective movement, we chose the popular method of delay correlation analysis (
). The idea behind this method is to calculate to what extent some cells move first, and others follow. Alternative methods to determine the (directional) coupling between cells, other than cross-correlation, are available, such as causal information flow (
). Measures of heterogeneity other than delay in directional coupling could be used to complement the analysis. One example is the degree of rearrangement of cells' relative position, which can be quantified as “neighbor overlap” (
) can be utilized to assess whether genetic heterogeneities of cell populations are statistically significant in the first place. Even if genetic heterogeneity can be reliably measured, it may still be of interest to correlate this with behavioral analysis, e.g., cell tracking.
In vivo, cell population heterogeneity is often located at the boundary of a migrating collective (
). Thus, it is natural to ask whether cells at the boundary of our simulated populations also show increased heterogeneity. Cells at the boundary have fewer neighbors to interact with than cells at the core, which influences the direction of movement through alignment, attraction-repulsion, and noise (Equation 2). However, when analyzing cells at the boundary and core of the population separately, we found no difference in peak-delay distributions (Figure S4). Therefore, the apparent heterogeneity that can arise from interactions between identical agents is not an “edge effect.”
In an extension of our model we included cells in a leader state, which align with a prescribed direction instead of with their neighboring cells. For this particular form of cell population heterogeneity, different implementations could be considered within the SPP modeling framework. For example,
have used an SPP model with asymmetric alignment interactions to represent tumor-stromal interactions.
Our work illustrates how different choices of null models can affect the interpretation of heterogeneity in cell population data. Comparing different versions of our model, with and without heterogeneity, with experimental cell-tracking data (Figure B1), we showed that either was able to capture the width of the peak-delay distribution, but different parameter values gave the best fit in each case. Without consideration of the homogeneous model, one may have been led to conclude heterogeneous motility in the tracked cell clusters. Even with both choices of null model to compare one could not strongly differentiate, based on the data at hand, between a homogeneous cell population and a less strongly interacting heterogeneous population—they yield peak-delay distributions of similar width. This fact can be used by experimental researchers to assess limitations in their data, such as observation time, sampling frequency, and/or number of replicates.
Going beyond simulation studies, random matrix theory can be used to rigorously quantify the expected correlations between a collection of random variables. This branch of statistics has numerous and fruitful applications in physics, finance (
). Analytical tractability of random matrix statistics, however, decreases drastically when venturing beyond independent Gaussian random variables. In collective cell migration, and in biology more generally, relevant null models fall into this territory more often than not. We may never see a random matrix theory of collective cell migration, but perhaps we can use it as inspiration to make headway with numerical calculation and computational modeling.
To close, we would like to loosely suggest best practices to avoid spurious correlations in complex biological systems. We ought to maximize observation time in a given experiment, as much as is reasonable without sacrificing the stability of the experimental system. When interpreting the results, it is imperative that we think carefully about what the null hypothesis is, and be aware that the usual tests for statistical significance may not apply. When comparing different biological systems, or observing the change in a system over time, we should also try to compare these with tailored, yet simple, simulations to sanity-check at least the qualitative differences or trends we observe. With this study, we hope to have contributed to a conceptual guideline for researchers interested in quantifying heterogeneity of cell populations, a field we expect to grow substantially with the increasing abundance of cell-tracking data.
Contact for Reagent and Resource Sharing
As Lead Contact, Linus Schumacher is responsible for all requests for further information Please contact, Linus Schumacher ( firstname.lastname@example.org ) with requests and inquiries.
Intercellular forces point in the direction of the unit vector eij between cells i and j, i.e., fij = f(rij)eij. The magnitude of the force, f(rij), was chosen to reflect hard core repulsion at short distances (informally described as a force of infinite magnitude), spring-like attraction-repulsion at intermediate distances, and no force at large distances. Specifically, the magnitude of the force between two cells i and j separated by a distance rij is given by
Here, rc denotes the core radius, re the equilibrium cell separation, ra the attraction radius, and r0 the interaction radius, above which cells cease to exert forces on each other. Default parameter values used here are r0 = 0.2, re = 0.5, ra = 0.8, and r0 = 1. In our numerical implementation we approximate the magnitude of infinite force (volume exclusion) by exp(100). The exponential regime was chosen to represent de-adhesion processes for increasing cell-cell separation. This does not accurately reflect forces for cells forming a new contact (decreasing distances), which would require a force-law exhibiting hysteresis. However, we have opted for the simpler form above as our simulations are of cell collectives in contact and we are here not investigating cases when these become cohesive after being initially separated. Thus, the form of intercellular force chosen is sufficient for our purpose. Figure S2 compares our choice of intercellular force with that of
First, we quantify global alignment of velocities vi for N cells as
where the second equality holds for constant speed v0. Thus, a value of Φ = 1 means all cells are perfectly aligned, while a value of Φ = 0 means complete disorder (or the special case of perfect anti-alignment, which we do not observe in our simulations). We use this order parameter to globally characterise the model behaviour, and thus restrict our analysis to instances of the model that correspond to the movement of ordered cell collectives only.
conduct analysis of mammalian cell cohorts in 3D. In particular, they report broad peak-delay distributions of clusters of Nc1 = 10 and Nc2 = 19 cells. We chose to represent these with N = 10 and N = 20 cells, respectively, in our computational experiments. To keep the cell density consistent, we chose the (random) initial cell positions to lie within a cube of length L = 0.9 and L = 1.1, respectively. Given these dimensions, our cell speed of v0 = 0.05 approximately matches the displacement of cells (2.0–2.4 μm per frame, with an initial cluster diameter of ≈50μm, 2.4/50 = 0.048) seen in the datasets analysed. The experimental distributions were reproduced by reading the height of the bars of Figures 4C and D in
Conceptualization, Methodology, Software, Validation, Formal Analysis, Investigation, Data Curation, Writing – Original Draft, L.J.S.; Writing – Review & Editing, Visualization, Supervision, Project Administration, L.J.S., P.K.M., and R.E.B.; Funding Acquisition, L.J.S. and R.E.B.
The authors thank Yasha Sharma, Diego Vargas, Muhammad Zaman, and Jeremy Green for insightful discussions. L.J.S. was funded through a Doctoral Prize award by the UK Engineering and Physical Sciences Research Council (grant number EP/M508111/1 ).