Deep learning neural networks are a powerful tool in the analytical toolbox of modern microscopy, but they come with an exacting requirement for accurately annotated, ground truth cell images. Otesteanu et al. (2021) elegantly streamline this process, implementing network training by using patient-level rather than cell-level disease classification.
The advent of automated, high-throughput microscopy has revolutionized cell biology, allowing focus on the particular, e.g., detailed inspection of rare cells, or providing meta-level statistics of population metrics.
Early advances in machine learning delivered expert systems: computer models based on expert knowledge in which the data metrics and the rules that linked them were user defined. In essence, the machine simulated the analytical steps of the human brain, with much enhanced speed and reliability. Over time, the algorithmic operations of the machine have grown increasingly complex and opaque to the human user. This process has led us to today’s deep learning systems, in which automated correlation, classification, and decision making are done within artificial neural architectures. Thus, we have progressed from machine learning that aimed to model the decision-making processes of the brain to systems that mimic the brain itself. The benefit of this computational development is extremely powerful deep learning networks capable of discovering information on processes and interactions that is hidden within data patterns. The requirement for machine supervision remains, however, because expert knowledge is needed to define the labeled datasets on which the deep learning networks are trained. It is this aspect that
). For example, in cell-based diagnostics, experts have to spend a great deal of time inspecting cell images and annotating them according to whether they correspond to a phenotype associated with a healthy or diseased patient. This annotated “ground-truth” dataset is then used to train the network to automatically recognize the designated cell types (Figure 1). This approach is resource-heavy, requiring expert knowledge, a lot of time, and accuracy in data labeling. It also assumes a priori knowledge of which cells are important and what they look like, but what about unknown populations? How can we use machine learning in the case of clearly indicated disease, with known physiological symptoms, but no knowledge of the cellular biomarkers of the pathology?
Otesteanu et al. (2021) present a weakly supervised learning approach (iCellCnn) that removes the need for cell-level, ground truth annotation by training the neural network with a collection of cells labeled according to patient status. They term this a “bag of cells” approach, and its novelty lies in the use of patient-level classification (Figure 1). Instead of recognizing individual cells whose morphology indicates disease, the network is trained to recognize distributions of image features collected from a population of cells. In the authors’ words, they use “weak labeling of a set of inputs, instead of a strong labeling of individual inputs.”
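The idea of labeling a bag rather than its members can be made concrete with a minimal sketch. It is an illustration only, not the iCellCnn implementation: the per-cell feature vectors are assumed to have been extracted already, the mean-and-spread pooling is a simple stand-in for whatever aggregation the network learns, and the names `pool_bag` and `make_weak_dataset` are hypothetical.

```python
from statistics import mean, pstdev


def pool_bag(cell_features):
    """Summarize a bag of per-cell feature vectors as one patient-level vector.

    cell_features: list of equal-length lists, one per cell.
    Returns the per-feature means followed by the per-feature population
    standard deviations, a crude description of the feature distribution.
    """
    if not cell_features:
        raise ValueError("a bag must contain at least one cell")
    columns = list(zip(*cell_features))  # transpose: one tuple per feature
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]


def make_weak_dataset(patients):
    """Build (bag_vector, patient_label) pairs; no cell carries its own label.

    patients: dict mapping patient id -> (list of cell feature vectors, label).
    """
    return [(pool_bag(cells), label) for cells, label in patients.values()]


# Hypothetical toy data: two features per cell, labels given per patient only.
patients = {
    "P1": ([[0.1, 1.0], [0.3, 1.2], [0.2, 0.8]], "healthy"),
    "P2": ([[0.9, 2.0], [0.7, 2.4]], "diseased"),
}
dataset = make_weak_dataset(patients)
```

Each entry of `dataset` pairs one vector per patient with that patient's clinical status, so any downstream classifier sees the population of cells as a whole and never an individually annotated cell.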
The disease considered is a blood cancer, Sézary syndrome. This is a T cell lymphoma that is characterized by anomalous cerebriform (brain-like) morphology of T cell nuclei. In iCellCnn, multiple T cell images, obtained from an individual patient blood sample, form the input to a convolutional autoencoder, a feature-extracting neural network. This combines morphological feature information from all cells into a single one-dimensional feature vector. This vector, an abstracted description of the blood sample, is used as the input to a random forest classifier, which indicates the probability that diseased cells (those of cerebriform morphology) are present in the input cell collection. Thus, the training of the machine learning model defines morphological patterns of disease at the cellular level in a data-driven manner. Because all T cells are presented to the neural network, it learns to ignore the non-disease-specific cells that might confuse classification of patient status. The authors benchmark their approach by comparing its diagnoses with those obtained from a strong learning approach, implemented by prior labeling of individual cells as disease-associated or healthy. Two levels of annotation are adopted:
Naive, in which the status of the patient is assigned to all of their cells (i.e., 100% of cells from a healthy patient are labeled as healthy and vice versa for a diseased patient). This works on an assumption that patients with Sézary syndrome have an increased frequency of mutated T cell nuclei.
Manual, in which cells from a healthy patient are again naively annotated as healthy, and 1,000 expertly identified pathological T cells are annotated as diseased. Although disease-associated cells are explicitly identified, this approach still results in morphologically abnormal T cells from healthy individuals being labeled as “healthy,” so there is still the potential for distortion of the model predictions.
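The two strong-labeling baselines differ only in how cell-level labels are derived, which a short sketch can illustrate. This is a hypothetical reconstruction from the description above, not the authors' code: the function names, the data layout, and the choice to leave a diseased patient's non-identified cells out of the manual training set are all assumptions.

```python
def naive_labels(cells_by_patient, patient_status):
    """Naive scheme: every cell inherits the status of its patient."""
    return [
        (cell, patient_status[pid])
        for pid, cells in cells_by_patient.items()
        for cell in cells
    ]


def manual_labels(cells_by_patient, patient_status, expert_diseased):
    """Manual scheme: cells from healthy patients are labeled healthy naively;
    only expert-identified cells are labeled diseased.

    expert_diseased: set of cell ids an expert marked as pathological.
    Assumption: other cells from diseased patients are left out of training.
    """
    labeled = []
    for pid, cells in cells_by_patient.items():
        for cell in cells:
            if patient_status[pid] == "healthy":
                labeled.append((cell, "healthy"))
            elif cell in expert_diseased:
                labeled.append((cell, "diseased"))
    return labeled


# Hypothetical toy cohort, with cell ids standing in for cell images.
cells_by_patient = {"H1": ["h1_a", "h1_b"], "D1": ["d1_a", "d1_b", "d1_c"]}
patient_status = {"H1": "healthy", "D1": "diseased"}
naive = naive_labels(cells_by_patient, patient_status)
manual = manual_labels(cells_by_patient, patient_status, {"d1_b"})
```

In both schemes an abnormal-looking cell from a healthy patient still receives the label "healthy", which is the source of label noise that the weakly supervised approach avoids.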
Although both strong and weak approaches were able to distinguish between healthy and diseased patients, the weakly supervised training produced the most pronounced separation of healthy and diseased classifier scores, estimating ∼14% prevalence of diseased cells in healthy and 85% in diseased patients.
In a nice addition to the scope of the paper, the authors use a custom-built microfluidic system to obtain the T cell images. This contains a 45 × 45 μm channel, in a polydimethylsiloxane (PDMS) monomer on a glass substrate, which elasto-inertially focuses cells within the fluid stream and can image >2,000 cells per patient. Image capture is achieved by using a ×60 objective lens and a CMOS camera. The technical simplicity of this device and the streamlined machine learning analysis provide an ideal toolset for ready adoption within clinical laboratories.
Although weakly supervised machine learning has previously been used in conjunction with imaging flow cytometry to analyze blood samples, the study by Otesteanu et al. (2021) is the first to use the approach to demonstrate disease diagnosis. Its data-driven approach is tailor-made for clinical application because the medical determination of a patient’s illness becomes both the input label when training the neural network and the output decision of the machine. This opens the way to cell-agnostic diagnoses, in which the presence of disease can be detected even though its effect on specific morphological traits of cells remains unknown. The black box nature of machine learning might be seen as an advantage in this situation, given that it allows clinicians to bypass the complexity of cell morphology and its alteration by disease, safe in the knowledge that the accuracy of the computational decision making has been verified by comparison to expert medical opinion.
The author benefitted from stimulating discussions with Paul Rees on machine learning and cell classification.
Declarations of interests
The author declares no competing interests.