Patterns
All content is freely available to readers and supported through open access

Mar 11, 2022

Volume 3, Issue 3
Open Access
On the cover: Until now, the digital traces that citizens leave behind in their everyday lives have generally been very difficult to access for scientific research. The software in the paper by Boeschoten et al. (100444) changes this. The software and the presented workflow allow researchers to collect only the parts of the digital traces that are relevant to their research question, thereby preserving the privacy of the research participants. The cover image illustrates that scientists (represented by the hands) are finally able to grab these digital traces. The icons illustrate the variety of data types and topics that can be investigated using this approach. Image credit: Neo Cheung, Eyra Leap B.V.

Preview

  • Data-driven assessment of dimension reduction quality for single-cell omics data

    • Xiaoru Dong,
    • Rhonda Bacher
    Dimension reduction (DR) techniques have become synonymous with single-cell omics data due to their ability to generate attractive visualizations and enable analyses of high-dimensional data. In this issue of Patterns, Johnson et al. develop a statistical approach to assist in selecting high-quality reduced representations to improve analyses and biological interpretations.

People of data

  • Zooming in on the brain via data science

    • Shanshan Jia,
    • Zhaofei Yu
    Zhaofei Yu, a young principal investigator (PI), and Shanshan Jia, a PhD student, explain the importance of data science in their research. They have developed a method to extract information from high-dimensional data based on wavelet decompositions using simulated and experimental neuroscience data. This method is not only beneficial for neuroscientists, but it might also be helpful to anyone who deals with high-dimensional data.
  • A hybrid approach for identifying drug repurposing candidates and their mechanisms: An interview with Vanessa Lage-Rupprecht and two co-authors

    • Vanessa Lage-Rupprecht,
    • Bruce Schultz,
    • Marcin Namysl
    Senior researcher Vanessa Lage-Rupprecht and two collaborators talk about what data science means to them and illustrate how they brought data science and lab work together in their drug-repurposing project, which was recently published in Patterns. In that article, they developed a drug-target-mechanism-oriented data model, the Human Brain Pharmacome, and presented it as a resource to the community.

Perspectives

  • Data justice and data solidarity

    • Matthias Braun,
    • Patrik Hummel
    Justice in and around data and artificial intelligence is often reduced to fairness. We follow proponents of the notion of data justice in arguing that the focus needs to be widened. We propose that in addition, a commitment to data solidarity is needed in which the concerns and interests of those who are currently insufficiently considered in social endeavors mediated by data are put to the forefront. Data solidarity is an important catalytic element to advance toward data justice.
  • Empowering local communities using artificial intelligence

    • Yen-Chia Hsu,
    • Ting-Hao ‘Kenneth’ Huang,
    • Himanshu Verma,
    • Andrea Mauri,
    • Illah Nourbakhsh,
    • Alessandro Bozzon
    AI systems have great potential to be applied in achieving Sustainable Development Goals. However, conflicts of interest among multiple stakeholders result in challenges when co-designing AI systems with local people, collecting community data to fine-tune the AI models, and adapting the behavior of AI to long-term social change. Through case studies and the literature, this article explains these challenges and highlights viable paths toward empowering local communities to advocate for social and policy changes in response to pressing regional issues.
  • A guide to backward paper writing for the data sciences

    • Jon Zelner,
    • Kelly Broen,
    • Ella August
    In this perspective, the authors propose a backward approach to data science writing that begins with clearly identifying the scientific and professional goals motivating the work, followed by a purposeful mapping from those goals to each section of a paper. This approach is motivated by the conviction that manuscript writing can be more effective, efficient, creative, and even enjoyable—particularly for early-career researchers—when the overarching goals of the paper and its individual components are clearly mapped out.

Articles

  • A k-nearest neighbor space-time simulator with applications to large-scale wind and solar power modeling

    • Yash Amonkar,
    • David J. Farnham,
    • Upmanu Lall
    A novel daily time step stochastic simulator capable of capturing the joint dynamics of high-dimensional multivariate fields across a large domain is presented. The proposed algorithm’s utility is demonstrated via application to joint wind and solar fields across the Texas Interconnection. The risks of undersupply from infrequent, persistent periods of suppressed wind and solar availability are analyzed.
  • Representing the dynamics of high-dimensional data with non-redundant wavelets

    • Shanshan Jia,
    • Xingyi Li,
    • Tiejun Huang,
    • Jian K. Liu,
    • Zhaofei Yu
    A crucial question in data science is how to extract meaningful information embedded in high-dimensional data. Such information is often captured in a low-dimensional space by a set of features that can represent the original data at different levels. A method combining wavelet analysis with conditional mutual information can extract non-redundant features from various types of neuroscience data. Simple decoders can read out meaningful information with high accuracy using only a small set of these features.
  • EMBEDR: Distinguishing signal from noise in single-cell omics data

    • Eric M. Johnson,
    • William Kath,
    • Madhav Mani
    A novel algorithm for assessing the quality of dimensionality reduction (DR) methods is proposed and applied to several single-cell omics datasets. The method is local, quantitative, and statistical, which permits quality to be detected on a cell-wise basis in a manner comparable across parameter sets and DR methods. Optimizing DR methods per cell permits a novel embedding scheme that robustly reproduces structures in the original data.
  • A hybrid approach unveils drug repurposing candidates targeting an Alzheimer pathophysiology mechanism

    • Vanessa Lage-Rupprecht,
    • Bruce Schultz,
    • Justus Dick,
    • Marcin Namysl,
    • Andrea Zaliani,
    • Stephan Gebel,
    • Ole Pless,
    • Jeanette Reinshagen,
    • Bernhard Ellinger,
    • Christian Ebeling,
    • Alexander Esser,
    • Marc Jacobs,
    • Carsten Claussen,
    • Martin Hofmann-Apitius
    In our drug repurposing approach, we combined knowledge mined from a variety of sources with focused experimental screening data of selected drug candidates into our drug-target-mechanism-oriented data model, the Human Brain Pharmacome (HBP). We identified previously unreported drug-target combinations that show evidence of being viable therapeutic candidates for Alzheimer disease (AD). Data-driven approaches combining in silico and in vitro analyses are increasingly in the spotlight and represent a future path to knowledge discovery, especially in the context of complex diseases.
  • DAISM-DNNXMBD: Highly accurate cell type proportion estimation with in silico data augmentation and deep neural networks

    • Yating Lin,
    • Haojun Li,
    • Xu Xiao,
    • Lei Zhang,
    • Kejia Wang,
    • Jingbo Zhao,
    • Minshu Wang,
    • Frank Zheng,
    • Minwei Zhang,
    • Wenxian Yang,
    • Jiahuai Han,
    • Rongshan Yu
    Lin et al. propose the DAISM-DNN pipeline for cell type deconvolution from bulk RNA-seq data using deep neural networks trained with dataset-specific training data generated from calibration samples augmented with in silico mixing strategies. DAISM-DNN enables accurate intra- and inter-sample deconvolution and is robust to random errors in ground truth cell type proportions of calibration samples. The trained DAISM-DNN model can also be used across multiple biomedical experiments if these experiments are conducted with a strict SOP to ensure quality consistency.
  • Robust importance sampling for error estimation in the context of optimal Bayesian transfer learning

    • Omar Maddouri,
    • Xiaoning Qian,
    • Francis J. Alexander,
    • Edward R. Dougherty,
    • Byung-Jun Yoon
    Accurate estimation of classification error is challenging in scientific domains, where available data are limited. Although transfer of data and knowledge from relevant domains can alleviate this issue, previous studies on transfer learning have mostly focused on improving the learned models rather than enhancing the performance analysis. In this paper, we propose a transfer learning scheme for Bayesian error estimation that can leverage data from relevant domains to enhance the estimation of classification error in the domain of interest.
  • Universal machine learning framework for defect predictions in zinc blende semiconductors

    • Arun Mannodi-Kanakkithodi,
    • Xiaofeng Xiang,
    • Laura Jacoby,
    • Robert Biegaj,
    • Scott T. Dunham,
    • Daniel R. Gamelin,
    • Maria K.Y. Chan
    A novel and universal ML framework is developed to predict charge-, Fermi level-, and chemical potential-dependent formation energies of point defects in zinc blende semiconductors such as CdTe, GaAs, and SiC. We lay out in detail the methodology for data generation using high-throughput DFT simulations; training of ML models via feature selection, hyperparameter optimization, and cross-validation; and prediction and screening for a dataset of more than 12,000 point defects across 34 compounds.
  • scTenifoldKnk: An efficient virtual knockout tool for gene function predictions via single-cell gene regulatory network perturbation

    • Daniel Osorio,
    • Yan Zhong,
    • Guanxun Li,
    • Qian Xu,
    • Yongjian Yang,
    • Yanan Tian,
    • Robert S. Chapkin,
    • Jianhua Z. Huang,
    • James J. Cai
    scTenifoldKnk is a machine learning workflow performing virtual KO experiments to predict gene function. It constructs gene regulatory networks using single-cell RNA sequencing data from wild-type samples and then computationally deletes target genes. Real-data applications demonstrate that scTenifoldKnk recapitulates findings of real-animal KO experiments and accurately predicts gene function in analyzed cells.
  • Inferring global-scale temporal latent topics from news reports to predict public health interventions for COVID-19

    • Zhi Wen,
    • Guido Powell,
    • Imane Chafi,
    • David L. Buckeridge,
    • Yue Li
    We developed a machine learning model called EpiTopics toward automatic detection of the changes in the status of non-pharmacological interventions (NPI) for COVID-19 from news reports. EpiTopics learns country-dependent topics from large numbers of COVID-19 news reports that do not have NPI labels. Subsequently, EpiTopics learns accurate connections between these topics and changes in NPI status from a set of labeled news reports, which enables accurate detection of temporal NPI status for each country referred to in the news reports.

Descriptor

  • Privacy-preserving local analysis of digital trace data: A proof-of-concept

    • Laura Boeschoten,
    • Adriënne Mendrik,
    • Emiel van der Veen,
    • Jeroen Vloothuis,
    • Haili Hu,
    • Roos Voorvaart,
    • Daniel L. Oberski
    The software introduced in this paper allows for a privacy-preserving analysis of digital traces by researchers. As individuals generate digital traces in more and more aspects of their lives, this software can open up new sources of data to researchers from various disciplines. For example, it can provide access to social media data for social and behavioral scientists, open up bank transaction or energy use data to economists, or release electronic health records to health scientists.