Springer handbook of bio-/neuro-informatics. (English) Zbl 1304.92001

Springer Handbooks. Dordrecht: Springer (ISBN 978-3-642-30573-3/hbk; 978-3-642-30574-0/ebook). lvii, 1229 p. (2014).
This handbook is structured in 12 parts, covering a wide variety of notions associated with bio- and neuroinformatics, presented from a computational angle, machine learning techniques for knowledge discovery or identification and characterization of biological networks, and applications of bioinformatics in medicine, with a particular focus on brain processes, and brain signal analysis and modeling.
In the first, introductory chapter, following a general introduction to information science, bioinformatics and neuroinformatics with emphasis on the scope and developed approaches, the editor briefly describes the subject of the twelve parts of the handbook (A to L).
Part A consists of six chapters and commences with a discussion of the central dogma of molecular biology and a description of the components involved: the DNA and the RNA. Next, signaling is reviewed, followed by a set of integrated approaches (such as dielectrophoresis and its application for separating blood cells) for understanding different processes in the cell. The fourth chapter moves on to the genomics level, focusing on high throughput (HT) technologies for measuring gene expression and the typical workflow for data analysis. Data representation and visualization, noise detection and normalization, and the identification of patterns are also discussed. The fifth chapter focuses on the protein level and includes an overview of HT platforms for protein separation, quantification and characterization. Post-translational modifications are also discussed. The sixth chapter presents notions about pattern formation and animal morphogenesis. First, the standard models such as the state space models and networks are reviewed. Next, these are incorporated into the discussion of evolution and development of morphology; the chapter concludes with a Darwinian interpretation of genes and development. The last chapter of Part A reviews models of bacterial colonies. Commencing with a description of the morphology transitions and dynamics of bacterial diffusion, the author focuses on the ring formation of Proteus mirabilis and Bacillus subtilis. Next, the modeling of arrays and aggregates is presented using Bacillus circulans as an example. The chapter concludes with a description of the limitations of current growth models of bacterial colonies.
The second part of the book discusses different notions of molecular biology linked with genome and proteome informatics. It consists of four chapters dealing with the organization of genomes, determining microRNAs (miRNAs) and cis-regulatory elements, which influence the expression of mRNAs, modeling of proteins and prediction of their structure. The eighth chapter commences with a description of the genomic structure. Next, a method based on chromosome conformation capture is presented for testing the presence of interacting chromatin. The chapter concludes with approaches for determining genomic loci; the examples are based on the yeast genome. In chapter nine, the authors introduce the miRNAs and review the difficulties in predicting them and in determining their mode of action (how these molecules influence gene expression). Some accuracy notions such as false positives and false negatives are introduced and are followed by an introduction to enrichment analysis, i.e., the hypergeometric enrichment, which in this context is equivalent to depletion of motifs in the complement set. The authors review existing enrichment methodologies and present Sylamer, a tool specifically designed for miRNA seed site enrichment. The chapter concludes with an example of Sylamer on real biological data. In the tenth chapter the authors focus on cis-regulatory elements influencing the expression of mRNAs. The chapter commences with a discussion of the requirements on the mRNA sequence data necessary for such a task; next the authors present known regulatory elements such as RNA binding protein target sites and miRNA target sites and introduce a de novo method for the discovery of cis-elements based on primary, secondary and tertiary structures. In chapter eleven the authors discuss protein modeling based on substitution tables and prediction of structure differentiated for soluble proteins and membrane proteins. 
The computational approaches (the template free and template based modeling) are also discussed in detail.
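The hypergeometric enrichment mentioned for chapter nine can be illustrated with a minimal sketch. It is not taken from the handbook and is not Sylamer's implementation; the function names and the toy numbers are assumptions chosen for illustration. The sketch computes the upper-tail probability of observing at least a given number of motif-containing sequences in a selected set, and checks numerically that this equals the lower-tail (depletion) probability in the complement set.

```python
from math import comb


def hypergeom_pmf(population, successes, draws, k):
    """P(X = k) when `draws` items are sampled without replacement from
    `population` items, `successes` of which carry the motif."""
    if k < 0 or k > draws:
        return 0.0
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute zero probability.
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))


def enrichment_p(population, successes, draws, observed):
    """Upper tail P(X >= observed): probability of seeing at least
    `observed` motif-carrying items in the selected set by chance."""
    return sum(hypergeom_pmf(population, successes, draws, k)
               for k in range(max(observed, 0), draws + 1))


def depletion_p(population, successes, draws, observed):
    """Lower tail P(X <= observed): probability of seeing at most
    `observed` motif-carrying items in the selected set by chance."""
    return sum(hypergeom_pmf(population, successes, draws, k)
               for k in range(0, observed + 1))


# Toy numbers (hypothetical): 10 sequences, 4 contain the motif,
# the selected set holds 3 sequences, 2 of which contain the motif.
p_enriched = enrichment_p(10, 4, 3, 2)
# The complement set holds the other 7 sequences and 4 - 2 = 2 motif hits.
p_depleted_complement = depletion_p(10, 4, 7, 2)
```

In this toy setting both tail probabilities describe the same event seen from the two sides of the split, which is why enrichment in the selected set and depletion in its complement are interchangeable.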
Part C consists of seven chapters focusing on knowledge discovery and modeling using machine learning methods. It includes a comparison between fuzzy and crisp (rule based) approaches, a discussion of the methods developed for microarrays and some notions on phylogenetic analysis. In chapter 12, the author reviews the basic notions for supervised and unsupervised learning. He discusses multiclass classification and kernel learning, also including case study examples. For unsupervised learning, the author presents the following methods: hierarchical clustering, Bayesian approaches and maximum likelihood estimations (MLEs) and concludes with Monte Carlo methods. In the thirteenth chapter the authors present case-based reasoning and feature selection approaches using medical examples. These are described in detail and justifications are provided for the decision steps. In chapter fourteen the authors discuss a particular type of HT gene expression measurement: the microarrays. Approaches such as traditional and hybrid clustering are discussed; the authors also introduce cluster validation measures. Chapter fifteen comprises a side-by-side comparison of fuzzy approaches and rule-based methods. The authors commence with a description of fuzzy neural networks with an emphasis on the initialization and training steps and an application to benchmark data. Next, a comparison is conducted on clustering and prediction accuracies. Chapter sixteen reveals another angle of bioinformatics data analysis: the phylogenetic analysis. First, the heterogeneity within biological data is discussed and statistical methods for evaluating it are presented. Next, parsimony is defined and the concepts are further described using biological data. In the seventeenth chapter the authors discuss protein folding recognition, introducing the notions surrounding the 1-D, 2-D and 3-D structures together with protein databases and feature vectors. 
Next, feature selection and classification are introduced through support vector machine (SVM) classifiers, PCA/LDA-based SVMs and quadratic discriminant analysis. In chapter eighteen the author proposes a theoretical approach for the classification problem, introducing kernel methods, their properties and use, which can increase the accuracy of SVMs. The biological applications used as examples include peptide identification, using the kernel spectral dot product, and protein homology prediction, using a pair kernel.
Part D focuses on the modeling of regulatory networks; it consists of seven chapters covering methods to infer transcription networks and transcriptional regulation from the data, approaches based on differential evolution included in neural network models, and methods based on the interactions between proteins. In chapter nineteen the authors use omics data to find paths in biological networks. First, the network representations are described, followed by supervised techniques to predict interactions. Next, the interaction networks are used for the interpretation of the data and the cause-effect duality is discussed in detail. In the twentieth chapter the authors use dynomics and regulomics to model transcription networks using probability equations. Next, the GP RODES algorithm based on the decomposition of ODE systems is presented in detail. The chapter concludes with an example of reverse engineering in up-regulated and down-regulated mRNAs in a biological example. In chapter twenty-one the authors focus on transcriptional regulation and commence with an overview of algorithms for the discovery of binding motifs, which includes alignment-based, word-based and learning-based approaches. Next, external information that can be used to confirm the predictions is presented. The chapter concludes with some approaches for the identification of target genes, and the connection between PWMs and SVMs is theoretically justified. In chapter twenty-two the authors present an approach for inferring gene regulatory networks (GRNs) using recurrent neural networks (RNNs). Following an overview of the state of the art for inferring GRNs, the authors introduce the differential evolution (DE) approach and describe in detail the alterations to the canonical RNN to incorporate the DE information. The chapter concludes with reconstruction experiments both on in silico and in vivo networks. 
Chapter twenty-three uses protein-protein interaction networks for the discovery of structural patterns. First an approach based on traditional graph clustering is presented, followed by a stochastic approach based on geometric random graphs and on hierarchical random graphs. The chapter concludes with an evaluation of the efficiency of the method using random graph models and examples on the human interactome. Chapter twenty-four discusses transcriptional regulatory networks, based on interactions between mRNAs and transcription factors or small RNAs; the examples are in prokaryotic systems. The chapter concludes with a generalization of these approaches (including protein-protein interactions and their computational predictions) in human systems. In chapter twenty-five the authors present an approach for the identification of somatic mutations using whole exome sequencing data. The pipeline is described in detail and for each step both the relevant software and the expected thresholds are indicated.
Part E consists of two chapters describing databases and ontologies commonly used in bioinformatic studies. In chapter twenty-six the authors present four types of databases: sequence databases, with the EMBL nucleotide sequence dataset, GenBank and UniProt; structure databases, including the protein data bank (PDB), the structural classification of proteins (SCOP) and the protein contact map; interactomics databases storing experimentally determined interactions, e.g., molecular interactions (MINT); and proteomics databases such as PeptideAtlas, NCI repository and PRIDE. Chapter twenty-seven overviews the ontologies used in bioinformatics. Commencing with the definition of ontologies and their basic structure, the authors also introduce the conceptual reference, and the artifacts that find their way into these databases. Following a brief history of the use of these ontologies in bioinformatics, some languages and systems for the representation of ontologies such as semantic web, RDF and OWL are presented. The chapter concludes with an overview of BioPAX, pathway ontologies of the unified medical language system (UMLS) and a guide for the use of ontologies in bioinformatics analyses.
Part F is composed of eight chapters focused on applications of bioinformatics in medical health and ecology. Chapter twenty-eight focuses on signal processing, viewed from a statistical angle, for cancer stem cell formation. The authors first present the model and discuss both the forward and inverse problem, providing the theoretical description of the process and the pseudocode for modeling the transitions. Simulation results are also included. The twenty-ninth chapter discusses DNA methylation, commencing with general notions and gradually introducing the link between methylation and cancer. Chromatin remodeling, histone modifications and the roles of the different enzymes involved in this process are discussed in detail. The chapter concludes with an overview of the role of epigenetics in cancer therapy. Chapter thirty discusses the dynamics of autoimmune diseases, with lupus erythematosus as an example. Following a description of the disease model and control input, the authors present an approach for transposing the biological processes into equations and present simulations for them. In chapter thirty-one the authors introduce nutrigenomics, highlighting first the effect of bioactive nutrients on gene expression, on metabolite profiling and protein expression. The chapter also includes an overview of systems biology, bioinformatics tools and approaches, such as the systems biology markup language (SBML) and the systems biology graphical notation (SBGN), that are used to process the nutrigenomics data. In the thirty-second chapter the authors introduce nanomedicine and propose biostatistics approaches for biomarker discovery, which take multiple comparisons into account and aim to decrease the false discovery rate when evaluating differential expression in gene expression data. Next, these approaches are exemplified using human healthcare examples. 
Chapter thirty-three goes one step further, introducing personalized medicine based on personalized information models. Global, local and personalized models are directly compared and the personalized modeling system (PMS) is described in detail (including the algorithm and its optimization). Next, this approach is applied for colon cancer diagnosis using profiles from gene expression data for the training of the classifier component. In chapter thirty-four the author describes the components of health informatics, starting with coding approaches and vocabularies, organization of data in electronic health records (EHRs) and approaches for the standardization of physiological measurements. Next, the interaction between practitioners and the automated system for decision-making coupled with error avoidance is described. The chapter concludes with an overview of consumer e-health and the handling of privacy and security issues. Chapter thirty-five exploits bioinformatics (ecological informatics) approaches to predict and manage invasive species. The prioritization of species is conducted using a self-organizing maps (SOMs) approach, weighted to incorporate the establishment of pest species. Next, approaches for variable and feature selection as well as handling missing data are discussed. The chapter concludes with a description of individual-based models adapted for the dispersal process within a geographic information system.
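To make the false discovery rate control mentioned for chapter thirty-two concrete, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure. The review does not say which procedure the chapter uses, so the choice of Benjamini-Hochberg is an assumption, as are the function name and the toy p-values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where the hypothesis is
    rejected by the Benjamini-Hochberg step-up procedure at level alpha."""
    m = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff, even if
    # some of them individually missed their own threshold.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Toy p-values (hypothetical), e.g. one per gene in a differential
# expression screen.
toy_p_values = [0.02, 0.9, 0.035, 0.03]
toy_decisions = benjamini_hochberg(toy_p_values, alpha=0.05)
```

In the toy input, the two smallest p-values exceed their own rank thresholds, but the third smallest does not, so the step-up rule rejects all three; this is the behavior that distinguishes the procedure from a simple per-test cutoff.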
Part G makes the transition to neuroinformatics introducing models for information processing in the brain and nervous systems. It consists of five chapters discussing the processes occurring in synapses, models using neural networks and neural circuits and the use of MRI classification. The thirty-sixth chapter commences with a detailed description of the biological processes taking place in a synapse, followed by notions on the physiology of synaptic transmission (the presynaptic and postsynaptic sites). Next, the integration of excitatory and inhibitory synaptic inputs and the modulation of synaptic transmissions are discussed. The chapter concludes with computational techniques for signal processing such as noise analysis and imaging methods. The thirty-seventh chapter commences with models of spiking neurons. Next, the neural encoding is introduced and its adaptation for spiking neural networks is discussed. The chapter concludes with applications of this approach and an overview of their evolving architecture to incorporate rank order population encoding and one-pass learning. Chapter thirty-eight commences with notions on magnetic resonance imaging (MRI), followed by a detection method based on linear models. Next, the second level analysis is introduced and the generalized linear mixed model is presented. The two-stage model and the variance estimation as well as the threshold correction method are discussed in detail. The chapter concludes with the evaluation of effective connectivity using Granger causality (GC) tests, F tests and model selection approaches. The thirty-ninth chapter proposes neural circuit models for the description of neuropathological oscillations. It starts with a biophysical model for the study of hippocampal theta rhythms in Alzheimer’s disease and continues with a description of neural population models (neural network models) applied on cortical rhythms. 
The chapter concludes with a neural mass model for the description of thalamocortical alpha rhythms. In chapter forty the authors use wavelet decomposition, feature selection and classification applied on MRIs for the understanding of brain function. The quadratic discriminant analysis and principal component analysis as well as SVMs and SOMs are used for this task. The chapter concludes with results on real data.
Part H presents signal processing methods for brain signal analysis and modeling; it consists of three chapters reviewing kernel methods, analysis of multiple spike trains and time-frequency analysis applied to this topic. Chapter forty-one discusses kernel adaptive filtering (KAF) algorithms, more precisely non-linear adaptive filtering in kernel spaces. The authors start with introductory notions on linear and non-linear adaptive filters and introduce the kernel filters. Next, the kernel least mean squares algorithm is described in detail. The examples include nonlinear channel equalization, chaotic time series prediction, regression and filtering in a spike reproducing kernel Hilbert space (RKHS). The forty-second chapter presents recurrence plots and their use for univariate analysis in evaluating serial dependence, deterministic chaos and multivariate analysis; next, the analysis of multiple spike trains is introduced and examples to illustrate these concepts conclude the chapter. Chapter forty-three focuses on adaptive multiscale data-driven time-frequency analysis. First, the empirical mode decomposition is introduced, followed by its multivariate extensions. Next, the non-uniqueness problem is presented through the mode-alignment of multivariate empirical mode decomposition (MEMD). The chapter concludes with a discussion of the filter bank property of MEMD, a description of noise assisted MEMD and applications on phase synchrony and classification of motor imagery data.
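The kernel least mean squares algorithm named for the first chapter of Part H can be sketched in a few lines. This is a generic textbook form under stated assumptions (Gaussian kernel, fixed step size, no sparsification), not the handbook's own code; the class name, parameter names and toy data are hypothetical.

```python
import math


def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))


class KernelLMS:
    """Online kernel least mean squares filter.

    The estimate is a growing kernel expansion: every input becomes a
    center whose coefficient is the step size times the prediction
    error observed when that input arrived.
    """

    def __init__(self, step_size=0.5, width=1.0):
        self.step_size = step_size
        self.width = width
        self.centers = []
        self.coeffs = []

    def predict(self, x):
        # Weighted sum of kernels centered on all past inputs.
        return sum(c * gaussian_kernel(x, u, self.width)
                   for c, u in zip(self.coeffs, self.centers))

    def update(self, x, target):
        # Error is computed with the filter as it was before this sample.
        error = target - self.predict(x)
        self.centers.append(tuple(x))
        self.coeffs.append(self.step_size * error)
        return error


# Toy usage (hypothetical data): learn d = sin(x) from a stream of samples.
klms = KernelLMS(step_size=0.5, width=0.5)
stream = [(0.1 * i,) for i in range(60)]
errors = [klms.update(x, math.sin(x[0])) for x in stream]
```

The expansion grows by one center per sample, which is why practical variants usually add a sparsification step; that is left out here to keep the update rule visible.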
Part I contains five chapters on information modeling of perception, sensation and cognition. The first chapter of this section discusses the modeling of vision with the neocognitron, a neural network model for robust visual pattern recognition. Commencing with an outline of the network, the author next introduces the principles of robust recognition, and overviews the mode of action of S-cells, emphasizing the feature extraction and the training components. Next, the role of C-cells is described. The chapter concludes with examples of networks extended from the neocognitron. Chapter forty-five contains a neural network model to simulate the gustatory system. Following an anatomical description of this system, the authors link the taste processing in the brainstem, focusing on the neural circuitry, with information processing and the gustatory coding in the forebrain. In the forty-sixth chapter the authors present the integration of electroencephalogram (EEG) signal processing for brain-computer interfaces (BCIs). First, the BCI technologies are reviewed and the notions and paradigms related to this topic are discussed. Next, the authors present a case study of a BCI mobile robot control. The chapter concludes with a description of a source-based BCI with an emphasis on the sequential Monte Carlo problem formulation, the EEG source localization model in state-space and the use of beamforming as a spatial filter. Chapter forty-seven discusses brain-like information processing for spatio- and spectro-temporal pattern recognition. The author commences with the description of single spiking neuron models, which is followed by concepts of learning and memory in a spiking neuron. The general classification, the learning rule and the combined rank-order and temporal learning are also discussed. The chapter concludes with a review of computational neurogenetic models and software and hardware implementations for spatio-temporal pattern recognition (STPR). 
In chapter forty-eight the author shifts the focus to natural language and presents neurocomputational models using phonological and lexical representations of words. Next, the sentence-sized semantic representations are introduced, and the memory representations are discussed in detail. The chapter concludes with syntactic representations and a review of limitations and alternatives for SRN-based models of syntax.
Part J consists of two chapters describing neuroinformatics databases and ontologies. In chapter forty-nine, the authors review the ontology-related notions in computer science and in bio- and neuro-informatics; this is followed by a review of data mining approaches applied on these ontologies. In chapter fifty the authors discuss the integration of large-scale neuroinformatics in the INCF organization. The INCF scientific programs are described, including the program on ontologies of neural structures, the program on multi-scale modeling and the program on standards of data sharing. The chapter concludes with an overview of the INCF neuroinformatics portal.
Part K consists of six chapters which focus on brain diseases and how modeling can be used to understand their characteristics. Chapter fifty-one discusses Alzheimer’s disease (AD). Starting with the epidemiology review and the morphopathological hallmarks, the author continues with a description of changes at metabolism and gene level. The computational modeling of the AD brain is presented via the modulation of cholinergic function by \(A\beta\), tau transmission and neurovascular function. Chapter fifty-two presents AD from the amyloid precursor protein (APP) perspective by identifying and mapping the interactions of APP and understanding the proteolytic processing. The authors also include an overview of the data collected so far and underline relationships between APP and proteolytic fragments and wider synaptic systems. In the fifty-third chapter the authors present a machine learning pathway for the identification of discriminant pathways. The chapter commences with the experimental setup, consisting of the feature selection step, the pathway enrichment and the sub-network inference, followed by the stability and difference analysis. Next, a description of the example data and a review of experimental results on air pollution, Parkinson’s disease and AD are included. In chapter fifty-four the authors present gene-dependent dynamics of cortex applied in idiopathic epilepsy. Starting with an overview of the concepts, from neurogenesis to computational models of epilepsies, the authors continue with a discussion of gene expression regulation and describe a computational neurogenetic model based on the gene-protein regulatory network and the parameters involved. Next, the dynamics of the model are discussed with emphasis on the estimation of parameters. The chapter concludes with simulation results and future directions in neurogenetic modeling. In the fifty-fifth chapter the authors discuss methods for evaluating risks and outcomes of strokes. 
Commencing with a description of strokes, their symptoms and risk factors, the authors continue with a description of machine learning methods, such as \(k\)-nearest neighbors, which can be incorporated in hazard models. In chapter fifty-six the authors present the role of surface electromyography (sEMG) analysis for the recognition of rehabilitation actions. Following a description of sEMG, the time domain and frequency domain analysis methods are introduced. The chapter concludes with examples on real data.
Part L describes in five chapters nature-inspired integrated information technologies. Chapter fifty-seven describes brain-like robotics. The chapter commences with a description of natural and artificial brains and the corresponding models. Next, cognitive robotics is introduced, and the native and non-native architectures are discussed in detail with a focus on key areas like memory, attention and emotions. The chapter concludes with a review of platforms and research projects on this topic. In the fifty-eighth chapter the authors describe developmental learning for user activities and start with a review of the system and challenges. Next, they introduce the system architecture, underlining the Markov decision process, the incremental hierarchical discriminant regression and the HMMs for motion pattern classification. The chapter ends with experimental results on the auditory and motion components as well as high-level reasoning. In chapter fifty-nine the author introduces quantum and biocomputing. Following the description of NP problems, the Hilbert space structure and the compound system, quantum operations are introduced. The author also discusses quantum algorithms based on Fourier transforms and random walks. The chapter concludes with an overview of biological applications of quantum computing. In chapter sixty the authors introduce approaches for network identification using ANN models, inspired by neural and cognitive processes in the brain. Weakly brain inspired models such as local knowledge based learning and strongly brain inspired models such as spiking neural networks are discussed. Next, the authors introduce computational neurogenetic models and quantum inspired computational intelligence (CI). The chapter concludes with an overview of principles for the integration of brain, gene and quantum information. 
In the sixty-first chapter the authors discuss creativity and the perception of art and introduce picture complexity metrics and quantitative measures of complexity to model the artistic component of the brain. This section concludes with an overview of the Allen Brain Atlas, a publicly available resource linking gene expression data with neuroanatomical information in mouse, human and non-human primates. The authors start by describing the resources on the mouse model and point out how to conduct integrated searches and visualizations. Next, the human and non-human primate resources are described. The authors conclude with a discussion of future directions.
The handbook is a valuable and timely resource, which can be used by undergraduates, post-graduates and established researchers. By presenting together a variety of (linked) approaches, it provides a fantastic source for bioinformaticians and neuroinformaticians alike.


92-00 General reference works (handbooks, dictionaries, bibliographies, etc.) pertaining to biology
92-06 Proceedings, conferences, collections, etc. pertaining to biology
00B15 Collections of articles of miscellaneous specific interest
92B20 Neural networks for/in biological studies, artificial life and related topics
92C40 Biochemistry, molecular biology
92C50 Medical applications (general)
92D10 Genetics and epigenetics
68T05 Learning and adaptive systems in artificial intelligence
92B25 Biological rhythms and synchronization
92D20 Protein sequences, DNA sequences
92D15 Problems related to evolution
92C20 Neural biology