Review

Big Data in traumatic brain injury; promise and challenges

Traumatic brain injury (TBI) is a spectrum disease of overwhelming complexity, the research of which generates enormous amounts of structured, semi-structured and unstructured data. This resulting big data has tremendous potential to be mined for valuable information regarding the "most complex disease of the most complex organ". Big data analyses require specialized big data analytics applications, machine learning and artificial intelligence platforms to reveal associations, trends, correlations and patterns not otherwise realized by current analytical approaches. The intersection of potential data sources between experimental and clinical TBI research presents inherent challenges for setting parameters for the generation of common data elements and for mining existing legacy data in ways that would allow highly translatable big data analyses. In order to successfully utilize big data analyses in TBI, we must be willing to accept the messiness of data, collect and store all data, and give up causation for correlation. In this context, coupling the big data approach to established clinical and preclinical data sources will transform current practices for triage, diagnosis, treatment and prognosis into highly integrated, evidence-based patient care.

Denes V Agoston*,1,2 & Dianne Langford3
1 Department of Anatomy, Physiology & Genetics, Uniformed Services University, Bethesda, MD 20814, USA
2 Department of Neuroscience, Karolinska Institute, Stockholm, Sweden
3 Department of Neuroscience, Lewis Katz School of Medicine, Temple University, Philadelphia, PA 19140, USA
*Author for correspondence: [email protected]

First draft submitted: 15 July 2016; Accepted for publication: 25 May 2017; Published online: 10 July 2017

Keywords: artificial intelligence • big data • big data analytics • machine learning • traumatic brain injury

Big Data (BD) and Big Data Analytics (BDA) are changing our lives significantly. Most of us use Google and Amazon, and it is difficult not to notice how well they can predict our interests, preferences and so on. To be able to do so, Google and Amazon needed several things: huge amounts of data (BD) and new tools, including BDA, Artificial Intelligence (AI) and Machine Learning (ML). These tools are capable of processing and analyzing mountains of data to generate correlations and make predictions. The BD approach thus allows Google, Amazon and others to find correlations, make predictions, and generate new information and knowledge.

One of the clearest potential beneficiaries of BD and BDA approaches would be the 'most complex disease of the most complex organ', traumatic brain injury (TBI). TBI is when 'physics meets biology'; in other words, physical forces suddenly disrupt the structural integrity of the brain, leading to functional impairments. Approximately 70% of TBI cases are caused by sudden acceleration/deceleration of the head resulting from falls, traffic and sports accidents, among others [1]. Over 85% of TBIs are mild, also called concussion, and result in large part from playing contact sports [2]. The physical forces of impact can be measured, recorded and analyzed in the context of the biological response. Using the BD approach, a TBI 'dosimetry' can be established that, in analogy with ionizing radiation, can assess injury severity, guide therapeutic interventions and provide predictions.
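As a purely illustrative sketch of what such a 'dosimetry' might look like computationally, the example below trains a machine-learning classifier to relate measured impact kinematics to a clinically assigned severity label. The file name, column names and labels are hypothetical assumptions for illustration only; they are not part of the article or of any established TBI standard.

```python
# Illustrative sketch only: the data file, column names and severity labels
# below are hypothetical, not an established TBI dosimetry scheme.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical dataset: one row per recorded head impact.
impacts = pd.read_csv("head_impacts.csv")
features = ["peak_linear_accel_g", "peak_rotational_accel_rads2",
            "impact_duration_ms", "impact_location_code"]
X = impacts[features]
y = impacts["severity_label"]  # e.g., 'mild', 'moderate', 'severe'

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Gradient boosting copes reasonably well with noisy, mixed tabular data.
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# Report how well the physical 'dose' predicts the clinical response.
print(classification_report(y_test, model.predict(X_test)))
```

The same pattern, a supervised model mapping physical dose to biological response, could in principle be scaled to far larger, messier datasets by the BDA platforms discussed below.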
In the case of severe TBI, modern neurointensive care monitors dozens of physiological and biochemical parameters, thereby generating huge amounts of real-time data. Analyzing such data in the context of outcomes using the BD approach can significantly assist in making therapeutic decisions. Using the BD approach in experimental TBI research would help to close the substantial gap between preclinical and clinical studies by collecting and analyzing the physical data in the context of the biological response. The third critically important application of the BD approach is to 'mine' existing, legacy data published in the scientific literature. This latter task is probably even more challenging than designing and performing experiments and studies with set parameters in mind, such as Common Data Elements (CDEs) and BD approaches. Despite enormous scientific and monetary investment during the last several decades to identify evidence-based, specific, efficient pharmaco- (or other) therapy, there is still no treatment to mitigate the acute and long-term consequences of TBI. These facts clearly show that the TBI field has reached its 'strategic inflection point' and that repeating the same approaches will not result in the new and much-needed information and knowledge. The BD approach offers solutions to many of TBI's most vexing issues. In this review, we briefly outline the potential of BD and BDA, and the possible benefits and main challenges of using these approaches in experimental and clinical TBI.

Big Data

BD is a term for datasets that are so large and complex that they cannot be analyzed using traditional data-processing applications [3–15]. The analysis of BD requires specialized BDA, AI and ML that can reveal patterns, trends, associations, correlations and interactions, and make predictions. In another definition, BD is "any voluminous amount of structured, semi-structured or unstructured data that have the potential to be mined for information" [3]. BD is characterized by the three V's: Volume, Variety and Velocity. In addition, BD also has Variability, Veracity and Complexity.

Volume is the most important characteristic of BD. The volume of data is growing exponentially. For example, in 2009 the world's total data volume was approximately 1.5 zettabytes (1 zettabyte is 10^21 bytes, or one billion terabytes). In 2015, the data volume grew to 8 zettabytes, and it is predicted that by 2020 it will be 44 zettabytes [16]. Biomedical data have contributed substantially to this overall growth in volume due to data-rich technologies such as the various imaging modalities and the various omics (genomics, proteomics, etc.).

Variety, or the diversity of data, is another characteristic as well as the main challenge of BD. The overwhelming majority of data, including data in the biomedical literature, is in an unstructured format containing text, images and multimedia, among others. Scientific articles are typical examples of unstructured data in that they do not have a predefined data model and are not organized in a predefined manner. They include raw text, images, videos, and physiological and pathological data, among others. Such unstructured data are very difficult to understand using traditional programs due to irregularities and ambiguities.
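As a minimal illustration of how structure can be imposed on such text, the sketch below converts a few invented abstract fragments into quantitative signature vectors (here, TF-IDF term weights) of the kind discussed in the preprocessing steps that follow. The example documents and the choice of scikit-learn are assumptions for illustration only.

```python
# Minimal sketch: turning unstructured text into quantitative signature vectors.
# The example documents are invented placeholders, not real abstracts.
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [
    "Closed head injury in rats increased GFAP and inflammatory markers.",
    "Repetitive mild TBI altered cerebral glucose metabolism in athletes.",
    "Blast exposure produced axonal injury without overt vascular damage.",
]

# Each document becomes a sparse vector of term weights (TF-IDF).
vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
signature_vectors = vectorizer.fit_transform(abstracts)

print(signature_vectors.shape)                   # (n_documents, n_terms)
print(vectorizer.get_feature_names_out()[:10])   # first few extracted terms
```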
A combination of text mining, image analysis, nucleotide and/or amino acid sequence analyses and other preprocessing steps is needed to give structure to this raw data and to extract the information or generate quantitative signature vectors. The most challenging is text preprocessing, which requires statistical parsing, computational linguistics and/or ML [17] to generate numerical summaries.

Velocity, the third main characteristic of BD, refers to the temporal dimension of the data: data in motion. Velocity means that data collection can vary from a single batch/sampling (for example, at a selected experimental end point), through periodic sampling at multiple time points and near real-time collection, to real-time streaming data. An example of near real-time or real-time data collection is neurointensive care monitoring. The importance of such continuous data collection is obvious, as it can provide clinicians with trends, such as improving or worsening conditions over time. As the costs of collecting and storing data decrease, near real-time or real-time data streaming is becoming increasingly common.

Beyond the three V's, BD also has Variability, Veracity and Complexity. Variability differs from variety in that it refers to the absence of uniformity. For example, a parameter that is expected to be the same can vary due to human or machine error. Variability can have substantial impacts on the reliability of data, in other words, on how representative each data point really is, which in turn affects data homogeneity. Veracity means that the data are uneven in quality, incomplete, ambiguous or deceptive. Filtering out inaccurate data is a serious challenge, as failing to do so can lead to the classic 'garbage in, garbage out' scenario. Complexity is generally defined as many different components interacting with each other in multiple ways, causing a higher-order organization that is greater than the sum of its parts.

Incompleteness of data represents an especially serious challenge for BD approaches in biomedical research, including TBI research. Publications represent only a fraction of the total data collected and accumulated during experimental TBI work or clinical studies, and these data are 'curated'. According to conservative estimates, some 50% of the data from experimental and/or clinical TBI research is never published, for various reasons including failure of the experimental data to support the hypothesis. In addition to the unpublished data, which may be available in digital format, large proportions of 'dark data' consist of laboratory notes, clinical notes and animal care records, among others. These data may reside on paper, analog media and/or personal hard drives, and are thus subject to the 'file-drawer phenomenon' [8].

Although it appears counterintuitive, large volumes of incomplete, messy data are more valuable and offer a higher probability of finding correlations than clean, curated, small datasets that may or may not be representative and are almost certainly biased. One can call it the IBM versus Google approach to creating language translation. IBM fed the French and English versions of Canadian parliamentary transcripts, a small, selected, curated sample, into its machines to infer which French word is the best equivalent of which English word. The huge undertaking became stuck in the complexities of mathematical probabilities. The Google approach has been different.
Google collected all available data from the Internet, an admittedly messy, incomplete and inaccurate source of language data, as opposed to the clean but small IBM sample. Google has taken in billions and billions of pages from all kinds of sources and documents in multiple languages. The result speaks for itself: Google Translate currently covers 104 languages and the quality of the translations is fairly accurate. We should note here that IBM is catching up with Google, as IBM's Watson cognitive computing technology can analyze BD to identify novel drug targets, among other biomedical applications [5].

Traumatic brain injury

TBI is a spectrum disease. The severity of the injury ranges from severe to mild, with the latter also called concussion [18]. TBI accounts for approximately 30% of deaths caused by injury among young people under age 45, and it is the single most common cause of death and permanent disability in this group [19]. The incidence of TBI is staggering. In 2015, approximately 2 million individuals suffered TBIs in the USA alone, and the number worldwide was approximately 60 million. The medical, economic and social expenses directly related to TBI are approximately 96 billion dollars annually in the USA alone. Injuries that include TBI cause the deaths of approximately 150 people per day in the USA, resulting in approximately 50,000 deaths per year. The incidence of TBI has been steadily increasing, and the number of TBI cases nearly doubled from 2001 to 2010, from 521 to 824 per 100,000 people in the USA [1]. The WHO has predicted that by 2020, TBI will be among the top three diseases causing death and disability [20]. As indicated by the difference between the rate of increase in emergency department visits and that in hospitalizations (70 vs 11%), the rise is mostly due to the surge in mild TBI/concussion cases [2]. Due to the lack of uniformity in reporting requirements, there are controversies regarding changes in mortality [21]. However, mortality has decreased substantially at locations with improved neurocritical care [22]. Severe and moderate forms of TBI increase the risk of Alzheimer's disease 2.3- to 4.5-times [23], and consequently multiply the already staggering medical, fiscal and social expenses related to TBI. At the other end of the severity scale, mild TBI/concussion accounts for approximately 85% of all TBI cases [2,24]. Mild TBI, especially when repetitive in nature, increases the risk of developing neurodegenerative conditions, such as chronic traumatic encephalopathy, three- to five-times, thereby further increasing the disease-associated expenses [24].

The physical impact results in the primary injury process: structural and functional damage that is instantaneous and cannot be treated, only prevented by avoiding TBI. Depending on the type of physical forces and how they interact with the head/brain, the primary injury process includes damage to axons, blood vessels, neurons and glia, and it triggers the highly complex and dynamically changing secondary injury process [25]. There appears to be a correlation between the physical forces and the secondary injury process. Mild TBI or concussion predominantly causes transient metabolic changes, whereas severe acceleration/deceleration TBI results in vascular [26] and axonal [27] injuries followed by complex downstream processes including inflammation [28]. Blast-induced TBI appears to have a unique pathophysiology [29–31].
However, the exact correlation between physical forces and the biological response is not known. Moreover, the biological responses change over time, so the temporal aspect of these changes dramatically increases the data to be measured, monitored, collected, stored and analyzed. The cellular, molecular and structural changes associated with the primary (physical) and secondary (biological) responses to the injury manifest in functional changes observed clinically. These changes include a whole array of altered physiological responses, for example, decreased cerebral perfusion, depressed glucose metabolism, altered water balance and edema, among others, as well as neurobehavioral changes ranging from dizziness, confusion and memory impairment to loss of consciousness [32]. These clinically observed signs and symptoms change over time post-TBI, leading to ever-increasing amounts of clinical data.

Current use of BD in TBI

Efforts to improve clinical practice guidelines for assessing the severity of concussion have resulted in the development of several algorithms to evaluate changes at the physical, cognitive, behavioral, imaging and neuropsychological levels [33]. Traditionally, the use of BD in concussion research has incorporated clinical guidelines that include multimodal subjective features, thereby producing significant challenges for clinicians attempting to diagnose concussion and the severity of the injury [33,34]. Collecting biological data that include functional testing results and blood biomarker analyses, in combination with collecting physical data that describe the frequency, location and force of impacts, will provide complex information at multiple levels, thereby generating massive amounts of data. Managing and sharing these data have led to efforts to improve the sharing and distribution of BD within the TBI field.

In response to the explosion in the amount of TBI data, numerous models for database repositories have been proposed and some have been established. Building upon a foundation created in 2009 by the International Mission for Prognosis and Analysis of Clinical Trials (IMPACT), an initiative to establish CDEs for TBI was launched to standardize data collection across clinical trial sites. Data included demographics, clinical care, genetic and proteomic markers, neuroimaging and outcome measures, representing a range of TBI data that includes data relevant to all TBI studies, highly heterogeneous datasets and measures for which no consensus or validation has been achieved [35,36]. Importantly, the database contains few imaging data and virtually no monitoring data, greatly limiting its use for BDA. The Transforming Research and Clinical Knowledge in TBI (TRACK-TBI) multicenter prospective observational studies were then conducted to validate the feasibility of implementing CDEs among 650 subjects who received CT scans in the emergency room within 24 h of injury at level I trauma centers and one rehabilitation center in the USA [36]. Currently, TRACK-TBI houses data on 3000 patients from 11 sites in the USA and was the first to populate the Federal Interagency TBI Research (FITBIR) informatics system. Collaborative efforts between the National Institute of Neurological Disorders and Stroke and the Department of Defense created a national resource for archiving and sharing clinical research data on TBI [37].
The goals of the FITBIR informatics system are to promote data sharing in the field of TBI, enable data sharing among individual laboratories and encourage connectivity with other platforms [38]. FITBIR currently stores over 200,000 data records that include detailed demographics, outcome assessments, imaging and biomarkers. By implementing the comprehensive interagency CDEs for TBI research as defined by the CDE work group, FITBIR provides tools and resources to extend the data dictionary. In this platform, qualified researchers can gain access to the data in the hope that novel modeling approaches may uncover relationships not realized by the original data collectors, thereby leading to additional studies and successful clinical trials for the treatment of TBI. The CDE initiatives for clinical as well as preclinical TBI studies are giant steps toward improving disease severity classification and unifying data entry, deposition and archiving. The success of the initiative is reflected in the increasing number of entries as well as in the analyses and studies using FITBIR.

Neurointensive care units (NICUs) generate very large volumes of data collected during continuous monitoring of vital signs and physiological and biochemical parameters such as cerebral perfusion pressure, cerebral blood flow, brain tissue oxygenation, intracranial pressure and changes in intracranial glucose metabolism, among others [32,39]. Combined with the outputs of various imaging modalities, EEG and other diagnostic monitoring, each patient generates staggering amounts of data during the NICU stay. However, there is currently no unified protocol for analyzing these data to help develop guidelines for patient management, and in the absence of follow-up, the correlation between early disease management and long-term outcomes cannot be established [40]. While neurointensive monitoring generates digital and relatively simple datasets, neuroimaging, probably the most powerful diagnostic tool in the NICU, produces huge amounts of very complex data. The absence of standardized timing of image acquisition, the lack of uniform imaging protocols and other unresolved issues make the BD approach in the NICU rather challenging [41].

In this context, Smith et al. created a defined set of CDEs for use in preclinical models, consisting of ten modules divided into a Core Module with 57 CDEs and Injury-Model-Specific modules for nongeneralizable elements [42]. Among the Core CDEs, the domains included animal characteristics, animal history, assessments and outco...
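Returning to the continuous NICU monitoring described above, the following is a minimal sketch of what near real-time trend detection on a single streamed parameter could look like. The simulated signal, sampling interval, window size and threshold are illustrative assumptions only, not clinical guidance.

```python
# Illustrative only: simulated intracranial pressure (ICP) stream; the drift,
# window and threshold are arbitrary assumptions, not clinical recommendations.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulate one hour of ICP sampled every minute, with a slow upward drift.
minutes = pd.date_range("2017-01-01 00:00", periods=60, freq="min")
icp = 12 + 0.2 * np.arange(60) + rng.normal(0, 1.0, 60)   # mmHg
stream = pd.Series(icp, index=minutes, name="icp_mmHg")

# A rolling 10-minute mean captures the trend rather than single noisy readings.
trend = stream.rolling("10min").mean()

# Flag sustained elevation above a hypothetical threshold of 20 mmHg.
alerts = trend[trend > 20]
print(trend.tail())
print(f"{len(alerts)} minutes with a sustained elevated ICP trend")
```

Scaling this kind of rolling-trend analysis to dozens of simultaneously monitored parameters, and linking it to outcomes, is exactly the setting in which BDA could support the development of unified NICU management protocols.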