Government Sponsored Data Analytics in Healthcare and Life Sciences

The insideBIGDATA Guide to Data Analytics in Government provides an in-depth overview of the use of data analytics technology in the public sector. It covers how data analytics is being used in government, with a number of high-profile use cases; how the Internet of Things is helping government agencies collect and find insights in a growing number of data sources; how government sponsored healthcare and life sciences are expanding; and how cybersecurity and data analytics are helping to secure government applications.

Government sponsored data initiatives within healthcare and life sciences are encouraging: they not only increase transparency but also have the potential to help patients. Not surprisingly, recent years have seen a flurry of activity in this sector in many countries. For example, the Italian Medicines Agency collects and analyzes clinical data on expensive new drugs as part of a national cost-effectiveness program. Based on the results of this effort, the government may reevaluate prices and market access conditions.

Within the U.S., the Federal Government has been encouraging the use of its healthcare data through various policies and initiatives in the hope of directly improving cost, quality, and the overall healthcare ecosystem. With more data being released, the Federal Government is trying to ensure that all appropriate stakeholders, including those in private industry, can access the information in standard formats. The portal, for example, includes federal databases with information on the quality of clinical providers, the latest medical and scientific knowledge, consumer product data, community health performance, government spending data, and many other topics. In addition to publishing information, the aim is to make data easier for developers to use by ensuring that it is machine-readable, downloadable, and accessible via an application programming interface (API).

A report by Volterra Partners for Dell EMC, Sustaining Universal Healthcare in the UK: Making Better Use of Information, describes the effort to address the unprecedented financial pressure faced by the UK’s National Health Service (NHS). That pressure affects the quality of patients’ treatment as waiting times increase and research funding is restricted. The belief is that the availability of patient information and data analytics could have a substantial beneficial impact. Using big data technology, the effort focuses on three main areas:

  • Interoperability of patient records – the ability to access and update records at any point in the healthcare system by integrating NHS institutions.
  • Data analytics – using large quantities of information to better predict and personalize medicine. Data analytics can identify the combination of factors that put a patient at high risk of developing a chronic condition, allowing early intervention to prevent them from becoming ill. Personalized medicine can improve early diagnosis and quality of care. Treatments and outcomes can be analyzed in conjunction with patient details in order to maximize the benefit of any treatment.
  • Mobile technology – apps can be used by medical practitioners to provide up-to-date practical advice and by individuals to manage their health.  Tracking devices are becoming more popular and could be used to maintain personalized healthy lifestyles.

Scotland has used informatics technology to provide an integrated care model for the treatment of diabetes. GPs, patients, and secondary care professionals have collaborated to treat diabetes over a period of 20 years. The technology tracks patients’ treatments and outcomes, which are carefully monitored and managed to reduce the severity of the condition. This has achieved impressive results as shown in the figure provided below.


Scientists are gaining fascinating new perspectives on the human genome, thanks to advances in data analytics. For years, genes have been studied and mapped, with perhaps the crowning achievement being the completion of the Human Genome Project in the early 2000s, but a true understanding of how human genetics works has required more intensive study and more resources. Only recently have scientists been able to look more closely at human genes, and much of this progress comes as they apply data analytics to the effort.

There is much interest in genomics and personalized healthcare across the European Union. In the UK there is the “100K project,” formally known as “The 100,000 Genomes Project,” an ambitious program to sequence 100,000 whole genomes from NHS patients across England. Genomics promises significant benefits in healthcare through scientific discovery, and this study will help to deliver on that promise. Dell EMC provides the platform for large-scale analytics in a hybrid cloud model for Genomics England. The project has been using Dell EMC storage for its genomic sequence library, and it will now use a data lake to securely store data during the sequencing process. Backup services are provided by Dell EMC’s Data Domain and NetWorker.

Arizona State University (ASU) worked with Dell EMC to create a powerful high-performance computing (HPC) cluster that supports data analytics. The result is a holistic Next Generation Cyber Capability (NGCC), built on Dell EMC and Intel technologies, that can process structured and unstructured data and support diverse biomedical genomics tools and platforms. The HPC technology and the Dell EMC Cloudera Apache Hadoop solution, accelerated by Intel, on which NGCC is based can handle data sets of more than 300 terabytes of genomic data. ASU is also using the NGCC to understand certain types of cancer by analyzing patients’ genetic sequences and mutations.

Apache Spark is well suited to organizing large genomics analysis pipelines and workflows. Its compatibility with the Hadoop platform makes it easy to deploy and support within existing bioinformatics IT infrastructures, and its support for languages such as R, Python, and SQL eases the learning curve for bioinformatics practitioners. Widespread use of Spark for genomics, however, will require adapting and rewriting many of the common methods, tools, and algorithms that are in regular use today.


There are many challenges to analyzing neural data. The measurements are indirect, and useful signals must be extracted and transformed in a  manner tailored to each experimental technique. Analyses must find patterns of biological interest from the sea of data. An analysis is only as good as  the experiment it motivates; the faster we can explore data, the sooner we can generate a hypothesis and move research forward.

One of the reasons neural data analysis is so challenging is that there is no standardization. There are families of workflows, analyses, and algorithms that we use regularly, but it’s just a toolbox, and a constantly evolving one. To understand data, we must try many analyses, look at the results, modify at many levels—whether adjusting preprocessing parameters, or developing an entirely new algorithm—and inspect the results again.

Spark can cache a large data set in RAM and repeatedly query it with multiple analyses. This is critical for the exploratory process, and is a key advantage of Spark compared to conventional MapReduce systems. With Spark, especially once data is cached, we can get answers to new queries in seconds or minutes, instead of hours or days. For exploratory data analysis, where the ability to visualize intermediate results is critical, this is a game changer.
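The cache-then-query pattern described above can be illustrated with a small sketch. This is plain Python rather than Spark, so it only mimics the idea on one machine: the data set is loaded once, held in memory, and then reused by several analyses. The `CachedDataset` class, the `load_recordings` loader, and the synthetic firing rates are all hypothetical names and values made up for illustration. In Spark itself, the comparable step would be calling `cache()` on an RDD or DataFrame before running repeated queries.

```python
import statistics
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]


class CachedDataset:
    """Loads rows once, keeps them in memory, and serves repeated analyses."""

    def __init__(self, loader: Callable[[], List[Row]]) -> None:
        self._loader = loader
        self._rows: Optional[List[Row]] = None
        self.load_count = 0  # how many times the expensive load actually ran

    def rows(self) -> List[Row]:
        # Only the first call pays the loading cost; later calls reuse memory.
        if self._rows is None:
            self._rows = self._loader()
            self.load_count += 1
        return self._rows

    def query(self, analysis: Callable[[List[Row]], object]) -> object:
        return analysis(self.rows())


def load_recordings() -> List[Row]:
    # Hypothetical stand-in for an expensive read of neural recordings.
    # The values are synthetic, not real measurements.
    return [
        {"neuron": n, "rate": float((n * 7 + t * 3) % 11)}
        for n in range(4)
        for t in range(50)
    ]


def mean_rate(rows: List[Row]) -> float:
    return statistics.mean(row["rate"] for row in rows)


def peak_rate_by_neuron(rows: List[Row]) -> Dict[float, float]:
    peaks: Dict[float, float] = {}
    for row in rows:
        neuron = row["neuron"]
        peaks[neuron] = max(peaks.get(neuron, row["rate"]), row["rate"])
    return peaks


if __name__ == "__main__":
    dataset = CachedDataset(load_recordings)
    print(dataset.query(mean_rate))            # first query triggers the load
    print(dataset.query(peak_rate_by_neuron))  # second query reuses the cache
    print("loads performed:", dataset.load_count)
```

Each new analysis is just another function passed to `query`, which is what makes the try, inspect, and adjust loop fast once the data is in memory.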

Infectious Diseases

The National Institutes of Health (NIH) is trying to prevent the spread of infectious diseases such as super-viruses. NIH looks at all the drug data and clinical trial results submitted to the FDA, and correlates data from drug manufacturers, doctors, and patients to build a model. As an example, if a super-virus outbreak were to occur in a given population, data analytics can help answer questions like: how affected would that population be, how fast would it spread, what actions would the NIH have to take to quarantine that area, and what steps would be needed in order to control the virus before it spreads across the country?
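To make questions like "how affected would that population be" and "how fast would it spread" concrete, here is a minimal sketch of a textbook SIR (susceptible, infected, recovered) compartment model with one-day time steps. It is not the NIH's model, which the text does not describe; the function names, the transmission rate `beta`, the recovery rate `gamma`, and the population figures are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SirState:
    susceptible: float
    infected: float
    recovered: float


def simulate_sir(population: int, initial_infected: int,
                 beta: float, gamma: float, days: int) -> List[SirState]:
    """Discrete-time SIR model; returns one state per day, starting at day 0.

    beta  = assumed new infections per infected person per day (at full susceptibility)
    gamma = assumed fraction of infected people who recover each day (0 < gamma <= 1)
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [SirState(s, i, r)]
    for _ in range(days):
        # Cap new infections at the number of people still susceptible.
        new_infections = min(beta * s * i / population, s)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append(SirState(s, i, r))
    return history


def summarize(history: List[SirState]) -> Dict[str, float]:
    """Answers 'how fast' (peak day) and 'how affected' (total ever infected)."""
    peak_day = max(range(len(history)), key=lambda d: history[d].infected)
    last = history[-1]
    return {
        "peak_day": peak_day,
        "peak_infected": history[peak_day].infected,
        "total_ever_infected": last.infected + last.recovered,
    }


if __name__ == "__main__":
    # Illustrative parameters only, not estimates for any real disease.
    run = simulate_sir(population=1_000_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=200)
    print(summarize(run))
```

Lowering `beta` in a second run is a simple way to explore the effect of an intervention such as a quarantine, since it reduces how many contacts each infected person has.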

A real-life example is how data analytics is helping the fight against the Zika virus. The World Health Organization has declared the Zika virus a public health emergency that could affect four million people in the next year as it spreads across the Americas. Big data and analytics have played a role in  containing previous viral outbreaks such as Ebola, Dengue fever, and seasonal flu, and lessons learned are undoubtedly being put to use in the fight against Zika. However, while statistical modeling of vast, real-time data sets is becoming ingrained across healthcare and emergency response, the  support infrastructure needed to put these initiatives to work at ground level is lagging behind.

Data research has dramatically sped up the development of new flu vaccines. By analyzing the results of thousands of tests at institutions around the  world, compounds can be developed to target the specific proteins that are found to enable the virus to grow. Big data is also used by epidemiologists  to track the spread of outbreaks.

Dell EMC and the University of Cambridge maintain the European HPC Solution Centre. In collaboration with Intel, the Dell EMC and Cambridge HPC Solution Centre aims to provide answers to challenges facing the HPC community and feed the results back into the wider research community. Thanks to the Centre, researchers are exploring the genetic analysis of tens of thousands of disease patients.

If you prefer, the complete insideBIGDATA Guide to Data Analytics in Government is available for download in PDF from the insideBIGDATA White Paper Library, courtesy of Dell EMC.
