Big Data and Open Science Data

This article is the fourth in an editorial series with a goal to provide a road map for scientific researchers wishing to capitalize on the rapid growth of big data technology for collecting, transforming, analyzing, and visualizing large scientific data sets.

In the last article, we surveyed the big data technology stack available for scientific applications. The complete insideBIGDATA Guide to Scientific Research is available for download from the insideBIGDATA White Paper Library.

With scientific data sets growing ever larger, researchers are finding that the bottleneck to discovery is no longer a lack of data but an inability to manage, analyze, and share their large data sets. Individual researchers can no longer download the important data sets in their fields to their own computers and analyze them there. The goal of the recent trend toward open scientific data is to remove this bottleneck by giving researchers access to a variety of key data sets across scientific disciplines, along with the computing infrastructure that lets them easily manage and share their data and analyses. Big data technologies facilitate these goals by providing unparalleled data storage and analytical capabilities.

Increasingly, research grant proposals require an “open data” element under which all data collected by the project is made openly available through an easily accessed data store. One of the first and best-known data sharing projects is the Human Genome Project. Sequencing the human genome was a massive undertaking by many researchers around the world. The results of their efforts have greatly advanced many areas of research in the life sciences and healthcare over the past decade and a half, but none of that would have been possible if the genomic sequences had not been widely available. Because they are, anyone can freely download human genomic data and use it in conjunction with big data technology. This is what open data is all about.

Here are some key reasons for sharing data and making scientific data open:

  • Clearly documents and provides evidence for research in conjunction with published results
  • Meets copyright and ethical compliance requirements (e.g., HIPAA)
  • Increases the impact of research through data citation
  • Preserves data for long‐term access and prevents loss of data
  • Describes and shares data with others to further new discoveries and research
  • Prevents duplication of research
  • Accelerates the pace of research
  • Promotes reproducibility of research

To promote this level of data sharing, many scientific journals require their authors to make all data underlying their articles openly available from the moment of publication of the article. Opening up research data makes it much easier for other scientists to build upon that work and advance the field.

The open data trend has yielded some interesting initiatives to bring open data into the mainstream of scientific research. A good example is Scientific Data, an open-access, peer-reviewed, online-only publication from Nature Publishing Group that contains descriptions of scientifically valuable data sets. The goal of the publication is to help researchers publish, discover, and reuse research data. Scientific Data is open to submissions from a broad range of natural science disciplines, including descriptions of big and small data, from major consortiums and single labs. Scientific Data primarily publishes Data Descriptors, a new type of scientific publication designed to promote an in-depth understanding of research data sets. Data Descriptors combine traditional scientific article content with structured information curated in-house, and are devised to maximize data reuse and enable searching, linking, and data mining.

If you prefer, the complete insideBIGDATA Guide to Scientific Research is available for download in PDF from the insideBIGDATA White Paper Library, courtesy of Dell and Intel.

 
