
Is Your Data FAIR? An Open Data Checklist for Success

In this special guest feature, Assaf Katan, CEO & Co-Founder of Apertio, the Open Data deep search engine, suggests that businesses and economies can realize huge social and financial benefits if they can successfully leverage Open Data. Despite this, data professionals still have some hurdles to clear. A great way to start is to consider whether your data meets the criteria known as the FAIR principles: Findability, Accessibility, Interoperability and Reusability. Assaf is an accomplished executive and CEO with 20 years of experience in both startup and corporate environments, and extensive experience leading strategic business initiatives, M&A, growth processes and transformations from planning to execution. He is passionate about closing deals and desert hiking.

Many believe that data should be recognized as the ‘oil of the 21st century’, the world’s most valuable resource. Data sharing is growing too: 64% of the more than 1,000 researchers who contributed to the State of Open Data Report in 2018 made their data available, compared with just 57% in 2016. Businesses and economies can realize huge social and financial benefits if they can successfully leverage Open Data. Despite this, data professionals still have some hurdles to clear.

A great way to start is to consider whether your data meets the criteria for what’s known as the FAIR principles. These are Findability, Accessibility, Interoperability and Reusability.

Findability

An essential tool for finding the data that professionals need is a single point of access to government data. A lot of data is available, but because datasets are scattered across many publishers, many people simply don’t know where to start. Rich metadata is one way to ensure that your data is findable. Data and metadata should be assigned globally unique, persistent identifiers and be registered or indexed in a searchable resource, such as this single point of access. Taking this principle further, when a search engine can look within the metadata and the files themselves, rather than relying only on the publisher’s classifications, data professionals are far more likely to find what they’re looking for.
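To make the idea concrete, here is a minimal Python sketch of a single catalogue that gives each dataset a unique identifier and searches both the metadata and the file contents. The class and field names are hypothetical, and the urn:uuid identifier only stands in for a persistent identifier such as a DOI; it illustrates the principle rather than how any particular portal or search engine works.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry: descriptive metadata plus the file's text."""
    title: str
    publisher: str
    publisher_category: str
    keywords: list
    content: str  # e.g. the CSV header and rows, kept as plain text for searching
    # Globally unique identifier. A urn:uuid stands in for a persistent
    # identifier such as a DOI minted by a registration service.
    identifier: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")


class Catalogue:
    """A single point of access that indexes records by their identifiers."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        self._records[record.identifier] = record
        return record.identifier

    def resolve(self, identifier):
        # The identifier alone is enough to retrieve the record.
        return self._records[identifier]

    def search_by_category(self, category):
        # Search limited to the publisher's own classification.
        return [r.identifier for r in self._records.values()
                if r.publisher_category.lower() == category.lower()]

    def search(self, term):
        # Deeper search: every metadata field plus the file contents themselves.
        term = term.lower()
        hits = []
        for r in self._records.values():
            text = " ".join([r.title, r.publisher, r.publisher_category,
                             *r.keywords, r.content]).lower()
            if term in text:
                hits.append(r.identifier)
        return hits


catalogue = Catalogue()
survey_id = catalogue.register(DatasetRecord(
    title="Household survey 2018",
    publisher="National Statistics Office",
    publisher_category="Demographics",
    keywords=["households", "survey"],
    content="region,households,median_income\nNorth,120000,31000\n",
))
print(catalogue.search_by_category("Income"))            # [] : not in the category
print(catalogue.search("median_income") == [survey_id])  # True: found inside the file
```

A search on the publisher's category alone misses the income data, while the deeper search finds it in the file itself.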

Accessibility

Once the data has been found, is it accessible to all? Data that requires a license to use, costs a premium or is available only to a subsection of the population cannot truly be called Open Data. Of course, some datasets are sensitive (such as health records or matters of ongoing security), and in those cases there should be a protocol in place for authentication and authorization. Beyond such cases, however, the priority should be access for the widest possible audience. Where licensing or permissions are a factor, the steps for requesting and accessing the datasets should be published and simple to follow.
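A minimal sketch of that access pattern, assuming a simple two-level scheme in which a dataset is either open or restricted. The names are hypothetical, and the token lookup is only a placeholder for whatever authentication and authorization protocol a publisher actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    identifier: str
    access: str              # hypothetical labels: "open" or "restricted"
    payload: str
    request_steps: str = ""  # public, documented steps for requesting access


class AccessDenied(Exception):
    """Raised when a restricted dataset is requested without valid credentials."""


def retrieve(dataset: Dataset, token: Optional[str], authorized_tokens: set) -> str:
    # Open data is returned to anyone, with no license or login required.
    if dataset.access == "open":
        return dataset.payload
    # Restricted data: a token lookup stands in for a real authentication
    # and authorization protocol.
    if token is not None and token in authorized_tokens:
        return dataset.payload
    # Even a refusal tells the user how to request access.
    raise AccessDenied(f"{dataset.identifier} is restricted. {dataset.request_steps}")


census = Dataset("urn:example:census-2018", "open", "region,population\nNorth,1500000\n")
admissions = Dataset("urn:example:hospital-admissions", "restricted", "patient_id,ward\n",
                     request_steps="Apply through the publisher's data access form.")

print(retrieve(census, None, set()).splitlines()[0])  # region,population
try:
    retrieve(admissions, None, {"researcher-token"})
except AccessDenied as err:
    print(err)
```

The key design choice is that a refused request still returns the documented steps for gaining access, rather than a bare error.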

Interoperability

This category concerns how datasets can be used in conjunction with one another, or with other services and institutions. Many governments and organizations use ‘one-off’ methods of distributing or publishing data, which can reduce the comparability and usefulness of the data itself. A shared language can help here: vocabularies that are broadly applicable within each industry, and references to other, similar datasets. Normalizing datasets and providing a shared method for downloading and viewing data side by side are also important.
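As a rough illustration, the Python sketch below maps two publishers' differently named columns and units onto one shared vocabulary, then lines the results up side by side. The publishers, column names and mappings are invented for the example.

```python
def identity(value):
    return value


# Hypothetical mappings from each publisher's column names to a shared
# vocabulary, each with a converter into the shared unit (whole persons).
SHARED_VOCABULARY = {
    "publisher_a": {
        "Region": ("region", identity),
        "Pop (thousands)": ("population", lambda v: round(float(v) * 1000)),
    },
    "publisher_b": {
        "area_name": ("region", identity),
        "population_total": ("population", int),
    },
}


def normalize(rows, publisher):
    """Rename columns to the shared vocabulary and convert values to shared units."""
    mapping = SHARED_VOCABULARY[publisher]
    normalized = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            shared_name, convert = mapping[column]
            new_row[shared_name] = convert(value)
        normalized.append(new_row)
    return normalized


def side_by_side(a, b, key="region", measure="population"):
    """Pair up values from two normalized datasets on a shared key."""
    b_values = {row[key]: row[measure] for row in b}
    return {row[key]: (row[measure], b_values.get(row[key])) for row in a}


a = normalize([{"Region": "North", "Pop (thousands)": "1.5"}], "publisher_a")
b = normalize([{"area_name": "North", "population_total": "1490"}], "publisher_b")
print(side_by_side(a, b))  # {'North': (1500, 1490)}
```

Keeping the mappings in data rather than in code means a new publisher can be added without changing `normalize()`.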

Reusability

Open Data can only be reused if the information is verifiable, with clear direction for citations and detailed provenance. In studies of more than 115 countries, just 53% of Open Data could be categorized as ‘reusable.’ As noted above, the format itself should also be easy to share and compare with other datasets. This is a common problem for data professionals: for example, some governments publish a dataset as annual figures, others publish no data at all, and still others publish only averages over a longer time period. Smart search engines should be able to point you toward comparable datasets or similar search parameters and results.
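A sketch of how a catalogue might flag reuse problems automatically, assuming license, citation and provenance are stored as metadata fields and that each dataset records its time granularity. All field names and example values here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetMetadata:
    title: str
    license: Optional[str] = None
    citation: Optional[str] = None  # how users should cite the dataset
    provenance: list = field(default_factory=list)  # source and processing steps
    granularity: Optional[str] = None  # hypothetical values, e.g. "annual"


REQUIRED_FOR_REUSE = ("license", "citation", "provenance")


def missing_reuse_fields(meta):
    """List the reuse-critical fields that are absent or empty."""
    return [name for name in REQUIRED_FOR_REUSE if not getattr(meta, name)]


def comparable(a, b):
    """Direct comparison needs both datasets to exist at the same granularity."""
    if a is None or b is None:  # one government publishes nothing
        return False
    return a.granularity is not None and a.granularity == b.granularity


country_a = DatasetMetadata(
    "Road deaths", license="CC-BY-4.0", citation="Transport Ministry (2018)",
    provenance=["police accident reports", "annual aggregation"], granularity="annual")
country_b = DatasetMetadata("Road deaths", granularity="5-year average")

print(missing_reuse_fields(country_a))   # []
print(missing_reuse_fields(country_b))   # ['license', 'citation', 'provenance']
print(comparable(country_a, country_b))  # False: different granularity
print(comparable(country_a, None))       # False: no data published
```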

Taking FAIR Further

While the FAIR principles are not the same as Open Data, they are a great start for ensuring that datasets are available en masse to the data professionals who need them. Beyond FAIR, Open Data search engines are expanding their capabilities to help data users streamline their research even further.

These capabilities include filtering and sorting search results by source location or government, as well as by publisher category. Some also let you view similar datasets alongside your original, or compare your findings with those of your peers to see which data sources they found useful or comparable, and learn from their experience through their comments.
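The filtering, sorting and "similar datasets" features can be sketched in a few lines of Python. This assumes each result carries a government, a publisher category, a relevance score and a set of tags, all hypothetical; ranking similar datasets by the Jaccard overlap of their tags is one simple stand-in technique, not a description of how any particular search engine works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchResult:
    title: str
    government: str
    category: str     # the publisher's own category
    relevance: float  # score assigned by the search engine
    tags: frozenset


def filter_and_sort(results, government: Optional[str] = None,
                    category: Optional[str] = None):
    """Keep results matching the given filters, most relevant first."""
    selected = [r for r in results
                if (government is None or r.government == government)
                and (category is None or r.category == category)]
    return sorted(selected, key=lambda r: r.relevance, reverse=True)


def jaccard(a, b):
    """Share of tags two datasets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar(target, candidates, top_n=3):
    """Rank other datasets by tag overlap with the target."""
    others = [c for c in candidates if c is not target]
    return sorted(others, key=lambda c: jaccard(target.tags, c.tags), reverse=True)[:top_n]


results = [
    SearchResult("Air quality 2018", "UK", "Environment", 0.9, frozenset({"air", "pm25", "cities"})),
    SearchResult("Air quality 2017", "France", "Environment", 0.7, frozenset({"air", "pm25"})),
    SearchResult("Bus routes", "UK", "Transport", 0.8, frozenset({"bus", "cities"})),
]
print([r.title for r in filter_and_sort(results, government="UK")])
# ['Air quality 2018', 'Bus routes']
print([r.title for r in similar(results[0], results, top_n=1)])
# ['Air quality 2017']
```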

Open Data aims to create a culture of transparency that can truly support both data users and publishers, but only if the data is both discoverable and usable. Working toward datasets that can be found, accessed, widely used and reused by professionals of all kinds is a huge task, but it has enormous potential benefits for our society and our economy at large. A single point of access, smart search tools and adherence to the FAIR principles are all good practices to help us get there.

Sign up for the free insideBIGDATA newsletter.
