SaaS Data Ownership: The Key to Data Protection and More Impactful Machine Intelligence

In this special guest feature, Joe Gaska, Founder and CEO of GRAX, discusses how SaaS data ownership is the key to data protection and more impactful machine intelligence. Under Joe’s leadership, GRAX has become the fastest-growing application in Salesforce’s history. He has been featured on the main stage at Dreamforce and has won numerous awards including the Salesforce Innovation Award. Prior to founding GRAX, Joe built Ionia Corporation and successfully sold it to LogMeIn (Xively), which is now a part of the Google IoT Cloud. Joe holds a BA in Applied Mathematics and Computer Science from the University of Maine at Farmington.

With Gartner reporting that 97% of organizations have some form of SaaS application in their technology stack, the question of SaaS data ownership is quickly becoming something we can no longer sweep under the rug. Cloud applications are everywhere and so is the sensitive customer data stored in them. And while most organizations have caught on to the fact that they need to take direct ownership of their SaaS data, many still see it as just a compliance checkbox.

But the data stored and repeatedly overwritten in our SaaS applications represents a historical record of cause-and-effect change patterns in our business. This data, aside from being essential for compliance and data privacy, represents the biggest missed opportunity to improve modern-day machine learning algorithms. It is the literal “cause and effect” information that machine learning algorithms are missing, and that they need in order to make sense of why things change in our business.

Some of the most iconic companies in the world, the ones we buy from daily, wear on our wrists, have in our pockets, or rely on to power the internet, are starting to catch on to this opportunity – and they are using an old set of tools in a new way to gain an unfair advantage in their markets.

SaaS Data Privacy and Protection

With most major clouds (AWS, Azure and GCP, to name a few), data warehouses and other traditional tools now offering extensive protections and configurability for a myriad of regulatory scenarios, the elephant in the room remains SaaS or cloud applications. When it comes to CRM, third-party marketing automation tools or just about any other SaaS application, businesses are often at a loss about how to extend the same protections to sensitive customer data stored in those tools. Yet, those same tools are the lifeblood of our organizations – they are literally the mechanisms that move us forward in our markets.

So we audit our vendors, force them to sign BAAs or other industry-specific affidavits, block non-compliant tools and hope for the best. When GDPR requests come in, we do our very best to comply, hoping to limit our liability if something goes awry. Meanwhile, as individuals, we opine about the lack of protection extended to our own personal data in all of the cloud apps in which it is stored.

SaaS Data is the Missing Link for Machine Learning

With the mirage of general machine intelligence quickly fading, we’ve turned to narrower, purpose-built machine learning algorithms to help shed some predictive light on our future. This is where companies like Tesla are successfully feeding massive streams of narrow, time-series sensor data into machine learning algorithms to improve self-driving car functionality over time. The rest of us, in the consumer or B2B space, are often left scratching our heads about why Siri or some other, perhaps more modern, intelligent algorithm running in our enterprise seems to be so poor at giving us meaningful predictions about our future. We often overlook one of the keys to answering that question – something the engineers at Tesla understand all too well: the most critical success factor in machine learning is feeding in a high volume of changes in data over time.

But, short of putting a million connected vehicles onto the road, how can we take advantage of that insight in our business?

It turns out that the answer to that question is the same one that addresses the SaaS data privacy and protection issue we identified earlier: SaaS application change data.

SaaS Data Ownership & Change Data Capture

For most organizations, the highest velocity of changes in data happens in the SaaS applications that they use to go to market. And the dataset those changes are happening to is often the sensitive customer data stored in CRM, ERP, e-commerce, and other critical cloud applications.

Given both the regulatory need to protect such data and the strategic advantage the data holds for improving analytics, machine learning and predictive modeling, it behooves every single organization in the world to start taking ownership of its SaaS application data.

But how can this be done?

SaaS Data Replication, Backup, Archiving – oh my!

Most organizations turn to some form of data replication or change data capture, ingesting application data into some part of their DataOps ecosystem to try to extract value there. However, most final resting places of data, such as cloud data warehouses, are only good at consuming data as it stands at a specific point in time. They don’t offer the ability to consume all changes in data over time, a critical factor for both the regulatory and machine learning scenarios identified earlier.
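To make the difference between a point-in-time copy and a full change history concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: the `ChangeEvent` and `ChangeLog` names, the fields, and the sample values used with them are all hypothetical. The idea is that an append-only log of changes can still answer the point-in-time question, while a single overwritten snapshot cannot recover the history.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class ChangeEvent:
    """One field-level change to one record (hypothetical schema)."""
    record_id: str
    field: str
    old_value: Any
    new_value: Any
    changed_at: datetime


class ChangeLog:
    """Append-only store of changes; nothing is ever overwritten."""

    def __init__(self) -> None:
        self._events: list[ChangeEvent] = []

    def append(self, event: ChangeEvent) -> None:
        self._events.append(event)

    def state_as_of(self, record_id: str, as_of: datetime) -> dict[str, Any]:
        """Rebuild the record as it looked at a given moment."""
        state: dict[str, Any] = {}
        # Replay changes in time order, stopping at the requested moment.
        for event in sorted(self._events, key=lambda e: e.changed_at):
            if event.record_id == record_id and event.changed_at <= as_of:
                state[event.field] = event.new_value
        return state

    def history(self, record_id: str, field: str) -> list[tuple[datetime, Any, Any]]:
        """Return every (timestamp, old, new) transition for one field."""
        return [
            (e.changed_at, e.old_value, e.new_value)
            for e in sorted(self._events, key=lambda e: e.changed_at)
            if e.record_id == record_id and e.field == field
        ]
```

With a log like this, `state_as_of` covers the "what did we hold about this customer on a given date" style of regulatory question, and `history` yields the sequence of transitions that a model would train on.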

However, some organizations are starting to use old tools in new ways – one such case involves SaaS data backup. Traditional backup tools are extending functionality into SaaS applications, while other, SaaS-first tools are offering organizations the ability to snapshot data and store it in their own cloud environments. While some tools require a workaround to allow organizations direct access to captured data, a new breed of tools is starting to allow organizations to directly access the raw data in their own cloud environments.

3 Things to Look for in the Right Tool

Three simple guideposts can quickly tell an organization if they have found the right tool for the job:

  • Ownership – The tool must allow you to replicate or archive your cloud application’s data into your organization’s data lake; this puts sensitive data under the governance umbrella of your cloud infrastructure provider – which is often much more reliable than that of the application. What happens to the data as it traverses the backup tool itself? More and more organizations are turning to tools that allow them to maintain their data’s digital chain of custody – in other words, to never allow the data to touch anything other than a customer-owned environment.
  • Access – Your SaaS data should be available for direct consumption in its original format via your data lake as well as directly inside of your SaaS application itself. This allows you, in certain scenarios, to archive sensitive data out of the application while still letting the app continue to access the data from your data lake.
  • Data Capture – Your SaaS application backup, archive or data capture tool should give you the ability to capture up to every single change in your application data over time. This is critical to generating the vast change dataset necessary to make your predictive algorithms impactful and effective for your business. One-hour or even 15-minute backup increments are no longer the standard – the declining cost of cloud data storage, combined with the fact that most SaaS application data is measured in gigabytes and not petabytes, allows organizations to capture every single change to high-value SaaS data and then feed it into ML and AI.
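As a rough illustration of the Data Capture guidepost, the Python sketch below compares two successive versions of a record and emits one change entry per field that differs. It is a sketch under stated assumptions, not a description of any specific tool: the `diff_record` function, the dictionary layout of the change entries, and the field names in the usage note are hypothetical.

```python
from datetime import datetime
from typing import Any


def diff_record(
    record_id: str,
    before: dict[str, Any],
    after: dict[str, Any],
    changed_at: datetime,
) -> list[dict[str, Any]]:
    """Return one change entry per field whose value differs.

    Assumption: a field that is missing from a version is treated
    the same as a field set to None.
    """
    changes = []
    # Look at every field that appears in either version of the record.
    for field in sorted(before.keys() | after.keys()):
        old_value = before.get(field)
        new_value = after.get(field)
        if old_value != new_value:
            changes.append(
                {
                    "record_id": record_id,
                    "field": field,
                    "old_value": old_value,
                    "new_value": new_value,
                    "changed_at": changed_at,
                }
            )
    return changes
```

For example, calling `diff_record` on an opportunity whose `stage` moved from "Prospecting" to "Negotiation" while its `amount` stayed the same would produce a single entry for `stage`. Appending such entries to storage in your own cloud environment, rather than overwriting the previous copy, is what builds up the change dataset described above.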
