Why Executives Need to Embrace Fake Data in Software Testing

“The value of an idea lies in the using of it,” said Thomas Edison, yet the steps required to move from concept to practice are where the true challenge lies. This is true in many areas of life. Scientists, like Edison, conduct experiments in laboratories, setting up idealized environments in which they can test their theories and inventions to ensure they work before putting them into real-world practice. Entrepreneurs start with inspiration for an idea, but then rigorously research, develop, and test the product or service so that it can realize its full potential. The maxim also rings true for enterprises seeking innovation. Just as researchers and entrepreneurs test their ideas in labs, sacrificing “real life” conditions for control and consistency before releasing those ideas to the market, enterprises innovate using testing environments to ensure ideas have real value before adopting them.

Enterprises have to take advantage of cutting-edge technology to stay competitive, which involves finding, testing, and implementing new solutions that further the overall business strategy. The process seems straightforward, but testing solutions in a lab-type setting is far from simple. Running a high-level Proof-of-Concept (PoC) requires a testing environment fully stocked with data with which to carry out the test. But enterprises don’t (and shouldn’t) allow genuine data to be used in these tests, for a myriad of legal, security, and bureaucratic reasons, so fake data must be generated to represent the data that exists in the actual production environment.

The idea of using fake data scares many company executives and makes them skeptical of the process as a whole. If fake data is being used to test the new solution, how can a company know how it will truly work or scale when plugged into the actual production environment? After all, as any researcher will tell you, gaps inevitably exist between fake and real data. Thankfully, technological solutions exist that let everyone sleep better at night. Deep Mirroring, for example, is the process of deeply analyzing and learning the configuration of a production environment and creating a mirrored version of it, with structured and unstructured data, logic, APIs and all. Deep Mirroring makes it possible to create a highly realistic yet entirely generated testing environment that maintains the crucial rules, relationships, and behavior that define the original.
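
To make the idea of preserving rules and relationships a little more concrete, here is a minimal Python sketch. It is not the Deep Mirroring implementation, which the article does not detail; the two-table schema (customers and orders), the function names `build_profile` and `generate_fake`, and the sample records are all hypothetical and chosen purely for illustration. The sketch learns only aggregate shape from a source dataset (category frequencies, a numeric range, and how many orders a customer tends to have), then generates new records that follow that shape while keeping every foreign key valid.

```python
import random
from collections import Counter


def build_profile(customers, orders):
    """Capture aggregate shape only: region frequencies, the amount range,
    and the observed number of orders per customer."""
    region_counts = Counter(c["region"] for c in customers)
    per_customer = Counter(o["customer_id"] for o in orders)
    amounts = [o["amount"] for o in orders]
    return {
        "regions": list(region_counts.keys()),
        "region_weights": list(region_counts.values()),
        "orders_per_customer": [per_customer.get(c["id"], 0) for c in customers],
        "amount_range": (min(amounts), max(amounts)),
    }


def generate_fake(profile, n_customers, seed=0):
    """Generate synthetic customers and orders that follow the profile.
    Every order points at a generated customer, so the relationship holds."""
    rng = random.Random(seed)
    lo, hi = profile["amount_range"]
    customers, orders = [], []
    for i in range(n_customers):
        customer_id = f"FAKE-C{i:05d}"
        region = rng.choices(profile["regions"],
                             weights=profile["region_weights"])[0]
        customers.append({"id": customer_id, "region": region})
        # Reuse an observed orders-per-customer count, picked at random.
        for _ in range(rng.choice(profile["orders_per_customer"])):
            orders.append({
                "id": f"FAKE-O{len(orders):06d}",
                "customer_id": customer_id,
                "amount": round(rng.uniform(lo, hi), 2),
            })
    return customers, orders


# Made-up stand-in for a source dataset (hypothetical values).
sample_customers = [
    {"id": "c1", "region": "EMEA"},
    {"id": "c2", "region": "APAC"},
    {"id": "c3", "region": "EMEA"},
]
sample_orders = [
    {"id": "o1", "customer_id": "c1", "amount": 20.0},
    {"id": "o2", "customer_id": "c1", "amount": 75.5},
    {"id": "o3", "customer_id": "c3", "amount": 310.0},
]

profile = build_profile(sample_customers, sample_orders)
fake_customers, fake_orders = generate_fake(profile, n_customers=100)
```

A realistic mirrored environment would also have to reproduce logic, APIs, and unstructured data, none of which this toy example attempts.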

Faking data may sound as if all that’s required is a good imagination, but creating and analyzing this data is a unique and complex process. Today’s systems are built mainly on top of open-source structured and unstructured Big Data clusters, which are replacing legacy RDBMS (relational database management systems). The storage and retrieval queries that Big Data requires are much more complex and are carried out through MapReduce processes or in-memory queries, which makes learning and emulating the actual production environment harder still. The result of faking highly realistic data is a testing environment that, while at first glance it seems to be a real production environment, actually differs in the most important factor of all for enterprises operating under security and legal restrictions: the data is not real.

This highly sophisticated technology produces testing environments as close to “real life” conditions as feasibly possible. Because the actions, structure, data, and interfaces are all generated, no sensitive enterprise information is jeopardized. The behavior of the environment, on the other hand, gives the enterprise the ability to fully understand how a solution would act when integrated into its production environment. This allows for the most accurate and reliable testing to take place, second only to actually using the production environment.

That being said, even with a highly realistic testing environment, the leap from lab to life is not one made lightly. As developers and enterprises conclude the PoC process, the differences that inevitably remain between a testing environment and the actual production environment begin to appear more and more critical. To help ensure that a new solution won’t fail once it reaches the production environment, various styles of predictive analysis can be deployed during the testing stages. New and constantly evolving predictive algorithms assess how a solution operates in the testing environment and predict how it will operate in the real production environment, reducing the doubts that come with taking that final step of integrating a solution. By testing the scalability of new solutions, adding or removing computing power, or just throwing various wrenches into the gears to test reactions and compatibility, enterprises can test at least some of the real-world impediments the solution will face before it goes online.
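
One very simple form of such prediction can be sketched in Python: measure a metric at several load levels in the test environment, fit a trend, and extrapolate to production-scale load. This is an illustration under strong assumptions, not a description of any particular product. The function names, the load and latency figures, and the choice of a straight-line least-squares fit are all hypothetical; real systems often scale non-linearly, and the predictive algorithms used in practice are not specified here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate the fitted line to a load level outside the test range."""
    return slope * x + intercept


# Made-up measurements from a test environment (hypothetical values):
# requests per second and the average latency observed at each level.
test_load = [100, 200, 400, 800]
latency_ms = [12.0, 14.5, 19.0, 29.5]

slope, intercept = fit_line(test_load, latency_ms)

# Assumed production load, well beyond anything measured in the test.
estimated_latency = predict(slope, intercept, 5000)
print(f"Estimated latency at 5000 req/s: {estimated_latency:.1f} ms")
```

The further the extrapolation reaches beyond the measured range, the less it should be trusted, which is why scaling the test environment itself up and down remains a useful complement.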

In today’s competitive landscape, you have to use testing environments, labs of sorts. Instead of fearing them, embrace them. Recognize that using fake testing environments keeps the benefits of PoC testing without the risks. Using all the tools at your disposal to make the test environments realistic, and to foresee how the software will act after implementation, will allow you to take the reins of innovation. With these technologies, you will be able to plan, find, test, and implement solutions that were once just ideas, and lead your industry in innovation.

Contributed by: Alexey Sapozhnikov, CTO and Co-Founder of prooV. Alexey is a career entrepreneur with over 20 years’ experience in enterprise software and high tech management. His areas of expertise include Big Data, Cloud Computing, Data Science, Python, and Enterprise Architecture.

