Robust test data management strategies

When it comes to test data management (TDM), no single approach, method, diagram or vision will give you everything that you want. What is certain is that, whatever you do, if TDM is not high on your list of priorities then at least one of the following is likely to happen:

  • Verifying user needs will be incredibly challenging
  • Precious time allocated to testing will be wasted creating data that doesn’t fully meet the needs of anyone
  • Tests will give false positives or false negatives
  • Test automation will be ineffective, frustrating and counterproductive
  • You might run the risk of contravening GDPR or other laws and regulations

Testing can be used for many things, but one of its greatest benefits is providing a measure of confidence. Run as many tests as you like; create the world’s greatest automation factory churning out millions of tests across browsers, apps and platforms, but none of it matters unless your data is right.

Cambridge computing pioneer Charles Babbage, often described as the father of modern computing, conceived and designed the Difference Engine. While the machine was never built in his lifetime, it was clear from the earliest computers that the outputs obtained were in direct relation to the inputs. In those days computers were like steampunk machines: made from gears and cogs, requiring oil, fine-tuning and dedicated expert maintenance. And yet today, almost 150 years later, TDM is no different. The same principles apply to the Difference Engine as they do to the Sunway TaihuLight supercomputer and the latest advances in AI. It’s all about the quality of the data available.

Functional testing & the TDM journey

So, here at the Information Service Division UIS (Cambridge), we pay special attention to TDM. We system and regression test a number of central, single record, browser-based information systems that provide an integrated information service for our stakeholders, including colleges, administrative departments, faculties, schools, students, staff, academics and alumni.

Our team was created in 2008 to help reduce the number of defects in our production systems. Initially consisting of functional testers, the requirement was to verify and validate the work produced by developers and functional analysts before it was released to our production systems. Over time, the team grew into what it is today, a professional group of independent test analysts running thousands of functional, automated and performance tests to support business-critical applications across the University.

As each of the test specialisms grew (functional, automated and performance testing), we found they all needed their own TDM processes – and this is our journey to TDM maturity.

In the early days, one of the main challenges with functional testing was how to manage the test data. We realised that finding quality test data, and being able to store and maintain it, was just as important as writing and executing test scripts. Keeping ad hoc data in test scripts or in secure lists was not going to be sufficient for our needs.

If we were to test business features and scenarios properly, we needed a data management strategy to manage the many variants of static and transitional test data. It was important that we had quality data providing consistent and correct results. The strategy had to provide single data sets for some scenarios and high volumes of data for automation and performance testing. Ideally, the data would also be as reusable as possible.

We also realised that TDM takes time, and we had to accept that finding the right data could sometimes take as long as testing the business process or writing the scripts. We also had to take on the administrative effort of maintaining a test data management repository. Our data became invalid quickly, which meant that our TDM process had to include frequent review and maintenance stages.

Applying the software development cycle to TDM

To make sure our TDM strategy provided the data that we needed, we applied the stages used in the software development cycle: planning, analysis, design, build and maintenance. We added an extra step by communicating our test data strategy to the business for sign-off. This was a strategy that would carry us through the present and continue to work in future years. To meet our requirements, our TDM process has gradually matured into a collaborative and iterative process.

It was critical for us to involve a wider audience of functional analysts, developers and users to confirm requirements and make sure all major business scenarios were fully addressed. All test data management decisions were based on discussion, process planning and demonstrations. This helped us focus on prioritising tests and improving efficiency. After much hard work, we had created a TDM process that reflected the true business process and met our testing requirements.

The next most important factor was for our test environments to replicate the end-user environment as closely as possible. We also had to make sure that our testing environments were well defined, up to date and consistent.

So, our test environments were refreshed from the production environment on a regular basis after each major code release giving us data for all our testing needs. Test environments also reflected the production environment by including all the relevant interfaces, which were also refreshed. All data in the test environments was masked (scrambled) to anonymise personal data.
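The article does not say how the masking was implemented, so the following is only a minimal sketch of one common option: deterministic masking with a keyed hash, which keeps masked values consistent across tables so that joins still work. The function names, the field names and the key are all hypothetical, and a keyed hash on its own is closer to pseudonymisation than to full anonymisation.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept outside source control.
MASKING_KEY = b"example-masking-key"


def mask_value(value: str, field: str, length: int = 12) -> str:
    """Return a deterministic pseudonym for `value`.

    The same input always maps to the same output, so records that
    share a value in production still share a value after masking.
    """
    message = f"{field}:{value}".encode("utf-8")
    digest = hmac.new(MASKING_KEY, message, hashlib.sha256)
    return digest.hexdigest()[:length]


def mask_record(record: dict, personal_fields: set) -> dict:
    """Copy `record`, replacing only the fields listed as personal data."""
    return {
        key: mask_value(str(val), key) if key in personal_fields else val
        for key, val in record.items()
    }


# Illustrative record; the field names are assumptions, not from the article.
student = {"id": 1001, "surname": "Smith", "email": "smith@example.org", "course": "Physics"}
masked = mask_record(student, {"surname", "email"})
```

Non-personal fields such as the course are left untouched, so the masked record can still drive the same business scenarios as the original.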

Now that we had ring-fenced our data and made sure it was of sufficient quality, we then had to gather and store the data. The majority of functional test data sets were retrieved with database queries, which were then stored in test scripts or in a shared database query library. In the test scripts, it was made clear if the test data was for re-use or had to be created.
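A shared query library of the kind described above could look something like the sketch below. It is an illustration under stated assumptions rather than the team's actual implementation: the query names, the table and its columns are invented, and an in-memory SQLite database stands in for a refreshed test environment.

```python
import sqlite3

# Hypothetical named queries; table and column names are illustrative only.
QUERY_LIBRARY = {
    "student_with_unpaid_fee": (
        "SELECT id FROM students WHERE fee_paid = 0 ORDER BY id LIMIT ?"
    ),
    "student_on_course": (
        "SELECT id FROM students WHERE course = ? ORDER BY id LIMIT ?"
    ),
}


def find_test_data(conn, query_name, *params):
    """Run a named query from the library and return the matching ids."""
    rows = conn.execute(QUERY_LIBRARY[query_name], params).fetchall()
    return [row[0] for row in rows]


# Small in-memory database standing in for a test environment.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, course TEXT, fee_paid INTEGER)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Physics", 1), (2, "History", 0), (3, "Physics", 0)],
)

unpaid = find_test_data(conn, "student_with_unpaid_fee", 5)
physics = find_test_data(conn, "student_on_course", "Physics", 1)
```

Because a test script refers to a query by name instead of a fixed record id, it can find fresh matching data after each environment refresh.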

Working with this test data management process provides quality data efficiently and on demand, but we still have challenges. We have to take into consideration that the same data may be used by multiple teams, which means data can be used up or become invalid. Some data may not reference all test cases in a test suite and some data may be very scarce. The costs of storage can be high and our environments sometimes have to be scaled down. It is also important to learn from each code release. By obtaining feedback from project teams, DBAs and users, we are able to build better test strategies for the next code release. We are not afraid to learn from experience.

TDM for automation & performance testing

Automation is a key part of the UIS test strategy and TDM has played a major role in successfully creating, maintaining and running our non-functional and automated testing services.

Our automation and performance TDM strategy developed organically as our frameworks matured. We went from one Test Engineer running 50 automated tests sequentially, relying on hard-coded test data with minimal horizontal scalability, to running more than 800 cross-browser distributed tests across 28 runners covering six business-critical systems across the University. TDM has therefore become indispensable in supporting operability and securing our long-term investment in automation.
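One way to keep distributed tests from competing for the same records is to give each runner its own slice of the data. The sketch below is an assumption about how that could be done, not a description of the UIS framework. The function name and the 100 record ids are invented; only the figure of 28 runners comes from the text above.

```python
def partition_for_runner(record_ids, runner_index, runner_count):
    """Return the records assigned to one runner.

    Records are sorted first so that every runner computes the same
    assignment independently, without any shared state.
    """
    if runner_count < 1 or not 0 <= runner_index < runner_count:
        raise ValueError("runner_index must satisfy 0 <= runner_index < runner_count")
    ordered = sorted(record_ids)
    # Take every runner_count-th record, starting at this runner's offset.
    return ordered[runner_index::runner_count]


record_ids = range(1, 101)  # illustrative ids, not real data
runner_count = 28
partitions = [
    partition_for_runner(record_ids, index, runner_count)
    for index in range(runner_count)
]
```

The partitions are disjoint and together cover every record, so two runners never update the same record in the same run.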

At the start of our automation journey, our tests were run ad hoc, and test data selection and design were not conducive to long-term operability or to the maintenance of the test assets. However, as our expertise grew, we acknowledged that test automation required a high degree of test data orchestration and that TDM was crucial in supporting the four ‘Rs’ of test automation: reusability, reliability, resilience and realism.

A well-designed automated test should deliver on the four Rs of automation good practice and, consequently, it should run today, tomorrow and in a year’s time. Establishing this level of rigour took the automation team numerous iterations, during which we tried various approaches, leveraging different technologies and methodologies. We also changed our automation test design approach and adopted a method based on the following principles: collaborate, build and maintain (CBM).

The CBM approach became the most essential part of our test automation strategy. It empowered the automation testers to take ownership of their automation suites and fully manage the quality of the test data design processes. As a result, each automation suite runs as an agile automation project, using valuable information gathered through consultation with stakeholders and through sprint planning meetings held in collaboration with functional analysts and technical developers.

This approach has been so successful that our automation suites are now used to support the testing of infrastructure changes, the application of security patches and the functional testing of system upgrades. The value and return on investment (ROI) provided by test automation services, in our experience, can be directly correlated to the quality of the test data, the maturity of the test automation frameworks and the corresponding TDM strategies.

During the last ten years, many organisations have integrated their automation and performance testing in-house solutions with cloud services in an attempt to increase test coverage, streamline testing feedback loops and accelerate QA processes. Although these hybrid cloud strategies may offer certain advantages in terms of infrastructure scalability, they also present compliance and security risks which can be painful and complex to manage.

At UIS, our performance testing services went from engaging external contractors, to utilising 20 Windows XP computers in parallel, to orchestrating load and volume exercises with thousands of users completing millions of end-to-end transactions per hour. Each one of these approaches had different degrees of complexity, data orchestration requirements and cost. Storing performance test data, sending it over the network and having sufficient CPU and memory to simulate thousands of users all come at a cost.

As with all maturing services, we had some critical early lessons to learn. Cloud can be costly and it requires careful planning, management and monitoring. By failing fast and failing early, we were able to learn, adapt, refine and evolve. This meant we were able to improve our performance testing services by creating a hybrid model built on three pillars: cost-effectiveness; TDM with full control of our test data (at rest and in transit); and the capacity to leverage the cloud for scalability.

Main objectives

One of the main objectives when testing a system is to ensure the delivered product meets the agreed operational, performance and usability requirements. TDM has a major role to play in facilitating all QA efforts across the delivery of IT systems as well as in creating customer satisfaction. The more comprehensive our test design becomes and the more similar our test data is to production data (having complied with GDPR and security requirements), the more realistic our testing activities become and the more high-value test cases we can create. Equally, the more mature our TDM strategies become, the more informed testing we can do and the more ROI we can realise from our testing tools and resources.

Conversely, poor TDM can be the Achilles’ heel of testing services (especially test automation projects): they fail to deliver on productivity and cost-effectiveness and, eventually, management runs out of patience, confidence and money. There will always be data challenges, however, and no team can be complacent about its TDM strategy. It has to be an iterative process and be seen as an important part of any test strategy.

The strategic direction of travel for many organisations across different industries is, inevitably, on a direct collision course with automation, big data and the adoption of artificial intelligence and robotics. TDM is likely to be at the epicentre of the so-called ‘test data revolution’ and will require engagement and collaboration from all stakeholders in order to create quality test data that supports QA activities and drives demonstrable value across the software development lifecycle.

Written by Jane Karger, Test Delivery Manager (Functional Testing) at University Information Services, University of Cambridge & Agustin Fernandez Trujillo, Test Delivery Manager (Non-Functional Testing) at University Information Services, University of Cambridge
