January 2020

Cloud Computing



Industry Events

Now in its sixth year, The European Software Testing Awards celebrate companies and individuals who have accomplished significant achievements in the software testing and quality assurance market.

Enter The European Software Testing Awards and start on a journey of anticipation and excitement leading up to the awards night – it could be you and your team collecting one of the highly coveted awards.


Mobile Testing

Manage your infrastructure as code


Master's guide



With the UK’s employment rate estimated at 76.1%, higher than a year earlier (75.6%) and the joint-highest figure on record, 2019 has been looking up for those both seeking employment and looking to change jobs. The UK unemployment rate was estimated at 3.8% – having not been lower since October to December 1974 (ONS).

Of course, this may all be down to how the government records and, indeed, classifies ‘employed’ and ‘unemployed’, with zero-hours contracts taking up a large proportion of these figures.

All that aside, the need for software testers, DevOps experts and similar industry roles is greater than ever before and, in many cases, it can be considered a flooded market, with more jobs than there are suitable candidates. But, if you do find the role for you, what might come up in the interview and how might you best answer those essential questions?

In How to get hired in Test Automation (p.4), Michael Battat discusses how Angie Jones, senior developer advocate at Applitools, advises on the range of questions you might be asked and what to do when coding skills come up in your interview!

It’s a long-form article, so make sure you have a read – whether going for an interview yourself, or perhaps looking for your next star employee.

Elsewhere in this issue we have further discussion of the jobs and skills market and the need to stay ahead of the latest trends and changes within the industry.

In It’s DevOps, Jim, but not as we know it (p.14), Abdul Rashid Hamid discusses how the traditional IT infrastructure & management role has evolved over the last decade, with skills that were once cherished and in demand now consigned to history. With new requirements in the workforce as we enter new and changing phases within the industry – and as per the law of supply and demand – professionals will need to adjust or become obsolete.

For those looking to an increasingly DevOps future, Senthilkumar Sivakumar gives a comprehensive guide to DevOps, its tools, processes and the types of testing most often used and/or required in DevOps: A Dungeon Master’s Guide (p.38). Here he discusses DevOps from start to finish, covering everything you might need to know if you are looking to introduce these transformational practices into your business or workplace.

In a similar vein, John Scott advises on how to stay ahead of the curve when the world is changing rapidly. In his article How to stay ahead in DevOps (p.22), specifically focusing on telecoms, he discusses how telecoms are becoming an enabler to pivot towards the Internet of Things and connected devices. How are businesses and testers going to stay ahead of the pack and deliver quality, faster?

Talking of enabling, in The Need for Web Accessibility Testing (p.48), Narayanan Palani discusses how, in the modern testing marketplace, new trends of speciality testing can really enhance your employability, as well as making the world a better place for those requiring enhanced technology accessibility in their lives. True accessibility testing will also always be a manual testing requirement – food for thought.

So, wherever you or your organisation are on the intersection between upcoming business and technology trends, it’s always important to be ahead of the curve, whether a tester new or old. 


November 2019 | Volume 11 | Issue 5

© 2019 31 Media Limited. All rights reserved.

TEST Magazine is edited, designed, and published by 31 Media Limited. No part of TEST Magazine may be reproduced, transmitted, stored electronically, distributed, or copied, in whole or part without the prior written consent of the publisher. A reprint service is available. Opinions expressed in this journal do not necessarily reflect those of the editor of TEST Magazine or its publisher, 31 Media Limited.

ISSN 2040‑01‑60

Barnaby Dracup

Email: editor@31media.co.uk
Phone: +44 (0)203 056 4599

Get Hired In Test Automation


Angie Jones
Senior Developer Advocate


Angie Jones, a senior developer advocate at Applitools, is a test automation rock star. She has held leadership positions at well-known companies in test engineering. She knows functional testing and test automation.


Are you looking for a new job in test automation? Are you ready for the range of questions you will be asked?

Angie knows how you can ace your technical test interview.

Since Angie first shared her webinar, Ace Your Next Job Interview, with us in 2017, we have seen lots of open jobs in functional testing and Selenium testing. Demand grows for engineers with SDET (Software Development Engineer in Test) qualifications to help build working and maintainable software. Whether you are a test automation engineer looking to interview with someone who advertises for a “Selenium testing” position, or you are looking for an SDET position, coding skills will come up in your interview.

We reviewed this article with Angie. She said the content is just as relevant today as it was two years ago. So, here is a summary of her webinar.

Test Automation Coding

If you take away nothing else from this column, take away this: automation engineering for functional tests requires coding skills, knowledge of test approaches, and a way to marry the two effectively. Automation engineers need to know how to build appropriate tests into the code being tested. As a result, you need to expect to be quizzed on both your test knowledge and your coding knowledge.

Angie’s point?

“The game has definitely changed. The automation engineer interview might be the most difficult one because not only do they ask us automation questions but they’re also asking us testing questions and developer questions. Each one of those may have its own round of interviews.”

Effectively, people are looking for someone with a tester’s mindset, and a developer’s skill set. This could be an SDET position, or just an automation engineer with code analysis tools.

Let’s review the ideas Angie shared, so you can feel more prepared. She broke the interview into two parts: testing questions, and developer questions.

Ace The Testing Questions

If you’re interviewing for a functional testing role, you need to ace the functional testing questions. Savvy development teams may implement a “shift left” strategy and move a percentage of testing to the developers themselves. You must expect to answer the testing questions from both your developer interviewers and your test interviewer.

The classic interview question for functional testing asks you to develop a test approach for an everyday object. Like, say, a chair.

How Would You Test A Chair?

SDET, Automation Engineer or Manual Tester – stop and consider this question.

If you haven’t come across this question – or one like it – remember two concepts: assumptions and limits. Your assumptions will dictate how you approach the problem. Your test cases will be influenced by both your assumptions and what you think about limits.

Let me be more concrete.  When you think about testing anything, do you begin with test strategies? Test cases? What have you missed?

Validate your assumptions by asking questions. If you only assume, you can miss some of the important design considerations that will drive your test strategy.

Here are some questions that might help:

  • Who is the user?
  • What are the purposes for which the chair was designed?
  • Was there a weight/size/height/age assumption?
  • Was the chair designed for a person with specific abilities, or specific disabilities?

You might think your chair tests are fine until you consider a set of chairs like this one:

Yes, each chair supports sitting. But one is clearly mobile, one or two may be too awkward or too heavy for an individual to move, and at least one may have a weight limit.

Each has its own set of specifications and use cases. And, you might think that this set is sufficient to broaden your definition until you come across the chair that doubles as a stepladder.  In this last example, if you do not consider the stepladder use consideration, you will miss a set of important functional tests.

You want to make sure that you develop use cases from the design and intended use – and not simply what you thought would be important. For instance, “How high can a person stand on the chair?” is not a concept for the one on casters.

Once you know the intended use cases, you can consider tests to run and appropriate limits for those tests. So remember – ask questions.

The Automation Round

The most common mistake engineers make when pursuing test automation jobs, Angie says, is preparing only for UI tests. Candidates who think the job is only Selenium testing are stumped by any question falling outside UI testing.

Angie suggests that everyone know and understand the test automation pyramid, first introduced by Mike Cohn:

In this pyramid, the bulk of automation involves unit tests. Services tests, which validate all but the behavior of the UI, are the next largest chunk. UI tests are the smallest volume of automation tests, as they are both complex and costly in terms of engineering time to ensure both behavior validation and test coverage.

If you want to be prepared for discussing automation, you need to know how to handle unit tests as well as service and business logic tests.

If you are unfamiliar with unit testing, you can take the Unit Testing course on Test Automation University.

Let’s start with unit tests. Imagine you had the following method to automate:

public int add(int a, int b);

Language aside, let’s think about what is needed to test this behavior.

You may be thinking, “Oh crap, I don’t ever do unit tests.”

Take a second to understand the inputs and outputs.  Here, this method takes two integers and it returns an integer. Given the name of the method, add, you might assume that it’ll add the numbers and return the sum, but it doesn’t hurt to ask a clarifying question such as:

“I assume this method adds the two parameters and returns the sum, right?”

just to make sure it is not a trick question. In fact, if there’s anything that’s not perfectly clear or obvious to you, ask about it.

If you find yourself becoming self-conscious and you think you’re asking too many dumb questions, then just remind the interviewer (and yourself) that you’re a tester and you’re trained to challenge assumptions. Your interviewer will appreciate that.

Handling Unit Test Questions

So for this question, Angie recommends listing every test you can come up with. You don’t see the body of this method, so don’t assume what’s inside of it. Don’t assume everything works beautifully. In fact, assume this method will break in normal operation – so think of normal operations you might run.

Angie listed the following tests off the top of her head:

  • a and b both positive
  • a and b both negative
  • a positive and b negative
  • a negative and b positive
  • a being zero
  • b being zero
  • both a and b being zero
  • the sum of a and b exceeding the largest value an integer can hold

The more tests you can consider the better you will do here.
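The list above can be turned into executable checks. The following is a minimal sketch in plain Java with no test framework; the body of add is a hypothetical implementation (the article only shows the signature), and the check helper is an assumption made so the example runs on its own.

```java
class AddTests {
    // Hypothetical implementation; the article only gives the method signature.
    static int add(int a, int b) {
        return a + b;
    }

    // Small assertion helper so the sketch runs without a test framework.
    static void check(String name, int actual, int expected) {
        if (actual != expected) {
            throw new AssertionError(name + ": expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        check("a and b both positive", add(2, 3), 5);
        check("a and b both negative", add(-2, -3), -5);
        check("a positive and b negative", add(5, -3), 2);
        check("a negative and b positive", add(-5, 3), -2);
        check("a being zero", add(0, 7), 7);
        check("b being zero", add(7, 0), 7);
        check("both a and b being zero", add(0, 0), 0);
        // Overflow: plain int arithmetic in Java wraps around silently.
        check("sum exceeds int range", add(Integer.MAX_VALUE, 1), Integer.MIN_VALUE);
        System.out.println("All add() checks passed");
    }
}
```

The overflow case is the one worth raising with the interviewer: whether wrapping is acceptable, or whether the method should fail loudly, is a design question and not something the signature answers.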

While this question is gauging your testing abilities, it’s also testing how much you understand about code. You need to understand the difference between valid UI tests and valid unit tests.

Avoid making a common mistake UI testers make in suggesting impossible coded tests – something like:

“Oh, I want to pass a String as one of the arguments.”

That’s a very typical UI test, to see how the UI behaves when a user enters a String in a number field. However, from a unit test perspective, your tests are directly calling into these methods in code. If you were to try to pass a String as an argument, your test wouldn’t even compile.

Know the difference between valid and invalid unit tests. If you suggest an invalid unit test, it sends a signal to your interviewer that maybe you don’t understand code.

Service Layer Tests

Automation testers must understand how web services behave. An SDET is expected to understand service calls, service responses, and how to test service behavior. Increasingly, a Selenium tester interviewing for an automation engineering role needs to know this as well.

If you know UI but don’t know web services, dig in and learn (there are various sites to check out, including the course “Exploring Service APIs through Test Automation” on Test Automation University). Otherwise, spend a little time brushing up.  Angie says interviewers consistently ask about web services.

Here is a sample question:

Given a user profile, how would you test CRUD operations of a REST API?

CRUD is an acronym for Create, Read, Update and Delete. These have equivalent REST API commands: Create = POST, Read = GET, Update = PUT, and Delete = DELETE.

In the question, you aren’t given any information about parameters. Should that stop you? Nope. You’ll want your interviewer to know that you can think abstractly – and then be more specific if you’re given more detailed parameters. So, you can ask:

“Do you have a spec, or would you like me to solve this in general terms?”

If they want you to solve this in general terms, you can lean on what you already know to answer this question.

Answering the Service Layer Question

Given that you have already created a user profile someplace, you can leverage your experience to think about this problem in the abstract. Once you consider that most service calls have both required fields and optional fields, you can think about passing different parameters. Let’s start with the “Create”, or REST POST method.

The four POST tests Angie considered included:

  • POST with all required and optional data
  • POST with required data only
  • POST with required data missing
  • POST with invalid data for parameters

You might be able to come up with more than these four basic scenarios, but as a starting point, these are sufficient. It shows you are thinking about normal and abnormal inputs as ways to validate output. If you knew more about the API, you could be more specific.

You can also point out that successful POST calls will result in a status code of 201 for successful creation. You can mention that if you knew more about the API, you could validate more details within the body and header. And this is typically in JSON, XML or plain text format.
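To make the four POST scenarios concrete, here is a minimal sketch that runs them against an in-memory stand-in for the service. Everything about the stand-in is an assumption for illustration: the field names (username, email, bio), the rule that an email must contain '@', and the use of 400 for rejected requests. Only the 201 for successful creation comes from the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

class ProfilePostTests {
    // Hypothetical stand-in for the real service. Assumed rules:
    // "username" and "email" are required, "bio" is optional,
    // and an email without '@' counts as invalid data.
    static int postProfile(Map<String, String> body) {
        String username = body.get("username");
        String email = body.get("email");
        if (username == null || email == null) {
            return 400; // assumed response when required data is missing
        }
        if (!email.contains("@")) {
            return 400; // assumed response for invalid data
        }
        return 201; // created
    }

    // Builds a request body, leaving out any field passed as null.
    static Map<String, String> body(String username, String email, String bio) {
        Map<String, String> fields = new HashMap<>();
        if (username != null) fields.put("username", username);
        if (email != null) fields.put("email", email);
        if (bio != null) fields.put("bio", bio);
        return fields;
    }

    static void expectStatus(String name, int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError(name + ": expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        expectStatus("all required and optional data", 201,
                postProfile(body("angie", "angie@example.com", "Tester")));
        expectStatus("required data only", 201,
                postProfile(body("angie", "angie@example.com", null)));
        expectStatus("required data missing", 400,
                postProfile(body("angie", null, null)));
        expectStatus("invalid data for parameters", 400,
                postProfile(body("angie", "not-an-email", null)));
        System.out.println("All POST checks passed");
    }
}
```

Against a real API, postProfile would be replaced by an HTTP call, and the checks would also inspect the response body and headers once the spec is known.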

Angie put together the tests she considered as basic for all four of the commands:

These include error paths as well as happy paths, multiple calls, etc. You can also mention appropriate response codes and checking the body values if you have them. For instance, 200 is the response code for a successful GET, PUT, and DELETE.
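One way to keep the CRUD-to-REST mapping and the success codes mentioned in this section in one place is a small lookup type. This is only a sketch; the enum and its field names are hypothetical, and real APIs may return other success codes than the ones quoted here.

```java
class CrudMappingDemo {
    // Hypothetical enum pairing each CRUD operation with the HTTP method
    // and the success status code quoted in the article.
    enum CrudOperation {
        CREATE("POST", 201),
        READ("GET", 200),
        UPDATE("PUT", 200),
        DELETE("DELETE", 200);

        final String httpMethod;
        final int successStatus;

        CrudOperation(String httpMethod, int successStatus) {
            this.httpMethod = httpMethod;
            this.successStatus = successStatus;
        }
    }

    public static void main(String[] args) {
        for (CrudOperation op : CrudOperation.values()) {
            System.out.println(op + " -> " + op.httpMethod
                    + " (success: " + op.successStatus + ")");
        }
    }
}
```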

Remember to think broadly and abstractly for tests to consider.

If these don’t seem obvious to you, it’s worth studying service calls some more. There are some great courses on Test Automation University.

UI Automation

So, it may seem like the most trivial UI question you’ll get is something like: use Selenium API to log into an app. It’s a pretty overused question that most interviewers skip by these days.

Over time, interviewers have concluded this question exposes little about the prospect’s knowledge of practical UI testing techniques. If you hire someone just on this type of simple question, you may discover your employee can do rote work, but not the analytical thinking needed to write real tests.

Instead, Angie suggests that you expect something like what she uses in interviews – showing a real UI and asking you, the candidate, to use the Page Object Model as a reference for creating an approach to creating UI tests.  Something like this:

This is a Twitter profile on mobile. Given this page, and the Page Object Model pattern, how would you build the test framework?

This kind of question can expose whether you understand how to approach testing a real world application.

To answer this question, look at the screen and determine which elements your tests might need to interact with and/or verify.

So, on this page, there are:

  • a banner header
  • a profile photo
  • a name
  • a handle
  • a bio
  • a location
  • a link
  • the number following
  • the number of followers
  • an Edit Profile button
  • a Tweet button
  • a navigation footer

There are also four tabs, and each will have its own content and interactions.

Thinking Through a User Interface

The key is thinking how to organize this into a Profile Page class. This all seems straightforward to organize, but Angie points out that this example can separate the great candidates from others. For instance, if you realize that the “Edit Profile” button only exists for the logged in user, and it won’t show up on anyone else’s profile – that’s great. More importantly, how do you design your page object class for this case? Do you include the Edit Profile button in the Profile Page class, knowing that sometimes it’s visible, and sometimes it’s not – or do you create a base profile page class that has common elements and subclass it with a My Profile vs. Other People’s profile? Whichever way you go, you need to justify your answer.

Also, think about the tabs. There’s Tweets, Tweets and Replies, Media and Likes. How would you handle that? That might be a question for you.

In fact, your interviewer might wait for you to see these tabs and come up with an approach. If not, the interviewer would possibly point them out to you as a hint (and ding you a little on the interview) before asking you how you would deal with these.

Just like with the Edit Profile issue, your approach to these could be to make these part of the Profile Page class, or to make each their own class accessible from the Profile Page Class. It’s up to you – you just have to be ready to justify your approach.

Finally, there’s the navigation bar at the bottom. What do you know about it? If it turns out it’s on the bottom of every page, would you create separate calls for each page of your app? Or, would you make a base page class that all pages inherit from which includes the navigation bar? Or, would you do something totally different? Again, you need to justify your answer, and be able to discuss the trade-offs of your choice.

Angie’s Approach to the UI Question

Angie says her design would include:

  • a base page class that includes all the elements that appear on all pages
    • the navigation bar at the bottom with appropriate methods
    • any other elements that show up on every page.

This approach allows code re-use and avoids code duplication (which makes the code easier to maintain if functions and layout change between releases).

For each of the tabs, Angie says she would create separate classes and link to them via “click” methods from the Profile page class. The result is a small Profile page class that is easy to maintain if the tabs ever change.

The tabs themselves are components that have their own web elements and corresponding methods. If this were all one page, it would become a mess to manage, as one would have to keep track of which tab was active and which methods were appropriate at that time.

Finally, with the Edit Profile button, Angie recommends considering the profile page of the user vs. the profile page of another user, asking, “Is the Edit Profile button the only difference?” This might be a question for the interviewer. As it turns out, the Edit Profile button only appears on my page, but there are lots of buttons on another user’s page – like “Follow”, “Unfollow”, “Block”, “Mute”, etc. This makes the case for having a base profile page that contains all the common profile page elements, as well as two more profile page classes for “My Profile” and “Other Profiles”. Each would contain what is specific to those pages.
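The class layout described in this section can be sketched roughly as follows. This is not Angie’s code: the Browser interface is a hypothetical stand-in for a real Selenium or Appium driver, the locator strings are invented, and only one of the four tabs is shown.

```java
import java.util.ArrayList;
import java.util.List;

class PageObjectSketch {
    // Hypothetical minimal browser abstraction; a real framework would
    // wrap a Selenium or Appium driver here.
    interface Browser {
        void click(String locator);
    }

    // Elements that appear on every page, such as the navigation footer.
    abstract static class BasePage {
        protected final Browser browser;
        BasePage(Browser browser) { this.browser = browser; }
        void goHome() { browser.click("nav-home"); }
    }

    // Each tab is its own component with its own elements and methods.
    static class TweetsTab {
        private final Browser browser;
        TweetsTab(Browser browser) { this.browser = browser; }
        void openFirstTweet() { browser.click("tweets-first-item"); }
    }

    // Elements common to every profile, plus "click" methods that lead to tabs.
    abstract static class ProfilePage extends BasePage {
        ProfilePage(Browser browser) { super(browser); }
        TweetsTab clickTweetsTab() {
            browser.click("tab-tweets");
            return new TweetsTab(browser);
        }
    }

    // Only the logged-in user's own profile has Edit Profile.
    static class MyProfilePage extends ProfilePage {
        MyProfilePage(Browser browser) { super(browser); }
        void clickEditProfile() { browser.click("edit-profile"); }
    }

    // Other users' profiles have Follow, Block, Mute and so on.
    static class OtherProfilePage extends ProfilePage {
        OtherProfilePage(Browser browser) { super(browser); }
        void clickFollow() { browser.click("follow"); }
    }

    public static void main(String[] args) {
        List<String> clicks = new ArrayList<>();
        Browser recording = clicks::add; // records clicks instead of driving a UI
        MyProfilePage mine = new MyProfilePage(recording);
        mine.clickEditProfile();
        mine.clickTweetsTab().openFirstTweet();
        mine.goHome();
        System.out.println(clicks);
    }
}
```

Because clickEditProfile exists only on MyProfilePage, a test that tries to edit someone else’s profile fails at compile time, which is one argument for the subclass design over a single class with a sometimes-visible button.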

Comfort With the DOM

Another set of interview questions determines whether someone is comfortable with the DOM. It’s such an important part of testing, and yet so many people rely on the locators recommended by their browser’s Developer Tools. This results in flaky tests. Let’s look at this example.

How do you build a test that makes a selection and records a vote? Your test should be generic enough that it can select either choice and record a vote.

Angie shared that as an interviewer, she would start with this UI, and then share the HTML that creates this UI:

Angie said that most people freak out when they see the HTML – and yet they are automation engineers.

Angie finds it disappointing when people say, “Well, I just use Firebug.” Because, when there are no locators – yes, Firebug can build a test, but it will be extremely brittle.

This is what Firebug gives for the second poll option:


Yes, that’s accurate for this build of the app – and it’s brittle. You don’t have any contextual reference. You don’t have a way to link this to the app itself. You don’t link to any reference in the code. You can’t automate this – this is the link to the second option explicitly and directly from the HTML root. If the app changes, this code will break – and you may not know why. You’re going to have to recode this manually all the time. Yuck.

Angie wrote a whole blog post about the perils of relying on recommended locators from Developer Tools.

In going back to the DOM, there’s a span element for “Yes” and a separate span element for “No”. Should you use these? Again, the answer here should depend on how comfortable you are with the idea that “Yes” and “No” are only used in the HTML in these locations on this page. How comfortable are you with that likelihood? (Hint: not very!)

Going up one level for each, there is a call to a radio class for the radio button value. It turns out you can create an XPath locator which allows you to uniquely call one or the other:

.//span[contains(@class, 'PollXChoice-choice--radio')]/span[text()='%s']

If you want to know about how to select web element locators like the one here, there’s a whole course on Test Automation University (and you can read a course preview here).
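The %s in the locator above is a placeholder for the choice text, which is what makes the test generic enough to vote either way. A minimal sketch of filling it in follows; the class and method names are hypothetical, and the expression is written in XPath syntax with straight quotes.

```java
class PollLocators {
    // Locator template from the article; %s is replaced by the visible choice text.
    static final String CHOICE_XPATH =
            ".//span[contains(@class, 'PollXChoice-choice--radio')]/span[text()='%s']";

    // Builds the locator for one poll option, for example "Yes" or "No".
    static String choiceLocator(String choiceText) {
        return String.format(CHOICE_XPATH, choiceText);
    }

    public static void main(String[] args) {
        System.out.println(choiceLocator("Yes"));
        System.out.println(choiceLocator("No"));
    }
}
```

In a Selenium-based framework the resulting string would typically be handed to an XPath locator strategy; that wiring is left out here so the sketch has no dependencies.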

Development Round

Development questions are the scariest ones for test engineers. Angie says that she thinks they’re scary, too. Engineering management teams are requiring more development-level tests because of the frequency of test automation project failures. They want their test engineers to be top notch coders.

You may get as many as five of these questions during your interview.

Here are three things to brush up on:

  • Big O Notation
  • Data Structures
  • Algorithms

Big O Notation

This is huge in coding interviews. Big O notation describes how an algorithm’s execution time grows with the size of its input, based on the algorithm’s design.

The point is to know the relationship between the algorithm design and performance, related to the size of the input set. An algorithm with O(1) is excellent – performance is the same regardless of the size of the input set. An algorithm with O(2^N) is horrible – typically something that runs recursively, with each call making multiple further calls. If this is familiar, awesome. If not, there is a little more detail in this article.
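To see the two extremes side by side, here is a small sketch. The examples are my own choice and not from the webinar: reading the first element of an array is O(1), while the naive recursive Fibonacci makes two further calls per call, so its call count grows exponentially (bounded above by O(2^N)).

```java
class BigODemo {
    static long calls = 0;

    // O(1): a single array access, no matter how long the array is.
    static int first(int[] values) {
        return values[0];
    }

    // Exponential: every call with n >= 2 spawns two more calls.
    static long fib(int n) {
        calls++;
        if (n < 2) {
            return n;
        }
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        System.out.println("first element: " + first(new int[] {7, 8, 9}));
        for (int n : new int[] {10, 20, 30}) {
            calls = 0;
            long result = fib(n);
            System.out.println("fib(" + n + ") = " + result + " took " + calls + " calls");
        }
    }
}
```

Running it shows the call count climbing far faster than n does, which is the behaviour an interviewer wants you to recognise and avoid.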

Data Structures

The next thing to worry about is data structures. There are four common data structures you will encounter:

  • Hash Table – by far the most common in tech interview questions.
  • Stack – shows up in some questions. Stack is a last-in/first-out data structure.
  • Queue – shows up in some questions. Queue is a first-in/first-out data structure.
  • Linked List – doesn’t show up in very many questions. Each element points to the next element.

Angie says she encountered questions about hash tables frequently, and stacks and queues as well. Linked lists weren’t ones she encountered often, but they did come up.
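For reference, all four structures are available in the Java standard library. The sketch below uses HashMap, ArrayDeque and LinkedList from java.util; the class name, method names and sample values are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;

class DataStructuresDemo {
    // Hash table: count how often a key is seen.
    static int hashTableCount() {
        Map<String, Integer> counts = new HashMap<>();
        counts.merge("selenium", 1, Integer::sum);
        counts.merge("selenium", 1, Integer::sum);
        return counts.get("selenium");
    }

    // Stack: last in, first out.
    static String stackPop() {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("first");
        stack.push("second");
        return stack.pop();
    }

    // Queue: first in, first out.
    static String queueRemove() {
        Queue<String> queue = new ArrayDeque<>();
        queue.add("first");
        queue.add("second");
        return queue.remove();
    }

    // Linked list: each element points to the next, so adding at the head is cheap.
    static LinkedList<String> linkedList() {
        LinkedList<String> list = new LinkedList<>();
        list.add("a");
        list.add("b");
        list.addFirst("start");
        return list;
    }

    public static void main(String[] args) {
        System.out.println("hash table count: " + hashTableCount());
        System.out.println("stack pop: " + stackPop());
        System.out.println("queue remove: " + queueRemove());
        System.out.println("linked list: " + linkedList());
    }
}
```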

You can find a more detailed overview of these and other data structures here.

Sample Questions

Angie pointed out that there are three kinds of test approaches she encountered during her interview experiences. In some cases, she was asked to use a coding site like hackerrank.com. In other cases, she would be asked to enter code using an online text editor like Google Docs. In still other cases, she would be asked to code on a white board. Angie suggests you ask your recruiter or the hiring manager what to expect for doing coding examples.

During her interviews, she mentioned being in a room with developers and automation engineers who were evaluating her answers.

Here are some tips to make things go smoothly.

  1. Very rarely will you get written questions. Most of the time, the questions are spoken. Listen and write down what you think people are asking so they can see, and then clarify.
  2. Write down an example to code against. This will help you structure your answer.

Sample Question 1

Angie was given the following question:

“You’re given two arrays and you’re asked to print out any characters that appear in both arrays”.

Using Tip #1, Angie reminds us to think – what does this question mean? Make sure to ask clarifying questions if you’re not sure. So, she asked:

“The array is an array with each element being a single character, correct?” This validates that the array wasn’t an array of strings of arbitrary length, which might make the coding more challenging.

What other questions might you ask?

Approaching Sample Question 1

Using Tip #2, Angie reminds us to think about coding cases – simple cases to define correct behavior, and exception cases that one would expect to encounter.

For the simplest cases in Angie’s example, she wrote down a simple set of arrays:



She said she normally uses a more complex example to help push her coding approach. As she said,

“If you code for the simplest examples, you may miss the tricky examples. I try to put the tricky one up front to force me to think it through.”

She then updated her array example to have an array with duplicate elements:



Another suggestion is – once you have your example, write out what your expected result would be for the example before you code – it’s a way to check your work. Here, the answer should be:

Expected Result: {'c', 'b'}

Now you have a way to validate your work before you get started. What you don’t want to have is code that doesn’t result in your expected output – now you have to work backward to figure out what you did wrong, and it will leave you flustered.

Thinking Through Approaches

Take a minute to think through Big O notation, algorithms, and data structures to begin. Take enough time to pick an approach, but don’t take too long to start – your interviewers will think you don’t know what you are doing, and you will feel flustered. Pick a solution that comes to mind first.

Say something like, “Okay, let me get my thoughts out here and I can refine them later.” That seems to work well.

Angie took a moment to write out code related to her sample question. Her first approach was to write a nested loop, which advanced through array a1 and compared each element to the elements in a2 – printing the character if the element in a1 matched an element in a2.
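Angie’s own listing is not reproduced here, so the following is only a sketch of the nested-loop idea as described. The sample arrays are hypothetical, chosen so that a1 contains a duplicate, and the matches are collected in a list (then printed) so the result is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;

class CommonCharsNested {
    // For every element of a1, scan all of a2 and record each match.
    static List<Character> common(char[] a1, char[] a2) {
        List<Character> matches = new ArrayList<>();
        for (char x : a1) {
            for (char y : a2) {
                if (x == y) {
                    matches.add(x);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical example arrays; a1 deliberately repeats 'c'.
        char[] a1 = {'c', 'b', 'c', 'd'};
        char[] a2 = {'b', 'c', 'e'};
        System.out.println(common(a1, a2)); // prints [c, b, c]
    }
}
```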

When she walked through this nested loop, she realized there was a problem, as the algorithm would match the ‘c’, then the ‘b’, and then the ‘c’ again. This is not the output she wants.

This is also an O(N^2) approach, which is pretty inefficient. There should be a better way.

Catching Any Mistakes First

The important point is for you to catch these issues before the interviewing team does. They want you to know the problems in your approach to guide you to a better outcome.

In this case, spotting the issue of duplicate values – along with knowing that the O(N^2) algorithm is not the most performant – makes it clear that you know something useful. You know Big O complexity, and you know how to look at your solution and find cases to break it – meaning you understand testing as well. Also, it means you are not afraid to admit when you mess up.

Angie then thought through the best way to improve the solution. She decided that the best approach was to create a hashtable from array a1, then compare array a2 with the a1 hashtable. The code looks like this:
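The original listing is not available here, so this is one possible reading of the approach rather than Angie’s code. It uses a HashSet (which is backed by a hash table) built from a1, and removes each character once it has matched so that repeats in either array are reported only once. The sample arrays are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CommonCharsHashed {
    static List<Character> common(char[] a1, char[] a2) {
        // Building the set is one pass over a1; duplicates in a1 collapse.
        Set<Character> seen = new HashSet<>();
        for (char c : a1) {
            seen.add(c);
        }
        // One pass over a2; remove() returns true only the first time a
        // character matches, so repeats in a2 are not reported again.
        List<Character> matches = new ArrayList<>();
        for (char c : a2) {
            if (seen.remove(c)) {
                matches.add(c);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical example arrays; a1 deliberately repeats 'c'.
        char[] a1 = {'c', 'b', 'c', 'd'};
        char[] a2 = {'b', 'c', 'e'};
        System.out.println(common(a1, a2)); // prints [b, c]
    }
}
```

The matches come out in the order they appear in a2, so the result holds the same characters as the expected set of 'c' and 'b' even though the order differs.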

This is more code than the original solution and it looks ugly, but it ends up being an O(N) solution, which is really efficient. It also addresses the duplicate issue and the O(N^2) performance problem of the first solution. Therefore, it’s a superior answer.

As you walk through issues, your interviewers may be giving you hints or clues about your code. Pay attention to both their hints and their questions.

Sample Question 2

Angie dealt with the next question:

“Compare a String with a List of String elements and count the number of times the String is found in the list.”

Using Tip #1, she asked the questions that came to her mind to clarify the parameters of the problem. Then, she came up with an example and started thinking about code. She came up with this:
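Her listing is not reproduced here. A straightforward loop-based solution of the kind most candidates would write might look like the sketch below; the method and variable names are hypothetical, and the line layout is not meant to match the original.

```java
import java.util.Arrays;
import java.util.List;

class CountMatchesLoop {
    static int countMatches(String target, List<String> items) {
        int count = 0;
        for (String item : items) {      // throws NullPointerException if items is null
            if (item.equals(target)) {   // throws NullPointerException if item is null
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("test", "code", "test");
        System.out.println(countMatches("test", items)); // prints 2
    }
}
```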

The solution here works, but everyone else will figure this approach out. How do you stand out as creative?

Standing Out From the Crowd

What happens if the list is null, or an entry is null? Won’t lines 3 or 4 generate an error? How do you handle that?

What’s more, if you know your language library, you can take advantage of built in methods. In this instance, there is a built-in Java library method that does this.  The modified code looks like this:
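The article does not name the method, so treat this as an assumption: a likely candidate is Collections.frequency from java.util, which counts the elements equal to a given object and copes with null elements. It still throws if the list itself is null, so the sketch adds a guard; returning 0 in that case is my own choice.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CountMatchesLibrary {
    static int countMatches(String target, List<String> items) {
        if (items == null) {
            return 0; // assumed behaviour: a null list contains no matches
        }
        // frequency() compares null-safely, so null entries do not throw.
        return Collections.frequency(items, target);
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("test", null, "code", "test");
        System.out.println(countMatches("test", items)); // prints 2
        System.out.println(countMatches("test", null));  // prints 0
    }
}
```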

Here, the code is much more efficient as it leverages the language library. It also shows that you have thought about the null list failure mode (and the element null issue is addressed in the built-in function).


The National Software Testing Conference

20 May 2020


The National Software Testing Conference is a UK‑based conference that provides the software testing community at home and abroad with practical presentations from the winners of The European Software Testing Awards, and roundtable discussion forums that are facilitated and led by top industry figures.

This premier industry event also features a market leading exhibition, which enables delegates to view the latest products and services available to them.

The National Software Testing Conference is open to all, but is aimed and produced for those professionals that recognise the crucial importance of software testing within the software development lifecycle. Therefore, the content is geared towards C‑level IT executives, QA directors, heads of testing and test managers, senior engineers and test professionals.

Why attend?

The National Software Testing Conference has its finger on the pulse as it is owned and supported by the industry leading journal TEST Magazine and is produced by the organisers of The European Software Testing Awards.

No other software testing conference in the UK or Europe can boast these credentials. This conference ensures that you get access to the latest analysis and ground-breaking research from authoritative sources – which enables you to keep one step ahead of market trends.

All delegates will receive a printed copy of The European Software Testing Summit Report, covering trends and learnings derived from the hundreds of entries into The European Software Testing Awards programme.

Maximise the power of data in the cloud

Torq Pagdin
Director of Data Engineering


Torq is an experienced technology professional with a passion for data and new technologies. He particularly enjoys people management and inspiring others to achieve their goals.


Setting you and your data up for success in the Cloud – what is the right way to handle Operations, Data Quality and Data Availability issues?

Torq Pagdin, Director of Data Engineering at Hotels.com, lays out how he has set up his team to maximise the power of data in a public cloud platform.


Moving your data pipelines from an on-premise solution to a public cloud provider can be a very daunting endeavour. For many businesses, the benefits of moving to the cloud far outweigh the risks, but what exactly are the blockers and pitfalls that data engineers and software developers are likely to run into? And how should the team be set up to maximise those benefits?

Before answering these questions, let us first explore the tremendous potential of moving to a cloud platform and why cloud migration is increasing so rapidly.

Cost Reduction. Owning and maintaining your own Data Centre can be expensive. As well as the hardware refresh costs, there is the overhead of having to manage outages for software upgrades and physical fixes. Moving to the cloud offers the potential to manage and accurately predict your costs.

Flexibility. Perhaps one of the biggest motivators for migrating to a cloud platform is the flexibility and reliability it offers. Multiple server types, predefined machine images and the latest software versions are all within easy reach with reliability built into your service.

Scalability. Having the ability to gain that extra bit of compute power when you most need it (and then dropping back down again) is a major factor for cloud adoption.

Disaster Recovery/Security. Data Centre based computing requires additional hardware and storage in an external location as part of a full and proper disaster recovery strategy. It also requires a mechanism for maintaining the data transfer to ensure (and prove) that no data has been lost. This is taken care of in the cloud, as multiple copies are taken across both availability zones and regions to ensure restoration can be done as quickly as possible.

Immediately useable (and useful) set of tools. Most of the big cloud providers offer a set of tools that can be utilised on the platform and can be extremely useful for getting your applications up and running very quickly. Everything from networking to Machine Learning tools can be used (at a price).


While moving to the cloud is clearly good from a business point of view, how does it affect the way data engineers have to work?


What is Data Engineering?

Data engineers are the designers, builders and managers of data pipelines. They develop the architecture, the processes and own the performance and data quality of the overall solution. To that end, they need to be specialists in architecting distributed systems and creating reliable pipelines, combining data sources and building and maintaining data stores.

The role has evolved in the last few years as software engineers have been required to learn more about data and traditional database engineers have been required to learn software engineering languages as businesses have moved away from enterprise warehouse solutions to distributed ‘big data’ pipelines.

Fig 1. Skills required by a Data Engineer

As such, data engineers require skills in a number of technical disciplines. These include scripting (such as Linux shell scripting and Python), object-oriented programming skills (particularly Java and Scala) and of course SQL and how the syntax varies between different applications. The role also requires an understanding of distributed systems, data ingestion and processing frameworks and storage engines. Experienced data engineers know the strengths and weaknesses of each tool and what it is best used for. There is also a requirement to know the basics of DevOps, particularly when having to install new tools, run statistical experiments and implement machine learning for Data Scientists.

So, with all those skills and knowledge, what is the problem? Surely migrating straight to the cloud is easy?

Well, not quite. Despite a decent knowledge of operations, Data Engineers are NOT DevOps. They do not have the deep level of understanding of networks, VPCs, subnets, security and infrastructure languages (like Terraform). Plus, as more companies look to move to a multi-cloud strategy, the complexity of cloud account structures requires specialist knowledge. There is also an increasing requirement to help users navigate the intricacies that arise from using multiple data mining tools. Analysts in the past have never had to worry about how big a cluster must be to make sure their query completes in a reasonable time, nor have they had to interpret a SQL failure message that reads like a Java run-time error. The development feature teams often do not have the time to help with this, so something else is required.

Data and Application Operations (DOPS & APO)

DOPS work with engineers and network teams and are responsible for the support of the managed data and shared cloud accounts. This includes VPCs, Subnets, Identity and Access Management, White and Black listing and ensuring account security is compliant with the governance team. They also support application deployments, are the point of contact for any infrastructure issue resolution and are the focal point for upgrades and patching (where required).

They also provide a vital service in helping to manage the cloud costs. Developers (and testers for that matter) have a habit of launching clusters to serve their needs – in the development, testing and production environments – but then often forget to tear them down afterwards. This means the company can end up spending a fortune on virtual instances that are simply not being used. The DOPS team will monitor, alert and generally police this to make sure it does not occur.
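As a rough illustration of the kind of monitoring described above, the following Python sketch flags clusters that have been idle for too long and estimates what they have cost since they last did any work. The `Cluster` record, its fields, the four-hour threshold and the sample inventory are all hypothetical; a real DOPS team would pull this information from its cloud provider's billing and monitoring services.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Cluster:
    """Hypothetical record of a running cluster; not a real cloud SDK type."""
    name: str
    owner: str
    environment: str          # e.g. "dev", "test", "prod"
    last_activity: datetime   # last time a job or query ran on it
    hourly_cost: float        # cost per hour in the billing currency


def find_idle_clusters(clusters, now, max_idle=timedelta(hours=4)):
    """Return clusters idle for longer than max_idle, most expensive first."""
    idle = [c for c in clusters if now - c.last_activity > max_idle]
    return sorted(idle, key=lambda c: c.hourly_cost, reverse=True)


def wasted_spend(cluster, now):
    """Estimate the money spent since the cluster last did any work."""
    idle_hours = (now - cluster.last_activity).total_seconds() / 3600
    return round(idle_hours * cluster.hourly_cost, 2)


# Made-up inventory for illustration only.
now = datetime(2020, 1, 15, 12, 0)
inventory = [
    Cluster("etl-dev-1", "alice", "dev", now - timedelta(hours=30), 2.50),
    Cluster("adhoc-test", "bob", "test", now - timedelta(hours=1), 4.00),
    Cluster("ml-sandbox", "carol", "dev", now - timedelta(hours=10), 8.00),
]

for cluster in find_idle_clusters(inventory, now):
    print(f"ALERT {cluster.name} (owner {cluster.owner}): "
          f"idle spend so far {wasted_spend(cluster, now)}")
```

Sorting by hourly cost is a design choice in this sketch: it puts the most expensive forgotten clusters at the top of the alert list.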

The APO team are more focused on supporting the teams that are using the data. If you run an application that auto-scales a cluster based on the CPU utilisation needed for a query to complete, it is in the interests of the company to have someone who is an expert in query optimization, or the costs are going to spike with poorly written queries. That is where the APO team come in. They are experts not only in rewriting queries for speed but also in teaching the users how to do this themselves. They also monitor query and table usage so that a deprecation program can be created for low usage tables, as well as directly supporting engineering teams with ‘proof of concepts’ for new external applications. With the rate of new products entering the data processing market, providing a service to evaluate new products is vital and ensures continued innovation within the engineering team.
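The table usage monitoring mentioned above could be sketched along these lines in Python. The query log format, the table names and the `min_queries` threshold are assumptions made for illustration; in practice the counts would come from the audit or query history of whichever query engine is in use.

```python
from collections import Counter


def low_usage_tables(all_tables, query_log, min_queries=5):
    """Return tables queried fewer than min_queries times, sorted by name."""
    counts = Counter(table for entry in query_log for table in entry["tables"])
    return sorted(t for t in all_tables if counts[t] < min_queries)


# Made-up catalogue and query log for illustration only.
all_tables = ["bookings", "clicks", "legacy_rates", "searches"]
query_log = (
    [{"user": "analyst_a", "tables": ["bookings", "searches"]}] * 6
    + [{"user": "analyst_b", "tables": ["clicks"]}] * 5
    + [{"user": "analyst_c", "tables": ["legacy_rates"]}] * 1
)

# Candidates for a deprecation review, not for automatic deletion.
print(low_usage_tables(all_tables, query_log))
```

Tables that never appear in the log count as zero queries, so unused tables are reported as well as rarely used ones.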

So DOPS ensure the cloud infrastructure is supported across all the data teams and APO ensures that the applications and users are supported.

Perfect. Now you are all set up to fully utilize the power of the cloud. Or are you?

What about data quality and data availability?

As the business world starts to rely more and more on machine learning, the accuracy of the underlying data that ML models are trained on has become far more important.

It is no longer acceptable to have ‘mostly’ good data; even the smallest amount of ‘bad’ data can cause inaccuracies in predictive analytics.

As data engineers, we bear the brunt of any criticism and rightly so – data scientists often bemoan the fact that much of their time is spent cleaning up data rather than building the models they are trained to build. We are the first part of a long chain and the world of data engineering has to embrace this responsibility.

This is the usual timeline for a Production failure:

  1. Production Support are alerted to a failure in the middle of the night
  2. They apply a ‘Band-aid’ fix to get the application up and running again
  3. The next day they inform the development team who own the code to assess options
  4. The development team then plan the reprocessing of bad data to stop users from having to halt their work
  5. A permanent fix is suggested, estimated and then put on the backlog (often never to be seen again!)

The other issue with data quality is that feature development teams can spend multiple days within a sprint simply trying to get to the bottom of failures. This means that promised roadmap items get pushed further and further back, making the teams less efficient and causing frustration and mistrust from the stakeholders.

So, what can we do about it?

Step forward the Data Reliability Engineering team!

Data Reliability Engineering (DRE)

DRE is what you get when you treat data operations as a software engineering problem. Using the philosophy of Site Reliability Engineering (SRE), Data Reliability Engineers are 20% operations and 80% developers. This is not about being a production support team – this is about being a talented and experienced development team that specialises in data pipelines across multiple technical disciplines.

The 6-step mission of DRE is:

  1. To apply engineering practices to identify and correct data pipeline failures
  2. To use specialist knowledge to analyse pipelines for weaknesses and potential failure points and fix them
  3. To determine better ways of coping with failures and increase automation of reprocessing functionality
  4. To work with pipeline developers to advise of potential DQ issues with new designs
  5. To utilize and contribute to Open Source DQ Software products
  6. To improve the ‘first to know rate’ for DQ issues
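The article does not define the ‘first to know rate’. The Python sketch below assumes it means the share of data quality incidents that the engineering team detected before any user reported them. The incident records and the `detected_by` field are hypothetical.

```python
def first_to_know_rate(incidents):
    """Share of incidents detected by the team before any user reported them."""
    if not incidents:
        return None  # no incidents, so the rate is undefined
    team_first = sum(1 for i in incidents if i["detected_by"] == "team")
    return team_first / len(incidents)


# Made-up incident log for illustration only.
incidents = [
    {"id": "DQ-1", "detected_by": "team"},
    {"id": "DQ-2", "detected_by": "user"},
    {"id": "DQ-3", "detected_by": "team"},
    {"id": "DQ-4", "detected_by": "team"},
]

print(f"First to know rate: {first_to_know_rate(incidents):.0%}")
```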

So, the DRE team own the failure, the fix and the message out to users. They can call on the feature team developers for help if specialist knowledge is required, but aim to handle as much as possible in-house, thus freeing feature teams to continue with their roadmap.

One of the other main functions of DRE is to set up and own the data quality platform. Whether it is an open source DQ solution like Apache Griffin or Quibble or a set of Spark libraries like AWS Deequ, the team must work out the best way of alerting about data issues and, where possible, set up an ‘ETL circuit breaker’ within the job flow that will kill a job if any anomalies are detected on key fields. This will stop incorrect data getting into the Production tables, which is one of the things that so frustrates users. The DRE team are actively encouraged to contribute back to these open source projects with any improvements or modifications they have made.
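To make the ‘ETL circuit breaker’ idea concrete, here is a minimal Python sketch that checks the null rate on key fields and stops the load before anything is written if a limit is exceeded. It does not use Apache Griffin or Deequ; the function names, the 1% threshold, the choice of null rate as the anomaly measure and the sample batches are all assumptions made for illustration.

```python
class CircuitBreakerTripped(Exception):
    """Raised to stop the job before bad data reaches production tables."""


def check_key_fields(rows, key_fields, max_null_rate=0.01):
    """Return {field: null_rate} for key fields whose null rate exceeds the limit."""
    if not rows:
        # An empty batch is treated as an anomaly in this sketch.
        return {field: 1.0 for field in key_fields}
    anomalies = {}
    for field in key_fields:
        nulls = sum(1 for row in rows if row.get(field) is None)
        rate = nulls / len(rows)
        if rate > max_null_rate:
            anomalies[field] = rate
    return anomalies


def load_with_circuit_breaker(rows, key_fields, write):
    """Run the quality check, then write only if no anomaly was found."""
    anomalies = check_key_fields(rows, key_fields)
    if anomalies:
        raise CircuitBreakerTripped(f"anomalies on key fields: {anomalies}")
    write(rows)


# A plain list stands in for the production table.
production_table = []
good_batch = [{"booking_id": 1, "price": 120.0}, {"booking_id": 2, "price": 80.0}]
bad_batch = [{"booking_id": 3, "price": None}, {"booking_id": None, "price": 95.0}]

load_with_circuit_breaker(good_batch, ["booking_id", "price"], production_table.extend)

try:
    load_with_circuit_breaker(bad_batch, ["booking_id", "price"], production_table.extend)
except CircuitBreakerTripped as err:
    print(f"Job killed: {err}")
```

The check runs before the write, so a tripped breaker leaves the production table untouched rather than requiring a clean-up afterwards.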

But…does that mean the feature teams simply throw Data Quality responsibilities aside and over the fence to DRE? Certainly not! Each team still has a responsibility for their own pipeline and DQ should be a core element of the architecture and design. The DRE team work with both developers and Product teams to make sure that DQ is included in estimates and are themselves part of the sign-off process for QA/UAT.

Is DRE the complete solution to all Data Quality problems? Unfortunately not – bad data issues will always occur, as edge cases for data in particular are so hard to predict. However, having a dedicated engineering team for DQ shines a light on issues, provides transparency to stakeholders and data consumers and builds trust between data engineering, data science and the analysts who are so dependent on correct data.

To sum up, migrating to the cloud can be very difficult for engineers and users alike but, if handled in the right way, can lead to multiple gains in flexibility, scalability and lower costs, as well as creating an environment for exploratory analytics and accelerated innovation. This model for a team structure gives the feature teams the time and space to meet the needs of the business, provides two operations teams to make sure the infrastructure and users are all supported, and adds a capability team that helps build trust by improving data quality and availability.