What can you do to avoid this in future? Where is the process broken, and how can it be improved?
No magic tool
First, the bad news: There’s no ‘one size fits all’ solution. The problem you are having with your product is different from the one I am having with mine; you may find a way to stop the same problem from happening again, but you will face a different problem tomorrow.
So what can you do? You have to make sure you have the right checkpoints and alerts in place so you can test sooner and fix faster.
Exercise your code
Tests should exercise your code, looking for both expected success and expected failures, to cover all of the boundary conditions.
One of the mistakes my team and I have made in the past is that we only tested when we wanted to release a new version of our apps. Our technical landscape is complex, and yours most likely is as well, so a link in the chain can break without us realising before our customers do. This is why we need to verify on a daily basis (and sometimes more often) that the production versions of our apps are working correctly.
We have a number of tools/processes that help us to achieve this goal:
- Automated monitoring alerts: If one of the KPIs we identified as essential to help evaluate the health status of our apps is above/below a specific threshold, the Dev and Ops team automatically receives an alert via various channels (Slack, email, etc.). We have two different threshold levels, orange and red (red meaning ‘take immediate action’) and everybody knows what to do if we receive one of those alerts.
- Daily sanity checks: Our QA team runs a set of standard tests every day to ensure the apps are working as expected. These include analysis of memory and CPU usage, simulated usage for a set amount of time, automated testing against a mock server (which allows us to verify specific edge cases), and a manual sanity check (essentially a simple smoke test).
- Weekly automation tests: Fragmentation is a challenge on the Android platform, and we have a very varied device base on both platforms, which forces us to maintain support for old versions of operating systems. To manage this, we run all our automation tests on a third-party test farm where we can use hundreds of devices with different configurations of OS, network connection, device type, etc. The final report is analysed at the start of the week to identify any major trends that could point to an issue that has yet to be detected.
- Pre-release OS versions: As developers, we know that a new version of an operating system can deprecate or change something we use or rely on in our code. This is why we run our tests on any ‘developer preview’ version of the OSes we support. In doing so, we not only catch problems long before our customers do, but we can also verify that app performance does not change between different versions of the same operating system.
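As a concrete illustration of the two-level alerting described above, here is a minimal Python sketch. The `Kpi` type, the thresholds and the `alert_level` helper are all hypothetical; a real setup would feed measured KPIs in from a monitoring system and route the alerts to Slack, email and so on.

```python
# Minimal sketch of two-level (orange/red) KPI threshold alerting.
# All names and threshold values here are hypothetical.
from dataclasses import dataclass

@dataclass
class Kpi:
    name: str
    value: float
    orange: float   # warning threshold
    red: float      # 'take immediate action' threshold

def alert_level(kpi: Kpi) -> str:
    """Return 'red', 'orange' or 'ok' for a KPI where higher is worse."""
    if kpi.value >= kpi.red:
        return "red"
    if kpi.value >= kpi.orange:
        return "orange"
    return "ok"

# Example: crash rate per 1,000 sessions, currently in the warning band.
crash_rate = Kpi("crash_rate", value=4.2, orange=3.0, red=5.0)
print(alert_level(crash_rate))  # orange
```

In practice the value of the orange level is that it gives the team time to investigate before a metric crosses into ‘take immediate action’ territory.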
All of the above helps us check the status of the production versions of our apps. But what about the tools, frameworks and best practices we use during the ‘delivery’ of changes, updates or new features? We do not do anything fancy here: We rely on standard best practices and tools that are well known and mature, such as TDD, BDD and unit tests/automation.
More often than not, I see people debating (sometimes fiercely) TDD and BDD: Some people say that TDD is the way to go; others believe BDD is the only thing that should be implemented.
My view is that you need a good amount of unit tests, and that these tools are complementary: If your team is able to use each of them for the right scope, they will prove invaluable.
Now let’s start with unit tests. As a quick search on Wikipedia will tell you, unit tests focus on a ‘single unit of code’, such as a function of an object or module. This highlights the basic characteristics (and the resulting scope) of a unit test:
- It is specific to a very small piece of code
- It should be simple and quick to write
- It should be quick to run
- It should be isolated from dependencies (e.g. network operations, database access, etc.).
The implicit simplicity of a unit test is the reason they are so powerful in protecting code from the ‘unintended consequences’ of change: If you have a set of unit tests verifying your code works, then you can safely change your code and your program should not break. If a test is not easy to write, it is most probably not a unit test, and you should reduce its scope.
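To make those characteristics concrete, here is a minimal sketch in Python; `apply_discount` and its tests are hypothetical examples, not code from our apps. Note how the function needs no network or database, and how the tests cover an expected success, the boundary conditions, and an expected failure.

```python
# A hypothetical pure function: no network, no database, so it is
# simple to write tests for and quick to run them.
def apply_discount(price: float, percent: float) -> float:
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_typical_discount():
    # Expected success on a typical input.
    assert apply_discount(100.0, 30) == 70.0

def test_boundaries():
    # Boundary conditions: 0% and 100% are both valid edges.
    assert apply_discount(100.0, 0) == 100.0
    assert apply_discount(100.0, 100) == 0.0

def test_expected_failure():
    # Expected failure: out-of-range input must raise.
    try:
        apply_discount(100.0, 101)
        assert False, "expected ValueError"
    except ValueError:
        pass
```

Each test exercises one very small piece of code, which is exactly the scope a unit test should have.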
TDD stands for ‘Test-Driven Development’, and its write/execute cycle includes the following steps:
- Start writing a test
- Run the test and any other tests that already exist – the new test you added should fail
- Write only the code that would allow the test to pass
- Run the test again and any other test that already exists (all tests should pass)
- If needed refactor your code
- Go back to point 1 and iterate
This requires great discipline from engineers, because they have to write the tests before writing any code, so it is quite a dramatic shift in mindset. Despite this, the reward can be huge – especially if test coverage goes up from 90 to 100%. The very extensive coverage you end up with is the power of TDD: Once you implement TDD, you will be able to trust that your code works and that the changes you apply won’t break the rest of the code in your program.
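The cycle above can be sketched in miniature; `is_leap_year` is a hypothetical example, and in a real red-green iteration you would run the test before the function exists and watch it fail first.

```python
# One red-green TDD iteration, using a hypothetical example.
# Steps 1-2: the test is written first; run before the function
# exists, it fails (red).
def test_leap_year_rules():
    assert is_leap_year(2024) is True     # divisible by 4
    assert is_leap_year(1900) is False    # centuries are not leap years...
    assert is_leap_year(2000) is True     # ...unless divisible by 400

# Step 3: write only the code needed to make the test pass (green).
def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# Steps 4-6: run all tests again (they pass), refactor if needed, iterate.
test_leap_year_rules()
```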
Finally BDD, which gives many (me included) a headache: BDD is a set of best practices that, when used with unit tests and TDD, can give great results.
BDD helps to shift the focus from implementation to behaviour, and, to do this, we need to spend some time thinking about what the scenario is. This is helped by the fact that you usually phrase a BDD test as ‘it should do something’. If the requirements change but the scenario does not, you should not have to modify your test.
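A small Python sketch of the style, with a hypothetical `ShoppingCart`: the test’s name and its Given/When/Then structure describe behaviour rather than implementation, so the test survives as long as that behaviour does.

```python
# BDD-flavoured test: the name reads 'it should ...', and the body
# follows Given/When/Then. ShoppingCart is a hypothetical example.
class ShoppingCart:
    def __init__(self):
        self._items = []

    def add(self, name: str, price: float) -> None:
        self._items.append((name, price))

    def total(self) -> float:
        return sum(price for _, price in self._items)

def test_it_should_sum_item_prices_into_the_total():
    # Given an empty cart
    cart = ShoppingCart()
    # When two items are added
    cart.add("coffee", 3.50)
    cart.add("croissant", 2.00)
    # Then the total reflects both prices
    assert cart.total() == 5.50
```

If `ShoppingCart` were rewritten internally (a different data structure, say), this test would not need to change, because the scenario it describes has not.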
You should use unit tests, TDD and BDD together, and in combination, because:
- Unit tests give you what to test
- TDD gives you when to test
- BDD gives you how to test
Manual testing: The machines can’t do it
When I see my QA testers following a rigid Scenario->Case->Steps schema while manually testing our products, I shudder: They should not spend their time on that. That is food for a machine – for fully automated testing we can run on hundreds of devices in one night. I usually describe my fellow QA colleagues as our ‘last line of defence before disaster strikes’: I need their brains, skills and experience, not a robot that repeats tasks until its energy runs out.

A manual testing session can focus on specific areas, allowing testers to find more defects and use their skills to the full. You cannot define all the possible ways a human can use a product, especially a mobile app – or maybe you can, but it would be terribly expensive and time-consuming. Get the basic scenarios in order, focus on the path that is most relevant for your customers/users, automate that path and its scenarios, and then ask your QA team to spend their time thinking about the product, freeing them from repetitive tasks. Their brains and their judgment are something that a computer doesn’t have (yet)!
Functional vs non-functional
I won’t go into the details of why you should care about functional and non-functional testing, because you simply should! They are both incredibly important, and no product should be considered ready for release if it has not been tested in both of those ‘areas’. I often see engineers, testers, product owners, stakeholders and so on debating fiercely about these two ‘areas’. If the person speaking is an engineer, an architect, or someone deeply involved in technical discussions, they will highlight how fundamental the non-functional tests are.
If a PO or stakeholder is speaking, they will ask you to make sure that the product you want to release is aligned with the business requirements and objectives. Which one should you start with? Which one is more important? Well, that is something like the ‘customer dilemma’ well known and understood in the world of customer rewards. For example, you get customers who purchase more expensive airline tickets to collect miles for future vacations, fashionistas redeeming the 30%-off coupon that arrives in their mailbox from the nearby boutique, and those who get a free cup of coffee in return for the ten they have already paid for.
The world of rewards usually serves two ends: To build customer loyalty and to entice new business. But is one more important than the other? And to whom should firms offer a better deal?
We care about our customers, the way they use and engage with our products, and (hopefully) the way they re-engage. A first-time customer will probably be attracted by the look and feel, the navigation, and the content offered by your apps. But the second time they use the app, it will have to be fast (the emotion of discovering your product’s features is gone at that point) and reliable, and the user flow should be simple. Usability will also play a big role, as will security.
When I hear that functional testing ensures there are no major bugs while non-functional testing stands up to customer expectations, I wonder if we are just treating our customers as trained animals who will hopefully behave correctly so we won’t have a problem with our apps! There may be no problem with how your app stores user data (security, so non-functional), but what about bugs? And what if the app slows down because the API we designed to communicate with the backend is not performing at its best (again, non-functional)?
These two kinds of test are both extremely important and should have equal weight within your testing strategy. We want new and existing customers to be happy, as well as stakeholders, engineers, architects and the company CTO, because our apps are designed, implemented and tested to meet the high-level non-functional requirements set for the entire company.
So next time you design a feature, ask yourself: Am I considering the two different tests? Do they have equal importance in my testing plan? Am I considering customer satisfaction, APIs and product reliability?
Set the baseline
Now that we understand that functional and non-functional tests are two equal citizens we can focus on non-functional testing.
I remember a time when I was working for a major fashion company and we released a new feature – an image gallery in one of our apps – that created a problem some stakeholders soon reported: The app was not just slow, it was painfully slow. I remember how the PO went into panic mode, asking everyone to “fully focus on the issue right now!”
We rushed to our keyboards, typed as fast as we could to test the feature, and monitored the status in production to understand what had gone wrong between our tests and the release. After 30-40 minutes it turned out that we had tested the feature using JPG images sized between 500KB and 1024KB, but the design team had decided to publish ‘better-looking images’ the night before via the CMS powering the apps. The images were super sharp, beautiful, and showcased the products at their best. Unfortunately, their size was less beautiful: between 3 and 5MB each. That, for obvious reasons, slowed down the entire gallery, which had to load between five and eight images every time.
To triage this issue, we spent extremely valuable time with our engineers and testers, as well as a very stressed PO. But what could we have done to prevent that?
Enter the magic world of defining a ‘baseline’: In performance testing it is important to understand what the current status is, so that you can verify at any given time whether a change is affecting the product’s performance.
If we had defined a baseline for the gallery feature, we would have verified immediately that nothing in the app was responsible for the degraded performance.
The baseline alone won’t be enough: You have to check that your app’s performance is within a predefined range every time you apply a change in DevOps, and every day in your production environment. Mobile apps are typically at the end of a long chain of systems, and each one of them could potentially affect how the apps work. Using the predefined baseline to compare against the daily status of the chain will help you identify any issue sooner, and reduce the impact it could have on your customers. My team, for example, checks performance every single day (we usually do this in the morning, but nothing stops us running the same tests multiple times a day) so we can investigate and, if need be, fix a problem as soon as possible.
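A minimal sketch of that daily comparison, with illustrative numbers: the baseline values, the tolerance and the flow names are all hypothetical, and a real pipeline would pull the measured timings from an automated performance run.

```python
# Hedged sketch: compare today's measured timings against a stored
# baseline, flagging regressions beyond a tolerance (values illustrative).
BASELINE_MS = {"gallery_load": 800, "login": 400}   # agreed baseline
TOLERANCE = 0.20                                    # allow 20% drift

def check_against_baseline(measured_ms: dict) -> list:
    """Return the names of flows whose timing regressed past tolerance."""
    regressions = []
    for flow, baseline in BASELINE_MS.items():
        if measured_ms.get(flow, 0) > baseline * (1 + TOLERANCE):
            regressions.append(flow)
    return regressions

# Morning run: the gallery has slowed down (e.g. oversized images upstream).
print(check_against_baseline({"gallery_load": 2600, "login": 390}))
# -> ['gallery_load']
```

In the gallery incident above, a check like this would have pointed at the slow flow immediately, before anyone suspected the app code itself.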
If we need to introduce a new feature that has no impact on the performance of our apps (a simple UI change, for example), then we only run a regression test. Conversely, any feature that may affect the performance of the app must be verified carefully.
One of the best practices that generates good results is to have the QA team actively participate in design, architecture and implementation discussions: They will be able to highlight any major flaws in the visual and architectural designs before anyone spends time writing code. This, in turn, shortens the development cycle and helps us achieve our delivery goals.
Every release is different, simply because in every release cycle we add a specific feature or refactor specific areas of the apps. When we work with our product owner to prioritise the stories for the next sprint we also prioritise our testing effort, focusing on those parts of the app that are touched by the changes we agreed on with our stakeholders.
Prioritise your bugs
We use analytics tools to understand the impact of a bug, drawing on two kinds of information: crashes and usage data.
If the app crashes, we always want to fix the problem as soon as possible, but sometimes it is not possible to do this straight away, so we assess the severity of the issue by checking how many customers/devices are affected by the crash. Similarly, if the bug is not causing the app to crash, we check usage data: Is the bug appearing in an area that is often used by our customers? If so, the priority of the fix will be high; if the bug is happening in a view that is rarely used, it will be low.
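That prioritisation can be sketched as a simple rule; the thresholds and parameter names here are hypothetical, and in practice the inputs would come from your crash-reporting and analytics tools.

```python
# Illustrative priority rule combining crash impact and usage data.
# All thresholds are hypothetical and would be tuned per product.
def bug_priority(crashes: bool, affected_pct: float, area_usage_pct: float) -> str:
    """Return 'high', 'medium' or 'low' for a reported bug."""
    if crashes and affected_pct >= 5.0:
        return "high"       # widespread crash: fix immediately
    if crashes or area_usage_pct >= 50.0:
        return "medium"     # limited crash, or a bug in a busy area
    return "low"            # non-crashing bug in a rarely used view

print(bug_priority(crashes=True, affected_pct=12.0, area_usage_pct=80.0))  # high
print(bug_priority(crashes=False, affected_pct=0.0, area_usage_pct=3.0))   # low
```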
Keep some sanity
When you go into production, do not forget to run your post-release process: Once a release is live, run a sanity check on the apps using production accounts, so you know that your customers can see what is expected and that the apps are not crashing on the multitude of devices you support. You should also ensure that third-party systems that interact with the apps are correctly integrated.
The ghosts will haunt you
Maintaining product quality while delivering new features and changes is a difficult task: QA testers must break traditional working practices, develop new skills, be interested in software design and development, and be involved in many different stages of the delivery process.
Do it right: The quality of your products will go up and the time you need to develop will shorten.
Do it wrong: The ghosts of your quality process will haunt you at some point.
Written by Emilio Vacco, Mobile and Third-Party Products Director at The Telegraph