Six Questions for a Quick & Easy Root Cause Analysis

Posted by Hollis Hazel on March 8, 2022

What’s a Root Cause Analysis?

When an issue turns up at a customer, my team first solves it. Then we take a closer look at how the issue occurred, to see how we can prevent the same thing from happening again.

How do I do one?

A Root Cause Analysis is also known as a bug retrospective. There are many models and frameworks available, like the Five Whys or Fishbone Diagrams. But at its core, a Root Cause Analysis is very straightforward. You just want to know two things:

  1. What happened?
  2. How can we prevent this from happening again?

To help you get started, I want to share my guide to a Quick and Easy Root Cause Analysis.

Root Cause Analysis: The Quick & Easy Method

Simply sit down with a couple of teammates and answer the following six questions. It’s that easy!

A: What Happened?

  1. What was the timeline?
  2. How many customers did this affect? How many reported it?
  3. In which issue or story did you introduce the bug?
  4. What was the Cause/Root Cause?

B: How can we prevent this from happening again?

  1. What can we do now so that if it happens again we’d spot it immediately?
  2. What could we do differently next time to prevent this from happening at all?

Quick & Easy RCA: An Example

Here’s an example from an RCA I did with my team recently. Let me walk you through the six questions.

A: What Happened?

1. What was the timeline?

When was the bug noticed, reported, picked up, fixed? Who was involved?

In our case, one of our mobile features was malfunctioning. Three large customers reported it over a couple of days. Two days later we had picked it up, and a week and a half later the fix was live.

2. How many customers did this affect? How many reported it?

How many incidents or other internal issues were registered? How many customers had the problem?

We had only three incidents, but many more customers were affected. Either they didn’t notice, or they did but didn’t report it.

3. In which issue or story did you introduce the bug?

Was it something we expected? Why or why not?

We narrowed it down to a few potential stories in which we changed the backend interaction with the database. We didn’t expect these components to interact like this.

4. What was the Cause / Root Cause?

What went wrong, and in which component? Note that we’re talking about a technical cause or a gap in knowledge/experience. The goal is NOT to blame someone specifically.

In our case the malfunction was due to a hanging database call: subsequent requests in the backend then failed. The broken feature didn’t make any database queries itself, which is why we didn’t notice the malfunction at the time.
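To make that failure mode concrete, here’s a minimal sketch in Python (all names are illustrative, not our actual code) of bounding a database call with a timeout, so one stuck query fails fast instead of silently stalling every request queued behind it:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def run_query(sql):
    # Illustrative stand-in for the real database call;
    # the sleep simulates the hang we saw in production.
    time.sleep(60)

_executor = ThreadPoolExecutor(max_workers=4)

def run_query_with_timeout(sql, timeout_s=5.0):
    # Bound the wait: without a limit like this, one hung query holds
    # its worker forever and the requests behind it fail too.
    future = _executor.submit(run_query, sql)
    try:
        return future.result(timeout=timeout_s)
    except TimeoutError:
        raise RuntimeError(f"query exceeded {timeout_s}s: {sql!r}")
```

Most database drivers can enforce such a limit natively (via a statement or connection timeout), which is usually preferable to wrapping calls by hand like this.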

B: How can we prevent this from happening again?

5. What can we do now so that if it happens again we’d spot it immediately?

You could add logging or metrics to try to spot your bug sooner, or perhaps add it to your list of error-prone components for manual testing.
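As a minimal illustration of the logging idea (the helper here is hypothetical, not our actual code), you could time each call and warn when it runs suspiciously long, so a hanging query shows up in the logs right away:

```python
import logging
import time

log = logging.getLogger("backend.db")

def timed_query(run_query, sql, warn_after_s=2.0):
    # Wrap any query function and log a warning when it takes too long.
    start = time.monotonic()
    try:
        return run_query(sql)
    finally:
        elapsed = time.monotonic() - start
        if elapsed > warn_after_s:
            log.warning("slow query (%.1fs): %r", elapsed, sql)
```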

We created extra low-level automated tests to try to catch similar issues should they occur.
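As a sketch of what such a low-level test could look like (pytest with the pytest-timeout plugin; the function under test is an illustrative stub, not our actual code):

```python
import time
import pytest

def run_query(sql):
    # Stand-in for the real backend call; in our code base this would be
    # imported from the backend, not defined in the test file.
    time.sleep(0.1)

@pytest.mark.timeout(5)  # fail instead of hanging forever (pytest-timeout)
def test_feature_query_does_not_hang():
    run_query("SELECT 1")  # the kind of call that hung in the incident
```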

6. What could we do differently next time to prevent this from happening at all?

Can we change our process to better support us? How can we share/reuse our new knowledge?

We concluded that because the buggy feature is critical for many of our customers, we would cover it with an end-to-end test.
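As a minimal sketch of such an end-to-end check (the endpoint, URL, and response shape here are all hypothetical), it could drive the same API the mobile feature uses and verify the whole backend-to-database path:

```python
import requests

BASE_URL = "https://staging.example.com"  # hypothetical test environment

def test_mobile_feature_end_to_end():
    # Exercise the full path the mobile feature depends on:
    # API -> backend -> database and back.
    resp = requests.get(f"{BASE_URL}/api/mobile-feature", timeout=10)
    resp.raise_for_status()
    assert "items" in resp.json()  # hypothetical response shape
```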

Conclusion

Answering these six questions took us less than half an hour. We can use the insights we gain during RCAs to sharpen our bug prevention and raise our quality. If you try the same thing with your team, I’d be curious to hear how it went!

Testing Accessibility with Accessibility Insights

Posted by Anna Maier on December 9, 2019

There are many tools out there that help you check whether your website or web app is accessible. Most of them run an automated check based on accessibility guidelines. Some also provide functionality to do checks yourself, for example checking the color contrast. The open-source tool Accessibility Insights takes a different approach: on top of the usual automated checks, there is a set of guided manual checks. This makes it a great tool for learning about accessibility testing and programming.

Read more

Five Free Web-based Tools for Exploratory Testing API Responses

Posted by Hollis Hazel on August 7, 2017

How to test whether your API can handle anomalous HTTP responses.

Today I was exploratory testing my service, which performs sequential HTTP requests, each depending on the response of the previous one. I wanted to find out how our API would handle a variety of HTTP responses. I also wanted to see what would happen if things went wrong. What if the response contains an image? Or an error code? Or what if there’s a timeout?

To answer these questions as quickly and easily as possible I ventured on to the web. With some effort, I found what I was looking for, and on the way I discovered a few tools that are just great for exploratory testing your API’s response handling. If you test or debug API’s on a regular basis, here are a few free tools you’ll definitely want to check out.


Read more

Google Test Automation Conference 2016 – When top players talk about Automated Testing

Posted by Tobias Spöcker on December 5, 2016

[Image: the Golden Gate Bridge in fog]

This image perfectly captures my first feelings about the conference, but let me go into a bit more detail first.
I signed up for the Google Testing Blog more than a year ago. There I found a lot of interesting and useful reading about the world of automated testing. When I later got an email from Google informing me that they would be holding a conference, I was not entirely sure whether I should apply. Is it relevant for me? Am I experienced enough to contribute? Well, what’s the worst that could happen? So I applied, and in the end I did not regret it.

Read more