How to handle flaky tests

Shilpa Goley
5 min read · Apr 19, 2023


Automated functional/UI tests are a necessary evil in any project. I call them evil because reaching a stable state, and then maintaining it, is a difficult feat to achieve.

As the size of a test suite increases, there are two main issues that arise:

  1. the time taken to run the tests increases (due to sheer volume)
  2. flakiness in the tests increases

In this article, we will talk about the second problem: what flaky tests are, their causes, and their remediation. We will also touch upon what not to do.

What are flaky tests?

Firstly, let’s understand what a flaky test is.

Flaky tests are non-deterministic tests in your test suite. They may be intermittently passing or failing for no apparent reason, making test results unreliable.

Flaky tests hinder development, slow down progress, hide design problems, and can cost a lot of money in the long run.

They also tend to make folks (especially leadership) lose faith in the automation suite, whose sole purpose is to give confidence for an error-free and reliable product.

Causes

There are many causes for flaky tests. A few generic ones are:

  1. an issue in newly-written code that breaks older functionality
  2. an issue in the test itself, or a badly written test, e.g. assertions based on the current time rather than a fixed time
  3. environment-related issues, e.g. the time taken for requests to complete in certain environments
  4. tests that depend on each other
  5. faulty test data management, etc.

The “flaky tests problem” mostly starts appearing only after you have already added a considerable number of tests and you are knee-deep in project development. Very few teams have the luxury of halting development completely to fix flaky tests; if that were common, the problem would hardly exist. So let’s face it: this will never happen!

Do’s and don’ts of managing flaky tests

I have listed down what to do and what not to do with flaky tests, based on my experience.

Don’t -> Ignore it or delay it

“I can’t think about that right now. If I do, I’ll go crazy. I’ll think about that tomorrow.” (Margaret Mitchell, Gone with the Wind)

One common practice in many projects is skipping or ignoring flaky tests, with a promise to look into them later, which never happens. Over time, the number of these inconsistent tests grows and causes bigger problems.


I saw a very interesting solution in one of my projects, called a ‘Time Bomb’. To implement this, an annotation was added to the test with a date as a parameter. The test is ignored up to the mentioned date and starts running after it. If not fixed by then, it starts failing again in CI, encouraging the team to fix it.

At first I thought it was a really good idea, definitely better than skipping tests forever. But eventually I realised that although tests skipped by the time bomb are not ignored indefinitely, the team can still get into the habit of procrastinating by pushing the date back repeatedly.
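The time-bomb idea above can be sketched as a small decorator. This is a minimal, illustrative version using only the Python standard library; the name `time_bomb` and the date format are assumptions, not the actual annotation from that project.

```python
import datetime
import functools
import unittest


def time_bomb(deadline: str):
    """Skip the decorated test until `deadline` (YYYY-MM-DD); after that
    date the test runs again, so an unfixed flaky test fails loudly in CI."""
    due = datetime.date.fromisoformat(deadline)

    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            if datetime.date.today() < due:
                # Still inside the grace period: report the test as skipped.
                raise unittest.SkipTest(f"time-bombed until {due}")
            return test_fn(*args, **kwargs)
        return wrapper

    return decorator
```

A test decorated with `@time_bomb("2025-01-01")` is skipped until that date and then resumes running normally, which is exactly what makes repeatedly pushing the date back so tempting.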

Do -> Fix it

The other option, the winner’s choice, is to actually fix the flaky tests. In most cases, fixing them all at once will not work. Hence, slow and steady does it!

How to fix and/or prevent flaky tests?

Following are a few ways to manage and fix flaky tests as well as prevent them in future.

1. Divide and fix: Isolate tests

This is a method where the set of flaky tests is identified and separated. This can be done either by adding a custom annotation, or by using the label feature provided by many test tools: tests that are flaky are marked as “flaky” and run in a different stage or pipeline in your CI process. After committing code, a developer only needs to make sure that the non-flaky stage is green.
A tech-debt task can be picked up every sprint to handle the flaky tests one by one, with the flaky stage/pipeline run for validation. Once a test is fixed, the “flaky” tag is removed. Eventually, all flaky tests move back into the stable pile.
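Most test tools support this natively (marks, labels, or annotations). As a framework-agnostic sketch of the idea, here is a minimal tagging scheme; the names `flaky`, `FLAKY_TESTS`, and `run_suite` are illustrative, not a real API.

```python
# Registry of tests tagged as flaky.
FLAKY_TESTS = set()


def flaky(test_fn):
    """Tag a test as flaky so it can be routed to a separate CI stage."""
    FLAKY_TESTS.add(test_fn)
    return test_fn


def run_suite(tests, include_flaky=False):
    """Run either the stable (gating) subset or only the flaky subset."""
    selected = [t for t in tests if (t in FLAKY_TESTS) == include_flaky]
    return {t.__name__: t() for t in selected}


def test_login():
    return "pass"


@flaky
def test_search():
    return "pass"


# The gating stage runs only the untagged tests:
stable_results = run_suite([test_login, test_search])
```

In a real setup the two calls to `run_suite` correspond to two CI stages: only the stable one blocks the build, while the flaky one is monitored and burned down over time.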

2. Use a reliable locator strategy

Using X/Y coordinates or absolute XPath expressions to identify elements introduces unnecessary rigidity and makes tests unreliable. Not handling dynamic elements properly can also lead to flakiness. Hence, the locator strategy must be well thought out and standardised across the test suite.
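To make the contrast concrete, here is a hedged sketch comparing a brittle absolute XPath with an attribute-based selector. The `data-testid` attribute is a common convention (not something specific to this article), and the Selenium usage in the comment is illustrative.

```python
# Brittle: tied to the exact DOM layout, breaks as soon as the page shifts.
BRITTLE_LOCATOR = "/html/body/div[3]/div[2]/form/input[1]"


def by_test_id(test_id: str) -> str:
    """Build a CSS selector tied to a dedicated, stable test attribute."""
    return f'[data-testid="{test_id}"]'


# With Selenium, this would be used roughly as:
#   driver.find_element(By.CSS_SELECTOR, by_test_id("login-email"))
```

Because the selector depends only on an attribute the team controls, cosmetic layout changes no longer break the test.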

3. Idempotent tests

Just like good unit tests, functional tests need to be idempotent: repeatable with the same results, with no dependencies on other tests. Compared to unit tests, such dependencies are harder to detect in functional tests, since the tests are larger and end-to-end. The dependency can come either from the sequence in which the tests run, or from incorrect test data management.
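A minimal sketch of the idea: an idempotent test builds its own fresh state instead of relying on whatever a previous test left behind. `FakeDb` here is a stand-in for any shared resource (database, browser session, API account).

```python
class FakeDb:
    """Stand-in for a shared resource that tests might otherwise reuse."""

    def __init__(self):
        self.users = []

    def add_user(self, name):
        self.users.append(name)


def fresh_db():
    # Each test gets its own instance, so run order cannot matter.
    return FakeDb()


def test_add_user():
    db = fresh_db()          # fresh state, not inherited from another test
    db.add_user("alice")
    assert len(db.users) == 1
    return True
```

Because the test creates and owns its state, it produces the same result whether it runs first, last, alone, or twice in a row.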

4. Better test data management

Sharing data across tests, hard-coded values, incorrect initial state etc. are a few indicators of bad test data management. These are red flags for current or future flakiness in the test suite and must be addressed immediately.
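One common remedy, sketched below, is a data factory that generates unique, self-contained test data instead of sharing hard-coded values across tests. The `make_user` helper and the `example.test` domain are illustrative assumptions.

```python
import uuid


def make_user(**overrides):
    """Build a fresh, unique user record for a single test."""
    user = {
        "name": "Test User",
        # Unique per call, so two tests can never collide on the same record.
        "email": f"user-{uuid.uuid4().hex[:8]}@example.test",
    }
    user.update(overrides)  # let a test pin only the fields it cares about
    return user
```

Each test then owns its data outright, which removes both cross-test collisions and the temptation to assert against shared hard-coded values.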

5. Test environment management

If a defined and consistent environment is available to run your tests, you can ensure that the results are accurate and reproducible. However, what happens when this stable environment starts to become unreliable? This is a million-dollar question. In such cases, using test stubs in the right places can help in reducing flakiness.
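Stubbing at the boundary with an unreliable dependency can look like the sketch below, using the standard library's `unittest.mock`. The function `get_exchange_rate` and its client's `fetch` method are hypothetical names for illustration.

```python
from unittest.mock import Mock


def get_exchange_rate(client, pair):
    """Code under test: delegates to an external (potentially flaky) service."""
    return client.fetch(pair)


# In a test, replace the real client with a stub that always answers:
stub_client = Mock()
stub_client.fetch.return_value = 0.92

rate = get_exchange_rate(stub_client, "USD/EUR")
```

The test now exercises your own logic deterministically, regardless of whether the real downstream environment is up, slow, or returning stale data.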

6. Using delays/sleep effectively

Many times we use sleep statements to make a test wait for a state change. This is one of the main causes of flakiness, since the right time to wait is unpredictable. Wherever possible, we must first ask whether a sleep is needed at all. And if a wait cannot be avoided, it is better to replace the sleep with an explicit wait, such as a waitFor()-style polling function.
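Test frameworks ship such waits built in (e.g. explicit waits in Selenium), but the core idea is simple enough to sketch: poll a condition with a deadline instead of sleeping for a fixed, guessed duration. The name `wait_for` and its defaults are illustrative.

```python
import time


def wait_for(condition, timeout=5.0, interval=0.05):
    """Return as soon as condition() is truthy; fail fast after `timeout`.

    Unlike a fixed sleep, this waits exactly as long as needed (up to the
    timeout) and raises loudly instead of silently asserting too early.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")
```

A fast environment pays almost nothing, a slow one gets the full timeout, and a genuine failure surfaces as a clear `TimeoutError` rather than a mysterious flaky assertion.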

Good design and coding practices

Finally, test code should be treated like any other production-level code. Using the right design patterns, reusable components, appropriate testing frameworks, reliable libraries etc. ensures a reliable and maintainable test codebase. Test code, like any other code, should be peer-reviewed to ensure quality.

All these measures will give rise to a reliable test suite with a very low percentage of flaky tests!

PS: This is not a complete list, and I am sure you will find more causes and/or innovative ways to fix or prevent flaky tests. Please let me know in the comments.
