Optimizing Test Execution Runtime in GitLab’s Codebase

GitLab’s extensive test suite is essential to maintaining the platform’s robustness and reliability. With thousands of tests running across various components, keeping them efficient is important for sustaining development momentum. Optimizing test execution accelerates development cycles, makes the CI/CD pipeline more efficient, and helps save costs in the long run.

This is why merge requests that optimize test performance are highly valued by the GitLab maintainers.

In this article, we want to share experiences and actionable strategies to identify and enhance slow tests, with a particular emphasis on reducing factory usage.

Personal note: These kinds of GitLab contributions are fun to work on because you can measure and see the performance improvement directly. 😀

Step 1: Identifying Slow Tests

There are many ways to identify slow tests in GitLab’s codebase.

If you are specifically looking for slow tests, the GitLab team created the project “GitLab RSpec Profiling Statistics”. The project includes a simple CI job that generates basic statistics about the most expensive RSpec tests on a daily basis. The latest report is published at
https://gitlab-org.gitlab.io/rspec_profiling_stats.

RSpec profiling statistics report

However, it is more likely that you will stumble across slow tests when working on another contribution. In these cases, we suggest making a reminder note to tackle this slow test later in a separate merge request in order to stay focused on the current task.

Either way, it is important to keep track of these to-dos in GitLab issues. We recommend either finding an existing issue related to the slow test file or creating a new one. 👍

Step 2: Analyzing Test Performance Metrics

Once you find a slow test, you can start analyzing its performance metrics to identify bottlenecks and optimization opportunities.

The GitLab documentation includes testing best practices related to test slowness that we found quite helpful, covering tools such as rspec-stackprof and test-prof.

In my experience, factory usage is one of the most common causes of slow tests in the GitLab codebase. Therefore, I usually start by looking at the factory usage in the test file. With FPROF, it is possible to identify the factory calls that are consuming the most time. All you need to do is set the FPROF=1 environment variable when running the test.

FPROF=1 bundle exec rspec spec/requests/api/conan/v1/instance_packages_spec.rb

The console output will show a table looking like this:

Finished in 12 minutes 15 seconds (files took 1 minute 3.87 seconds to load)
431 examples, 0 failures

Randomized with seed 10477

[TEST PROF INFO] Time spent in factories: 05:59.008 (46.08% of total time)
[TEST PROF INFO] Factories usage

Total: 8374
Total top-level: 470
Total time: 05:59.008 (out of 13:11.446)
Total uniq factories: 19

name                    total   top-level     total time      time per call      top-level time

conan_file_metadatum     2185           0       12.5916s            0.0058s             0.0000s
conan_package_file       2185           0       42.1722s            0.0193s             0.0000s
organization              881           0       11.0230s            0.0125s             0.0000s
project                   878           1      301.1067s            0.3429s             0.1532s
namespace                 877           0      100.1130s            0.1142s             0.0000s
conan_package             874         437      503.3127s            0.5759s           352.0380s
conan_metadatum           437           0      153.5347s            0.3513s             0.0000s
ci_pipeline                12           0        4.7466s            0.3956s             0.0000s
ci_build                   12          12        5.3147s            0.4429s             5.3147s
nuget_package               8           8        0.2553s            0.0319s             0.2553s
package_file                8           0        0.1898s            0.0237s             0.0000s
user                        4           1        0.6629s            0.1657s             0.0471s
personal_access_token       3           3        0.8775s            0.2925s             0.8775s
license                     3           3        0.0274s            0.0091s             0.0274s
ip_restriction              2           2        0.0126s            0.0063s             0.0126s
deploy_token                2           2        0.0292s            0.0146s             0.0292s
group                       1           1        0.2536s            0.2536s             0.2536s
namespace_settings          1           0        0.0279s            0.0279s             0.0000s
namespace_ci_cd_settings    1           0        0.0042s            0.0042s             0.0000s

At first glance, this might be overwhelming, but bear with me; we are going to dissect it in a second.

The first [TEST PROF INFO] line shows that 46% of the test execution time is spent in factories (Time spent in factories: 05:59.008, 46.08% of total time). In most cases, this is too much time and gives us a hint that this test file has potential for improvement.

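Looking at the table, we can see, for each factory, how often it was called in total, how often it was called at the top level (explicitly, rather than through another factory), and how much time those calls took.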

We can see that the factory create(:conan_package) alone is called 437 times in an explicit way (directly in the test file) and therefore consumes a lot of time (~500 seconds). The reason for this high number of calls might be that the factory conan_package is called for each test case. After digging deeper into the test code, we can confirm that the factory is indeed called for each test case through the let(:package) { create(:conan_package) } statement.

This makes a good starting point for optimization. 👍
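For longer reports, it can help to rank the rows by top-level time instead of scanning them by eye. The following is a minimal sketch, not part of TestProf or GitLab's tooling: the names FactoryStat and parse_fprof_rows are hypothetical, and the parser assumes rows have exactly the six whitespace-separated columns shown in the table above.

```ruby
# Hypothetical helper (not part of TestProf): ranks FPROF rows by top-level time.
FactoryStat = Struct.new(:name, :total, :top_level,
                         :total_time, :time_per_call, :top_level_time)

# Parses data rows of the FPROF table. Lines that do not have six columns
# with an integer call count (headers, blank lines) are skipped.
def parse_fprof_rows(text)
  text.each_line.filter_map do |line|
    fields = line.split
    next unless fields.size == 6 && fields[1].match?(/\A\d+\z/)

    FactoryStat.new(
      fields[0],
      fields[1].to_i,
      fields[2].to_i,
      fields[3].to_f, # String#to_f ignores the trailing "s"
      fields[4].to_f,
      fields[5].to_f
    )
  end
end

# A few rows copied from the report above.
SAMPLE_ROWS = <<~ROWS
  project                   878           1      301.1067s            0.3429s             0.1532s
  namespace                 877           0      100.1130s            0.1142s             0.0000s
  conan_package             874         437      503.3127s            0.5759s           352.0380s
  conan_metadatum           437           0      153.5347s            0.3513s             0.0000s
  ci_build                   12          12        5.3147s            0.4429s             5.3147s
ROWS

STATS = parse_fprof_rows(SAMPLE_ROWS)
WORST = STATS.max_by(&:top_level_time)
puts "#{WORST.name}: #{WORST.top_level} top-level calls"
```

Sorting by top-level time rather than total time points at the factory calls written directly in the spec file, which are usually the ones you can change.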

Step 3: Implementing Optimization Strategies

Once you have identified the bottlenecks, you can start implementing optimization strategies to improve test performance. In GitLab’s testing best practice guide, there are several strategies to address slow tests, such as:

In the context of this article, we will focus on the optimization strategy of reducing factory usage by utilizing let_it_be and memoizing factory calls.

Utilizing let_it_be for Shared Test Data

let_it_be allows for shared test data that persists across multiple tests within a suite, reducing redundant setup operations. This means that we can minimize the number of explicit (and implicit) factory calls, thus speeding up test execution.

In the previous step, we identified that the usage of the factory conan_package could be improved. Replacing let(:package) with let_it_be_with_reload(:package) means the factory conan_package is called only once, and the created conan package is memoized across all test cases. You can follow the change in more detail here and here.

let_it_be_with_reload(:package) { create(:conan_package, project: project) }
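To see why the call count drops, here is a toy model in plain Ruby. It is not RSpec or TestProf code: FakeFactory, calls_with_let, and calls_with_let_it_be are hypothetical names, and the model only mimics the memoization scope (per example for let, per group for let_it_be), assuming every example references the package.

```ruby
# Toy model, not RSpec itself: counts "factory" calls under two memoization scopes.
class FakeFactory
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Stands in for create(:conan_package); each call is one expensive insert.
  def create(name)
    @calls += 1
    { name: name, id: @calls }
  end
end

# Like `let`: the memo is reset for every example, so each example
# that references the package triggers a new factory call.
def calls_with_let(example_count)
  factory = FakeFactory.new
  example_count.times do
    memo = {}
    package = (memo[:package] ||= factory.create(:conan_package))
    package.fetch(:id) # the example "uses" the record
  end
  factory.calls
end

# Like `let_it_be`: one memo is shared by all examples in the group,
# so the factory runs only for the first reference.
def calls_with_let_it_be(example_count)
  factory = FakeFactory.new
  memo = {}
  example_count.times do
    package = (memo[:package] ||= factory.create(:conan_package))
    package.fetch(:id)
  end
  factory.calls
end

puts "let:       #{calls_with_let(437)} factory calls"
puts "let_it_be: #{calls_with_let_it_be(437)} factory calls"
```

With 437 examples, the per-example scope results in 437 factory calls and the shared scope in a single one, which mirrors the top-level count for conan_package in the first report.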

We rerun the test with FPROF=1 to confirm that both the factory call count and the test execution time have gone down. 👍

As you can imagine, it is not always as easy as replacing let with let_it_be. The GitLab testing best practice guide has more suggestions regarding let blocks. In some cases, it is necessary to restructure and reorder the factory call hierarchy. In other cases, it is necessary to use let_it_be_with_reload when there are test cases that modify the created conan package.
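The need for the reload variant can be illustrated with another toy model in plain Ruby. This is a sketch under assumptions, not TestProf's implementation: FAKE_DB and Package are hypothetical stand-ins for a database row and an in-memory model object, and the sketch assumes that database changes made by an example are rolled back afterwards, so only the in-memory object can leak state.

```ruby
# Toy model, not TestProf itself: shows in-memory state leaking between examples.
FAKE_DB = { 1 => { name: "original" } }.freeze

class Package
  attr_accessor :name
  attr_reader :id

  def initialize(id)
    @id = id
    reload
  end

  # Re-reads attributes from the fake database, discarding in-memory changes.
  def reload
    @name = FAKE_DB.fetch(@id)[:name]
    self
  end
end

# Runs two "examples" against one shared record and returns the name
# that the second example observes.
def name_seen_by_second_example(reload_between_examples:)
  shared = Package.new(1)                  # created once for the whole group

  shared.name = "changed"                  # example 1 modifies the shared object

  shared.reload if reload_between_examples # the step the reload variant adds
  shared.name                              # example 2 reads the shared object
end

puts name_seen_by_second_example(reload_between_examples: false)
puts name_seen_by_second_example(reload_between_examples: true)
```

Without the reload, the second example sees the value left behind by the first one; with it, each example starts from the stored state again.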

After applying the same optimization approach for other factories (e.g. project), we can repeat the profiling test run and notice a significant reduction in test execution time, see below.

Finished in 2 minutes 46.4 seconds (files took 22.01 seconds to load)
431 examples, 0 failures

Randomized with seed 2712

[TEST PROF INFO] Time spent in factories: 00:06.531 (3.8% of total time)
[TEST PROF INFO] Factories usage

Total: 232
Total top-level: 40
Total time: 00:06.531 (out of 03:01.824)
Total uniq factories: 20

name                    total   top-level     total time      time per call      top-level time

conan_file_metadatum       55           0        0.1971s            0.0036s             0.0000s
conan_package_file         55           0        0.7538s            0.0137s             0.0000s
conan_package              22          11        6.4576s            0.2935s             3.6878s
organization               19           0        0.1776s            0.0093s             0.0000s
project                    16           3        4.3384s            0.2712s             1.3254s
namespace                  15           0        1.7276s            0.1152s             0.0000s
conan_metadatum            11           0        2.8027s            0.2548s             0.0000s
nuget_package               8           8        0.2086s            0.0261s             0.2086s
package_file                8           0        0.1290s            0.0161s             0.0000s
user                        4           3        0.2787s            0.0697s             0.2546s
license                     3           3        0.0218s            0.0073s             0.0218s
personal_access_token       3           3        0.0689s            0.0230s             0.0689s
ip_restriction              2           2        0.0110s            0.0055s             0.0110s
ci_pipeline                 2           0        0.6070s            0.3035s             0.0000s
project_deploy_token        2           2        0.0150s            0.0075s             0.0150s
deploy_token                2           2        0.0207s            0.0104s             0.0207s
ci_build                    2           2        0.7325s            0.3663s             0.7325s
namespace_settings          1           0        0.0185s            0.0185s             0.0000s
namespace_ci_cd_settings    1           0        0.0043s            0.0043s             0.0000s
group                       1           1        0.1852s            0.1852s             0.1852s

Now, the factory conan_package is called only 22 times compared to 874 times before, and the time spent in factories has dropped to 3.8% of the total time from 46.08%. 🎉

Metric                                         Before Optimization    After Optimization

Total Test Time                                            13m:11s               03m:01s
Time Spent in Factories                                    05m:59s               00m:06s
Relative Time Spent in Factories                            46.08%                  3.8%
Factory Calls for conan_package (total)                        874                    22
Factory Calls for conan_package (top-level)                    437                    11
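The improvement can also be expressed as a speedup factor. Below is a small sketch that converts the "Total time" values from the two FPROF reports into seconds and divides them; the helper names duration_to_seconds and speedup are hypothetical and exist only for this illustration.

```ruby
# Converts a TestProf-style "MM:SS.mmm" duration into seconds.
def duration_to_seconds(text)
  minutes, seconds = text.split(":")
  minutes.to_i * 60 + seconds.to_f
end

# How many times shorter the "after" duration is compared to "before".
def speedup(before, after)
  duration_to_seconds(before) / duration_to_seconds(after)
end

# Values taken from the "Total time" lines of the two reports above.
TOTAL_SPEEDUP = speedup("13:11.446", "03:01.824")
FACTORY_SPEEDUP = speedup("05:59.008", "00:06.531")

puts format("Total run time: %.1fx faster", TOTAL_SPEEDUP)
puts format("Time in factories: %.1fx faster", FACTORY_SPEEDUP)
```

For this spec file, that works out to a run that is a little over four times faster overall, with roughly 55 times less time spent in factories.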

For more details on all changes, please refer to the merge request.

Step 4: Creating a Merge Request

After implementing the optimization strategies, it is time to create a merge request.

To avoid unnecessary (and lengthy) review cycles, we suggest including the following information in the merge request description from the beginning:

When you keep the merge request focused on the test optimization, the review process will likely be smooth, and the merge request will be merged quickly. 🚀

HAPPY CONTRIBUTING! 🎉

Case Studies

Here are more example merge requests that optimize test execution time in GitLab’s codebase:

Conclusion

Optimizing test execution time in GitLab’s codebase is a continuous process that benefits the GitLab community, and such contributions are welcome. By focusing on strategies like reducing factory usage and leveraging let_it_be, contributors can make meaningful improvements that enhance the efficiency and reliability of the CI/CD pipeline. As you contribute to GitLab, keep these optimization techniques in mind to ensure your tests are both fast and reliable, ultimately driving faster, more efficient development cycles.

B310 Digital GmbH, c/o FLEET7, Fleethörn 7, 24103 Kiel hi@b310.de