How to Build a DBUnit Testing Framework

This blog post demonstrates how we created an automation test framework of a data layer part that uses MySQL communication. We’ll take a look at the performance and usability issues and also show how this approach was integrated into the project’s CI process.

Introduction

When development started on the new project, we planned to use the test-driven development (TDD) approach. We already had strategies to write different tests to cover most of the code parts, including business layer logic (BL), UI and API. However, we still didn't have any simple ideas to test the data layer logic (DLL). This is because the DL implements logic with communication between different data sources, like the database source (DB) or any external API. We already had our custom fast-performance ORM framework where almost all queries were to be implemented with plain SQL queries with parameters. This custom ORM framework makes it hard to use existing solutions for SQL query testing. Consequently, our key goal was to build a solution that allowed us to write tests with minimal development time that would improve project code quality.

Note: When we talk about unit testing for a DB in this article, we actually mean the integration tests, which have all the qualities inherent in unit tests.

Motivation

The TDD approach can improve the code quality of the entire project and reduce the time spent on the next code changes. With the expansion of a project comes the increased probability of making a mistake. Unit tests allow for almost all our code to be quickly checked after each change. In most cases, it is a simple task to write new unit tests for BL code. However, writing tests for DL code comes with a few difficulties. Before you start writing or implementing DL tests, you should follow these steps:

Configure the local DB instance
Set up correct connection strings to test the DB
Create all requirements of the DB structure that should be correct and align identically with dev or prod versions of the DB structure
Seed the DB with correct data for each test case

These steps make it hard to develop and maintain tests. In fact, many projects don’t cover the DL code parts at all.

Difficulties with DL code testing

Without the test framework, you can use the following approaches during DL method development.

Test query directly in DB

Actually, almost all DL methods are written with plain SQL. These SQL queries have parameter placeholders and can be used directly with CLI or any other external UI. A developer can use those tools and execute a query with various test cases. This approach is quick and simple but cannot cover all cases. Specifically, it doesn’t completely cover the communication part between SQL, ORM, and the application code. For example, the developer sets all the query parameters in SQL CLI but misses a few parameters in the initialisation in DL.

Call testing method directly

The developer should use any accessible endpoint of the code, call the testing method directly, and then somehow check the output. This approach covers all the required code lines, but it is also somewhat ‘old school’ and will not be acceptable by any code guidelines. The key issue of this approach is that the testing code cannot be used by other team members, and, in most cases, it also cannot be integrated with existing processes of the application. This can lead to any sort of mistake and potential security issues. For example, a developer may forget to delete the test method to output all data from DB, and actually, this method doesn’t have any permissions checks. One more issue with this approach is that the test method is not completely descriptive because it doesn’t contain information about the initial state of data. So, you need to reproduce it manually.

Test from application UI

This is the most common way to test the DL method, but it requires different manipulations and can be time consuming. Before this approach can be used, all data flow must be implemented and tested (UI > backend > BL > DL), and the tester should keep in mind any cases that prevent access to the DL code. For example, a cache or any additional checks can reject direct access to DL methods, and this prevents the target method from being tested. Also, it is hard to cover all the different cases when testing the DL method.

All these examples and approaches presented above make the DL testing process time consuming. According to our previous project bug tracking statistics, around 5% of bugs are related to DL. However, it’s worth mentioning that many issues related to SQL syntax errors or simple bugs of SQL queries are addressed individually and may not be registered in the bug tracker.

Unit test key features

One unit test must cover one test scenario.
The unit test must not depend on time or any random or dynamic data.
The test must be atomic.
The amount of time for test execution must be small.

Data layer test process

During the planning session and after some investigation, we decided to use the following test process.

Initialisation—connect to the DB server instance, load all initialisation scripts, and create a new DB with the required structure.
Test execution—execute the unit test method with the Arrange-Act-Assert (AA) approach.
Shutdown—finalise all used resources, like removing the test DB and closing all active connections.

The scheme of the test process

This test process covers all unit test features and requirements. Each test method is executed in an isolated DB, so the tests are atomic and can be executed in parallel. Let’s check the performance of this process and solution.

Testing of test framework

Performance

This approach is simple and smart, but it does not work with huge projects that already have massive and complex DB structures with many tables populated with initial data. Therefore, the time required for DB initialisation for each test method would be substantial.

The following table presents the different steps of test initialisation:

Initialisation step	Time, ms	%
Script file reading	6	< 1%
Script preprocessing	19	< 1%
Script parsing	211	1%
Script command execution	14660	98%
Total	14896

As you can see, initialising one DB test requires approximately 15 seconds. The projects can contain many tests (e.g., 100 or 1000), and the total time to execute all tests can take from 1.5 hours to one test run, and this is not acceptable. Thus, this approach does not meet the unit test feature requirement provided earlier in this post.

We spent some time analysing all the test initialisation code of the test framework, bound four steps, and measured the time of each individually. The most time was spent on the initialisation query execution in the DB.

To reduce this time, we planned to create additional functionality in the test framework. This new function was DB structure reduction. The developer should specify which tables must be used by the test method and then test the framework by executing queries related only to those tables. Additional dependent tables were also created, and these were related to foreign key indexes.

After that, we improved all the tests and measured the time required for the initialisation steps. The results are shown in the following table:

Initialisation step	Time, ms	%
Script file reading	6	1%
Script preprocessing	22	5%
Script parsing	254	62%
Script command execution	134	32%
Total	416

This table shows that the time spent on test execution was reduced by 97%. The new results indicate that some steps can still be improved later on. For now, this time for test initialisation meets our internal requirements.

Test data generation framework

When we were done with the test framework performance issues, we started to think about how we could improve the test code and make it simpler for developers. Actually, the process of creating the test data was complex, and the result was hard to read and maintain. Take a look at the following code:

[TestMethod]
public void GetInboxMessages_ShouldReturnInboxMessages()
{
       const int validRecipient = 1;
       const int wrongRecipient = 2;
	
       var recipients = new [] { validRecipient };
	
       var message = new MessageEntity();
       HelperDataProvider.Insert(message);
	
       var validInboxMessage = new InboxMessageEntity()
       {
       	      MessageId = message.MessageId,
              RecipientId = validRecipient
       };
       var wrongInboxMessage = new InboxMessageEntity()
       {
              MessageId = message.MessageId,
              RecipientId = wrongRecipient
       };
	
       HelperDataProvider.Insert(validInboxMessage);
       HelperDataProvider.Insert(wrongInboxMessage);
	
       var collection = _target.GetInboxMessages(recipients);
       Assert.AreEqual(1, collection.Count);
       Assert.IsNotNull(collection.FirstOrDefault(x => x.Id == validInboxMessage.Id));
       Assert.IsNull(collection.FirstOrDefault(x => x.Id == wrongInboxMessage.Id));
}

This test method was written with the AAA approach. The code presents just a simple case, but most of the time we have more complex test data setups for DL test methods. When the new data generation sub-system was done, we were able to rewrite the code as follows:

[TestMethod]
public void GetInboxMessages_ShouldReturnInboxMessages()
{
       const int validRecipient = 1;
       const int wrongRecipient = 2;
       const int recipientsCount = 2;
       const int messagesCount = 3;
       var recipients = new [] { validRecipient };
	// Arrange 
	// Create test data builder 
       DataFactory.CreateBuilder<InboxMessageEntity>()
		// Setup relations between different entities in DB 
              .UseForeignKeyRule(InboxMessageEntity inboxEntity => inboxEntity.MessageId, MessageEntity messageEntity => messageEntity.MessageId)
		// Setup possible values for RecipientID field of entity 
              .UseEnumerableRule(inboxEntity => inboxEntity.RecipientId, new[] { validRecipient, wrongRecipient })
		// Specify group provider for data building
              .SetDefaultGroup(new FixedGroupProvider(recipientsCount))
		// Generate and insert entities to DB 
              .CreateMany(messagesCount * recipientsCount)
              .InsertAll();
	// Act 
       var collection = _target.GetInboxMessages(recipients);
	// Assert 
       Assert.AreEqual(messagesCount, collection.Count);
       Assert.IsTrue(collection.All(inboxMessage => inboxMessage.RecipientId == validRecipient));
}

This version specifies how entities should be created and which values can be used. Also, this sub-system provides functionality to group entities and rules to implement N:N and N:1 relations between entities. For now, we have implemented the following rules, which cover almost all our cases:

DataSetterRule—allows for any entity field to be set with a static value
EnumerableDataRule—sets the value of the entity field from the list of allowed values, and each new entity generated by factory uses the next value on the list
RandomDataRule—sets the random value for a specified field of the entity; this rule is very useful when a huge data set must be created to check the performance of the test method
UniqueDataRule—generates a unique value for a specified entity field
ForeignKeyRule—allows for a new sub-entity of another type to be generated and maps those entities; usage of this rule with different entity groups allows for N:N and N:1 relations to be generated.

How works the data generator

Integration in development process

When the teams were learning to work with the test framework and had started to use it for most parts of the project, we began to think about how it could be integrated into our CI process. In fact, it would be too resource-intensive to run tests for every code push, since complete tests run still spent a lot of time. So, we planned to test it nightly, on every merge, to develop it and after each deployment in the test environment.

We added a new step in our build process that deploys tests with all initial scripts on a special test server and then executes PowerShell script. This script runs the test project with MSTestAgent and returns the results in XML format. Then, the script reads all the required information and sends notifications with short chunks of information to chat and detailed information to the team’s email addresses.

Conclusion

During the development of this approach, we faced different issues, two of which were the performance and complexity of the test data seeding for the developers. We resolved all of these issues, and our project is now fully covered with unit tests, so we can deliver on our customer’s code with more, high-quality work.

Alexey Mulyukin, Senior Software Engineer