Posted November 02, 2011. http://thoughts.karmazilla.net/2011/11/02/so-you-want-to-start-doing-td.html

So you want to start doing TDD

I have been practicing TDD for about three years now. The process of starting up on this discipline is fairly fresh in my memory, so I decided to write this post to help those who are just about to go through the same thing.

TDD is one of those things that are easy to learn but difficult to master. It is fairly easy to start going through the TDD cycle and produce a lot of tests. It does not take you many months to get the hang of that part. However, you will soon learn that TDD is not just writing a lot of tests. The tests are part of your code base too. They need to be maintained as well, and must be just as well written as the product code. Learning how to write good and maintainable tests takes time and practice. The purpose of this blog post is in part to help you speed this process up a little bit, so you won’t have time to write quite as many crappy, unmaintainable tests. Take a look at this informal non-scientific illustration. It shows the number of tests I write and their quality, as I become increasingly proficient at writing code with TDD:

The chart shows that as you start out on TDD, you quickly end up writing a lot of tests. That part of TDD is easy to learn. However, there are problems here: the tests are hard to write, so they slow you down. They are brittle, so your code becomes harder to change. They are hard to read, so maintaining them slows you down too. And they are slow to run, increasing your build times. You have reached the Chasm of the Many Crappy Tests, but don’t worry. If you stick to it, keep learning and don’t give up, then you will eventually cross it and reach TDD nirvana (or at least it won’t suck so bad anymore).

Roy Osherove says that good tests have three interlocking properties: They are readable, trustworthy and maintainable. A fourth property, fast, is sometimes added to the mix, but it is merely a useful property, not an essential one.

Readability

A readable test is one that reveals its purpose or reason for being: essentially, what the test is testing. A lot of readability can be bought relatively easily, simply by giving the test a proper name. If you are testing a queue, for instance, then don’t have a test called “testPoll1”. This is a silly name that reveals little other than that the “poll” method might be called somewhere. Instead, name your tests after useful observable behaviour that the code must exhibit. For instance, “itemPushedOntoEmptyQueueMustBePollable” is a name that describes some useful observable behaviour about queues: you must be able to poll an item that has been pushed onto the queue.
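
As a minimal sketch of this naming style, here is what such tests could look like in Python. The `ItemQueue` class and its `push` and `poll` methods are hypothetical stand-ins for whatever queue you are actually testing; only the shape of the test names is the point.

```python
from collections import deque


class ItemQueue:
    """Hypothetical FIFO queue, standing in for the unit under test."""

    def __init__(self):
        self._items = deque()

    def push(self, item):
        self._items.append(item)

    def poll(self):
        # Return the oldest item, or None when the queue is empty.
        return self._items.popleft() if self._items else None


# Each test name states one observable behaviour the queue must exhibit.
def test_item_pushed_onto_empty_queue_must_be_pollable():
    queue = ItemQueue()
    queue.push("a")
    assert queue.poll() == "a"


def test_polling_an_empty_queue_must_return_none():
    queue = ItemQueue()
    assert queue.poll() is None
```

Read as a list, the two names already form a small specification of the queue, which a name like “testPoll1” could never do.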

I consider the names of the tests to be part of the informal specification of the behaviour of the unit. When I have to implement a new class, I often start in its test case by writing to-do comments with names for each of the tests I want to write. I use these names as a sort of up-front design for the unit — an initial draft of the specification for the unit in question.

When your tests are named after useful observable behaviour, you naturally end up only testing for one thing in each test. It is acceptable to have more than one assertion, as long as you only assert on one thing; most likely a single object.

Finding the correct balance between having set-up code inside the tests, in factory methods or in a dedicated set-up method is also an important element of readability. You want to reduce the amount of code in the tests, but you also want it to be plainly clear what the test is doing. It is easy to fall into the trap of hiding too many details in set-up or factory methods, so that a reader has to hunt these methods down to make sense of your test. The Don’t Repeat Yourself principle, DRY for short, is often hammered pretty hard into the brains of good programmers. However, it is perfectly acceptable for test code to be a little “humid.”

Trust

A trustworthy test is one that deterministically fails or passes. Tests that depend on your machine being configured properly, or any other kind of external variable, are not trustworthy, because you don’t know if failure means that your machine is misconfigured, or if the code is buggy.

Tests that depend on external variables are integration tests, and they should be put in a separate project along with some documentation on how to get them up and running. Then their slow run times also won’t affect your normal build times.

An external variable is anything that you don’t have complete control over: files, databases, time, 3rd party code. With regards to time, just create your DateTime instances with a fixed instant rather than the fleeting “now” and call it a day. In a unit test, you want to use the exact same test data every time, but if your test depends on the value of “now,” then you will effectively get a different test every time you run it. As for threads, Roy says these are external variables as well, and I guess they technically are, but I’m still not convinced that tests involving threads are integration tests (then again, I might be special in that regard).
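
To illustrate the point about time, here is a small sketch in Python, with the standard library `datetime` playing the role of DateTime. The `is_expired` function and the one-hour lifetime are made up for the example. The assumption is that the product code accepts “now” as a parameter, so the test can hand it a fixed instant.

```python
from datetime import datetime, timedelta, timezone


def is_expired(issued_at, now, lifetime=timedelta(hours=1)):
    """Hypothetical product code: 'now' is passed in, not read from the clock."""
    return now - issued_at >= lifetime


def test_token_must_expire_one_hour_after_it_was_issued():
    # Fixed instants: this test checks exactly the same data on every run.
    issued_at = datetime(2011, 11, 2, 12, 0, tzinfo=timezone.utc)
    one_hour_later = datetime(2011, 11, 2, 13, 0, tzinfo=timezone.utc)
    assert is_expired(issued_at, now=one_hour_later)


def test_token_must_not_expire_before_one_hour_has_passed():
    issued_at = datetime(2011, 11, 2, 12, 0, tzinfo=timezone.utc)
    just_before = datetime(2011, 11, 2, 12, 59, 59, tzinfo=timezone.utc)
    assert not is_expired(issued_at, now=just_before)
```

Had the tests called `datetime.now()` themselves, the boundary case one second before expiry would be practically impossible to hit on purpose.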

Maintainability

A maintainable test is one that does not easily break when you maintain the code around it. Loose coupling is probably the single biggest contributor to maintainability. Factory methods decouple you from constructors, which tend to have their parameter lists changed more often than other methods. Only test a unit through its public API, and “protected” is effectively public. When you do this, you tend to be more decoupled from the implementation details of that unit. Abstractions can still leak, though, but this is a design challenge that you should tackle in your product code.
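
Here is a rough sketch of a test-side factory method in Python. The `Customer` class, its fields and the `a_customer` helper are all invented for illustration. The idea is that the factory is the only place in the tests that knows the constructor signature.

```python
class Customer:
    """Hypothetical product class whose constructor tends to grow parameters."""

    def __init__(self, name, email, newsletter_opt_in):
        self.name = name
        self.email = email
        self.newsletter_opt_in = newsletter_opt_in

    def display_name(self):
        return f"{self.name} <{self.email}>"


def a_customer(name="Alice", email="alice@example.com", newsletter_opt_in=False):
    """Test-side factory: the only test code that calls the constructor."""
    return Customer(name, email, newsletter_opt_in)


def test_display_name_must_combine_name_and_email():
    # Only the details that matter to this test are spelled out.
    customer = a_customer(name="Bob", email="bob@example.com")
    assert customer.display_name() == "Bob <bob@example.com>"
```

If the constructor gains a parameter later, only `a_customer` has to change, and the tests that do not care about the new parameter stay untouched.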

Avoid putting logic in your test code; things like if-statements, loops and switch-case statements. Where there is logic, there is a potential for bugs, and you don’t want bugs in your tests — especially not long lasting ones. Also avoid magic numbers; you want to be able to tell why a certain value is passed in as a parameter, or why a certain value is returned from a method. Don’t calculate the expected value, because you could end up mirroring the product code, including any bugs it might have. Also don’t share state between tests — they must be runnable in any order — or run a test from within another test. Keep them isolated.
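
A small sketch of the difference between calculating and stating the expected value, in Python. The `price_with_vat` function and the numbers are hypothetical. The first test repeats the product formula and would therefore happily agree with a buggy implementation. The second states named, hand-checked values.

```python
def price_with_vat(net_price_in_cents, vat_percent):
    """Hypothetical product code: add VAT to a net price given in cents."""
    return net_price_in_cents + net_price_in_cents * vat_percent // 100


# Discouraged: the expected value mirrors the product code, bugs included.
def test_vat_mirrors_the_product_code():
    net = 20000
    expected = net + net * 25 // 100
    assert price_with_vat(net, 25) == expected


# Preferred: named constants instead of magic numbers, and an expected
# value that was worked out by hand rather than computed in the test.
NET_PRICE_IN_CENTS = 20000
VAT_PERCENT = 25  # illustrative rate
EXPECTED_GROSS_PRICE_IN_CENTS = 25000


def test_gross_price_must_include_vat():
    gross = price_with_vat(NET_PRICE_IN_CENTS, VAT_PERCENT)
    assert gross == EXPECTED_GROSS_PRICE_IN_CENTS
```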

Giving tests meaningful names is important for maintainability as well as readability. When you can infer from the name what the test is trying to verify, the test code can be checked to see that it is actually testing what it says it is testing. You can make sure that it keeps testing for the same behaviour, even when there are changes to the API it is using. It can also be rewritten if the code is a complete mess.

In following the SOLID principles, or the old virtues of loose coupling and high cohesion, you will typically end up with units that do just one thing. However, what “one thing” is depends on your level of abstraction, and the unit may also have to operate in a number of different scenarios. These factors can complicate the set-up code, and complicated set-up code is a maintainability pain. One way to deal with this is to have multiple test cases, or multiple test fixtures, for a unit, one for each scenario. Then each scenario only needs set-up code that is relevant for that particular scenario, and you end up with less clutter in the set-up code. Since many test bugs are in the set-up code, you will most likely also end up with more trustworthy tests.
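
As a sketch of one fixture per scenario, here is how it could look with Python’s `unittest`. The `Account` class and its rules are hypothetical. Each test case class has a small `setUp` that describes exactly one scenario.

```python
import unittest


class Account:
    """Hypothetical unit under test."""

    def __init__(self, balance=0, frozen=False):
        self.balance = balance
        self.frozen = frozen

    def withdraw(self, amount):
        if self.frozen:
            raise PermissionError("account is frozen")
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class ActiveAccountWithFundsTest(unittest.TestCase):
    def setUp(self):
        # Scenario: an active account holding 100.
        self.account = Account(balance=100)

    def test_withdrawal_must_reduce_the_balance(self):
        self.account.withdraw(30)
        self.assertEqual(self.account.balance, 70)

    def test_overdrawing_must_be_rejected(self):
        with self.assertRaises(ValueError):
            self.account.withdraw(101)


class FrozenAccountTest(unittest.TestCase):
    def setUp(self):
        # Scenario: same balance, but the account is frozen.
        self.account = Account(balance=100, frozen=True)

    def test_withdrawal_must_be_rejected(self):
        with self.assertRaises(PermissionError):
            self.account.withdraw(30)
```

The class names double as scenario descriptions, so a failure report already tells you which situation the unit was in.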

When your tests are readable, then they become easier to maintain. When the tests are maintainable, then chances are that they will actually be maintained. When you know that the tests are maintained, and you can tell what they are testing, then you can trust them to be that safety net they are supposed to be.

Summary

  • Start out by writing many tests and get the hang of the TDD rhythm.
  • Gradually improve test quality: read blogs, books or other people’s code.
  • Stick to it, don’t give up. You will end up a better programmer overall.
  • Test behaviour rather than methods, and only through the public API.
  • Give your tests descriptive names. This is essential for readability.
  • Favour “humid” tests, when you choose where to place set-up code.

And lastly, if you find that some part of your code is particularly difficult to test, then chances are that you are being challenged to come up with a more testable design. Testability often means looser coupling and higher cohesion, so it tends to be a good idea to listen to your tests.

Good luck with it!

Posted October 04, 2011. http://thoughts.karmazilla.net/2011/10/04/cqrs-and-event-sourcin.html

CQRS and Event Sourcing

I was recently at the Agile Architecture Open Space Conference, and Command/Query Responsibility Segregation (CQRS) as an architecture was a big topic at the conference. Mark Seemann introduced the terminology and the principles, and Jeppe Cramon described a practical application in a concrete case.

The idea is that you make an explicit separation of commands that change the state of the system from queries that only read state and never change it. Commands and queries are represented as objects (see the patterns Command, Unit of Work, and Context from DCI) that are sent to the system from a front-end or UI layer, or from an external system. Each command or query represents an intention to do something, and may be synchronously or asynchronously rejected based on validation and other rules. When the system performs a command, it generates one or more events. Commands are often queued before they are executed, partly because this decouples the system from its clients, and partly because this allows multiple back-ends to run concurrently, in turn allowing deployment of new versions without any down-time at all. Queries, on the other hand, are often synchronous and executed against a separate set of nodes dedicated to the task. Commands, queries and events can have version numbers in their serialized formats, making Just-In-Time data migration possible, which in turn makes it easier to have multiple versions of the system running concurrently.
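
To make the vocabulary a bit more concrete, here is a minimal sketch in Python of commands, queries and events as plain objects. All names (`RenameUser`, `UserRenamed`, `GetUserName`, `handle_command`, `handle_query`) are invented for the illustration, and queuing, versioning and persistence are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenameUser:
    """Command: an intention to change state."""
    user_id: str
    new_name: str


@dataclass(frozen=True)
class UserRenamed:
    """Event: a record of something that has happened."""
    user_id: str
    new_name: str


@dataclass(frozen=True)
class GetUserName:
    """Query: reads state and never changes it."""
    user_id: str


class CommandRejected(Exception):
    """Raised when validation refuses a command."""


def handle_command(command):
    # Validate first; a rejected command produces no events.
    if not command.new_name.strip():
        raise CommandRejected("name must not be blank")
    # Performing the command yields one or more events.
    return [UserRenamed(command.user_id, command.new_name)]


def handle_query(query, read_model):
    # Queries only look things up in a read-friendly dataset.
    return read_model.get(query.user_id)
```

Note that `handle_command` returns events rather than mutating anything itself. What happens to those events afterwards is deliberately a separate concern.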

An event is a piece of information about something that has happened. Events are immutable, because they represent the past, and you cannot change the past. Events can be persisted in a database, sent to a number of recipients through some publish-subscribe model, topic or queue, or be handled by generic event handlers. The architecture poses no limits to what can happen to the events. A typical scenario, however, is that the events are used to update a representation of the system state that is optimized for reading. Queries, because they only read state and do not change it, can then operate on the read-friendly dataset. They will have to tolerate the slight lag that exists between the command executions and the updates to the read-friendly dataset, but since most systems use an optimistic concurrency model, they are unlikely to be able to observe this usually small lag anyway.
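
The following Python sketch shows one way immutable events could feed a read-optimized view. The event types and the `OpenOrdersView` class are hypothetical, and a real system would persist the events and update the view asynchronously instead of in a loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an event cannot be changed after the fact
class OrderPlaced:
    order_id: str
    customer: str
    total: int


@dataclass(frozen=True)
class OrderCancelled:
    order_id: str


class OpenOrdersView:
    """Hypothetical read-friendly dataset, updated only by applying events."""

    def __init__(self):
        self._open_orders = {}

    def apply(self, event):
        if isinstance(event, OrderPlaced):
            self._open_orders[event.order_id] = event
        elif isinstance(event, OrderCancelled):
            self._open_orders.pop(event.order_id, None)

    def total_for(self, customer):
        # A query against the view: it reads state and changes nothing.
        return sum(order.total for order in self._open_orders.values()
                   if order.customer == customer)


event_log = [
    OrderPlaced("o1", "alice", 100),
    OrderPlaced("o2", "alice", 50),
    OrderCancelled("o1"),
]

view = OpenOrdersView()
for event in event_log:
    view.apply(event)
```

Because the view is derived purely from the event log, it can be thrown away and rebuilt, or a second view with a different shape can be built from the same events.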

This way, writes and reads are separated and can happen on different databases. They can therefore scale independently, and the acknowledgement of eventual consistency makes the whole system more scalable in general. Another neat benefit is that audit logging and tracing become trivial to implement, simply by persisting the events. A persisted set of events, or a steady stream of events, also makes real-time Business Intelligence not only possible, but potentially easy, since running a report is really just a query against a specialized dataset. This fact alone can be reason enough to build systems on a CQRS architecture. And one more time for people who might not realize it: Real-time Business Intelligence is awesome!

Concrete CQRS Example

Jeppe Cramon of TigerTeam presented, in broad terms, a user administration system that was built on a CQRS architecture. The system was part of a larger installation, where multiple external systems were interested in information about users, and each of them had its own user representation in its own database. A command in this system could be a user updating his password. In the absence of a single sign-on deployment, the system verified the password change and stored the hash of the new password in its database. Then it sent password-update events, containing the new password hash, to a number of event handlers that updated the user databases of the external systems. Adding a new system to the installation, or treating some of them slightly differently from the others, was easy to do because they each had their own event handler.
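
I do not know what the actual implementation looked like, but the flow described above could be sketched roughly like this in Python. Everything here is an assumption made for illustration: the `PasswordUpdated` event, the `EventBus`, the dict-based “databases” and the use of SHA-256.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PasswordUpdated:
    user_id: str
    password_hash: str


class EventBus:
    """Hypothetical in-process publish-subscribe mechanism."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def publish(self, event):
        for handler in self._handlers:
            handler(event)


def make_sync_handler(external_user_db):
    """Build one event handler per external system."""
    def handler(event):
        external_user_db[event.user_id] = event.password_hash
    return handler


def update_password(user_id, new_password, own_db, bus):
    # SHA-256 keeps the sketch short; it is not a recommendation for
    # real password storage, which calls for a salted, slow hash.
    password_hash = hashlib.sha256(new_password.encode("utf-8")).hexdigest()
    own_db[user_id] = password_hash
    # Only the hash travels in the event, never the password itself.
    bus.publish(PasswordUpdated(user_id, password_hash))
```

Adding another external system then amounts to one more `bus.subscribe(make_sync_handler(that_systems_db))` call, and a system that needs special treatment simply gets its own handler function.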

This way, the responsibility of maintaining user information was collected in a single part of a larger whole composed of multiple legacy systems. CQRS is useful for these types of loosely-coupled integrations, but the read-write segregation also presents advantages in enabling scalability, and the immutable nature of events makes interesting things possible in terms of BI. On the other hand, there are cases where CQRS makes less sense. For instance, if you have a front-end that is very sensitive to latency, or if you do not have sufficient control over the database setup and schema to make immutable events and the read-write split possible.

Posted August 26, 2011. http://thoughts.karmazilla.net/2011/08/26/the-tdd-process-pattern---agai.html

The TDD Process Pattern - Again

I have said before that TDD is a process pattern, but I don’t think I was terribly clear on what I meant and how that actually works. To define TDD as a pattern, we need to know what a pattern is. I suppose there are many different interpretations of what “pattern” is supposed to mean, but I draw my definition from the wisdom of Christopher Alexander. According to him, a pattern is a configuration that solves a conflict in a given context; this is the 3 Cs definition. To define TDD, or anything really, as a pattern, we must break it down into these constituent parts and define each. Doing this in the reverse order of the 3 Cs tends to make them easier to read as a whole, though one must be mindful that more than one context can be relevant to a pattern, which shouldn’t be precluded from the start.

The context of TDD is often software development, although it can be any sort of creative act where prolific micro-testing throughout the process is practical and reasonably cheap. It specifically pertains to the creative part, and the people who practice it. That is to say, it is relevant to the software developer, but not to his manager.

The conflict that can arise in this context, and which TDD helps to solve, is a little harder to define. Alexander models the conflict as a field of forces that interact in the given context, and which need to be brought into balance.

One such force is the drive to introduce new features and functionality to the system. This causes changes to happen to the system, changes that introduce complexity and entropy. Another force is our wanting to keep the cost of change low. However, complexity needs to be managed, so it slows us down, and therefore we like to keep the complexity as low as possible. Entropy amplifies the cost of complexity by making it disorderly, and introducing more of the so-called accidental complexity. A third force is our aversion to bugs. We want to keep bugs out of our systems; to not introduce them in new code, nor to break existing code that works.

Finally comes the configuration part, which is simply the definition of the TDD process as we commonly know it:

  1. Write the simplest failing test you can, that validates a desirable behaviour.
  2. Make the simplest change to the system you can, that makes the test pass.
  3. Simplify the system as a whole, without changing its behaviour.

With this, we now have the parts that make up the pattern, and we can piece them together to make the complete pattern. The context part is easy enough, but we might have to explain how the configuration resolves the conflict. We can summarise the forces as wanting to change the system, to introduce features rather than bugs, and wanting the changes to be cheap and easy.

When we write a failing test at the beginning of the TDD cycle, we are in essence making the code demand of us that we introduce a certain new behaviour to the product code. So far so good on introducing new features. Then we leave the tests in the code base and keep running them whenever we make changes to the system. This way we are prevented from breaking existing behaviour that we have already made work. So far so good on keeping bugs out [1].

When we have high test coverage, we have a safety net for changes. We can make changes to the system, confident that a test will tell us if we break something. When we can make changes with this kind of confidence, we can make more changes than what is strictly necessary. In other words, we have headroom to make changes that improve the design of the system without altering its behaviour. This is the refactoring step of TDD. It gives us a designated space for removing any entropy and accidental complexity that might have sprung into existence when we changed the behaviour of the system. We continuously weed our little code base garden, so to speak, and strive to reduce the complexity of the system to its bare essentials. Furthermore, when we continuously test our code, every part of the code at every step of the way, we force it to be testable. Every unit must be testable in isolation, and so it must be possible to isolate each unit. This naturally demands a loosely coupled design. When the parts are properly separated, it also becomes easier for each new piece of functionality to find its proper place in the design, leading to higher cohesion. In essence, we end up with a better design: a testable design.

Thus, the force of wanting to reduce complexity and remove entropy is resolved. While we do often end up with a simpler design, TDD itself does not really drive that part. Rather, the simple design tends to come from the experience of the people who have done TDD for some years. However, if we say that we want a design that is “as simple as can be, but no simpler” then TDD does help us with the “but no simpler” part, by ensuring that our design can provably implement the features we want it to.

With this, we have shown that all of the forces in the context are resolved, to some degree, by the configuration of TDD, and we have made it a pattern proper. The utility of this pattern might be in helping to explain not only what TDD is, but also why people should care. The utility of making it a pattern, the mental exercise of it, is that I had to think a lot about not only the make-up of TDD, but also the make-up of patterns. My own understanding of both TDD and patterns in general has been made clearer, which is only delightful.

I hope you found this blog post useful.


  1. I am well aware that TDD alone is not enough to keep bugs out of the system. However, it is a significant step of the way, and so I would say that this force is resolved here. Also note that patterns never exist in isolation, but as part of a whole where each part interacts with its neighbours in the system.
