TDD - A demonstration

While working on my Lexical Analyser post, I started thinking about the fact that I was going to show some essentially completed code, and related unit tests. In that post, I am repeating a pattern I have used before, but not copy-pasting from any other project. Just glibly producing a fully formed and fully tested class might not be as useful as showing how I arrived at the class and its unit tests.

At this point in time I’ve not written any code, but since this is a pattern I’ve used before and I know roughly what I need, I can pick out one of the core classes and show step by step how it emerges from the unit tests. (I like to think of the code as being implied by the requirements.)

First, let me set the scene. The problem is that I need to turn a text string into a set of tokens, so that they can be parsed later. To do this is need to work my way along the string and identify the tokens it contains one by one. This can be done left-to-right, because there is no need for a later token to be able to change the nature of a token we have already identified.

Here’s a sample string:

(9/3)+11 - 6

We are not going to look deeply into how we identify tokens, but here’s some code I just bashed out:

        public static IEnumerable<Token> ExtractTokens(string input)
        {
            var pos = new StringKeeper(input);
            while (!pos.Finished)
            {
                Token nextToken = null;
                if (TryTakeOperator(pos, out nextToken)
                || TryTakeNumeric(pos, out nextToken)
                || TryTakeParens(pos, out nextToken)
                {
                    yield return nextToken;
                    continue;
                }

                yield return new ErrorToken(pos);
                break;
            }
        }

(That isn’t quite correct, and it didn’t survive unscathed, but it was my first thought.)

The idea is that I will use some as yet imaginary type (StringKeeper) to hold my position in the string, so that I can give it the responsibility of looking at the input text, and then I don’t need to think about it while I concentrate on how to identify tokens.

For example, TryTakeNumeric is going to need to return true if the next thing in the string is a valid number, or false if it isn’t. In order to do that to a string like “9999” it will need to keep taking characters while it’s getting digits. When it reaches the last “9” it needs to check what’s next and decide whether to produce a token, or detect that the next character is not a valid way to end a number. For example, this might be invalid “9999oops”, but this might be fine “9999+” (i.e. the number is 9999 and the “+” is one of a set of characters that can directly follow a number).

Next I bashed that out too:

    private static bool TryTakeNumeric(StringKeeper pos, out Token nextToken)
    {
        var work = new TokenKeeper(pos);
        var number = string.Empty;
        while (!work.Finished && work.NextIn("0123456789."))
        {
            number += work.Take();
            if (!decimal.TryParse(number, out _))
            {
                var errorData = string.Empty;
                while (!work.Finished && !work.WhiteSpaceNext() && !work.NextIn(Terminators))
                {
                    errorData += work.Take();
                }

                if (errorData != string.Empty)
                {
                    //we know that this is a badly formed number, so we should return an error token
                    nextToken = new ErrorToken(number + errorData, "Invalid number");
                    work.Swap(pos);
                    return true;
                }
            }
        }

        if (number == string.Empty)
        {
            //if we didn't extract anything, this isn't a number at all.
            nextToken = null;
            return false;
        }

        nextToken = new NumericToken(number);
        work.Swap(pos);
        return true;
    }

Again, this hasn’t even been compiled yet (and bonus points if you’ve already spotted that it isn’t quite right), but you can probably see the idea. Essentially, if we identify the data as either a number or as a badly formed number, we create a token and return true. In addition we need to “commit” to the data we’ve extracted so that it’s not examined again, like this:

        nextToken = new NumericToken(number);
        work.Swap(pos);
        return true;

The important line here is work.Swap(pos);, where I’m saying “exchange the string position between two instances of StringKeeper so that the analysis that uses pos will continue from where I stopped.” The same process happens when an invalid number is detected. (I prefer this “commit to it” style to a “rollback style” where you reset things when you realise you are wrong, but it’s just a preference.)

This is a good enough thought experiment to build a version of StringKeeper. I need to confess something at this point. I’ve decided that messing with the string at character level was going to be messy, so I’ve invented the concept of StringKeeper sooner than I should have… Strictly speaking I should have started coding the LexicalAnalyser directly and then refactored StringKeeper into existence when I “realised” that it was a seperate responsibility. I don’t feel bad about this because it’s just the result of experience. The way I think of it is that the first version of the code sort of takes place in my head, and if visualising what I need to do makes it clear to me that it’s easier with an additional class, then that’s a good enough reason to go with it. I don’t need to go through the rigmarole of trying the “wrong” version to prove a foregone conclusion.

The first step is to create a unit test for the unborn StringKeeper. Again, I don’t always do this first. Sophisticated IDEs almost encourage you to create classes with a few keystrokes, and I often allow it to generate all of the empty methods mentioned in my code. However, I try not to fill anything out without a unit test to drive it.

My first test looks like this:

    public class TestStringKeeper
    {
        [Fact]
        public void NextInReturnsTrueIfFirstCharInRange()
        {
            //Arrange
            var keeper = new StringKeeper("test");

            //Act
            var result = keeper.NextIn("stu");

            //Assert
            result.Should().BeTrue();
        }

    }

After making everything compile, StringKeeper looks like this:

    internal class StringKeeper
    {
        public bool Finished { get; internal set; }

        public StringKeeper(string input)
        {
        }

        public StringKeeper(StringKeeper other)
        {
        }

        internal string TakeAll()
        {
            throw new NotImplementedException();
        }

        internal void SkipWhiteSpace()
        {
            throw new NotImplementedException();
        }

        internal string Take()
        {
            throw new NotImplementedException();
        }

        internal bool WhiteSpaceNext()
        {
            throw new NotImplementedException();
        }

        internal bool NextIn(string expectedCharSet)
        {
            throw new NotImplementedException();
        }

        internal void Swap(StringKeeper pos)
        {
            throw new NotImplementedException();
        }
    }

dotnet test gives the expected:

Total tests: 1. Passed: 0. Failed: 1. Skipped: 0.

The next step, if we are being pedantic, is to do the bare minimum required to make the test pass, i.e.

        internal bool NextIn(string expectedCharSet)
        {
            return true;
        }

There’s no harm in doing that if you want. Next though, we would need to add another test to force us to implement the method properly. I’m actually much more likely to start with two tests and not go via the naive implementation. I’m absolutely not advocating that we skip any tests. I want the test suite to genuinely prove that my code works, and to defend it against clumsy changes in the future. Should anyone misunderstand what NextIn is supposed to mean and break it, I want tests to fail.

In this specific case, however, I know how to check for a character in a string, and it’s provided by the framework, so one extra test will be enough, so let’s add another test that forces NextIn to do something other than return true:

        [Fact]
        public void NextInReturnsFalseIfFirstCharNotRange()
        {
            //Arrange
            var keeper = new StringKeeper("test");

            //Act
            var result = keeper.NextIn("nope");

            //Assert
            result.Should().BeFalse();
        }

To make these tests pass, we have to make a couple of decisions (and this is where TDD really scores). We need to start storing the input and remembering our place in it. We can get away with something really simple at this stage because if we discover later that it was too simple, whatever we change will still need to pass our first two tests. This is the safety net in action - I no longer need agonise over refactoring something I’ve already tested, because I can repeat the test easily and conveniently whenever I need to.

I added:

        private string _input;
        private int _position;
...
        public char Next { get { return _input[_position]; } }
...
        public StringKeeper(string input)
        {
            _input = input;
            _position = 0;
        }
...
        internal bool NextIn(string expectedCharSet)
        {
            return expectedCharSet.IndexOf(Next) >= 0;
        }

And both tests passed. However, while doing that it was obvious that this will not work if there’s no input. So I added a new test to prove it:

        [Fact]
        public void NextInReturnsFalseIfNoInput()
        {
            //Arrange
            var keeper = new StringKeeper("");

            //Act
            var first = keeper.NextIn("nope");

            //Assert
            first.Should().BeFalse();
        }

Which duly failed with Index was outside the bounds of the array, as expected.

As it happens, we need to be able to tell when we are out of data, and we have a Finished property for that. So we can add a few more tests:

        [Fact]
        public void FinishedIsFalseWhenInputIsNotEmpty()
        {
            //Arrange
            var keeper = new StringKeeper("a");

            //Act/Assert
            keeper.Finished.Should().BeFalse();
        }

        [Fact]
        public void FinishedIsTrueWhenInputIsEmpty()
        {
            //Arrange
            var keeper = new StringKeeper("");

            //Act/Assert
            keeper.Finished.Should().BeTrue();
        }

        [Fact]
        public void FinishedIsTrueWhenInputIsNull()
        {
            //Arrange
            var keeper = new StringKeeper((string)null);

            //Act/Assert
            keeper.Finished.Should().BeTrue();
        }

To make those pass, I need Finished to be correct:

        public bool Finished { get { return _input == null || _position >= _input.Length; } }

And I can check it in NextIn:

        internal bool NextIn(string expectedCharSet)
        {
            return !Finished && expectedCharSet.IndexOf(Next) >= 0;
        }

So that’s 6 passing tests. We haven’t addressed the other implied methods, so let’s get them working. WhiteSpaceNext() makes sense to go next. I’m deliberately avoiding anything that requires us to consume a character because I’ve started with simple “what’s the next thing” methods. There’s no particular reason for this choice, other than it feels more comfortable to finish off the methods that have strong similarities before switching to something that works differently.

        [Theory]
        [InlineData(" ")]
        [InlineData("\t")]
        [InlineData("\r")]
        [InlineData("\f")]
        [InlineData("\n")]
        public void WhiteSpaceNextReturnsTrueWhenItShould(string whitespace)
        {
            //Arrange
            var keeper = new StringKeeper(whitespace);

            //Act/Assert
            keeper.WhiteSpaceNext().Should().BeTrue();
        }

        [Fact]
        public void WhiteSpaceNextReturnsFalseForNonWhitespace()
        {
            //Arrange
            var keeper = new StringKeeper("non whitespace"); //only 'n' will be considered

            //Act/Assert
            keeper.WhiteSpaceNext().Should().BeFalse();
        }

        [Fact]
        public void WhiteSpaceNextReturnsFalseForEmptyInput()
        {
            //Arrange
            var keeper = new StringKeeper(string.Empty);

            //Act/Assert
            keeper.WhiteSpaceNext().Should().BeFalse();
        }

Once again, I’m taking a shortcut that allows me to implement WhiteSpaceNext() rather than taking the pedantic route. I’m only getting away with this because the nature of the method is simpler than NextIn(...). In fact, thinking about it, let’s implement it literally as a call to NextIn(whiteSpaceChars):

        private const string WhiteSpaceChars = " \t\r\f\n";
...
        internal bool WhiteSpaceNext()
        {
            return NextIn(WhiteSpaceChars);
        }

All the tests passed:

Total tests: 13. Passed: 13. Failed: 0. Skipped: 0.

Here I was hit with an uneasy feeling which is the consequence of writing those tests and having them pass first time… To satisfy myself I removed the “\n” from the WhiteSpaceChars constant so that one of the theory tests would fail, which it did. Intellectually I know that I have both positive and negative tests in the test suite and it was not necessary to prove it, but it felt slightly better to check. (This is an interesting side effect of test driven development - you get used to knowing the code you just wrote is correct. Not having that knowledge now makes me uncomfortable, but because I have a simple way to prove it does work, taking literally seconds out to gain the confidence is well worth it.)

        [Fact]
        public void TakeReturnsTheNextCharacter()
        {
            //Arrange
            var keeper = new StringKeeper("test");

            //Act
            var result = keeper.Take();

            //Assert
            result.Should().Be("t");
        }

        [Fact]
        public void TakeMovesThePositionAlong()
        {
            //Arrange
            var keeper = new StringKeeper("test");

            //Act
            keeper.Take();

            //Assert
            keeper.Next.Should().Be('e');
        }

        [Fact]
        public void TakeReturnsNulWhenNoDataRemains()
        {
            //Arrange
            var keeper = new StringKeeper("test");
            keeper.Take(); //t
            keeper.Take(); //e
            keeper.Take(); //s
            keeper.Take(); //t

            //Act
            var result = keeper.Take();

            //Assert
            result.Should().Be("\0");
        }

        [Fact]
        public void FinishedShouldBeTrueWhenNoDataRemains()
        {
            //Arrange
            var keeper = new StringKeeper("test");
            keeper.Take(); //t
            keeper.Take(); //e
            keeper.Take(); //s
            keeper.Take(); //t

            //Act/Assert
            keeper.Finished.Should().BeTrue();
        }

This time, pre-implementation:

Total tests: 17. Passed: 13. Failed: 4. Skipped: 0

And the implementation:

        internal string Take()
        {
            if (Finished)
                return "\0";

            var result = Next.ToString();
            _position++;
            return result;
        }

And:

Total tests: 17. Passed: 17. Failed: 0. Skipped: 0.

That’s nice. Next:

        [Fact]
        public void TakeAllReturnsFullString()
        {
            //Arrange
            var keeper = new StringKeeper("test");

            //Act
            var result = keeper.TakeAll();

            //Assert
            result.Should().Be("test");
        }

        [Fact]
        public void TakeAllReturnsRemainingText()
        {
            //Arrange
            var keeper = new StringKeeper("test");
            keeper.Take();

            //Act
            var result = keeper.TakeAll();

            //Assert
            result.Should().Be("est");
        }

        [Fact]
        public void FinishedShouldBeTrueAfterTakeAll()
        {
            //Arrange
            var keeper = new StringKeeper("test");

            //Act
            keeper.TakeAll();

            //Assert
            keeper.Finished.Should().BeTrue();
        }

And:

Total tests: 20. Passed: 17. Failed: 3. Skipped: 0.

Implementation:

        internal string TakeAll()
        {
            if (Finished)
            {
                return string.Empty;
            }

            var data = _input.Substring(_position);
            _position = _input.Length;
            return data;
        }

And:

Total tests: 20. Passed: 20. Failed: 0. Skipped: 0.

I was actually pleasantly surprised by that, I wasn’t sure whether the Substring call was going to turn out to be off by one. However, test is specific about what it expects, and the code duly returned it so it’s fine. Next:

        [Fact]
        public void SkipWhitespaceSetsPositionOnNextNonWhitespace()
        {
            //Arrange
            var keeper = new StringKeeper(" \t\n\r\ftest");

            //Act
            keeper.SkipWhiteSpace();

            //Assert
            keeper.Next.Should().Be('t');
        }

        [Fact]
        public void SkipWhitespaceCanSkipToTheEnd()
        {
            //Arrange
            var keeper = new StringKeeper("\n \t\f");

            //Act
            keeper.SkipWhiteSpace();

            //Assert
            keeper.Finished.Should().BeTrue();
        }

        [Fact]
        public void SkipWhitespaceWorksIfNoWhitespace()
        {
            //Arrange
            var keeper = new StringKeeper("test");

            //Act
            keeper.SkipWhiteSpace();

            //Assert
            keeper.Next.Should().Be('t');
        }

        [Fact]
        public void SkipWhitespaceWorksIfFinished()
        {
            //Arrange
            var keeper = new StringKeeper("test");
            keeper.TakeAll();

            //Act
            keeper.SkipWhiteSpace();

            //Assert
            keeper.Finished.Should().BeTrue(); //well, still true anyway
        }

        [Fact]
        public void WhitespaceSkippedDoesNotHaveToBeAtTheStart()
        {
            //Arrange
            var keeper = new StringKeeper("X \t\n\r\ftest");
            keeper.Take();

            //Act
            keeper.SkipWhiteSpace();

            //Assert
            keeper.Next.Should().Be('t');
        }

And:

Total tests: 25. Passed: 20. Failed: 5. Skipped: 0.

Implementation:

        internal void SkipWhiteSpace()
        {
            while (!Finished && WhiteSpaceNext())
                Take();
        }

It’s almost too easy…

Total tests: 25. Passed: 25. Failed: 0. Skipped: 0.

Next, copying:

        [Fact]
        public void InstancesCanBeCopied()
        {
            //Arrange
            var keeper = new StringKeeper("test");

            //Act
            var keeper2 = new StringKeeper(keeper);

            //Assert
            keeper2.TakeAll().Should().Be(keeper.TakeAll());
        }

        [Fact]
        public void CopiedInstancesShouldHaveTheSamePositionAsTheOriginal()
        {
            //Arrange
            var keeper = new StringKeeper("test");
            keeper.Take();

            //Act
            var keeper2 = new StringKeeper(keeper);

            //Assert
            keeper2.TakeAll().Should().Be(keeper.TakeAll());
        }

        [Fact]
        public void CopyingAFinishedInstanceShouldProduceAFinishedInstance()
        {
            //Arrange
            var keeper = new StringKeeper("test");
            keeper.TakeAll();

            //Act
            var keeper2 = new StringKeeper(keeper);

            //Assert
            keeper2.Finished.Should().BeTrue();
        }

And:

Total tests: 28. Passed: 26. Failed: 2. Skipped: 0.

An interesting result… one of our new tests is already passing. I think I know why. Currently the copy constructor looks like this:

        public StringKeeper(StringKeeper other)
        {
        }

Naturally, this produces an empty StringKeeper, and Finished will be true by default. So our test that says “copy a Finished one” is passing. Lets run just that one:

Total tests: 1. Passed: 1. Failed: 0. Skipped: 0.

Yep. It’s always best to check an unexpected result.

Implementation:

        public StringKeeper(StringKeeper other)
        {
            _input = other._input;
            _position = other._position;
        }

And:

Total tests: 28. Passed: 28. Failed: 0. Skipped: 0.

Excellent. After you’ve gotten a clean test run, you should think about what you just implemented, and ask yourself a couple of questions:

Is there a possibility that the implementation I just wrote might fail for some reason not in the tests?
Is there behaviour I didn’t explicitely test for?

In this case, I didn’t check that copies and the original were independent. As far as the implementaion is concerned, they should be fully independant, but since independence is a requirement for proper operation I should enforce it in the test suite in case that’s not fully understood in the future. That means more tests:

        [Fact]
        public void CopiedInstancesDoNotAlterTheOriginal()
        {
            //Arrange
            var keeper = new StringKeeper("test");
            var keeper2 = new StringKeeper(keeper);

            //Act
            keeper2.TakeAll();

            //Assert
            keeper.Finished.Should().BeFalse();
        }

        [Fact]
        public void CopiedInstancesDoNotFollowTheOriginal()
        {
            //Arrange
            var keeper = new StringKeeper("test");
            var keeper2 = new StringKeeper(keeper);

            //Act
            keeper.TakeAll();

            //Assert
            keeper2.Finished.Should().BeFalse();
        }

We expect those tests to just pass:

Total tests: 30. Passed: 30. Failed: 0. Skipped: 0.

We are left with just one function that is not implemented. These tests prove it:

        [Fact]
        public void SwapCopiesTheCorrectPositionIntoAnotherInstance()
        {
            //Arrange
            var keeper = new StringKeeper("test");
            var keeper2 = new StringKeeper(keeper);
            keeper.TakeAll();

            //Act
            keeper.Swap(keeper2);

            //Assert
            keeper2.Finished.Should().BeTrue();
        }

        [Fact]
        public void SwapImportsTheDetailsFromTheOtherInstance()
        {
            //Arrange
            var keeper = new StringKeeper("test");
            var keeper2 = new StringKeeper(keeper);
            var takenData = keeper.TakeAll(); //keeper is now finished

            //Act
            keeper.Swap(keeper2);

            //Assert
            var result = keeper.TakeAll(); //if swap did not change keeper, this will return string.Empty
            result.Should().Be(takenData);
        }

And:

Total tests: 32. Passed: 30. Failed: 2. Skipped: 0.

Implementation:

        internal void Swap(StringKeeper other)
        {
            var otherInput = other._input;
            var otherPosition = other._position;

            other._input = _input;
            other._position = _position;

            _input = otherInput;
            _position = otherPosition;
        }

And:

Total tests: 32. Passed: 32. Failed: 0. Skipped: 0.

Finally, we need to think about our implementation, and whether it is as well factored as it could be? Here it is:

    internal class StringKeeper
    {
        private const string WhiteSpaceChars = " \t\r\f\n";

        private string _input;
        private int _position;

        public bool Finished { get { return _input == null || _position >= _input.Length; } }
        public char Next { get { return _input[_position];  } }

        public StringKeeper(string input)
        {
            _input = input;
            _position = 0;
        }

        public StringKeeper(StringKeeper other)
        {
            _input = other._input;
            _position = other._position;
        }

        internal string TakeAll()
        {
            if (Finished)
            {
                return string.Empty;
            }

            var data = _input.Substring(_position);
            _position = _input.Length;
            return data;
        }

        internal void SkipWhiteSpace()
        {
            while (!Finished && WhiteSpaceNext())
                Take();
        }

        internal string Take()
        {
            if (Finished)
                return "\0";

            var result = Next.ToString();
            _position++;
            return result;
        }

        internal bool WhiteSpaceNext()
        {
            return NextIn(WhiteSpaceChars);
        }

        internal bool NextIn(string expectedCharSet)
        {
            return !Finished && expectedCharSet.IndexOf(Next) >= 0;
        }

        internal void Swap(StringKeeper other)
        {
            var otherInput = other._input;
            var otherPosition = other._position;

            other._input = _input;
            other._position = _position;

            _input = otherInput;
            _position = otherPosition;
        }
    }

I think it’s okay. Where possible, methods are implemented in terms of other methods rather than repeating any code. My only reservation is that we may want to match a string (like a keyword) and you’d have to match it one character at a time, so a bool IsNext(string word) could be useful, but since we don’t currently have a requirement for that it would be speculation to add it now (YAGNI, you know it makes sense). We may be back here to add features later, but right now we have a fully functional class which we have a reason to trust (because it passes the tests), even though it’s not been used in anger as yet.

Incidentally, there’s another test that I’ve not been running that failed before I started this post and gave rise to the ExtractTokens method that we started with. It still fails now:

    public class TestLexicalAnalyser
    {
        [Fact]
        public void TokensAreExtracted()
        {
            //Arrange
            var testStrings = new [] {
                "1 + 2 + 3",
                "",
                null
            };

            var output = new Output();

            //Act
            foreach (var testString in testStrings)
            {
                var result = LexicalAnalyser.ExtractTokens(testString);

                output.WrapLine($@"Analysis of ""{testString ?? "NULL"}"":");
                output.FormatTable(result.AsReport(rep => rep
                                                    .AddColumn(r => r.TokenType, cc => cc.LeftAlign())
                                                    .AddColumn(r => r.Text, cc => {}))
                );

                output.WriteLine();
                output.WriteLine();
            }

            //Assert
            output.Report.Verify();
        }
    }

Sometimes tests can be a kind of a task list and it can be useful to leave them broken for a while. Unless you are checking in, of course. That would be bad.

Thanks for reading,

Jamie.

Written on August 8, 2019