Design, Test and Develop like it's heaven on Earth - Part 4

Published on 6/11/2019

We’re now getting into the swing of developing this service, so we carry on with the next verb, Create.

Before we start let’s refresh ourselves with the requirements that we saw in part 3.

Functional requirements

The blog title should be in the url, with all escaped characters like punctuation removed.
The blog title will be displayed as a heading
The blog publish date will be displayed under the heading
A blog article image or lead-in image will be displayed if it exists
The blog content will be displayed

Non-functional requirements

The blog should load in less than 2 seconds
The blog should scale correctly on all popular screen sizes

Red/Green/Blue continued

We start with a unit test as part of the existing service test class.


[TestMethod]
public async Task Create_Verify()
{
    //arrange
    Blog blog = new Blog() { Title = "Hello Test!" };

    //act
    await Service.Create(blog);

    //assert
    _dbAdaptorMock.Verify(x => x.Create(blog));
}

This is a simple test to verify that the service calls the database adaptor with the supplied object. Like before we use the code completion features to create all the methods. Fortunately, because we did that refactor in the previous part we can simply use the Service property which is already correctly initialized. Now we need to pass our test with some code changes.


public async Task Create(Blog blog)
{
    await _dbAdaptor.Create(blog);
}

That one was easy, but we have a bit more to do here. We’ve touched on the title in the URL in part 3. If we are to be able to lookup a blog entry from its title, then we have to persist that title in the database as a key (either as a primary key or on some index). As we’ve seen before, given the simple title from the URL, we have no way to reconstitute the original title. This means that we’ll have to transform the rich title to become a key when we save the blog. Let’s add that to the unit test also.


[TestMethod]
public async Task Create_Verify()
{
    //arrange
    Blog blog = new Blog() { Title = "Hello Test!" };

    //act
    await Service.Create(blog);

    //assert
    _dbAdaptorMock.Verify(x => x.Create(blog));
    Assert.AreEqual("hello_test", blog.Key);
}

Obviously, the Key property doesn’t exist, so we code-complete that. And then we have to write code to assign it after we strip out all punctuation and encoded characters so that our test can pass.


 public async Task Create(Blog blog)
 {
    blog.Key = blog.Title;
    char[] punctuation = new char[] { '`', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '.', ',', '?', '\'', '"', '~', '+', '-', ':', ' '};
    for (int i = 0; i < punctuation.Length; i++)
    {
        blog.Key = blog.Key.Replace(punctuation[i], Char.MinValue);
    }

    await _dbAdaptor.Create(blog);
 }

But our unit test doesn’t pass. We see from the failed assert that Message: Assert.AreEqual failed. Expected:<hello_test>. Actual:<Hello\0Test\0>.

There are three things wrong here. Punctuation is actually replaced by \0 and not simply removed (I learnt that Char.MinValue is not the same as empty or no value while writing this blog). The space was replaced as part of the punctuation, and we still have capital letters. We can address the char issue by switching to strings, and the missing space issue by handling it specifically instead of generally. The framework can do the heavy lifting for us on the casing issue.


public async Task Create(Blog blog)
{
    blog.Key = blog.Title;
    string[] punctuation = new string[] { "`", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", ".", ",", "?", "'", "\"", "~", "+", "-", ":" };
    for (int i = 0; i < punctuation.Length; i++)
    {
        blog.Key = blog.Key.Replace(punctuation[i], string.Empty);
    }
    blog.Key = blog.Key.Replace(" ", "_");
    blog.Key = blog.Key.ToLower();

    await _dbAdaptor.Create(blog);
}

Our unit test now passes, so let’s move to blue to refactor.

Look at that Create method again. It’s purpose is to tell the database to persist a blog entry. It’s secondary purpose is to transform the blog post into something that is meaningful to us for persistence. However it should not also be encumbered with the details of each transformation. The for-loop in that method adds to the conditional complexity of the entire method, and it’s a massive preamble that a reviewer or co-contributor has to work through before they get to the main purpose. To give a specific term to that purpose: it should orchestrate the persisting of the blog entry. Everything else is secondary and probably best detailed elsewhere. So, let’s refactor that bit out.

SOLID: S for Single Responsibility


public async Task Create(Blog blog)
{
    blog.Key = AsUrlTitle(blog.Title);
    await _dbAdaptor.Create(blog);
}

private string AsUrlTitle(string title)
{
    string[] punctuation = new string[] { "`", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", ".", ",", "?", "'", "\"", "~", "+", "-", ":" };
    for (int i = 0; i < punctuation.Length; i++)
    {
        title = title.Replace(punctuation[i], string.Empty);
    }
    title = title.Replace(" ", "_");
    title = title.ToLower();
    return title;
}

You’ll notice that this method works slightly differently from the original code. It doesn’t even know about the type Blog. It is only concerned with strings. Previously I first had to assign the Title value to the Key property, and then transform it in place. Now the private method leverages off of a value type parameter which can directly be manipulated and returned as a result, without affecting where the value came from. Why did I not take in the type Blog here?

Having to transform a rich title to a simple URL title seems like a useful thing to be able to do. It is actually an integral part of this requirement. So I want this piece of code, this method, de-coupled from the type Blog. This means that later I can move it out of the Blog namespace completely and just have it as an extension method perhaps. But whatever emerges or doesn’t emerge in the future of this development, this method is now responsible for only one thing and it has zero dependencies, where-as the Create method now just string along a set of calls in the correct order. It’s one job is to orchestrate. "But hold up" I hear you say. "What about that array of punctuation strings? That should be a constant declaration!" you say.

Clean code over working code

Well, if this was a string manipulation library or component, then sure. It would make sense to have something like that available to the different functionalities inside such a component. But here? I don’t think that littering the top of the BlogService class with the details of known punctuation where it will be completely out of context serves any purpose. Yes, there is a very small performance impact because it’s an additional memory allocation in the method (and we can make that better through ArrayPool also), but it’s marginal at best and totally insignificant when you consider the numbers I have to serve here. And while the effort to move it to a constant is also minimal, at this point it’s cleaner to keep the responsibility and dependencies of the method enclosed with-in the method itself.

To put it another way, I don’t think that the array declaration in this method is where my focus for performance improvements should be now, or even later on. It’s well known in software development that performance tuning happens too early, and happens on only 20% of the code. And while it might be an obvious improvement, it will also muddy the BlogService class and make turning this method to an extension later on more complicated. All for a very small gain that I will barely be able to measure.

But what about some LINQ?

There is another refactor that you can consider here. The for-loop is more than sufficient, but old-school. You can rewrite that using inline (or fluent) declaration and LINQ.


private string AsUrlTitle(string title)
{
    new List() { "`", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", ".", ",", "?", "'", "\"", "~", "+", "-", ":" }
        .ForEach(x => title = title.Replace(x, string.Empty));

    title = title.Replace(" ", "_");
    title = title.ToLower();
    return title;
}

I know a lot of people that prefer this. Personally I’ve got a split opinion. Again, the code is more concise and terse, but difficult to understand and read (I do recognize that it would read much easier if the array was a constant declaration outside of the method though). Also, the implementation detail is abstracted from you. Not everyone understands that there is still a for-loop happening here, it’s just not in your code. Most projects now use LINQ to objects almost everywhere, and I’d say use this code if you and your team is comfortable with it, can effectively read and maintain it, and understand the consequences of using LINQ extensions.

Deeper into the requirements

So we’ve done some refactoring and confirmed that our unit test still passes despite the code changes. This is one of the biggest benefits of unit testing. It is then time to look deeper into the requirements and see how we can fulfill them. The blog has a publish date. The user will probably be able to set this, but what if they don’t? It’s always a good idea to show the date of an article. How many times have you read a technical piece or a tutorial and you only realized it was out of date halfway through? This will require two unit tests.


[TestMethod]
public async Task Create_NullDate_IsSet()
{
    //arrange
    Blog blog = new Blog() { Title = "Hello Test!", PublishDate = null };

    //act
    await Service.Create(blog);

    //assert
    Assert.AreEqual(DateTime.Today, blog.PublishDate);
}

[TestMethod]
public async Task Create_HasDate_NotSet()
{
    //arrange
    DateTime expectedPublishDate = DateTime.Today.AddDays(-4);
    Blog blog = new Blog() { Title = "Hello Test!", PublishDate = expectedPublishDate };

    //act
    await Service.Create(blog);

    //assert
    Assert.AreEqual(expectedPublishDate, blog.PublishDate);
}

We need two because we have two distinct cases to handle. One where the date is null, and one where it isn’t. To pass our tests we need to add some code to the service method.


public async Task Create(Blog blog)
{
    blog.Key = AsUrlTitle(blog.Title);

    if (blog.PublishDate == null)
    {
        blog.PublishDate = DateTime.Today;
    }

    await _dbAdaptor.Create(blog);
}

Looking at this code, do you see the two different branches? One is if the date is null, and we give it a value. But the other?

Branching Code

The other branch is implied, and it is to do nothing when the date is already set. Always when there is no explicit else section, there is an implied else section, and we have to ensure our unit tests cover that one too. We do this to guard for a change in the code that would introduce an else if or explicit else section in the future.

Now that the tests pass again, what can we refactor? An easy one would be to reduce the multi-line if-statement to a single line null-coalescing operator. This is a feature in many languages that makes your code more terse and compact (it is similar to the ternary operator). If you feel that it is not obvious enough what the code does, or that your team isn’t ready to deal with this style of code, then do not use it. Being more explicit or verbose is always better if you’re unsure. For me and for my project, I’m making this change.


public async Task Create(Blog blog)
{
    blog.Key = AsUrlTitle(blog.Title);

    blog.PublishDate = blog.PublishDate ?? DateTime.Today;

    await _dbAdaptor.Create(blog);
}

Now consider, does the service method still have a single responsibility? Does it still only orchestrate persistence? It’s entirely correct to argue that this new line of code (or the if-statement) should be in its own method. Perhaps a Validate method, which will either correct or fill in what’s missing from the blog if it can, or throw an exception when it can’t. And this is exactly what I would do if there were other validations to do, or if I had decided to keep that multi-line if-statement. So before I decide on this refactor and because I changed to a null-coalescing operator, let’s consider if there are any other validations to do here that would constitute a new method. What is the only thing we need to exist in order to fulfill the requirement to save a blog? The title must have a value! Even if there is no content, no publish date or no image, without a title the blog post cannot exist in the database. It is our key, and our URL will result in a 404 Not Found error. Do we need more unit tests for this? Yes, because we have to cover the null case, and the not null case. Let’s do the null case first.


[TestMethod]
public async Task Create_TitleIsNull_Throws()
{
    //arrange
    Blog blog = new Blog();

    //act/assert
    await Assert.ThrowsExceptionAsync(() => Service.Create(blog));
}

You’ll see that here the act and assert sections are combined, because we have to act in order to use the exception assert method. Now we update the Create method to handle this case for us in order to pass our test.


public async Task Create(Blog blog)
{
    blog.Key = AsUrlTitle(blog.Title);

    blog.PublishDate = blog.PublishDate ?? DateTime.Today;

    if (string.IsNullOrEmpty(blog.Title))
    {
        throw new ArgumentNullException(nameof(blog.Title));
    }

    await _dbAdaptor.Create(blog);
}

Our unit test still fails, but with an unexpected error: Message: Assert.ThrowsException failed. Threw exception NullReferenceException, but exception ArgumentNullException was expected.

Validations

What does this mean? ?NullReferenceException is encountered when a reference (to a class, interface, etc) is null and we try to use it. When we follow the stack trace we see that the exception originates in the method where we transform the rich title to a URL title. Specifically, it is trying to call the Replace method on a null instance. This means we need to change the order of our orchestration to first do the validation, and then do the transformation.


public async Task Create(Blog blog)
{
    if (string.IsNullOrEmpty(blog.Title))
    {
        throw new ArgumentNullException(nameof(blog.Title));
    }

    blog.PublishDate = blog.PublishDate ?? DateTime.Today;

    blog.Key = AsUrlTitle(blog.Title);

    await _dbAdaptor.Create(blog);
}

I also moved the date null-coalescing statement because previously we decided that this was also validation.

Concept: Error handling strategy

But why throw an ArgumentNullException? Let’s quickly consider different methods of reporting an error back to the caller, and why.

You can return some sort of object that contains a string detailing the error. Perhaps it also has an error code property.
You can return null, (or -1, or "Undefined", depending).
You don’t do anything and rely at runtime on the NullReferenceException we’re seeing now.
You can throw your own custom exception type, or throw one of the framework provided exceptions.

Certainly all these options are valid, but where all the options are equal, some are more equal than others. You want to provide the caller of your code with all the information they might need in order to help themselves when they encounter an error, so put yourself in the shoes of whoever is going to call your method.

Solution 1: return error model

Under the first solution, the caller have to understand your custom error model, and probably do some error code lookup if there is one. And where would the return data be if it wasn’t giving an error? On a property that is sometimes null and other times not? Perhaps there will have to be a switch statement in your code to determine the error code that can be returned from a larger set? Is this set exhaustive? Is it null when there is no error? Do you have documentation to provide them about this error model? Do you have unit tests that cover every possible permutation of the values of this error model?

Solution 2: return null (-1, “Undefined”)

Also known as error handling by convention. In this second solution all the calls to your code will have to be followed by an if-statement: if (result == null) or (if result == -1). And what happens when some time in the future null or -1 suddenly becomes a possible valid result, when no error occurred? It’s even worse when the return type is bool! True, False and… grey area?. Additionally, all those if-statements come with overhead for the CPU branch prediction engine. It’s also much harder to JIT branching code into compiled code that can run faster. Your caller will always have to incur these costs because they cannot call your code without doing if-statements afterwards (this is also actually the case for the first solution: if error code has value x).

Both these solutions introduce accidental and cognitive complexity and incur a lot of technical debt (more on these concepts in a later post). It also requires a lot of documentation and edge-case and branching unit testing. And not just for you, but also for who-ever is going to call your code.

Throwing

Now consider the last two options. There is overhead with exception handling, no-one denies that. But that overhead is only prevalent when an error actually occurs. As long as you are in the try block, it is irrelevant. This is in stark contrast to the first two solutions, where no matter if an error occurred or not, the if-statement following it will always have to execute.

But here’s the rub. An exception is just a specific type of error model, like the first solution. The base System.Exception class has a message property and a stacktrace property, and the type of the exception is essentially the error code. The difference here is that the .NET designers have already thought about and answered all the questions put at the first solution. The result is that the C# language has a first-class citizen approach to exception handling, and you would be well advised to make use of it. Learn about the different exception types that the framework offers, and throw them in appropriate places. Create your own exception types by extending System.Exception and throw those in appropriate places. And then provide documentation about your exception types.

More generally, make an effort to understand what is the best-practice approach for the language that you use, learn it and understand it in-depth and make use of it, instead of trying to reinvent the framework.

Finally, you can omit a lot of defensive code and rely on the runtime exceptions (like this NullReferenceException from the unit test). Just know that it will make debugging your code and providing support to callers of your code much harder. NullReferenceException and ArgumentNullException have two very different meanings, and it might spill over to your API responding with a 500 Internal Server error as opposed to a 400 Bad Request error.

CopyPasta

Now our test passes. We don’t need to write another test for the not null condition this time. Why? Because I copied the code from the Create_Verify test to create all the other tests. By doing that, all the other tests had a title set in the arrange section. Had I written the date tests from scratch I would not have set the title at all because it wasn’t being tested in those tests, and they would have failed on the same NullReferenceException. But because those tests do set the title, we’re already covering the not null branch multiple times e.g there is no exception being thrown.

There is value in not copying and pasting code. Even your own code. When you copy code you almost always include stuff from the source that you don’t need at the target. This is even more true when you copy code from sources like StackOverflow. Had I written the date tests from scratch, I would have encountered the NullReferenceException early and it would have emerged to me that I was missing additional validation. Instead I had to sit and think about my design and my tests and realize that I was missing the validation for the title. In fact, if I was not doing this write-up of my work, it would probably never have occurred to me at all. Beware the copypasta!

Refactor

Now that we’ve finalized the last of the validation and all our tests pass, let’s reconsider that Validate method mentioned before. The Create method now has quite a few lines of code, and it starts with that if-statement. It’s not complicated to understand, but it looks messy. We can restructure this to be much more explicit about its orchestrating responsibility by creating that separate method for some of these lines.


public async Task Create(Blog blog)
{
    Validate(blog);

    blog.Key = AsUrlTitle(blog.Title);

    await _dbAdaptor.Create(blog);
}

private void Validate(Blog blog)
{
    if (string.IsNullOrEmpty(blog.Title))
    {
        throw new ArgumentNullException(nameof(blog.Title));
    }

    blog.PublishDate = blog.PublishDate ?? DateTime.Today;
}

Conclusion

In this part we fulfilled two requirements when persisting a blog entry, and discussed single responsibility. We built on the infrastructure that we put in place in part 3. We wrote our first algorithm of sorts to cleanup a piece of text, and we could execute that algorithm without having a console or web app, but by using unit testing. We saw how unit testing helps us to immediately test for different conditions or branches in our code, and that a branch is not always obvious to us. We saw how copying code can lead to missed implementation or even errors. We saw how we managed to convince ourselves that we could extract another method as part of refactoring in order to keep our code clean. We also discussed method implementation details like keeping coupling and dependencies to a minimum, not wasting effort on premature optimization and the benefits of supporting exception handling by throwing.

Design never stops. From the highest overview to the lowest line of code, our design today is always important. Every single decision that we made in this part will play a role in the next part, and thereafter, for future-us. But what is also important to understand is that your design today cannot be perfect. The future will always bring changes that will challenge your design, and the only way to insure that you can change effectively and efficiently is with clean code.