2012-08-26

Why should you learn and use TDD in the first place?

I've been a preacher of test-driven development (TDD) since my initiation to test-first development some time around 2001. And as far as programming goes, I feel crippled if I can't plug a feature in by means of TDD. Sorry. Wrong. Correction follows. I am crippled if I can't plug a feature in by means of TDD. For me TDD is primarily a design aid - a way to be able to think about the solution at hand. If I don't have that tool available, due to e.g. too tightly coupled legacy code, my ability to be productive is reduced. If I work with you, I want you to do TDD, because then I know you'll make the same design considerations I am making. And that is the main reason for me to advocate TDD. It's a hygiene factor, where the absence of it feels like typing on a keyboard that lies beneath a mattress.

But what should be your motivation for doing TDD? I don't believe that me saying cheese pretty please on my knees is very convincing in that regard. I remember some debacle from 2007 and onward when Robert C. Martin stated that in his opinion a software developer cannot consider himself a "professional" if he does not do TDD. I'm not really sure that it convinced anyone to do TDD, maybe even the exact opposite. So why do I think you should do TDD? Maybe because it provides better quality products? Or maybe it yields better designs? Because it improves communication? Should you do it to avoid being slapped in the face with a cod by your team-mates who are doing TDD? Does it come out on top after doing a cost-benefit analysis? What should be the important motivational factor here? I might give an answer before I'm done with this blogpost, but first some good old-fashioned arguing.

I'm writing this because recently I read this blogpost by Jacob Proffitt (it trickled up on the frontpage of Hacker News) and it ripped open some old battle-scars on the topic. The blogpost is somewhat old, but still a good read. The topic of the post is a criticism of another blogpost written by Phil Haack and the article "On the Effectiveness of Test-first Approach to Programming" by Erdogmus et al. Proffitt points out several problems with both the conclusions of Haack's blog post and the article he criticizes. And he allows me to highlight yet another problem that can show up in discussions about TDD. Let me quote the following section from Proffitt's blogpost:

More interesting still would be the correlation between "Testing First" and unit test quality. Ask yourself this, which unit tests are going to be better, the ones created before you've implemented the functionality or those created after? Or, considered another way, at which point do you have a better understanding of the problem domain, before implementing it or after? If you're like me, you understand the problem better after implementing it and can thus create more relevant tests after the fact than before. I know better where the important edge-cases are and how to expose them and that could mean that my tests are more robust.
He concludes that your tests will be better if they are written after implementing an artefact. Sounds like a reasonable argument, right? Well. I'm so sorry, and I hate to say this so bluntly, but the argument is baloney. The first question in the paragraph should rather be the following: "Ask yourself this, which unit tests are going to be better, the ones created after writing a lot of tests while implementing the functionality, or those bolted on after just implementing the functionality?" I guess it's obvious what my answer to that question is. The point being - I'm not entirely convinced that the author really gets TDD, and that is a big obstacle in these kinds of discussions. Anyhow. Bad form by me - as mentioned, the post is somewhat old, and the paragraph is taken out of context.

I have been looking for academically published empirical results regarding the merits of TDD for quite some time, and I've managed to dig up some publications around this topic (including the previously mentioned article by Erdogmus et al.) Maximilien and Williams report that, measuring a team at IBM using TDD, the defect rate was reduced by 50% with minimal extra effort, compared to using ad-hoc testing. Later, George and Williams reported higher quality with slightly higher cost (the TDD'ers being 18% more expensive than the control group, and the authors speculate that the cost difference was because the developers in the control group did not write the automated tests they were required to write.) Nagappan, Maximilien, Bhat and Williams studied several teams working at Microsoft, and the core finding was that development cost increased 15-35% (measured in a subjective fashion, details omitted) whilst the pre-release bug density decreased by 40-90% (yes, one of the teams had a pre-release bug density 90% lower). There are some more studies, involving e.g. student groups (with less experience) implementing the same features using either TDD or… that other approach - I don't know what to call it really - plain-old-hacking? But I think studies involving software professionals are more relevant. This is not a survey or meta-study, but the trend seems to be that if developers use TDD, quality is improved, with a small increase in pre-release cost.

As far as I'm concerned, the academic results place the choice between TDD and plain-old-hacking squarely in the management camp. The development cost might be slightly higher,1 but an increase in quality that reduces both uncertainty and time spent in the "stabilization" period between feature-ready and actual release-time should tilt the decision in favor of TDD. Plus, I'm quite confident that we can extrapolate the evidence to mean lower cost during maintenance and future development as well. Therefore, from a managerial and economical perspective, the only time you don't use TDD is if you know you are building a short-lived (e.g. development and maintenance totaling less than a man-year) prototype that you know you are going to throw away (but damnit, the prototype was so successful that you couldn't afford to chuck it… So there you are then, doing plain-old-hacking again, writing legacy code from the start.) But this is opinion, based on others' and my own anecdotal experience. And of course I want TDD to be "better" in some way, so I am definitely susceptible to confirmation bias.

Any such studies are, by their very nature, little more than anecdotes. And the weaknesses of anecdotes are easy for nay-sayers to point out. E.g. we don't know the differences in skill-sets; perhaps the team who did TDD were rockstar programmers and still used more time, whilst the other team were mediocre hackers who could barely run the compiler and still delivered their product quicker than the rockstars. Who knows. Stupid rockstars who used TDD then. TDD nay-sayers will latch on to any such weakness that can be found and never let go. Any (positive) academic result on TDD is bound to fall dead on the ground (I guess confirmation bias works both ways.)

On the other side, empirical evidence about TDD (or any other fab engineering tactic, like for example pair-programming) is also bound to be short-sighted. For practical reasons, the studies don't shed any light on the long-term effects on a codebase evolving over the full lifetime of a library or a product. This puts us back to plain old rhetoric and creation of hot air, breathing out what-if scenario argumentation, where the better and most convincing lawyer wins the truth. We cannot know how TDD influences productivity, quality and costs in a truly positivistic sense. That is, until we can simulate a skilled team of programmers, and then start two simulations at exactly the same time in their lives - one in which they use TDD and the other where they use plain-old-hacking - run the experiments a gazillion times (I don't have the time to brush up on the maths required to figure out how many experiments we need to run before we get a statistically significant result) and compare which one comes out on top cost-wise.

So what am I getting at really? After reading Proffitt's blog-post, I realized that the academic results are pretty much a straw man when it comes to any one person's motivation for doing TDD. The real motivation should be the sentiment that a software developer is a craftsman. TDD, BDD, design patterns, SOLID, pair programming or whatever technique or practice cannot be dismissed without having put enough effort into learning it, practicing it long enough to do it well, and then choosing to do it or not. After putting in the effort, I can make up my own mind. It's an important part of evolving and growing. That's why you should learn how to do TDD. Not because the quality of code becomes better (subjective and hard to measure, but it does, trust me on this), not because it makes you work faster (you probably won't, especially not at the beginning), not because it makes the release-phase more predictable (it does that too), not because it makes you sexy (try "I do test-driven development" as a pick-up line) but - because in the end it expands your skill-set and the way you think about code.

If you don't do TDD, I don't think it says anything about your professionalism or skills as a developer. I do on the other hand believe that it might say something about you as a person. If you are not doing TDD, you are not doing TDD because you don't know how to do it. Because it's a skill, and learning this skill takes effort. Learning TDD takes a lot more effort than just doing the bowling kata a couple of times. Applying it in all kinds of different situations takes practice, practice, practice. And some have a bias towards not making that effort, unless the payoff at the other end is known. That's okay. I can understand that. Others have a bias towards not wanting to feel incompetent while learning a new skill, as plain-old-hacking has always worked in the past. I understand that as well. And then I think you are either one or both of those kinds of persons, or you just haven't heard about TDD yet (unlikely, if you read this… )

So it boils down to these two kids swimming in a lake, where one says "Come over here, the water is warmer." The other kid can't know if that's the case. The only way to find out is to swim over and see if the other kid was telling the truth. You can't understand the feeling you get when you can do a totally safe large refactoring of your codebase, unless you have done it. Or the joy of plugging together the components you wrote using TDD and then having everything just work, without having experienced it. You can't understand how the unittests guide your design choices, without having the experience to understand how the unittests guide your design choices. When you start to get that, you are sucked in. You are test infected. And now, real quick, why isn't this a circular argument! Come over here, the water is warmer, and you'll understand it all so much better when you view it from this angle. I promise.

Footnotes:

1 I'm not convinced there is any added cost if you use TDD with, say, dynamically typed languages like Ruby, Python or Lisp.

2012-08-12

Presenter First with Qt

Somewhat recently a friend of mine sent me an article on the Presenter First pattern - a 'recipe' for building applications using the Model-View-Presenter (MVP) pattern with Test Driven Development (TDD). The article seemed to be tuned towards C# and Java, so I wondered how I would apply Presenter First with Qt and C++. This is a short experiment with implementing a UI with Qt, using the Presenter First pattern.

The idea with the 'Presenter First' pattern is to implement the Presenter first, in a test-driven fashion. In the MVP pattern, the presenter only sees abstract representations of the view and the model, listens to events from them and glues their behaviors together. In the Presenter First formulation from the aforementioned article, model and view are interfaces. Since we are using Qt, we want to utilize the signal and slot mechanism, so both classes will be abstract classes that somehow inherit from QObject. For starters the Model can inherit QObject, and the View can inherit QWidget. Both will be abstract, with only pure virtual functions and slots. Throughout this blog-post I will put the emphasis on the test-code and the Presenter code, and for the most part regard the View and Model code as 'given'. As you will see, the tests we implement turn out to correspond very closely to the use cases we are implementing, and we are able to test all aspects of the behaviour of the application under user interaction.

The working example will be a simple "editor" with two controls: a list-view to list available entities by name on the left, and a text edit-view for editing the content associated with the name on the right. Let's imagine that the content is equations, so we call it an equation editor. When writing this, I wanted to try using a component from the Qt itemviews, because they are intentionally written in a way that merges the View and Controller components of an MVC design. 1 This makes them a less-than-perfect fit with MVP and Presenter First, but we'll see if we can make do nevertheless.

https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj71IdBwxZNY7GK4Mc8eM1Y_NsAyHkf1kqmk2WtYBqPFtKncyoxvNGq08XSXDXJzvd8lQnEK6WAfuyf0lrnrt_C020ctKHjKeil5hsYD1-hdAvaU_mf8IK_Yo6YFgvWPCJNO8MJkfob-YEL/s1600/presenterfirst_ui.jpg

The ui should look something like this. Equation names in the left listview, the actual equation in the textedit on the right.

When interacting with the editor, we would like to implement the following user-stories:

  • When I select the equation named E in the listview, the equation should show in the equation editor
  • When I click 'new equation', a dialogue should show for me to select a name for the new equation, and the new equation should afterwards show in the item-view on the left
  • When I edit the equation-text for an equation X, then select a different equation Y, and then again select the recently edited equation X, the originally written text for equation X should be displayed in the editor
  • When I click on a selected equation name, I should be able to edit it, and only the new name should be used to reference the equation

It won't be an equation editor yet after implementing them, but we'll be off to a good start! It doesn't matter which of these stories we implement first, even though the last story could benefit somewhat from doing the first one in the list first.

Let's dive right in. We start with the add-new-equation story. The first test for the presenter could thus look something like this:

void
PresenterTest::test_addNewEquation(){
  MockEquationView * v = new MockEquationView;
  MockEquationModel * m = new MockEquationModel;

  Presenter p( v, m );
  QString equationName("new equation");
  v->user_createNewEquation( equationName );

  m->verify_newEquationCreated( equationName );
  v->verify_modelChangedReceived( 1 );
}

This reveals the basic MVP design. The Presenter has references to an abstract Model and an abstract View, and listens for updates via some sort of event-model to glue their behaviours together.

http://www.gliffy.com/pubdoc/3791985/L.png

The overall design of Model View Presenter, and presenter first. The presenter only knows about the abstract representations of view and model, and listens to events.

During the development we will implement the abstract classes EquationView and EquationModel, with their mock-objects MockEquationView and MockEquationModel. The 'View' here represents the target UI form, while the 'Model' stores equation-names and equations and ties them together. After we are done, we add ConcreteEquationView and ConcreteEquationModel - the latter developed with the help of TDD.

We drive the test by telling our fake view that the user wants to "create a new equation", and afterwards we verify that a new equation was created in the model and that the view got an update about changes in the model.

To make it easier to see what's going on, I'll prefix all the functions on the fake version of the view that are used to "simulate" user interactions (in order to drive the presenter and the model) with "user_" - so we don't confuse the method-names from the test-double with the methods of the "real" view. Also - I'll roll my own mocks; I will not use a mocking library in these examples (even though I could have benefited from using something like Google Mock.)

How should we implement this behavior? If the presenter listens to a signal from the view that asks for the creation of a new equation, we are in good shape!

Presenter::Presenter(EquationView* v, EquationModel* m)
 : view_(v),
   model_(m)
{
  connect(view_, SIGNAL(createNewEquation()), this, SLOT(addNewEquation()));
}

This makes it obvious where the events come from. It also clarifies a presenter-first idiom: We 'drive' the Presenter through simulated user behavior, created by programmatically interacting with a fake view-class.

Note that the createNewEquation and addNewEquation signal-slot pair has no arguments, but from the user story we are expecting a specific name for the equation to be created - the story even specifies that a dialog should pop up. We can't have dialogues popping up in our automated unittests - there's no user on the other end - so I want to stub out that behavior. There are many ways to achieve this. In the example article that inspired this blog post, they fired an event to show the dialogbox. Instead, in this example I just ask the view to create a new equation name for me through createNewEquationName, and let ConcreteEquationView pop up the dialogue, get the name, and return it from an interface function. For the test-view, I implement createNewEquationName so it returns the name passed with user_createNewEquation().

void
Presenter::addNewEquation()
{
  QString eqname = view_->createNewEquationName(); 
  model_->addNewEquation(eqname);
  view_->modelChanged();
}

The code above should be sufficient to make our test pass. The modelChanged() function can be a pure virtual function or slot on the View object, and in our MockEquationView we just register that it has been called - we don't need to introspect the view in order to know that this test passes.

Below is the state of EquationView, MockEquationView and EquationModel at this point. The verification code in MockEquationModel is similar to that of MockEquationView.

class EquationView : public QObject
{
  Q_OBJECT
public:
  EquationView(QObject * parent = NULL);
  virtual ~EquationView();

  virtual void modelChanged() = 0;
  virtual QString createNewEquationName() const = 0;
signals:
  void createNewEquation();
};

class MockEquationView : public EquationView
{
  Q_OBJECT
public:
  MockEquationView(QObject * parent = NULL)
    : EquationView(parent),
      modelchanged(0)
  { 
  }

  virtual ~MockEquationView()
  {}

  void user_createNewEquation(QString equationname) 
  {
    createNewEquationName_ = equationname;
    emit createNewEquation();
  }

  virtual QString createNewEquationName() const 
  {
    return createNewEquationName_;
  }

  virtual void modelChanged() 
  {
    modelchanged++;
  }

  void verify_modelChangedReceived( int correctupdatecount ){
    QVERIFY(modelchanged == correctupdatecount);
  }

private:
  QString createNewEquationName_;
  int modelchanged;
};

class EquationModel : public QObject
{
  Q_OBJECT
public:
  EquationModel(QObject * parent = NULL);
  virtual ~EquationModel();
  virtual void addNewEquation(QString equationname) = 0;
};

This completes the first story. Next we enable selection of different equations in the listview. To be certain that this story is complete, we need a model with several items, and we need to verify that the correct equation text is set in the text-edit after changing the index.

void
PresenterTest::test_indexChanged()
{
  MockEquationView * v = createView();
  EquationModel * m = new MockEquationModel;
  populateModel(m); // adds three equations. 

  Presenter p ( v, m );

  int indexnr = 1; // indexnr < number of equations defined in Model.
  v->user_changeIndexTo(indexnr);
  v->verify_currentEquationIs(m->getEquation(indexnr));
}

This drives a little bit of design. We expect that, when we change equation in the listview, the view must be told what equation to put in the equation text-box. So it seems that the Presenter must listen to some change-of-index event, be able to fetch the equation for this index from the model, and then pass the correct text from the model to the view. The presenter needs a slot like this:

void
Presenter::changeEquationIndex(int equationidx)
{
  view_->setEquationText(model_->getEquation(equationidx));
}

The view class needs a corresponding signal that we can emit from MockEquationView::user_changeIndexTo(int indexnr) (or from the concrete view when we implement that.) This signal-slot pair needs to be connected in the Presenter constructor.

Presenter::Presenter(EquationView* v, EquationModel* m)
{
  connect(v, SIGNAL(createNewEquation()), this,  SLOT(addNewEquation()));
  connect(v, SIGNAL(equationIndexChanged(int)), this, SLOT(changeEquationIndex(int)));
}
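
For reference, the additions to the view classes could look roughly like the sketch below. The equationIndexChanged signal and setEquationText follow from the story; the mock's helper functions and the string member holding the displayed text are assumptions made for this example.

// Added to the abstract EquationView:
//   signals:  void equationIndexChanged(int index);
//   public:   virtual void setEquationText(QString equation) = 0;

// Added to MockEquationView (assumed helpers):
void user_changeIndexTo(int indexnr)
{
  emit equationIndexChanged(indexnr);   // simulate the user picking another equation
}

virtual void setEquationText(QString equation)
{
  currentEquationText_ = equation;      // remember what the presenter asked us to display
}

void verify_currentEquationIs(QString expected)
{
  QCOMPARE(currentEquationText_, expected);
}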

Our fake view can then verify that it got the correct equation text. Our test passes. Now we know that the View has been told to display the correct data both in the listview and the equation editor. The story is complete.

The third story, and the third test, is about editing the equation. When we edit the equation text in the text-editor on the right in the view, the text should be sent to the model.

void
PresenterTest::test_editedTextIsSetInModel()
{
  MockEquationView * v = new MockEquationView;
  MockEquationModel * m = new MockEquationModel;
  populateModel(m); // adds a couple of name-text pairs

  QString newText = "My new Equation.";
  int activeIndex = 1;
  v->user_changeIndexTo(activeIndex);
  v->user_editEquationText(newText);

  m->verify_equationText(activeIndex, newText);
}

In the test we simulate that the user chooses an equation through user_changeIndexTo, and then simulate, via user_editEquationText, that the user edits the equation text. Then, in order to verify that our Presenter works as we want it to, we check that the equation-text sent to the model is correct.

In order to make this test pass, we need two things: a signal from the view that tells the presenter that the text has changed, and a corresponding slot on the presenter that updates the equation-text associated with the currently active equation. That means we need to keep track of what the current equation is. The tracking of current state belongs in the presenter.

Presenter::Presenter(EquationView * v, EquationModel * m, QObject * parent) :
  QObject(parent),
  model_(m),
  view_(v),
  currentEquationIndex(0)
{
  connect(v, SIGNAL(createNewEquation()), this,  SLOT(addNewEquation()));
  connect(v, SIGNAL(equationIndexChanged(int)), this, SLOT(changeEquationIndex(int)));
  connect(v, SIGNAL(equationChanged(QString)), this, SLOT(currentEquationTextChanged(QString)));
}

void
Presenter::changeEquationIndex(int equationidx)
{
  currentEquationIndex = equationidx;
  view_->setEquationText(model_->getEquation(currentEquationIndex));
}

void
Presenter::currentEquationTextChanged(QString equation)
{
  model_->setEquation(currentEquationIndex, equation);
}

We added the connection in the constructor, added currentEquationIndex as state in the presenter, and implemented the currentEquationTextChanged slot. A handful of lines of code, and our test is passing.

Now for the final test: editing the name of an already added equation. So far we've only used signals and slots from Qt, and we have talked about how new equations come to be (and that this requires the View to receive a modelChanged call) and how the text in the equation view is updated. But we have not talked about how the listview with equation names is populated - that has been a detail we could just gloss over so far. Now that the user should be allowed to change the name of an equation in the listview, we need to supply an editable model from Qt's modelview components to the view, and then test that the data stored in it is updated. Granted, we could make this work without using a QAbstractItemModel - we could somehow have kept the names in our model in sync with what's in the view by setting a QStringList of equation names every time the model changes, or by adding and removing names by some other means. But this would increase the amount of logic we would have to add to the concrete View implementation - the code we are trying to keep as minimal as possible. By using a QAbstractItemModel on the itemview and simulating the interaction from the MockView, everything is more or less automated from the ConcreteEquationView's viewpoint.

For our next test then, the view must change the name for an item, and we need to check that the name is updated in the model.

void
PresenterTest::test_equationNameIsChanged()
{
    MockEquationView * v = new MockEquationView;
    MockEquationModel * m = new MockEquationModel;
    QString originalname = "Name0";
    int idx = m->addNewEquation( originalname, "Some text");
    Presenter p(v, m);
    QString newname = "Name1";
    v->user_changeEquationName(idx, newname);

    QCOMPARE( m->getEquationName(idx), newname );
}

The key driving function of the test - user_changeEquationName - simulates the behavior of the listview by first creating the correct QModelIndex for the name and then calling abstractlistmodel->setData(index, data) to update the name stored in the model.
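
A possible implementation of that driving function in the mock view is sketched below, assuming the mock stores the QAbstractListModel it is given (via the setModel call we add further down) in a member I call listmodel_ here:

void MockEquationView::user_changeEquationName(int row, QString newname)
{
  // Simulate the item view committing an edit of the given row:
  QModelIndex index = listmodel_->index(row, 0);
  listmodel_->setData(index, newname, Qt::EditRole);
}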

There are several valid alternatives for how to get a QAbstractListModel from our EquationModel:

  • The Presenter can act as a proxy: inherit from QAbstractListModel and get the necessary data from its EquationModel reference.
  • We could create a concrete EquationListModel that inherits from QAbstractListModel, takes an EquationModel pointer as a constructor argument and forwards data() and setData() calls to it (again representing our model by proxy.)
  • The EquationModel itself could inherit from QAbstractListModel, and be both.

What matters most at this point is that these are the options. There are three. Not four or five. A getter or factory-method on EquationModel that returns a QAbstractItemModel is a jolly bad idea. 2

In order to quickly make the test pass, I chose the last option - EquationModel inherits from QAbstractListModel - then I only need to add a setData function to it, and use it to update the equation name.
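
A minimal setData along those lines might look like the sketch below; the QStringList holding the names is an assumption about how the (mock) model stores its data:

bool MockEquationModel::setData(const QModelIndex & index, const QVariant & value, int role)
{
  if (!index.isValid() || role != Qt::EditRole)
    return false;
  equationNames_[index.row()] = value.toString();  // update the stored name
  emit dataChanged(index, index);                  // notify any attached views
  return true;
}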

The constructor of Presenter now only needs to be updated so that it passes the EquationModel (as a QAbstractListModel) to the EquationView, and after a correct implementation of setData my test passes.

Presenter::Presenter(EquationView* v, EquationModel* m)
{
  connect(v, SIGNAL(createNewEquation()), this,  SLOT(addNewEquation()));
  connect(v, SIGNAL(equationIndexChanged(int)), this, SLOT(changeEquationIndex(int)));
  connect(v, SIGNAL(equationChanged(QString)), this, SLOT(currentEquationTextChanged(QString)));
  v->setModel( (QAbstractListModel*) m ); // cast for emphasis
}

Voila, all user stories implemented test first, without ever having to open an application with a Widget. So far so good.

There are several observations we can make while wrapping up here. Notice first how all the tests "simulate" the user interaction. The driving functions get names that are very similar to the user stories: user_editEquationText is a good example. Also, notice how none of the tests actually call functions on the Presenter, save for the constructor. The tests only call functions on the mocks, and the behavior of the Presenter is then implicitly verified. All Presenter slots and functions can in other words be private. Both of these observations are in line with the experiences reported in the previously mentioned article.

I also want to emphasize how the tests are agnostic to many of our design decisions. Notice for example how test_equationNameIsChanged is totally ignorant of our actual design choice - we can choose any one of the three alternatives proposed for getting a QAbstractItemModel, and the test still stays the same. Presenter First drives you to implement your tests at a comfortable distance from your design, where they support your design decisions without getting in the way when refactoring.

When using Presenter First with Qt, the UI classes in most cases only need to relay signals from the relevant ui-components to the Presenter via the view's own signals, and implement the slots that set the data to display in the correct ui components. A test of the GUI simply needs to verify that it is hooked up - e.g. that pressing a button makes the view emit the correct signal - as the test-driver for the presenter tests the actual functionality behind it.
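
As an illustration of how thin the concrete view can stay, its constructor might do little more than relaying, roughly like this sketch (the widget member names and the private onTextEdited slot are assumptions, not taken from the full example):

ConcreteEquationView::ConcreteEquationView(QWidget * parent)
  : EquationView(parent)
{
  // ... create newEquationButton_, equationListView_ and equationTextEdit_, lay them out ...

  // Relay the button click directly as the view's own signal:
  connect(newEquationButton_, SIGNAL(clicked()), this, SIGNAL(createNewEquation()));

  // QTextEdit::textChanged() carries no argument, so a small private slot reads
  // the text and emits equationChanged(QString) on behalf of the view:
  connect(equationTextEdit_, SIGNAL(textChanged()), this, SLOT(onTextEdited()));
}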

There is a full example with a view implementation in addition to the tests at bitbucket. It's not an equation editor yet, but as an example of Presenter First with Qt, it suffices. Thank you for reading!

Footnotes:

1 This is, btw, the design choice in Qt I have the biggest beef with, as it lures the unwary developer into adding tons of behavior to view classes. The result is often classes that mix responsibilities and are difficult to test.

2 I also might mention here that I usually just avoid the QTreeWidget by default. In order to avoid the duplication of data and the effort needed to keep the QTreeWidgetItems in sync with the underlying datamodel, I like to implement a QAbstractItemModel around my data instead.

2012-05-30

Testable C++ and getters in interfaces

Plenty has been written about the evils of getters and setters in objects, with the simplest - and most extreme - version perhaps being "don't use getters in your classes, period." The reasons are often on a quite abstract level - "getters and setters promote coupling" (what's coupling again?), "an interface must only define behavior and not state" (eh… why's that?), "getters break encapsulation" (encapsuh… muh?) One can refer to some code-smell we haven't read about - like getters being an indication of feature-envy - and then top it off with simplistic catch-phrases like "tell, don't ask" - a shorthand for the Law of Demeter for functions (Law? We have laws now? I just want to program!). Mockery aside, I of course agree with all of them. Writing getter-less code is often a very sensible strategy - in the same sense that breathing is often a very sensible strategy, even though there are times you actually would like to hold your breath1. And in my opinion the worst kind of getter is an abstract getter - or a getter in an interface.

I really like the concept of interfaces. 2 Inserted at the right places they can, like a hot knife through butter, decouple and/or encapsulate large pieces of otherwise strongly coupled, unmaintainable legacy code. Used well, interfaces promote the elusive, valuable and sought-after property we might call "testability" - the property code has when it is easy to write a (unit-) test for new functionality that also utilizes existing parts of the code (e.g. the usual stuff one does while developing applications and libraries.)3 Used wrongly, interfaces kill enough "testability" to ruin my working-day in a second. And take "ruin my working-day" to mean "incurring unnecessary extra cost on the project."

Let's just get a couple of things off the table right away. First of all, I'll just assert without further argument that testability is a property that we both want and need, and that it is cost efficient to ensure that we have it. Second, testability is brittle - there are probably as many ways to break it as there exist software developers. Third, in order to be testable our code must be SOLID. In other words - testability is something that doesn't come naturally, and it is something that we, as software developers, need to be a little bit obsessed about. I've usually worked with C++, and I prefer to do my job doing test-driven development 4. In order to be efficient at doing so, the codebase must be testable. The goal of this post is just to elaborate a little bit on how getters in interfaces totally kill testability (and my TDD-zen.)

Consider the following artificial source, purpose-built to demonstrate what happens when interfaces define getters. The system-under-test (SUT) is the class NewFeature. This is the object that has behavior I need to test. The method I need to test is of course behaviourToTest, which needs an instance of type Foo as an argument. Foo is an interface, and Foo exposes some other types and interfaces through getters (put there to exemplify different situations arising from getters in interfaces.) Foo, Bar and Baz are written and maintained by "someone else".

class Foo;

class NewFeature {
public:
  NewFeature();
  void behaviourToTest(Foo & the_Foo);
};


class Bar;
class Baz;

typedef std::shared_ptr<Bar> BarPtr_t;
typedef std::shared_ptr<Baz> BazPtr_t;
typedef std::vector<BazPtr_t> BazPtrVector_t;

class Foo {
public:
  virtual ~Foo(){}
  virtual size_t getCurrentNumber() const = 0;
  virtual BarPtr_t createBar() const = 0;
  virtual const BazPtrVector_t & getBazVector() const = 0;
  virtual void nonConstBehaviour() = 0;
};

class Bar {
public:
  virtual ~Bar(){}
  virtual void keepBarIng() = 0;
};


class Baz {
public:
  virtual ~Baz() {}
  virtual void identifyYourBearings() = 0;
};

Let's just agree that as far as we can see, this code is SOLID. Or, okay - whether it follows the single responsibility principle or not is hard to tell, as these are all just junk names (and if I squint I might say that there are indications that implementations of Foo are supposed to do a lot more than one thing.) We don't know much about Liskov's substitution principle either, as being compliant is in many ways up to those implementing these interfaces - but let's assume all implementations of Foo are compliant. Open-Closed, Dependency Inversion and Interface Segregation all look catered for.

So… how does this SOLID code manage to break my TDD-zen? Let's begin by writing a test for one aspect of the SUT…

void
TestNewFeature::test_thatTheFeatureBehaves()
{
  NewFeature sut;
  sut.behaviourToTest(
  // Doh. I need a concrete Foo. And thus I need a concrete Bar. And I'm
  // also likely to need a bunch of concrete Baz-es, in a vector of shared
  // pointers that is. Who the h#%% wrote this $#|+?
}

And behold - before long, I need to roll three test-doubles (or mocks) of my interfaces, and the stuff returned by their getters must make sense to my code. Further, the getters impose quite strong requirements on how these stubs or mocks must be implemented.

The first method, Foo::getCurrentNumber(), is probably the easiest to implement. It seems like it reports on some internal state of Foo, and that it will have some effect on the behavior we are implementing in NewFeature. So at the very least, the number returned must be meaningful (the application context will provide the context for figuring out what 'meaningful' is.) It is also likely that such a getter in a real interface forces us to implement and keep track of some meaningful state within the stub - complicating the implementation of the test-code (and increasing the possibility that the stub itself contains a bug.) All in all, in addition to increasing the time spent writing the test, this getter makes the test harder to maintain, as "more code must be right" for the test(s) to be right.

Foo::createBar() is obviously a factory method. Hence, it can be allowed to return a new instance of Bar every time. That is actually quite okay - "create" is a behavior as good as any. But I still need to find or write a concrete Bar class, and this created object perhaps needs to track the number of calls to keepBarIng() in order to verify the behavior of NewFeature. The work needed just to write this one simple test is increasing, and the likelihood that Average Joe Developer actually will write it is shrinking dramatically with this added amount of work and hassle.

Then, the last getter is Foo::getBazVector(). This is the worst kind of getter. This is the kind of getter that makes me go to Hell for use and abuse of profanities. Because it returns not only a vector of pointers, but a const reference to a vector of pointers. In other words, in order to get my stub up and running, it must have a field to hold that vector of Baz pointers. Actually, all classes that inherit from Foo require a field to hold that vector of Baz pointers… (The getter breaks encapsulation, because it reveals details of the implementation, and it actually enforces duplication across implementations.) Not to mention the work I need to perform to roll meaningful concrete Baz types and populate the test vector with them. The likelihood that this test ever gets written is plummeting. Like a sack of tomatoes dropped out of a low-flying aeroplane as a step towards making the world's biggest Greek salad 5.
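
Just to make the scaffolding burden concrete, here is roughly the minimum set of test doubles those getters force into existence - a sketch, where every name apart from the interfaces themselves is invented for the occasion:

class FakeBar : public Bar {
public:
  FakeBar() : keepBarIngCalls(0) {}
  virtual void keepBarIng() { ++keepBarIngCalls; }
  int keepBarIngCalls;             // so the test can verify how NewFeature used its Bar
};

class FakeFoo : public Foo {
public:
  FakeFoo() : currentNumber(0) {}
  virtual size_t getCurrentNumber() const { return currentNumber; }
  virtual BarPtr_t createBar() const { return BarPtr_t(new FakeBar); }
  virtual const BazPtrVector_t & getBazVector() const { return bazes; }
  virtual void nonConstBehaviour() {}

  size_t currentNumber;            // state the test must keep meaningful
  BazPtrVector_t bazes;            // the field every Foo implementation is forced to own
};

And on top of this, a FakeBaz with a no-op identifyYourBearings() is needed before that vector can be populated with anything meaningful.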

Finally, the worst part. NewFeature::behaviourToTest should function correctly no matter what implementation of Foo it gets. And it is not in the place of NewFeature to define what the correct behavior of Foo and friends is - like for example asserting or throwing exceptions if Foo does something NewFeature doesn't expect. This is a touch of the Liskov Substitution Principle: Foo's behavior is defined by the already more or less established contract for correct Foo-behavior, and NewFeature must roll with that - the only 'change' to the established contract of correct Foo-ing NewFeature can perform is accepting a wider range of behaviors. And - what can happen behind the scenes in implementations of the Foo interface? Take the methods Foo::nonConstBehaviour() and Baz::identifyYourBearings() - NewFeature can use them (and they might be the main reason for being passed Foo in the first place), and through internal wiring in Foo, calling those methods might end up changing the internal state of the Foo implementation. As this state leaks out via the getters, NewFeature must (surprisingly) anticipate such changes. Disaster ensues! Both methods can for example potentially change what Foo::getCurrentNumber() and Foo::getBazVector() would return. Even worse, so can Bar::keepBarIng() as far as we can tell. And even though Foo::createBar() looks like a regular factory method, it would come as an unwelcome surprise if the created object actually could change the internal state of the Foo object. All these side-effects are stupid6, but possible7. How the classes are allowed to interact in a 'correct' implementation of Foo when Foo has getters is at best muddy. Some of these cases are likely to occur, and then they actually need to be tested (if we want to be certain NewFeature behaves correctly) - adding to the number of tests that need to be written. All of a sudden, testing NewFeature::behaviourToTest has turned into an ambiguous, time-consuming nightmare.

And this is then the point of this blogpost: While interfaces in C++ are a honking great idea and in general promote testability, getters in interfaces require clients to write more scaffolding, more test-code and more tests in the face of ambiguous behavior (caused by the getters!). This makes it harder to write tests, thus reducing testability. It makes it more likely that code using these interfaces is not covered by unittests.

So - TDD-Zen broken and all… What to do, if I end up in some situation like this? With a handful of getter-ridden, coupled interfaces provided and maintained by others? Awh… then I might just think "craps", scrap my test, cram my stuff into NewFeature and go home to a sleepless night contemplating the havoc that might be caused by NewFeature not being tested in a proper fashion. I might of course also actually write the stubs and mocks I need, or find some that someone else wrote, and then test NewFeature. But I would still be annoyed enough with both the unnecessary cost incurred while implementing the (scaffolding needed for the) test and the problems (e.g. future costs) that will arise from maintenance of the SUT when I'm long gone from the project, that I would write a blogpost about it.

Footnotes:

1 So there's no need to tell me that you need a getter from time to time. I know.

2 Java and C# come with interfaces defined as part of the language, in C++ this means classes with only pure virtual methods.

3 My "definition" of testability implies code that have low coupling.

4 Or Test Driven Design if you so prefer

5 Did I talk about the likelihood, or the plummeting?

6 Stupid, in software development jargon, means unmaintainable, bug-ridden and unnecessarily complex.

7 Possible, in software development jargon, means "it occurs somewhere in your current codebase, and YOU made it happen."

2012-02-19

Discovering Design

1 A simple editor

In this post, I'll try to demonstrate how good and reusable designs and components sort of pop up - or emerge - almost all by themselves, as long as you pay attention to some very simple rules - the system for simple design described by various eXtreme Programming practitioners, and communicated by Kent Beck in his book Extreme Programming Explained as follows:

In prioritized order, your code always:

  1. Runs all the tests
  2. Reveals all the intention
  3. Has no duplication
  4. Contains the fewest number of classes or methods

In this post, most (if not all) programming is "rear mirror driving", meaning that I intentionally avoid thinking too much about the future. Instead I deliberately try to keep only three things in my head (leaving four of my seven memory banks for the programming part.) The first thing is the simple rules, the second is the current test/requirement, and the third is the current state of the code (you can be a fusspot and point out that this is a lot more than three things. I'll let that pass.)

I'm not claiming that rear-view-only is a brilliant strategy for writing software. But it's a good way to program in order to make the point I'll be making in this post - namely that these simple rules foster good designs, and that reusable components (sometimes surprisingly) tend to just show up in the source, without much forethought, when adhering to them. Keeping my attention on the simple design-rules makes the source code itself talk, and it describes how it wants to be in order to elegantly assimilate the features I'm adding. Putting (accurate) forethought into the mix might of course improve efficiency - but no forethought today.

Also, as a side-note I might mention that while working with this I discovered that this blogpost is just an attempt at a practical demonstration of what Martin Fowler wrote about in his excellent keynote from 2004.

The programming task we are embarking on is to write a simple text editor with undo and redo capabilities. A short list of requirements are known up front and written in a working document I'll refer to as my todo-list (given below.)

- Add characters to a "document"
- Delete characters from the "document"
- After performing any edit, the user should be able to undo the
  last edit.
- The user also should have the ability to regret undoing, by
  performing a "redo".
- The capability to perform copy and paste will be very nice to
  have.

The code in this post will be written in D (because I just spent a little time learning the basics - be warned, I might get some D-idioms totally wrong. Feel free to let me know if I do.)

2 Starting with baby-steps

The content of my file initially looks like the following - the minimum I need, to have something that compiles in to an executable and to which I can add unittests.

unittest{
}
int main(){
  return 0;
}

This is one of the many nice features of D, btw. It has built in unittests. The code in the unittest-sections is compiled in and executed before the main function, if the program is compiled with the -unittest switch. A language that takes unittests seriously at its core. Nice. I can relate to that.

Being a practicing TDDer, I'll start with my first simple test - testing that an empty document is empty. The following does of course not compile:

unittest {
  Editor ed = new Editor;
  assert(ed.size() == 0);
}


int main(){
  return 0;
}

By "faking it 'til I make it" I add the base scaffolding and a null-op implementation that will make my test pass. I'll toss in a run returning 1 just to se the test fail - it does, with a nice display of the current stack. After adding the necessary code, my file looks like this:

class Editor { 
  size_t size() {
    return 0;
  }
}

unittest {
  Editor ed = new Editor;
  assert(ed.size() == 0);
}


int main(){
  return 0;
}

This first section just establishes the basic familiar TDD working pattern:

  1. Write a failing test (failing compilation counts as a failing test)
  2. Make it run (possibly with a minimum effort or by faking it)
  3. Refactor in the green.
  4. Lather, rinse, repeat

I'm writing and programming as I go, trying to mimic the style of Kent Beck's book "Test Driven Development". Also the case-studies in Robert C. Martin's "Clean Code" has been an inspiration.

3 Writing to the document

I'll write the first real test: add some text to the document, and check that the size is identical to the size of the added text. The test fails. I then concatenate any edits to the internal document in the write-function, and the test passes. Fail, pass, clean, fail, pass, clean… The testrunner yields no output - and no news is good news with our current testing framework. The code looks like the following at this point, with a test that the size of the current document matches what we added to it.

class Editor { 
  size_t size() {
    return document.length;
  }
  void write(string edit){
    document ~= edit;
  }

  private string document;
}

// Editor is empty after initialization
unittest {
  Editor ed = new Editor;
  assert(ed.size() == 0);
}

// Editor contains as many characters added, after a write.
unittest {
  Editor ed = new Editor;
  string text = "some text we're adding";
  ed.write(text);
  assert(ed.size() == text.length);
}

At this point we have a simple, but not very useful, document editor that allows us to add a string to an internal document and check how much we've added to it. The unittests are separated into two commented blocks, in order to run with separate states - avoiding leakage of state between tests.

4 Revealing test intentions

Looking in the rear-view mirror, I already notice two things. First - the comments I've added to describe what's happening in the unittest blocks. Rule 2: the code reveals all the intention. Don't the comments reveal the intent? No - the comments are not code. They are prose. The code itself should reveal the intent. The comments could instead be names for functions on an object - let's call it a "fixture" (Ok, I didn't invent that just now. Sorry, contaminated state) - then the code would reveal the exact intent.

My first step would thus just be to move the tests as-is into a class, and call the functions from the unittest-blocks as follows:

class EditorTest{
  void test_ed_is_empty_after_initialization(){
    Editor ed = new Editor;
    assert(ed.size() == 0);
  }
}

unittest {
  EditorTest fix = new EditorTest;
  fix.test_ed_is_empty_after_initialization();
}

It is equivalent to the old code in functionality, and the name of the function clearly expresses its intent. I run the tests, all green (still no output), and follow up with an equivalent edit for the other unittest-block.

5 Adding undo to the editor

At this point I'm quite happy with having fixed the expressed intent, so I would like to put some more production code in there, postponing dealing with the duplicated initialization that has now cropped up in the file.

Adding undo seems like a natural choice from my todo. I can think of three situations I can undo and test right away - undo on an empty document, undo after one edit, and undo after multiple edits. I add those to my todo-list, and start implementing undo after one edit. Which is equivalent to resetting the state of the document - I should probably put reset on the todo-list as well.

I add a failing test for undo after one edit, fake it so it passes by implementing reset, done, commit.

6 Derailed by rule two and three again

The fixture looks like this now, and as you might observe, I have three lines initializing the Editor and three unittest-blocks executing a named function on our fixture.

class EditorTest{

  void test_ed_is_empty_after_initialization(){
    Editor ed = new Editor;
    assert(ed.size() == 0);
  }

  void test_ed_contains_as_many_characters_added_after_a_write(){
    Editor ed = new Editor;
    string text = "some text we're adding";
    ed.write(text);
    assert(ed.size() == text.length);
  }

  void test_undoing_one_edit_is_equivalent_with_resetting_the_state(){
    Editor ed = new Editor;
    string text = "some text we're adding";
    ed.write(text);
    ed.undo();
    assert(ed.size() == 0);
  }
}

unittest {
  EditorTest fix = new EditorTest;
  fix.test_ed_is_empty_after_initialization();
}

unittest {
  EditorTest fix = new EditorTest;
  fix.test_ed_contains_as_many_characters_added_after_a_write();
}

unittest { 
  EditorTest fix = new EditorTest;
  fix.test_undoing_one_edit_is_equivalent_with_resetting_the_state();
}

My last cleanup fixed the intention. Now I have some duplication I would like to get rid of. From the looks of it, I'll probably do something similar for every test I'll be adding. It seems like a regular pattern of sorts, doesn't it? Create an object, manipulate it, and then verify its state or behaviour (now, wasn't that familiar… )

I also would like to keep the clean state between every function in the test as it is now - this calls for some kind of re-initialization mechanism to set up the test before calling the test method, and maybe something to tear it down afterwards.

The three unittest blocks look identical on an abstract level, so I should be able to find some kind of abstraction or mechanism to merge them into one. In other words, I want the single unittest block to look somewhat like this:

unittest {
 EditorTest fix = new EditorTest;
 fix.runTests();
}

My plan will be to have an array of closures, and I can build that in the constructor of EditorTest for now. Also, following the pattern from xUnit, I add a setUp function in order to re-initialize the fixture.

This is a point where I have to leap - I know all the changes I need to do, but I don't understand at the time of writing how to "ease in to it" in small, stable intermediary steps.

I start by adding the new unittest codeblock as above, with a failing runTests function. It fails with a long stack-trace that looks like the following.

banach:Discovering rolvseehuus$ dmd -unittest editor.d && ./editor 
core.exception.AssertError@editor(26): Assertion failure
----------------
5   editor                              0x0000958d onAssertError + 65
6   editor                              0x000129f2 _d_assertm + 30
7   editor                              0x000025ef void editor.__assert(int) + 27
8   editor                              0x000026d9 void editor.EditorTest.runTests() + 21
9   editor                              0x00002873 void editor.__unittest4() + 39
10  editor                              0x000025d2 void editor.__modtest() + 26
11  editor                              0x00009a01 extern (C) bool core.runtime.runModuleUnitTests().int __foreachbody260(ref object.ModuleInfo*) + 45
12  editor                              0x00004db3 int object.ModuleInfo.opApply(scope int delegate(ref object.ModuleInfo*)) + 79
13  editor                              0x000098f2 runModuleUnitTests + 134
14  editor                              0x00013122 extern (C) int rt.dmain2.main(int, char**).void runAll() + 38
15  editor                              0x00012c99 extern (C) int rt.dmain2.main(int, char**).void tryExec(scope void delegate()) + 29
16  editor                              0x00012c33 main + 179
17  editor                              0x000025a5 start + 53
banach:Discovering rolvseehuus$ 

Now quick, which test failed? I notice that the time I spend figuring out which test is failing increases with the number of unittest blocks I add. This time it's obvious what is failing, as it failed intentionally, but pretty soon I'll start refactoring some model code, a couple of unforeseen tests will fail, and I'll be wasting time trying to figure out which ones. The two seconds spent looking for a familiar function-name in the above stack-trace warrant a reporting mechanism in my test-system. I add the issue to my todo-list, and go back to setting up my list of test-functions in the constructor.

I move the tests into a list of closures one by one, running the tests in between every step. I also add the setUp function. When done, my fixture and the unittest-block look like this:

class EditorTest{

  this(){
    functions = [ delegate(){ test_ed_is_empty_after_initialization();},
                  delegate(){ test_ed_contains_as_many_characters_added_after_a_write();},
                  delegate(){ test_undoing_one_edit_is_equivalent_with_resetting_the_state();} ];
  }

  void setUp(){
    ed = new Editor;
  }

  void runTests(){
    foreach(func; functions){
      setUp();
      func();
    }
  }

  void test_ed_is_empty_after_initialization(){
    assert(ed.size() == 0);
  }

  void test_ed_contains_as_many_characters_added_after_a_write(){
    string text = "some text we're adding";
    ed.write(text);
    assert(ed.size() == text.length);
  }

  void test_undoing_one_edit_is_equivalent_with_resetting_the_state(){
    string text = "some text we're adding";
    ed.write(text);
    ed.undo();
    assert(ed.size() == 0);
  }

  private Editor ed;
  private void delegate()[] functions;
}

unittest {
  EditorTest fix = new EditorTest;
  fix.runTests();
}

I "registered" all known test-functions in the constructor, and created a runTest function that loops through all tests, executing setUp before every test. I've succeeded in eliminating the duplicate initialization line for the editor - rule 3 compliant. Nice.

But darn! This refactoring subtly changed the behavior of the test-system: not all tests are executed if one of them fails (How could that happen? Mhm… I know - now test for that functionality…) I decide to quickly fix this with a try-catch-finally block and some state variables, and do some experiments with failing tests in order to make sure all is hunky-dory. The completed runTests function now looks like the following:

import std.stdio;  // writeln needs this (assumed imported at the top of the file)

class EditorTest{
  // ...
  void runTests(){
    int passedCount = 0;
    int failedCount = 0;

    foreach(func; functions){
      bool passed = true;
      try{
        setUp();
        func();
      }catch(Throwable e){
        passed=false;
      }
      finally{
        passed ? passedCount++ : failedCount++; 
      }
    }
    writeln(" Executed ", passedCount + failedCount, " tests");
    writeln(" Passed ", passedCount);
    writeln(" Failed ", failedCount);
  }
  // ...
}

There - all tests are executed no matter what happens, and I get a report telling me if all or just some succeeded. As you might notice, I still don't get that full report I was asking for some sections ago about exactly which test failed, but now it is quite clear to me how I can add such a report: a mapping between function and function name instead of an array, and a collecting mechanism in runTests to gather the results. I don't need to add this right away though - I can add some production code instead to get some traction.

Was the refactoring a lot of work? It is hard to tell while I'm writing this - I'm constantly being interrupted by my 8-month-old son, while squeezing in some programming and writing every now and then when he is sleeping or playing by himself - but I don't think I have ever spent more than two minutes in "coding mode" thus far in this post (as opposed to "writing mode").

7 Back on track - undo some more

The next test I want to add is for undoing the last edit after multiple edits. I write a test that iterates through a list of strings and adds them to the editor while concatenating them. Finally I write a string to the editor and undo that edit. The editor's internal document and the one generated while executing the test should now be identical:

void test_undo_after_multiple_edits(){
  string[] edits = ["one", "two", "three"];
  string document;
  foreach(edit; edits){
    document ~= edit;
    ed.write(edit);
  }
  ed.write("lasttoberemoved");
  ed.undo();

  assert(ed.getDocument() == document);
}

I compile and execute and - doh - everything passes, and it still looks like only three tests were executed… Didn't I just add the fourth?

banach:Discovering rolvseehuus$ dmd -unittest editor.d && ./editor 
 Executed 3 tests
 Passed 3
 Failed 0

Of course! I needed to add the function in the constructor. That short-term memory… Once fixed, the test fails as expected:

banach:Discovering rolvseehuus$ dmd -unittest editor.d && ./editor 
 Executed 4 tests
 Passed 3
 Failed 1

This makes it clear to me that having to add each test in two places - both as a function in the fixture and as an entry in the constructor - is a form of duplication that can cause confusion and Zen-breakage. I add it to my todo-list and continue with the current task - making the failing test pass.

With a failing test for undo after several edits, we are finally getting to the core of the matter: how should we go about storing and restoring state across the history of edits? A really good solution would be to do what versioning systems like git and mercurial do - store just the diffs or edits on a stack, and use these to recreate the old document from the current one (a rough sketch of that idea follows right after the code below). It sounds like a lot of work though, so in order to quickly make my test pass, a straightforward solution is to push the entire current document onto the stack, and then make the edit afterwards. Here is the resulting editor:

class Editor {

  size_t size() {
    return document.length;
  }
  void write(string edit){
    history ~= document;
    document ~= edit;
  }
  void undo(){
    document = history.back();
    history.popBack();
  }
  string getDocument(){
    return document;
  }

  private string document;
  private string[] history;
}
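For contrast, here is a rough, hypothetical sketch of the stack-of-edits idea I just waved away. The class name DiffishEditor and all the details are made up for illustration - this is not the code I continue with:

import std.array;

// Hypothetical sketch only - push the edits instead of whole document
// snapshots, and recreate the previous document from the current one on undo.
class DiffishEditor {
  void write(string edit){
    edits ~= edit;                       // remember just what was added
    document ~= edit;
  }
  void undo(){
    if(edits.length > 0){
      // chop the last edit off the end of the current document
      document = document[0 .. $ - edits.back().length];
      edits.popBack();
    }
  }
  string getDocument(){ return document; }

  private string document;
  private string[] edits;                // the stack of "diffs"
}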

While implementing the snapshot version above, an extra test failed - and I didn't know which one. I wasn't so deep into things that I couldn't immediately figure out what was wrong, but the lack of a good report about which tests are failing is starting to get on my nerves. What will happen when I have a hundred tests? I move reporting from the test-runner up a notch on my todo-list. (The reason for the surprising failure was popping the back of an empty array - which prompts me to write a test for calling undo on an empty editor as well.)
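That extra test is simple enough - it ends up in the suite looking like this (you'll see it again in the final listing at the end):

void test_undo_on_empty_does_not_raise_an_exception(){
  try{
    ed.undo();
  }catch(Throwable e){
    assert(false);
  }
}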

But behold. Now I clearly have a useful editor-ish component! I can add things to it, and I can undo edits. Awesome.

8 Naming the tests

What to do next? This might be a good time to show my todo-list so far. It contains all requirements mentioned at the beginning, things I've discovered along the way, and the technical issues that have surfaced.

x Add characters to a "document"
x undo empty
x undo with only one edit
x undo with multiple edits
x After performing any edit, the user should be able to undo the last edit.
- clearing or resetting the editor, and that this can be undone.
- Report from test
- Delete characters from the "document"
- The user also should have the ability to regret undoing, by performing a "redo".
- Duplication w.r.t creating tests.
- The capability to perform a copy and paste will be very nice to have.
- Optimize storage for undo-stack.
- Duplication when adding tests to the sut

The number of tests is increasing, so I choose to go for the reporting mechanism. I could add a name together with the function delegate in the constructor. This adds to the amount of work needed to set up a test, but I decide that knowing which test is failing is worth the extra cost, at least temporarily - I'll get back to this later when resolving the todo about duplication when creating tests. Here are the changes required in order to add a name for each test:

banach:Discovering rolvseehuus$ hg diff editor.d 
diff -r 56b27d29e4be editor.d
--- a/editor.d  Mon Jul 25 12:10:03 2011 +0200
+++ b/editor.d  Mon Jul 25 14:36:02 2011 +0200
@@ -27,11 +27,11 @@
 class EditorTest{

   this(){
-    functions = [ delegate(){ test_ed_is_empty_after_initialization();},
-                 delegate(){ test_ed_contains_as_many_characters_added_after_a_write();},
-                 delegate(){ test_undoing_one_edit_is_equivalent_with_resetting_the_state();},
-                 delegate(){ test_undo_after_multiple_edits();},
-                 delegate(){ test_undo_on_empty_does_not_raise_an_exception();}];
+    functions = [ "test_ed_is_empty_after_initialization" : delegate(){ test_ed_is_empty_after_initialization();},
+                 "test_ed_contains_as_many_characters_added_after_a_write": delegate(){ test_ed_contains_as_many_characters_added_after_a_write();},
+                 "test_undoing_one_edit_is_equivalent_with_resetting_the_state": delegate(){ test_undoing_one_edit_is_equivalent_with_resetting_the_state();},
+                 "test_undo_after_multiple_edits": delegate(){ test_undo_after_multiple_edits();},
+                 "test_undo_on_empty_does_not_raise_an_exception":delegate(){ test_undo_on_empty_does_not_raise_an_exception();}];
   }

   void setUp(){
@@ -42,7 +42,7 @@
     int passedCount = 0;
     int failedCount = 0;

-    foreach(func; functions){
+    foreach(name, func; functions){
       bool passed = true;
       try{
        setUp();
@@ -98,7 +98,7 @@
   }

   private Editor ed;
-  private void delegate()[] functions;
+  private void delegate()[string] functions;
 }

Even though changing to an associative array in this case was incredibly simple, I cringed a little bit. This is a point where I really would have liked to have a test for the runTests function. I add this to the todo-list. Observing a test failing is still a good enough test for me, though.

I implement the reporting mechanism in runTests just by collecting a list of names for tests that fail, and write a report on it afterwards. The new runTests function looks like this:

void runTests(){
  string[] failedTests;
  foreach(name, func; functions){
    bool passed = true;
    try{
      setUp();
      func();
    }catch(Throwable e){
      passed=false;
      failedTests ~= name;
    }
    finally{
      passed ? writef(".") : writef("F");
    }
  }
  writeln("\n\nExecuted ", functions.length, " tests\n");
  if( !failedTests.length ){
    writeln("All tests executed successfully!");
  }else{
    writeln("There were errors!\n");
    foreach(fail; failedTests){
      writeln("Failed test: " ~= fail);
    }
    writeln("");
  }
}

I added some formatting, and the resulting code - with formatting and test naming - is not any bigger than the previous version in terms of lines of code. A successful run looks like the following:

banach:Discovering rolvseehuus$ dmd -unittest editor.d && ./editor 
.....

Executed 5 tests

All tests executed successfully!
banach:Discovering rolvseehuus$ 

Injecting an error looks like this:

banach:Discovering rolvseehuus$ dmd -unittest editor.d && ./editor 
F....

Executed 5 tests

There were errors!

Failed test: test_ed_is_empty_after_initialization

banach:Discovering rolvseehuus$ 

If you think the report looks a bit like it was inspired by Python's test framework, you are indeed right - I liked it a lot when I did Python programming for a living.

9 Redo!

After having working tests, and a testing system that tells me which test fails, if any, I feel confident enough to add the next feature. I choose redo. How should redo behave? Redo should only work after performing an undo (a clarification was needed with the customer, i.e. me.) So a redo right after writing is a no-op. Also, writing something to the document clears the redo-stack. I add two cases to my todo-list: redo after a plain edit does nothing to the document, and redo works after doing several undos (being psychic, I know that I can easily fake redo after one undo, while multiple undos/redos are harder to fake.) I quickly implement the test for no change with redo after an edit - it is almost too easy to write, but I'm guessing it will be a nice hedge later on when I implement the real redo functionality.
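For reference, that almost-too-easy test looks like this at this point (the final listing later condenses the setup into a helper):

void test_redo_without_undos(){
  string[] edits = ["one", "two", "three"];
  string document;
  foreach(edit; edits){
    document ~= edit;
    ed.write(edit);
  }
  ed.redo();
  assert(ed.getDocument() == document);
}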

Implementing the test for multiple undos/redos is trivial: Add several edits, then undo all and redo all. The internal document after redoing should then match the one we had after adding the edits.

void test_redo_with_undos(){
  string[] edits = ["one", "two", "three"];
  string document;
  foreach(edit; edits){
    document ~= edit;
    ed.write(edit);
  }

  foreach(count; 0..edits.length){
    ed.undo();
  }
  foreach(count; 0..edits.length){
    ed.redo();
  }
  assert(ed.getDocument() == document );
}

The test fails predictably. (Phew - I remembered the tedious scaffolding this time.) Pushing the items popped while undoing onto a redo stack, and then pushing them back when redoing, will work nicely. My first obvious implementation makes the test pass, but it also makes the previous test for redo without any undos fail. (I somewhat expected that. Nice.)

It turned out that an exception was raised when I popped the back of an empty array again. There is no way to separate a regular failure from an error like this in the current testing framework… I make a note of that in my todo-list.

The undo/redo functions now look like this (I renamed the history member variable to undoStack, as it pairs up nicely with the redoStack name and communicates the intent.)

void undo(){
  if(undoStack.length > 0){
    redoStack ~= document;
    document = undoStack.back();
    undoStack.popBack();
  }
}
void redo(){
  if(redoStack.length > 0){
    undoStack ~= document;
    document = redoStack.back();
    redoStack.popBack();
  }
}

I notice that these two functions look almost identical, and add undo/redo duplication to my todo-list.
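For what it's worth, here is a minimal sketch of how that duplication might be squeezed out later - a helper that shuffles the document between the two stacks. The name moveBetween is my own invention, and this is not something I actually do in this post:

// Purely hypothetical - not implemented here.
private void moveBetween(ref string[] from, ref string[] to){
  if(from.length > 0){
    to ~= document;
    document = from.back();
    from.popBack();
  }
}
void undo(){ moveBetween(undoStack, redoStack); }
void redo(){ moveBetween(redoStack, undoStack); }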

I also notice some more duplication in my unittests. Both redo tests were cut-and-paste jobs, copied from test_undo_after_multiple_edits. The duplication I'm thinking of can be seen in the following snippet:

 void test_undo_after_multiple_edits(){
   string[] edits = ["one", "two", "three"];
   string document;
   foreach(edit; edits){
     document ~= edit;
     ed.write(edit);
   }
   ...
 void test_redo_without_undos(){
   string[] edits = ["one", "two", "three"];
   string document;
   foreach(edit; edits){
     document ~= edit;
     ed.write(edit);
   }
   ...
void test_redo_with_undos(){
   string[] edits = ["one", "two", "three"];
   string document;
   foreach(edit; edits){
     document ~= edit;
     ed.write(edit);
   }
   ...

I forge ahead and eliminate the duplication by wrapping multiple consecutive edits in one function with a name that communicates its intent, reducing the total line count from 180 to 172:

class EditorTest {
  ...
  string addEditsToEditor(string[] edits){
    string document;
    foreach(edit; edits){
      document ~= edit;
      ed.write(edit);
    }
    return document;
  }
  ...
}

Out of convenience I let it return the expected document after concatenating (that squeamish feeling again… my function does two things, and the "and return what you should expect from it" intention isn't communicated in the name. I hope to get back to that later.)

Reading over my todo-list, I notice the point I wrote about clearing the redo-stack after an edit. I was not 100% sure how the code would behave, but I suspected it wouldn't work (a new display of my very short-lived short-term memory) - and the following test confirmed my suspicion.

void test_edit_after_undoing_reset_redo_state(){
  addEditsToEditor(["one", "two", "three"]);
  ed.undo();
  ed.write("Unrelated");
  auto document = ed.getDocument();
  ed.redo();
  assert(ed.getDocument() == document);
}

Clearing the redo-stack in the write-function fixed it. And that completes the redo-functionality. I feel a surge of accomplishment due to traction!
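The fix is a one-liner in write - clear the redo-stack on every edit. The function ends up like this (it reappears in the full listing at the end):

void write(string edit){
  undoStack ~= document;   // make the edit undoable
  document ~= edit;
  redoStack = [];          // a fresh edit invalidates anything redoable
}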

10 Fixing the painful scaffolding

I really miss the convenience of just adding a function with a name that starts with test in order to add a test to the test case - I keep forgetting to set up the name-function map. Let's have a look at the constructor of my testcase:

class EditorTest {
  ...
  this(){
    functions = [ "test_ed_is_empty_after_initialization" : delegate(){ test_ed_is_empty_after_initialization();},
                  "test_ed_contains_as_many_characters_added_after_a_write": delegate(){ test_ed_contains_as_many_characters_added_after_a_write();},
                  "test_undoing_one_edit_is_equivalent_with_resetting_the_state": delegate(){ test_undoing_one_edit_is_equivalent_with_resetting_the_state();},
                  "test_undo_after_multiple_edits": delegate(){ test_undo_after_multiple_edits();},
                  "test_undo_on_empty_does_not_raise_an_exception":delegate(){ test_undo_on_empty_does_not_raise_an_exception();}, 
                  "test_redo_without_undos":delegate(){test_redo_without_undos();}, 
                  "test_redo_with_undos":delegate(){test_redo_with_undos();}, 
                  "test_edit_after_undoing_reset_redo_state":delegate(){test_edit_after_undoing_reset_redo_state();}];
  }

That bunch of duplicated names and functions clearly gums up the machinery. It also looks like something that could be generated automatically.

In other modern languages, one might at this point reach for a runtime reflection mechanism in order to go through the functions on the testcase object, and then execute every function whose name starts with "test" between a setUp and tearDown pair. Due to the lack of reflection on runtime objects in D, we cannot follow this procedure, but we CAN do something I've blogged about previously: utilize D's very capable code-generation and compile-time reflection mechanisms to build this map for us. Also, the responsibilities of EditorTest have grown - it functions as both test runner and test case implementation - so the name doesn't really reveal what the class is doing. I don't feel like naming it EditorTestAndTestRunner to fix that. It seems I could separate these responsibilities by extracting the test runner from EditorTest.

I start by separating the testcase from the testrunner. Poking at the code a bit, I realized that I could change my constructor to a regular function, and let the TestRunner use it to query which test-functions the case contains. I also realized I needed a TestCase abstraction in order to make the setUp function available to the runner. The resulting code for the TestRunner, together with the relevant parts of my test and the unittest-block itself, looks as follows.

class TestCase {
  public abstract void setUp();
}

class TestRunner {
  this(TestCase tCase){
    testCase = tCase;
  }

  void addTestFunction(string name, void function(Object o) func){
    testFunctions[name] = func;
  }

  void runTests(){
    string[] failedTests;
    foreach(name, func; testFunctions){
      bool passed = true;
      try{
        testCase.setUp();
        func(testCase);
      }catch(Throwable e){
        passed=false;
        failedTests ~= name;
      }
      finally{
        passed ? writef(".") : writef("F");
      }
    }
    writeln("\n\nExecuted ", testFunctions.length, " tests\n");
    if( !failedTests.length ){
      writeln("All tests executed successfully!");
    }else{
      writeln("There were errors!\n");
      foreach(fail; failedTests){
        writeln("Failed test: " ~= fail);
      }
      writeln("");
    }
  }

  private void function(Object)[string] testFunctions;
  private TestCase testCase;
}


class EditorTest : TestCase {

  // This was the constructor, now a function that takes a TestRunner
  // as an argument
  void collectTests(TestRunner tr){
    tr.addTestFunction("test_ed_is_empty_after_initialization", 
                    function(Object o){ (cast(EditorTest)o).test_ed_is_empty_after_initialization();});

    tr.addTestFunction("test_ed_contains_as_many_characters_added_after_a_write", 
                    function(Object o){ (cast(EditorTest)o).test_ed_contains_as_many_characters_added_after_a_write();});

    tr.addTestFunction("test_undoing_one_edit_is_equivalent_with_resetting_the_state", 
                    function(Object o){ (cast(EditorTest)o).test_undoing_one_edit_is_equivalent_with_resetting_the_state();});
    tr.addTestFunction("test_undo_after_multiple_edits",
                    function(Object o){ (cast(EditorTest)o).test_undo_after_multiple_edits();});

    tr.addTestFunction("test_undo_on_empty_does_not_raise_an_exception",
                    function(Object o){ (cast(EditorTest)o).test_undo_on_empty_does_not_raise_an_exception();});
    tr.addTestFunction("test_redo_without_undos",
                    function(Object o){ (cast(EditorTest)o).test_redo_without_undos();});
    tr.addTestFunction("test_redo_with_undos",
                    function(Object o){ (cast(EditorTest)o).test_redo_with_undos();});
    tr.addTestFunction("test_edit_after_undoing_reset_redo_state",
                    function(Object o){ (cast(EditorTest)o).test_edit_after_undoing_reset_redo_state();});
  }
  ...


unittest {
  EditorTest fix = new EditorTest;
  TestRunner runner = new TestRunner(fix);
  fix.collectTests(runner);
  runner.runTests();
}

The testrunner can only run the tests for one test case. This is clearly limiting. I've already mentioned the need for a test of our test-runner. It seems like I can easily add that test now, save for the urge I have to support multiple test-cases in one runner. But the refactoring I need to do in order to support several testcases in the runner needs a test that verifies the runner works. A catch-22 of sorts. In order to "bootstrap" testing of the TestRunner, I add the test to EditorTest and plan to separate the tests once I have support for multiple testcases later on (I add multiple testcases to the todo-list.)

Also, notice how the testcase can lend a hand and allow itself to be added to the TestRunner, since collectTests is defined on EditorTest. I choose to make use of that when adding the test of the TestRunner.

After forcing the test to fail, and implementing it, the test looks like this, with inlined test case and all.

void test_test_runner_calls_test(){
  string callSignature = "callMe called";
  class LoggingTestCase : TestCase {
    void test_callMe(){
      log ~= callSignature;
    }

    void collectTests(TestRunner runner){
      runner.addTestFunction("callMe",
                             function(Object o){(cast(LoggingTestCase)o).test_callMe();});
    }
    public string log;
  }

  LoggingTestCase tcase = new LoggingTestCase;
  TestRunner runner = new TestRunner(tcase);
  tcase.collectTests(runner);
  runner.runTests();
  assert(tcase.log == callSignature);
}

The test passes - I have my self-referential test, phew. But look - the output from my testrunner is suddenly a bit gibberish:

banach:Discovering rolvseehuus$ dmd -unittest editor.d && ./editor 
.........

Executed 1 tests

All tests executed successfully!
.

Executed 9 tests

All tests executed successfully!
banach:Discovering rolvseehuus$ 

The output from the extra testrunner that gets created and executed is folded into the output of the "real" testrunner and clutters the reporting. I would like the testrunner to have some kind of "silent" mode, or maybe a modification point for how it generates its reports. As this is a matter of cosmetics, and considering that I'm very eager to roll the function map at compile time, I add a note about the testrunner output to my todo-list.

11 Automatically generating the test-list

Now then, back to automatically generating the test-functions. How will I test this? I can create a testcase with a test, call the magic function that adds the test-functions, and check that it gets called by the test-runner. This sounds like a small extension of the test of the test-runner, doesn't it? After adding the forward-declared mixin template I'll use to roll the test-function map, it looks like this - a cut-and-paste job from test_test_runner_calls_test.

...
mixin template TestBuilder(T){
  void collectTests(TestRunner tr){
  }
}
...
class EditorTest : TestCase {
  ...
  void test_test_creator_creates_called_test(){
    string callSignature = "callMe called";
    class LoggingTestCase : TestCase {
      mixin TestBuilder!LoggingTestCase;

      void test_callMe(){
        log ~= callSignature;
      }
      public string log;
    }

    LoggingTestCase tcase = new LoggingTestCase;
    TestRunner runner = new TestRunner(tcase);
    tcase.collectTests(runner);
    runner.runTests();
    assert(tcase.log == callSignature);
  }

The empty TestBuilder template is there only to make things compile, and to implement the collectTests function we use to add the tests. As it is empty at the moment, the test fails as expected:

banach:Discovering rolvseehuus$ dmd -unittest editor.d && ./editor 
..

Executed 0 tests

All tests executed successfully!
F.......

Executed 1 tests

All tests executed successfully!
.

Executed 10 tests

There were errors!

Failed test: test_test_creator_creates_called_test

banach:Discovering rolvseehuus$ 

As previously noted, the junk is piling up when I have multiple test runners being executed during the tests - but it is still obvious to me which test is failing, as long as it's only one.

Adapting the code to generate the test-function lookups turns out to be easy enough, and after creating a TestBuilder mixin as previously mentioned, the tests run okay. The code that allows me to automatically register and list the functions (adapted from this post) looks like the following.

class TestEnlister(T, string prefix = "test"){
  static void collectTests(TestRunner tr){
    foreach(idx; 0..testNames.length){
      tr.addTestFunction(testNames[idx], testFunctions[idx]);
    }
  }

  static string buildStaticFunctionWrappers(string[] names){
    string res = "";
    foreach(name; names){
      res ~= "static " ~ name ~ "(Object o){ (cast(T)o)." ~name ~ "();}\n";
    }
    res ~= "static void function(Object)[] testFunctions = [";
    foreach(name; names){
      res ~= "&"~name~",";
    }
    res ~="];";
    return res;
  }

  static string[] enlistTestFunctions(){
    string[] res;
    foreach(fun; __traits(allMembers, T)){
      if(fun.indexOf(prefix) == 0){
        res ~= fun;
      }
    }
    return res;
  }

  mixin(buildStaticFunctionWrappers(enlistTestFunctions()));
  static string[] testNames = enlistTestFunctions();
}

mixin template TestBuilder(T){
  void collectTests(TestRunner tr){
    TestEnlister!(T).collectTests(tr);
  }
}
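To make the code generation a bit more concrete: for a test-case type T containing a single test function test_foo, the string returned by buildStaticFunctionWrappers - and thus the code mixed into TestEnlister!T - amounts to roughly the following (test_foo is of course just a made-up example name, and T resolves to the concrete test-case class):

// Roughly what the mixin expands to for a single test named test_foo:
static test_foo(Object o){ (cast(T)o).test_foo();}
static void function(Object)[] testFunctions = [&test_foo,];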

That means I can simply replace the collectTests function on the original test-case with the TestBuilder mixin, and everything should look like before. I replace 10-15 lines of code with one:

class EditorTest : TestCase {
  mixin TestBuilder!EditorTest;
  ...tests...
}

Now there is one small item I'm not entirely happy with - I need to tag every test-class with the mixin template, which means writing the class name twice. I think I can live with that for the time being, so I don't bother adding it to my todo-list. I do note that test_test_runner_calls_test and test_test_creator_creates_called_test are identical, save for the compile-time generated table, so I remove the superfluous test.

But finally! I've removed the need to add a new obscure line of code every time I define a new test in the TestCase. I'm of course noticing that we are on the brink of having something useful here: a nice little easy-to-use test library. It's being born from paying attention to duplication and to communicating intent. The last todos on my list regarding the test-framework are all it needs to make the final leap over to the realm of usefulness. I add a todo on moving it out into a separate file.

12 Back on track

It's time to get cranking with some production code again - we've been refactoring our test-system for a long time. For reference, here is my current todo-list:

TODO:
x Add characters to a "document"
x undo empty
x undo with only one edit
x undo with multiple edits
x After performing any edit, the user should be able to undo the last edit.
x Redo without undo
x Redo all edits, after undoing all edits.
x Report from test
x Doing something else than a undo, resets redo-state
x The user also should have the ability to regret undoing, by performing a "redo".
x Test for runTests function
x Duplication w.r.t creating tests.
x Triplication! when adding test to the sut
- Multiple testcases in one test-runner.
- clearing or resetting the editor, and that this can be undone.
- Delete characters from the "document"
- The capability to perform a copy and paste will be very nice to have.
- Optimize storage for undo-stack.
- Needs to be able to modify the output from the TestRunner.
- Need to discern between failures and exceptions thrown during
  execution in test system.
- Move test-library to separate unit.

Deleting characters from the document looks like a feature that will bring me the feeling of progress I crave right now, and it might also be usable when creating copy and paste later on. I suspect cut-and-paste should be a consideration too, so we should allow the user to keep the text deleted from the document. Something that looks like this would accommodate these needs:

void test_cutting_a_region_of_text(){
  string prefix = "This is a ";
  string suffix = " text";
  string infix = "DELETEME";

  addEditsToEditor([prefix, infix, suffix]);

  string res = ed.cutRange(prefix.length, infix.length);

  assert(res==infix);
  assert(ed.getDocument() == prefix ~ suffix);
}

In other words, I choose to specify where to begin the cut, and how many characters to cut from the document.

This time, the new (failing, of course) test is automatically picked up and executed by the runner due to the code generation! Oh the joy. I'm feeling a deep sense of closure. During my little celebration I add a todo on handling out-of-bounds ranges when cutting.

I may also note that having two asserts in this test turned out to be a somewhat bad idea. I worked with one assert commented out for an extended period in order to get a grip on what was going on - shame on me! This might as well have been two separate tests - one for the returned result, and one for the resulting document; a sketch follows below. I guess I just got lazy on that one.
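A sketch of what that split might have looked like - the test names here are made up, and the final listing keeps the single two-assert test:

// Hypothetical split of the two-assert test into two one-assert tests.
void test_cutting_a_region_of_text_returns_the_cut_text(){
  addEditsToEditor(["This is a ", "DELETEME", " text"]);
  assert(ed.cutRange("This is a ".length, "DELETEME".length) == "DELETEME");
}

void test_cutting_a_region_of_text_removes_it_from_the_document(){
  addEditsToEditor(["This is a ", "DELETEME", " text"]);
  ed.cutRange("This is a ".length, "DELETEME".length);
  assert(ed.getDocument() == "This is a  text");
}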

After the test run, I realize that cutting should be undoable as well. I add a failing test to demonstrate this behavior:

void test_cutting_a_region_of_text_can_be_undone(){
  string document = "The text Im adding to the editor is this";
  addEditsToEditor([document]);
  ed.cutRange(document.length/4, document.length/2);
  ed.undo();
  assert(ed.getDocument() == document);
}

It's easy to fix the cut function - I just need to push the document onto the undo stack. After the fix it looks like the following - I include the write function for comparison.

class Editor {
  ...
  void write(string edit){
    undoStack ~= document;
    document ~= edit;
    redoStack = [];
  }
  ...
  string cutRange(size_t pos, size_t num){
    undoStack ~= document;
    string sub = document[pos..pos+num];
    document.replaceInPlace(pos, pos+num, "");
    return sub;
  }
  ...

Pushing onto the undo stack seems to be something I want to do every time I edit the string, and the next feature - pasting a copied or cut range - will need the same thing. This leads to some duplication; can I resolve that in some way? I choose not to ponder it too much right now, and note my concerns in the todo-list (annotating the functions with push and pop aspects seems too easy.) Also, I decide that out-of-range handling as done by array slicing suffices (it throws an exception), but I still add a test for it now that tests have become so easy to add. It passes - no surprise really… I can get more specific about the type of exception later on if I need to.

void test_cutting_a_region_of_text_out_of_range_throws(){
  string document = "The text Im adding to the editor is this";
  addEditsToEditor([document]);
  assertThrown!Throwable(ed.cutRange(document.length, 10));
}

13 Cleaning up the test reports

At this point I would like to remove the clutter when running a successful set of tests - that extra output from testing the testrunner. If the TestRunner got a report generator as an argument to its constructor, to which it reported successes and failures, I could replace it with a no-op report generator and get a test-runner without the cluttered output. I add a TestReport interface with the abstract functions reportSuccess, reportFailure and generateReport, and implement a default implementation that reproduces the current output.
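The interface itself doesn't show up in any of the snippets; assuming a plain D interface (rather than an abstract class), it would be no more than:

interface TestReport {
  void reportSuccess(string name);
  void reportFailure(string name);
  void generateReport();
}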

class DefaultTestReport : TestReport {
  void reportSuccess(string name){
    testCount++;
    writef(".");
  }
  void reportFailure(string name){
    testCount++;
    writef("F");
    failedTests ~= name;
  }

  void generateReport(){
    writeln("\n\nExecuted ", testCount, " tests\n");
    if( !failedTests.length ){
      writeln("All tests executed successfully!");
    }else{
      writeln("There were errors!\n");
      foreach(fail; failedTests){
        writeln("Failed test: " ~= fail);
      }
      writeln("");
    }
  }

  private size_t testCount;
  private string[] failedTests;
}

class TestRunner {
  this(TestCase tCase){
    testCase = tCase;
    report = new DefaultTestReport;
  }
  ...
  void runTests(){
    foreach(name, func; testFunctions){
      bool passed = true;
      try{
        testCase.setUp();
        func(testCase);
      }catch(Throwable e){
        passed=false;
      }
      finally{
        passed ? report.reportSuccess(name) : report.reportFailure(name);
      }
    }
    report.generateReport();
  }
  ...

The runTests function became much cleaner, and the output is identical (as confirmed by visual inspection - that's a test too, no need to get fanatical about this.) Now I can add a constructor to the TestRunner that takes a TestReport as an argument. I create a TestReport that just swallows the extra output, and use it in the tests that use TestRunner (both pieces are sketched right after the output below). Here is the new, improved output:

banach:Discovering rolvseehuus$ dmd -unittest editor.d && ./editor 
............

Executed 12 tests

All tests executed successfully!
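For reference, the two pieces involved - the swallow-everything report and the extra constructor - look like this (both reappear in the later listings):

class EmptyTestReport : TestReport {
  void reportSuccess(string name){}
  void reportFailure(string name){}
  void generateReport(){}
}

// ...and in TestRunner:
this(TestReport treport){
  report = treport;
}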

Cleaning up the things I'm staring at all the time gives me a great sense of relief. This has been annoying me for such a long time now…

14 Support for multiple testcases in one TestRunner

At this point the testcase for the Editor is getting really big - and it still does a bunch of things that don't belong there: the name communicates that it is the test for the editor, but it also tests the testrunner. I want to separate the two test-cases, so I first add the test for multiple test-cases, so that I can perform the separation safely.

void test_test_runner_can_handle_multiple_tests()
{
  class LoggingTestCase : TestCase {
    mixin TestBuilder!LoggingTestCase;

    this(string logmsg){
      logMessage = logmsg;
    }

    void test_callMe(){
      log ~= logMessage;
    }

    string logMessage;
    string log;
  }

  LoggingTestCase cOne = new LoggingTestCase("This is one test");
  LoggingTestCase cTwo = new LoggingTestCase("This is another test");
  TestRunner runner = new TestRunner(new EmptyTestReport);

  runner.addTest(cOne);
  runner.addTest(cTwo);
  runner.runTests();

  assert(cOne.log == cOne.logMessage);
  assert(cTwo.log == cTwo.logMessage);
}

It fails as expected, of course, after I add the necessary parts to make it compile. Also, I notice that the collectTests function I created earlier, just to keep that functionality within the test-case, now proves to be very useful in the general case - I can call it after adding a test case to the TestRunner. I just need to make it available in the TestCase abstract class!

I'm struggling a bit with the design choices here, though. Adding the collectTests function to TestCase introduces a circular dependency between TestCase and TestRunner. As TestCase is abstract and only an interface so far, I don't mind that too much. And if I keep TestCase and TestRunner in the same file, it won't pose any problems later on (and it can easily be yanked apart later by creating an additional TestCollector for that functionality.) Poking around a bit in the TestRunner, though, makes me realize that I need something to keep track of which test-functions belong to which TestCase. The TestCollector steps forth and makes itself a necessity even before the circular dependency between TestCase and TestRunner gets established. Nice.

I perform the refactoring (in the green, after disabling the failing test to keep the TDD-police happy) in two steps: introduce the TestCollector and use it instead of the TestRunner for collecting tests, then add an internal class to the TestRunner that implements TestCollector and gathers the test-functions together with their testcase in one object. Adding these to a container with the testcase-testfunction mappings makes my new test pass with flying colors.
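The TestCollector declaration itself never appears in the snippets; assuming it simply mirrors the addTestFunction signature used below, it can be as small as:

interface TestCollector {
  void addTestFunction(string name, void function(Object o) func);
}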

class TestRunner {

  class TestCaseFunctions : TestCollector { 
    this(TestCase tc){
      tCase = tc;
    }

    void addTestFunction(string name, void function(Object o) func){
      testFunctions[name] = func;
    }

    public TestCase tCase;
    public void function(Object)[string] testFunctions;
  };

  this(TestReport treport){
    report = treport;
  }

  this(){
    report = new DefaultTestReport;
  }

  void addTest(TestCase tCase){
    TestCaseFunctions fns = new TestCaseFunctions(tCase);
    tCase.collectTests(fns);
    testCaseFunctions ~= fns;
  }

  void runTests(){
    foreach(tc; testCaseFunctions){
      foreach(name, func; tc.testFunctions){
        bool passed = true;
        try{
          tc.tCase.setUp();
          func(tc.tCase);
        }catch(Throwable e){
          passed=false;
        }
        finally{
          passed ? report.reportSuccess(name) : report.reportFailure(name);
        }
      }
    }
    report.generateReport();
  }

  private TestCaseFunctions[] testCaseFunctions;                   
  private TestReport report;
}

I don't like the double loop - first over testCaseFunctions, and then over the individual test-functions. And it hits me like a ton of bricks: I've been wasting my time big-time. Feeling a little bit stupid, I move the inner loop into TestCaseFunctions and add a runTests function to it - basically completing the extract-class refactoring by coming full circle back to something that looks exactly like the original one-test-case-only TestRunner for each test case. There is one difference though - I pass the reporter as an argument to the runTests function on TestCaseFunctions. I could have gotten to the same point with fewer edits, just by renaming TestRunner to TestCaseFunctions (I don't really like that name, but I don't feel overly creative right now and add it to the todo-list instead), and creating a new TestRunner like the one below.

class TestCaseFunctions : TestCollector { 
  this(TestCase tc){
    tCase = tc;
  }

  void addTestFunction(string name, void function(Object o) func){
    testFunctions[name] = func;
  }

  void runTests(TestReport report){
    foreach(name, func; testFunctions){
      bool passed = true;
      try{
        tCase.setUp();
        func(tCase);
      }catch(Throwable e){
        passed=false;
      }
      finally{
        passed ? report.reportSuccess(name) : report.reportFailure(name);
      }
    }
  }
  public TestCase tCase;
  public void function(Object)[string] testFunctions;
}

class TestRunner {
  this(TestReport treport){
    report = treport;
  }

  this(){
    report = new DefaultTestReport;
  }

  void addTest(TestCase tCase){
    TestCaseFunctions fns = new TestCaseFunctions(tCase);
    tCase.collectTests(fns);
    testCaseFunctions ~= fns;
  }

  void runTests(){
    foreach(tc; testCaseFunctions){
      tc.runTests(report);
    }
    report.generateReport();
  }

  private TestCaseFunctions[] testCaseFunctions;                   
  private TestReport report;
}

Now that I have support for executing multiple testcases in the testrunner, I can separate the TestRunner's tests from the Editor's tests - the EditorTest and TestTestRunner names finally communicate their intent. Here is the resulting TestTestRunner:

class TestTestRunner : TestCase {
  mixin TestBuilder!TestTestRunner;

  class EmptyTestReport : TestReport {
    void reportSuccess(string name){}
    void reportFailure(string name){}
    void generateReport(){}
  }

  void test_test_builder_creates_called_test(){

    string callSignature = "callMe called";
    class LoggingTestCase : TestCase {
      mixin TestBuilder!LoggingTestCase;
      void test_callMe(){
        log ~= callSignature;
      }
      public string log;
    }

    LoggingTestCase tcase = new LoggingTestCase;
    TestRunner runner = new TestRunner(new EmptyTestReport);
    runner.addTest(tcase);
    runner.runTests();
    assert(tcase.log == callSignature);
  }

  void test_test_runner_can_handle_multiple_tests()
  {
    class LoggingTestCase : TestCase {
      mixin TestBuilder!LoggingTestCase;

      this(string logmsg){
        logMessage = logmsg;
      }

      void test_callMe(){
        log ~= logMessage;
      }

      string logMessage;
      string log;
    }

    LoggingTestCase cOne = new LoggingTestCase("This is one test");
    LoggingTestCase cTwo = new LoggingTestCase("This is another test");
    TestRunner runner = new TestRunner(new EmptyTestReport);

    runner.addTest(cOne);
    runner.addTest(cTwo);
    runner.runTests();

    assert(cOne.log == cOne.logMessage);
    assert(cTwo.log == cTwo.logMessage);
  }
}

There is some more duplication here, for example the two implementations of LoggingTestCase, and the within-test creation of a testrunner with a single test. I feel compelled to remove the duplication, and while doing so I realize that the test for multiple tests renders the test for calling a single test function superfluous. After removing the duplication and the superfluous test, the rest looks like this - substantially less code to wrap my head around (which is a good thing, considering my short-term memory…)

class TestTestRunner : TestCase {
  mixin TestBuilder!TestTestRunner;

  class EmptyTestReport : TestReport {
    void reportSuccess(string name){}
    void reportFailure(string name){}
    void generateReport(){}
  }

  class LoggingTestCase : TestCase {
    mixin TestBuilder!LoggingTestCase;

    this(string logmsg){
      logMessage = logmsg;
    }

    void test_callMe(){
      log ~= logMessage;
    }

    string logMessage;
    string log;
  }

  void setUp(){
    runner = new TestRunner(new EmptyTestReport);
  }

  void test_test_runner_can_handle_multiple_tests()
  {    
    LoggingTestCase cOne = new LoggingTestCase("This is one test");
    LoggingTestCase cTwo = new LoggingTestCase("This is another test");

    runner.addTest(cOne);
    runner.addTest(cTwo);
    runner.runTests();

    assert(cOne.log == cOne.logMessage);
    assert(cTwo.log == cTwo.logMessage);
  }

  private TestRunner runner;
}

15 A self-tested unit testing module

The size of my little test-library has become a substantial portion of my file - so substantial that my editor-module primarily looks like a testing framework at first glance (obscuring the intent of the editor-module.) I decide that it's time to move it to a separate module.

So far we've been cruising along, and the refactorings have been relatively low risk, because all the source is contained in one file that (at times at least) has been relatively easy to read. Also, deliberately adding a failing test every now and then has served as a stand-in for a real test of how the TestRunner handles a failing test. There have been, though, a couple of times where I've had this "uh-oh, I jumped out of the plane without a parachute" feeling - prompting me to deliberately break some tests to check for firm ground beneath my feet. In other words, before moving it out of the editor-file, I finally need to add the test that verifies that a failing test behaves correctly. This will make the testing framework self-contained, self-tested and reusable.

Looking for a way to implement the failing-test test, I grab hold of the test-report - it can function as a mock (and how useful did that turn out to be?) A successful test will be followed by a reportSuccess call, and a failing test by a reportFailure call. By making a TestReport that just logs its successes and failures, I can check that the report is correct in the face of a failing test.

void test_a_failing_test(){
  class LoggingTestReport : TestReport {
    this(){}
    void reportSuccess(string testname){
      successes ~= testname;
    }
    void reportFailure(string testname){
      failures ~= testname;
    }

    void generateReport(){}

    string failures;
    string successes;
  }

  class CaseWithFailingTest : TestCase {
    mixin TestBuilder!CaseWithFailingTest;

    void test_failing_test(){
      assert(false);
    }

    void test_passing_test(){
    }
  }

  LoggingTestReport report = new LoggingTestReport;
  TestRunner trunner = new TestRunner(report);
  trunner.addTest( new CaseWithFailingTest);
  trunner.runTests();

  assert(report.failures == "test_failing_test");
  assert(report.successes == "test_passing_test");
}

Unsurprisingly, it works. But now I'm certain it will keep working in the future as well. I also notice that it doesn't quite fit in the TestTestRunner class, because it doesn't use the runner member variable (I had to create a new TestRunner in order to change the reporting object.) That can wait - let's move this baby.

Moving the entire test-framework to the module dunit, removing the parts not needed by our editor and then importing it in the editor-file, takes two minutes. It does - again - change our output slightly though:

banach:Discovering rolvseehuus$ dmd -unittest editor.d dunit.d && ./editor 
...........

Executed 11 tests

All tests executed successfully!
..

Executed 2 tests

All tests executed successfully!
banach:Discovering rolvseehuus$ 

This change is caused by every unittest-block setting up and running its own separate test run. Originally I aimed for that single dotted line, with one dot for every test run. If I wanted to keep it that way, it seems I would need to be able to register test-suites in some global test repository as part of the unittest-block - by all means doable; a rough sketch follows below. Choosing exactly how the output should look is a matter of taste though, and I'm fine with keeping it this way for now - I don't have any customers telling me otherwise yet…
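A hypothetical sketch of what such a registry could look like - none of this exists in dxunit, the names are made up, and it glosses over the ordering of unittest-blocks across modules:

// In dunit: one module-level runner that everything registers into.
TestRunner globalRunner;
static this(){ globalRunner = new TestRunner; }

// In each module with tests: the unittest-block only registers its cases.
unittest {
  globalRunner.addTest(new EditorTest);
}

// In one designated place, a final unittest-block produces the single report.
unittest {
  globalRunner.runTests();
}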

16 Was it worth it?

It's time to conclude this blog-post. The rest of the work will be one feature and the not-so-very-interesting cleanups and minor technicalities.

By paying attention to the four principles outlined at the beginning - writing tests that always run, paying attention to the intention of each piece of the software and making sure it is communicated, removing duplication, and keeping the number of separate classes, functions and modules at a minimum - I've ended up with a useful, reusable test-framework that I'm sure I'll use and evolve in the future. Without that ever being the plan - reusable, useful bits and pieces emerge seemingly without a plan for them to do so, if you just follow the "rules" (okay - I admit it. I kind of knew it would happen. But that's beside the point…)

I did two things with the tests: I added new tests, and I performed maintenance on them - that is, refactoring in order to meet the "rules" of simple design. And it was this maintenance that made the testing framework emerge. The maintenance I performed on the tests might, from reading this, seem like a big part of the development, and granted - it was. In hindsight - and I'm only guessing based on gut feeling - it might amount to half of the total time spent coding. The rest, I expect, was evenly divided between writing tests and writing the editor - slightly skewed towards writing tests.

Was it worth it? To keep track of and pick up all those loose ends? If I were never to touch any of this code again, and nobody needed to perform maintenance on it - if this were a totally isolated development effort - then maybe I've wasted my own and, worse, the customer's time. But in most cases I believe it would certainly be worth it. That, however, would fill another blog-post.

Finally, about the rear-view-only part. I wanted to write this while only paying attention to the next requirement and the source. For a small system like this, that is clearly a suboptimal strategy - some of the changes I made could clearly have been anticipated with a bit of forethought, building the needed designs from the start, and thus I would have done less refactoring along the way. But sometimes, working on a large-ish legacy system whose history - after changes in staffing - stretches back beyond the start-date of the most senior team member, that's all you've got: the source, a window of knowledge about starts and ends that is very small compared to the entire life of the project, and the next handful of requirements. So pay attention to what happens in the rear view mirror as well.

17 The source for the final editor module

As I've presented you with segments of the code here and there, it is only good form to show the final state of both the testing framework and the editor-module with its unittest.

The testing framework can be found at bitbucket under the name dxunit (because dunit was taken for a Delphi framework.) The source for the editor is found below.

import std.array;
import std.exception;
import dunit;

class Editor {

  size_t size() {
    return document.length;
  }

  void write(string edit){
    undoStack ~= document;
    document ~= edit;
    redoStack = [];
  }

  void undo(){
    if(undoStack.length > 0){
      redoStack ~= document;
      document = undoStack.back();
      undoStack.popBack();
    }
  }

  void redo(){
    if(redoStack.length > 0){
      undoStack ~= document;
      document = redoStack.back();
      redoStack.popBack();
    }
  }

  string cutRange(size_t pos, size_t num){
    undoStack ~= document;
    string sub = document[pos..pos+num];
    document.replaceInPlace(pos, pos+num, "");
    return sub;
  }

  string getDocument(){
    return document;
  }

  private string document;
  private string[] undoStack;
  private string[] redoStack;
}


class EditorTest : TestCase {
  mixin TestBuilder!EditorTest;

  void setUp(){
    ed = new Editor;
  }

  string addEditsToEditor(string[] edits){
    string document;
    foreach(edit; edits){
      document ~= edit;
      ed.write(edit);
    }
    return document;
  }

  void test_ed_is_empty_after_initialization(){
    assert(ed.size() == 0);
  }

  void test_ed_contains_as_many_characters_added_after_a_write(){
    string text = addEditsToEditor(["some text we're adding"]);
    assert(ed.size() == text.length);
  }

  void test_undoing_one_edit_is_equivalent_with_resetting_the_state(){
    addEditsToEditor(["some text we're adding"]);
    ed.undo();
    assert(ed.size() == 0);
  }

  void test_undo_after_multiple_edits(){
    string document = addEditsToEditor(["one", "two", "three"]);
    ed.write("lasttoberemoved");
    ed.undo();

    assert(ed.getDocument() == document);
  }

  void test_undo_on_empty_does_not_raise_an_exception(){
    try{
      ed.undo();
    }catch(Throwable e){
      assert(false);
    }
  }

  void test_redo_without_undos(){
    string document = addEditsToEditor(["one", "two", "three"]);
    ed.redo();
    assert(ed.getDocument() == document);
  }

  void test_redo_with_undos(){
    string[] edits = ["one", "two", "three"];
    string document = addEditsToEditor(edits);

    foreach(count; 0..edits.length){
      ed.undo();
    }
    foreach(count; 0..edits.length){
      ed.redo();
    }
    assert(ed.getDocument() == document );
  }

  void test_edit_after_undoing_reset_redo_state(){
    addEditsToEditor(["one", "two", "three"]);
    ed.undo();
    ed.write("Unrelated");
    auto document = ed.getDocument();
    ed.redo();
    assert(ed.getDocument() == document);
  }

  void test_cutting_a_region_of_text(){
    string prefix = "This is a ";
    string suffix = " text";
    string infix = "DELETEME";

    addEditsToEditor([prefix, infix, suffix]);

    string res = ed.cutRange(prefix.length, infix.length);


    assert(ed.getDocument() == prefix ~ suffix);
    assert(res==infix);
  }

  void test_cutting_a_region_of_text_can_be_undone(){
    string document = "The text Im adding to the editor is this";
    addEditsToEditor([document]);
    ed.cutRange(document.length/4, document.length/2);
    ed.undo();
    assert(ed.getDocument() == document);
  }

  void test_cutting_a_region_of_text_out_of_range_throws(){
    string document = "The text Im adding to the editor is this";
    addEditsToEditor([document]);
    assertThrown!Throwable(ed.cutRange(document.length, 10));
  }

  private Editor ed;
}

unittest {
  TestRunner runner = new TestRunner;
  runner.addTest(new EditorTest);
  runner.runTests();
}