The previous article of this series explained how humans run software tests. Or rather, being an “in a nutshell” article, it gave a very brief overview. Now let’s continue the article with the testing techniques involving test automation.
Test methods in a nutshell
As you may or may not remember, I stumbled over a post resonating with my thoughts about testing when I started researching for this article. It’s a pity this post has been removed in the meantime. Be that as it may, the original post of Itamar Turner-Trauring inspired me to draw this picture, covering most of the popular test approaches and summarizing the post. This article covers the two right-hand quadrants of the image. After that, it returns to the first quadrant. Machines may play an important role in software testing, but at the end of the day, it’s still humans who pay our bills.
Quadrant 3: Guaranteeing stable functionality
In the previous article, all our tests were run by humans because we need the ingenuity of human beings. Sometimes these tests are run manually simply because there are no automated tests yet. In particular, manual UI tests fall into this class. Many teams are happy with that, so there’s no need to change anything. However, after publishing a couple of versions, testing becomes a very repetitive and boring task. High time to automate these tests. The tests we cover now may require a lot of human ingenuity when they are conceived. Some of them involve a lot of thinking. For example, it’s not easy to invent a compiler check that helps thousands of developers. Thing is, that’s a one-time effort. After that, the tests can be replayed endlessly.
There are zillions of web pages covering unit tests, so I assume you’re already familiar with them. Suffice it to say that unit tests are a well-established test tool inspecting individual methods or functions of your software. Another peculiarity of unit tests is that they are almost always written by the developer who also wrote the algorithm under test. If everything goes well, the tests are also decent documentation of the source code. Don’t waste time writing Javadoc. Everybody knows it’s never up-to-date. Unit tests are different. If they are not up-to-date, the famous “green bar” showing the results of the test suite turns red. That’s something most developers avoid at all costs. In other words: unit tests are much more reliable than Javadoc, Confluence, and documentation written in PowerPoint or Word.
Introducing unit tests as an afterthought is painful. You have to write your program differently if you want to put it to test with unit tests. Hence, I recommend implementing unit tests from day one.
For some reason, there are few (if any) articles telling you how to write unit tests efficiently. Key strategies are equivalence class partitioning, testing corner cases, and limiting the number of if statements in your methods. Unit tests are one of the most important reasons why you’re taught to keep your methods short.
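To make equivalence class partitioning concrete, here’s a minimal sketch. The discount rule and all the numbers are made up for illustration; the point is that you pick one representative per equivalence class plus the corner cases at the class boundaries, rather than testing every possible input:

```java
// Hypothetical example: a discount rule tested by equivalence classes.
// Run with "java -ea FeeCalculatorTest" to enable the assertions.
public class FeeCalculatorTest {

    // The code under test: a made-up discount rule with three classes.
    static int discountPercent(int orderValue) {
        if (orderValue < 0) throw new IllegalArgumentException("negative order");
        if (orderValue < 100) return 0;
        if (orderValue < 1000) return 5;
        return 10;
    }

    public static void main(String[] args) {
        // One representative per equivalence class ...
        assert discountPercent(50) == 0;
        assert discountPercent(500) == 5;
        assert discountPercent(5000) == 10;
        // ... plus the corner cases at the class boundaries.
        assert discountPercent(0) == 0;
        assert discountPercent(99) == 0;
        assert discountPercent(100) == 5;
        assert discountPercent(999) == 5;
        assert discountPercent(1000) == 10;
        System.out.println("all checks passed");
    }
}
```

Eight assertions cover the whole input space, no matter how many order values exist in production.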
Another key to writing unit tests successfully is to simulate difficult modules. That’s called mocking. This topic is well covered by many tutorials, so I’ll leave it at that. There’s even an article on BeyondJava.net covering a special class of mock services.
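In real projects you’d typically reach for a framework like Mockito, but the principle fits in a few lines of plain Java. All the names below (the rate service, the price tag) are invented for this sketch; the idea is that the test replaces a slow or unpredictable dependency with a stub that returns a fixed value:

```java
// Hypothetical example: the class under test depends on an external
// exchange-rate service. The unit test replaces it with a stub.
public class MockingSketch {

    interface RateService {                      // the "difficult" dependency
        double euroToDollar();
    }

    static class PriceTag {                      // the code under test
        private final RateService rates;
        PriceTag(RateService rates) { this.rates = rates; }
        double inDollar(double euro) { return euro * rates.euroToDollar(); }
    }

    public static void main(String[] args) {
        RateService fixedRate = () -> 1.10;      // stub: no network, no surprises
        PriceTag tag = new PriceTag(fixedRate);
        if (Math.abs(tag.inDollar(100) - 110.0) > 1e-9)
            throw new AssertionError("conversion is wrong");
        System.out.println("stub test passed");
    }
}
```

Because `PriceTag` receives its dependency via the constructor, swapping the real service for a stub requires no changes to the production code. That’s also why introducing unit tests as an afterthought is painful: code that creates its dependencies internally can’t be stubbed this way.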
Unit test coverage
Another exciting topic is test coverage. There are clever tools measuring what fraction of the program is covered by unit tests. Problem is, this is a bad indicator. The interesting question is how much non-trivial code is tested. Judging a developer team by test coverage rewards writing the wrong kind of tests. In particular, it rewards testing simple things like getters and setters. For what it’s worth, I don’t see any business value in testing such trivial code.
Many teams also test the UI layer of their application with unit tests. Again, I’m a bit skeptical. I don’t see any point in testing whether an input field is mandatory. Many frameworks allow you to make an input field mandatory by jotting down a simple declaration. It’s unlikely it’ll ever get lost unintentionally. Before writing a unit test, I recommend asking yourself if it’s worth the effort. I prefer testing non-trivial code. I recommend testing code that’s likely to break. I don’t recommend testing stuff that’s implemented in a declarative way. As a rule of thumb, that’s everything that’s defined by an annotation in JavaEE 7+. It goes without saying that the same idea applies to other languages. For example, I wouldn’t test the decorators of an Angular program. Neither would I test things that can be written as an HTML attribute. In particular, testing is not a good argument to prefer reactive forms over template-driven forms in Angular. Reactive forms allow you to test stuff I don’t consider worth testing.
Automated UI tests
Automated UI tests are a very tempting approach to testing. You record the user’s interactions with the program, record the program’s screens, and replay the user interactions later. If the program shows a different screen, something’s gone wrong.
Or not. UI tests test the entire program – including your SAP backend. That’s quite a challenge. You have to write your tests in such a way they are repeatable.
Sometimes, it’s almost impossible to make your tests repeatable. For instance, you can never delete anything in SAP. All you can do is cancel the original transaction. In most cases, that’s fine, but even so, the database state has changed. Murphy’s law states that this tiny difference ruins your day.
A popular tool for UI tests of web applications is Selenium. In my experience, it’s not a bad tool, but writing tests doesn’t come for free. Plus, don’t try to run tests on Internet Explorer. These tests often fail for no obvious reason. Integration tests running in a real browser are also slow, so most teams decide to run them on a dedicated computer. As a consequence, developers tend to forget to run them on their local machine. Even worse, they tend to forget to check the results on the dedicated machine. You have to assign this task to one of your team members to make sure the unpopular task isn’t forgotten.
Another problem with automated UI tests is that they break quickly if the UI changes. You don’t want to fix every test just because you’ve installed a new corporate design. You need a good strategy to write resilient tests detecting real bugs instead of nagging you with false positives.
Automating integration tests is a very challenging task. An integration test checks whether two modules play well with each other. You’re crossing at least one boundary to do that. So everything I’ve mentioned in the previous section applies to integration tests as well. Integration tests need to be idempotent to be meaningful. However, many backends have not been written with repeatable tests in mind.
Truth to tell, most projects I’ve seen so far prefer to run their integration tests manually. Writing idempotent and reliable integration tests usually requires too much thought.
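One common trick for making integration tests repeatable is to generate unique test data for every run, so a rerun never collides with leftovers from earlier runs (which matters precisely when the backend won’t let you delete anything). The account-name scheme below is purely hypothetical:

```java
import java.util.UUID;

// Sketch: every test run creates its own data, so reruns never clash
// with records left behind by earlier runs. The naming is made up.
public class UniqueTestData {

    static String uniqueAccountName() {
        return "test-account-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        String first = uniqueAccountName();
        String second = uniqueAccountName();
        if (first.equals(second)) throw new AssertionError("names must differ");
        System.out.println("created " + first);
    }
}
```

This doesn’t make the test idempotent in the strict sense — the database still grows with every run — but it makes the runs independent of each other, which is usually what you actually need.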
Compiler checks and linters
Chances are compiler checks didn’t cross your mind when you read the headline of this article. However, they are one of the most valuable tools. Compilers can detect countless errors and mistakes at a very early stage. Of course, they only detect low-level errors, but these errors are extremely frequent.
If you are serious about quality assurance, choose a programming language with strong and static typing, activate as many compiler checks as possible and add a good linter to detect code smells. I know about the virtues of dynamic typing, and I love it. But both weak and dynamic typing always bear the risk of shipping trivial bugs to the customer. Most teams fix this by writing more unit tests. That’s the trade-off: relaxed compiler checks and powerful tools like duck typing are more fun. The price you pay is writing many unit tests (or running code reviews, or shipping buggy software).
Static code analysis
Tools like Sonar detect many errors by analyzing the source code. You can integrate them with the IDE or run them as part of the CI build. The problem with IDE integration is that powerful code analysis is slow. So most teams opt for the second option. However, you have to make sure someone examines the results. It almost never works voluntarily, so you’re better off picking one of the team members and making them check the code analysis on a daily or weekly basis. Someone has to be the bad guy. It’s for the benefit of the entire team.
If possible, install the static code analysis tools early. If your program already has a hundred classes, the tool detects so many bugs that everybody gives up because there are only so many bugs you can fix during a working day.
What’s the problem with checking the functionality automatically?
Most of my friends and co-workers tell me there’s nothing wrong with automated tests. My personal experience is a bit different. When the green bar of the unit test turns red, that doesn’t necessarily mean the program is broken. Maybe it’s just the business requirements that have changed.
I’ve always worked on projects with rapidly changing requirements. Actually, I’ve hardly ever worked on projects. Most of the time, I’ve been a product developer. By definition, a product is a project that lives particularly long. In theory, projects have a clearly defined deadline. After that, they are finished. Products are different. If your company is lucky, it’s got a product surviving decades, being profitable all the time. In other words: software products live so long that they need to change over time. Employing tests which have been specifically invented to prevent change is a mixed blessing.
Plus, I know many business departments that can’t define the product they need before starting the project. That’s not a bad thing. It’s just the way it is. We all learn during the course of a project. Developers make a virtue of this by calling the project “agile” and by embracing an agile framework like Scrum.
The only problem is that tests preventing change don’t match projects embracing change. You can use them, but you have to be aware of the price tag. Don’t hesitate to cut down investments in test automation as soon as it starts to slow you down.
Quadrant 4: Examining how the software we’ve written copes with real life
Next, we’ll cover a group of tests that are hardly ever run. Who wants to know whether the program behaves nicely under stress if you’ve already missed the deadline multiple times? So the easiest and most common strategy is to let the customer run these tests in production. If you’ve read the previous article, you may remember that German developers mock this as the “banana principle”: the software is shipped too early and ripens at the customers’. Just like real fruit, the quality is better if it’s allowed to ripen while still attached to the tree.
More often than not, software suffers the same fate. It’s delivered in a fairly immature state. Let’s have a look at the tests that predict how the software is going to cope with the requirements of the production environment.
When too many users work with your program simultaneously, all kinds of errors happen. That’s what stress tests simulate. Most applications share common resources among users: a common database, a common application server, things like that. The common service can serve only so many requests per second. After that, it responds with error messages.
It’s important to test this before shipping. Sometimes, the error messages are interesting to hackers. Sometimes, other applications crash, too. Or the application server needs to be restarted.
API gateways can help you to mitigate the pain. They can throttle the traffic before it becomes dangerous. Of course, many users get an error message, but the application continues running, so it continues to serve most users.
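The throttling idea itself is simple enough to sketch in a few lines. This is not how any particular gateway product implements it — just a minimal illustration of shedding load with a fixed number of permits, all numbers invented:

```java
import java.util.concurrent.Semaphore;

// Sketch of gateway-style throttling: only a fixed number of requests may
// be in flight at once; everyone else gets a fast error instead of
// dragging the whole system down.
public class ThrottleSketch {

    private final Semaphore permits;

    ThrottleSketch(int maxConcurrentRequests) {
        permits = new Semaphore(maxConcurrentRequests);
    }

    /** Returns a response, or a polite error if the server is saturated. */
    String handle(String request) {
        if (!permits.tryAcquire()) {
            return "503 Service Unavailable";   // shed load instead of crashing
        }
        try {
            return "200 OK: processed " + request;
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        ThrottleSketch gateway = new ThrottleSketch(2);
        System.out.println(gateway.handle("r1"));
    }
}
```

The key property: a rejected request costs almost nothing, so the application keeps serving the users it has capacity for instead of failing for everyone.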
Soak tests are similar to stress tests, but with less load and over an extended period of time. The idea is that some bottlenecks only become visible after a long time. Memory leaks are a typical example: they cause the program to crash after several weeks. A soak test aims to detect this kind of error earlier.
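A classic source of such leaks is a cache that only ever grows. One simple Java remedy — shown here as an illustrative sketch, with an arbitrary capacity — is a `LinkedHashMap` that evicts its eldest entry once a size limit is reached:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the kind of leak a soak test catches: an unbounded cache.
// Overriding removeEldestEntry puts an upper bound on it (LRU eviction).
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        super(16, 0.75f, true);                 // true = access order (LRU)
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;             // evict instead of leaking
    }

    public static void main(String[] args) {
        BoundedCache<Integer, String> cache = new BoundedCache<>(1000);
        for (int i = 0; i < 1_000_000; i++) {   // weeks of traffic, condensed
            cache.put(i, "session-" + i);
        }
        System.out.println("entries kept: " + cache.size()); // stays at 1000
    }
}
```

Without the eviction, the loop above is exactly the bug a soak test would surface after weeks in production: memory usage creeping up until the JVM dies with an `OutOfMemoryError`.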
The application logs contain a lot of information about the application’s behavior. Ask your operations department to show you the error logs regularly (or fetch them yourself if you’re allowed to do so). Analyze the error messages. More often than not, you’ll recognize problems and bugs before the users report them to you.
Penetration tests are about security. Basically, you hire a paid hacker and ask them to hack your application.
There are tools to hack a web application automatically. Nonetheless, penetration testing is such a special branch of IT that it’s usually best to hire an external consultant for a few days.
You’re never prepared for an emergency unless you practice it regularly. That’s the idea behind a product by Netflix. Chaos Monkey randomly brings your servers down. As weird as it may sound, that’s a good idea. Chaos Monkey terminates your servers during regular office hours, so every expert is available to fix the problem. After a while, you’ve gathered enough experience to restart your servers at three o’clock in the morning. Even better, you’ve probably hardened your software and your infrastructure to a point where they recover from the problem without having to call you in the dead of night.
Quadrant 1, pass 2: The user’s perspective (and their final judgement)
Image source: Gerd Altmann, published under a Creative Commons CC0 license

Now that we’ve managed to finish and ship our application, we’re in the user quadrant again. Remember, that’s where we started in the previous article of this series. This time, our focus is a bit different. We want to learn what our users think about the application. Do they like it? Which part of it can be improved? Is the new version of the software an improvement, or did we make things worse?
After all this talk about test automation and testing systematically, it’s time to do exactly the opposite. The idea of exploratory testing is to explore a system you don’t know. The approach works best if your testers don’t know the application yet. Give them a task to solve, and watch them do it. Or ask them to jot down what they liked, what they didn’t like, which rough edges they encountered, and so on. There are many approaches to exploratory testing. The bottom line is to let people “play around” with the application.
Exploratory testing isn’t about proving the correctness of the software. It’s about finding out how people cope with the software. As you can imagine, that’s a popular topic.
A more radical variant of exploratory testing is “guerrilla testing”. The idea is to go into a café and ask strangers to evaluate your software. The café atmosphere limits the duration of these tests to a quarter of an hour. It also makes you focus on UX topics.
It goes without saying that you don’t need to go into a public café. Your company’s cafeteria will do. The bottom line is that the participants don’t know the software in advance and that they can’t prepare. Guerrilla tests are all about spontaneous reactions.
A/B tests and Continuous Delivery
A/B tests are another class of tests with unprepared people. This time, it takes place in production. The idea is that you’ve written a new version of your software. Or you need to decide between two different designs. So you deploy the new version of the software. After that, you start watching what happens. Are people more productive? Do they buy more products? If they don’t, just install the previous version of the software and start over.
That’s why A/B tests are a good match for Continuous Delivery. CD makes it easy to bring just another version to production.
There are many variations of the A/B tests. Companies like Twitter often show the new version of the software to a limited audience. So they can compare the success of the new version in real time.
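Rolling a new version out to a limited audience usually boils down to a stable bucketing function: the same user must always see the same variant, and only a configurable percentage sees the new one. The sketch below is an illustration, not any company’s actual mechanism; CRC32 is used merely as a cheap, deterministic hash:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Sketch of stable A/B assignment: hash the user id, so the same user
// always lands in the same bucket, and only percentB percent of users
// see the new variant.
public class AbBucket {

    /** True if this user should see variant B (the new version). */
    static boolean inVariantB(String userId, int percentB) {
        CRC32 crc = new CRC32();
        crc.update(userId.getBytes(StandardCharsets.UTF_8));
        return (crc.getValue() % 100) < percentB;
    }

    public static void main(String[] args) {
        // The same user always lands in the same bucket:
        boolean first = inVariantB("alice@example.com", 10);
        boolean again = inVariantB("alice@example.com", 10);
        if (first != again) throw new AssertionError("assignment must be stable");
        System.out.println("alice sees variant B: " + first);
    }
}
```

Because the assignment is a pure function of the user id, you don’t need to store who sees which variant — and you can dial `percentB` up from 1 to 100 as confidence in the new version grows.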
I like this approach a lot. The only problem is you can hardly ever use it. Most projects are company-run projects delivering two releases a year. That’s not the ecosystem A/B tests dwell in. A/B tests also need large numbers of users, so you can tell random fluctuations from statistically significant differences.
Usability tests observe how your users cope with your application. Usability tests and UX design are huge topics. The internet has many useful resources for you. So instead of explaining the topic in depth, I point to Melissa Ng’s slides.
Acceptance tests, in turn, are conducted by the customer. They look at your program and accept it. Or not. In theory, they follow the acceptance criteria defined at the beginning of the project. But we all know that the difference between theory and practice is that there’s no difference – in theory.
User bug reports
By definition, user bug reports crop up after shipping and installing the software. In a way, they are the ultimate test results. They’ve been written by the customer. In other words, by the person paying your bills.
Actually, user bug reports are the tests you’re probably trying to avoid at all costs. But if they happen, react quickly. Show your customers you listen to them. There’s nothing worse than a user bug report that isn’t answered in time. Mind you: the user wasn’t just frustrated. They took the time to fill in a form and describe an error. That kind of feedback should be rewarded. Plus, answering quickly improves the image of your project.
Wrapping it up
Even after writing 5,000 words, I don’t think I’ve covered software testing in depth. It’s more like a teaser. I’ve written this series of articles because I’ve met several developers who thought testing was identical to unit testing. However, the world of software testing is a lot broader and much more diverse. Plus, humans play a crucial role in software testing. I don’t think we’ll ever be able to replace them with machines.
This infographic I found on Wikimedia summarizes this post nicely. Also note the differences from my posts. I never claimed to know it all.
Image published by Nahn Ngo under the Creative Commons Attribution-Share Alike 4.0 International license at Wikimedia