# Writing good tests

Date: 2026-03-25

This document describes principles for writing good tests. It is intended for both human developers and AI coding agents. Much of the guidance here draws from Google's [Testing on the Toilet](https://testing.googleblog.com/2007/01/introducing-testing-on-toilet.html) series.

## Table of contents

- [Test expected behavior, not current behavior](#test-expected-behavior-not-current-behavior)
- [Choose the right testing target](#choose-the-right-testing-target)
  - [Test behavior, not implementation](#test-behavior-not-implementation)
  - [Test public APIs, not implementation-detail classes](#test-public-apis-not-implementation-detail-classes)
  - [Test behaviors, not methods](#test-behaviors-not-methods)
  - [Don't test constants](#dont-test-constants)
- [Write clear, readable tests](#write-clear-readable-tests)
  - [Write descriptive test names](#write-descriptive-test-names)
  - [Keep tests focused](#keep-tests-focused)
  - [Keep cause and effect clear](#keep-cause-and-effect-clear)
  - [Don't put logic in tests](#dont-put-logic-in-tests)
- [Manage test data carefully](#manage-test-data-carefully)
  - [Avoid sharing input data between tests](#avoid-sharing-input-data-between-tests)
  - [Prefer DAMP over DRY](#prefer-damp-over-dry)
  - [Cleanly create test data](#cleanly-create-test-data)
- [Test your test helpers](#test-your-test-helpers)
- [Use tests in code review](#use-tests-in-code-review)

## Test expected behavior, not current behavior

The single most important principle of writing good tests: a test should verify the behavior the code _should_ have, not the behavior it _currently_ has. Before writing a test, think carefully about what the correct behavior is. Then write a test that asserts that correct behavior.

This may seem obvious, but it is easy to get wrong in practice, especially when writing tests for existing code. The temptation is to run the code, see what it produces, and then write a test asserting that output. This is backward. A test written this way will happily pass even if the code has a bug -- it is just verifying that the code does what it does, which is a tautology. A test that was written thoughtfully, on the other hand, might _catch_ a bug: if you think carefully about what the output should be and write a test asserting that, the test will fail if the code produces the wrong output.

It is fine if writing tests doesn't find bugs -- not every piece of code has bugs. The point is that tests _can_ catch bugs, but only if the expected values in the tests are derived from independent reasoning about the correct behavior, not from copying the code's current output.

This principle is the core motivation behind test-driven development (TDD), in which tests are written _before_ the code. When you write the tests first, there is no existing output to copy -- you are forced to think about what the behavior should be. Whether or not you practice full TDD, the principle applies: derive your expected values from the specification, not from the implementation.
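To make the contrast concrete, here is a hypothetical sketch (the function and its bug are invented for illustration). A test whose expected value is copied from the code's output passes no matter what; a test whose expected value comes from the specification catches the bug:

```python
def days_in_february(year):
    """Hypothetical implementation with a bug: it ignores leap years."""
    return 28

# A test written by running the code and copying its output passes even
# though the code is wrong -- it merely restates what the code already does:
assert days_in_february(2024) == 28

# A test whose expected value is derived from the calendar (2024 is a leap
# year, so February has 29 days) fails -- and that failure is the point:
bug_caught = False
try:
    assert days_in_february(2024) == 29
except AssertionError:
    bug_caught = True
assert bug_caught  # the spec-derived expectation exposed the bug
```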

## Choose the right testing target

### Test behavior, not implementation

Tests should verify _what_ the code does, not _how_ it does it. When tests are coupled to implementation details, they break every time the implementation changes, even if the behavior is unchanged.

Consider a `Calculator` class:

```java
public class Calculator {
  public int add(int a, int b) {
    return a + b;
  }
}
```

A behavioral test asserts the result of `add`:

```java
@Test public void testAdd() {
  assertEquals(3, calculator.add(2, 1));
  assertEquals(2, calculator.add(2, 0));
}
```

If the implementation is later refactored -- say, to delegate to an `Adder` object internally -- these tests do not need to change, because the externally visible behavior is the same. Tests that instead assert which internal methods were called, or that verify the structure of intermediate objects, would break after the refactoring even though nothing is wrong.

Tests that are independent of implementation details are easier to maintain, easier to understand, and provide better documentation for users of the code. (See [Test Behavior, Not Implementation](https://testing.googleblog.com/2013/08/testing-on-toilet-test-behavior-not.html).)

### Test public APIs, not implementation-detail classes

When a class is only used internally as an implementation detail of another class, it usually does not need its own tests. Instead, test the public API that uses it.

For example, suppose a `UserInfoService` uses an internal `UserInfoValidator`:

```java
public class UserInfoService {
  private UserInfoValidator validator;
  public void save(UserInfo info) {
    validator.validate(info);
    writeToDatabase(info);
  }
}
```

`UserInfoValidator` is an implementation detail of `UserInfoService`. Testing `UserInfoValidator` directly has several downsides: the tests must be updated if the validation logic moves elsewhere; the tests enshrine an internal class as if it were a requirement; and passing tests against the internal class do not guarantee that validation actually works through the public API. Instead, test the public API -- `UserInfoService.save` -- and the validator will be exercised as part of that. (See [Prefer Testing Public APIs Over Implementation-Detail Classes](https://testing.googleblog.com/2015/01/testing-on-toilet-prefer-testing-public.html).)
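A minimal Python sketch of the same idea (all names are hypothetical). The validator is a private detail with no tests of its own; its behavior is verified entirely through the public `save` method:

```python
class ValidationError(Exception):
    pass

class _UserInfoValidator:
    """Implementation detail -- deliberately has no direct tests."""
    def validate(self, info):
        if not info.get("name"):
            raise ValidationError("name is required")

class UserInfoService:
    def __init__(self):
        self._validator = _UserInfoValidator()
        self.database = []  # stand-in for real storage

    def save(self, info):
        self._validator.validate(info)
        self.database.append(info)

# Valid input is saved -- validation exercised via the public API:
service = UserInfoService()
service.save({"name": "alice"})
assert service.database == [{"name": "alice"}]

# Invalid input is rejected -- still only through the public API:
rejected = False
try:
    UserInfoService().save({"name": ""})
except ValidationError:
    rejected = True
assert rejected
```

If the validation logic later moves out of `_UserInfoValidator`, these tests keep passing unchanged.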

### Test behaviors, not methods

Avoid the trap of writing one test per method that verifies everything the method does. Instead, write one test per _behavior_. A single method may have multiple behaviors, and each should be tested separately.

Bad -- one test verifying multiple behaviors:

```java
@Test public void testProcessTransaction() {
  User user = newUserWithBalance(LOW_BALANCE_THRESHOLD.plus(dollars(2)));
  transactionProcessor.processTransaction(
      user, new Transaction("Pile of Beanie Babies", dollars(3)));
  assertContains("You bought a Pile of Beanie Babies", ui.getText());
  assertEquals(1, user.getEmails().size());
  assertEquals("Your balance is low", user.getEmails().get(0).getSubject());
}
```

Good -- separate tests for separate behaviors:

```java
@Test public void testProcessTransaction_displaysNotification() {
  transactionProcessor.processTransaction(
      new User(), new Transaction("Pile of Beanie Babies"));
  assertContains("You bought a Pile of Beanie Babies", ui.getText());
}

@Test public void testProcessTransaction_sendsEmailWhenBalanceIsLow() {
  User user = newUserWithBalance(LOW_BALANCE_THRESHOLD.plus(dollars(2)));
  transactionProcessor.processTransaction(user, new Transaction(dollars(3)));
  assertEquals(1, user.getEmails().size());
  assertEquals("Your balance is low", user.getEmails().get(0).getSubject());
}
```

Splitting behaviors into separate tests makes each test simpler, makes it easier to see which behaviors exist, and ensures that a failure in one behavior does not mask failures in others. (See [Test Behaviors, Not Methods](https://testing.googleblog.com/2014/04/testing-on-toilet-test-behaviors-not.html).)

### Don't test constants

If the code defines a constant like `const MAX_RETRIES = 3`, there is no need to write a test asserting that `MAX_RETRIES` is 3. Such a test verifies nothing useful -- it is just restating the constant's value.
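What is worth testing is the behavior that depends on the constant. A hypothetical sketch (the retry function is invented for illustration):

```python
MAX_RETRIES = 3  # hypothetical module constant

# Pointless: `assert MAX_RETRIES == 3` would just restate the definition.

def fetch_with_retries(do_fetch):
    """Hypothetical retry loop: tries until success or MAX_RETRIES attempts."""
    for _ in range(MAX_RETRIES):
        try:
            return do_fetch()
        except IOError:
            pass
    raise IOError("all retries failed")

# Useful: verify the retry *behavior* -- the code gives up after the
# configured number of attempts, whatever that number happens to be.
calls = []
def always_fails():
    calls.append(1)
    raise IOError()

gave_up = False
try:
    fetch_with_retries(always_fails)
except IOError:
    gave_up = True
assert gave_up
assert len(calls) == MAX_RETRIES
```

If `MAX_RETRIES` later changes to 5, this test still verifies the right thing with no edits.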

## Write clear, readable tests

Tests serve as documentation: they are concrete examples of how the code behaves. To serve this purpose well, tests should be easy to read and understand.

### Write descriptive test names

A test name should describe the scenario being tested and the expected outcome. A reader should be able to understand what a test verifies without reading the test body.

Bad:

```java
@Test public void isUserLockedOut_invalidLogin() { ... }
```

Good:

```java
@Test public void isUserLockedOut_lockOutUserAfterThreeInvalidLoginAttempts() { ... }
```

Descriptive names have several advantages: you can read a list of test names to understand the full set of behaviors being tested, you can quickly tell if some behavior is missing from the test suite, and when a test fails, you immediately understand what functionality is broken. Explicit names also naturally encourage you to separate different behaviors into different tests, since it is awkward to give a single test a name that describes multiple behaviors. (See [Writing Descriptive Test Names](https://testing.googleblog.com/2014/10/testing-on-toilet-writing-descriptive.html).)

### Keep tests focused

Each test should verify a single scenario. When a test exercises multiple scenarios, it becomes harder to understand, harder to name, and harder to debug when it fails.

Bad -- three scenarios in one test:

```cpp
TEST_F(BankAccountTest, WithdrawFromAccount) {
  Transaction transaction = account_.Deposit(Usd(5));
  clock_.AdvanceTime(MIN_TIME_TO_SETTLE);
  account_.Settle(transaction);

  EXPECT_THAT(account_.Withdraw(Usd(5)), IsOk());
  EXPECT_THAT(account_.Withdraw(Usd(1)), IsRejected());
  account_.SetOverdraftLimit(Usd(1));
  EXPECT_THAT(account_.Withdraw(Usd(1)), IsOk());
}
```

Good -- one scenario per test:

```cpp
TEST_F(BankAccountTest, CanWithdrawWithinBalance) {
  DepositAndSettle(Usd(5));
  EXPECT_THAT(account_.Withdraw(Usd(5)), IsOk());
}

TEST_F(BankAccountTest, CannotOverdraw) {
  DepositAndSettle(Usd(5));
  EXPECT_THAT(account_.Withdraw(Usd(6)), IsRejected());
}

TEST_F(BankAccountTest, CanOverdrawUpToOverdraftLimit) {
  DepositAndSettle(Usd(5));
  account_.SetOverdraftLimit(Usd(1));
  EXPECT_THAT(account_.Withdraw(Usd(6)), IsOk());
}
```

Each test is simple, has its own name describing the scenario, and is independent of the others. If one scenario fails, the others still run. (See [Keep Tests Focused](https://testing.googleblog.com/2018/06/testing-on-toilet-keep-tests-focused.html).)

### Keep cause and effect clear

A test should make the relationship between its inputs and expected outputs immediately obvious. When the cause (setup) and the effect (assertion) are far apart, the test becomes hard to verify.

Bad -- the setup is 200 lines away from the assertion:

```java
private final Tally tally = new Tally();

@Before public void setUp() {
  tally.increment("key1", 8);
  tally.increment("key2", 100);
  tally.increment("key1", 0);
  tally.increment("key1", 1);
}

// ... 200 lines later ...

@Test public void testIncrement_existingKey() {
  assertEquals(9, tally.get("key1"));
}
```

To understand why the expected value is 9, you have to scroll up to the setup method, find the relevant lines among the irrelevant ones, and mentally add 8 + 0 + 1. The connection between cause and effect is buried.

Good -- cause and effect are adjacent:

```java
@Test public void testIncrement_newKey() {
  tally.increment("key", 100);
  assertEquals(100, tally.get("key"));
}

@Test public void testIncrement_existingKey() {
  tally.increment("key", 8);
  tally.increment("key", 1);
  assertEquals(9, tally.get("key"));
}

@Test public void testIncrement_incrementByZeroDoesNothing() {
  tally.increment("key", 8);
  tally.increment("key", 0);
  assertEquals(8, tally.get("key"));
}
```

Each test tells its own self-contained story. (See [Keep Cause and Effect Clear](https://testing.googleblog.com/2017/01/testing-on-toilet-keep-cause-and-effect.html).)

### Don't put logic in tests

Tests should state inputs and expected outputs directly, without computing them. Logic in tests -- string concatenation, arithmetic, conditionals, loops -- makes the test harder to verify and can introduce its own bugs.

Bad:

```java
@Test public void shouldNavigateToPhotosPage() {
  String baseUrl = "http://plus.google.com/";
  Navigator nav = new Navigator(baseUrl);
  nav.goToPhotosPage();
  assertEquals(baseUrl + "/u/0/photos", nav.getCurrentUrl());
}
```

This looks correct at a glance. But if you inline `baseUrl` into the concatenation, you get `"http://plus.google.com//u/0/photos"` -- two slashes. The string concatenation masked a bug (either in the test or in the code under test).

Good:

```java
@Test public void shouldNavigateToPhotosPage() {
  Navigator nav = new Navigator("http://plus.google.com/");
  nav.goToPhotosPage();
  assertEquals("http://plus.google.com/u/0/photos", nav.getCurrentUrl());
}
```

When the expected value is a literal, the test is trivially verifiable by inspection. If logic is unavoidable, move it into a utility function and test that utility function separately. (See [Don't Put Logic in Tests](https://testing.googleblog.com/2014/07/testing-on-toilet-dont-put-logic-in.html).)
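As a sketch of that last suggestion (the helper name is hypothetical): if many tests need to build expected URLs, the joining logic lives in one helper, and the helper gets its own direct tests so the tests that rely on it stay trustworthy:

```python
def photos_url(base, path):
    """Hypothetical URL-joining helper: exactly one slash between parts."""
    return base.rstrip("/") + "/" + path.lstrip("/")

# The helper is tested separately, including the edge cases that caused
# the double-slash bug above:
assert photos_url("http://plus.google.com/", "/u/0/photos") == \
    "http://plus.google.com/u/0/photos"
assert photos_url("http://plus.google.com", "u/0/photos") == \
    "http://plus.google.com/u/0/photos"
```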

## Manage test data carefully

### Avoid sharing input data between tests

As a general rule, avoid sharing input data between tests. Shared input data causes several problems:

- **Coupling.** Changing the shared data for one test risks breaking other tests that depend on it.
- **Unnecessary complexity.** Shared data must satisfy the needs of every test that uses it, which tends to make it more complicated than any single test requires.
- **Obscured cause and effect.** Shared data is often defined far from the tests that use it (e.g., in a `setUp` method at the top of the file), making it hard to see the connection between input and expected output.
- **Reduced coverage.** A well-written test suite should exercise a variety of inputs. Sharing the same data across tests means you are testing the same data repeatedly rather than exploring different scenarios.

Test fixtures (such as `setUp`/`tearDown` methods, or xUnit-style fixtures) are useful for sharing test _infrastructure_ -- things like database connections, mock servers, or temporary directories. But they should generally not be used to share test _input data_.
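A small `unittest` sketch of that split (the code under test, `count_lines`, is hypothetical): `setUp` provides only infrastructure -- a scratch directory -- while each test defines its own input data inline:

```python
import tempfile
import unittest
from pathlib import Path

def count_lines(path):
    """Hypothetical code under test: counts lines in a file."""
    return len(Path(path).read_text().splitlines())

class CountLinesTest(unittest.TestCase):
    def setUp(self):
        # Infrastructure shared by every test: a temporary directory.
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.tmp_dir = Path(tmp.name)

    def test_counts_lines_in_multiline_file(self):
        # Input data belongs to this test alone, not to setUp.
        path = self.tmp_dir / "input.txt"
        path.write_text("first\nsecond\nthird\n")
        self.assertEqual(3, count_lines(path))

    def test_empty_file_has_zero_lines(self):
        path = self.tmp_dir / "empty.txt"
        path.write_text("")
        self.assertEqual(0, count_lines(path))
```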

Sometimes the temptation to share input data arises because constructing the data is laborious. If so, write helper functions that make it easy to construct test data, rather than sharing a single instance. This approach is discussed further in the next section.

### Prefer DAMP over DRY

Production code benefits from the DRY principle ("Don't Repeat Yourself"), but test code has different priorities. Tests have no tests of their own, so they need to be easy for a reader to manually inspect for correctness. This often means accepting some repetition in exchange for clarity.

The DAMP principle ("Descriptive And Meaningful Phrases") favors readability over uniqueness in tests.

Bad -- DRY but hard to follow:

```python
def setUp(self):
    self.users = [User('alice'), User('bob')]
    self.forum = Forum()

def testCanRegisterMultipleUsers(self):
    self._RegisterAllUsers()
    for user in self.users:
        self.assertTrue(self.forum.HasRegisteredUser(user))

def _RegisterAllUsers(self):
    for user in self.users:
        self.forum.Register(user)
```

Good -- DAMP and easy to verify:

```python
def setUp(self):
    self.forum = Forum()

def testCanRegisterMultipleUsers(self):
    user1 = User('alice')
    user2 = User('bob')

    self.forum.Register(user1)
    self.forum.Register(user2)

    self.assertTrue(self.forum.HasRegisteredUser(user1))
    self.assertTrue(self.forum.HasRegisteredUser(user2))
```

In the DAMP version, everything needed to understand the test is right there in the test body. There is no need to cross-reference a `setUp` method, a helper method, or a loop. (See [Tests Too DRY? Make Them DAMP!](https://testing.googleblog.com/2019/12/testing-on-toilet-tests-too-dry-make.html).)

### Cleanly create test data

When test data is complex, resist the temptation to solve the problem with an ever-growing constructor or factory function:

```java
Company small = newCompany(2, 2, null, PUBLIC);
Company privatelyOwned = newCompany(null, null, null, PRIVATE);
Company bankrupt = newCompany(null, null, PAST_DATE, PUBLIC);
```

These calls are hard to read (what does the second `null` mean?) and the factory function accumulates conditionals and parameters over time.

A builder pattern (or equivalent in your language) keeps each test's data clean and self-documenting:

```java
Company small = newCompany().setEmployees(2).setBoardMembers(2).build();
Company privatelyOwned = newCompany().setType(PRIVATE).build();
Company bankrupt = newCompany().setBankruptcyDate(PAST_DATE).build();
Company arbitraryCompany = newCompany().build();
```

The `newCompany()` function returns a builder pre-populated with sensible defaults for required fields. Each test sets only the fields it cares about, and the meaning of each field is clear from the method name. (See [Cleanly Create Test Data](https://testing.googleblog.com/2018/02/testing-on-toilet-cleanly-create-test.html).)

In languages without a builder pattern, similar clarity can be achieved with keyword arguments, factory functions with named parameters, or helper functions that set specific fields.
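For instance, a Python analogue of the Java builder above might look like this (field names and defaults are hypothetical): a factory whose keyword arguments default to sensible values for an arbitrary company, so each test names only the fields it cares about:

```python
def new_company(employees=1, board_members=1, bankruptcy_date=None,
                company_type="PUBLIC"):
    """Hypothetical test-data factory with defaults for every field."""
    return {
        "employees": employees,
        "board_members": board_members,
        "bankruptcy_date": bankruptcy_date,
        "type": company_type,
    }

small = new_company(employees=2, board_members=2)
privately_owned = new_company(company_type="PRIVATE")
bankrupt = new_company(bankruptcy_date="2001-01-01")
arbitrary_company = new_company()

assert small["employees"] == 2
assert privately_owned["type"] == "PRIVATE"
assert bankrupt["bankruptcy_date"] == "2001-01-01"
```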

## Test your test helpers

If a helper function is written for tests -- to construct test data, to set up mock infrastructure, to perform a common assertion -- and it contains more than trivial logic, it should have its own tests. Untested helper functions are a liability: a bug in a helper can cause tests to silently pass when they should fail, or to fail in ways that are hard to diagnose. (The "Don't Put Logic in Tests" principle from the previous section applies here too: when logic is unavoidable, extract it into a helper and test the helper.)
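For example, a shared assertion helper with nontrivial logic deserves tests of both directions: that it accepts valid input, and that it actually fails on invalid input (a helper that never fails would make every caller's test silently pass). A hypothetical sketch:

```python
def assert_emails_sorted_by_date(emails):
    """Hypothetical shared assertion helper: verifies emails are in
    chronological order. A bug here (wrong field, off-by-one in the
    pairing) would corrupt every test that uses it."""
    for earlier, later in zip(emails, emails[1:]):
        assert earlier["date"] <= later["date"], (earlier, later)

# The helper's own tests: it accepts sorted input (including ties)...
assert_emails_sorted_by_date([{"date": 1}, {"date": 2}, {"date": 2}])
assert_emails_sorted_by_date([])  # and trivially-sorted edge cases

# ...and it rejects unsorted input:
raised = False
try:
    assert_emails_sorted_by_date([{"date": 2}, {"date": 1}])
except AssertionError:
    raised = True
assert raised
```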

## Use tests in code review

Tests can be a valuable tool during code review. As a reviewer, consider reading the tests first: they should tell you what the change is about and what the code is supposed to do. If the tests don't make this clear, that may indicate poorly specified or designed code. As a code author, write tests with the reviewer in mind -- they should serve as a readable specification of the change's intended behavior. (See [Test Driven Code Review](https://testing.googleblog.com/2010/08/test-driven-code-review.html).)

## See also

- [Test-driven development for AI agents](tdd-guide-for-ai-agents.md) -- guidance on executing the TDD workflow correctly, especially aimed at AI coding agents.

## References

The following Google Testing Blog posts informed this document:

- [Test Behavior, Not Implementation](https://testing.googleblog.com/2013/08/testing-on-toilet-test-behavior-not.html)
- [Test Behaviors, Not Methods](https://testing.googleblog.com/2014/04/testing-on-toilet-test-behaviors-not.html)
- [Don't Put Logic in Tests](https://testing.googleblog.com/2014/07/testing-on-toilet-dont-put-logic-in.html)
- [Writing Descriptive Test Names](https://testing.googleblog.com/2014/10/testing-on-toilet-writing-descriptive.html)
- [Prefer Testing Public APIs Over Implementation-Detail Classes](https://testing.googleblog.com/2015/01/testing-on-toilet-prefer-testing-public.html)
- [Keep Cause and Effect Clear](https://testing.googleblog.com/2017/01/testing-on-toilet-keep-cause-and-effect.html)
- [Cleanly Create Test Data](https://testing.googleblog.com/2018/02/testing-on-toilet-cleanly-create-test.html)
- [Keep Tests Focused](https://testing.googleblog.com/2018/06/testing-on-toilet-keep-tests-focused.html)
- [Tests Too DRY? Make Them DAMP!](https://testing.googleblog.com/2019/12/testing-on-toilet-tests-too-dry-make.html)
- [Test Driven Code Review](https://testing.googleblog.com/2010/08/test-driven-code-review.html)
