Testing Strategy for Enterprise Software: What Your Test Suite Should Actually Cover

A Nigerian fintech went live with an ERP integration after a development cycle that included, by every measurable standard, adequate testing: unit test coverage above 80%, a UAT cycle, and a staging environment. Three weeks after go-live, a specific sequence of transactions — a payment that triggered both an automated reconciliation and a concurrent manual adjustment — created an inconsistency that propagated through the ledger and required three days of manual correction.

No test covered this case, because nobody had anticipated the scenario during test planning. The suite had substantial coverage of lines of code. It had zero coverage of the scenario that actually failed.

This is the core problem with most enterprise software test strategies: they measure test coverage by the wrong metrics (code lines covered rather than business scenarios covered) and allocate testing effort to the wrong levels (unit tests are cheap to write and numerous; integration and end-to-end tests require more effort and are therefore underrepresented in most suites).

The Testing Pyramid (and Its Misapplication)

The testing pyramid is the standard framework for test strategy: many unit tests at the base, fewer integration tests in the middle, few end-to-end tests at the top. The ratio is recommended because unit tests are fast, cheap, and give specific diagnostic information when they fail, while end-to-end tests are slow, expensive, and more prone to flakiness.

The pyramid is sound advice, but it has been misapplied to produce codebases with hundreds of unit tests and minimal integration or end-to-end coverage. A common failure pattern:

500 unit tests covering individual component behaviour in isolation
15 integration tests that were written during an incident investigation and never expanded
2 end-to-end tests that sometimes pass in CI and are generally ignored when they don't

This shape of test suite provides strong confidence that individual components work in isolation. It provides weak confidence that the system works correctly when components interact — which is where most enterprise software failures occur.

What the Levels Actually Test

Unit tests verify that individual functions, classes, or components behave correctly given specific inputs, in isolation from other system components. They are appropriate for:

Business logic functions with defined inputs and outputs (calculate tax, validate IBAN format, format currency)
Utility and helper functions
Class behaviour tested against mocked dependencies
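A minimal sketch of such a unit test, in Python with the standard `unittest` module, for the IBAN example above. The `is_valid_iban` function is hypothetical and deliberately simplified: it checks the general IBAN shape and the ISO 13616 mod-97 checksum, but not the length or account format each country prescribes.

```python
import unittest


def is_valid_iban(raw: str) -> bool:
    """Simplified IBAN check: general shape plus the ISO 13616 mod-97 checksum.

    Assumption for brevity: country-specific lengths and formats are not checked.
    """
    iban = raw.replace(" ", "").upper()
    if not (15 <= len(iban) <= 34 and iban.isascii() and iban.isalnum()):
        return False
    if not (iban[:2].isalpha() and iban[2:4].isdigit()):
        return False
    # Move country code and check digits to the end, then map letters to numbers
    # (A=10, B=11, ..., Z=35); digits stay as they are.
    rearranged = iban[4:] + iban[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


class IbanValidationTest(unittest.TestCase):
    def test_accepts_known_valid_iban(self):
        self.assertTrue(is_valid_iban("GB82 WEST 1234 5698 7654 32"))

    def test_rejects_single_digit_typo(self):
        self.assertFalse(is_valid_iban("GB82 WEST 1234 5698 7654 33"))

    def test_rejects_malformed_input(self):
        self.assertFalse(is_valid_iban(""))
        self.assertFalse(is_valid_iban("12AB"))
```

The known-good value is the widely published example IBAN; the typo case changes one digit, which the mod-97 check is designed to catch. Run it with `python -m unittest` or any compatible runner.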

Unit tests are not appropriate for testing database interactions, API contracts, or workflow correctness — these have dependencies that mocking obscures.

Integration tests verify that multiple components work correctly together — service A can call service B and the response is correctly processed; database queries return the expected data given known database state; an API endpoint correctly orchestrates multiple service calls and returns the correct response.

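Integration tests are where most of the value lies for enterprise software, and they are where most test suites are underinvested. In the ERP example, an integration test that ran the payment alongside a concurrent reconciliation and manual adjustment against a real database would have exercised the actual state transitions and could have caught the concurrent modification problem before go-live. A unit test of either component with the database mocked could not, because the failure only exists when both run against shared state.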
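The opening example does not describe the fix, so what follows is only one possible design: optimistic concurrency, where each ledger row carries a `version` column and an update succeeds only if the version it read is still current. The table, columns, and functions are hypothetical, and SQLite stands in for the production database so the sketch is self-contained. The point is that the test drives two real connections through the interleaving that caused the incident.

```python
import os
import sqlite3
import tempfile
import unittest


class ConcurrentModification(Exception):
    """Raised when a ledger row changed after it was read."""


def read_entry(conn, entry_id):
    return conn.execute(
        "SELECT amount_kobo, version FROM ledger WHERE id = ?", (entry_id,)
    ).fetchone()


def adjust_entry(conn, entry_id, new_amount, expected_version):
    # Optimistic concurrency: update only if nobody has bumped the version since we read it.
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE ledger SET amount_kobo = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_amount, entry_id, expected_version),
        )
        if cur.rowcount != 1:
            raise ConcurrentModification(f"entry {entry_id} changed since it was read")


class ConcurrentAdjustmentTest(unittest.TestCase):
    def test_second_writer_is_rejected_not_silently_applied(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "ledger.db")
            setup = sqlite3.connect(path)
            with setup:
                setup.execute(
                    "CREATE TABLE ledger (id INTEGER PRIMARY KEY, "
                    "amount_kobo INTEGER, version INTEGER)"
                )
                setup.execute("INSERT INTO ledger VALUES (1, 500000, 1)")
            setup.close()

            reconciler = sqlite3.connect(path)  # automated reconciliation
            clerk = sqlite3.connect(path)       # manual adjustment
            try:
                # Both actors read the same starting state...
                _, v_rec = read_entry(reconciler, 1)
                _, v_clerk = read_entry(clerk, 1)
                # ...the reconciliation writes first...
                adjust_entry(reconciler, 1, 450000, v_rec)
                # ...so the manual adjustment must be rejected, not overwrite it.
                with self.assertRaises(ConcurrentModification):
                    adjust_entry(clerk, 1, 520000, v_clerk)
                self.assertEqual(read_entry(clerk, 1), (450000, 2))
            finally:
                reconciler.close()
                clerk.close()
```

Rejecting the stale write is one choice; a system could instead retry by re-reading the row. Either way, the test pins down which behaviour is intended.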

End-to-end tests verify complete user journeys through the system — a user can log in, create a purchase order, approve it, receive goods against it, and see the result correctly reflected in stock and in accounts. These tests run against a representative environment with realistic data.

End-to-end tests provide the highest confidence in system correctness and are the most likely to catch the class of bugs that cause production incidents, because they exercise the full system behaviour. They are expensive to maintain because they are sensitive to UI changes and environmental issues.

What Should Actually Be in Each Level

Unit Test Priority Areas

Not all code is equally worth unit testing. Prioritise:

Complex business logic: Calculations, formula implementations, state machines. If the logic involves many conditions, branches, or formula steps, unit tests that cover the cases are valuable.
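For state machines, the illegal transitions deserve as many tests as the legal ones. A minimal sketch, assuming a hypothetical purchase-order lifecycle encoded as a transition table (a real ERP will have more states):

```python
import unittest

# Hypothetical purchase-order lifecycle: state -> states it may move to.
ALLOWED_TRANSITIONS = {
    "draft": {"submitted", "cancelled"},
    "submitted": {"approved", "rejected"},
    "approved": {"goods_received", "cancelled"},
    "goods_received": {"invoiced"},
    "invoiced": {"paid"},
}


class InvalidTransition(Exception):
    pass


def transition(current: str, target: str) -> str:
    # States absent from the table (paid, cancelled, rejected) are terminal.
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target


class PurchaseOrderStateTest(unittest.TestCase):
    def test_happy_path_reaches_paid(self):
        state = "draft"
        for target in ["submitted", "approved", "goods_received", "invoiced", "paid"]:
            state = transition(state, target)
        self.assertEqual(state, "paid")

    def test_cannot_skip_approval(self):
        with self.assertRaises(InvalidTransition):
            transition("submitted", "goods_received")

    def test_terminal_states_have_no_exits(self):
        for terminal in ["paid", "cancelled", "rejected"]:
            with self.assertRaises(InvalidTransition):
                transition(terminal, "draft")
```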

Validation logic: Input validation, business rule enforcement, error condition handling. These are high-value test targets because validation failures often surface as data integrity issues rather than obvious application errors.

Data transformation functions: Functions that convert between data formats, parse external inputs, or map between internal models. Transformation errors are subtle and unit tests catch them efficiently.

Security-critical code: Authentication logic, authorisation checks, input sanitisation. Unit tests here provide fast feedback on regressions in security controls.

Integration Test Priority Areas

Database interactions: Tests that write known data to the database and verify that queries return the expected results. Verify not just that data is returned, but that the correct data is returned, in the correct format, under the correct conditions.
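A minimal sketch of such a test, assuming a hypothetical `invoices` table and `overdue_invoices` query. It seeds rows chosen to trip the query up (a paid invoice, a future due date, another customer's invoice) and asserts on the exact rows and order returned, including the boundary date. SQLite keeps the example self-contained; a real suite should run against the same database engine as production, because SQL dialects and date handling differ between engines.

```python
import sqlite3
import unittest


def overdue_invoices(conn, customer_id, as_of):
    """Hypothetical repository query: unpaid invoices due strictly before `as_of` (ISO date)."""
    return conn.execute(
        "SELECT invoice_no, amount_kobo, due_date FROM invoices "
        "WHERE customer_id = ? AND paid = 0 AND due_date < ? "
        "ORDER BY due_date, invoice_no",
        (customer_id, as_of),
    ).fetchall()


class OverdueInvoiceQueryTest(unittest.TestCase):
    def setUp(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE invoices (invoice_no TEXT, customer_id INTEGER, "
            "amount_kobo INTEGER, due_date TEXT, paid INTEGER)"
        )
        self.conn.executemany(
            "INSERT INTO invoices VALUES (?, ?, ?, ?, ?)",
            [
                ("INV-3", 7, 250000, "2024-03-01", 0),  # overdue
                ("INV-1", 7, 100000, "2024-01-15", 0),  # overdue, older
                ("INV-2", 7, 900000, "2024-02-01", 1),  # paid: must be excluded
                ("INV-4", 7, 50000, "2024-06-30", 0),   # not yet due
                ("INV-5", 8, 75000, "2024-01-10", 0),   # different customer
            ],
        )

    def tearDown(self):
        self.conn.close()

    def test_returns_only_unpaid_overdue_rows_in_due_date_order(self):
        rows = overdue_invoices(self.conn, customer_id=7, as_of="2024-04-01")
        self.assertEqual(rows, [("INV-1", 100000, "2024-01-15"),
                                ("INV-3", 250000, "2024-03-01")])

    def test_invoice_due_on_the_as_of_date_is_not_overdue(self):
        rows = overdue_invoices(self.conn, customer_id=7, as_of="2024-03-01")
        self.assertEqual(rows, [("INV-1", 100000, "2024-01-15")])
```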

API endpoints: Tests that call API endpoints with specific inputs and verify the full response — status code, response body structure, error handling for invalid inputs, authentication and authorisation enforcement. For a Nigerian fintech or enterprise system, API correctness is the primary correctness concern.
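The sketch below, using only the Python standard library, drives a hypothetical `POST /transfers` endpoint written as a plain WSGI app and asserts on status code and body for the success path, a missing credential, and invalid input. The endpoint, token scheme, and field names are invented for illustration. With a web framework you would normally use its own test client, and in a real integration test the handler would call real downstream services rather than returning a canned response.

```python
import io
import json
import unittest
from wsgiref.util import setup_testing_defaults

API_TOKEN = "test-token"  # hypothetical bearer token for the sketch


def app(environ, start_response):
    """Hypothetical endpoint: POST /transfers with a JSON body {"amount_kobo": int}."""

    def respond(status, payload):
        body = json.dumps(payload).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    if environ.get("HTTP_AUTHORIZATION") != f"Bearer {API_TOKEN}":
        return respond("401 Unauthorized", {"error": "unauthenticated"})
    length = int(environ.get("CONTENT_LENGTH") or 0)
    try:
        data = json.loads(environ["wsgi.input"].read(length) or b"{}")
    except ValueError:  # covers json.JSONDecodeError and bad UTF-8
        data = None
    if not isinstance(data, dict):
        return respond("400 Bad Request", {"error": "body must be a JSON object"})
    amount = data.get("amount_kobo")
    if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
        return respond("422 Unprocessable Entity",
                       {"error": "amount_kobo must be a positive integer"})
    return respond("201 Created", {"status": "pending", "amount_kobo": amount})


def call(body, token=API_TOKEN):
    """Invoke the WSGI app in-process; return (status_code, parsed_json_body)."""
    raw = json.dumps(body).encode("utf-8")
    environ = {}
    setup_testing_defaults(environ)
    environ.update({"REQUEST_METHOD": "POST", "PATH_INFO": "/transfers",
                    "CONTENT_LENGTH": str(len(raw)), "wsgi.input": io.BytesIO(raw)})
    if token:
        environ["HTTP_AUTHORIZATION"] = f"Bearer {token}"
    captured = {}

    def start_response(status, headers):
        captured["status"] = int(status.split()[0])

    body_bytes = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body_bytes)


class TransferEndpointTest(unittest.TestCase):
    def test_valid_request_is_created(self):
        self.assertEqual(call({"amount_kobo": 150000}),
                         (201, {"status": "pending", "amount_kobo": 150000}))

    def test_missing_token_is_rejected(self):
        self.assertEqual(call({"amount_kobo": 150000}, token=None)[0], 401)

    def test_invalid_amount_returns_structured_error(self):
        status, body = call({"amount_kobo": -5})
        self.assertEqual(status, 422)
        self.assertIn("error", body)
```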

External service integrations: Interactions with Paystack, Flutterwave, Mono, SMS gateways, email services. These should be tested with test-mode credentials against the actual external APIs, not mocked. Mocking the external service means the test cannot catch changes in the external API's behaviour.
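One way to keep such tests safe is to refuse anything but a test-mode key and to skip cleanly when no key is configured. The sketch below assumes Paystack's convention that test secret keys begin with `sk_test_` and calls its List Banks endpoint (`GET /bank`); the environment variable name is hypothetical, and the endpoint and response fields should be checked against Paystack's current API reference. The assertions pin only the fields the calling code depends on, so a change on Paystack's side fails this test rather than a production payment.

```python
import json
import os
import unittest
import urllib.request

PAYSTACK_BASE_URL = "https://api.paystack.co"


def require_test_mode_key(key):
    """Refuse to run integration tests with anything but a test-mode secret key."""
    if not key or not key.startswith("sk_test_"):
        raise RuntimeError("integration tests require a Paystack sk_test_ key")
    return key


def list_banks(secret_key, timeout=10.0):
    req = urllib.request.Request(
        f"{PAYSTACK_BASE_URL}/bank",
        headers={"Authorization": f"Bearer {secret_key}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


# PAYSTACK_TEST_SECRET_KEY is a hypothetical variable name; use your CI's secret store.
@unittest.skipUnless(os.environ.get("PAYSTACK_TEST_SECRET_KEY"), "no Paystack test key configured")
class PaystackBankListContractTest(unittest.TestCase):
    def test_bank_list_has_fields_we_depend_on(self):
        key = require_test_mode_key(os.environ["PAYSTACK_TEST_SECRET_KEY"])
        payload = list_banks(key)
        self.assertTrue(payload["status"])
        self.assertIsInstance(payload["data"], list)
        for bank in payload["data"]:
            # The calling code reads these fields; fail loudly if they disappear.
            self.assertIn("name", bank)
            self.assertIn("code", bank)
```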

Transaction boundaries: Scenarios that test that database transactions are correctly committed and rolled back — the concurrent modification scenario from the opening example is a transaction boundary test.
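Alongside concurrency, the most basic transaction-boundary test checks that a failure partway through a multi-step write leaves nothing half-applied. A minimal sketch, assuming a hypothetical `accounts` table and `post_journal` function, with SQLite standing in for the production database:

```python
import sqlite3
import unittest


def post_journal(conn, lines):
    """Hypothetical double-entry posting: every line commits together or none do."""
    if sum(amount for _, amount in lines) != 0:
        raise ValueError("journal does not balance")
    with conn:  # sqlite3 connection context: commit on success, roll back on exception
        for account, amount in lines:
            conn.execute(
                "UPDATE accounts SET balance_kobo = balance_kobo + ? WHERE code = ?",
                (amount, account),
            )
            row = conn.execute(
                "SELECT balance_kobo FROM accounts WHERE code = ?", (account,)
            ).fetchone()
            if row is None or row[0] < 0:
                raise ValueError(f"unknown account or overdraft on {account}")


class JournalRollbackTest(unittest.TestCase):
    def setUp(self):
        self.conn = sqlite3.connect(":memory:")
        with self.conn:
            self.conn.execute("CREATE TABLE accounts (code TEXT PRIMARY KEY, balance_kobo INTEGER)")
            self.conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                                  [("CASH", 10000), ("PAYABLES", 0)])

    def tearDown(self):
        self.conn.close()

    def balances(self):
        return dict(self.conn.execute("SELECT code, balance_kobo FROM accounts"))

    def test_failure_midway_leaves_no_partial_update(self):
        # The first line succeeds; the second would overdraw CASH and must undo the first.
        with self.assertRaises(ValueError):
            post_journal(self.conn, [("PAYABLES", 50000), ("CASH", -50000)])
        self.assertEqual(self.balances(), {"CASH": 10000, "PAYABLES": 0})

    def test_successful_posting_commits_all_lines(self):
        post_journal(self.conn, [("PAYABLES", 4000), ("CASH", -4000)])
        self.assertEqual(self.balances(), {"CASH": 6000, "PAYABLES": 4000})
```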

In our experience delivering enterprise software for Nigerian clients, the test that catches the most production issues is not the sophisticated integration test — it is the contract test that verifies the bank payment file format. Nigerian banks reject files for whitespace differences, missing headers, and date format variations. A contract test suite for each bank's bulk payment format has saved our clients more production incidents than any other single test category. If your system generates files for bank submission, testing those file formats against the bank's actual specification — byte by byte — should be the first integration test you write.
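A minimal sketch of a byte-level contract test. The file layout below is entirely hypothetical (pipe-delimited, `DD/MM/YYYY` dates, CRLF line endings, 30-character narration cap, ASCII only) and stands in for whatever your bank's specification says; `build_payment_file` is likewise illustrative. In practice the expected bytes would come from a sample file the bank has confirmed it accepts.

```python
import datetime
import unittest

# HYPOTHETICAL format for illustration only; follow your bank's published spec.
# Header: H|<batch ref>|<value date DD/MM/YYYY>|<record count>|<total in kobo>
# Detail: D|<account number>|<bank code>|<amount in kobo>|<narration, max 30 chars>
# Every line ends in CRLF, with no trailing spaces.


def build_payment_file(batch_ref, value_date, payments):
    total = sum(p["amount_kobo"] for p in payments)
    lines = [f"H|{batch_ref}|{value_date:%d/%m/%Y}|{len(payments)}|{total}"]
    for p in payments:
        narration = p["narration"].strip()[:30]
        lines.append(f"D|{p['account']}|{p['bank_code']}|{p['amount_kobo']}|{narration}")
    # encode("ascii") fails loudly on characters the (assumed) spec does not allow.
    return ("\r\n".join(lines) + "\r\n").encode("ascii")


class PaymentFileContractTest(unittest.TestCase):
    def test_matches_golden_file_byte_for_byte(self):
        payments = [
            {"account": "0123456789", "bank_code": "058", "amount_kobo": 1500000,
             "narration": "March salary "},
            {"account": "9876543210", "bank_code": "011", "amount_kobo": 250050,
             "narration": "Vendor INV-88"},
        ]
        expected = (
            b"H|BATCH001|31/03/2024|2|1750050\r\n"
            b"D|0123456789|058|1500000|March salary\r\n"
            b"D|9876543210|011|250050|Vendor INV-88\r\n"
        )
        actual = build_payment_file("BATCH001", datetime.date(2024, 3, 31), payments)
        self.assertEqual(actual, expected)

    def test_every_line_ends_in_crlf_without_trailing_spaces(self):
        data = build_payment_file("B2", datetime.date(2024, 1, 5), [
            {"account": "0123456789", "bank_code": "058", "amount_kobo": 100,
             "narration": "  x  "}])
        lines = data.split(b"\r\n")
        self.assertEqual(lines[-1], b"")  # file ends with a final CRLF
        for line in lines[:-1]:
            self.assertNotIn(b"\n", line)
            self.assertEqual(line, line.rstrip())
```

Comparing whole byte strings, rather than parsing the output back, is what catches a stray space or a `\n` where the spec demands `\r\n`.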

End-to-End Test Priority Areas

Critical user paths: The scenarios that, if broken, would cause the most business impact. For a payments platform: initiate payment → payment processes → balance updates correctly → confirmation is sent. For an ERP: create purchase order → approve → receive goods → invoice matched → payment released.

Authorisation scenarios: Verify that users can access what they are authorised to access and cannot access what they are not. A user in role X should be able to do Y and should not be able to do Z. These tests are easy to write but frequently omitted.
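Writing the expectations as an explicit table makes omissions visible, because every denial has to be listed. The sketch below checks a hypothetical in-memory policy so that it runs on its own; in an end-to-end suite each row would instead log in as a user with that role and attempt the action through the UI or API, expecting either success or a rejection such as HTTP 403.

```python
import unittest

# Hypothetical role/permission policy, for illustration only.
POLICY = {
    "clerk": {"create_po"},
    "approver": {"create_po", "approve_po"},
    "finance": {"release_payment"},
}


def can(role, action):
    return action in POLICY.get(role, set())


# Every (role, action) pair that matters is listed, including the denials.
EXPECTED = [
    ("clerk", "create_po", True),
    ("clerk", "approve_po", False),
    ("clerk", "release_payment", False),
    ("approver", "approve_po", True),
    ("approver", "release_payment", False),
    ("finance", "release_payment", True),
    ("finance", "approve_po", False),
    ("unknown_role", "create_po", False),
]


class AuthorisationMatrixTest(unittest.TestCase):
    def test_matrix(self):
        for role, action, allowed in EXPECTED:
            # subTest reports each failing row separately instead of stopping at the first.
            with self.subTest(role=role, action=action):
                self.assertEqual(can(role, action), allowed)
```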

Edge cases that caused incidents: Every production incident should generate a regression test covering the specific scenario that exposed the bug. This is not optional — it is the mechanism by which the test suite grows to reflect real failure modes.

The Coverage Metric That Actually Matters

Code coverage percentage is a useful but insufficient quality signal. 80% code coverage means 80% of lines were executed during testing — it says nothing about whether the executed lines were tested with the inputs that expose bugs.
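A small illustration of the gap: the hypothetical `format_naira` function below has a bug, yet a single passing test executes every one of its lines.

```python
import unittest


def format_naira(kobo: int) -> str:
    # Deliberate bug for illustration: no zero-padding, so 105 kobo renders as
    # "NGN 1.5"; negative amounts are also mishandled by floor division.
    return f"NGN {kobo // 100}.{kobo % 100}"


class FormatNairaTest(unittest.TestCase):
    def test_formats_amount(self):
        # Executes every line of format_naira (100% line coverage) and passes.
        self.assertEqual(format_naira(150), "NGN 1.50")
```

A test with a kobo part below 10 (for example, 105 kobo) would expose the missing padding; for non-negative amounts, formatting the remainder with `:02d` fixes it.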

More useful coverage metrics:

Business scenario coverage: Can you list the 50 most important user scenarios in your system? What percentage have at least one automated test that exercises them end-to-end?

Error path coverage: For every documented error condition (invalid input, system unavailability, insufficient permissions, business rule violation), is there a test that exercises that error condition and verifies the correct error handling?

Regression coverage: What percentage of historical production incidents have a corresponding regression test that would have caught them?

These metrics require deliberate tracking but provide genuine signal about the test suite's value.
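One lightweight way to do that tracking, sketched under the assumption that the team keeps a hand-maintained registry of scenarios next to the test suite (the `Scenario` type, the kind labels, and the test names are hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    name: str
    kind: str  # hypothetical labels: "business", "error_path", "incident"
    tests: list = field(default_factory=list)  # names of automated tests covering it


def coverage_report(scenarios):
    """Per kind: percentage of scenarios with at least one test, plus the gaps."""
    report = {}
    for kind in sorted({s.kind for s in scenarios}):
        group = [s for s in scenarios if s.kind == kind]
        uncovered = [s.name for s in group if not s.tests]
        report[kind] = {
            "percent": round(100 * (len(group) - len(uncovered)) / len(group), 1),
            "uncovered": uncovered,
        }
    return report


SCENARIOS = [
    Scenario("Initiate payment, balance updates, confirmation sent", "business",
             ["test_payment_happy_path"]),
    Scenario("Purchase order through to payment release", "business"),
    Scenario("Payment gateway times out", "error_path", ["test_gateway_timeout"]),
    Scenario("Reconciliation concurrent with manual adjustment", "incident"),
]

if __name__ == "__main__":
    for kind, result in coverage_report(SCENARIOS).items():
        print(f"{kind}: {result['percent']}% covered; gaps: {result['uncovered']}")
```

The "uncovered" lists are the useful output: they name the scenarios to write tests for next.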

Test Environment Parity

Tests that run in an environment that does not match production provide lower confidence than the numbers suggest. Common environment gaps:

Test database is empty or contains toy data; production database has millions of rows — performance characteristics and edge cases involving data volume are not covered
Test environment uses a mocked payment gateway; production uses the real one — the test cannot catch integration failures
Test environment does not have the same network constraints as production — the test cannot catch timeout or connectivity issues that affect Nigerian production deployments
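Some network constraints can be approximated locally. The sketch below, standard-library Python only, imitates a stalled upstream with a listening socket that never answers, and checks that a hypothetical client wrapper turns the timeout into a domain error instead of hanging or leaking a raw socket exception. Dedicated tools that inject latency and packet loss give a closer match to production conditions.

```python
import socket
import unittest
import urllib.request


class GatewayUnavailable(Exception):
    """Hypothetical domain error the rest of the system handles (retry, queue, alert)."""


def fetch_gateway_status(url, timeout=2.0):
    # Hypothetical client wrapper: translate low-level network failures into one error.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except OSError as exc:  # URLError (incl. HTTPError) and socket timeouts are OSErrors
        raise GatewayUnavailable(str(exc)) from exc


class SlowGatewayTest(unittest.TestCase):
    def test_unresponsive_gateway_raises_domain_error(self):
        # A listening socket that never calls accept(): the OS completes the TCP
        # handshake, but no HTTP response ever arrives, like a stalled upstream.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
            server.bind(("127.0.0.1", 0))
            server.listen(1)
            port = server.getsockname()[1]
            with self.assertRaises(GatewayUnavailable):
                fetch_gateway_status(f"http://127.0.0.1:{port}/status", timeout=0.2)
```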

Investment in parallel test infrastructure (a test environment that closely resembles production in data volume, service integrations, and network characteristics) is a prerequisite for high-confidence testing.

The Practical Implementation Path

For a Nigerian engineering team inheriting an enterprise system with limited existing tests:

Write regression tests for known incidents first: Cover the scenarios that have already caused problems. This immediately raises the floor of confidence.
Write integration tests for critical paths next: Identify the 10 most important user scenarios and write integration tests that cover them. For system-level confidence, these provide more value per test than unit tests.
Add unit tests for complex business logic: Identify the calculation-heavy, state-machine, or validation-heavy code and add unit tests where they don't exist.
Set up CI enforcement: Automated tests must run on every code push and must pass before merging to the main branch. Test suites that do not gate deployments do not provide the protection they appear to offer.
Create incident-driven test requirements: Establish as a team norm that every production incident generates at least one new test before the incident is closed.

Testing is not a one-time project. It is an ongoing practice that, done consistently, makes a system progressively more reliable.
