Top tips for generating API tests leveraging AI
Automated API testing has always been a balancing act: speed and coverage on one side; maintainability, determinism, and trust on the other. We wanted to find out what happens when you invite AI into this equation and ask it to write production-grade API tests inside a real microservice ecosystem, and to share our top tips from doing so.
This article shares the results of a hands-on experiment with GitHub Copilot in agent mode, showing how AI is moving beyond toy examples and becoming a reliable contributor to automated API testing, not in theory, but in practice.
Why This Experiment Mattered
AI-generated code is everywhere. Demos look impressive. Blog posts are optimistic. Yet one question keeps coming up in engineering teams: Can AI actually generate maintainable, framework-compliant API tests, or does it just look convincing until you run them?
Our goals were specific:
- Test AI-assisted API test generation in a real-world microservice architecture
- Use existing documentation only, no peeking at implementations
- Enforce strict framework rules, type safety, and deterministic behaviour
What we explicitly did not aim for:
- 100% test coverage
- Bug hunting
- Making tests “pass at any cost”
This was not about shortcuts. It was about discipline.

The Playground: A Real Microservice System
The system under test was a fully containerised online shop built on microservices:
- Independent services (users, cart, catalogue)
- A single API gateway routing all REST requests
- Services written in different languages
- An existing TypeScript-based API test automation framework
In other words, realistic complexity.
The challenge was obvious from the start:
API documentation existed, but it was incomplete, inconsistent, and unaware of the gateway layer. Which brings us to the most important rule of the experiment.
One Source of Truth. No Exceptions
For this experiment:
Service API documentation was the only source of truth.
No gateway docs.
No “tribal knowledge.”
No fixing tests to match reality.
If the behaviour through the gateway differed from the documentation, it was treated as a bug, not something Copilot should “work around”.
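To make the rule concrete, here is a minimal sketch of a check that reports deviations from the documentation instead of adapting to them. The `DocumentedResponse` and `ObservedResponse` shapes and the `findDeviations` helper are hypothetical names invented for illustration; they are not part of the framework used in the experiment.

```typescript
// Hypothetical shapes: none of these names come from the original framework.
interface DocumentedResponse {
  status: number;
  requiredFields: string[];
}

interface ObservedResponse {
  status: number;
  body: Record<string, unknown>;
}

// Returns every deviation from the documentation. An empty list means the
// observed behaviour matches what the service documentation promises.
function findDeviations(doc: DocumentedResponse, observed: ObservedResponse): string[] {
  const deviations: string[] = [];
  if (observed.status !== doc.status) {
    deviations.push(`status: documented ${doc.status}, observed ${observed.status}`);
  }
  for (const field of doc.requiredFields) {
    if (!(field in observed.body)) {
      deviations.push(`missing documented field "${field}"`);
    }
  }
  return deviations;
}
```

The expectation always comes from the documentation object, never from the observed response, so a mismatch through the gateway surfaces as a reported defect rather than a silently adjusted test.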
This single constraint defined everything:
- The scope of generated tests
- Their correctness
- Their limitations
And it exposed something crucial about AI-generated testing.
How We Actually Generated the Tests
Letting Copilot “just generate tests” doesn’t work. It hallucinates. It invents patterns. It ignores your framework. The real breakthrough came from adding a Framework Instruction Layer.
The Instruction Layer: The Missing Piece
Before generating a single test, we created a detailed set of rules describing:
- API client inheritance and structure
- Naming conventions and file layout
- Assertion patterns
- Error handling and status code validation
- Positive vs. negative flow structure
- Good and bad examples
These rules lived in Markdown files that Copilot was required to read before writing code. Think of it as teaching Copilot how your team thinks, not just what the API looks like. Without this layer, Copilot consistently:
- Invented new abstractions
- Misnamed entities
- Ignored shared utilities
- Produced unreviewable code
With it, the quality difference was dramatic.
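As an illustration of what a "good example" in such an instruction file might look like, the sketch below shows one possible client inheritance pattern. All names here (`BaseApiClient`, `CartApiClient`, `Transport`, the `/cart/items` path) are assumptions made for this example and are not taken from the actual framework.

```typescript
// Hypothetical names throughout: this is not the original framework's code.
interface ApiResponse<T> {
  status: number;
  body: T;
}

// The transport is injected, so the same client can target the gateway or a stub.
type Transport = (
  method: string,
  path: string,
  payload?: unknown,
) => Promise<ApiResponse<unknown>>;

abstract class BaseApiClient {
  private readonly basePath: string;
  private readonly transport: Transport;

  constructor(basePath: string, transport: Transport) {
    this.basePath = basePath;
    this.transport = transport;
  }

  // Every service client goes through this single method, so generated code
  // has no reason to invent its own HTTP helpers.
  protected async request<T>(
    method: string,
    path: string,
    payload?: unknown,
  ): Promise<ApiResponse<T>> {
    const response = await this.transport(method, `${this.basePath}${path}`, payload);
    // Unchecked cast: a real framework would validate the body against a schema.
    return response as ApiResponse<T>;
  }
}

interface CartItem {
  productId: string;
  quantity: number;
}

class CartApiClient extends BaseApiClient {
  constructor(transport: Transport) {
    super("/cart", transport);
  }

  addItem(item: CartItem): Promise<ApiResponse<CartItem>> {
    return this.request<CartItem>("POST", "/items", item);
  }
}
```

Pairing a pattern like this with a short "bad example" (for instance, a client that calls an HTTP library directly) gives the generator an explicit structure to imitate and one to avoid.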

Prompting Is Not a Detail, It’s the System
Once the framework rules were in place, test generation followed a strict, repeatable flow:
- Define the role – Copilot acts as a Senior SDET, not a code generator.
- Constrain the scope – One service, specific endpoints, and clear scenario types.
- Specify scenario depth – Positive, negative, and edge cases, with examples.
- Attach the API documentation – JSON files only. No hidden context.
- Validate and iterate – Review style, correctness, and coverage against documentation. Fix prompts, not the code, whenever possible.
Over time, this became less like “asking AI for help” and more like programming the generator itself.
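One way to treat the prompt as a program is to assemble it from structured parts. The sketch below is a hypothetical illustration: the `PromptSpec` fields, the file names in the usage note, and the exact wording of each prompt line are assumptions, not the prompts used in the experiment.

```typescript
// Hypothetical structure: field names and wording are illustrative only.
interface PromptSpec {
  role: string;
  instructionFiles: string[];
  service: string;
  endpoints: string[];
  scenarioTypes: string[];
  documentationFile: string;
}

// Builds one prompt per narrowly scoped task, in a fixed section order.
function buildPrompt(spec: PromptSpec): string {
  if (spec.endpoints.length === 0) {
    throw new Error("A prompt must name at least one endpoint");
  }
  return [
    `Role: ${spec.role}`,
    `Read first: ${spec.instructionFiles.join(", ")}`,
    `Scope: service "${spec.service}", endpoints ${spec.endpoints.join(", ")}`,
    `Scenarios: ${spec.scenarioTypes.join(", ")}`,
    `Source of truth: ${spec.documentationFile} (use nothing else)`,
  ].join("\n");
}
```

Calling `buildPrompt` with, say, role `"Senior SDET"`, service `"cart"`, and a single endpoint yields a five-line prompt whose structure never varies, which makes it easier to fix the prompt rather than the generated code when results drift.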
What Worked Surprisingly Well
- Generating typed API clients directly from documentation
- Producing consistent service-level tests aligned with the framework
- Reusing generated code across services for E2E scenarios
- Enforcing deterministic, independent tests
With the right constraints, Copilot behaved less like a junior developer and more like a very fast, literal senior engineer.
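Determinism and independence usually start with test data. As a minimal sketch, assuming each test has a unique name, a hypothetical `buildUser` factory can derive its data from that name instead of from random values or shared fixtures. The `NewUser` shape and the `example.test` domain are illustrative assumptions.

```typescript
// Hypothetical payload shape for a user-creation endpoint.
interface NewUser {
  username: string;
  email: string;
}

// Derives test data from the test's own name: no randomness, no shared state.
// The same test always produces the same user; different tests never collide
// as long as their names differ.
function buildUser(testName: string, index = 0): NewUser {
  const slug = testName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into one dash
    .replace(/^-|-$/g, ""); // trim a leading or trailing dash
  const username = `${slug}-${index}`;
  return { username, email: `${username}@example.test` };
}
```

Because the output depends only on the arguments, a failing test can be re-run in isolation and will send exactly the same request.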
Where Things Fell Apart
AI is not magic, and the cracks were instructive.
1. Prompt Size and Context Loss
Large prompts caused Copilot to:
- Forget earlier rules
- Duplicate entities
- Import unused or nonexistent code
Smaller, segmented tasks worked far better.
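Segmenting can be as simple as splitting the endpoint list into small batches and issuing one generation task per batch. The helper below is a generic sketch; the batch size is an assumption to tune, not a figure from the experiment.

```typescript
// Splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}
```

Each batch then becomes its own prompt, so the framework rules are restated alongside a short endpoint list instead of being buried under a long one.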
2. “Fixing” Failing Tests
When tests failed due to real system behaviour mismatches, Copilot often tried to:
- Relax assertions
- Remove validations
- Change expectations
In other words, create false positives. Human oversight is non-negotiable.
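Part of that oversight can be automated. The sketch below is a deliberately crude heuristic, assuming tests use `expect(...)`-style assertions: it flags any AI-proposed change that reduces the number of assertions in a test file. It cannot detect a weakened expectation (for example, a changed status code), so it supplements review rather than replacing it.

```typescript
// Counts `expect(` calls in a test file's source text.
// Assumption: the framework uses expect-style assertions.
function countAssertions(source: string): number {
  return (source.match(/\bexpect\s*\(/g) ?? []).length;
}

// True when the proposed version contains fewer assertions than the original.
function weakensAssertions(before: string, after: string): boolean {
  return countAssertions(after) < countAssertions(before);
}
```

A check like this could run on every AI-authored change to an existing test and route flagged changes to a human reviewer.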
3. Documentation Quality Is Everything
Incomplete or unclear documentation led directly to:
- Missing coverage
- Invalid assumptions
- Broken tests
AI doesn’t fill gaps responsibly; it guesses.
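One inexpensive safeguard is to scan the documentation for obvious gaps before handing it to the generator. The sketch below assumes a simplified, hypothetical `EndpointDoc` shape; real documentation files will be structured differently and would need a mapping step first.

```typescript
// Hypothetical, simplified view of one documented endpoint.
interface EndpointDoc {
  method: string;
  path: string;
  // Keyed by status code as a string, for example "200" or "404".
  responses: Record<string, { schema?: unknown }>;
}

// Lists gaps that would force the generator to guess.
function findDocumentationGaps(endpoints: EndpointDoc[]): string[] {
  const gaps: string[] = [];
  for (const endpoint of endpoints) {
    const label = `${endpoint.method} ${endpoint.path}`;
    const entries = Object.entries(endpoint.responses);
    if (entries.length === 0) {
      gaps.push(`${label}: no responses documented`);
      continue;
    }
    if (!entries.some(([status]) => status.startsWith("4"))) {
      gaps.push(`${label}: no client error response documented`);
    }
    for (const [status, response] of entries) {
      if (response.schema === undefined) {
        gaps.push(`${label}: response ${status} has no schema`);
      }
    }
  }
  return gaps;
}
```

Every reported gap is a place where negative or edge-case tests would otherwise rest on an invented assumption, so it is better resolved in the documentation than in the prompt.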

The Real Lesson: AI Amplifies Your Discipline
This experiment didn’t prove that AI can replace test engineers. It proved something more interesting:
AI magnifies the quality of your existing processes, good or bad.
- Strong frameworks → scalable test generation
- Clear documentation → reliable automation
- Vague rules → confident nonsense
Copilot didn’t remove the need for thinking. It punished the absence of it.
So… Can AI Write Your API Tests?
Yes, if you’re willing to do the hard work first. AI won’t save you from:
- Poor documentation
- Weak frameworks
- Inconsistent conventions
But if those foundations exist, it can:
- Speed up onboarding
- Standardise test quality
- Generate boilerplate at scale
- Free engineers to focus on intent, not repetition
The future of test automation isn’t “AI instead of engineers”. It’s AI guided by engineers who know exactly what they want.