Last Updated: May 29, 2026 at 09:30

Agile Spikes: How Engineering Teams Reduce Uncertainty and Make Better Technical Decisions

How engineering teams make better technical decisions before committing to them

A spike is an investment in decision quality. Its value lies partly in the output it produces and, even more, in how far it reduces the cost of being wrong. That framing changes how you plan spikes, run them, and judge whether they succeeded. This guide covers the full picture: what spikes are, when to use them, how to structure them for real outcomes, and what separates the teams that use them well from those that do not.

Spikes originated in Extreme Programming (XP), where teams used short, focused investigations to reduce technical uncertainty before committing to implementation. The idea was simple: if you do not know whether something is feasible, a brief exploration is worth more than a confident guess.

Spikes exist because some engineering decisions are expensive to reverse, and building on a flawed assumption costs far more than taking a few days to examine it properly. When the stakes are high enough, structured investigation is not overhead — it is responsible engineering.

One important nuance up front: a spike does not eliminate uncertainty — it reduces uncertainty to a level where a decision becomes safe to make. Teams that expect a spike to remove all doubt will either over-invest in investigation or feel perpetually unsatisfied with the outcome. The goal is not certainty. It is enough clarity to move forward confidently.

What a Spike Actually Is

A spike is a time-bounded investigation designed to answer specific questions so that future delivery work can proceed with greater confidence. Those questions typically fall into a few categories: can we build this at all, how should we build it, which approach involves acceptable tradeoffs, what are the risks we have not yet seen, and does this technology fit the way our organisation actually works. It is not a mini-project. It is not a feature with relaxed quality standards. It is not "we'll look into it."

The distinction matters because mischaracterising a spike changes how teams approach it. If a spike feels like partial feature work, teams will try to make it production-ready. If it feels like open-ended research, it will never end. A spike is neither. It is a structured attempt to answer defined questions within a defined window of time.

The central principle: spikes optimise for learning speed, not implementation quality. Tests may be minimal. Code coverage can be ignored. Hardcoded values are acceptable. Shortcuts are fine — provided the spike answers the questions it was created to answer.

Spike versus POC versus Prototype

These three terms are closely related, and the distinctions are worth being precise about. The clearest way to separate them is by audience and intent.

A spike is internal. It answers a question for the team so they can make a decision. It may produce nothing demo-worthy. Nobody outside the team may ever see it. The measure of success is whether the question got answered.

A proof of concept is often external. It demonstrates to a stakeholder that something can work. It proves viability rather than exploring options. Where a spike might compare three approaches, a POC usually validates one proposed direction.

A prototype sits somewhere between them — it is typically more complete than a spike, more interactive, and often used to test assumptions with users or stakeholders rather than to answer internal engineering questions.

The practical test: if you are showing it to someone outside the team to justify a direction, it is a POC or prototype. If you are using it to decide which direction to take, it is a spike.

Types of Spikes

Not all uncertainty is the same, and neither are spikes. The categories below cover the most common patterns, but in practice spikes often blend across types or surface uncertainty that does not fit neatly into any single category.

Technical spikes address implementation uncertainty. Can we build this? Which approach performs better? How do we integrate with this external system? Examples include evaluating Redis versus PostgreSQL for a caching layer, testing Kubernetes ingress patterns, or exploring authentication flows across multiple providers.

Functional spikes address product and requirements uncertainty. What should we build? What do users actually need? Examples include clarifying business rules with stakeholders, researching compliance requirements, or mapping the real user workflow before designing a feature.

Architectural spikes address structural uncertainty. Should we decompose this monolith? Is an event-driven approach feasible here? What are the real tradeoffs between these two system designs? These tend to be higher stakes because the decisions are harder to reverse.

UX and design spikes address interaction uncertainty. Does this onboarding flow work? Can we meet the accessibility constraints? Is this interaction model intuitive before we build the full thing?

Mixing spike types without acknowledging it is a common source of weak outcomes. A team running what they think is a technical spike may discover the real uncertainty is functional — and without naming that, they will investigate the wrong questions.
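For the technical category, spike code is often little more than a throwaway measurement script. The sketch below is one possible shape, assuming Python and only the standard library. The two candidates (an in-process dict and an in-memory SQLite table) are stand-ins chosen so the script runs anywhere; they are not the systems named above, and a real spike would swap in clients for whatever is actually being compared. The helper name time_calls and the hardcoded key are illustrative.

```python
import sqlite3
import statistics
import time


def time_calls(fn, repeats=200):
    """Return the median wall-clock time of fn() in microseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e6)
    return statistics.median(samples)


# Stand-in candidates (assumptions, not the real systems under evaluation).
cache = {f"user:{i}": i for i in range(1000)}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v INTEGER)")
db.executemany("INSERT INTO kv VALUES (?, ?)", cache.items())


def dict_lookup():
    # Hardcoded key: acceptable in spike code, not in production code.
    return cache["user:500"]


def sqlite_lookup():
    row = db.execute("SELECT v FROM kv WHERE k = ?", ("user:500",)).fetchone()
    return row[0]


results = {
    "dict": time_calls(dict_lookup),
    "sqlite": time_calls(sqlite_lookup),
}
for name, micros in results.items():
    print(f"{name}: median {micros:.2f} us")
```

The numbers from a script like this only answer the narrow question it was written for. Operational fit, cost, and team capability still need to be judged separately.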

The Spike as a Decision System

A powerful way to think about spikes is as a structured pipeline from uncertainty to decision — not just a research activity, but a mechanism for reaching a specific conclusion. Each spike should move through the following stages:

1. Identify the uncertainty — what specific unknowns are blocking a reliable decision or estimate?

2. Choose the spike type — is this a technical, functional, architectural, or design question? Define it clearly before starting.

3. Define the questions — precise, answerable questions with explicit success criteria. The difference between a weak spike and a strong one often comes down to this step. "Investigate GraphQL" is not a question — it is a topic. "Can GraphQL support our mobile aggregation use case without introducing unacceptable query complexity or caching overhead, given our current backend architecture?" is a question. It has a scope, a constraint, and a definition of what an acceptable answer looks like.

4. Run the timeboxed investigation — explore, build, test, or analyse within the agreed window.

5. Evaluate the outcome — assess what was learned against the questions defined at the start. This goes beyond "does it work?" A technology can be technically functional but operationally painful, expensive to maintain, beyond the team's current capability, or incompatible with existing architecture. The evaluation should explicitly surface the drawbacks discovered and ask: given our context, can we tolerate these? A slow query that appears in benchmarks may be acceptable for an internal tool and unacceptable for a customer-facing API. A steep learning curve may be fine for a platform team and prohibitive for a product team under delivery pressure. Tolerance is always contextual. This step is about analysis — the team thinking honestly together about what was found. It sets up the decision but does not make it yet. Finding that a technology has real drawbacks is not a failure — a spike that surfaces serious problems is doing exactly what it should.

6. Decide — where the analysis becomes a commitment. The team names a conclusion, documents it, and identifies the next action. This is what separates a spike from an investigation that simply expires. The following cover the most common outcomes depending on the type of spike, but the right conclusion is always whatever honestly reflects what was learned:

Proceed with a clear recommendation
Reject the approach with documented reasoning
Adopt partially — the approach works for a defined scope but not the full original intent
Run a focused follow-up spike on a remaining unknown
Move to a limited production pilot for validation at scale
Revisit later — the approach has merit but is not ready yet due to team capability gaps, ecosystem immaturity, or missing infrastructure
Publish implementation guidance — when the spike was about how to use a technology rather than whether to use it, the output is recommended patterns, configuration choices, pitfalls to avoid, and a clear implementation path for the team
Confirm requirements or business rules — when a functional spike resolves ambiguity about what to build, the output is agreed scope, clarified acceptance criteria, or refined user stories ready for implementation
Validate or discard an interaction model — when a UX or design spike tests an interface approach, the output is a justified design direction, evidence of what works for users, and a clear rationale for what was set aside
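The evaluate and decide stages can be sketched as a small data model. This is a minimal illustration, assuming Python 3.9 or later. The class names, field names, and the rule inside suggest_outcome are all hypothetical and deliberately simplified: the function only covers three of the nine outcomes, and the real decision remains a team judgement.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    """The outcomes listed above; member names are illustrative."""
    PROCEED = "proceed with a clear recommendation"
    REJECT = "reject with documented reasoning"
    ADOPT_PARTIALLY = "adopt for a defined scope"
    FOLLOW_UP_SPIKE = "focused follow-up spike"
    PILOT = "limited production pilot"
    REVISIT_LATER = "revisit later"
    PUBLISH_GUIDANCE = "publish implementation guidance"
    CONFIRM_REQUIREMENTS = "confirm requirements or business rules"
    VALIDATE_INTERACTION_MODEL = "validate or discard an interaction model"


@dataclass
class Finding:
    """One drawback surfaced during evaluation (stage 5)."""
    description: str
    tolerable: bool  # judged against the team's own context


@dataclass
class SpikeResult:
    questions_answered: bool
    findings: list[Finding] = field(default_factory=list)


def suggest_outcome(result: SpikeResult) -> Outcome:
    """Hypothetical rule of thumb for stage 6, not a substitute for discussion."""
    if not result.questions_answered:
        return Outcome.FOLLOW_UP_SPIKE
    if any(not finding.tolerable for finding in result.findings):
        return Outcome.REJECT
    return Outcome.PROCEED


# The same finding can be tolerable in one context and not in another.
internal_tool = SpikeResult(True, [Finding("slow query in benchmarks", tolerable=True)])
customer_api = SpikeResult(True, [Finding("slow query in benchmarks", tolerable=False)])
print(suggest_outcome(internal_tool).name)
print(suggest_outcome(customer_api).name)
```

Keeping tolerable as an explicit field forces the team to record a judgement for each drawback rather than leaving it implicit.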

When to Use a Spike

The case for a spike strengthens when uncertainty meaningfully affects delivery. Some common signals — though not an exhaustive list:

Estimates are unreliable because the approach is unclear
Feasibility is genuinely unknown
The implementation cost is high and the architectural impact is large
Multiple valid approaches exist with unclear tradeoffs
The technology is unfamiliar to the team
External integrations carry significant risk
Rolling back is expensive if the wrong path is chosen

Choosing an event streaming platform, selecting an authentication provider, deciding whether to decompose a monolith, or introducing AI tooling into production workflows — these are the kinds of decisions where a spike pays for itself quickly.

The reason is straightforward: the earlier a decision is made with confidence, the cheaper it is to act on. Changing direction during a spike costs hours. Changing direction mid-implementation costs weeks. A spike is not just an investigation — it is an early decision mechanism, and that is precisely where its return on investment comes from.

The importance of the decision should determine the investment in the spike. Not every uncertainty justifies structured investigation.
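One rough way to size that investment is a back-of-envelope expected-cost calculation. The model below is an assumption for illustration, not a method from this guide: it supposes the team can estimate the chance the untested approach is wrong, the rework cost if it is, and the chance the spike would catch the problem. The function name and parameters are hypothetical.

```python
def spike_net_benefit_days(p_wrong, rework_days, spike_days, detection_rate=1.0):
    """Expected days saved by running the spike before committing.

    expected saving = P(wrong) * P(spike catches it) * rework cost - spike cost
    """
    return p_wrong * detection_rate * rework_days - spike_days


# High-stakes choice: 30% chance of being wrong, 30 days of rework, 2-day spike.
print(round(spike_net_benefit_days(0.3, 30, 2, detection_rate=0.8), 1))  # 5.2

# Low-risk change: 5% chance of being wrong, 3 days of rework, 2-day spike.
print(spike_net_benefit_days(0.05, 3, 2) < 0)  # True: cheaper to just build it
```

The inputs are guesses, so the result is only as good as they are. The point is the comparison: the spike pays off when being wrong is both plausible and expensive.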

There is also a subtler case: sometimes teams cannot clearly identify their risks because the technology is too unfamiliar. That inability to name the unknowns is itself a signal that investigation is warranted.

When NOT to use a spike

Small UI tweaks, isolated CRUD changes, reversible implementation details, and standard framework functionality do not need spikes. But there are also better alternatives to consider before defaulting to one:

Just build it iteratively — for low-risk work, incremental delivery surfaces real problems faster than investigation
Architectural discussion only — a focused team conversation or design session may be enough for well-understood patterns
Lightweight investigation — a few hours of reading documentation or running a quick experiment does not always need the formality of a spike story
ADR without a spike — if the team already has enough experience to document a decision with confidence, write the ADR directly

Running a spike for low-risk, well-understood work creates process overhead without corresponding value. Mature teams use spikes selectively rather than reflexively — and that selectivity itself reflects a skill: the ability to tolerate some ambiguity without requiring formal investigation. Teams that over-spike often lack confidence in their own judgement, not information.

How to Write a Good Spike Story

A vague spike creates vague learning. The most common failure in spike execution happens before the spike begins, when nobody clearly defines what questions need answering.

A well-formed spike story includes:

Problem statement — what uncertainty is blocking progress?
Specific questions to answer — not "investigate GraphQL" but "determine whether GraphQL can support our mobile aggregation use case without introducing unacceptable query complexity or caching overhead"
Explicit unknowns — what assumptions are being made that could be wrong?
Constraints — what are the non-negotiables (cost, compliance, existing architecture, team capability)?
Timebox — how long will this investigation run?
Success criteria — how will the team know the spike is complete?
Expected deliverables — what does done look like?

The shift from "investigate X" to "answer these specific questions about X within this timeframe and produce this output" is the difference between a spike and a research black hole.
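The story template above can be captured as a small structure with a readiness check. This is a sketch assuming Python 3.9 or later. The field names mirror the list, while the check for topic-style wording is a crude, hypothetical heuristic rather than a real test of question quality.

```python
from dataclasses import dataclass


@dataclass
class SpikeStory:
    problem_statement: str
    questions: list[str]
    unknowns: list[str]
    constraints: list[str]
    timebox_days: float
    success_criteria: list[str]
    deliverables: list[str]

    def problems(self) -> list[str]:
        """Return reasons this story is not ready to start (empty if ready)."""
        issues = []
        if not self.problem_statement.strip():
            issues.append("missing problem statement")
        if not self.questions:
            issues.append("no questions defined")
        for question in self.questions:
            # Assumed heuristic: "Investigate X" names a topic, not a question.
            if question.lower().startswith(("investigate", "look into", "explore")):
                issues.append(f"topic, not a question: {question!r}")
        if self.timebox_days <= 0:
            issues.append("no timebox")
        if not self.success_criteria:
            issues.append("no success criteria")
        if not self.deliverables:
            issues.append("no deliverables")
        return issues


vague = SpikeStory("", ["Investigate GraphQL"], [], [], 0, [], [])
for issue in vague.problems():
    print(issue)
```

A check like this could run when the story is created, so that a vague spike is caught before the timebox starts rather than after it expires.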

Timeboxing and Estimation

Spikes should be estimated in time, not story points. The work is exploratory by nature, and the point of the spike is often to enable estimation of the work that follows — estimating the spike in story points introduces a circular problem.

How long a spike takes depends less on its type and more on the people running it. A senior engineer deeply familiar with the domain may answer in hours what takes a less experienced engineer days. The quality of the questions defined upfront matters too: a well-scoped spike with clear boundaries moves faster than one where the team is still discovering mid-investigation what it is trying to learn. The spike type can tell you roughly how much surface area the investigation covers, but not how quickly a given team will cover it. The only honest approach is to agree a timebox based on the specific questions, the people involved, and the acceptable level of remaining uncertainty, and then hold to it.

The timebox is not just a planning constraint. It is a forcing function. A team that knows they have two days to answer a question will make prioritisation decisions that an open-ended investigation never forces. When the timebox expires, the team should be able to present findings — even if those findings are "we need a second, more focused spike on this specific aspect."

Longer spikes increase the risk of uncontrolled exploration. When a spike starts to feel comfortable, it has usually drifted into open-ended research or unplanned feature work.

Who Should Run a Spike

Solo spikes are the most common starting point, but they are not always the best one.

When one engineer runs a spike alone, the team gets one perspective, one set of assumptions, and one bias toward familiar technologies. The learning stays concentrated. The rest of the team has to take the outcome on trust rather than understanding it.

Paired spikes are better for most situations. Two engineers can explore different paths simultaneously, challenge each other's assumptions, and arrive at a more robust recommendation. The knowledge distribution is better, and adoption of the decision tends to be stronger when more people helped shape it.

For architectural or high-stakes technology decisions, a cross-functional spike is worth considering. Involving product, QA, security, or platform engineers exposes assumptions that pure development thinking misses entirely. A technology that looks elegant to developers may create operational nightmares, compliance issues, or QA complexity that would have been caught in the spike if the right people were in the room.

Collaborative spikes also serve a second purpose: they are alignment tools. Teams often skip spikes not because they lack uncertainty, but because they assume everyone already agrees on the approach. Often they do not. Running a spike together frequently uncovers that team members hold different mental models of the architecture, different assumptions about constraints, and different risk tolerances. Discovering those disagreements during a spike is cheap. Discovering them during implementation is expensive.

Think Beyond the Code

A natural starting point is asking: can we write the code? But the more valuable questions go further.

Can we deploy it? Can we monitor it? Can we scale it under real load? Can we troubleshoot it at 2am during an incident? Can new engineers learn it in a reasonable time? What does the support model look like in 18 months?

A spike that validates technical implementation but ignores operational reality produces a false sense of confidence. The full lifecycle matters: development, testing, CI/CD, deployment, observability, security, operations, incident response, onboarding, cost, and long-term ownership.

This is especially true for technology adoption decisions. A technology can be technically impressive but culturally incompatible, financially unsustainable, or operationally burdensome. A spike should evaluate licensing cost, infrastructure and running costs, ongoing maintenance burden, operational maturity, learning curve, debugging complexity, ecosystem maturity, hiring implications, documentation quality, and whether the technology fits the organisation's actual engineering maturity — not just an idealised version of it.

Sometimes the spike discovers that the problem is not the technology itself. The problem is missing platform maturity, insufficient monitoring, weak operational ownership, or a deployment process that cannot support the new system. Those are valuable findings, even though they are uncomfortable ones.

Defining Success Criteria and What Good Output Looks Like

A spike without success criteria is a spike with no definition of done. Teams should agree before the spike begins on what a successful outcome looks like — and that answer should be tied to the questions defined in the spike story, not to the quality of any code produced.

Useful spike deliverables include: a clear recommendation (adopt, reject, or adopt conditionally), documented tradeoffs, an architecture sketch, benchmark results, a risk assessment, refined implementation stories, an estimated complexity for the work ahead, and identified unknowns that remain.

Equally important are the unplanned discoveries — things the investigator came across that were not in the original questions but are too significant to ignore. A spike on caching strategy might surface an undocumented rate limit on a third-party API. An architectural spike might reveal that a dependency the team assumed was stable is no longer maintained. A technology evaluation might uncover a licensing constraint that changes the economics of the decision entirely. These findings were not on the radar when the spike started, but they belong in the output. A good spike report should explicitly include a section for unexpected learnings alongside the answers to the original questions — because sometimes what was not being looked for turns out to matter most.

An Architecture Decision Record is often the right format for architectural spikes. It creates a durable artefact that explains not just what was decided but why, and what alternatives were rejected. This becomes especially valuable when engineers leave, when decisions are revisited, or when other teams face similar choices.
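As one possible shape for that artefact, the sketch below renders a plain-text ADR from spike findings. The section names follow a common ADR layout (status, context, decision, consequences). The "Alternatives rejected" and "Unexpected learnings" sections, the function name, and the sample values are assumptions added for illustration.

```python
def render_adr(title, context, decision, rejected, consequences, unexpected=()):
    """Render a plain-text ADR from spike findings.

    rejected: list of (alternative, reason) pairs.
    unexpected: findings that fell outside the original spike questions.
    """
    lines = [
        f"ADR: {title}",
        "Status: Proposed",
        "",
        "Context",
        context,
        "",
        "Decision",
        decision,
        "",
        "Alternatives rejected",
    ]
    lines += [f"- {name}: {reason}" for name, reason in rejected]
    lines += ["", "Consequences"]
    lines += [f"- {item}" for item in consequences]
    if unexpected:
        lines += ["", "Unexpected learnings"]
        lines += [f"- {item}" for item in unexpected]
    return "\n".join(lines)


# Hypothetical values, loosely based on the caching example above.
adr = render_adr(
    title="Caching strategy for the product API",
    context="Read latency was blocking estimation of the catalogue work.",
    decision="Adopt approach A for read-heavy endpoints only.",
    rejected=[("Approach B", "operational overhead too high for the team")],
    consequences=["Cache invalidation needs an owner"],
    unexpected=["Third-party API has an undocumented rate limit"],
)
print(adr)
```

Giving unexpected learnings their own section keeps them from being lost among the answers to the planned questions.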

If nobody uses the output, the spike did not happen in any meaningful sense. Findings that stay in one engineer's head, or a document that nobody reads, represent wasted investment.

How Spikes Fail

Understanding the failure modes is as important as understanding the method. These are the patterns that appear most frequently, though any spike without clear questions, a timebox, and a defined output is vulnerable in its own way.

Permanent research mode is the most common failure. Teams investigate endlessly without converging on a decision. This usually happens when success criteria were not defined upfront, or when the team is unconsciously using the spike to avoid committing to a direction.

Hidden development work is the second most common failure. The spike quietly becomes feature implementation, bypassing normal quality standards, architecture review, and stakeholder visibility. This creates technical debt and governance problems that surface later.

No knowledge sharing means one engineer learns something valuable and the rest of the team learns nothing. The spike produces individual knowledge rather than organisational knowledge.

Using spikes to avoid estimation is a cultural symptom worth naming explicitly. Teams that over-spike often have an unsafe estimation environment where people fear committing to numbers. The spike becomes a loophole rather than a genuine uncertainty-reduction mechanism.

Over-spiking is its own failure mode, and it reflects a maturity problem in reverse. When teams spike everything instead of building uncertainty tolerance, they create process overhead and slow delivery. Some uncertainty is normal and should be addressed through iterative implementation, not structured investigation. Mature teams spike less not because they have less uncertainty, but because they have developed the confidence to tolerate more ambiguity before it becomes genuinely decision-blocking.

A spike should not be treated as a mechanism for quietly avoiding commitment. It should end with a decision, or a clear articulation of what additional investigation is needed to reach one.

When One Spike Leads to Another

Spikes are not always one-time exercises. It is common — and perfectly normal — for a spike to uncover new unknowns that justify a follow-up investigation.

An initial spike might validate that a technology is technically feasible while leaving operational concerns unresolved. Scalability might remain untested at meaningful load, or security implications that nobody anticipated might emerge mid-investigation. Each of these can justify a focused second spike rather than proceeding blind.

The key is that follow-up spikes should be narrower and more targeted than the original. Each one should answer specific residual questions rather than reopening the broader investigation. The chain — Spike → Follow-up Spike → Pilot → Decision — is a legitimate and often appropriate progression for high-stakes architectural choices. It becomes a problem only when the chain is used to avoid decision-making indefinitely rather than to reach a genuinely better-informed conclusion.

Communicating Spike Outcomes

Stakeholders and product managers often perceive spikes as the team "not doing real work." Engineering leads need to be able to translate spike outcomes into language that makes sense to non-technical audiences.

The framing that works: a spike is an investment that prevents a larger, costlier mistake. "We spent two days learning that approach X would have caused us to rebuild the data layer three months from now" is a compelling outcome. So is "we validated that this technology is the right choice, which means the next six weeks of implementation work can proceed confidently."

Spike findings should be shared beyond the immediate team. A short demo, a written summary, or a recorded walkthrough turns a single team's learning into organisational knowledge.

Spikes as Organisational Memory

A spike generates far more knowledge than the decision it leads to. During the investigation, the team learns how to work with a technology, which features are well-suited to the use case and which are not, what the best tooling and configuration choices look like, where the sharp edges are, and how the documentation holds up against real problems. That accumulated practical knowledge has value well beyond the immediate question the spike was created to answer.

When that learning is documented, it becomes a reusable asset. The next engineer evaluating the same technology does not start from zero. Onboarding is faster. Follow-up spikes are more focused. Implementation work begins with hands-on context rather than theory.

The risk is that none of this gets recorded. When spike learning lives only in one engineer's head, the organisation loses it the moment that person moves on, changes team, or simply forgets. Undocumented spikes create repeated research costs — rejected approaches get rediscovered, failed experiments repeat, and teams pay the price of investigations they already ran.

Documented spike outcomes — ADRs, summary documents, benchmark results, feature evaluations, tooling recommendations, recorded demos, annotated code, linked wiki pages — create durable organisational memory. Future teams can understand not just what was decided, but what was tried, what worked well, what did not, and why certain paths were not taken. That context is often more valuable than the decision itself.

This is one of the strongest arguments for treating spikes as a first-class engineering activity rather than informal background work. The learning belongs to the organisation, not to the individual who did the investigation.

A Checklist for Healthy Spikes

Before starting:

Is the uncertainty genuinely blocking estimation or design?
Are the specific questions clearly defined?
Is the output format agreed upon?
Is the timebox set and committed to?
Are success criteria defined?
Are the right people involved?

At the end:

Were the original questions answered?
What was learned — including what was ruled out?
What problems were found, and can the team tolerate them?
What is the recommendation: proceed, reject, adopt partially, follow-up spike, pilot, or revisit later?
What risks remain?
Is the outcome documented and shared?
What are the next delivery steps?

The Real Output Is Organisational Learning

Code produced during a spike is usually throwaway. The real output is better decisions, reduced uncertainty, shared understanding, and implementation confidence. A spike that produces a clean recommendation and a team that understands why — even if the code is deleted — has delivered more value than a spike that produces elegant prototype code but leaves the team uncertain about what to build next.

Spikes are not about writing software. They are a structured pipeline from uncertainty to decision — and the measure of that pipeline's success is not what was built during the investigation, but the quality of what gets built afterwards.

When they are run well — with clear questions, appropriate timeboxes, honest tolerance assessments, the right people involved, and structured knowledge sharing — spikes make subsequent delivery calmer, more predictable, and more likely to succeed. The uncertainty does not disappear, but its cost moves from mid-delivery crisis to pre-delivery learning.

That is the investment worth making.

About N Sharma

Lead Architect at StackAndSystem

N Sharma is a technologist with over 28 years of experience in software engineering, system architecture, and technology consulting. He holds a Bachelor’s degree in Engineering, a DBF, and an MBA. His work focuses on research-driven technology education—explaining software architecture, system design, and development practices through structured tutorials designed to help engineers build reliable, scalable systems.

Disclaimer

This article is for educational purposes only. AI-powered generative tools were used to help with formatting and language flow. While we strive for accuracy, this content may contain errors or omissions and should be independently verified.