What Is User Testing and How Does It Work With Prototypes?


What this article is about
What user testing actually is, the relationship between testing and prototypes, why testing matters more than most owners assume, the kinds of testing worth knowing, the difference between what people say and what they do, the components of a useful test, how many users you actually need, the common failures, and a practical process for running tests at small-business scale. Written for owners who are commissioning websites, products, or features and want to make sure they are building the right thing.

User testing is one of those practices that small business owners hear about, vaguely accept is important, and quietly assume belongs to companies with budget for that sort of thing. The picture brought to mind is of a research lab somewhere, with one-way mirrors and clipboards, watching strangers click through software. The actual practice is much more modest, much more accessible, and considerably more useful than most owners realise. A few hours of focused user testing on a prototype regularly saves weeks of building the wrong thing.

The reason user testing matters is not complicated. Most websites, products, and services contain assumptions about how users will behave. Some assumptions are correct; others are not. The ones that are not become problems only when real users encounter them — and by then, the work has already been built. User testing is the practice of finding out which assumptions are wrong while changes are still cheap. It exists, in effect, to move the cost of fixing problems from the expensive stage (after launch) to the cheap stage (during prototyping). The discipline is straightforward; the leverage is substantial.

What User Testing Actually Is

User testing is the practice of putting a real person — someone who fits the actual audience for the artefact — in front of a prototype, website, product, or service and watching how they use it. Not asking them what they think. Watching what they do.

The distinction matters. What people say about a product and what they do with it are often different things. A user can confidently tell you that a checkout flow is clear, then take two minutes to find the button. A user can say they would never use a feature, then use it three times during the session. The behaviour is the signal; the commentary is secondary, useful for context, less useful for design decisions.

A useful user test consists of a small set of elements. A real artefact — a prototype, a working site, a paper sketch, anything the user can interact with. A real person from the target audience — not a colleague, not a friend who fits the rough demographic, an actual person who would plausibly be a user. A specific task — something the user is asked to accomplish, which gives the test a shape. Quiet observation while they attempt the task. A short conversation afterwards to clarify what was happening at moments of confusion or hesitation.

That is most of what user testing is. The form is simple. The information it produces is disproportionate to the effort.

The Relationship Between User Testing and Prototypes

Prototypes exist, in part, to be tested. Building a prototype creates an artefact that can be put in front of users before the production version exists. The testing reveals which parts of the design work and which do not. Those discoveries inform the production build.

This is why prototyping and user testing are more closely linked than most other pairs of design disciplines. A prototype that is never tested has produced only internal opinions about how the design might work. A prototype that has been tested has produced evidence about how the design actually works in the hands of users. That evidence is what lets the production build proceed with confidence or, more usefully, with the corrections that the testing surfaced.

The most common version of this relationship in small business work is the website prototype. A designer produces an interactive prototype that shows how the new site will look and behave. Before the prototype is approved and the production build begins, the prototype is shown to a few users, who are asked to complete a few representative tasks. The friction points become visible. The prototype gets adjusted. The production build proceeds with a design that has been validated rather than guessed at.

The same relationship applies to product prototypes, service-flow prototypes, app prototypes, and physical product prototypes. The testing is what turns the prototype from a deliverable into a decision.

Why User Testing Matters

The case for user testing rests on a simple asymmetry. Fixing a problem in a prototype is cheap. Fixing the same problem after launch is expensive.

The cheap stage. The problem exists in the prototype. The designer adjusts the design. The change takes hours. No one outside the team has yet seen the problem. The cost is minimal.

The middle stage. The problem exists in the production build, pre-launch. The change requires rework, possibly across multiple files, possibly with new approvals. The cost is several times higher.

The expensive stage. The problem exists in the live product or website. Users are encountering it. Some are leaving without completing their task. Revenue is being lost. Support tickets are accumulating. Fixing the problem now requires a coordinated update, with quality assurance, with deployment, with communication about the change. The cost is many times higher again — and that is before counting the value lost during the period when the problem was live.

User testing exists to move problems from the expensive stage to the cheap stage. It is not foolproof. Some problems escape testing and surface only at launch. The proportion that get caught early is what produces the value, and that proportion is substantial.

The owners who are most surprised by user testing are usually the ones who have been through a launch where a serious usability problem went undetected until users encountered it. After that experience, the case for testing becomes obvious. The honest reframe is to make that case before the expensive lesson, rather than after.

The Kinds of User Testing Worth Knowing

User testing comes in several forms, and the differences are worth knowing because they suit different situations.

Moderated testing. A researcher (or any attentive person, including the founder) is present with the user during the session — in person or via video call. The moderator gives the user the task, watches them attempt it, and asks follow-up questions afterwards. This is the most common kind of small-business user testing, and the most generally useful. The moderator can probe at moments of confusion and adjust the session as the testing reveals what matters.

Unmoderated testing. The user receives instructions and completes the task without a moderator present. Screen recording captures what they do. This is faster and easier to scale than moderated testing (a session that would take an hour with a moderator can be completed by a user in twenty minutes on their own time), but the moderator’s ability to probe and clarify is lost.

In-person testing. The user is physically present. Useful for products or services that exist in physical space, or when the moderator needs to observe behaviours that do not show up on screen. More logistically demanding than remote testing, and rarely necessary for digital products.

Remote testing. The user is at home, on their own equipment, joining via video call. The default for most digital product testing. Less demanding logistically, and often produces more natural behaviour because the user is in their own environment.

Formative testing. Testing conducted during the design process to inform decisions. The goal is to discover problems and improve the design. Most prototype testing is formative.

Summative testing. Testing conducted near the end of a design process to evaluate whether the final design meets a defined standard. Less common in small business work, more common in enterprise and regulated industries.

For most small business situations, moderated remote testing in a formative mode is the right starting point. It is accessible, the friction is low, and the insights tend to be substantial.

What People Say vs What People Do

A useful distinction to internalise: user testing observes behaviour, not opinion.

The reason this matters is that human commentary about products is unreliable in a specific way. Users want to be helpful and polite. They tend to predict that they will use products more than they actually do. They tend to evaluate designs in light of features rather than in light of the experience of using those features. They tend to provide feedback that is shaped by what they think the team wants to hear.

The behaviour during a session, by contrast, is closer to evidence. If a user struggles to find a button, the button is hard to find — regardless of what they say about the design afterwards. If a user completes a task quickly and confidently, the design works for that task — regardless of any criticisms they offer during the debrief. The behaviour does not have to be interpreted; it is what happened.

This is why “we asked users what they think” is not user testing. Surveys, feedback forms, focus groups, and conversational interviews all produce opinions about products. Useful for some purposes, less useful for design. User testing produces behaviour, which is the harder, more reliable signal.

The implication for designing tests: ask users to do, not to evaluate. “Buy a small bag of coffee” is a task. “What do you think of the buy button?” is a question. The task produces behaviour; the question produces commentary. Strong tests are built around tasks.

The Components of a Useful Test

A user test that produces useful information has a few specific components in place.

A real prototype, however rough. The test artefact must be something the user can actually interact with. A static design that they cannot click into produces only commentary. A clickable prototype, even a simple one, produces behaviour. The fidelity does not need to be high — paper prototypes and rough wireframes can produce useful testing — but interactivity matters.

A real user from the target audience. Not a colleague. Not a friend who is “kind of like” the audience. Someone who fits the actual customer profile. This is one of the most consistent failure modes — testing the design with people who are not the audience produces signals that do not transfer to the people who actually will be the audience.

A specific task. The user has something to do. The task is realistic — something a user of this product would actually try to accomplish — and is described clearly without describing the design. “Find a pair of running shoes for someone with a wide foot” is a task. “Try out the filtering options on the website” is a tour. Tests are built around tasks, not tours.

Quiet observation. The moderator says as little as possible during the session. The user encounters the design as they would in real life — without help, without prompts, without explanations. The moderator’s job is to watch.

Follow-up conversation. After the task, the moderator asks a few open-ended questions to clarify what was happening at confusing moments. “When you paused there, what were you thinking?” “What did you expect to happen when you clicked that?” The follow-up adds context to the behaviour.

A test missing any of these components produces less useful information. A test that has all of them — even at small scale — produces evidence the team can act on.

How Many Users You Actually Need

A common worry: how many people need to be tested for the results to be meaningful? The answer is reassuringly small.

For most small business testing, five users is enough to identify the majority of usability problems. The often-cited research, conducted by Jakob Nielsen and others over many years, shows that five users typically reveal around eighty percent of the issues a larger test would surface. Going from five to fifteen users reveals progressively less new information per user. The first few users find the big issues; additional users yield diminishing returns.
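The diminishing-returns claim can be made concrete with the simple model usually associated with Nielsen's work: if each user independently runs into a given problem with probability p, then n users run into it at least once with probability 1 - (1 - p)^n. The sketch below is an illustration under stated assumptions, not a prediction for any particular project. The function name is made up for this example, and the default rate of 0.31 is the average often quoted from that research; the real rate for your design is unknown and may be quite different.

```python
def share_of_problems_found(users: int, per_user_rate: float = 0.31) -> float:
    """Expected share of usability problems seen at least once by `users` testers.

    Assumes every tester has the same, independent chance (`per_user_rate`)
    of hitting any given problem. 0.31 is an assumed average, not a measurement.
    """
    if users < 0:
        raise ValueError("users must be zero or more")
    if not 0 <= per_user_rate <= 1:
        raise ValueError("per_user_rate must be between 0 and 1")
    return 1 - (1 - per_user_rate) ** users


if __name__ == "__main__":
    # Print the curve for a few test sizes to show how it flattens out.
    for n in (1, 3, 5, 10, 15):
        print(f"{n:>2} users: {share_of_problems_found(n):.0%}")
```

With that assumed rate, five users come out at roughly 84 percent, and each additional user adds less than the one before.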

This finding consistently surprises owners who assume user testing requires statistical samples. It does not. The kinds of insights user testing produces — where users get confused, what they expect, where the design’s logic does not match their mental model — emerge from a small number of careful observations, not from large samples.

The practical implication: a small business can run useful user tests with five users per test. The recruitment is manageable, the time investment is contained, and the information produced is substantial.

For tests where the goal is to confirm a specific quantitative claim — say, that a checkout flow has fewer than two percent abandonment — larger samples are required. For tests where the goal is to find usability problems and inform design decisions, five users is usually enough.
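To see why a quantitative claim needs far more than five users, consider the most favourable case: every session succeeds and nobody abandons. The sketch below asks how high the true abandonment rate could still be after a run of clean sessions. It assumes independent sessions, zero observed abandonments, and a 95 percent confidence level; both function names are hypothetical and exist only for this example. If any abandonments are observed, the required sample is larger still.

```python
import math


def upper_bound_with_no_failures(users: int, confidence: float = 0.95) -> float:
    """Highest true failure rate still plausible after `users` failure-free sessions."""
    # Solve (1 - rate) ** users = 1 - confidence for rate.
    return 1 - (1 - confidence) ** (1 / users)


def users_needed(max_rate: float, confidence: float = 0.95) -> int:
    """Failure-free sessions needed before a rate below `max_rate` is supported."""
    # Smallest n with (1 - max_rate) ** n <= 1 - confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_rate))


if __name__ == "__main__":
    print(f"5 clean sessions rule out rates above {upper_bound_with_no_failures(5):.0%}")
    print(f"Clean sessions needed to support 'under 2%': {users_needed(0.02)}")
```

Under these assumptions, five clean sessions only rule out abandonment rates above roughly 45 percent, and about 149 clean sessions would be needed before a claim of "under two percent" is supported. That gap is the difference between finding problems and measuring rates.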

How to Write a Good Test Task

The task is the part of the test that most often goes wrong, because writing tasks is harder than it looks. A few principles.

The task describes the user’s goal, not the design’s affordances. “Find a contact form” is a task that names the design element. “Get in touch with the business to ask about their services” is a task that names the user’s goal. Users do not arrive at websites looking for “contact forms” — they arrive looking to get in touch with the business. The task should describe the goal, not the means.

The task is realistic. It should be something a real user might actually try to accomplish. Made-up scenarios produce unnatural behaviour; realistic scenarios produce useful signal.

The task does not lead the user. “Use the search bar to find a blue jumper” tells the user where to start. “Find a blue jumper” lets the user decide whether the search bar is the right starting point. Leading tasks hide design problems by telling the user how to use the design.

The task is specific enough to be testable. “Browse the site” produces no usable signal. “Find a product you would actually consider buying for someone you know” produces behaviour the team can interpret.

The task does not test multiple things at once. Each task should focus on one user goal. Compound tasks — “sign up, then buy a product, then leave a review” — produce results that are hard to attribute to specific design elements.

Three to five tasks per session, with the most important ones first, is a reasonable structure. A session of that size lasts thirty to sixty minutes: short enough that the user does not get tired, long enough that the team gets enough information.
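The principles above can be turned into a rough self-check for draft tasks. The sketch below is only a crude wording heuristic, not a substitute for reading each task critically. The word lists and the function name are hypothetical, chosen for this example; adapt them to the vocabulary of your own product.

```python
# Hypothetical word lists for illustration only; extend them for your product.
DESIGN_TERMS = ("search bar", "contact form", "button", "menu", "filter", "dropdown")
COMPOUND_MARKERS = (" then ", " after that ")
VAGUE_OPENERS = ("browse", "explore", "try out", "look around")


def review_task(task: str) -> list[str]:
    """Return warnings for a draft test task, based on simple wording checks."""
    cleaned = task.lower().strip()
    padded = f" {cleaned} "  # padding lets marker matches work at the edges
    warnings = []
    if any(term in padded for term in DESIGN_TERMS):
        warnings.append("names a design element; describe the user's goal instead")
    if any(marker in padded for marker in COMPOUND_MARKERS):
        warnings.append("compound task; keep to one user goal per task")
    if cleaned.startswith(VAGUE_OPENERS):
        warnings.append("too vague to produce behaviour the team can interpret")
    return warnings


if __name__ == "__main__":
    drafts = [
        "Use the search bar to find a blue jumper",
        "Find a blue jumper",
        "Sign up, then buy a product, then leave a review",
        "Browse the site",
    ]
    for draft in drafts:
        print(draft, "->", review_task(draft) or "no warnings")
```

A task that passes these checks can still be unrealistic or leading in subtler ways, so treat an empty result as a prompt for a second read rather than as approval.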

The Common Failures

A few failure patterns recur often enough to be worth naming.

Testing too late. The design is already in production. The testing happens because someone insisted, but the cost of fixing what is found has already become high. The testing should happen on prototypes, not on launched products.

Leading the user. The moderator helps when the user gets stuck. Tells them where to click. Explains what the design was meant to do. The user finishes the task successfully, but the test has produced nothing useful — the moderator’s intervention has masked the design problem.

Testing the wrong people. The five users are all internal colleagues, family members, or friends who do not fit the actual audience. Their feedback is goodwill rather than evidence. The audience the product was built for would behave differently.

Fixing every small complaint. The testing reveals minor frustrations alongside the major problems. The team tries to address every comment, which dilutes the design and slows the project. The discipline is to prioritise the patterns — the issues that multiple users encountered, in similar ways, at similar points — and to leave one-off comments for later.

Never testing at all. The most common failure. The team intends to test, never schedules it, and launches without doing it. The launch reveals the problems that testing would have caught earlier, at substantially higher cost.

Treating testing as validation rather than discovery. The team runs the test hoping to confirm that the design works. When the test reveals problems, the team is disappointed rather than informed. The reframe is to treat testing as discovery — the test exists to surface what the team did not yet know.

Skipping the follow-up. The user completes the tasks; the session ends. The follow-up conversation that would have clarified the confusing moments is omitted. The behavioural signal is rich, but the context that would have helped interpret it is missing.

Each of these failures is fixable. The most useful starting point is to plan testing into the project early enough that the results can actually inform the design.

A Practical Process for Running a Small Business User Test

For a small business about to run its first or next round of user testing, a workable sequence.

Decide what you want to learn. The testing exists to answer specific questions about the design. “Can users complete the onboarding flow without help?” “Do users understand what the business does within the first thirty seconds on the homepage?” Vague tests produce vague results.

Recruit five users from the actual audience. Not perfect users — close-enough users. Recent customers can work. Friends-of-customers can work. Professional recruiting services exist and are reasonable for higher-stakes tests. The audience match matters more than the recruitment elegance.

Write three to five tasks. Real tasks, framed as user goals, not as design instructions. The tasks should cover the most important things the design needs to support.

Prepare the prototype. Make sure it is interactive enough to be tested. Make sure the tasks the users will attempt actually work in the prototype.

Schedule short sessions. Forty-five minutes is plenty for most testing. The sessions can run over a few days; doing them all in one day is intense and tends to produce diminishing attention.

Run the sessions. Watch quietly. Take notes on what users do, where they hesitate, what they say at moments of confusion. Resist the urge to help.

Watch for patterns. After all five sessions, identify the issues that multiple users encountered. These are the patterns worth acting on. One-off comments may be useful too, but the patterns are higher priority.

Share what you found. Brief the design team, the agency, or whoever is making the design changes. The findings should drive specific revisions, not become a presentation.

Iterate. Adjust the design based on what the testing surfaced. If the changes are substantial, retest. If they are smaller, proceed with the build.

The whole process can be completed in two weeks for most small business situations. The investment is modest. The amount of downstream cost it removes is substantial.
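The "watch for patterns" step in the sequence above can be sketched as a simple tally. This assumes the observer has written each participant's issues as short, consistently worded labels; the function name, the participant IDs, and the sample notes are all invented for illustration and are not real findings.

```python
from collections import Counter


def recurring_issues(
    session_notes: dict[str, list[str]], min_users: int = 2
) -> list[tuple[str, int]]:
    """Return issues hit by at least `min_users` participants, most common first."""
    counts: Counter[str] = Counter()
    for issues in session_notes.values():
        # Count each issue once per participant, even if noted twice.
        counts.update(set(issues))
    return [(issue, n) for issue, n in counts.most_common() if n >= min_users]


if __name__ == "__main__":
    # Made-up notes from five hypothetical sessions.
    notes = {
        "P1": ["missed checkout button", "unsure what 'Plans' means"],
        "P2": ["missed checkout button"],
        "P3": ["missed checkout button", "wanted a phone number"],
        "P4": ["unsure what 'Plans' means"],
        "P5": [],
    }
    for issue, n in recurring_issues(notes):
        print(f"{n} of {len(notes)} users: {issue}")
```

Issues that fall below the threshold are not discarded in practice; they simply wait behind the patterns, in line with the priority described above.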

Key Takeaways

  • User testing is the practice of putting real users in front of an artefact and watching how they use it — not asking what they think.
  • Prototypes exist partly to be tested before production; the testing is what turns the prototype from a deliverable into a decision.
  • The case for user testing rests on asymmetry: fixing problems in prototypes is cheap; fixing the same problems after launch is expensive.
  • The kinds of testing worth knowing include moderated vs unmoderated, in-person vs remote, formative vs summative — most small businesses should start with moderated remote formative testing.
  • What people say and what people do are different; testing observes behaviour, which is the more reliable signal.
  • A useful test has a real prototype, a real user from the audience, a specific task, quiet observation, and a follow-up conversation.
  • Five users typically reveal around eighty percent of usability issues; larger samples produce diminishing returns for most small business purposes.
  • Good test tasks describe user goals rather than design affordances, are realistic and specific, do not lead the user, and test one thing at a time.
  • Common failures include testing too late, leading the user, testing the wrong people, fixing every small complaint, never testing at all, treating testing as validation rather than discovery, and skipping the follow-up.
  • A practical small-business testing process can be run in two weeks with five users and a few well-designed tasks.

A note from SWL
The simplest version of this practice — five users, a clickable prototype, four well-written tasks, a quiet observer — is achievable for almost any small business and almost any project. The cost is a few hours of focused work. The downside protection is the avoidance of building the wrong thing at considerably greater expense. If you are about to commission a website, product, or feature and wondering whether user testing belongs in the plan, the honest answer is almost always yes. We are happy to help you think through where it would fit best whenever it would be useful.
