Top-down categorisation leads to weaker analysis. Take a bottom-up approach instead, using three simple rules for group size, outliers, and labels.
Early in my career I was working with a team to analyse a couple of dozen discovery interviews. We’d decided to affinity map the data, which resulted in hundreds of sticky notes, covering several metres of wall space. To tackle that much data, we analysed it as a whole team: researchers, designers, developers and a product manager. As I was off in one corner of the wall poring over a handful of notes, the rest of the team very quickly sorted the data into about 8 groups. Job done. But this didn’t feel right; we’d spent weeks recruiting participants, running the research, and writing up the notes. Did we really do all that to just learn 8 things?
Those 8 groups were noun buckets: pre-named categories like “pain points”, “needs” and “opportunities”. You’ve probably used similar categories yourself. I have, many times. But I’ve come to find they can hold back the quality of analysis if you rely on them too heavily.
Most techniques for analysing user research data involve some form of grouping individual data points (i.e. quotes and observations). There are two ways to approach grouping: top-down or bottom-up.
- Top-down is where you start with some kind of framework or set of categories that you use for organising data points
- Bottom-up is where you let patterns emerge by grouping data points that share a concept, creating categories after the fact
What my team had done was quickly determine a set of high-level categories and then use those to group the data. They’d taken a top-down approach.
Why top-down is less effective
Top-down is tempting. It’s generally much quicker because you’re not having to interrogate the meaning of each note; you just need to decide whether it matches an existing category. It also takes a lot less mental effort. It’s great for things like usability testing, where you might be looking for a relatively narrow set of data points.
But it has some pitfalls, especially when you’re working in more open-ended research contexts. The biggest risk is that by providing categories up front, much of the analytical process is a foregone conclusion. There’s a danger you’re sorting data into a limited set of preconceived buckets, rather than doing a meaningful analysis. There are a few reasons why this can make your analysis less effective:
- Unexpected findings that do not fit into existing categories fall by the wayside. You miss out on unintuitive things that can lead to breakthrough improvements in design. We do user research because we don’t presume to know everything about our users. Why then would we presume to know every conceivable category their experience could fit into?
- People don’t usually express themselves in ways that match our categories. Imagine you’re running a series of discovery interviews. You want to decide if there’s an opportunity to build a product or service that addresses a real need. To do this, you have to understand users’ experiences, goals, contexts, pain points and mental models. You could have a bucket for each of those things, but human beings are not usually so dull as to articulate their lives in user researcher-speak. You need to synthesise these concepts from what participants said and did in the research.
- Nuance is lost. Different underlying reasons get collapsed into a single label. Context gets stripped away: when something happens, for whom, and under what conditions. Contradictions and tensions get smoothed over. Basically, you lose a lot of the detail you need to make effective design decisions. Even if you break down larger groups, you’re usually only breaking them down within the original buckets. Once the nuance has been flattened across different categories it’s hard to reconstruct.
Bottom-up analysis, in practice
The basic approach is simple enough:
- Go through individual data points one by one
- Each time you see a data point that conceptually relates to another, group them together
- Once you have enough data points in a group, give it a name that captures that shared concept
The specific mechanics of this differ based on your analysis method. If you’re doing affinity mapping, this means reading one sticky note at a time and only moving them when you find another one that has a similar meaning. If you’re doing thematic analysis, this means working through highlighted passages one at a time, adding a label when a passage says something new, and only reusing an existing label when it’s genuinely the same underlying idea.
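If it helps to see that loop spelled out, here’s a minimal sketch in Python. It’s purely illustrative: the data structures are my own invention, and the `shares_concept` judgment is the human act of interpretation, which I’ve faked here with crude keyword overlap just so the code runs.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    """A single data point: a quote or an observation."""
    text: str

@dataclass
class Group:
    """An emergent cluster of conceptually related notes; named later."""
    notes: list[Note] = field(default_factory=list)
    label: str | None = None  # written only once the group has formed

def shares_concept(note: Note, group: Group) -> bool:
    """Crude stand-in for the human judgment: does this note express the
    same underlying idea as the notes already in the group? Real analysis
    does this by interpretation; keyword overlap is a deliberate fake."""
    words = set(note.text.lower().split())
    return any(words & set(other.text.lower().split()) for other in group.notes)

def bottom_up(notes: list[Note]) -> tuple[list[Group], list[Note]]:
    """One pass through the data, one note at a time."""
    groups: list[Group] = []
    ungrouped: list[Note] = []
    for note in notes:
        # Does it belong with a group that has already formed?
        match = next((g for g in groups if shares_concept(note, g)), None)
        if match:
            match.notes.append(note)
            continue
        # If not, does it pair with a still-ungrouped note?
        partner = next((n for n in ungrouped
                        if shares_concept(note, Group([n]))), None)
        if partner:
            ungrouped.remove(partner)
            groups.append(Group([partner, note]))  # a new group is born
        else:
            ungrouped.append(note)  # leave it alone for now
    return groups, ungrouped
```

Notice that categories only ever appear as an output of this loop, never as an input.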
But by saying it’s simple, I don’t mean that it’s easy - it’s actually quite hard. So here are some tips to help you along.
Three tips to keep you out of the noun buckets
Set a cap on the number of items in a group
Even if you go bottom-up, there is a risk that you just balloon up into big, vague buckets anyway. My favourite way to prevent this is to set a limit on how many data points can be in a single group. I’ve been doing this ever since reading it in Contextual Design by Karen Holtzblatt and Hugh Beyer [1]:
We limit each first-level group to four notes to force the team to look deeply and make more distinctions than they would otherwise be inclined to. It pushes more of the knowledge up into the group labels.
More on those group labels in a minute.
Four items per group can be a bit too brutal. I often go for six, but feel free to experiment with what works for you and your team. The important thing is that you’re enforcing a certain level of granularity. In my experience, this is the biggest single process improvement you can make if you’re worried about coming up with broad, hard-to-action findings.
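In code terms, the cap is just a guard on the grouping step. Building on the sketch above, the `MAX_GROUP_SIZE` constant and `add_with_cap` helper are my own invention for illustration:

```python
MAX_GROUP_SIZE = 6  # Holtzblatt and Beyer use four; six is my usual compromise

def add_with_cap(note: Note, group: Group, ungrouped: list[Note]) -> None:
    """Only add to a group that still has room. A full group is a prompt
    to make a finer distinction, not a bucket to keep filling."""
    if len(group.notes) < MAX_GROUP_SIZE:
        group.notes.append(note)
    else:
        # The group is at the cap: this note deserves a more specific
        # home, so set it aside to seed a new group instead.
        ungrouped.append(note)
```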
Don’t force every data point into a group
If you’re going bottom-up, you’ll eventually find you have notes that just don’t seem to have any conceptual similarity to any others. I often see people shoehorn these into groups where they don’t really fit, which is another source of vague findings.
Not every note needs to be grouped. Jiro Kawakita, the inventor of affinity mapping, called these “lone wolves” [2]. Sometimes they might be interesting observations that stand on their own. Other times they’re noise. In either case, don’t distort a group to fit them in.
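Continuing the sketch from earlier, lone wolves fall out of the process naturally: they’re whatever is left ungrouped at the end of the pass. The only extra step is to review them deliberately (`all_notes` here is a stand-in for your full pile of notes):

```python
# Continuing the earlier sketch; `all_notes` is your full set of Notes.
groups, lone_wolves = bottom_up(all_notes)

# Lone wolves are a first-class output, not an error to fix. Review them
# deliberately: some stand alone as findings, some are noise, but none
# get shoehorned into the nearest ill-fitting group.
for note in lone_wolves:
    print(f"Review separately: {note.text}")
```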
Group labels should clearly capture the idea
Giving each group a name is the point in the process where you’re creating new knowledge about your users. Coming back to Contextual Design:
When well written, the labels tell a story about the user, structuring the problem, identifying specific issues, and organizing everything we know about that issue. The labels represent new information in an affinity.
You should be able to write a succinct, narrative summary that encapsulates every note in the group. If this proves particularly hard, it’s an indication that the group doesn’t represent a single, coherent idea.
You know you’ve done this well when you can trace each high-level finding back to the specific groups, and the individual quotes and observations, that support it.
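One way to picture that traceability is as a simple chain from finding to groups to raw notes. This sketch reuses the `Note` and `Group` classes from earlier; the `Finding` class and the example data are invented purely to show the shape of the evidence trail:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """A high-level finding that stays traceable to its evidence."""
    statement: str
    supporting_groups: list[Group]

    def evidence(self) -> list[str]:
        """Walk the chain back down: finding -> groups -> raw notes."""
        return [note.text
                for group in self.supporting_groups
                for note in group.notes]

# Invented example data, just to show the shape of the trail.
stale_docs = Group(
    notes=[Note("I never know if a wiki page is still true"),
           Note("Asked a teammate rather than trust the manual")],
    label="People bypass the docs because they can't tell what's current",
)

finding = Finding(
    statement="New starters rely on colleagues, not documentation",
    supporting_groups=[stale_docs],
)
print(finding.evidence())  # every quote behind the claim, on demand
```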
A clear signal you’re not doing this is that you’re pouring more time and effort into writing up your findings than you put into the analysis itself. The core narrative should already be written in the group labels; the slide deck or report is just presentational polish. I’ve seen teams do two hours of analysis, then spend three days struggling to structure and restructure their findings and insights in a slide deck. They didn’t realise that they were still trying to analyse their data, just moving slides and text boxes around instead of sticky notes.
Try this on your next project
Doing your analysis bottom-up is slower. That’s the point. It forces you to make distinctions, hold onto context, and think carefully about what deserves to be a group. This is the stuff that analysis is made of.
Try it out on your next project. Start from the unstructured data. Cap the size of groups. Allow lone wolves. Dedicate time to encapsulating meaning in group labels. Hopefully you’ll find, like I have, that your research outputs are far more insightful than before.
(Does this sound like something you want to do, but there’s no way you can spare the time? Stakeholders need the report tomorrow morning? I’ve been there too. I’m building Stitchwork to speed up this kind of analysis: bottom-up, nuanced findings that trace back to the underlying evidence. You can join the waiting list below.)
Notes
- Beyer, H., & Holtzblatt, K. (1998). Contextual design: Defining customer-centered systems. Morgan Kaufmann.
- Scupin, R. (1997). The KJ method: A technique for analyzing data derived from Japanese ethnology. Human Organization, 56(2), 233–237.