How to minimize human bias in design testing (using benchmarks to track UI progress)

UI design happens at the intersection of business needs and user needs. It’s easy to see whether business needs have been met: a simple checklist will work.

User needs are different: measuring them requires reliable tools. For decades, design professionals have used the System Usability Scale (SUS) to collect a single measurement that expresses the usability and the learnability of a given interface. This technology-independent measurement has become an industry standard, referenced in over 600 publications.

Most people respond positively when asked “How are you?” in casual conversation. Humans generally strive for social acceptability, and this can bias the results of usability tests and user satisfaction studies. 

SUS is designed to control this bias by using a rating system rather than direct questions. In theory, it is easier for a user to give a response that could be construed as “negative” when responding to a statement rather than a question.

Study participants are asked to respond to ten statements using a five-point scale. For example, the first statement is “I think that I would like to use this system frequently” and the possible responses are:

1. Strongly Disagree
2. Disagree
3. Neutral
4. Agree
5. Strongly Agree

The second statement is a “negative” statement, “I found the system unnecessarily complex”, with the same set of possible responses. The positive/negative pattern continues throughout. 

Each numerical response is then normalized to a range of zero to four, with four being the most positive: for the positive (odd-numbered) statements, subtract 1 from the response; for the negative (even-numbered) statements, subtract the response from 5. Since there are ten statements, the highest possible total is 40. The total is multiplied by 2.5 to bring the maximum to 100. This final number is used as a benchmark.
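The scoring steps above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function name `sus_score` is made up for this example, and it assumes the usual SUS convention that odd-numbered statements are positive (scored as response minus 1) and even-numbered statements are negative (scored as 5 minus response).

```python
def sus_score(responses):
    """Return the SUS score (0-100) for one participant.

    `responses` holds the ten answers in questionnaire order,
    each an integer from 1 (Strongly Disagree) to 5 (Strongly Agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly ten responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for position, response in enumerate(responses, start=1):
        if position % 2 == 1:
            # Positive statement: agreement is good, so 5 becomes 4.
            total += response - 1
        else:
            # Negative statement: disagreement is good, so 1 becomes 4.
            total += 5 - response

    # Ten statements at 0-4 each give a maximum of 40; scale to 100.
    return total * 2.5


# One hypothetical participant's answers.
print(sus_score([4, 2, 4, 2, 5, 1, 4, 2, 4, 3]))  # 77.5
```

To get a benchmark for a whole study, compute the score for each participant and average the results.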

What is a “good” SUS score? 

Anything higher than your own earlier benchmark is an improvement. For an outside reference point, the average SUS score derived from over 500 studies is 68, so a score higher than 68 means that your interface has a better than average SUS score. A score of 80 means that users are likely to recommend the product being studied to a friend or colleague.

For a simpler measure of user satisfaction, UXD at ACI uses a Net Promoter Score. This is similar to SUS, but it uses only a single question: “How likely are you to recommend this product to a friend or colleague?” with a response scale ranging from 0, “Not likely at all” to 10, “Highly Likely”. 

The scoring is also simple: any response from 0 to 6 is labeled a “Detractor”, a 7 or an 8 is a “Passive”, and a 9 or a 10 is a “Promoter”. The percentage of Detractors is subtracted from the percentage of Promoters to produce the Net Promoter Score, which can range from -100 to 100. Since this method is widely used in many industries, benchmarks are available, making it possible to compare your scores with other companies in your industry.
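As a rough sketch of that calculation in Python: the function name `net_promoter_score` and the sample ratings are invented for illustration. It assumes the standard NPS convention that the score is the percentage of Promoters minus the percentage of Detractors, with 7s and 8s counted in the total but in neither group.

```python
def net_promoter_score(ratings):
    """Return the Net Promoter Score (-100 to 100) for a list of 0-10 ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in range(0, 11) for r in ratings):
        raise ValueError("each rating must be an integer from 0 to 10")

    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)

    # Ratings of 7 or 8 count toward the total but toward neither group.
    return 100 * (promoters - detractors) / len(ratings)


# Ten hypothetical survey responses: 5 promoters, 3 detractors, 2 in between.
print(net_promoter_score([10, 9, 8, 7, 6, 3, 10, 9, 5, 10]))  # 20.0
```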

Both SUS and NPS are useful throughout the life of a project. They are used to justify design work and to support design decisions through scientifically gathered data directly from users.