Summative usability testing is summative evaluation of a product with representative users and tasks designed to measure the usability (defined as effectiveness, efficiency and satisfaction) of the complete product.
Summative usability testing is used to obtain measures to establish a usability benchmark or to compare results with usability requirements. The usability requirements should be task-based, and should tie directly to product requirements, including results from analytic tools such as personas, scenarios, and task analysis. Testing may validate a number of objective and subjective characteristics, including task completion, time on task, error rates, and user satisfaction.
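Comparing results with usability requirements can be sketched as a simple threshold check. This is a minimal illustration, not a prescribed procedure: the function name `check_requirements`, the metric names, and the target values are all hypothetical and would come from the product's own task-based requirements.

```python
def check_requirements(measured, requirements):
    """Return a pass/fail flag for each usability requirement.

    measured:     metric name -> measured value from the test
    requirements: metric name -> (kind, target), where kind is
                  "min" (value must be at least the target) or
                  "max" (value must be at most the target)
    """
    outcome = {}
    for name, (kind, target) in requirements.items():
        value = measured[name]
        if kind == "min":
            outcome[name] = value >= target
        else:
            outcome[name] = value <= target
    return outcome


# Hypothetical measured results and targets for one task.
measured = {
    "completion_rate": 0.85,    # share of participants who completed the task
    "mean_time_s": 95.0,        # mean time on task in seconds
    "mean_satisfaction": 3.9,   # mean rating on an assumed 1-5 scale
}
requirements = {
    "completion_rate": ("min", 0.80),
    "mean_time_s": ("max", 90.0),
    "mean_satisfaction": ("min", 3.5),
}

result = check_requirements(measured, requirements)
```

In this made-up data set the completion and satisfaction targets are met, while the time-on-task target is missed.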
The main purpose of a summative test is to evaluate a product through defined measures, rather than to diagnose and correct specific design problems, as in formative evaluation. The procedure is similar to a controlled experiment: the product is tested in a controlled environment. However, it is common to note usability problems that occur during testing, and to interview the participant after the task to gain an understanding of those problems.
The theoretical background of this method lies in scientific experiments, especially those used in the social sciences and psychology. In such an experiment, hypotheses are tested by modifying an independent variable in a controlled environment. The effects of this modification on one or more dependent variables are then measured and statistically analyzed. Such experiments were first transferred to usability testing in the early 1980s, but it is hard to mark the exact point in time at which 'summative testing' formally developed out of these methods. An important distinction is the separation from a user test that tries to identify usability problems but does not support statistical analysis of quantitative measurements. Such a test is often referred to as informal testing or formative evaluation. Summative usability testing is sometimes also referred to as user performance testing or formal evaluation, and it tries to fulfil the requirements of a scientific experiment.
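As an illustration of the statistical analysis step, the sketch below compares time on task (the dependent variable) between two design versions (the independent variable). It assumes a between-subjects design and uses Welch's t-test, which is one common choice and not one named by this description. The function name `welch_t` and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom
    for two independent samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)   # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb             # squared standard error of the difference
    t = (mean(a) - mean(b)) / sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Made-up task completion times in seconds for two versions.
version_a = [10.0, 12.0, 14.0]
version_b = [20.0, 22.0, 24.0]

t_stat, dof = welch_t(version_a, version_b)
```

The t statistic and degrees of freedom would then be compared against a t distribution to obtain a p-value. That step is omitted here because it needs a statistics library or table.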
As described above, the method has its roots in the social sciences and psychology. Its adaptation to usability testing began in the early 1980s, and many definitions and slightly different approaches have emerged since then. The MUSiC project (1993) can be seen as one important step in formalizing the method with respect to software evaluation.
ISO 9241-11 standardizes usability measures and also provides a general procedure for summative usability testing. The Common Industry Format for Usability Test Reports (now ISO/IEC 25062) also marks an important step in the development of the method, since it formalizes the method's output.
ISO 20282 parts 2, 3 and 4 contain summative test methods to measure the ease of operation and installation of everyday products.
Benefits, Advantages and Disadvantages
The main goal of the method is to measure the usability of a product. This makes it possible to check whether usability goals are met and to compare the product with competing products or with earlier or different versions of itself. Possible measurements are efficiency, effectiveness and user satisfaction, which are normally measured by recording task completion times, success rate or accuracy, and subjective user ratings derived from questionnaires.
As the term summative evaluation suggests, the method should be applied mainly in the later stages of development. This allows real tasks to be integrated and, since the evaluation object is complete or nearly complete, excludes possible interfering variables such as system crashes or incomplete functionality. It is also used after development, e.g. to test whether usability goals were met or for marketing purposes (testing against a competing product).
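The measurements named above can be derived from per-participant task records. The sketch below is one possible way to do this and rests on assumptions: the `TaskResult` fields, the 1-5 satisfaction scale, and the choice to average time only over successful attempts are all illustrative, not prescribed by this method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    participant: str
    completed: bool      # did the participant finish the task successfully?
    seconds: float       # time on task
    errors: int          # number of errors observed
    satisfaction: int    # questionnaire rating, assumed 1-5 scale


def summarize(results):
    """Aggregate effectiveness, efficiency and satisfaction for one task."""
    completed = [r for r in results if r.completed]
    return {
        # effectiveness: share of participants who completed the task
        "completion_rate": len(completed) / len(results),
        # efficiency: mean time on task, successful attempts only
        "mean_time_s": mean(r.seconds for r in completed) if completed else None,
        "mean_errors": mean(r.errors for r in results),
        # satisfaction: mean subjective rating
        "mean_satisfaction": mean(r.satisfaction for r in results),
    }


# Made-up results for four participants on a single task.
results = [
    TaskResult("P1", True, 60.0, 0, 4),
    TaskResult("P2", True, 90.0, 1, 5),
    TaskResult("P3", False, 200.0, 2, 3),
    TaskResult("P4", True, 120.0, 1, 2),
]

summary = summarize(results)
```

Averaging time only over completed attempts is a design choice. Including failed attempts would mix abandoned tasks into the efficiency measure.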
This description highlights issues that are important for summative usability testing. A more detailed description of How to Do It can be found in Usability testing.
The standard procedure can be divided into three parts.
Participants and Other Stakeholders
Participants should be representative of the user population for whom the application is being designed.
Data Analysis Approach
Costs and Scalability
Ethical and Legal Considerations