Paper 2

Taking Short Cuts                                

Our second presentation of the symposium provided an overview of the challenges that test developers often face when trying to develop shorter assessments that are reliable, valid and fair.

For a long time, much of the research literature and practice in this area has assumed that longer tests are more effective, particularly in applications that involve decision-making, such as selection. The presentation examined this view, along with the difficulties often faced when developing shorter tests.

Length of test and reliability

One of the assumptions of Classical Test Theory is that longer questionnaires provide greater reliability and that greater reliability should lead to increased validity.

However, the Spearman-Brown Prophecy Formula, which is used to estimate how reliability changes with scale length, assumes that each item in a questionnaire provides equivalent shared variance and contributes equally to reliability.
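As a rough illustration of the formula referred to above, the sketch below computes the reliability that the Spearman-Brown Prophecy Formula predicts when a scale is lengthened or shortened. The function name and the starting reliability of 0.70 are illustrative assumptions, not figures from the presentation.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by `length_factor`.

    Formula: r_new = n * r / (1 + (n - 1) * r)
    It assumes every added (or removed) item is parallel to the existing
    ones, i.e. that all items contribute equally to reliability.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    n, r = length_factor, reliability
    return n * r / (1 + (n - 1) * r)


# Illustrative values only: a scale with an assumed reliability of 0.70.
doubled = spearman_brown(0.70, 2.0)  # twice as many items
halved = spearman_brown(0.70, 0.5)   # half as many items
print(round(doubled, 3), round(halved, 3))
```

The predicted gain from doubling the scale only materialises if the new items are as good as the old ones, which is the assumption the formula builds in.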

In reality, this assumption rarely holds, as some question items are better than others. To chase higher internal consistency, longer questionnaires typically contain many redundant items.

Shorter questionnaires typically have low internal consistency (the question items measure different things). However, some well-researched and well-developed questionnaires have been shown to have strong reliability estimates in terms of alternate form and test-retest (Rammstedt et al., 2018; Saville et al., 2012). Arguably, this is the ideal scenario: questionnaire scales that are broad and short but reliable.
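Internal consistency is commonly summarised with Cronbach's alpha. The sketch below uses made-up response data (five respondents, two items per scale) to show how a scale built from near-duplicate items scores higher on alpha than a scale whose items tap different aspects of a trait. The data and variable names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of items.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical data: each inner list is one respondent's answers.
redundant_scale = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]  # duplicate items
broad_scale = [[1, 2], [2, 4], [3, 1], [4, 3], [5, 5]]      # related but distinct

alpha_redundant = cronbach_alpha(redundant_scale)
alpha_broad = cronbach_alpha(broad_scale)
print(round(alpha_redundant, 3), round(alpha_broad, 3))
```

A lower alpha for the broad scale says nothing by itself about alternate form or test-retest reliability, which have to be estimated from separate administrations.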

Length of test and validity

More reliable assessments should lead to assessments of higher validity. However, longer tests are not necessarily more reliable. We therefore looked at what evidence there is that longer tests are more valid.

An important paper by Burisch (1984) compared the validity of different personality questionnaires and found that length was not an important indicator of validity. More recent work by the same author (Burisch, 1997) indicated that it was possible to select the most valid items from a personality scale. Shorter scales with as few as two items outperformed their much longer counterparts.
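One simple way to picture "selecting the most valid items" is to rank items by their correlation with an external criterion and keep the top few. The sketch below does this with invented data; the item names, the criterion scores and the use of a plain Pearson correlation are all assumptions for illustration, not a description of the procedure used in the cited studies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # an item with no variance cannot predict anything
    return cov / (spread_x * spread_y)


def most_valid_items(item_scores, criterion, keep):
    """Return the `keep` item names most strongly related to the criterion.

    Absolute correlations are used so that reverse-keyed items still count.
    """
    ranked = sorted(
        item_scores,
        key=lambda name: abs(pearson(item_scores[name], criterion)),
        reverse=True,
    )
    return ranked[:keep]


# Hypothetical data: five respondents, four items, one criterion rating.
criterion = [1, 2, 3, 4, 5]
items = {
    "item_a": [1, 2, 3, 4, 5],
    "item_b": [2, 1, 4, 3, 5],
    "item_c": [3, 5, 1, 4, 2],
    "item_d": [2, 4, 1, 3, 5],
}
short_scale = most_valid_items(items, criterion, keep=2)
print(short_scale)
```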

So, why don’t we simply develop shorter questionnaires?

The first issue is that, although shorter scales can be valid with a small number of items, some behavioral traits are broad in the content that they measure. This may make it difficult to cover the content with a small number of items.

However, there is a solution to this: measure behavioral traits in a scale hierarchy, with narrow traits at the bottom that sum to create broader trait measures or factors at the top. For example, a short measure of the Big Five has recently been developed by Soto and John (2017) which, as well as measuring the Big Five, measures three sub-facets for each factor using just 30 items.
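The idea of a scale hierarchy can be sketched as two lookups: items roll up into narrow facets, and facets roll up into broad factors. The item, facet and factor names below are hypothetical placeholders and do not reproduce the scoring key of any published questionnaire.

```python
from collections import defaultdict

# Hypothetical hierarchy: item -> facet -> factor.
ITEM_TO_FACET = {
    "q1": "facet_a1", "q2": "facet_a1",
    "q3": "facet_a2", "q4": "facet_a2",
    "q5": "facet_b1", "q6": "facet_b1",
}
FACET_TO_FACTOR = {
    "facet_a1": "factor_a",
    "facet_a2": "factor_a",
    "facet_b1": "factor_b",
}


def score_hierarchy(responses):
    """Sum item responses into facet scores, then facet scores into factors."""
    facets = defaultdict(int)
    for item, value in responses.items():
        facets[ITEM_TO_FACET[item]] += value
    factors = defaultdict(int)
    for facet, total in facets.items():
        factors[FACET_TO_FACTOR[facet]] += total
    return dict(facets), dict(factors)


# One respondent's made-up answers on a five-point scale.
answers = {"q1": 4, "q2": 5, "q3": 2, "q4": 3, "q5": 1, "q6": 2}
facet_scores, factor_scores = score_hierarchy(answers)
print(facet_scores)
print(factor_scores)
```

Each facet here rests on only two items, yet the broader factor score draws on every item beneath it, which is how a short questionnaire can still cover a wide trait.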

A second issue for short scales is that they can struggle to have sufficient variance to profile scores. This issue was highlighted by MacIver (1997) who found that five-point Likert formats and most-least formats, which select from three or four response options, typically only provide three points towards the overall scale score.

When profiling scores on a 1 to 10 scale, at least five items, and often many more, will typically be needed. To counter this, it is possible to use other formats. For example, the Wave personality questionnaire uses an interactive normative and ipsative format based on a nine-point Likert scale and six alternative item options to be ranked. This delivers a much larger range of responses from a single interactive online questionnaire item.
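The arithmetic behind the "at least five items" figure can be sketched under a simplifying assumption: each item adds a whole number of points from 0 up to one less than its number of effective score points, and every raw total is reachable. The function names are illustrative, and the model ignores how scores are actually distributed, which is one reason many more items are often needed in practice.

```python
def distinct_totals(n_items, points_per_item):
    """Number of distinct raw totals when each item adds 0..points_per_item-1."""
    return n_items * (points_per_item - 1) + 1


def items_needed(target_points, points_per_item):
    """Smallest number of items whose raw totals span `target_points` values."""
    if points_per_item < 2:
        raise ValueError("an item must offer at least two score points")
    n = 1
    while distinct_totals(n, points_per_item) < target_points:
        n += 1
    return n


# Items that effectively provide three score points, profiled on a 1-10 scale.
print(items_needed(target_points=10, points_per_item=3))
# Items that provide nine score points each, under the same assumption.
print(items_needed(target_points=10, points_per_item=9))
```

Under this assumption, formats that yield more score points per item reach a usable profiling range with far fewer items.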

A final consideration for short scales used in decision-making is which level of the personality hierarchy is most valid.


Burisch, M. (1984). Approaches to personality inventory construction: A comparison of merits. American Psychologist, 39(3), 214–227.

Burisch, M. (1997). Test length and validity revisited. European Journal of Personality, 11, 303–315.

Rammstedt et al. (2018). Validation of the Short and Extra-Short Forms of the Big Five Inventory-2 (BFI-2) and Their German Adaptations. European Journal of Psychological Assessment.

Saville, P., MacIver, R. & Kurz, R. (2012). Saville Consulting Wave® Professional Styles Handbook, 2nd Edition. Saville Consulting Group, Jersey.

Soto, C. & John, O. (2017). The Next Big Five Inventory (BFI-2): Developing and Assessing a Hierarchical Model With 15 Facets to Enhance Bandwidth, Fidelity, and Predictive Power. Journal of Personality and Social Psychology, 113, 117–143.

Find out about the other papers we presented at the DOP Annual Conference 2020:


Validity – Are Candidates Really Bothered?

It is often assumed that validity is only important to Occupational Psychologists or recruiters, but is this really true?


Scales of Potential

We discuss how to identify the distinct behaviors that are important for a particular role.


Short and Valid!

This paper explores the trade-off between the length of a personality assessment and its validity.
