Article · 7 minute read

What Does the Emergence of ChatGPT Mean for the World of Assessment?

By Sarah Chan & Jake Smith – 1st June 2023

46% of job seekers reported having used ChatGPT to write their CVs and/or cover letters in their recent job applications, according to a survey conducted by in early 2023. 

In this short article, we examine generative artificial intelligence (AI), such as ChatGPT, and some of the issues it raises for accurately assessing individuals.

What is ChatGPT?

The latest generation of AI tools has created renewed interest in what AI is capable of. The newest tools offer advanced capabilities, such as generating content in different formats.

ChatGPT in particular has been grabbing the headlines. It ‘learns’ the patterns and structure of natural language by consuming huge amounts of text scraped from the internet, such as books, articles and websites. This training produces a large language model (LLM). ChatGPT then uses this model to generate responses to prompts word by word, repeatedly predicting which word is likely to come next given the text so far. Consequently, its outputs mimic the data it has been trained on.
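To make the ‘predict the next word’ idea concrete, here is a deliberately tiny sketch in Python. It is not how ChatGPT is built: instead of a neural network trained on internet-scale data, it uses a simple bigram count table (an illustrative stand-in) built from a few made-up sentences, and it always picks the most frequent next word (greedy decoding), which is a simplification.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the toy corpus."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def generate(follows: dict[str, Counter], start: str, max_words: int = 5) -> str:
    """Greedily append the most frequent next word, one word at a time."""
    output = [start]
    for _ in range(max_words):
        candidates = follows.get(output[-1])
        if not candidates:
            break  # no known continuation in the training text: stop
        output.append(candidates.most_common(1)[0][0])
    return " ".join(output)

# Hypothetical toy training data; real LLM training data is vastly larger.
corpus = [
    "the candidate wrote a cover letter",
    "the candidate wrote a strong application",
    "the recruiter read a cover letter",
]
model = train_bigrams(corpus)
print(generate(model, "the"))
```

Because the model only echoes patterns in its training text, it fluently produces a plausible sentence with no notion of whether that sentence is true, a small-scale picture of why fluent output is not the same as accurate output.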

A real strength of ChatGPT is its ability to provide quick, useful summaries of complex topics, perform simple tasks such as composing a short email, and even rewrite text a user has entered to make it easier to read. But it is not all upside. While tools like ChatGPT can do a good job of generating text from simple prompts, they have been found to make errors. Critically, because they do not ‘look up’ information, they are not always accurate, however convincing they might seem. ‘Hallucinations’ is the term given to AI-generated content that is inaccurate or just plain wrong. These tools are also easily led: ask ChatGPT to produce a positive review of something awful and it will happily oblige!

What are some of the issues ChatGPT brings to assessment?

Among assessment methods, the greatest concern arises where candidates supply written information that the organization then assesses. As the survey by suggests, CVs and cover letters are the most obvious examples where candidates may use ChatGPT to help produce better-quality writing and make a good first impression. Application forms that require candidates to submit written evidence of competencies are another assessment method likely to be susceptible to ChatGPT.

Although many organizations have shifted away from focusing on academic results when hiring new recruits, employers may still face a knock-on concern in future about the credibility of academic results based on coursework essays completed without supervision. This is certainly an area to watch.

It is also worth considering less obvious ways in which ChatGPT and other AI tools could assist candidates. For example, because ChatGPT can transform a rough paragraph into a much more coherent and persuasive piece of text, a candidate preparing for a recorded online interview (where the questions have been given in advance) could use ChatGPT to improve their responses.

What about Saville’s assessments?

Will ChatGPT help candidates score well in assessments? What are the implications for the most commonly used assessments in our portfolio? We conducted trials using practice content to answer these two questions. 

Aptitude Tests 
While our verbal tests are solely text-based, all the others in our aptitude test portfolio use non-verbal information, or a mix of verbal and non-verbal information, such as diagrams, symbols and graphs. ChatGPT, as a text-based tool, does not directly handle these non-verbal formats. When trialling how ChatGPT would deal with some example test questions, we observed it making simple logical errors and, at times, failing to fully understand a question or the arguments in a passage of text. This highlights a key limitation of the tool: it cannot reason reliably, even with text-only content.

Additionally, our tests are strictly timed. This makes it very difficult for a candidate to enter the test content into ChatGPT, along with any necessary clarification, and obtain a credible answer in the time available.

Finally, it is worth being aware that ChatGPT does not have the scoring key for our tests; the tool merely generates text based on the data it has been trained on and the inputs it receives from the user.

Wave® Questionnaires 
Although there is no time limit, the proprietary response format and scoring mechanism of our Wave questionnaires are highly effective in detecting inconsistent or unrealistic responses. The ‘rate and rank’ format of Wave makes it difficult for ChatGPT first to provide a precise rating for an item and then to rank two or more items sensibly, while remaining consistent across the different constructs measured. Crucially, ChatGPT does not know how to create a personality profile that is appropriate for a particular job or that reflects the candidate’s own personality. A candidate who relied on it would therefore struggle to replicate that persona at interview or in a feedback session.

Situations (SJTs) 
As with the Wave questionnaires, candidates are unlikely to gain any advantage by using ChatGPT to complete our situational judgment tests (SJTs), owing to the sophisticated format and scoring mechanism used. Some SJT formats ask the candidate to compare and rank multiple items at the same time, which may be more straightforward for an AI such as ChatGPT to handle than other formats.

Our SJT format, where items are presented one at a time, requires a more nuanced response, because the effectiveness of each item must be judged independently. This format is more demanding for an individual to enter into ChatGPT, and much more challenging for ChatGPT to answer with sufficient precision on the rating scale. Moreover, as with our aptitude tests, neither the scoring mechanism nor the scoring key is available to the candidate or to ChatGPT.

Final Thoughts

ChatGPT is likely to push organizations to rethink the assessments they currently use and whether those assessments can still evaluate job applicants accurately and fairly. ChatGPT’s greatest impact is undoubtedly on assessments that involve written materials prepared by candidates in their own time, such as CVs, cover letters, competency-based application forms and recorded online interviews. Organizations may need to place less emphasis on these assessments and to ensure that the selection process includes additional assessments that can effectively evaluate candidates’ suitability.

When it comes to using our assessments, we also recommend using more than one assessment at the screening stage of a recruitment program. For example, using our Match 6.5 behavioral screening tool in combination with an SJT and/or an aptitude test provides a range of benefits. It increases the breadth of assessment and offers multiple layers of security, and the different assessments can be given varying weights so that the overall score is tailored to performance in a specific role. The optimized scoring algorithm produced by this approach maximizes fairness and validity, while offering recruiters an efficient and cost-effective assessment solution.
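As a rough illustration of how several screening assessments can be weighted into one score, the Python sketch below computes a weighted average of standardized scores. The assessment names, weights and candidate scores are hypothetical, and a plain weighted average is only a stand-in: it does not reproduce Saville’s optimized scoring algorithm, whose details are not described here.

```python
def composite_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of standardized scores from several assessments.

    Weights are divided by their total, so they need not sum to 1.
    Every assessment that has a weight must also have a score.
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * w for name, w in weights.items()) / total_weight

# Hypothetical role-specific weights (behavior weighted most heavily here).
weights = {"behavioral_screen": 0.5, "sjt": 0.3, "aptitude": 0.2}
# Hypothetical standardized scores for one candidate.
candidate = {"behavioral_screen": 1.0, "sjt": 0.5, "aptitude": -0.5}
print(round(composite_score(candidate, weights), 2))
```

Changing the weights shifts which assessment drives the overall result, which is the sense in which the combination can be tailored to a specific role.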

As new advances are made in technology and AI, it is important to understand both the risks and the potential benefits they could bring. The specific features embedded in Saville Assessment Wave, Swift Aptitude and scenario-based SJTs provide reassurance that tools such as ChatGPT do not pose a material threat to the integrity of assessment results. Clearly, all assessment users need to review their use of written evidence in assessment processes and remain vigilant to future developments in AI.

If you’d like to discuss the use of ChatGPT in assessment further, or have a project you require our help with, get in touch today!

The views and opinions expressed in this blog post do not necessarily reflect those of Saville Assessment or its employees.