Unpacking the science of AI-driven video assessments in the South African context

Despite only appearing a few years ago on the list of influential trends in industrial and organizational psychology (IOP), Artificial Intelligence (AI) has quickly become one of the most important technologies for IOP practitioners (IOPs) throughout the world (Source: SIOP annual workplace trends survey, 2020).

As such, IOPs need to advise their business partners on how to leverage this exciting new technology while maintaining fair and ethical selection practices.

To aid our clients and fellow IOPs in this objective, TTS has partnered with HireVue, a US-based pioneer in video assessment and AI-driven technologies. Enabling diversity and maintaining fairness are guiding principles for HireVue. They have set new benchmarks for candidate experience, such as using their net promoter scoring to measure the impact of assessments on potential talent. Indeed, their AI-enabled video assessment offerings are based on years of research and development and are continually screened for potential adverse impact and bias.

While a small South African sample was represented in the research HireVue has undertaken in the past, large-scale data had been lacking.

To remedy this, our research team has completed a study of local South African applications of the HireVue solution.

Our Study: AI-assisted scoring in a local South African sample

One of our local clients recently completed a large-scale graduate recruitment drive utilizing the HireVue assessment process, giving us an excellent opportunity to explore the functioning of the video assessments (and related AI-assisted scoring) in the South African context.

When introducing South African clients to assessment instruments and methodologies developed elsewhere in the world, regardless of the supporting research, the common questions and concerns that we encounter can be summarized as, “Yes, but will this work in the South African context?” As such, we set out to explore a key research question, namely: “Can AI successfully predict behavioral competency ratings of South African candidates?”

The results reported by HireVue suggested that internationally trained AI algorithms could predict behavioral competency ratings of South African candidates, but we wanted to evaluate this question for ourselves.

The main purpose of the research was therefore to explore the level of agreement between AI-evaluated and human-scored interview ratings for a sample of South African applicants, as evaluated by expert raters based in South Africa.

Our Sample

For this research, we used data from interviews of over 1,500 graduate applicants from a client organization in the food manufacturing and consumer goods industry, who kindly gave permission for us to use their data.

As part of their graduate recruitment process, applicants completed HireVue’s Graduate Hiring Assessment Solution, which consists of six competency-based video interview questions and a series of game-based cognitive tests measuring critical thinking. The communication competency was measured across all six video responses.

We decided to construct our study as a smaller-scale replication of the process HireVue uses to develop its structured interview competency algorithms.

Within the study, we focused on convergent validity, and instead of working with thousands of candidate interviews, we drew a stratified random sample of 200 interviews from the top, middle and bottom tiers as determined by HireVue’s AI algorithms.

Here was the first difference noted between our study and the international research previously carried out by HireVue: while HireVue’s research sample constituted 36% White and 33% Hispanic candidates with only 17% Black candidates, our sample included 90% Black candidates (as per the Employment Equity Act definition of “Black” encompassing African, Coloured, and Indian candidates).
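To make the sampling step concrete, the stratified draw described above can be sketched as follows. This is a minimal illustration only, not the procedure actually used in the study: the near-equal allocation across tiers, the function name `stratified_sample`, and the applicant pool are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_sample(candidates, total, seed=0):
    """Draw a near-equal random sample from each AI-assigned tier.

    Assumption: equal allocation per tier. The study does not state
    how the 200 interviews were split across tiers.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable

    # Group candidates by their tier label
    by_tier = defaultdict(list)
    for cand in candidates:
        by_tier[cand["tier"]].append(cand)

    # Split the total across tiers, spreading any remainder one by one
    tiers = sorted(by_tier)
    base, extra = divmod(total, len(tiers))

    sample = []
    for i, tier in enumerate(tiers):
        k = base + (1 if i < extra else 0)
        sample.extend(rng.sample(by_tier[tier], k))
    return sample


# Hypothetical pool of 1,500 applicants, 500 per tier
pool = [{"id": i, "tier": ("bottom", "middle", "top")[i % 3]} for i in range(1500)]
picked = stratified_sample(pool, total=200)
```

Sampling within each tier, rather than from the pool as a whole, ensures that the full range of AI-assigned scores is represented among the interviews sent to human raters.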

Rating and validation method

Each of the 200 interviews was evaluated by two of our HPCSA-registered consultants, meaning a total of 400 interview ratings were completed in the process. Prior to embarking on the study, raters went through a refresher course on behavioral observation and anchored evaluation, with a focus on minimizing bias and avoiding human error as far as possible.

Once our consultants had completed their interview ratings, we examined the extent to which the interview competency ratings assigned by our consultants aligned with those determined by the AI-scoring algorithms.
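One common way to quantify this kind of alignment is a correlation coefficient between the human and AI scores for each competency. The sketch below assumes a Pearson correlation and assumes that the two consultants' ratings are averaged per candidate; neither detail is stated in the study, and all scores shown are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings for six candidates on one competency
rater_a = [2, 3, 4, 5, 3, 1]
rater_b = [3, 3, 4, 4, 2, 2]
ai_score = [2.1, 3.4, 3.9, 4.6, 2.8, 1.5]

# Assumption: combine the two human ratings by simple averaging
human_score = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]

r = pearson_r(human_score, ai_score)
print(f"Human-AI correlation: {r:.2f}")
```

Repeating this calculation for each competency would yield one coefficient per competency, which can then be averaged to give an overall figure.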

Results: AI and human rater convergence

Our results were strongly supportive of the convergent validity of AI-determined competency scores in the South African context. 

Specifically, the correlations between the human-assigned and AI-determined competency scores ranged from a low of .5 (on Dependability) to a high of just over .7 (on Drive for Results), with an average coefficient of .628. These relationships are on par with and, at times, even stronger than those observed in the international research (note that they also approach or exceed the levels of correlation usually expected of reliability coefficients).

The following graphs place the validity coefficients into context. They illustrate our consultants’ average competency ratings broken down by the AI-determined tier for the competencies showing the lowest and highest validity coefficients (i.e. Dependability and Drive for Results, respectively).

They show that, although our consultants were blind to the tier assigned by the AI scoring algorithm, their own evaluations of the interview answers nonetheless agreed with candidates’ placement in the bottom, middle and top tiers.
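The tier breakdown behind such graphs amounts to averaging the human ratings within each AI-assigned tier. A minimal sketch, using hypothetical ratings and an illustrative helper name (`mean_rating_by_tier`), might look like this:

```python
from collections import defaultdict
from statistics import mean


def mean_rating_by_tier(records):
    """Average the human ratings within each AI-assigned tier.

    `records` is a list of (tier, human_rating) pairs.
    """
    grouped = defaultdict(list)
    for tier, rating in records:
        grouped[tier].append(rating)
    return {tier: mean(ratings) for tier, ratings in grouped.items()}


# Hypothetical human ratings on one competency, labeled by AI tier
records = [
    ("bottom", 2.0), ("bottom", 2.5),
    ("middle", 3.0), ("middle", 3.5),
    ("top", 4.0), ("top", 4.5),
]

tier_means = mean_rating_by_tier(records)
for tier in ("bottom", "middle", "top"):
    print(tier, tier_means[tier])
```

If human and AI evaluations converge, the averages should rise steadily from the bottom tier to the top tier, as they do in this made-up example.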

These results were aligned with anecdotal evidence from our client’s recruiters and our own consultants. First, the client recruiters noted that, while they primarily focused on the top and middle tier candidates in their recruitment process, they periodically looked at bottom tier candidates for fear of potentially missing out on a good candidate. However, whenever they did so, they could clearly see the difference in quality between applicants in the different AI-assigned tiers.

Second, several of our consultants shared how, after completing their ratings for the purposes of the research, they checked on the AI-determined tiers and largely found alignment between their expectations and the AI results.

Final thoughts

In sum, the findings in this study provide strong support for the utilization of HireVue’s AI-enabled video assessments in the South African context.

This study represents the first step in a series of planned studies that form part of our research program into the science and practice of AI-based video assessments in the South African context. The initial results, as can be seen above, are very promising.

We look forward to conducting more research on this exciting technology in the coming months, and should you be interested in getting involved, please contact us at info@tts-talent.com