As generative AI continues to redefine talent management, assessment systems, and organizational decision-making, the role of IO Psychology practitioners is shifting rapidly.
In a recent workshop led by Prof Richard Landers (John P. Campbell Distinguished Professor of IO Psychology, University of Minnesota) and hosted by TTS as part of our Client Conference, the practical realities of AI-enabled assessment design were put under the microscope.
Key focus areas included: understanding the mechanics of generative systems, examining their fragilities, experimenting with design logic, and ultimately recognizing the expanding role IO Psychologists will play as co-creators of this emerging technology.
This article provides a detailed summary of the workshop, highlighting key learnings and implications for talent decision-making.
AI basics: Prediction, not cognition
The workshop began with a grounding principle that shaped every subsequent insight: Generative AI is always prediction, nothing more, nothing less.
This distinction is not semantic but foundational. Understanding generative AI as a predictive system, and not as a reasoning or understanding system, enables practitioners to approach it with clarity of purpose. Generative models do not “think” but rather generate the most statistically probable continuation of text given a particular input.
In organizational contexts, this means that:
- AI is not independently goal-directed
- AI does not understand appropriateness
- AI cannot ensure validity unless designed for it
- AI’s effectiveness depends on providing user-generated scaffolding, constraints, and context
This understanding allows IO practitioners to move from abstract fascination with AI’s capabilities to a more rigorous focus on design, system behavior, and controlled deployment.
What actually makes AI systems work?
Most talent professionals have engaged with tools like ChatGPT in some form. Few, however, have examined how these systems behave when integrated into talent applications such as employee assessments, coaching platforms, or HR productivity tools.
Prof Landers offered a structured walkthrough of these underlying mechanisms:
- Language models generate outputs purely through probabilistic patterning.
- Behavioral consistency emerges not from the model itself, but from the systems built around it.
- Guardrails, system prompts, policy layers, and data retrieval pipelines shape interactions far more than end-user prompts.
Participants were introduced to Retrieval-Augmented Generation (RAG): a method increasingly central to enterprise-level AI application.
RAG allows AI systems to:
- Draw on verified, curated, organizationally specific documents
- Maintain contextual accuracy
- Reduce hallucinations
- Deliver repeatable, policy-aligned responses
- Embed organizational knowledge without exposing proprietary data to external models
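The RAG pattern above can be reduced to a minimal sketch. The example below is a deliberately toy version (the documents, scoring method, and function names are hypothetical, and real systems use vector embeddings rather than word overlap), but it shows the core loop: retrieve a verified, curated document, then ground the model's prompt in it.

```python
# Toy RAG sketch: retrieve the most relevant curated document, then embed it
# in the prompt so the model answers from verified organizational content.

DOCUMENTS = {
    "leave_policy": "Employees accrue 1.5 days of annual leave per month.",
    "coaching_guide": "Coaching sessions should open with a goal-setting check-in.",
}

def retrieve(query: str) -> str:
    """Score each curated document by simple word overlap with the query."""
    q_words = set(query.lower().split())
    return max(
        DOCUMENTS.values(),
        key=lambda doc: len(q_words & set(doc.lower().split())),
    )

def build_grounded_prompt(query: str) -> str:
    """Constrain the model to the retrieved context, reducing hallucinations."""
    context = retrieve(query)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_grounded_prompt("How many days of annual leave do employees accrue"))
```

Because the documents live inside the organization's own pipeline, proprietary knowledge shapes the answer without being used to train an external model.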
Real-world examples discussed in the workshop included:
- a TTS-created tool that converts webpages into podcasts
- an AI system creating structured mind maps and flashcards
- an AI coaching agent designed to redirect off-topic queries
The practical implication of RAG is that highly impactful and useful systems can already be built to aid in talent processes. The power of this approach lies not in the model itself but in how the model is structured, constrained, and contextualised.
Why behavioural science matters
Despite innovations like RAG, it is important to fully understand what happens when AI technologies are employed without the necessary guardrails in place.
As an example, the apple pie test demonstrated how quickly a model can abandon context. When a generative AI coaching agent without guardrails is asked, "What is the recipe for apple pie?", it will tend to provide one, even though it is situated in a leadership development environment.
This kind of unexpected and unwanted behaviour illustrates two critical insights:
- AI does not understand context but follows patterns.
- Design flaws manifest not in catastrophic errors but in subtle breaks that undermine professional integrity.
This inherent fragility of AI interactions requires a design approach deeply rooted in behavioural science.
In implementing or designing AI solutions, talent practitioners should consider questions like:
- How might end-users intentionally or unintentionally push system boundaries?
- What tone conveys authority without rigidity?
- Which conversational paths must be prevented to preserve fairness?
- What forms of confusion are most likely, and how should the system respond?
- Where should flexibility be permitted, and where must firmness be preserved?
Building AI talent applications is as much behavioural science as it is engineering.
Generative AI design requires empathy, foresight, and a deep understanding of human behaviour: skills that IO Psychologists already possess.
Applying design thinking to AI Assessments
As an illustrative example, Prof Landers showed how the design thinking steps (i.e. empathizing, defining, ideating, prototyping, and testing) can be used to create a conversational simulation designed to emulate a financial sales scenario with an AI-driven customer.
In the workshop, initial attempts generated by the model from a simple prompt were predictably generic. Problems with the application included:
- Implausible customer questions
- Drifting context
- Flattened tone
- Poor behavioral realism
But this failure was instructive: generative models do not create assessment-quality content without deliberate design and iterative oversight.
Participants then began refining the simulation through structured adjustments, such as:
- Tightening system prompts
- Adding scenario constraints
- Clarifying candidate roles
- Providing contextual anchor points
- Redefining expected behaviours
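The refinement loop above can be sketched as progressively layering constraints onto the simulation's system prompt. Everything in the snippet below is hypothetical (the prompt text, constraints, and function are invented to illustrate the workshop's process, not Prof Landers' actual prompts); each iteration adds a constraint addressing a problem surfaced in the previous round.

```python
# Hypothetical sketch of iterative refinement: each round layers a new
# constraint onto the financial sales simulation's system prompt.

BASE_PROMPT = "You play a customer in a financial sales simulation."

CONSTRAINTS = [
    "Scenario: you are a 45-year-old client comparing retirement annuities.",
    "Candidate role: the user is a junior financial adviser; address them as such.",
    "Stay in character; never mention that you are an AI.",
    "Ask only questions a real retail client would plausibly ask.",
]

def build_simulation_prompt(iteration: int) -> str:
    """Compose the system prompt with the first `iteration` constraints applied."""
    layers = [BASE_PROMPT] + CONSTRAINTS[:iteration]
    return "\n".join(layers)

# Iteration 0 is the generic prompt; later iterations are progressively constrained.
print(build_simulation_prompt(0))
print(build_simulation_prompt(len(CONSTRAINTS)))
```

In practice, each new constraint is a response to observed behaviour during testing, which is why the list cannot be specified fully up front.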
Each iteration resolved some of the initial problems while revealing new, emergent challenges. This trial-and-error interplay is a natural aspect of AI development workflows.
However, by the eighth iteration, the simulation was no longer generic. It was coherent, realistic, and aligned with assessment objectives.
The workshop experience revealed an important difference in how AI applications, such as the conversational simulation, and more traditional technologies (e.g. classic capability assessments) tend to be designed.
Traditional assessment development often follows a waterfall model: first define everything (i.e. constructs of interest, items, conceptual frameworks, etc.), then build. Generative AI renders this approach obsolete. The workshop demonstrated that:
- Requirements evolve through interaction
- Blind spots only emerge during testing
- Refinement is continuous, not sequential
- Psychological validity must be integrated iteratively
This is why the design thinking loop of rapid, structured iteration is becoming the dominant paradigm for AI-enabled assessment development.
IO Psychologists as central to AI-enabled assessment development
One of the workshop’s strongest themes was the indispensable need for IO psychology expertise in shaping AI-enabled tools. Throughout the exercises, it became clear that:
- Simple prompts can unintentionally introduce bias
- User expectations influence perceived fairness
- Clarity, tone, and structure shape behaviour and engagement
- Guardrails affect the psychometric properties of interactions
- Simulated scenarios require domain validity, not mere realism
- Iterative refinement benefits from applied validation logic
Engineering teams alone cannot anticipate the nuances of workplace behaviour, assessment validity, or fairness criteria. Conversely, IO Psychologists cannot design robust AI systems without understanding technical constraints. The future of AI-enabled assessment design lies at the intersection of these two disciplines.
Final thoughts
The workshop highlighted that IO Psychologists are no longer just users of AI-powered tools but also co-creators of the frameworks that govern them. Consequently, we must design guardrails and behavioural logic and not merely evaluate AI outputs.
This hints at an important potential role for IO Psychologists as leaders of multidisciplinary teams that include engineers, product developers, and data scientists. In a broader sense, the principle of human leadership in assessment development was also underscored.
Uniquely, human experts can bring sophisticated ethical judgement, contextual insights and fairness-oriented reasoning to the design process. By applying these human-centric factors to the iterative development discipline required in AI-enabled assessments, robust solutions can be produced.
In this way, generative AI is not shaping the assessment field on its own. IO Psychologists are shaping it through the systems they design around it.
The next generation of assessment tools will not emerge solely from engineering innovation, nor solely from psychological science. They will be the end-product of collaboration between these disciplines, guided by practitioners who understand human behaviour, organizational realities, and the principles of responsible innovation.
If you would like to know more about how AI-enabled assessments might benefit your organization, connect with us at info@tts-talent.com.