Large Language Models take on the Situational Judgment Test: Evaluating Dilemma-Based Scenarios
Thursday, September 19, 2024
4:30 PM – 5:30 PM CT
Location: New York/Illinois Central (Second Floor)
Abstract: As medical school admissions move toward a more holistic application process, some schools have integrated the situational judgment test (SJT), which is designed to evaluate the pre-professional competencies essential for the medical field. The test presents applicants with a series of real-world ethical dilemmas and evaluates how well candidates judge the effectiveness of each response to a given situation. In light of the growing use of artificial intelligence (AI), our study investigates the performance of popular large language models (LLMs) in assessing standardized ethical scenarios in a professional setting. Our objective is to determine whether AI has the capacity to evaluate and mimic idealized human soft skills.
Using the 2021 Association of American Medical Colleges (AAMC) SJT, we found that ChatGPT-4.0 outperformed ChatGPT-3.5 and Bard across 186 responses. This performance suggests that LLMs can navigate certain ethical dilemmas in a way that human evaluators deem correct. However, it also raises critical questions about the effectiveness of these assessments in medical school admissions. If AI can approximate human responses to socioethical dilemmas, it is unclear whether these tests are a robust measure of human qualities such as empathy, moral reasoning, and interpersonal skills. Conversely, our findings open a dialogue on the potential benefits of LLMs, including their use as an adjunct tool for assessing the values underpinning ethical decision-making. This study serves as the first of its kind in analyzing how well LLMs perform on real-world ethical dilemmas.
Learning Objectives:
After participating in this conference, attendees should be able to:
Describe the significance of a strong performance by artificial intelligence on situational judgment tests and its implications for tests that are typically used to measure human soft skills.
Understand the efficacy of situational judgment tests (SJTs) in accurately assessing the human soft skills of prospective students, especially in light of emerging artificial intelligence models.
Explore whether artificial intelligence can contribute to the future of ethical decision-making processes and under what circumstances it might play a significant role.
Angelo Cadiente – Hackensack Meridian School of Medicine; Lora Kasselman – Hackensack Meridian Health; Bryan Pilkington – Hackensack Meridian School of Medicine