A new study on meta reinforcement learning algorithms helps us understand how the human brain learns to adapt to complexity and uncertainty when learning and making decisions. A research team led by Professor Sang Wan Lee at KAIST, working jointly with John O’Doherty at Caltech, discovered both a computational and a neural mechanism for human meta reinforcement learning, opening up the possibility of porting key elements of human intelligence into artificial intelligence algorithms. This study provides a glimpse into how we might ultimately use computational models to reverse engineer human reinforcement learning.
This work was published on Dec 16, 2019 in the journal Nature Communications. The title of the paper is “Task complexity interacts with state-space uncertainty in the arbitration between model-based and model-free learning.”
Human reinforcement learning is an inherently complex and dynamic process, involving goal setting, strategy choice, action selection, strategy modification, and cognitive resource allocation, among other things. This is a very challenging problem for humans to solve, owing to the rapidly changing and multifaceted environment in which they have to operate. To make matters worse, humans often need to make important decisions rapidly, before they have had the opportunity to collect much information, unlike the case when using deep learning methods to model learning and decision-making in artificial intelligence applications.
To address this problem, the research team first used a technique called 'reinforcement learning theory-based experiment design' to optimize the three variables of the two-stage Markov decision task: goal, task complexity, and task uncertainty. This experimental design technique allowed the team not only to control confounding factors, but also to create a situation similar to that which occurs in actual human problem solving.
Secondly, the team used a technique called ‘model-based neuroimaging analysis.’ Based on the acquired behavioral and fMRI data, more than 100 different types of meta reinforcement learning algorithms were pitted against each other to find a computational model that can explain both behavioral and neural data. Thirdly, for the sake of a more rigorous verification, the team applied an analytical method called ‘parameter recovery analysis,’ which involves high-precision behavioral profiling of both human subjects and computational models.
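The general idea behind a parameter recovery check can be illustrated with a small Python sketch. This is a generic simulate-and-refit example, not the procedure used in the paper: the task (a two-armed bandit whose better arm flips every 50 trials), the softmax Q-learner, the grid search, and every name and number below are assumptions chosen for illustration. An agent with a known learning rate generates choices, and the same model is then fitted to those choices to see whether the learning rate can be recovered.

```python
import math
import random


def simulate(alpha, beta, n_trials, rng):
    """Simulate a softmax Q-learner on a hypothetical reversing two-armed bandit."""
    q = [0.0, 0.0]
    choices, rewards = [], []
    for t in range(n_trials):
        good_arm = (t // 50) % 2  # the better arm flips every 50 trials
        p1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))  # P(choose arm 1)
        c = 1 if rng.random() < p1 else 0
        p_reward = 0.8 if c == good_arm else 0.2
        r = 1.0 if rng.random() < p_reward else 0.0
        q[c] += alpha * (r - q[c])  # prediction-error update of the chosen arm
        choices.append(c)
        rewards.append(r)
    return choices, rewards


def log_likelihood(alpha, beta, choices, rewards):
    """Log-likelihood of the observed choices under a given learning rate."""
    q = [0.0, 0.0]
    ll = 0.0
    for c, r in zip(choices, rewards):
        p1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        p_c = p1 if c == 1 else 1.0 - p1
        ll += math.log(max(p_c, 1e-12))  # guard against log(0)
        q[c] += alpha * (r - q[c])
    return ll


def recover_alpha(choices, rewards, beta, grid):
    """Return the grid value of alpha that maximizes the likelihood."""
    return max(grid, key=lambda a: log_likelihood(a, beta, choices, rewards))


GRID = [i / 20 for i in range(1, 20)]  # candidate learning rates 0.05 .. 0.95
TRUE_ALPHA = 0.3                       # assumed ground-truth value
BETA = 5.0                             # assumed (and here known) choice sharpness

rng = random.Random(0)
choices, rewards = simulate(TRUE_ALPHA, BETA, n_trials=1000, rng=rng)
recovered = recover_alpha(choices, rewards, BETA, GRID)
print("true alpha:", TRUE_ALPHA, "recovered alpha:", recovered)
```

If the recovered value lands close to the value used for simulation, the behavioral data constrain the parameter well. A large gap would suggest that the task or the model cannot pin it down.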
In this way, the team was able to accurately identify a computational model of meta reinforcement learning, ensuring not only that the model’s apparent behavior is similar to that of humans, but also that the model solves the problem in the same way as humans do.
The team found that people tended to increase planning-based reinforcement learning (called model-based control) in response to increasing task complexity. However, they resorted to a simpler, more resource-efficient strategy called model-free control when both uncertainty and task complexity were high. This suggests that task uncertainty and task complexity interact during the meta control of reinforcement learning. Computational fMRI analyses revealed that task complexity interacts with neural representations of the reliability of the learning strategies in the inferior prefrontal cortex.
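The qualitative pattern described here can be mimicked with a toy arbitration rule in Python. This is only a sketch under stated assumptions, not the model identified in the study: the function names, the reliability values, the `gain` and `temperature` parameters, and the specific form of the interaction are all hypothetical. The rule lets task complexity amplify whichever way the model-based system's reliability points, so complexity favors planning when that reliability is high (low uncertainty) and favors model-free control when it is low (high uncertainty).

```python
import math


def p_model_based(rel_mb, rel_mf, complexity, gain=1.0, temperature=0.2):
    """Weight given to model-based control under a hypothetical toy rule.

    rel_mb, rel_mf: assumed reliabilities (0..1) of the two strategies.
    complexity: 0 for low task complexity, 1 for high.
    """
    # Baseline drive: which strategy currently looks more reliable.
    # Interaction term: complexity scales the model-based reliability signal.
    drive = (rel_mb - rel_mf) + gain * complexity * (rel_mb - 0.5)
    return 1.0 / (1.0 + math.exp(-drive / temperature))


def blended_value(q_mb, q_mf, w_mb):
    """Mix the two systems' action values using the arbitration weight."""
    return w_mb * q_mb + (1.0 - w_mb) * q_mf


# Invented reliabilities: high uncertainty is assumed to hurt the
# model-based system more than the model-free one.
CONDITIONS = {
    ("low uncertainty", "low complexity"): (0.8, 0.6, 0),
    ("low uncertainty", "high complexity"): (0.8, 0.6, 1),
    ("high uncertainty", "low complexity"): (0.3, 0.5, 0),
    ("high uncertainty", "high complexity"): (0.3, 0.5, 1),
}

WEIGHTS = {name: p_model_based(*args) for name, args in CONDITIONS.items()}

for name, weight in WEIGHTS.items():
    print(name, "-> model-based weight:", round(weight, 3))
```

With these assumed numbers, the model-based weight rises with complexity when uncertainty is low and falls when uncertainty is high, which is the direction of the interaction reported above.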
These findings significantly advance our understanding of the nature of the computations implemented in the inferior prefrontal cortex during meta reinforcement learning, and provide insight into the more general question of how the brain resolves uncertainty and complexity in a dynamically changing environment. Identifying the key computational variables that drive prefrontal meta reinforcement learning can also inform our understanding of how this process might be vulnerable to breaking down in certain psychiatric disorders such as depression and OCD. Furthermore, gaining a computational understanding of how this process can sometimes lead to increased model-free control can provide insights into how task performance might break down under conditions of high cognitive load.
Professor Lee said, “This study will be of enormous interest to researchers in both the artificial intelligence and human/computer interaction fields since this holds significant potential for applying core insights gleaned into how human intelligence works with AI algorithms.”
This work was funded by the National Institute on Drug Abuse, the National Research Foundation of Korea, the Ministry of Science and ICT, and the Samsung Research Funding Center of Samsung Electronics.
Figure 1 (modified from the figures of the original paper, doi:10.1038/s41467-019-13632-1). Computations implemented in the inferior prefrontal cortex during meta reinforcement learning. (A) Computational model of human prefrontal meta reinforcement learning (left) and the brain areas whose neural activity patterns are explained by the latent variables of the model. (B) Examples of behavioral profiles. Shown on the left is choice bias for different goal types, and on the right is choice optimality for task complexity and uncertainty. (C) Parameter recoverability analysis. The effects of task uncertainty (left) and task complexity (right) on choice optimality are compared.