
<(From left) M.S. candidate Soyoung Choi, Ph.D. candidate Seong-Hyeon Hwang, and Professor Steven Euijong Whang>
Just as human eyes tend to focus on pictures before reading the accompanying text, multimodal artificial intelligence (AI), which processes several types of data at once, tends to depend more heavily on some types of data than on others. KAIST researchers have now developed a multimodal AI training technique that lets models draw on text and images evenly, leading to more accurate predictions.
KAIST (President Kwang Hyung Lee) announced on the 14th that a research team led by Professor Steven Euijong Whang from the School of Electrical Engineering has developed a novel data augmentation method that enables multimodal AI systems—those that must process multiple data types simultaneously—to make balanced use of all input data.
Multimodal AI combines various forms of information, such as text and video, to make judgments. However, AI models often rely excessively on one particular type of data, which degrades prediction performance.
To solve this problem, the research team deliberately trained AI models using mismatched or incongruent data pairs. By doing so, the model learned to rely on all modalities—text, images, and even audio—in a balanced way, regardless of context.
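The idea of training on mismatched pairs can be illustrated with a minimal Python sketch. It is not the team's implementation: the function name `make_misaligned_pairs`, the toy data, and especially the 50/50 soft label are assumptions made for illustration, since this article does not say how the mismatched pairs are labeled.

```python
import random

def make_misaligned_pairs(samples, num_classes, seed=0):
    """Pair each text with an image taken from a sample of a different class.

    samples: list of (text, image, label) tuples with integer labels.
    Returns a list of (text, image, soft_label) tuples.
    The equal 0.5/0.5 soft label is an assumption, not the paper's rule.
    """
    rng = random.Random(seed)
    misaligned = []
    for text, _, text_label in samples:
        # Only consider images whose class differs from the text's class.
        candidates = [s for s in samples if s[2] != text_label]
        if not candidates:
            continue  # a single-class dataset cannot be misaligned
        _, image, image_label = rng.choice(candidates)
        soft_label = [0.0] * num_classes
        soft_label[text_label] += 0.5   # evidence carried by the text
        soft_label[image_label] += 0.5  # evidence carried by the image
        misaligned.append((text, image, soft_label))
    return misaligned

# Hypothetical toy dataset: (text, image file, class id).
samples = [
    ("a cat on a sofa", "cat.png", 0),
    ("a dog in a park", "dog.png", 1),
    ("a bird on a wire", "bird.png", 2),
]
pairs = make_misaligned_pairs(samples, num_classes=3)
```

Because the label of each mismatched pair depends on both inputs, a model that looks at only one modality cannot fit these examples well, which is the intuition behind the balanced use of modalities described above.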
The team further improved performance stability by incorporating a training strategy that compensates for low-quality data while emphasizing more challenging examples. The method is not tied to any specific model architecture and can be easily applied to various data types, making it highly scalable and practical.
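One plausible reading of this weighting strategy is sketched below. It uses a focal-loss-style factor to emphasize hard examples, which is a swapped-in, well-known technique rather than the formula from the paper, and a per-sample `quality` score in [0, 1] that is purely hypothetical. The function names are also assumptions.

```python
import math

def focal_style_weights(true_class_probs, gamma=2.0):
    """Weight each sample by (1 - p)^gamma, where p is the predicted
    probability of its true class. Hard samples (low p) get larger weights."""
    return [(1.0 - p) ** gamma for p in true_class_probs]

def weighted_cross_entropy(true_class_probs, quality, gamma=2.0, eps=1e-12):
    """Weighted average of per-sample cross-entropy losses.

    quality: hypothetical per-sample scores in [0, 1]; a low score shrinks
    the influence of a sample that is believed to be noisy.
    """
    hardness = focal_style_weights(true_class_probs, gamma)
    weights = [q * h for q, h in zip(quality, hardness)]
    losses = [-math.log(max(p, eps)) for p in true_class_probs]
    total_weight = sum(weights)
    if total_weight == 0.0:
        return 0.0  # nothing to learn from in this batch
    return sum(w * l for w, l in zip(weights, losses)) / total_weight

# An easy sample (p = 0.9) and a hard sample (p = 0.1), both trusted.
loss = weighted_cross_entropy([0.9, 0.1], quality=[1.0, 1.0])
```

With both samples trusted, the hard sample dominates the average loss. Setting its quality score to 0 removes it from the average entirely, which is how such a scheme would discount unreliable data.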

<Model Prediction Changes with a Data-Centric Multimodal AI Training Framework>

Professor Steven Euijong Whang explained, “Improving AI performance is not just about changing model architectures or algorithms—it’s much more important how we design and use the data for training.” He continued, “This research demonstrates that designing and refining the data itself can be an effective approach to help multimodal AI utilize information more evenly, without becoming biased toward a specific modality such as images or text.”
The study was co-led by doctoral student Seong-Hyeon Hwang and master’s student Soyoung Choi, with Professor Steven Euijong Whang serving as the corresponding author. The results will be presented at NeurIPS 2025 (Conference on Neural Information Processing Systems), the world’s premier conference in the field of AI, which will be held this December in San Diego, USA, and Mexico City, Mexico.
※ Paper title: “MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning,” Original paper: https://arxiv.org/pdf/2509.25831
The research was supported by the Institute for Information & Communications Technology Planning & Evaluation (IITP) under the projects “Robust, Fair, and Scalable Data-Centric Continual Learning” (RS-2022-II220157) and “AI Technology for Non-Invasive Near-Infrared-Based Diagnosis and Treatment of Brain Disorders” (RS-2024-00444862).