Medical AI Application Research

Clinical NLP for psychiatry records and medication perception

AI/MED: LLMs, clinical records, and public health text

Visual Guide

The AI/MED title image is intentionally simple because the work is research rather than a product UI. It summarizes the two research directions represented on this page: language models applied to psychiatric records, and AI analysis of public-health text about antidepressants. The visual is used as a cover so recruiters can quickly separate this research work from the software product projects.

Overview

I co-authored two SCI-level medical AI papers with Ajou Medical School's biomedical informatics lab. The work gave me a practical introduction to how AI methods are used in medicine: not as isolated models, but as tools that must be evaluated against real clinical language, public health behavior, and statistical evidence. Across the two studies, the NLP pipelines drew on both BERT- and GPT-based models.

The Papers

Open-Source LLMs In Psychiatry

JMIR, 2025

Comparative analysis of open-source LLM performance on non-English psychiatric records and English translations. PubMed

Antidepressant Perception

JMIR, 2025

AI-based analysis of how public attitudes toward antidepressants changed over a decade of online discussion. PubMed

My Role

Under the supervision of Kim Min Kyu, MD-PhD, I supported the statistical modeling and research implementation work. I learned to treat model output as evidence that has to be tested, summarized, and interpreted carefully, especially when the data involves clinical records or health-related public discussion.

The experience helped me connect software engineering with medical research. I had to think about data quality, language translation, model comparison, statistical reporting, and how to make AI results understandable to researchers who care about clinical meaning rather than only benchmark scores.

Engineering: Speeding Up the OHDSI ETL

Beyond the papers, my most concrete contribution was on the data pipeline. The lab's ETL for transforming raw medical data into the OHDSI Common Data Model (OMOP) was written in R, and its slowest stage was vocabulary mapping: converting each source code into its OMOP concept_id. I translated the heavy R routines into Python and cut end-to-end processing time by roughly 23%, almost entirely by changing how that one stage iterated.

The original mapping was row-wise. In R, idioms such as rowwise(), apply, or a for loop express the logic literally ("for each row, look up this code and replace it") and call the lookup once per row. On a table of tens of millions of rows, that pays the interpreter's per-call overhead tens of millions of times, and that overhead (function dispatch rather than the lookup itself) is what dominates.
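As a rough Python analogue of that row-wise pattern (the lab's R code is not reproduced here), the sketch below calls a lookup function once per row. The table contents, the column names source_code and concept_id, and the helper lookup_one are all made up for illustration; unmapped codes fall back to 0, the OMOP convention for "no matching concept".

```python
import pandas as pd

# Hypothetical toy vocabulary: source code -> OMOP concept_id (made-up values).
vocab = pd.DataFrame({
    "source_code": ["A01", "B02", "C03"],
    "concept_id": [1001, 1002, 1003],
})

# Hypothetical toy source table.
records = pd.DataFrame({"source_code": ["B02", "A01", "ZZZ", "B02"]})


def lookup_one(code):
    """Scan the vocabulary for a single code; return 0 when nothing matches."""
    hits = vocab.loc[vocab["source_code"] == code, "concept_id"]
    return int(hits.iloc[0]) if len(hits) else 0


# Row-wise: one Python-level function call, and one vocabulary scan, per row.
rowwise_ids = [lookup_one(code) for code in records["source_code"]]
```

On four rows this is instant. The cost only shows at the scale described above, where the call overhead is paid once for every row.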

The fix was to express the same logic as one vectorized operation over the whole column. A vocabulary lookup is really a join between the data and the vocabulary table, so in Python I used a single pandas.merge (or a dict with .map() when the vocabulary fit in memory). With N data rows and M vocabulary entries, that replaces repeated per-row searches, O(N×M), with a one-pass hash join, O(N+M), and moves the per-element loop down into pandas' compiled C/NumPy layer, where there is no per-row Python overhead. The speedup comes on two axes at once: a better algorithm and far less interpreter overhead.
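A minimal sketch of the two vectorized forms, assuming a toy vocabulary and source table. The column names person_id, source_code, and concept_id and all values are hypothetical, and the fallback to concept_id 0 for unmapped codes is an assumption about how misses should be handled, not the lab's confirmed behavior.

```python
import pandas as pd

# Hypothetical toy vocabulary: source code -> OMOP concept_id (made-up values).
vocab = pd.DataFrame({
    "source_code": ["A01", "B02", "C03"],
    "concept_id": [1001, 1002, 1003],
})

# Hypothetical toy source table.
records = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "source_code": ["B02", "A01", "ZZZ", "B02"],
})

# Option 1: a single left join over the whole column.
# validate="many_to_one" makes pandas raise if the vocabulary contains
# duplicate source codes, which would otherwise duplicate data rows.
merged = records.merge(vocab, on="source_code", how="left", validate="many_to_one")
# Unmapped codes come back as NaN; replace them with 0 and restore integers.
merged["concept_id"] = merged["concept_id"].fillna(0).astype("int64")

# Option 2: build a plain dict once, then map the whole column in one call.
code_to_concept = dict(zip(vocab["source_code"], vocab["concept_id"]))
mapped = records["source_code"].map(code_to_concept).fillna(0).astype("int64")
```

Both options produce the same concept_id column. The merge form keeps the other vocabulary columns available if they are needed later, while the dict form is the lighter choice for a simple one-to-one code lookup.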

This stage dominated the runtime for two reasons. It runs across every row of the largest CDM tables (condition, drug, measurement, observation), and it was the hand-written part of the pipeline — exactly where row-wise idioms creep in — while extract and load were already delegated to an optimized database driver. Fixing the slowest idiom sitting on the highest-volume operation is what bent the whole pipeline's curve.

How I Learned

To build enough foundation for the work, I studied AI and machine learning through WikiDocs AI/ML material and Andrej Karpathy's Neural Networks: Zero to Hero. Those resources helped me move from using AI tools at a surface level to understanding the underlying ideas behind training, evaluation, backpropagation, language modeling, and why model behavior needs careful measurement.

What It Taught Me

Medical AI is not just about getting a model to run. It requires statistical caution, domain supervision, reproducible analysis, and respect for the clinical context around the data. That perspective now influences how I build AI tools: practical, measurable, and grounded in the real workflow they are meant to support.
