Unlocking Deeper AI Diagnosis: MIMIC-SR-ICD11 Dataset & LL-Rank for Clinical Narratives
By Yuexin Wu, Shiqi Wang, Vasile Rus
Published on November 10, 2025 | Vol. 1, Issue No. 1
Content Source
This is a curated briefing. The original article was published on cs.CL updates on arXiv.org.
Summary
The MIMIC-SR-ICD11 project introduces a novel, large English diagnostic dataset derived from Electronic Health Record (EHR) discharge notes, explicitly aligned with the World Health Organization's ICD-11 terminology. This dataset addresses the challenge of subtle but crucial clinical signals often lost in templated EHR documentation by leveraging self-reports. Alongside the dataset, the authors present LL-Rank, a likelihood-based re-ranking framework that significantly improves diagnostic accuracy across various model backbones. LL-Rank's primary innovation lies in its PMI-based scoring mechanism, which effectively isolates semantic compatibility from inherent label frequency bias, outperforming a strong generation-plus-mapping baseline (GenMap).
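The briefing does not reproduce LL-Rank's exact formula, but a PMI-based re-ranking score of this kind is typically the conditional log-likelihood of a label given the clinical narrative minus the label's marginal (prior) log-likelihood, so that frequent codes are not favored merely for being common. The sketch below illustrates that idea with invented function names and toy log-probabilities; it is not the authors' implementation:

```python
def pmi_score(log_p_label_given_report: float, log_p_label: float) -> float:
    """PMI-style score: how much the report raises the label's likelihood
    over its prior. Positive means the narrative supports the label beyond
    what its base frequency alone would predict."""
    return log_p_label_given_report - log_p_label


def rerank(candidates):
    """Sort (label, log P(label|report), log P(label)) tuples by PMI score,
    highest first. The log-probabilities would come from a language model
    in practice; here they are illustrative numbers."""
    return sorted(candidates, key=lambda c: pmi_score(c[1], c[2]), reverse=True)


# Toy example: a common diagnosis with a strong prior vs. a rare one that
# the narrative fits better. Raw likelihood would rank "pneumonia" first;
# the PMI score promotes the rarer but better-supported "sarcoidosis".
candidates = [
    ("pneumonia",   -1.0, -0.5),   # high conditional, but also high prior
    ("sarcoidosis", -1.5, -3.0),   # lower conditional, much lower prior
]
ranked = rerank(candidates)
```

The subtraction of the prior is what "isolates semantic compatibility from label frequency bias" in the summary's terms: both candidates are judged by how much the report itself moves the model, not by how often each code appears in training data.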
Why It Matters
This development marks a significant step forward for AI-assisted diagnosis. MIMIC-SR-ICD11 directly tackles a key limitation of current clinical AI models: their reliance on structured, often attenuated EHR data. By drawing on rich narrative self-reports, the dataset gives AI systems access to the nuanced, contextual information that clinicians use to diagnose, with the potential to reduce diagnostic errors and improve patient outcomes.

For AI practitioners, this opens the door to NLP models that move beyond keyword matching toward deeper semantic understanding of complex medical text. The dataset's native alignment with ICD-11 is also a meaningful step toward interoperability and standardization in medical coding, enabling comparative research across healthcare systems.

Finally, LL-Rank's PMI-based scoring offers a principled way to mitigate label frequency bias, a pervasive problem in machine learning that is especially consequential in diagnostic applications, where accuracy and fairness are prerequisites for trust and ethical deployment. Techniques that surface subtle but consequential details from unstructured clinical narratives stand to give healthcare providers more precise diagnostic insight and, ultimately, better and more efficient care.