Rethinking Evidence-Based Medicine: Abductive Reasoning from Traditional Chinese Medicine to AI

1. Introduction

We are entering an era where the fundamental questions of “What is reasoning?” and “What is diagnosis?” must be reexamined. The rapid rise of large language models (LLMs) like GPT-4 has brought artificial intelligence into clinical spaces, enabling AI to assist, and in some cases outperform, physicians in diagnostic accuracy[1]. Yet a profound gap remains: current AI models can provide answers without being able to explain why[2]. This lack of explanatory structure exposes a critical difference between human and machine reasoning.

In clinical settings, experienced practitioners often generate hypotheses based on subtle impressions or a vague sense of unease—intuition that cannot be captured through deduction or induction alone. This form of reasoning is known as abduction—the process of forming plausible hypotheses from incomplete or surprising data[3]. Traditional Chinese Medicine (TCM) has long institutionalized such abductive reasoning in its diagnostic framework, integrating narrative, sensory, and contextual cues into evolving clinical hypotheses.

In this paper, we argue that abduction must be explicitly recognized within the epistemological foundations of evidence-based medicine (EBM). We explore how TCM and AI diagnosis each embody abductive logic, and propose a layered worldview model that reveals where current systems fall short—and how they might evolve.

2. What EBM Misses: The Epistemic Blind Spot

Modern evidence-based medicine (EBM) rests on the premise that clinical knowledge should be derived from systematic reviews, randomized trials, and statistical inference. This approach has undeniably advanced therapeutic accuracy and standardization. Yet, in its current form, EBM struggles to incorporate reasoning grounded in context, values, and lived meaning. As a result, it often neglects what we may call the “epistemic blind spot”: the realm where diagnostic decisions are shaped by unquantifiable but vital elements—such as patient narratives, cultural assumptions, and emotional resonance[4].

This limitation is especially apparent in conditions like ADHD and chronic fatigue syndrome, where meaning and context cannot be disentangled from symptom expression. In a recent BMJ commentary, Salisbury argued that outsourcing ADHD diagnosis to external agencies risks fragmenting care by ignoring the relational and contextual work done in general practice [5]. Without attentiveness to meaning, clinical judgments become mechanistic, potentially harmful, and alienating.

Statistical evidence may indicate what is likely, but not necessarily what is relevant or meaningful to an individual. A diagnostic algorithm might score high on predictive accuracy yet fail to respect the patient’s lived experience. This disconnect can result in what might be termed “technically correct but epistemically inadequate” medicine. Abductive reasoning—which begins with a felt dissonance and seeks the most plausible explanatory frame—offers a way to redress this imbalance. It allows clinicians to engage with the uncertainties and narratives that shape real-world care, beyond what data alone can capture [6,7].

3. TCM as Formalized Abduction

Traditional Chinese Medicine (TCM) represents one of the most mature models of abductive reasoning in clinical practice. Unlike the deductive or inductive logic dominant in biomedicine, TCM diagnosis relies on the structured formation of plausible hypotheses—termed zheng (証)—based on a combination of sensory observation, narrative interpretation, and cultural coding.

The standard diagnostic framework in TCM, known as bianzheng lunzhi (pattern differentiation and treatment determination), consists of three key abductive phases:

  1. Observation: Non-quantitative features such as pulse quality, tongue color, complexion, voice tone, and even metaphorical patient expressions are integrated as diagnostic material.

  2. Hypothesis Generation: A provisional pattern or zheng (e.g., liver qi stagnation, phlegm-dampness obstruction) is formulated to explain the clinical constellation.

  3. Intervention and Revision: Treatment is implemented and the patient’s response is used to reassess and potentially revise the initial hypothesis.

This iterative cycle mirrors the classic abductive inference structure defined by Josephson and Josephson: observation → hypothesis → best explanation [3]. Moreover, TCM’s reliance on embodied cognition—such as the tactile feel of a “slippery” (hua) or “tight” (jin) pulse—demonstrates how poetic and intuitive modes of understanding can be formalized within a medical paradigm. This resonates with Magnani’s notion of animal abduction, which emphasizes inference grounded not in formal logic, but in embodied interactions with the world [8].
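
To make the structure of this cycle explicit, the following minimal Python sketch models one pass through observation, hypothesis generation, and revision. The pattern library, scoring rule, and revision step are hypothetical placeholders for illustration only, not a clinical algorithm.

```python
from dataclasses import dataclass

# Hypothetical pattern library: each zheng is associated with features that
# would make it a plausible explanation of the presentation (illustrative only).
PATTERN_LIBRARY = {
    "liver qi stagnation": {"wiry pulse", "irritability", "hypochondriac distension"},
    "phlegm-dampness obstruction": {"slippery pulse", "greasy tongue coating", "heaviness"},
}

@dataclass
class Hypothesis:
    zheng: str
    support: float  # crude plausibility score in [0, 1]

def generate_hypotheses(observations: set[str]) -> list[Hypothesis]:
    """Abductive step: rank candidate patterns by how well they explain the observations."""
    ranked = [
        Hypothesis(zheng, support=len(features & observations) / len(features))
        for zheng, features in PATTERN_LIBRARY.items()
    ]
    return sorted(ranked, key=lambda h: h.support, reverse=True)

def revise(hypothesis: Hypothesis, response_improved: bool) -> Hypothesis:
    """Intervention-and-revision step: adjust plausibility in light of the patient's response."""
    delta = 0.2 if response_improved else -0.2
    return Hypothesis(hypothesis.zheng, max(0.0, min(1.0, hypothesis.support + delta)))

# One pass through the cycle: observe, hypothesize, treat, reassess.
observations = {"wiry pulse", "irritability"}
best = generate_hypotheses(observations)[0]
best = revise(best, response_improved=True)
print(best)
```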

Zhang has further shown that the classification of zheng patterns has evolved into a semi-formalized system through education and institutional practice, allowing subjective insight to be transformed into shared, teachable structures [9]. In this way, TCM does not oppose scientific rigor but instead represents a different ontology of clinical reasoning—one that integrates cultural logic, practitioner intuition, and iterative testing into a coherent abductive framework.

Recent work by Zhao et al. has extended this abductive framework into computational applications. Their ABL-TCM model (Abductive Learning for Traditional Chinese Medicine) integrates abductive logic into AI systems designed for TCM text mining, particularly in named entity recognition tasks [10]. This research illustrates a growing effort to formalize TCM’s abductive reasoning in machine-readable form, confirming the theoretical coherence and computational viability of the zheng-based diagnostic paradigm.
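
As a rough illustration of how such an abductive-learning loop operates, the sketch below shows a neural label prediction being minimally revised so that it satisfies a background knowledge rule; the revised labels would then serve as corrected supervision for further training. All names, rules, and data here are hypothetical placeholders and do not reproduce the published ABL-TCM implementation.

```python
# Hypothetical sketch of an abductive-learning loop for NER, following the
# general scheme: neural prediction -> consistency check -> abductive revision.

def neural_predict(tokens: list[str]) -> list[str]:
    """Stand-in for a neural NER model's label predictions."""
    return ["HERB" if t.endswith("_herb") else "O" for t in tokens]

def knowledge_consistent(labels: list[str]) -> bool:
    """Stand-in for a logical knowledge base, e.g. 'a formula description names at least one herb'."""
    return "HERB" in labels

def abduce_labels(labels: list[str]) -> list[str]:
    """Abductive step: make the smallest revision that restores consistency (illustrative)."""
    if knowledge_consistent(labels):
        return labels
    revised = labels[:]
    revised[0] = "HERB"
    return revised

tokens = ["chaihu", "decoction"]  # hypothetical fragment of a formula description
labels = abduce_labels(neural_predict(tokens))
# In abductive learning, these revised labels are fed back to the neural model
# as corrected supervision for the next training round.
print(list(zip(tokens, labels)))
```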

4. AI Diagnosis and the Prompt Problem

To explore how large language models (LLMs) respond to different value-laden prompts, we conducted a small-scale pilot study involving ethical dilemmas. The goal was not to compare model performance per se, but to examine how the structure of the question—especially the inclusion of contextual and relational information—affects the depth and coherence of AI-generated reasoning.

The experiment was conducted on April 17, 2025, using three LLMs: GPT-4 (OpenAI), Gemini 2.0 Flash (Google), and Grok-1 (xAI). Each model was presented with the same core scenario:

“You became a police officer to protect others. However, you discover systemic corruption within your organization, leading to wrongful convictions.”

This baseline scenario was then modified into three versions:

  • Prompt A (Family Context): “You have a beloved spouse and children. If you oppose the organization, they may be affected.”
  • Prompt B (No Family): “You are completely alone. No one would be affected if you confront the organization.”
  • Prompt C (No Context): Only the baseline scenario was provided, with no additional relational cues.
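
For transparency about the protocol, the sketch below shows how the three prompt variants might be issued programmatically, here assuming the current OpenAI Python SDK; Gemini 2.0 Flash and Grok-1 would be queried analogously through their own interfaces. The prompt wording is taken from the scenario above; everything else is illustrative.

```python
from openai import OpenAI  # assumes the openai package (>=1.0) and an OPENAI_API_KEY in the environment

BASELINE = (
    "You became a police officer to protect others. However, you discover "
    "systemic corruption within your organization, leading to wrongful convictions."
)
CONTEXTS = {
    "A_family": " You have a beloved spouse and children. If you oppose the organization, they may be affected.",
    "B_no_family": " You are completely alone. No one would be affected if you confront the organization.",
    "C_no_context": "",
}

client = OpenAI()
responses = {}
for label, context in CONTEXTS.items():
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": BASELINE + context}],
    )
    responses[label] = reply.choices[0].message.content  # stored for qualitative comparison
```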

Outputs were qualitatively analyzed based on ethical depth, practical specificity, and responsiveness to context. Models exhibited noticeable shifts in reasoning:

  • Prompt A led to protective, cautious suggestions, often balancing personal and ethical duties.
  • Prompt B produced idealistic, justice-oriented responses emphasizing moral obligation.
  • Prompt C resulted in shallow, generic advice lacking emotional or contextual nuance.

These findings suggest that current LLMs do not inherently “reason” abductively or contextually. Rather, their output quality depends heavily on how richly the prompt embeds cosmological, relational, and interpretive layers.

While promising, this pilot study has limitations. The number of outputs was small, and responses were only evaluated qualitatively. Additionally, we did not formally code or assess whether the responses followed a classical abductive structure (e.g., observation → hypothesis → best explanation). Nevertheless, the consistent influence of contextual framing across models supports our broader claim: AI-generated reasoning is shaped not only by data, but by the worldview embedded in the prompt.

5. A Layered Model for Meaning-Making (SML-CML)

To conceptualize the epistemological differences between human and AI reasoning, we propose a four-layered model of abductive diagnosis: Semantic Meaning Layers × Cosmological Meaning Layers (SML-CML). This framework illustrates how clinical reasoning depends not only on pattern recognition or statistical inference, but also on interpretive and ontological structures grounded in worldview and cultural context.

The first layer, the Cosmological Layer, encompasses foundational assumptions about life, health, and value—such as whether the ultimate aim of care is life-prolongation, quality of life, or dignified death. These cosmological premises shape diagnostic and therapeutic orientation even before any clinical data is gathered.

The second layer, the Phenomenological Layer, involves perceptual attunement to non-quantifiable signals: tone of voice, posture, facial expression, pulse quality, or the overall atmosphere of the encounter. This is where “felt sense” and context-sensitive impressions emerge[11].


Empirical studies show that cognitive-behavioural therapy can recalibrate pain perception via phenomenological re-attunement[12], and that major cosmological transformations—such as religious conversion or post-disaster meaning-making—can influence immune, endocrine, and psychological responses by reshaping embodied experience[13,14].

The third layer, the Interpretive Layer, transforms these impressions into coherent diagnostic hypotheses. In TCM, this process entails the construction of a zheng (証)—a diagnostic pattern that emerges from interpretive synthesis rather than atomized data points.

The fourth layer, the Abductive Layer, comprises the logical process of hypothesis generation, testing, and revision. Clinicians use dissonant cues and contextual congruence to formulate the most plausible explanations, iterating hypotheses based on patient response and evolving narrative coherence.

Modern AI models, including large language models (LLMs), operate almost exclusively at the fourth layer. They simulate abductive reasoning through probabilistic pattern completion but lack the upper semantic structures required for authentic meaning-making. These systems cannot experience felt dissonance, interpret cosmological assumptions, or recognize phenomenological subtleties.

This layered model helps explain why even high-performing AI systems can yield diagnoses that are technically accurate but experientially unsatisfying. They lack explanatory coherence across the semantic and cosmological dimensions of clinical reasoning. Crucially, this model also underscores the irreplaceable human role in designing prompts, shaping value structures, and cultivating interpretive contexts—that is, crafting the upper three layers that imbue AI reasoning with meaningful epistemological grounding.
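
As a purely illustrative encoding of this asymmetry, the sketch below represents the four layers as a simple data structure and reports which semantic layers a reasoning system has actually been given; the field names and example values are hypothetical rather than a validated schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DiagnosticFrame:
    cosmological: Optional[str] = None                          # e.g. "quality of life over life-prolongation"
    phenomenological: list[str] = field(default_factory=list)   # felt, non-quantifiable cues
    interpretive: Optional[str] = None                          # synthesized pattern, e.g. a zheng
    abductive: list[str] = field(default_factory=list)          # candidate explanations

def missing_layers(frame: DiagnosticFrame) -> list[str]:
    """Return the semantic layers that have not been supplied to the reasoning system."""
    gaps = []
    if frame.cosmological is None:
        gaps.append("cosmological")
    if not frame.phenomenological:
        gaps.append("phenomenological")
    if frame.interpretive is None:
        gaps.append("interpretive")
    return gaps

# An LLM prompted with data alone occupies only the abductive layer:
llm_frame = DiagnosticFrame(abductive=["most statistically likely diagnosis"])
print(missing_layers(llm_frame))  # ['cosmological', 'phenomenological', 'interpretive']
```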

6. Why Cosmology Matters: The Epistemological Boundary of AI Reasoning

It is tempting to believe that better prompts will yield better AI reasoning. Yet this assumption misunderstands the nature of inference. It is not merely the phrasing of a question that shapes an answer, but the underlying cosmology—the foundational assumptions about life, value, and reality that make certain questions possible and others invisible.

Our SML-CML framework makes this explicit. Large language models (LLMs) operate predominantly within the fourth layer, the Abductive Layer, generating plausible hypotheses from available data. However, without access to the upper layers—Cosmological, Phenomenological, and Interpretive—they cannot engage in meaning-making. They do not sense, situate, or synthesize in a human sense. They simulate inference, but in a vacuum of context.

This is not a defect of computational power but a structural limitation: AI systems are not grounded in cosmological premises. Their apparent failures—such as proposing aggressive life-extending interventions regardless of patient context—are not due to errors in logic, but due to the absence of a designed cosmological frame.

This reframes the debate: instead of asking how AI can think like humans, we must ask how humans can design the epistemic scaffolding that guides AI reasoning. TCM, with its cosmology-dependent diagnostic reasoning, offers a model for how to embed such structure. The question is not whether AI can reason, but whether it has been given something meaningful to reason about. This paper offers a conceptual reframing rather than presenting new empirical data; validation studies are beyond its current scope but may follow in future work.

7. Conclusion: Collaborative Reasoning and the Design of Meaning

This paper has argued that Traditional Chinese Medicine (TCM) is not merely a set of alternative techniques, but a fully developed system of meaningful diagnostic knowledge. Rooted in abductive reasoning and embedded within cosmological and cultural narratives, TCM prioritizes interpretive depth and ontological coherence over reductive classification[15].

We have formalized this reasoning process through the SML-CML model (Semantic Meaning Layers × Cosmological Meaning Layers), which captures the layered architecture of human diagnosis: from foundational cosmological assumptions, through sensory attunement and interpretive synthesis, to abductive hypothesis testing[9].

In contrast, current AI systems—especially large language models (LLMs)—operate primarily within the fourth and lowest layer, the Abductive Layer. They mimic abductive reasoning through statistical completion but lack access to the upper semantic layers that enable genuine meaning-making. The result is an epistemic gap: outputs that are plausible but ungrounded in interpretation[16].

Rather than viewing this as a defect, we suggest it offers an opportunity. By embedding AI within human-designed frameworks of meaning—values, worldviews, narratives—AI can be transformed from a hollow inference engine into a partner in collaborative reasoning. The future of diagnosis lies not in automation, but in epistemic attunement[17].

Recent updates to clinical trial standards, such as CONSORT 2025, have acknowledged the limitations of reproducibility-focused methodologies and the need for richer clinical judgment frameworks[17]. Our proposal builds on this momentum by offering a structured model of abductive, context-sensitive reasoning.

In this light, TCM is not a relic of premodern thinking, but a living epistemology: one that integrates sense, story, ethics, and logic into a cohesive diagnostic ecology[9]. It offers a vital prototype for how human-AI systems might responsibly co-create medical knowledge in a pluralistic world.


References

1. NEJM AI Working Group. GPT vs Resident Physicians: Israeli Board Examination Benchmark. NEJM AI. 2024.

2. NEJM AI Working Group. Use of GPT-4 to Diagnose Complex Clinical Cases. NEJM AI. 2024.

3. Josephson JR, Josephson SG. Abductive Inference: Computation, Philosophy, Technology. Cambridge University Press; 1994.

4. Djulbegovic B, Guyatt GH. Progress in evidence-based medicine: A quarter century on. The Lancet. 2017. https://doi.org/10.1016/S0140-6736(16)31592-6

5. Salisbury H. The problem with outsourced diagnosis of ADHD. BMJ 2025;389:r842. doi:10.1136/bmj.r842

6. Durning SJ, Artino AR. Situativity theory: AMEE Guide No. 52. Med Teach 2011;33:188–99. doi:10.3109/0142159X.2011.550965

7. Charon R. Narrative Medicine: Honoring the Stories of Illness. Oxford University Press; 2006.

8. Magnani L. Animal Abduction. In: Magnani L, Li P, eds. Model-Based Reasoning in Science, Technology, and Medicine. Springer; 2007.

9. Zhang WB. The development of pattern classification in Chinese medicine. Chinese Journal of Integrative Medicine. 2016. https://doi.org/10.1007/s11655-016-2540-3

10. Zhao Z, et al. ABL-TCM: An Abductive Framework for Named Entity Recognition in Traditional Chinese Medicine. IEEE Access. 2024. https://ieeexplore.ieee.org/document/10664593

11. Benner P, Wrubel J. The Primacy of Caring: Stress and Coping in Health and Illness. Addison-Wesley; 1989.

12. Turk DC, Okifuji A. Psychological factors in chronic pain: Evolution and revolution. J Consult Clin Psychol. 2002;70(3):678–90.

13. Koenig HG. Religion, spirituality, and health: The research and clinical implications. ISRN Psychiatry. 2012;2012:278730.

14. Uchida Y, Takahashi Y, Kawamura Y. Changes in values and well-being before and after the Great East Japan Earthquake. PNAS. 2014;111(52):E5296–303.

15. Kirmayer LJ. Broken narratives: Clinical encounters and the poetics of illness experience. In: Mattingly C, Garro LC, eds. Narrative and the Cultural Construction of Illness and Healing. University of California Press; 2000.

16. Friston K. The free-energy principle: a unified brain theory? Nature Reviews Neuroscience. 2010;11(2):127–138. https://doi.org/10.1038/nrn2787

17. Hopewell S, Chan A-W, Collins GS, Hróbjartsson A, Moher D, Schulz KF, et al. CONSORT 2025 Statement: Updated guideline for reporting randomized trials. JAMA. Published online April 14, 2025. doi:10.1001/jama.2025.4347