Abstract

Chain-of-Thought (CoT) prompting has proven highly effective at improving the reasoning capabilities of Large Language Models (LLMs). However, existing CoT optimization approaches still lack systematic mechanisms for evaluating and refining prompts. To address this gap, we propose Adversarial Chain-of-Thought (adv-CoT), a framework that introduces adversarial learning into prompt optimization. Adv-CoT iteratively refines an initial prompt through generator–discriminator interactions and integrates both feedback and verification mechanisms, enabling more targeted and interpretable improvements to CoT instructions and demonstrations. We evaluate adv-CoT on twelve datasets spanning commonsense, factual, symbolic, and arithmetic reasoning, where it yields an average improvement of 4.44% on GPT-3.5-turbo and 1.08% on GPT-4o-mini, with both gains statistically significant (paired t-test, p < 0.05). The results show consistent but task-dependent gains, particularly on numerical and factual reasoning tasks, along with competitive performance on symbolic and commonsense benchmarks. Paired significance tests further indicate that improvements are statistically reliable on high-capacity proprietary models, while results on smaller open-source models exhibit greater variance. Although these findings demonstrate the promise of adversarial refinement for CoT prompting, the conclusions remain preliminary: the effectiveness of adv-CoT depends on the base model's reasoning capability, and the current evaluation covers only four major categories of reasoning tasks. We will release the full implementation and prompts to support further investigation into broader applications and more generalizable prompt optimization strategies.
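The generator–discriminator loop with feedback and verification described in the abstract can be sketched in outline. This is a minimal illustrative sketch, not the paper's released implementation: the function names (`generator`, `discriminator`, `adv_cot_refine`) and the toy scoring heuristic are assumptions standing in for LLM calls.

```python
# Illustrative sketch of an adversarial prompt-refinement loop.
# In the actual framework, generator and discriminator would be LLM calls;
# here they are toy stand-ins so the control flow is runnable.

def generator(prompt: str, feedback: str) -> str:
    """Propose a refined prompt by incorporating the discriminator's feedback."""
    return prompt + (" " + feedback if feedback else "")

def discriminator(prompt: str) -> tuple[float, str]:
    """Score a candidate prompt and emit textual feedback.
    Toy heuristic: prompts that elicit step-by-step reasoning score higher."""
    score = len(prompt) / 100.0 + (1.0 if "step by step" in prompt else 0.0)
    feedback = "" if "step by step" in prompt else "Think step by step."
    return score, feedback

def adv_cot_refine(initial_prompt: str, rounds: int = 3) -> str:
    """Iteratively refine the prompt, verifying that each accepted
    candidate actually improves the discriminator's score."""
    best_prompt = initial_prompt
    best_score, feedback = discriminator(best_prompt)
    for _ in range(rounds):
        candidate = generator(best_prompt, feedback)
        score, feedback = discriminator(candidate)
        if score > best_score:  # verification step: keep only improvements
            best_prompt, best_score = candidate, score
    return best_prompt

refined = adv_cot_refine("Solve the math problem.")
print(refined)
```

The verification check (accepting a candidate only when its score improves) mirrors the abstract's point that feedback alone is not enough; refinements are validated before being kept.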

Publication Info

Year: 2025
Type: article
Volume: 16
Issue: 12
Pages: 1092-1092
Citations: 0
Access: Closed

Cite This

Guang Yang, Xiantao Cai, Shaohe Wang et al. (2025). Chain-of-Thought Prompt Optimization via Adversarial Learning. Information, 16(12), 1092-1092. https://doi.org/10.3390/info16121092

Identifiers

DOI: 10.3390/info16121092