LLM (3)

[Paper Review] RAG-DDR: OPTIMIZING RETRIEVAL-AUGMENTED GENERATION USING DIFFERENTIABLE DATA REWARDS

https://openreview.net/forum?id=uaYKxmn3Ab
GitHub: https://github.com/OpenMatch/RAG-DDR
Overview: To mitigate the hallucination problem in LLMs, RAG (R..

Papers 2025.07.14

[Paper Review] Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs

https://arxiv.org/abs/2505.10425
Recent LLM research has pushed toward using more tokens during reasoning..

Papers 2025.06.02

[Paper Review] DAPO: An Open-Source LLM Reinforcement Learning System at Scale

https://arxiv.org/abs/2503.14476
Introduction: Test-time scaling uses longer Cha..

Papers 2025.04.28