Reinforcement Learning (3)

[논문 리뷰] MMSearch-R1: Incentivizing LMMs to Search

https://arxiv.org/abs/2506.20670
"Robust deployment of large multimodal models (LMMs) in real-world scenarios requires access to external knowledge sources, given the complexity and dynamic nature of real-world information. Existing approaches such as retrieval-augmented generation (RAG) a…"
Abstract: For the reliable deployment of large multimodal models (LMMs) in real-world scenarios, real-world inf…

Paper · 2025.06.30

[논문 리뷰] Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs

https://arxiv.org/abs/2505.10425
"Large language models (LLMs) excel at complex tasks thanks to advances in reasoning abilities. However, existing methods overlook the trade-off between reasoning effectiveness and computational efficiency, often encouraging unnecessarily long reasoning cha…"
Recent LLM research has shown that, at inference time, more token…

Paper · 2025.06.02

[논문 리뷰] DAPO: An Open-Source LLM Reinforcement Learning System at Scale

https://arxiv.org/abs/2503.14476
"Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in OpenAI o1 blog…"
Introduction: Test-time scaling uses longer Cha…

Paper · 2025.04.28