Posts tagged 'Terminal-bench'

[Paper Review] TERMINAL-BENCH: BENCHMARKING AGENTS ON HARD, REALISTIC TASKS IN COMMAND LINE INTERFACES

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces (arxiv.org)

AI agents may soon become capable of autonomously completing valuable, long-horizon tasks in diverse domains. Current benchmarks either do not measure real-world tasks, or are not sufficiently difficult to meaningfully measure frontier models. To this end, ..

Task Formulation

In Terminal-Bench tasks, the agent realistically..

Uncategorized 2026.03.18

khseon7's blog
