judge - News Search


1 day ago

Large models are unreliable judges of their own output! New Shanghai Jiao Tong University study reveals LLM-as-a-judge ...

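Large language models (LLMs) are evolving from tools into "judges" (LLM-as-a-judge) and are beginning to evaluate AI-generated content at scale. Yet the reliability of this efficient evaluation paradigm, and how well it agrees with human judgment, has rarely been examined in depth.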

2 days ago

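LLM judges stumble! Shanghai Jiao Tong University's PersonaEval reveals a bottleneck in AI role identification; large-model evaluation remains an open problem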

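The research team built a new benchmark called PersonaEval to measure how well LLMs identify the speaker in role-play scenarios. The benchmark draws on real human-authored data and includes carefully designed "distractors" to simulate more complex scenarios. The results show that even the best-performing model, Gemini-2.5-pro ...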

2 days ago

Large models are unreliable judges of their own output! Shanghai Jiao Tong University reveals flaws in the LLM-as-a-judge mechanism

China Daily, 6 years ago

Referee, umpire, judge, official: four near-synonyms for "裁判" (referee/judge)

From the program 你问我答 ("You Ask, We Answer"): listener Hongyi is confused about how the nouns "referee", "umpire", "judge", and "official" are used. In a sports context, these four kinds of officials seem to do the same job, so why do they have different names?

China.org.cn, 7 days ago

Judge sanctioned by U.S. hails Brazil's constitution for guaranteeing independent judiciary

SAO PAULO, Aug. 11 (Xinhua) -- Brazil's Supreme Federal Court judge Alexandre de Moraes, who the United States sanctioned for putting former President Jair Bolsonaro on trial, praised Brazil's constit ...

西部网 (Shaanxi News Network), 8 days ago

Dancesport judge shares secrets of elegance

Want to know the secret to staying elegant? Sergejs Sinkins, Dancesport Judge at the Chengdu World Games, has gone viral on ...

Radio France Internationale, 16 days ago

Senate vote approves appointment of U.S. attorney for Washington, D.C.

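Pirro, 74, previously served as district attorney of Westchester County, New York. She rose to fame hosting the weekend television show "Judge Jeanine Pirro" from 2008 to 2011, and then hosted "Justice with Judge Jeanine" on Fox News Channel for 11 years.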

China.org.cn, 12 days ago

U.S. judge orders temporary halt to construction of "Alligator Alcatraz" detention center

WASHINGTON, Aug. 7 (Xinhua) -- A U.S. federal judge on Thursday ordered Florida to temporarily halt construction at an immigration detention center in the Everglades, marking a temporary win for ...
