CMU researchers are presenting 127 papers at the Forty-Second International Conference on Machine Learning (ICML 2025), held July 13th-19th at the Vancouver Convention Center. Here’s a quick overview of the areas our researchers are working on:
Here are our most frequent collaborator institutions:
Oral Papers
Expected Variational Inequalities
This paper introduces expected variational inequalities (EVIs), a relaxed version of variational inequalities (VIs) where the goal is to find a distribution that satisfies the VI condition in expectation. While VIs are generally hard to solve, the authors show that EVIs can be solved efficiently, even under challenging, non-monotone conditions, by leveraging ideas from game theory. EVIs generalize the concept of correlated equilibria and unify various results across smooth games, constrained games, and settings with non-concave utilities, making them broadly applicable beyond traditional game-theoretic contexts.
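For reference, one standard way to write the two problems (notation ours, simplified): a VI asks for a single point at which the operator $F$ admits no improving direction, while an EVI only asks for this on average over a distribution:

$$\text{(VI)}\quad \text{find } x^{*} \in \mathcal{X} \ \text{ such that } \ \langle F(x^{*}),\, x - x^{*} \rangle \ge 0 \quad \forall x \in \mathcal{X}$$

$$\text{(EVI)}\quad \text{find } \sigma \in \Delta(\mathcal{X}) \ \text{ such that } \ \mathbb{E}_{x \sim \sigma}\big[\langle F(x),\, y - x \rangle\big] \ge 0 \quad \forall y \in \mathcal{X}$$

Roughly, instantiating $F$ with the players’ utility gradients recovers correlated-equilibrium-style conditions, which is the sense in which EVIs generalize correlated equilibria (sign conventions vary across the literature).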
Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards
This paper shows that voting-based benchmarks for evaluating LLMs (such as Chatbot Arena) can be vulnerable to adversarial manipulation if proper defenses aren’t in place. The authors show that an attacker can identify which model generated a response and then strategically vote to boost or demote specific models, altering the leaderboard with only around a thousand votes in a simulated environment. They collaborate with Chatbot Arena’s developers to propose and implement security measures, such as reCAPTCHA and login requirements, that significantly raise the cost of such attacks and improve the platform’s robustness.
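A minimal sketch of why a modest number of targeted votes can move a leaderboard, using a toy Elo-style rating update (the update rule and numbers are ours, not the paper’s):

```python
import random

def elo_update(r_a, r_b, score_a, k=4.0):
    """Standard Elo update; score_a is 1.0 if model A wins the vote, else 0.0."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

random.seed(0)
target, rival = 1000.0, 1000.0

# Honest phase: the two models are genuinely evenly matched.
for _ in range(5000):
    win = 1.0 if random.random() < 0.5 else 0.0
    target, rival = elo_update(target, rival, win)

# Attack phase: the adversary identifies which model wrote each response
# (the paper shows this is feasible) and always votes for the target.
for _ in range(1000):
    target, rival = elo_update(target, rival, 1.0)

print(target, rival)  # ~1,000 targeted votes open a large rating gap
```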
High-Dimensional Prediction for Sequential Decision Making
This paper presents a new algorithmic framework for making reliable, multi-dimensional forecasts in adversarial, nonstationary environments. Unlike existing online learning methods, this approach provides simultaneous performance guarantees for many agents, even when they face different objectives, act over large action spaces, or care about specific conditions (e.g. weather or route choice). The algorithm ensures low bias across many conditional events and enables each agent to achieve strong guarantees like diminishing regret. Applications include efficient solutions for online combinatorial optimization and multicalibration.
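For intuition, a central guarantee of this type is multicalibration (our notation): the forecast $p$ should have low bias conditioned on every event $G$ in a large collection $\mathcal{G}$,

$$\Big|\, \mathbb{E}\big[(y - p)\,\mathbf{1}\{G\}\big] \,\Big| \;\le\; \alpha \qquad \forall\, G \in \mathcal{G},$$

so that an agent who best-responds to the forecast, as if it were correct on the events it cares about, inherits guarantees such as diminishing regret.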
LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
This paper introduces LLM-SRBench, a new benchmark designed to rigorously evaluate the ability of LLMs to discover scientific equations (rather than simply recall them from training data). Existing tests often rely on well-known equations, making it hard to tell whether models are truly reasoning or just memorizing. LLM-SRBench addresses this by including 239 challenging problems across four scientific domains, split into two categories: one that disguises familiar physics equations (LSR-Transform) and another that features fully synthetic, reasoning-driven tasks (LSR-Synth). Evaluations show that even the best current models achieve only 31.5% accuracy, highlighting the difficulty of the task and establishing LLM-SRBench as a valuable tool for driving progress in LLM-based scientific discovery.
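As a rough illustration of how equation-discovery benchmarks of this kind are scored, one can check symbolic equivalence first and fall back to numeric error on held-out inputs (an illustrative recipe; LLM-SRBench’s exact metrics may differ):

```python
import numpy as np
import sympy as sp

def score_equation(candidate: str, reference: str, test_inputs: np.ndarray) -> dict:
    """Score a discovered equation against the hidden ground-truth equation."""
    x = sp.symbols("x")
    cand, ref = sp.sympify(candidate), sp.sympify(reference)

    # 1) Symbolic check: do the two expressions simplify to the same thing?
    if sp.simplify(cand - ref) == 0:
        return {"symbolic_match": True, "nmse": 0.0}

    # 2) Numeric fallback: normalized MSE on held-out input points.
    y_pred = sp.lambdify(x, cand, "numpy")(test_inputs)
    y_true = sp.lambdify(x, ref, "numpy")(test_inputs)
    nmse = float(np.mean((y_pred - y_true) ** 2) / (np.var(y_true) + 1e-12))
    return {"symbolic_match": False, "nmse": nmse}

print(score_equation("2*x + sin(x)", "sin(x) + 2*x", np.linspace(0.1, 1.0, 100)))
# {'symbolic_match': True, 'nmse': 0.0}
```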
On Differential Privacy for Adaptively Solving Search Problems via Sketching
This paper explores how to use differential privacy to protect against information leakage in adaptive search queries, a harder problem than traditional private estimation tasks. Unlike prior work that only returns numerical summaries (e.g., cost), the authors design algorithms that return actual solutions, like nearest neighbors or regression vectors, even when the inputs or queries change over time. They show how key problem parameters (like the number of approximate near neighbors or the condition number of the data matrix) affect the performance of these private algorithms. This work has practical implications for AI systems that rely on private database searches or real-time regression, enabling them to provide useful results while safeguarding sensitive information from attackers.
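For flavor, here is a textbook output-perturbation baseline for private regression using the Gaussian mechanism. This is emphatically not the paper’s sketching-based algorithm; the `sensitivity` bound below is assumed, and deriving it is exactly where quantities like the condition number enter:

```python
import numpy as np

def private_regression(X, y, eps, delta, sensitivity=1.0):
    """Least squares with Gaussian-mechanism output perturbation.

    Assumes the exact solution's L2 sensitivity to any one record is bounded
    by `sensitivity`; establishing that bound is the hard part, and it
    degrades as the data matrix X becomes ill-conditioned.
    """
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return w + np.random.default_rng(0).normal(0.0, sigma, size=w.shape)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.ones(5) + 0.1 * rng.normal(size=200)
print(private_regression(X, y, eps=1.0, delta=1e-5))  # noisy but useful weights
```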
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
This paper proposes a set of simple, abstract tasks designed to probe the creative limits of today’s language models in a controlled and measurable way. These tasks mimic real-world open-ended challenges like generating analogies or designing puzzles, where success requires discovering new connections or constructing novel patterns. The authors show that standard next-token prediction tends to be short-sighted and overly reliant on memorization, while alternative approaches like teacherless training and diffusion models produce more diverse, original outputs. They also introduce a technique called seed-conditioning, which adds randomness at the input rather than the output and can improve coherence without sacrificing creativity.
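A toy illustration of the seed-conditioning idea: inject randomness into the input and decode deterministically, instead of sampling noisy outputs token by token. The hash-based “model” below is a stand-in for greedy decoding, not the paper’s setup:

```python
import hashlib
import random

def toy_greedy_model(prompt: str, length: int = 6) -> str:
    """Deterministic stand-in for greedy decoding: output depends only on the prompt."""
    state, tokens = prompt, []
    for _ in range(length):
        token = hashlib.sha256(state.encode()).hexdigest()[:2]  # the 'argmax' token
        tokens.append(token)
        state += token
    return "-".join(tokens)

prompt = "invent a new analogy:"
random.seed(0)
for _ in range(3):
    seed = str(random.getrandbits(32))          # randomness enters at the input
    print(toy_greedy_model(f"[seed={seed}] {prompt}"))
# Same prompt and fully deterministic decoding, yet three distinct outputs:
# the random seed string, not output sampling, supplies the diversity.
```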
Training a Generally Curious Agent
This paper introduces Paprika, a fine-tuning method that equips language models with general decision-making and exploration strategies, enabling them to adapt to new tasks through interaction alone (i.e. without further training). Paprika trains models on synthetic environments requiring different exploration behaviors, encouraging them to learn flexible strategies rather than memorizing solutions. To improve efficiency, it uses a curriculum-learning-based approach that prioritizes tasks with high learning value, making the most of limited interaction data. Models trained with Paprika show strong transfer to completely new tasks, suggesting a promising direction for building AI agents that can learn to solve unfamiliar, sequential problems with minimal supervision.
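A minimal sketch of the curriculum idea: prioritize tasks by an estimate of how much the model is currently learning from them. The learning-progress heuristic below is a common choice for this, not necessarily Paprika’s exact criterion:

```python
import math
import random

def sample_task(tasks, history, temperature=0.1):
    """Sample a task with probability increasing in estimated learning value.

    `history[t]` holds recent success rates for task t; tasks whose success
    rate is moving fastest (neither mastered nor hopeless) score highest.
    """
    def learning_value(task):
        rates = history[task]
        if len(rates) < 2:
            return 1.0  # optimistic priority for unexplored tasks
        return abs(rates[-1] - rates[0])  # magnitude of recent progress

    weights = [math.exp(learning_value(t) / temperature) for t in tasks]
    return random.choices(tasks, weights=weights, k=1)[0]

history = {"bandit": [0.2, 0.5], "wordle": [0.9, 0.9], "maze": [0.1, 0.1]}
print(sample_task(["bandit", "wordle", "maze"], history))  # usually "bandit"
```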
Spotlight Papers
GMAIL: Generative Modality Alignment for generated Image Learning
Generative models can create realistic images that could help train machine learning models, but using them as if they were real images can lead to problems because of differences between the two. This paper introduces a method called GMAIL that treats real and generated images as separate types (or modalities) and aligns them in a shared latent space during training, rather than just mixing them at the pixel level. The approach fine-tunes models on generated data using a dedicated loss to bridge the gap, then uses these aligned models to improve training on tasks like image captioning and retrieval. The results show that GMAIL improves performance on several vision-language tasks and scales well as more generated data is added.
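A sketch of the modality-alignment idea in PyTorch: treat paired real/generated embeddings as two modalities and pull them together in the shared latent space (the cosine-distance loss here is an illustrative choice, not necessarily GMAIL’s exact objective):

```python
import torch
import torch.nn.functional as F

def alignment_loss(real_emb: torch.Tensor, gen_emb: torch.Tensor) -> torch.Tensor:
    """Mean cosine distance between paired real/generated embeddings."""
    real = F.normalize(real_emb, dim=-1)
    gen = F.normalize(gen_emb, dim=-1)
    return (1.0 - (real * gen).sum(dim=-1)).mean()

# Toy paired batch: each generated image's embedding is a shifted version
# of its real counterpart's, standing in for the modality gap.
real_emb = torch.randn(32, 512)
gen_emb = real_emb + 0.1 * torch.randn(32, 512)

# In practice this term would be added to the task loss during fine-tuning:
# total = task_loss + lam * alignment_loss(real_emb, gen_emb)
print(float(alignment_loss(real_emb, gen_emb)))
```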
LOCATE 3D: Real-World Object Localization via Self-Supervised Learning in 3D
LOCATE 3D is a model that can find specific objects in 3D scenes based on natural language descriptions (like “the small coffee table between the couch and the lamp”). It achieves state-of-the-art performance on standard benchmarks and works well in real-world settings, like on robots or AR devices, by using RGB-D sensor data. A key component is 3D-JEPA, a new self-supervised learning method that uses features from 2D vision models (like CLIP or DINO) to understand 3D point clouds through masked prediction tasks. The model is trained on a newly released large dataset (130K+ examples), helping it generalize better across different environments.
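Schematically, the masked-prediction recipe predicts features at masked 3D locations and supervises them with frozen 2D foundation-model features lifted onto the point cloud. The modules and shapes below are placeholders, not LOCATE 3D’s actual architecture:

```python
import torch
import torch.nn as nn

points = torch.randn(1, 1024, 3)        # xyz point cloud
lifted_2d = torch.randn(1, 1024, 768)   # frozen CLIP/DINO features per point
mask = torch.rand(1, 1024) < 0.5        # half the points are masked out

encoder = nn.Linear(3, 768)             # stand-in for a point-cloud encoder
predictor = nn.Linear(768, 768)         # predicts features at masked points

context = encoder(points) * (~mask).unsqueeze(-1)   # visible points only
pred = predictor(context)

# JEPA-style objective: match predicted features to the frozen 2D targets
# at masked locations only (targets come from 2D vision models, no labels).
loss = nn.functional.smooth_l1_loss(pred[mask], lifted_2d[mask])
loss.backward()
```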
Masked Autoencoders Are Effective Tokenizers for Diffusion Models
This paper introduces MAETok, a masked autoencoder designed to create a high-quality, semantically meaningful latent space for diffusion models. The authors show that a well-structured latent space, meaning fewer Gaussian modes and more discriminative features, leads to better image generation without needing complex variational autoencoders. MAETok outperforms existing methods on ImageNet using just 128 tokens, and it’s also much faster: 76× faster to train and 31× faster at inference. The key takeaway is that the structure of the latent space, not variational constraints, is what really matters for high-quality diffusion-based generation.
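A schematic training step for an MAE-style tokenizer: encode only the visible patches into a plain (non-variational) latent, then reconstruct the masked content (stand-in modules, not MAETok’s networks):

```python
import torch
import torch.nn as nn

patches = torch.randn(8, 256, 768)      # batch, num_patches, patch_dim
mask = torch.rand(8, 256) < 0.75        # MAE-style high mask ratio

encoder = nn.Linear(768, 128)           # plain AE latent: no KL / variational term
decoder = nn.Linear(128, 768)

latent = encoder(patches * (~mask).unsqueeze(-1))   # encode visible content only
recon = decoder(latent)

# Reconstruction loss on the masked patches is what drives a structured,
# discriminative latent space here, without any variational constraint.
loss = nn.functional.mse_loss(recon[mask], patches[mask])
loss.backward()
```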
This paper highlights the lack of robust systems for identifying and reporting flaws in general-purpose AI (GPAI), especially compared to mature fields like software security. The authors propose three key solutions: (1) standardized reporting formats and engagement rules to streamline flaw reporting and triaging, (2) formal disclosure programs with legal protections for researchers (similar to bug bounties), and (3) better infrastructure for distributing flaw reports to relevant stakeholders. These steps aim to address emerging risks like jailbreaks and cross-system vulnerabilities, ultimately improving the safety and accountability of GPAI systems.
Scaling Test-Time Compute Without Verification or RL is Suboptimal
This paper explores how best to scale test-time compute for large language models (LLMs), comparing two strategies: (1) distilling search traces (verifier-free, or VF) and (2) using verifiers or rewards to guide learning (verifier-based, or VB). The authors show, both theoretically and through experiments, that VB methods significantly outperform VF ones when working with limited compute or data. They explain that this performance gap grows as models and tasks get more complex, especially when solution paths vary in style or quality. Ultimately, the paper argues that verification is essential for effectively scaling LLM performance, especially on reasoning tasks.
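A toy simulation of the gap: given the same base model, a verifier-based best-of-n strategy converts extra test-time samples into accuracy in a way a single verifier-free trace cannot (the numbers are illustrative; the paper’s analysis is far more general):

```python
import random

random.seed(0)
P_CORRECT = 0.2   # base model solves a problem with 20% probability per sample
P_FLIP = 0.1      # verifier mislabels a solution 10% of the time

def sample_solution() -> bool:
    """Draw one candidate solution; True means it is actually correct."""
    return random.random() < P_CORRECT

def verifier_approves(is_correct: bool) -> bool:
    """Imperfect reward signal that still correlates with correctness."""
    return (not is_correct) if random.random() < P_FLIP else is_correct

def verifier_free() -> bool:
    return sample_solution()                 # one distilled trace, no reranking

def verifier_based(n: int = 8) -> bool:
    samples = [sample_solution() for _ in range(n)]
    approved = [s for s in samples if verifier_approves(s)]
    return approved[0] if approved else samples[0]

trials = 10_000
print("VF:", sum(verifier_free() for _ in range(trials)) / trials)   # ~0.20
print("VB:", sum(verifier_based() for _ in range(trials)) / trials)  # ~0.65
```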
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
As long-context LLMs become more common, their growing memory demands during inference slow down performance, largely due to the expanding key-value (KV) cache. This paper introduces ShadowKV, a system that significantly improves throughput by compressing the key cache using low-rank representations and offloading the value cache without major latency costs. It reconstructs only the necessary KV pairs during decoding to maintain speed and accuracy. Experiments show ShadowKV supports much larger batch sizes (up to 6×) and improves throughput by over 3× on standard hardware, all while preserving model quality across multiple LLMs and benchmarks.
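The core compression idea in NumPy: store a low-rank factorization of the key cache and rebuild only the keys needed at each decode step (a schematic of the technique, not ShadowKV’s implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, head_dim, rank = 4096, 128, 16

# Toy key cache that is exactly low-rank (the paper observes that pre-RoPE
# keys are approximately low-rank in practice).
K = rng.normal(size=(seq_len, rank)) @ rng.normal(size=(rank, head_dim))

# Compress: keep two thin factors instead of the full seq_len x head_dim cache.
U, S, Vt = np.linalg.svd(K, full_matrices=False)
A = U[:, :rank] * S[:rank]      # (seq_len, rank)
B = Vt[:rank]                   # (rank, head_dim)
print(f"memory ratio: {(A.size + B.size) / K.size:.3f}")   # ~0.13 here

# Decode step: rebuild only the keys attention actually needs right now.
needed = np.array([0, 17, 42, 4095])
K_rebuilt = A[needed] @ B
print(np.allclose(K_rebuilt, K[needed]))   # True (exact because K is low-rank)
```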