Pre-Prints

SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models

Chenyu Wang, Paria Rashidinejad, DiJia Su, Song Jiang, Sid Wang, Siyan Zhao, Cai Zhou, Shannon Zejiang Shen, Feiyu Chen, Tommi Jaakkola, Yuandong Tian, Bo Liu

MATH-AI@NeurIPS 2025; Efficient Reasoning@NeurIPS 2025

Publications

Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking

Paria Rashidinejad, Yuandong Tian

International Conference on Learning Representations (ICLR), 2025

Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning

Hanlin Zhu, Paria Rashidinejad, Jiantao Jiao

Advances in Neural Information Processing Systems (NeurIPS), 2023

Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian

Paria Rashidinejad, Hanlin Zhu, Kunhe Yang, Stuart Russell, Jiantao Jiao

International Conference on Learning Representations (ICLR), 2023 (Spotlight, top 8%)

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell

IEEE Transactions on Information Theory, 2022
Advances in Neural Information Processing Systems (NeurIPS), 2021

MADE: Exploration via Maximizing Deviation from Explored Regions

Tianjun Zhang*, Paria Rashidinejad*, Jiantao Jiao, Yuandong Tian, Joseph Gonzalez, Stuart Russell
*equal contribution

Advances in Neural Information Processing Systems (NeurIPS), 2021

Patient-Adaptable Intracranial Pressure Morphology Analysis Using a Probabilistic Model-Based Approach

Paria Rashidinejad, Xiao Hu, Stuart Russell

Physiological Measurement, 2020

SLIP: Learning to Predict in Unknown Dynamical Systems with Long-Term Memory

Paria Rashidinejad, Jiantao Jiao, Stuart Russell

Advances in Neural Information Processing Systems (NeurIPS), 2020 (Oral, top 1%)

Patents

Techniques for Accurately Estimating the Reliability of Storage Systems

Paria Rashidinejad, Navaneeth Jamadagni, Arun Raghavan, Craig Schelp, Charles Gordon

U.S. Patent No. 11,416,324.