SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
Research
Pre-Prints
Publications
Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking
Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
MADE: Exploration via Maximizing Deviation from Explored Regions
Patient-Adaptable Intracranial Pressure Morphology Analysis Using a Probabilistic Model-Based Approach
SLIP: Learning to Predict in Unknown Dynamical Systems with Long-Term Memory
Patents
Techniques for Accurately Estimating the Reliability of Storage Systems