Abstract
Multimodal large language models (MLLMs) show strong potential as judges. However, existing approaches face a fundamental trade-off: adapting MLLMs to output a single score misaligns with the generative nature of MLLMs and limits fine-grained requirement understanding, whereas autoregressively generating judging analyses is prohibitively slow in high-throughput settings. Observing that judgment reduces to verifying whether inputs satisfy a set of structured requirements, we propose YOFO, a template-conditioned method that judges all requirements in a single forward pass. Built on an autoregressive model, YOFO accepts a structured requirement template and, in one inference step, produces a binary yes/no decision for each requirement by reading the logits of the final token associated with that requirement. This design yields orders-of-magnitude speedups while preserving interpretability. Extensive experiments show that YOFO not only achieves state-of-the-art results on standard recommendation datasets, but also supports dependency-aware analysis—where subsequent judgments are conditioned on previous ones—and further benefits from post-hoc CoT.
Key Features & Results
Paradigm Comparison
Transitioning from single score prediction to single-forward compositional judging.
Accuracy and Interpretability
YOFO enables fine-grained understanding and enhances interpretability.
Performance Results
Comprehensive evaluation shows YOFO outperforming existing judges in accuracy and ranking metrics.
Contact & Opportunities
Interested in YOFO or seeking collaboration? Contact us:
BibTeX
@article{zhang2025you,
title={You Only Forward Once: An Efficient Compositional Judging Paradigm},
author={Zhang, Tianlong and Xue, Hongwei and Yan, Shilin and Wu, Di and Xu, Chen and Zhang, Guannan and Yang, Yunyun},
journal={arXiv preprint arXiv:2511.16600},
year={2025}
}