A framework to evaluate reasoning capabilities in video generation models at scale.
VMEvalKit is meant to be a permissively open-source shared playground for everyone. If you're interested in machine cognition, video models, evaluation, or anything in between, we'd love to build with you:
💬 Join us on Slack to ask questions, propose ideas, or start a collab: Slack Invite
Here we track papers spun off from this code infrastructure, along with some works in progress.
This paper applies our experimental framework and demonstrates that leading video generation models (e.g., Sora-2) can perform visual reasoning tasks with success rates above 60%. See results.
Apache 2.0
If you find VMEvalKit useful in your research, please cite:
@misc{VMEvalKit,
  author       = {VMEvalKit Team},
  title        = {VMEvalKit: A framework for evaluating reasoning abilities in foundational video models},
  year         = {2025},
  howpublished = {\url{https://github.com/Video-Reason/VMEvalKit}}
}