VMEvalKit 🎥🧠


A framework to evaluate reasoning capabilities in video generation models at scale.

Invitation to Collaborate 🤝

VMEvalKit is meant to be a permissively licensed, open-source shared playground for everyone. If you’re interested in machine cognition, video models, evaluation, or anything else 🦄✨, we’d love to build with you:

💬 Join us on Slack to ask questions, propose ideas, or start a collab: Slack Invite 🚀

Research

Here we keep track of papers spun off from this code infrastructure, along with some works in progress.

This paper implements our experimental framework and demonstrates that leading video generation models (such as Sora-2) can perform visual reasoning tasks with success rates above 60%. See results.

License

Apache 2.0

Citation

If you find VMEvalKit useful in your research, please cite:

@misc{VMEvalKit,
  author       = {VMEvalKit Team},
  title        = {VMEvalKit: A framework for evaluating reasoning abilities in foundational video models},
  year         = {2025},
  howpublished = {\url{https://github.com/Video-Reason/VMEvalKit}}
}