BabyVLM Tutorial – ICDL 2026

Abstract

"What I cannot create, I do not understand." — Richard Feynman

Pretraining vision foundation models (VFMs) is prohibitively expensive, which makes it a privilege of institutions with abundant resources and leaves independent researchers to downstream tasks such as benchmarking, interpreting, and aligning VFMs. This situation is a crisis for computer vision research: independent researchers and the public cannot passively gain a true understanding of, trust in, or safe use of VFMs from open weights or APIs alone.

We propose democratizing VFM pretraining by scaling it down to a developmentally plausible framework that is scientifically reasonable and computationally affordable on university budgets. Our goal is to promote exploration rather than exploitation of pretraining, enabling independent researchers to build general-purpose VFMs that approach "baby intelligence" and, in turn, benefit efforts toward "grown-up" AI.

This framework closely mimics the minimal yet highly informative sensory experiences of human infants and rests on three pillars:

🎥

Pretraining Data

Curated from longitudinal, egocentric audiovisual recordings of babies — capturing how infants naturally perceive the world.

🧠

Evaluation Benchmarks

A suite of developmentally aligned benchmarks assessing capabilities against cognitive milestones like object permanence, social skills, and language acquisition.

💻

Pretraining Codebase

A user-friendly codebase and baseline models designed to run on university-scale compute budgets.

Tutorial Schedule

Time	Session	Type	Speaker
09:00 – 09:20	Opening & Motivation	Talk	TBD
09:20 – 09:50	VFM Pretraining 101	Talk	TBD
09:50 – 10:20	BabyVLM Dataset & Curation Pipeline	Talk	TBD
10:20 – 10:35	Coffee Break	Break	–
10:35 – 11:05	Developmentally Aligned Benchmarks	Talk	TBD
11:05 – 11:35	Hands-On: Train Your Baby VFM	Hands-on	TBD
11:35 – 12:00	Live Demo & Q&A	Demo	All Presenters

Presenters

👤

Presenter One

University A

👤

Presenter Two

University B

👤

Presenter Three

Institute C

Resources

All materials will be made available before the tutorial date. Links will be updated here.

Resource	Description	Link
Paper	Full technical report	(coming soon)
Dataset	Egocentric baby video corpus	(coming soon)
Code	Pretraining codebase & baselines	(coming soon)
Slides	Tutorial slide decks	(coming soon)
Notebook	Hands-on Colab notebook	(coming soon)

Citation

If you find this work useful, please cite:

TBD