Robotics

Multimodal models and Vision-Language-Action (VLA)

From CLIP and ViT to VLMs (LLaVA, Flamingo, GPT-4o), then on to VLAs for robotics (RT-2, OpenVLA, π0): how modalities are fused and how actions are represented as tokens.

2026-06-16 14 min read