Zhipu AI Releases Open-Source GLM-4.6V Vision-Language Models

Chinese AI startup launches GLM-4.6V series optimized for multimodal reasoning and frontend automation with models in two sizes.

Chinese AI startup Zhipu AI (also known as Z.ai) has released its GLM-4.6V series, according to VentureBeat AI. The new generation of open-source vision-language models (VLMs) is optimized for multimodal reasoning, frontend automation, and high-efficiency deployment.

According to the report, the release includes two models differentiated by size: a "large" GLM-4.6V with 106 billion parameters and a smaller variant. Both are designed as native tool-calling vision models built for multimodal reasoning tasks.

The release marks another entry in the growing field of open-source vision-language models, which combine visual understanding with language processing capabilities. By offering models in different sizes, Zhipu AI appears to be addressing varied deployment scenarios and computational resource constraints.

The focus on “native tool-calling” suggests the models are designed to integrate with external tools and APIs, a capability increasingly important for practical AI applications that need to interact with software systems and databases.
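
To make the idea concrete, below is a minimal sketch of what a tool-calling request to a vision-language model could look like through an OpenAI-compatible chat completions interface. The endpoint URL, model identifier, and `click_element` tool schema are illustrative assumptions for a frontend-automation scenario, not Zhipu AI's documented API.

```python
# Illustrative sketch only: the endpoint, model name, and tool schema are
# assumptions, not Zhipu AI's documented interface.
from openai import OpenAI

client = OpenAI(
    base_url="https://example.com/v1",  # hypothetical OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

# A tool the model may choose to call after inspecting the screenshot.
tools = [{
    "type": "function",
    "function": {
        "name": "click_element",  # hypothetical frontend-automation tool
        "description": "Click a UI element identified on the page.",
        "parameters": {
            "type": "object",
            "properties": {
                "selector": {
                    "type": "string",
                    "description": "CSS selector of the element to click",
                },
            },
            "required": ["selector"],
        },
    },
}]

response = client.chat.completions.create(
    model="glm-4.6v",  # placeholder model identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Find and click the 'Submit' button in this screenshot."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/screenshot.png"}},
        ],
    }],
    tools=tools,
)

# If the model decides to invoke a tool, the structured call appears here
# and the calling application is responsible for executing it.
print(response.choices[0].message.tool_calls)
```

In this pattern, the model does not execute anything itself; it returns a structured tool call (function name plus JSON arguments), and the surrounding application carries out the action and can feed the result back for further reasoning.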