Open-source multimodal AI accelerates large models' sprint into the physical world
From June 6 to 7, the 2025 Zhiyuan Conference was held in Zhongguancun, Beijing, drawing industry insiders from China and abroad. The conference focused on the current state and future direction of AI and embodied intelligence. Participants broadly agreed that large models are moving from the digital world into the physical world, and that multimodal technology is expected to reach an inflection point for large-scale deployment this year.

Wang Zhongyuan, director of the Zhiyuan Research Institute, said that large models are evolving from large language models into native multimodal large models and world models, and are accelerating their move from the digital world to the physical world.

The conference also featured an interactive booth showcasing AI research results, where attendees could try the latest applications, including robots built on RoboOS 2 and RoboBrain 2 that can perform tasks such as making hamburgers and pouring drinks.

Guests at the conference generally agreed that open source is central to AI development, that shared datasets are a key foundation, and that global cooperation is crucial.