Sora proves Tesla was thinking in the right direction, and Tesla proves Sora is worth more than just generating video
Of everyone reacting to Sora's launch, Musk is probably the one with the most mixed feelings. Not only because of his own early involvement with OpenAI, but also because the approach behind Sora is actually a direction Tesla had explored in its own early years.
On February 18, Musk left a comment under a video by tech anchor @Dr.KnowItAll titled "OpenAI's bombshell confirms Tesla's theory," stating that "Tesla has been able to make real-world videos using precise physics for about a year now."
He then retweeted a 2023 video on X featuring Ashok Elluswamy, Tesla's director of Autopilot, explaining how Tesla uses AI to simulate real-world driving. In the video, the AI generates seven driving videos from different angles simultaneously, and simply entering commands such as "go straight" or "change lanes" makes all seven videos change in sync.
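The reason seven views can stay synchronized is that they are not seven independent generations: a single underlying world state is advanced by the command, and each camera merely renders that shared state. A purely illustrative sketch (all names and the trivial "renderer" are hypothetical, not Tesla's system):

```python
from dataclasses import dataclass

@dataclass
class WorldState:
    x: float = 0.0   # position along the road
    lane: int = 1    # current lane index

# Each command transforms the single shared world state.
COMMANDS = {
    "go straight": lambda s: WorldState(s.x + 1.0, s.lane),
    "change lanes": lambda s: WorldState(s.x + 1.0, s.lane + 1),
}

def render(state: WorldState, camera: int) -> str:
    """Placeholder renderer: every camera sees the SAME state,
    just from a different (hypothetical) angle."""
    return f"cam{camera}: x={state.x:.1f} lane={state.lane}"

def step(state: WorldState, command: str, n_cameras: int = 7):
    """Advance the world once, then render all camera views of it."""
    state = COMMANDS[command](state)
    frames = [render(state, c) for c in range(n_cameras)]
    return state, frames
```

Because the lane change happens in the world state rather than in any one video, every rendered view reflects it at once, which is the property the demo highlights.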
Of course, this does not mean that Tesla mastered Sora's technology a year ago. Tesla's generation technology is used only to simulate vehicle driving, while Sora has to handle far more complex information: the environment, the scene, the prompt, the laws of physics, and more. The two are not comparable in difficulty.
But Tesla's AI and Sora are trained along the same lines: the point is not to train the AI to generate video, but to train it to understand and generate a real scene or world; the video is just that scene viewed from a certain point of view over a period of time. These are two very different companies with very different existing businesses and different approaches to perceiving the real world, yet they share the same ambition: to reach AGI (artificial general intelligence), or more specifically, embodied intelligence and AI agents.
The core of this idea is understanding that OpenAI's mission for Sora is not merely to replace video creators, but to use video generation as a 'simulator' that helps AI make sense of the real world. If Tesla's millions of vehicles still need to feel the world 'in the flesh', Sora builds its knowledge of the world purely from data fed into it.
So there are now two completely different companies, OpenAI and Tesla, taking very different approaches and paths toward the same goal: enabling AI to understand the physical world through video generation.
A quick look at Sora's runtime logic: OpenAI says Sora combines two of the most important model architectures of recent years, the Transformer and the diffusion model. Language models such as ChatGPT, Gemini, and LLaMA are built on the Transformer, which tokenizes text and predicts the next token, while the diffusion model is the approach behind text-to-image generation.
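OpenAI's technical write-up reportedly describes this combination as a diffusion transformer operating on "spacetime patches": the video is cut into patch tokens, and a Transformer iteratively denoises them from pure noise. A toy NumPy sketch of that loop, with the real trained Transformer replaced by a trivial placeholder (everything below is illustrative, not Sora's actual code):

```python
import numpy as np

def patchify(video, pt=2, ph=4, pw=4):
    """Split a (T, H, W) video into flattened spacetime patches,
    the token unit the model operates on."""
    T, H, W = video.shape
    return (video
            .reshape(T // pt, pt, H // ph, ph, W // pw, pw)
            .transpose(0, 2, 4, 1, 3, 5)   # group each patch's voxels together
            .reshape(-1, pt * ph * pw))    # (num_patches, patch_dim)

def toy_denoise_step(tokens, step, total_steps):
    """Stand-in for the Transformer denoiser: in the real model this
    predicts and removes noise; here we just shrink it each step."""
    alpha = 1.0 - step / total_steps
    return tokens * alpha

def generate(shape=(4, 8, 8), steps=10, seed=0):
    """Diffusion-style loop: start from pure noise over spacetime
    patches and iteratively denoise toward a clean sample."""
    rng = np.random.default_rng(seed)
    tokens = patchify(rng.standard_normal(shape))
    for step in range(steps):
        tokens = toy_denoise_step(tokens, step, steps)
    return tokens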
If we look at Sora from the perspective of "understanding the world", then the image quality of a single frame or the transitions between frames is not the criterion for judging the model, and even the 60-second one-shot video released on the official website is not the most important part. What matters is that the generated video can be edited: the relationship between the characters and the background remains highly "consistent" across different camera positions, whether wide-angle, medium shot, close-up, or extreme close-up. This is what puts Sora so far ahead and so close to the real thing.
Scale and quality are at the heart of training models.
Tesla's data comes from real roads, collected by sensor-equipped vehicles, while the bulk of OpenAI's data, from what is publicly known so far, comes from the web. On the quality dimension, Isaacson writes in his Musk biography that Tesla trains FSD by partnering with Uber to obtain footage from "five-star drivers"; on scale, OpenAI's recent ambition to raise trillion-dollar-sized funds is a concrete manifestation of its heavy focus on computing power and scale.
In Musk's view, AGI arrives when AI can actually solve a problem (physics, math, chemistry, etc.). But there is another dimension of understanding, and that is embodied intelligence. After all, the real world is not just mathematical formulas and written rules; kittens and puppies with a certain level of intelligence can also rely on movement to interact with the physical world in a real way.
This was difficult for past AI systems that could only take in two-dimensional information. It is also why Musk's comment on X after seeing Sora was "GG Humans": in his view, Sora has broken through that dimensional wall, and an AI that can understand the real world and keep learning from it also has the ability to further influence the real world.