Tesla turned on its Dojo supercomputer this week, a move that should significantly accelerate the training of Full Self-Driving (FSD) Beta models on its video dataset.
Tesla Dojo is a supercomputer designed and built by Tesla for computer vision video processing and recognition. It will be used to train Tesla's machine learning models to improve its Full Self-Driving (FSD) advanced driver-assistance system. According to Tesla, Dojo entered production use in July 2023.
The name “Dojo” is inspired by the concept of a training hall in Japanese martial arts, emphasizing the platform’s focus on training and refining the skills of Tesla’s AI systems.
Just as martial artists train and learn from their experiences, Tesla Dojo enables AI algorithms to continuously learn and improve through iterative training.
Dojo's goal is to efficiently process millions of terabytes of video data captured in real-life driving situations by Tesla's fleet of more than 4 million cars. This goal led to an architecture considerably different from conventional supercomputer designs.
Tesla operates several massively parallel computing clusters as it develops its Autopilot advanced driver-assistance system. Its primary, unnamed cluster, built on 5,760 Nvidia A100 graphics processing units (GPUs), was touted by Andrej Karpathy in 2021 at the Conference on Computer Vision and Pattern Recognition (CVPR 2021) as "roughly the number five supercomputer in the world" at approximately 81.6 petaflops, an estimate based on scaling the performance of the Nvidia Selene supercomputer, which uses similar components. However, that performance figure has been disputed, as it was not clear whether it referred to single-precision or double-precision floating-point operations. Tesla also operates a second cluster of 4,032 GPUs for training and a third of 1,752 GPUs for automatic labeling of objects.
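The 81.6-petaflop figure can be reproduced as a back-of-envelope linear scaling of Selene's benchmark result by GPU count. The Selene numbers below (roughly 4,480 A100 GPUs and an Rmax of about 63.46 PFLOPS on the June 2021 TOP500 list) are assumptions drawn from public TOP500 data, not figures stated by Tesla:

```python
# Back-of-envelope estimate: scale Selene's measured performance
# linearly by GPU count to approximate Tesla's A100 cluster.
# Assumed Selene figures (June 2021 TOP500 list, not from Tesla):
selene_gpus = 4480        # ~560 nodes x 8 A100 GPUs
selene_pflops = 63.46     # Rmax, double-precision HPL

tesla_gpus = 5760         # Tesla's primary A100 cluster

# Linear scaling assumes identical per-GPU throughput and
# ignores interconnect/topology differences between the systems.
estimate = tesla_gpus / selene_gpus * selene_pflops
print(f"{estimate:.1f} PFLOPS")  # ≈ 81.6 PFLOPS
```

The result matches Karpathy's quoted figure, which also illustrates the dispute: the scaled number inherits Selene's double-precision (FP64) benchmark basis, whereas deep-learning training is typically quoted in lower-precision throughput.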
Dojo was officially announced at Tesla’s Artificial Intelligence (AI) Day on August 19, 2021. Tesla revealed details of the D1 chip and its plans for “Project Dojo”, a data center that would house 3,000 D1 chips; the first “Training Tile” had been completed and delivered the week before. In October 2021, Tesla released a “Dojo Technology” whitepaper describing the Configurable Float8 (CFloat8) and Configurable Float16 (CFloat16) floating point formats and arithmetic operations as an extension of Institute of Electrical and Electronics Engineers (IEEE) standard 754.
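The "configurable" aspect of the CFloat formats is that software can adjust how an 8- or 16-bit value is interpreted, rather than fixing a single layout as IEEE 754 does. The decoder below is a hypothetical illustration of that idea for an 8-bit value with a selectable exponent/mantissa split and exponent bias; the field layout and the name `decode_cfloat8` are assumptions for this sketch, not Tesla's published encoding:

```python
def decode_cfloat8(byte: int, exp_bits: int = 4, bias: int = 7) -> float:
    """Decode an 8-bit float: 1 sign bit, exp_bits exponent bits,
    remaining bits mantissa. Split and bias are configurable.

    Illustrative sketch only; no Inf/NaN special encodings here.
    """
    man_bits = 7 - exp_bits
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    if exp == 0:
        # Subnormal range: no implicit leading 1
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    # Normal range: implicit leading 1, biased exponent
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


# With a 4-bit exponent and bias 7, exponent field 7 encodes 2^0:
print(decode_cfloat8(0b00111000))  # 1.0
print(decode_cfloat8(0b01000000))  # 2.0
```

Making the bias configurable lets a training framework shift the representable range per tensor, trading dynamic range against precision, which is the motivation the whitepaper gives for extending the fixed IEEE 754 layouts.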