OpenAI LLC this week detailed Point-E, a new artificial intelligence system that can generate 3D models based on text prompts.
The research group has made the code for Point-E available on GitHub.
There are multiple AI applications that can generate two-dimensional images based on a text description provided by a user. According to OpenAI, such applications render images in a few seconds or minutes when running on a single data center graphics card. In contrast, generating a 3D model typically takes a few hours when using comparable hardware.
OpenAI built Point-E to speed up the process. According to the research group, Point-E can generate a 3D model in as little as one minute when running on an Nvidia V100 graphics card.
When the AI system receives a user prompt describing an object, it doesn’t generate a 3D model of the object directly. Instead, it first creates a two-dimensional drawing of the specified object. From there, Point-E turns the drawing into a three-dimensional point cloud: a set of points in space that serves as a rough outline of the eventual 3D model.
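The two-stage flow described above can be sketched schematically in Python. The function names, shapes and stub bodies below are illustrative placeholders, not OpenAI’s actual API; the real stages are large diffusion networks.

```python
import random

def text_to_image(prompt):
    # Stage 1: a text-conditional image model (GLIDE in Point-E)
    # renders a 2D view of the described object. Here: a dummy
    # 64x64 grid of random values standing in for an image.
    return [[random.random() for _ in range(64)] for _ in range(64)]

def image_to_point_cloud(image, num_points=1000):
    # Stage 2: an image-conditional model produces a coarse point
    # cloud, i.e. a list of (x, y, z) coordinates in space.
    return [(random.random(), random.random(), random.random())
            for _ in range(num_points)]

cloud = image_to_point_cloud(text_to_image("a red traffic cone"))
print(len(cloud))  # prints 1000
```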
Each step of the process is carried out by a different neural network. The first step, turning the user’s description of an object into a two-dimensional drawing, is performed by a neural network dubbed GLIDE that OpenAI originally released last year. The version of GLIDE used in Point-E features three billion parameters, the values a neural network learns during training that determine how it processes data.
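To give a sense of scale for figures such as “three billion parameters,” the snippet below counts the parameters of a single fully connected layer; even one modestly sized layer already holds millions of learned values. The layer sizes are arbitrary examples, not GLIDE’s actual dimensions.

```python
def dense_layer_params(n_in, n_out):
    # A fully connected layer learns one weight per input/output
    # pair, plus one bias per output.
    return n_in * n_out + n_out

# A single 768-to-3,072 layer:
print(dense_layer_params(768, 3072))  # prints 2362368
```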
After Point-E generates a two-dimensional drawing of an object, the drawing is turned into a point cloud by two separate neural networks. The first network generates an initial, low-resolution point cloud made up of 1,000 points. The second network, which is described as a simpler version of the first, adds 3,000 more points to increase the point cloud’s resolution.
“For image diffusion models, the best quality is typically achieved by using some form of hierarchy, where a low resolution base model produces output which is then upsampled by another model,” OpenAI scientists detailed in a research paper explaining POINT-E. “Our upsampler uses the same architecture as our base model.”
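The base-plus-upsampler hierarchy can be sketched as follows. Both stubs simply emit random points; in the real system each stage is a diffusion model, and the upsampler is conditioned on the coarse cloud it refines.

```python
import random

def base_model(image):
    # Coarse stage: emit a low-resolution 1,000-point cloud.
    return [(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1))
            for _ in range(1000)]

def upsampler(coarse_cloud):
    # Refinement stage: add 3,000 points (conditioned on the
    # coarse cloud in the real system) for a 4,000-point result.
    extra = [(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1))
             for _ in range(3000)]
    return coarse_cloud + extra

full_cloud = upsampler(base_model(image=None))
print(len(full_cloud))  # prints 4000
```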
The neural networks that Point-E uses to generate 3D models are based on a machine learning method known as diffusion. The method, which was first introduced in 2015, also powers an image generation AI that Google LLC debuted earlier this year.
To build a diffusion model, engineers take training images and corrupt them with Gaussian noise, a form of random statistical error. They then task the model with removing that noise. By repeating the process many times, a neural network can learn techniques that allow it to generate new images from scratch.
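The corrupt-then-denoise idea can be shown with a toy one-dimensional example: here the “model” is a single scalar coefficient fit by gradient descent, whereas real diffusion models are large neural networks trained across many noise levels.

```python
import random

random.seed(0)
clean = [random.uniform(-1, 1) for _ in range(1000)]  # toy "images"

# Corrupt the data with Gaussian noise.
sigma = 0.5
noisy = [x + random.gauss(0, sigma) for x in clean]

# Learn a linear denoiser x_hat = w * noisy that minimizes the
# mean squared error against the clean data.
w = 0.0
lr = 0.1
for _ in range(200):
    grad = sum(2 * (w * n - x) * n for n, x in zip(noisy, clean)) / len(clean)
    w -= lr * grad

# The learned w shrinks noisy values toward zero; theory predicts
# w ~ Var(x) / (Var(x) + sigma^2), roughly 0.57 here.
print(round(w, 2))
```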
After Point-E creates a point cloud of an object, it turns the point cloud into a 3D model with the help of Blender, an open-source 3D graphics application. The process of creating a 3D model in Blender is managed by an automated script.
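The Blender step itself runs as a script inside Blender, which can’t be reproduced here; as a stand-in illustration, the sketch below writes a point cloud to the ASCII PLY format, which Blender and most other 3D tools can import for meshing or rendering. The helper function and file name are hypothetical.

```python
def write_ply(points, path):
    # Minimal ASCII PLY writer: header declaring the vertex count
    # and coordinate properties, followed by one vertex per line.
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(points)}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write("end_header\n")
        for x, y, z in points:
            f.write(f"{x} {y} {z}\n")

# A tiny three-point cloud as a demonstration.
write_ply([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], "cloud.ply")
```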
“While our method performs worse on this evaluation than state-of-the-art techniques, it produces samples in a small fraction of the time,” OpenAI’s researchers detailed. “This could make it more practical for certain applications, or could allow for the discovery of higher-quality 3D objects by sampling many objects and selecting the best one.”