NVIDIA Expands Generative AI Ecosystem with New NIMs and Partnerships


By Al Landes



Key Takeaways

NVIDIA has announced a major expansion of its generative AI capabilities, going beyond its traditional chip business. The company has introduced a range of new offerings for its “NIM” container software, addressing numerous functions and industries. Additionally, NVIDIA has partnered with AI developer platform Hugging Face to launch a cloud service called “Inference-as-a-Service” for running these programs.

The move signifies NVIDIA’s commitment to making generative AI more accessible and easier to deploy for developers across various sectors. By expanding its NIM roster from a couple dozen to over 100, the company aims to cater to diverse use cases and industries, including robotics, digital biology, and 3D product development.

With the introduction of new NIMs for open-source AI models like Meta’s Llama 3.1 and Mistral NeMo 12B, as well as a NIM for adding speech to chatbots, NVIDIA is empowering developers to create more sophisticated and interactive applications. The company has also upgraded its Edify NIM in collaboration with Getty Images, significantly improving the rendering speed of Getty’s generative AI image-making.

Furthermore, NVIDIA’s Omniverse SDK for Apple’s Vision Pro headset opens up new possibilities for creating virtual worlds and using digital twins for tasks such as training robots. This development showcases the company’s commitment to pushing the boundaries of generative AI and its applications in various domains.

By offering the Hugging Face inference service on its own DGX Cloud, NVIDIA provides developers with a significant performance boost, enabling faster inference operations for models like Llama 3.1 70B. Although currently limited to models “NIM-ified” by NVIDIA, this service marks a significant step towards making generative AI more accessible and efficient.

According to Yahoo, NVIDIA’s AI Enterprise subscription, priced at $4,500 per GPU per year, allows developers to access and run NIMs in any environment with GPUs, providing flexibility and scalability for their projects. As the company continues to focus on performance and accessibility, it is poised to drive further advancements in the field of generative AI.

What are NIMs?

Nvidia introduced its NIM (Nvidia Inference Microservices) platform at its GTC conference in March 2024. NIMs package AI models in application containers, making it easier for developers to integrate them into their applications. This approach simplifies the process of deploying AI models and ensures they run efficiently on Nvidia’s hardware.
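
In practice, a deployed NIM behaves like a self-hosted inference endpoint. As a minimal sketch (not NVIDIA’s official quick-start), the Python below assumes a Llama 3.1 NIM container is already running locally and serving an OpenAI-compatible API on port 8000; the port, route, and model identifier are illustrative, so check NVIDIA’s NIM documentation for your specific container.

```python
# Minimal sketch: querying a locally deployed NIM from Python.
# Assumes a Llama 3.1 NIM container is already running and exposing
# an OpenAI-compatible API on localhost:8000 -- the port, route, and
# model name below are illustrative, not guaranteed defaults.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta/llama-3.1-8b-instruct",  # illustrative model id
        "messages": [
            {"role": "user", "content": "Summarize NVIDIA NIM in one sentence."}
        ],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```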

Since its introduction, Nvidia has rapidly expanded its NIM roster from just a couple dozen to over 100. These NIMs cover a wide range of industries and use cases, demonstrating Nvidia’s commitment to making generative AI more accessible and versatile.

The expansion of the NIM platform marks a significant step in Nvidia’s strategy to establish itself as a leader in the generative AI space. By providing a comprehensive suite of tools and services that extend beyond its traditional chip business, Nvidia is positioning itself to play a crucial role in the future development and deployment of generative AI technologies.

New NIMs and Partnerships

Nvidia has introduced a slew of new NIMs for open-source AI models. These include Meta’s Llama 3.1 and Mistral NeMo 12B, a model developed jointly by Nvidia and French AI company Mistral AI.

A new NIM adds speech capabilities to chatbots, built on the Parakeet speech-recognition model from Nvidia and Suno.ai. Other NIMs focus on robotics, digital biology, and 3D product development, with support for OpenUSD, an open standard for describing and exchanging 3D scenes between simulation environments.

In partnership with Getty Images, Nvidia has upgraded its Edify NIM, significantly improving the rendering speed of Getty’s generative AI image creation. As ZDNET points out, the company also released the first version of its Omniverse SDK for Apple’s Vision Pro headset, enabling users to create virtual worlds and use digital twins for tasks like training robots.

These partnerships and advancements showcase Nvidia’s commitment to expanding its generative AI capabilities across various industries and use cases.

Hugging Face Inference Service

Nvidia has partnered with Hugging Face, a popular AI developer platform, to launch a new cloud service called “Inference-as-a-Service.” The service runs on Nvidia’s powerful DGX Cloud and provides a significant performance boost for running AI models.

One notable example is the Llama 3.1 70B model, which performs inference up to five times faster on the Hugging Face service than on “off-the-shelf” hardware. That kind of speedup matters for developers looking to deploy large-scale AI models efficiently.
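
For a sense of what calling the service looks like from code, here is a hedged sketch using the official huggingface_hub client library. The InferenceClient and its chat_completion method are real APIs, but whether a given request is routed to the NIM-powered DGX Cloud backend depends on account and endpoint configuration not covered here; the token is a placeholder.

```python
# Sketch: chat completion against a Hugging Face-hosted Llama 3.1 model
# using the huggingface_hub client. Routing to the NIM-powered DGX Cloud
# backend may require additional account or endpoint configuration.
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    token="hf_...",  # placeholder: your Hugging Face access token
)

result = client.chat_completion(
    messages=[{"role": "user", "content": "Explain what a NIM container is."}],
    max_tokens=200,
)
print(result.choices[0].message.content)
```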

Although Hugging Face hosts an impressive 750,000 models, the Inference-as-a-Service offering is currently limited to the models that Nvidia has “NIM-ified.” This means developers can only take advantage of the performance benefits for models that have been packaged into Nvidia’s NIM container software.

Despite this limitation, the Hugging Face Inference Service represents a significant step forward in making high-performance AI inference more accessible to developers. By leveraging Nvidia’s cutting-edge hardware and software optimizations, developers can now deploy their models at scale with ease, paving the way for more innovative and impactful AI applications.

Image credit: Nvidia

