- NVIDIA launches Llama Nemotron, a family of open-source reasoning AI models built for developing agentic AI platforms.
- These models are post-trained for higher accuracy and efficiency, with flexible deployment across PCs, data centres, and multi-GPU systems.
- Accenture, Amdocs, Atlassian, Box, Cadence, CrowdStrike, Deloitte, IQVIA, Microsoft, SAP, and ServiceNow are working with NVIDIA to develop advanced AI agents that will change the way people work.
NVIDIA has introduced Llama Nemotron, a new family of advanced reasoning models developed to support developers and enterprises in building agentic AI platforms. These models mark a significant development in how artificial intelligence is designed and deployed, providing structured reasoning capabilities that can be applied across a wide range of industries and tasks.
The Llama Nemotron models are open and post-trained by NVIDIA, offering a business-ready solution for building AI agents that perform complex decision-making, multistep calculations, and code generation.
Leading AI platform companies such as Accenture, Amdocs, Atlassian, Box, Cadence, CrowdStrike, Deloitte, IQVIA, Microsoft, SAP, and ServiceNow are working with NVIDIA to use its new reasoning models and software.
These models are built on Meta’s Llama architecture and further enhanced using curated training methods, including reasoning capabilities adapted from DeepSeek-R1. The result is a set of models designed to work independently or as part of agentic systems that operate collaboratively to solve problems and carry out tasks.
“Reasoning and agentic AI adoption is incredible. NVIDIA’s open reasoning models, software and tools give developers and enterprises everywhere the building blocks to create an accelerated agentic AI workforce,” said Jensen Huang, founder and CEO of NVIDIA.
What is NVIDIA Llama Nemotron?
Llama Nemotron is a suite of open-source reasoning AI models developed by NVIDIA and built upon the Llama architecture. These models are post-trained using NVIDIA’s expertise and enriched with reasoning capabilities inspired by DeepSeek-R1. Designed for a wide range of deployment needs, the models are suitable for use in data centres, cloud platforms, personal computers, and edge devices.
What makes Llama Nemotron stand out is its ability to switch reasoning on and off, depending on the task. This unique feature helps save computing power and reduces inference costs, especially when deep reasoning is not required for a specific query. It offers developers flexibility and cost-efficiency — two crucial aspects of AI deployment in large-scale operations.
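NVIDIA's Nemotron model cards describe controlling this toggle through the system prompt. The sketch below shows how an application might route queries accordingly; the exact "detailed thinking on/off" prompt convention is an assumption taken from those model cards and should be verified against the card for the specific model release you deploy.

```python
def build_messages(query: str, reasoning: bool) -> list[dict]:
    """Build a chat message list that toggles the model's reasoning mode.

    The "detailed thinking on/off" system prompt follows the convention
    described in NVIDIA's Nemotron model cards; confirm it for your model.
    """
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": query},
    ]

# Route cheap lookups without reasoning; enable it for multistep tasks.
simple = build_messages("What is the capital of France?", reasoning=False)
complex_ = build_messages("Plan a three-step data migration.", reasoning=True)
```

In practice, the routing decision (is deep reasoning needed for this query?) can be as simple as a keyword heuristic or as elaborate as a lightweight classifier in front of the model.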
Key benefits of Llama Nemotron for enterprises
- High accuracy: Post-trained by NVIDIA, the models offer up to 20% improved accuracy over the original Llama base model. This makes them particularly strong in complex areas like scientific reasoning, multistep problem-solving, and programming.
- Compute efficiency: The models are optimised to deliver 5x faster inference speeds, allowing businesses to run powerful AI systems while reducing operational costs. The ability to toggle reasoning further enhances performance and energy efficiency.
- Commercial viability: The Llama Nemotron models have been developed with transparency and adaptability in mind. They are secure, maintain internet-scale knowledge, and can be deployed across enterprise-grade, GPU-accelerated platforms.
Llama Nemotron model types
The Llama Nemotron models are released in three sizes, each designed to address different operational requirements:
- Nano: Suitable for PCs and edge computing, providing accurate reasoning performance on smaller devices.
- Super: Designed for data centres, offering high accuracy and strong throughput on single GPU environments.
- Ultra: Built for large-scale multi-GPU infrastructure, supporting the most complex agentic systems and decision-making tasks.
Tools to build intelligent AI agents
Llama Nemotron is part of a broader AI infrastructure supported by NVIDIA AI Enterprise, offering a set of deployment tools that make it easier for developers to create agentic systems. These include:
- NVIDIA NIM™: A set of microservices that provide stable APIs for deploying performance-optimised generative and reasoning AI models in enterprise environments.
- NVIDIA AI-Q Blueprint: A reference framework that allows developers to connect AI agents to structured knowledge bases. It supports multimodal information retrieval via NeMo Retriever™ and enables agent-to-data connections and transparency using the open-source AgentIQ toolkit.
- NVIDIA AI Data Platform: A reference architecture for enterprise infrastructure designed to support AI query agents and continuous learning processes.
- NVIDIA NeMo microservices: These services enable AI agents to establish a continuous feedback loop known as a data flywheel, allowing them to learn from both human and AI-generated data. This flywheel model supports ongoing improvements in agent performance and adaptability.
- AgentIQ Toolkit: Available now on GitHub, this toolkit provides developers with tools to improve agent reasoning, decision tracking, and deployment transparency.
These tools provide a complete environment for developers and enterprises to prototype, test, and scale agentic AI applications with secure, optimised, and flexible deployment options.
NVIDIA Llama Nemotron’s industry adoption and enterprise integration
Leading global companies are already adopting the Llama Nemotron family to build powerful agentic AI systems.
Microsoft is integrating the models and NIM microservices into Azure AI Foundry, expanding its catalogue to include reasoning tools that support platforms like Microsoft 365 AI Agent Services.
SAP is applying the models within its SAP Business AI portfolio and the Joule AI copilot. It is also using NVIDIA NeMo™ and NIM microservices to improve code completion for the ABAP programming language. “We are collaborating with NVIDIA to integrate Llama Nemotron reasoning models into Joule to enhance our AI agents, making them more intuitive, accurate and cost effective,” says Walter Sun, Global Head of AI at SAP.
ServiceNow is incorporating the models to develop AI agents aimed at improving enterprise productivity across workflows.
Accenture is deploying Llama Nemotron within its AI Refinery platform, enabling its clients to create and deploy AI agents designed for industry-specific problems.
Deloitte plans to use the models within Zora, its newly launched agentic AI platform. Zora focuses on simulating human decision-making and includes agents with deep functional expertise and transparency.
Other collaborators include Amdocs, Cadence, CrowdStrike, Atlassian, IQVIA, Box, and SoftServe, all using Llama Nemotron to enhance their agentic AI platforms.
How to get started with Llama Nemotron
Developers and enterprises can explore Llama Nemotron models through:
- build.nvidia.com and Hugging Face (for free development access)
- NVIDIA AI Enterprise (for production deployment)
No credits are needed to start prototyping, and data privacy is maintained as user data is not used for further model training. Support is also available from NVIDIA’s AI specialists.
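For development access, build.nvidia.com exposes the hosted models behind an OpenAI-compatible chat-completions API. A minimal sketch of assembling a request body follows; the endpoint URL and model identifier are illustrative assumptions based on NVIDIA's public API catalogue conventions, so check the catalogue entry for the model you actually use.

```python
import json

# Assumed values -- verify both against the build.nvidia.com catalogue.
NIM_ENDPOINT = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL_ID = "nvidia/llama-3.1-nemotron-nano-8b-v1"  # illustrative model id

def chat_payload(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble an OpenAI-compatible chat-completions request body."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,
    }

body = json.dumps(chat_payload("Summarise this quarter's incident reports."))
# POST `body` to NIM_ENDPOINT with an `Authorization: Bearer <API key>` header.
```

Because the interface is OpenAI-compatible, existing client libraries that speak that protocol can generally be pointed at the endpoint by changing only the base URL and API key.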
Developers can also build custom agents using NVIDIA NeMo™ and deploy reference workflows using NVIDIA Blueprints, which provide examples such as multimodal retrieval-augmented generation (RAG), digital human agents, and AI reporting assistants.
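The retrieval-augmented generation pattern those Blueprints package can be sketched generically: score a knowledge base against the query, then prepend the best matches to the prompt as context. This is a toy bag-of-words illustration of the pattern itself, not NVIDIA Blueprint code; production systems use dense embedding models such as NeMo Retriever for the scoring step.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words term counts (toy stand-in for an embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    qv = tokenize(query)
    ranked = sorted(docs, key=lambda d: cosine(qv, tokenize(d)), reverse=True)
    return ranked[:k]

docs = [
    "GPU clusters accelerate model training.",
    "The cafeteria menu changes weekly.",
]
question = "How do GPUs speed up training?"
context = retrieve(question, docs)[0]
prompt = f"Context: {context}\nQuestion: {question}"
```

The retrieved context is then fed to the reasoning model alongside the question, grounding its answer in the enterprise's own data rather than only its training corpus.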
The AI-Q Blueprint will be publicly available in April 2025, while the AgentIQ toolkit is already live on GitHub.
Why NVIDIA Llama Nemotron matters
Llama Nemotron addresses a clear demand in the AI ecosystem: the need for models that go beyond text generation and into structured reasoning. These models are designed to handle multi-step tasks, make logical decisions, follow instructions, and even interact with tools or external software systems.
The ability to switch reasoning capabilities on and off further improves usability and efficiency, especially for enterprises working with high volumes of varied queries. Whether it’s a digital assistant supporting complex customer questions or an agent helping engineers write and review code, Llama Nemotron offers a foundation to support that functionality.
By making the models open and equipping developers with deployment tools, NVIDIA enables a wider ecosystem of businesses and organisations to adopt agentic AI, without needing to build every system from the ground up.
Pallavi Singal is the Vice President of Content at ztudium, where she leads innovative content strategies and oversees the development of high-impact editorial initiatives. With a strong background in digital media and a passion for storytelling, Pallavi plays a pivotal role in scaling the content operations for ztudium’s platforms, including Businessabc, Citiesabc, IntelligentHQ, Wisdomia.ai, MStores, and many others. Her expertise spans content creation, SEO, and digital marketing, driving engagement and growth across multiple channels. Pallavi’s work is characterised by a keen insight into emerging trends in business, in technologies like AI, blockchain, and the metaverse, and in society, making her a trusted voice in the industry.