NVIDIA and OpenAI have brought the new gpt-oss family of open-source AI models to consumers, enabling high-performance inference on RTX GPUs.
### NVIDIA’s GeForce RTX 5090 Delivers Up to 250 Tokens/s on OpenAI’s gpt-oss-20b Model; RTX PRO GPUs Also Ready for gpt-oss-120b
**Press Release:**
Today, NVIDIA announced a collaboration with OpenAI to bring the new gpt-oss family of open-source models to consumers, allowing state-of-the-art AI previously exclusive to cloud data centers to run at high speed on RTX-powered PCs and workstations.
NVIDIA founder and CEO Jensen Huang emphasized:
“OpenAI demonstrated what could be built using NVIDIA’s AI, and now they’re advancing innovation in open-source software. The gpt-oss models allow developers worldwide to build on this leading-edge open-source foundation, enhancing U.S. technology leadership in AI—all supported by the world’s largest AI compute infrastructure.”
The launch marks a new generation of faster, smarter on-device AI, powered by GeForce RTX and RTX PRO GPUs. Two variants are now available:
– **gpt-oss-20b:** Optimized for NVIDIA RTX AI PCs with at least 16 GB of VRAM, it delivers up to 250 tokens per second on a GeForce RTX 5090 GPU.
– **gpt-oss-120b:** The larger model, supported on professional workstations accelerated by NVIDIA RTX PRO GPUs.
Trained on NVIDIA H100 GPUs, these are the first models to support MXFP4 precision, which improves model quality and accuracy without additional performance cost. Both models support context lengths of up to 131,072 tokens, among the longest available for local inference. They feature a flexible mixture-of-experts (MoE) architecture with chain-of-thought capabilities, supporting instruction following and tool use.
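To make the MXFP4 format concrete, here is a minimal sketch of block-scaled 4-bit quantization. It assumes the layout described in the OCP Microscaling (MX) specification: small blocks of values share one power-of-two scale, and each element is stored as an FP4 E2M1 number (representable magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6). This is an illustration of the idea, not NVIDIA's or OpenAI's actual kernel code.

```python
# Sketch of MXFP4-style quantization (assumption: blocks share one
# power-of-two scale; elements are FP4 E2M1). Illustrative only.
import math

# Non-negative values representable by FP4 E2M1 (sign handled separately).
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize a block of floats to FP4 E2M1 with one shared scale.

    Returns (scale, dequantized_values) so rounding error is visible.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0:
        return 1.0, [0.0] * len(block)
    # Smallest power-of-two scale that keeps max_abs within FP4 range (max 6).
    scale = 2.0 ** math.ceil(math.log2(max_abs / max(FP4_E2M1)))
    quantized = []
    for x in block:
        # Round |x| / scale to the nearest representable FP4 magnitude.
        mag = min(FP4_E2M1, key=lambda v: abs(abs(x) / scale - v))
        quantized.append(math.copysign(mag * scale, x) if x else 0.0)
    return scale, quantized
```

Because each block stores only 4 bits per weight plus one shared scale, memory use drops roughly 4x versus FP16, which is what makes a 20B-parameter model fit in 16 GB of VRAM.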
### RTX AI Garage Highlights How to Get Started
This week’s RTX AI Garage showcases how AI enthusiasts and developers can begin using the new OpenAI models on NVIDIA RTX GPUs:
– **Ollama App:** The simplest way to test these models is through the Ollama app, which offers out-of-the-box support for gpt-oss models fully optimized for RTX GPUs.
– **Llama.cpp:** NVIDIA collaborates with the open-source community to enhance performance on RTX GPUs. Recent contributions include CUDA Graphs support to reduce overhead, available in the llama.cpp GitHub repository.
– **Microsoft AI Foundry:** Windows developers can access these models via Microsoft AI Foundry Local (in public preview). Getting started is as simple as running `foundry model run gpt-oss-20b` in a terminal.
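Once a model is running under one of the local runtimes above, a single client script can talk to it, since both Ollama and llama.cpp's server expose an OpenAI-compatible chat endpoint. The sketch below builds such a request with only the standard library; the base URL, port, and the `gpt-oss:20b` model tag are assumptions about a local Ollama setup, not part of the press release.

```python
# Minimal client sketch for a locally served gpt-oss model (assumption:
# an Ollama server at http://localhost:11434, or llama-server, both of
# which expose an OpenAI-compatible /v1/chat/completions endpoint).
import json
import urllib.request

def build_chat_request(base_url, model, prompt):
    """Build an OpenAI-style chat-completion POST request for a local server."""
    payload = {
        "model": model,  # e.g. "gpt-oss:20b" under Ollama (assumed tag)
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# To actually send it (requires a running local server):
# req = build_chat_request("http://localhost:11434", "gpt-oss:20b", "Hello")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Targeting the OpenAI-compatible endpoint means the same code works unchanged whichever local runtime is serving the model.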