NVIDIA partnered with OpenAI to bring OpenAI's gpt-oss open-weight artificial intelligence models to consumer graphics cards. The gpt-oss-20b model needs at least 16GB of video memory, and on an RTX 5090 it generates about 250 tokens per second. Professional workstations can run the larger gpt-oss-120b variant on RTX PRO GPUs. Both models use MXFP4, a 4-bit microscaling floating-point format, and support context lengths of up to 131,072 tokens. Their mixture-of-experts architecture supports advanced reasoning and tool use.
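MXFP4 comes from the Open Compute Project's Microscaling (MX) specification: each block of 32 values shares one 8-bit power-of-two scale, and each value is stored as a 4-bit E2M1 float. The Python sketch below illustrates that scheme for a single block and adds a back-of-the-envelope weight-memory estimate. It is not NVIDIA's or OpenAI's implementation. Ties round toward the smaller magnitude rather than to nearest-even as the spec requires, and the hypothetical `mxfp4_weight_gigabytes` helper assumes every weight is stored as MXFP4 and ignores activations and the KV cache.

```python
import math

# E2M1 (FP4) magnitudes from the OCP Microscaling (MX) specification.
FP4_E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_EMAX = 2      # largest power of two representable in E2M1 is 4 = 2**2
BLOCK_SIZE = 32   # MX block size: 32 elements share one scale
SCALE_BITS = 8    # shared scale is an 8-bit power of two (E8M0)


def round_to_fp4(x: float) -> float:
    """Round x to the nearest E2M1 value, saturating at +/-6.

    Simplification: ties go to the smaller magnitude instead of round-to-nearest-even.
    """
    mag = min(abs(x), FP4_E2M1_VALUES[-1])
    nearest = min(FP4_E2M1_VALUES, key=lambda v: abs(v - mag))
    return math.copysign(nearest, x)


def quantize_block(values):
    """Return (shared power-of-two scale, FP4 element values) for one MX block."""
    if len(values) > BLOCK_SIZE:
        raise ValueError("an MX block holds at most 32 elements")
    amax = max((abs(v) for v in values), default=0.0)
    if amax == 0.0:
        return 1.0, [0.0] * len(values)
    # Shared exponent: align the block's largest value with FP4's largest power of two.
    shared_exp = math.floor(math.log2(amax)) - FP4_EMAX
    shared_exp = max(-127, min(127, shared_exp))  # clamp to the E8M0 scale range
    scale = 2.0 ** shared_exp
    return scale, [round_to_fp4(v / scale) for v in values]


def dequantize_block(scale, codes):
    """Reconstruct approximate values from the shared scale and FP4 elements."""
    return [scale * c for c in codes]


def mxfp4_weight_gigabytes(num_params: float) -> float:
    """Hypothetical helper. Assumption: every parameter is stored as MXFP4."""
    bits_per_param = 4 + SCALE_BITS / BLOCK_SIZE  # 4 element bits + 8/32 scale bits = 4.25
    return num_params * bits_per_param / 8 / 1e9
```

For example, `quantize_block([0.1, -0.7, 2.3, 5.5])` returns a scale of 1.0 and elements `[0.0, -0.5, 2.0, 6.0]`, and `mxfp4_weight_gigabytes(20e9)` gives 10.625, which suggests why a 20-billion-parameter model can fit in 16GB once runtime buffers are added. Treat the figure as a rough estimate, not a measurement.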
Developers can access these models through three primary platforms. The Ollama application offers the simplest way to test the RTX-optimized gpt-oss models. Llama.cpp, an open-source community project, uses CUDA Graphs to reduce the CPU overhead of launching GPU work. Microsoft AI Foundry Local, currently in public preview, lets Windows users run the models with simple terminal commands. Both models were trained on NVIDIA H100 GPUs.
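Besides its desktop interface, Ollama exposes a local REST API, so a gpt-oss model can also be called from code. The minimal Python sketch below uses only the standard library and posts to Ollama's default endpoint, `http://localhost:11434/api/generate`. The model tag `gpt-oss:20b` is an assumption; use whatever tag `ollama list` shows on your machine. Streaming is turned off so the reply arrives as a single JSON object.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "gpt-oss:20b"  # assumed tag; confirm with `ollama list`


def build_generate_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming /api/generate request without sending it."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG, timeout: float = 300.0) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    req = build_generate_request(prompt, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]
```

With Ollama running and the model downloaded (for example with `ollama pull gpt-oss:20b`), `generate("Explain mixture-of-experts models in two sentences.")` returns the model's reply as a string. Building the request separately from sending it keeps the payload easy to inspect or test without a running server.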