News
Accelerating Vision AI Pipelines with Batch Mode VC-6 and NVIDIA Nsight
1 day, 14 hours ago (840 words) In vision AI systems, model throughput continues to improve. The surrounding pipeline stages must keep pace, including decode, preprocessing, and GPU scheduling. In the previous post, Build High-Performance Vision AI Pipelines with NVIDIA CUDA-Accelerated VC-6, this was described as the data-to-tensor…
Bringing AI Closer to the Edge and On-Device with Gemma 4
2 days, 1 hour ago (242 words) The bundle includes four models, including Gemma's first MoE model, which can all fit on a single NVIDIA H100 GPU and support over 140 languages. The 31B and 26B A4B variants are high-performing reasoning models suitable for both local and data center environments. The E4B and…
Achieving Single-Digit Microsecond Latency Inference for Capital Markets
1 day, 18 hours ago (1177 words) NVIDIA GH200 Grace Hopper Superchip sets record in STAC-ML benchmark. The NVIDIA GH200 Grace Hopper Superchip in the Supermicro ARS-111GL-NHR server has achieved single-digit microsecond latencies in the STAC-ML Markets (Inference) benchmark, Tacana suite (audited by STAC), providing performance comparable to…
Accelerate Token Production in AI Factories Using Unified Services and Real-Time AI
2 days, 20 hours ago (356 words) Operations teams and administrators need more than dashboards. They need flexibility and foresight. In one example where NVIDIA had a MAX-Q profile in operation, the domain power service allowed the data center to run at 85% power with only a 7% throughput loss. It was…
NVIDIA Extreme Co-Design Delivers New MLPerf Inference Records
2 days, 19 hours ago (671 words) Co-designed hardware, software, and models are key to delivering the highest AI factory throughput and lowest token cost. Measuring this goes far beyond peak chip specifications. Rigorous AI inference performance benchmarks are critical to understanding real-world token output, which drives…
CUDA Tile Programming Now Available for BASIC!
2 days, 19 hours ago (949 words) CUDA 13.1 introduced CUDA Tile, a next-generation tile-based GPU programming paradigm designed to make fine-grained parallelism more accessible and flexible. One of its key strengths is language openness: any programming language can target CUDA Tile, enabling developers to bring tile-based…
NVIDIA Grace CPU Delivers High Bandwidth and Efficiency for Modern Data Centers
3 months, 4 weeks ago (372 words) In this blog post, we'll explore the advantages of the Grace Non-Uniform Memory Access (NUMA) monolithic architecture. We'll dive into memory bandwidth per core, scalability, and efficiency, and compare its design approach to traditional x86 chiplet-based CPUs. Figure 1 below shows the NVIDIA…
Maximize AI Infrastructure Throughput by Consolidating Underutilized GPU Workloads
1 week, 2 days ago (706 words) Solving this isn't just about cost reduction; it's about optimizing cluster density to serve more concurrent users on the same world-class hardware. This guide details how to implement and benchmark GPU partitioning strategies, specifically NVIDIA Multi-Instance GPU (MIG) and time-slicing, to…
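As context for the time-slicing approach the guide above covers: on Kubernetes clusters, the NVIDIA device plugin supports oversubscribing a GPU by declaring time-sliced replicas in its configuration. A minimal sketch of such a config fragment is shown below; the replica count of 4 is an illustrative assumption, not a value from the article.

```yaml
# Hypothetical NVIDIA k8s-device-plugin config sketch:
# expose each physical GPU as 4 time-sliced replicas,
# so up to 4 pods can share one GPU concurrently.
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4
```

Unlike MIG, time-slicing provides no memory or fault isolation between the sharing workloads, which is why the benchmarking step the guide describes matters before consolidating production jobs.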
How Centralized Radar Processing on NVIDIA DRIVE Enables Safer, Smarter Level 4 Autonomy
1 week, 2 days ago (841 words) The real 3D/4D "image" signal is instead processed inside the edge device. The radar outputs objects, or in some cases point clouds, similar to a camera outputting a classical CV Canny edge-detection image. In this blog, we explain how…
Designing Protein Binders Using the Generative Model Proteina-Complexa
1 week, 2 days ago (895 words) To address these challenges, NVIDIA has released Proteina-Complexa, a generative model that designs de novo protein binders and enzymes. In this post, we detail the key technologies behind Proteina-Complexa, explore primary use cases, and highlight the extensive experimental validation of…