This is Part 2 of a 3-part series covering GTC 2025:

Here are some of the technical highlights of my experience at the conference

CUDA C++ Workshop

On the sunday prior to the conference start, I was able to join NVIDIA CUDA engineers on a 4 hours CUDA C++ workshop, which took us through building a multi-body heat dissipation solver using the Thrust library. The workshop covered asynchronous copying, pinned memory, CUDA streams, and Nsight Systems debugging tools.

Thrust library overview

The Thrust library is not CUDA or NVIDIA specific. It works with containers that can be specified for device or host, and provides many synchronous algorithms such as sort, copy, find, tabulate, reduce_by_key and many others.

CUDA: New Features and Beyond

CUDA stack layers

In the “CUDA: New Features and Beyond [S72383]” session, Stephen Jones reminded everyone that CUDA is a whole lot more than just its bare C++ part, with a comprehensive diagram of all the layers that are part of the CUDA ecosystem.

All the CUDA presenters had a consistent message across these sessions: Don’t write any CUDA C++ kernel code unless you absolutely have to! Instead, use the growing library of highly-optimized CUDA-X libraries.

CuTile Programming Paradigm

CuTile programming comparison

One big announcement was for CuTile, a new programming paradigm that extends CUDA C++ kernel approach but instead of thinking in threads, groups and clusters, developers can think in terms of tiles, promising “shorter and clearer expression of the computation.”

CUDA Developer Sessions

GTC 2025 CUDA Developer Sessions

There were many sessions dedicated to the growing CUDA Python ecosystem, introducing new capabilities such as RAPIDS AI support as drop-in replacements for common Python tools: RAPIDS cuDF (pandas/Polars), cuML (scikit-learn), and cuPyNumeric (NumPy).

Summary of the NVIDIA CUDA-X Libraries

Compiled from https://developer.nvidia.com/gpu-accelerated-libraries

Category Library Description
Math Libraries cuBLAS GPU-accelerated basic linear algebra (BLAS) library
  cuFFT GPU-accelerated library for Fast Fourier Transform implementations
  cuRAND GPU-accelerated random number generation
  cuSOLVER Dense and sparse direct solvers for computer vision, CFD, computational chemistry, and linear optimization
  cuSPARSE GPU-accelerated library for sparse matrices
  cuTENSOR GPU-accelerated tensor linear algebra library
  cuDSS GPU-accelerated sparse direct solver
Math Libraries (Python) nvmath-python (Beta) Open source library providing high-performance access to core mathematical operations in the NVIDIA stack through a Pythonic interface
Parallel Algorithm Libraries Thrust GPU-accelerated library of C++ parallel algorithms and data structures (uses CUB)
Data Processing Libraries RAPIDS cuDF Accelerate tabular data, including pandas and Polars, with zero code changes
  RAPIDS cuML Speed up ML algorithms in scikit-learn, UMAP, and HDBSCAN
  RAPIDS cuGraph Accelerate NetworkX and graph analytics
  cuVS GPU-accelerated vector search and clustering
  NeMo Curator GPU-accelerated data curation for LLM pretraining
  Morpheus GPU-accelerated data pipeline for cybersecurity
  GPU Direct Storage Direct path between storage and GPU memory
Image and Video Libraries RAPIDS cuCIM Accelerate IO, computer vision, and image processing for biomedical images
  CV-CUDA Open-source GPU-accelerated library for cloud-scale image processing and computer vision
  DALI GPU-accelerated data augmentation and image loading for deep learning
  nvJPEG GPU-accelerated JPEG decoder, encoder, and transcoder
  Optical Flow SDK GPU-accelerated computation of optical flow vectors
Communication Libraries NVSHMEM OpenSHMEM standard for GPU memory, with extensions for improved performance on GPUs
  NCCL Open-source library for fast multi-GPU, multi-node collective communications
Deep Learning Core Libraries NVIDIA TensorRT High-performance deep learning inference optimizer and runtime for production deployment
  NVIDIA cuDNN GPU-accelerated library of primitives for deep neural networks
Quantum Libraries cuQuantum Set of highly optimized libraries for accelerating quantum computing simulations
  cuPQC SDK of optimized libraries for Post Quantum Cryptography




Continue to Part 3: Exhibit Hall - Hardware and Robotics to explore the cutting-edge hardware, liquid cooling systems, and robotics demonstrations from the exhibit hall.