NVIDIA CUDA-X Library · Open Source

GPU-Accelerated
Vector Search at Extreme Scale

NVIDIA cuVS is the world's fastest open-source library for vector similarity search and clustering on GPU. Power your RAG pipelines, recommender systems, and semantic search with 21× faster indexing and 29× higher throughput than CPU.

21×
Faster Indexing
GPU vs CPU (AWS A10g)
29×
Higher Throughput
H100 vs Xeon (10K batch)
12.5×
Lower Cost
Index build on cloud GPU
11×
Lower Latency
Single-query on H100

The GPU-Native Vector Search Engine

NVIDIA cuVS is an open-source library built on the CUDA software stack. It contains state-of-the-art implementations of approximate and exact nearest neighbor search, clustering, and dimensionality reduction — all optimized for GPU parallelism.

Python
C++
C
Rust
Java
Go
Real-Time Index Updates
Dynamically integrate new embeddings without rebuilding the entire index — critical for live LLM and RAG pipelines.
🔄
CPU–GPU Interoperability
Build indexes on GPU, deploy and search on CPU. CAGRA graphs convert natively to HNSW for CPU serving.
🧠
Multi-Type Support
Binary, 8-bit, 16-bit, and 32-bit vector types. Memory-optimized for maximum throughput across hardware tiers.
📦
Out-of-Core Indexing
Build indexes larger than GPU memory. Lower costs per gigabyte with flexible GPU selection across cloud providers.

State-of-the-Art ANN Algorithms

Each algorithm is performance-tuned for the latest NVIDIA GPU architectures, from Ampere to Hopper.

Billion-Scale
IVF-PQ
Inverted File Index + Product Quantization
4–5× compression vs IVF-Flat. Ideal for billion-scale datasets where memory efficiency matters. Delivers 3–4× higher large-batch throughput than IVF-Flat thanks to the smaller index size.
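The compression ratio comes from product quantization: each vector is split into sub-vectors, and each sub-vector is replaced by a 1-byte codebook index. A minimal NumPy sketch of the encode step — the codebooks here are just sampled from the data for illustration, whereas cuVS trains them with k-means, and the exact ratio depends on the `pq_dim`/`pq_bits` settings you choose:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_subq, n_codes = 128, 32, 256   # 32 sub-vectors, 8-bit codes each
sub_dim = dim // n_subq

data = rng.random((10_000, dim), dtype=np.float32)
# Toy codebooks: sample 256 entries per sub-space (cuVS trains these with k-means)
codebooks = data[rng.choice(len(data), n_codes)].reshape(n_codes, n_subq, sub_dim)

def pq_encode(vecs):
    # For each sub-vector, store the index of its nearest codebook entry
    codes = np.empty((len(vecs), n_subq), dtype=np.uint8)
    subs = vecs.reshape(len(vecs), n_subq, sub_dim)
    for s in range(n_subq):
        d = ((subs[:, s, None, :] - codebooks[None, :, s, :]) ** 2).sum(-1)
        codes[:, s] = d.argmin(1)
    return codes

codes = pq_encode(data)
raw_bytes = data.nbytes   # 128 floats × 4 bytes = 512 B per vector
pq_bytes = codes.nbytes   # 32 codes  × 1 byte  =  32 B per vector
print(f"compression: {raw_bytes // pq_bytes}x")  # 16x for these settings
```

In practice the quoted 4–5× figure also accounts for the per-list metadata and coarser quantization settings an IVF-PQ index carries alongside the codes.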
High Recall
IVF-Flat
Inverted File Index — Flat Storage
High-recall approximate search with no compression loss. The baseline for quality benchmarks. Excellent for moderate-scale, precision-critical workloads.
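The inverted-file idea can be sketched in NumPy: partition vectors into lists around coarse centroids, then scan only the few lists nearest each query. The centroids here are random samples rather than trained, and cuVS fuses all of this into GPU kernels — this is purely a CPU illustration of the semantics:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((2_000, 64), dtype=np.float32)
n_lists, n_probes = 20, 4

# Coarse "centroids" (cuVS trains these with balanced k-means)
centroids = data[rng.choice(len(data), n_lists, replace=False)]
assign = ((data[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
lists = [np.flatnonzero(assign == c) for c in range(n_lists)]

def ivf_flat_search(query, k=10):
    # Probe only the n_probes lists whose centroids are nearest the query
    probe = ((centroids - query) ** 2).sum(-1).argsort()[:n_probes]
    cand = np.concatenate([lists[c] for c in probe])
    d = ((data[cand] - query) ** 2).sum(-1)   # exact distances, no compression
    order = d.argsort()[:k]
    return cand[order], d[order]

ids, dists = ivf_flat_search(data[0])
print(ids[0], dists[0])  # the query itself: id 0 at distance 0.0
```

Recall is tuned by trading off `n_probes` against speed: more probed lists means more exact comparisons and higher recall.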
Exact Search
Brute-Force
Exact Nearest Neighbor Search
Guaranteed perfect recall. Used as ground truth in benchmarks and for smaller datasets where exhaustive search is feasible on GPU hardware.
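Exact search is just a full distance computation plus a top-k select; a NumPy reference version of the same math (which cuVS runs as fused GPU kernels) is a handy ground-truth oracle for recall testing:

```python
import numpy as np

def brute_force_knn(dataset, queries, k):
    # Squared Euclidean via the expansion ||q-x||^2 = ||q||^2 - 2 q.x + ||x||^2
    d2 = (
        (queries ** 2).sum(1, keepdims=True)
        - 2.0 * queries @ dataset.T
        + (dataset ** 2).sum(1)
    )
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]       # unordered top-k
    order = np.take_along_axis(d2, idx, 1).argsort(1)     # sort just those k
    neighbors = np.take_along_axis(idx, order, 1)
    distances = np.take_along_axis(d2, neighbors, 1)
    return distances, neighbors

rng = np.random.default_rng(0)
data = rng.random((10_000, 64), dtype=np.float32)
queries = data[:5]                    # query with known vectors
dist, nbr = brute_force_knn(data, queries, k=10)
print(nbr[:, 0])  # each query's nearest neighbor is itself: [0 1 2 3 4]
```

Comparing an ANN index's neighbor lists against this oracle is exactly how recall figures like those above are measured.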
Clustering
cuSLINK
Single-Linkage Agglomerative Clustering on GPU
Hierarchical clustering at GPU speed. Powers large-scale dendrogram construction for taxonomy discovery and data organization tasks.
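Single-linkage merges, at every step, the two clusters whose closest members are closest. A tiny O(n³) NumPy version shows the semantics — cuSLINK reaches the same dendrogram far faster via a GPU minimum-spanning-tree formulation:

```python
import numpy as np

def single_linkage(points, n_clusters):
    # Start with every point in its own cluster
    clusters = [[i] for i in range(len(points))]
    d = np.sqrt(((points[:, None] - points[None]) ** 2).sum(-1))
    while len(clusters) > n_clusters:
        # Single linkage: cluster distance = min pairwise member distance
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = d[np.ix_(clusters[a], clusters[b])].min()
                if link < best:
                    best, pair = link, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)   # merge the closest pair
    return clusters

# Two well-separated blobs should come back as two clusters
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(5, 0.1, (5, 2))])
print(sorted(sorted(c) for c in single_linkage(pts, 2)))
# [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```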
Dimensionality
UMAP
Uniform Manifold Approximation & Projection
GPU-accelerated UMAP for visualization and dimensionality reduction. Used in production by Adoreboard, Studentpulse, and BERTopic for large-scale topic modeling.

World's Fastest Vector Search

Benchmarks from official NVIDIA testing. GPU vs CPU across index build time, cost, throughput, and latency.

Index Build Time — 8× A10g vs Intel Ice Lake (AWS): minutes on GPU vs hours on CPU
Query Throughput (vectors/sec) — H100 vs Intel Xeon 8470Q: 29× higher on GPU at 10K-query batches
Query Latency — H100 vs Intel Xeon 8470Q: 11× lower on GPU for single queries
Cost to Build Index — GPU vs CPU in AWS cloud: 12.5× cheaper on 8× A10g than on Intel Ice Lake

Built for Every AI Workload

From genomics to e-commerce, cuVS powers the similarity search backbone of modern AI systems.

🤖
RAG Pipelines
Accelerate retrieval-augmented generation by finding relevant context vectors in milliseconds across billions of documents.
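The retrieval step of a RAG pipeline reduces to nearest-neighbor search over chunk embeddings. A minimal NumPy sketch using cosine similarity — the embeddings here are random stand-ins for a real encoder's output, and at production scale the search itself is what cuVS accelerates:

```python
import numpy as np

rng = np.random.default_rng(0)
n_chunks, dim = 20_000, 128
chunk_emb = rng.standard_normal((n_chunks, dim)).astype(np.float32)
chunk_emb /= np.linalg.norm(chunk_emb, axis=1, keepdims=True)

def retrieve(query_emb, k=5):
    # Unit-normalized vectors: cosine similarity is just a dot product
    q = query_emb / np.linalg.norm(query_emb)
    scores = chunk_emb @ q
    top = np.argpartition(-scores, k - 1)[:k]
    return top[scores[top].argsort()[::-1]]   # best-first chunk ids

# A query identical to chunk 42's embedding retrieves chunk 42 first
hits = retrieve(chunk_emb[42])
print(hits[0])  # -> 42
```

The retrieved chunk ids then index back into the document store to assemble the LLM's context window.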
🛍️
Recommender Systems
Real-time product and content recommendations using GPU-accelerated similarity search over user and item embeddings.
🔍
Semantic Search
Power meaning-based search across documents, images, code, and media. Replace keyword search with embedding-based retrieval.
🚨
Fraud Detection
Detect anomalous transactions by identifying outliers in high-dimensional feature spaces at real-time transaction speeds.
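One common anomaly signal is the distance to a point's k-th nearest neighbor: outliers sit far from everything. A NumPy sketch of that scoring on a toy feature space — cuVS's role in a real system is accelerating the underlying neighbor search at transaction volume:

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, (500, 8)).astype(np.float32)
outlier = np.full((1, 8), 8.0, dtype=np.float32)   # far from the normal cloud
X = np.vstack([normal, outlier])

def knn_outlier_scores(X, k=5):
    # Score = distance to the k-th nearest neighbor (excluding self)
    d = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)
    return np.sort(d, axis=1)[:, k - 1]

scores = knn_outlier_scores(X)
print(scores.argmax())  # -> 500, the injected outlier
```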
🧬
Single-Cell Genomics
rapids-singlecell uses cuVS + cuML for groundbreaking performance in cell type annotation and trajectory analysis.
📊
Topic Modeling
BERTopic on GPU via cuVS UMAP integration. Turn hours of embedding clustering into minutes for large-scale NLP.
🖼️
Multi-Modal Search
Unified search over images, text, audio, and video embeddings from large multi-modal foundation models.
Hybrid Search
Combine GPU vector search with full-text BM25 scoring. Apache Lucene + cuVS delivers 40× faster index builds.

Up and Running in Minutes

Install via conda or pip, and run your first GPU-accelerated ANN search with just a few lines of Python.

1
Install cuVS
Install via conda: conda install -c rapidsai -c conda-forge cuvs. Or via pip: pip install cuvs-cu12 --extra-index-url=https://pypi.nvidia.com
2
Prepare your vectors
Load your embedding dataset as a numpy or cupy array. cuVS supports float16, float32, int8, and binary types.
3
Build a CAGRA index
Call cagra.build() — index construction runs entirely on GPU. Hours become minutes.
4
Search at GPU speed
Run cagra.search() for approximate nearest neighbor queries. Achieve 29× higher throughput than CPU.
cagra_example.py
import numpy as np
from cuvs.neighbors import cagra

# 1M vectors, 128-dimensional embeddings
dataset = np.random.random(
  (1_000_000, 128)
).astype(np.float32)

# Build CAGRA index on GPU
index_params = cagra.IndexParams(
  metric="sqeuclidean",
  intermediate_graph_degree=64,
  graph_degree=32,
)
index = cagra.build(index_params, dataset)

# Search: find top-10 neighbors for 1000 queries
queries = np.random.random(
  (1000, 128)
).astype(np.float32)

search_params = cagra.SearchParams()
distances, neighbors = cagra.search(
  search_params, index, queries, k=10
)

# neighbors.shape → (1000, 10)
print(f"Found {neighbors.shape} results")

Powers the AI Search Stack

cuVS is integrated into the world's leading vector databases, search engines, and ML frameworks.

FAISS
Vector Library
12× faster
Milvus
Vector Database
22× faster
Weaviate
Vector Database
8× faster
Elasticsearch
Search Engine
12× faster
Apache Lucene
Search Library
40× faster
Apache Solr
Search Platform
6× faster
OpenSearch
Search Engine
9.4× faster
Kinetica
Analytics DB
Native
cuVS.ai

Own the exact-match premium domain for NVIDIA's fastest-growing GPU library.
Perfect for vector database companies, AI infrastructure teams, and NVIDIA ecosystem partners.

Exact-match .ai domain
Growing NVIDIA ecosystem
High SEO authority potential
Instant brand credibility
Registered via Cloudflare
Clean transfer history
📬  Inquire About Acquisition

[email protected]

Also listed on Afternic · Sedo · Dan.com

Learn More

Everything you need to get started with cuVS — from notebooks to research papers.