NVIDIA CUDA-X Library · Open Source

GPU-Accelerated
Vector Search at Extreme Scale

NVIDIA cuVS is the world's fastest open-source library for vector similarity search and clustering on GPU. Power your RAG pipelines, recommender systems, and semantic search with 21× faster indexing and 29× higher throughput than CPU.

21×
Faster Indexing
GPU vs CPU (AWS A10g)
29×
Higher Throughput
H100 vs Xeon (10K batch)
12.5×
Lower Cost
Index build on cloud GPU
11×
Lower Latency
Single-query on H100

The GPU-Native Vector Search Engine

NVIDIA cuVS is an open-source library built on the CUDA software stack. It contains state-of-the-art implementations of approximate and exact nearest neighbor search, clustering, and dimensionality reduction — all optimized for GPU parallelism.

Python
C++
C
Rust
Java
Go
Real-Time Index Updates
Dynamically integrate new embeddings without rebuilding the entire index — critical for live LLM and RAG pipelines.
🔄
CPU–GPU Interoperability
Build indexes on GPU, deploy and search on CPU. CAGRA graphs convert natively to HNSW for CPU serving.
🧠
Multi-Type Support
Binary, 8-bit, 16-bit, and 32-bit vector types. Memory-optimized for maximum throughput across hardware tiers.
📦
Out-of-Core Indexing
Build indexes larger than GPU memory. Lower costs per gigabyte with flexible GPU selection across cloud providers.

State-of-the-Art ANN Algorithms

Each algorithm is performance-tuned for the latest NVIDIA GPU architectures, from Ampere to Hopper.

Billion-Scale
IVF-PQ
Inverted File Index + Product Quantization
4–5× compression vs IVF-Flat. Ideal for billion-scale datasets where memory efficiency matters. Delivers 3–4× higher large-batch throughput than IVF-Flat thanks to the smaller index size.
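The compression ratio comes from product quantization: each vector is split into sub-vectors, and each sub-vector is replaced by a 1-byte codebook index. A minimal NumPy sketch of the encode step — the codebooks here are just sampled from the data for illustration, whereas cuVS trains them with k-means, and the exact ratio depends on the `pq_dim`/`pq_bits` settings you choose:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_subq, n_codes = 128, 32, 256   # 32 sub-vectors, 8-bit codes each
sub_dim = dim // n_subq

data = rng.random((10_000, dim), dtype=np.float32)
# Toy codebooks: sample 256 entries per sub-space (cuVS trains these with k-means)
codebooks = data[rng.choice(len(data), n_codes)].reshape(n_codes, n_subq, sub_dim)

def pq_encode(vecs):
    # For each sub-vector, store the index of its nearest codebook entry
    codes = np.empty((len(vecs), n_subq), dtype=np.uint8)
    subs = vecs.reshape(len(vecs), n_subq, sub_dim)
    for s in range(n_subq):
        d = ((subs[:, s, None, :] - codebooks[None, :, s, :]) ** 2).sum(-1)
        codes[:, s] = d.argmin(1)
    return codes

codes = pq_encode(data)
raw_bytes = data.nbytes   # 128 floats × 4 bytes = 512 B per vector
pq_bytes = codes.nbytes   # 32 codes  × 1 byte  =  32 B per vector
print(f"compression: {raw_bytes // pq_bytes}x")  # 16x for these settings
```

In practice the quoted 4–5× figure also accounts for the per-list metadata and coarser quantization settings an IVF-PQ index carries alongside the codes.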
High Recall
IVF-Flat
Inverted File Index — Flat Storage
High-recall approximate search with no compression loss. The baseline for quality benchmarks. Excellent for moderate-scale, precision-critical workloads.
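The inverted-file idea can be sketched in NumPy: partition vectors into lists around coarse centroids, then scan only the few lists nearest each query. The centroids here are random samples rather than trained, and cuVS fuses all of this into GPU kernels — this is purely a CPU illustration of the semantics:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((2_000, 64), dtype=np.float32)
n_lists, n_probes = 20, 4

# Coarse "centroids" (cuVS trains these with balanced k-means)
centroids = data[rng.choice(len(data), n_lists, replace=False)]
assign = ((data[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
lists = [np.flatnonzero(assign == c) for c in range(n_lists)]

def ivf_flat_search(query, k=10):
    # Probe only the n_probes lists whose centroids are nearest the query
    probe = ((centroids - query) ** 2).sum(-1).argsort()[:n_probes]
    cand = np.concatenate([lists[c] for c in probe])
    d = ((data[cand] - query) ** 2).sum(-1)   # exact distances, no compression
    order = d.argsort()[:k]
    return cand[order], d[order]

ids, dists = ivf_flat_search(data[0])
print(ids[0], dists[0])  # the query itself: id 0 at distance 0.0
```

Recall is tuned by trading off `n_probes` against speed: more probed lists means more exact comparisons and higher recall.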
Exact Search
Brute-Force
Exact Nearest Neighbor Search
Guaranteed perfect recall. Used as ground truth in benchmarks and for smaller datasets where exhaustive search is feasible on GPU hardware.
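Exact search is just a full distance computation plus a top-k select; a NumPy reference version of the same math (which cuVS runs as fused GPU kernels) is a handy ground-truth oracle for recall testing:

```python
import numpy as np

def brute_force_knn(dataset, queries, k):
    # Squared Euclidean via the expansion ||q-x||^2 = ||q||^2 - 2 q.x + ||x||^2
    d2 = (
        (queries ** 2).sum(1, keepdims=True)
        - 2.0 * queries @ dataset.T
        + (dataset ** 2).sum(1)
    )
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]       # unordered top-k
    order = np.take_along_axis(d2, idx, 1).argsort(1)     # sort just those k
    neighbors = np.take_along_axis(idx, order, 1)
    distances = np.take_along_axis(d2, neighbors, 1)
    return distances, neighbors

rng = np.random.default_rng(0)
data = rng.random((10_000, 64), dtype=np.float32)
queries = data[:5]                    # query with known vectors
dist, nbr = brute_force_knn(data, queries, k=10)
print(nbr[:, 0])  # each query's nearest neighbor is itself: [0 1 2 3 4]
```

Comparing an ANN index's neighbor lists against this oracle is exactly how recall figures like those above are measured.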
Clustering
cuSLINK
Single-Linkage Agglomerative Clustering on GPU
Hierarchical clustering at GPU speed. Powers large-scale dendrogram construction for taxonomy discovery and data organization tasks.
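Single-linkage merges, at every step, the two clusters whose closest members are closest. A tiny O(n³) NumPy version shows the semantics — cuSLINK reaches the same dendrogram far faster via a GPU minimum-spanning-tree formulation:

```python
import numpy as np

def single_linkage(points, n_clusters):
    # Start with every point in its own cluster
    clusters = [[i] for i in range(len(points))]
    d = np.sqrt(((points[:, None] - points[None]) ** 2).sum(-1))
    while len(clusters) > n_clusters:
        # Single linkage: cluster distance = min pairwise member distance
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = d[np.ix_(clusters[a], clusters[b])].min()
                if link < best:
                    best, pair = link, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)   # merge the closest pair
    return clusters

# Two well-separated blobs should come back as two clusters
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(5, 0.1, (5, 2))])
print(sorted(sorted(c) for c in single_linkage(pts, 2)))
# [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```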
Dimensionality
UMAP
Uniform Manifold Approximation & Projection
GPU-accelerated UMAP for visualization and dimensionality reduction. Used in production by Adoreboard, Studentpulse, and BERTopic for large-scale topic modeling.

World's Fastest Vector Search

Benchmarks from official NVIDIA testing. GPU vs CPU across index build time, cost, throughput, and latency.

Index Build Time — 8× A10g vs Intel Ice Lake (AWS): minutes on GPU vs hours on CPU
Query Throughput (vectors/sec) — H100 vs Intel Xeon 8470Q: 29× higher on GPU at 10K-query batches
Query Latency — H100 vs Intel Xeon 8470Q: 11× lower on GPU for single queries
Cost to Build Index — GPU vs CPU in AWS cloud: 12.5× cheaper on 8× A10g than on Intel Ice Lake

Built for Every AI Workload

From genomics to e-commerce, cuVS powers the similarity search backbone of modern AI systems.

🤖
RAG Pipelines
Accelerate retrieval-augmented generation by finding relevant context vectors in milliseconds across billions of documents.
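The retrieval step of a RAG pipeline reduces to nearest-neighbor search over chunk embeddings. A minimal NumPy sketch using cosine similarity — the embeddings here are random stand-ins for a real encoder's output, and at production scale the search itself is what cuVS accelerates:

```python
import numpy as np

rng = np.random.default_rng(0)
n_chunks, dim = 20_000, 128
chunk_emb = rng.standard_normal((n_chunks, dim)).astype(np.float32)
chunk_emb /= np.linalg.norm(chunk_emb, axis=1, keepdims=True)

def retrieve(query_emb, k=5):
    # Unit-normalized vectors: cosine similarity is just a dot product
    q = query_emb / np.linalg.norm(query_emb)
    scores = chunk_emb @ q
    top = np.argpartition(-scores, k - 1)[:k]
    return top[scores[top].argsort()[::-1]]   # best-first chunk ids

# A query identical to chunk 42's embedding retrieves chunk 42 first
hits = retrieve(chunk_emb[42])
print(hits[0])  # -> 42
```

The retrieved chunk ids then index back into the document store to assemble the LLM's context window.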
🛍️
Recommender Systems
Real-time product and content recommendations using GPU-accelerated similarity search over user and item embeddings.
🔍
Semantic Search
Power meaning-based search across documents, images, code, and media. Replace keyword search with embedding-based retrieval.
🚨
Fraud Detection
Detect anomalous transactions by identifying outliers in high-dimensional feature spaces at real-time transaction speeds.
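One common anomaly signal is the distance to a point's k-th nearest neighbor: outliers sit far from everything. A NumPy sketch of that scoring on a toy feature space — cuVS's role in a real system is accelerating the underlying neighbor search at transaction volume:

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, (500, 8)).astype(np.float32)
outlier = np.full((1, 8), 8.0, dtype=np.float32)   # far from the normal cloud
X = np.vstack([normal, outlier])

def knn_outlier_scores(X, k=5):
    # Score = distance to the k-th nearest neighbor (excluding self)
    d = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)
    return np.sort(d, axis=1)[:, k - 1]

scores = knn_outlier_scores(X)
print(scores.argmax())  # -> 500, the injected outlier
```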
🧬
Single-Cell Genomics
rapids-singlecell uses cuVS + cuML for groundbreaking performance in cell type annotation and trajectory analysis.
📊
Topic Modeling
BERTopic on GPU via cuVS UMAP integration. Turn hours of embedding clustering into minutes for large-scale NLP.
🖼️
Multi-Modal Search
Unified search over images, text, audio, and video embeddings from large multi-modal foundation models.
Hybrid Search
Combine GPU vector search with full-text BM25 scoring. Apache Lucene + cuVS delivers 40× faster index builds.

Up and Running in Minutes

Install via conda or pip, and run your first GPU-accelerated ANN search with just a few lines of Python.

1
Install cuVS
Install via conda: conda install -c rapidsai -c conda-forge cuvs. Or via pip: pip install cuvs-cu12 --extra-index-url=https://pypi.nvidia.com
2
Prepare your vectors
Load your embedding dataset as a numpy or cupy array. cuVS supports float16, float32, int8, and binary types.
3
Build a CAGRA index
Call cagra.build() — index construction runs entirely on GPU. Hours become minutes.
4
Search at GPU speed
Run cagra.search() for approximate nearest neighbor queries. Achieve 29× higher throughput than CPU.
cagra_example.py
import numpy as np
from cuvs.neighbors import cagra

# 1M vectors, 128-dimensional embeddings
dataset = np.random.random(
  (1_000_000, 128)
).astype(np.float32)

# Build CAGRA index on GPU
index_params = cagra.IndexParams(
  metric="sqeuclidean",
  intermediate_graph_degree=64,
  graph_degree=32,
)
index = cagra.build(index_params, dataset)

# Search: find top-10 neighbors for 1000 queries
queries = np.random.random(
  (1000, 128)
).astype(np.float32)

search_params = cagra.SearchParams()
distances, neighbors = cagra.search(
  search_params, index, queries, k=10
)

# neighbors.shape → (1000, 10)
print(f"Found {neighbors.shape} results")

Powers the AI Search Stack

cuVS is integrated into the world's leading vector databases, search engines, and ML frameworks.

FAISS
Vector Library
12× faster
Milvus
Vector Database
22× faster
Weaviate
Vector Database
8× faster
Elasticsearch
Search Engine
12× faster
Apache Lucene
Search Library
40× faster
Apache Solr
Search Platform
6× faster
OpenSearch
Search Engine
9.4× faster
Kinetica
Analytics DB
Native
cuVS.ai

Own the exact-match premium domain for NVIDIA's fastest-growing GPU library.
Perfect for vector database companies, AI infrastructure teams, and NVIDIA ecosystem partners.

Exact-match .ai domain
Growing NVIDIA ecosystem
High SEO authority potential
Instant brand credibility
Registered via Cloudflare
Clean transfer history
📬  Inquire About Acquisition

[email protected]

Also listed on Afternic · Sedo · Dan.com

Learn More

Everything you need to get started with cuVS — from notebooks to research papers.