16 changes: 16 additions & 0 deletions nosql-create-index-java/.env.example
@@ -0,0 +1,16 @@
# Azure Cosmos DB for NoSQL
AZURE_COSMOSDB_ENDPOINT="https://<your-account>.documents.azure.com:443/"
AZURE_COSMOSDB_DATABASENAME="Hotels"
AZURE_COSMOSDB_CONTAINER_NAME=""

# Azure OpenAI embedding configuration
AZURE_OPENAI_EMBEDDING_ENDPOINT="https://<your-openai-resource>.openai.azure.com/"
AZURE_OPENAI_EMBEDDING_DEPLOYMENT="text-embedding-3-small"
AZURE_OPENAI_EMBEDDING_API_VERSION="2024-08-01-preview"

# Vector query selection
# Set diskann or quantizedflat. Leave empty to run both containers.
VECTOR_ALGORITHM=""

# Shared repo-root dataset, referenced relative to this sample folder
DATA_FILE_WITH_VECTORS="..\data\HotelsData_toCosmosDB_Vector.json"
98 changes: 98 additions & 0 deletions nosql-create-index-java/README.md
@@ -0,0 +1,98 @@
# Azure Cosmos DB for NoSQL create-index sample with Java

This sample shows how to load pre-vectorized hotel documents into existing Azure Cosmos DB for NoSQL containers and run vector similarity queries with Java.

It uses:
- `DefaultAzureCredential` for Azure Cosmos DB and the Azure OpenAI client
- existing `Hotels` database resources created by `azd up`
- the shared `..\data\HotelsData_toCosmosDB_Vector.json` dataset
- bulk upsert operations for `hotels_diskann` and `hotels_quantizedflat`
- `VectorDistance()` SQL queries for similarity search

> [!IMPORTANT]
> This sample is data-plane only. It does not create databases, containers, or vector indexes. Run `azd up` from the repo root before you run this sample.

## Prerequisites

- Java 17 LTS or later
- Maven 3.9 or later
- Azure CLI installed and signed in with `az login`
- Azure resources already provisioned by `azd up`
- Microsoft Entra ID roles:
  - **Cosmos DB Built-in Data Contributor**
  - **Cognitive Services OpenAI User**

The sample expects these existing containers in the `Hotels` database:
- `hotels_diskann`
- `hotels_quantizedflat`

## Set up the sample

1. Copy the environment template.

```powershell
Copy-Item sample.env .env
```

2. Update `.env` with your Azure Cosmos DB endpoint and Azure OpenAI settings.

Notes:
- Leave `AZURE_COSMOSDB_CONTAINER_NAME` empty to run all supported containers.
- Leave `VECTOR_ALGORITHM` empty to run both algorithms.
- Set `VECTOR_ALGORITHM` to `diskann` or `quantizedflat` to run one algorithm.
- Set `AZURE_COSMOSDB_CONTAINER_NAME` only if you want to target one container directly.

3. Build the project.

```powershell
mvn compile
```

## Run the sample

Run from this directory:

```powershell
mvn exec:java
```

Examples:

```powershell
# Run both containers (default)
mvn exec:java

# Run only DiskANN
$env:VECTOR_ALGORITHM = 'diskann'
mvn exec:java

# Run only QuantizedFlat
$env:VECTOR_ALGORITHM = 'quantizedflat'
mvn exec:java
```

## Expected output

The sample prints:
- configuration and target container selection
- embedding dimension verification for `text-embedding-3-small`
- bulk ingestion status for each container
- top vector matches from each queried container

See `output/sample-output.txt` for example console output.

## Project structure

```text
nosql-create-index-java/
├── .env.example
├── output/
│   └── sample-output.txt
├── pom.xml
├── README.md
├── sample.env
└── src/main/java/com/azure/cosmos/createindex/
    ├── App.java
    ├── Config.java
    └── DataPlane.java
```
39 changes: 39 additions & 0 deletions nosql-create-index-java/output/sample-output.txt
@@ -0,0 +1,39 @@
========================================================================
Azure Cosmos DB for NoSQL - create and query vector indexes with Java
========================================================================
Database: Hotels
Data file: C:\project-dina-ai-dev-tools\repos\public-azure-samples-cosmos-db-vector-samples\data\HotelsData_toCosmosDB_Vector.json
Target containers: hotels_diskann, hotels_quantizedflat

=== Verify embedding dimensions ===
Deployment: text-embedding-3-small
Actual: 1536
Expected: 1536

=== Ingest documents: hotels_diskann ===
Upserted 50/50 documents using bulk operations. RU: 6812.47

=== Ingest documents: hotels_quantizedflat ===
Upserted 50/50 documents using bulk operations. RU: 6810.92

Query text: hotel near the ocean

=== Query results: hotels_diskann (DiskANN) ===
Request charge: 5.33 RUs
1. HotelId=11 | HotelName=Royal Cottage Resort | score=0.4991 | Description=Your home away from home. Brand new fully equipped premium rooms, fast WiFi, full kitchen, washer & dryer...
2. HotelId=47 | HotelName=Country Comfort Inn | score=0.4786 | Description=Situated conveniently at the north end of the village, the inn is just a short walk from the lake...
3. HotelId=48 | HotelName=Nordick's Valley Motel | score=0.4635 | Description=Only 90 miles from the nation's capital and nearby most everything the historic valley has to offer...
4. HotelId=19 | HotelName=Economy Universe Motel | score=0.4461 | Description=Local, family-run hotel in bustling downtown Redmond. We are a pet-friendly establishment...
5. HotelId=7 | HotelName=Roach Motel | score=0.4388 | Description=Perfect Location on Main Street. Earn points while enjoying close proximity to the city's best shopping...

=== Query results: hotels_quantizedflat (QuantizedFlat) ===
Request charge: 5.35 RUs
1. HotelId=11 | HotelName=Royal Cottage Resort | score=0.4991 | Description=Your home away from home. Brand new fully equipped premium rooms, fast WiFi, full kitchen, washer & dryer...
2. HotelId=47 | HotelName=Country Comfort Inn | score=0.4786 | Description=Situated conveniently at the north end of the village, the inn is just a short walk from the lake...
3. HotelId=48 | HotelName=Nordick's Valley Motel | score=0.4635 | Description=Only 90 miles from the nation's capital and nearby most everything the historic valley has to offer...
4. HotelId=19 | HotelName=Economy Universe Motel | score=0.4461 | Description=Local, family-run hotel in bustling downtown Redmond. We are a pet-friendly establishment...
5. HotelId=7 | HotelName=Roach Motel | score=0.4388 | Description=Perfect Location on Main Street. Earn points while enjoying close proximity to the city's best shopping...

========================================================================
Complete
========================================================================
80 changes: 80 additions & 0 deletions nosql-create-index-java/pom.xml
@@ -0,0 +1,80 @@
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.azure.samples.cosmosdb</groupId>
  <artifactId>nosql-create-index-java</artifactId>
  <version>1.0.0</version>
  <name>Azure Cosmos DB NoSQL Create Index - Java</name>

  <properties>
    <maven.compiler.release>17</maven.compiler.release>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-sdk-bom</artifactId>
        <version>1.2.23</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <dependency>
      <groupId>com.azure</groupId>
      <artifactId>azure-cosmos</artifactId>
    </dependency>
    <dependency>
      <groupId>com.azure</groupId>
      <artifactId>azure-identity</artifactId>
    </dependency>
    <dependency>
      <groupId>com.azure</groupId>
      <artifactId>azure-ai-openai</artifactId>
      <version>1.0.0-beta.16</version>
    </dependency>
    <dependency>
      <groupId>io.github.cdimascio</groupId>
      <artifactId>dotenv-java</artifactId>
      <version>3.0.0</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.18.2</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-nop</artifactId>
      <version>2.0.17</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.14.0</version>
        <configuration>
          <release>${maven.compiler.release}</release>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>3.5.0</version>
        <configuration>
          <mainClass>com.azure.cosmos.createindex.App</mainClass>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
140 changes: 140 additions & 0 deletions nosql-create-index-java/quickstart-create-index-java.md
@@ -0,0 +1,140 @@
---
title: "Quickstart: Create and query vector indexes in Azure Cosmos DB for NoSQL using Java"
description: Use Java and the Azure SDK to load pre-vectorized hotel data into existing Azure Cosmos DB for NoSQL vector containers and run similarity queries with Azure OpenAI embeddings.
author: diberry
ms.author: diberry
ms.service: azure-cosmos-db
ms.topic: quickstart
ms.date: 2026-06-08
---

# Quickstart: Create and query vector indexes in Azure Cosmos DB for NoSQL using Java

In this quickstart, you use the Java sample in `Azure-Samples/cosmos-db-vector-samples` to load pre-vectorized hotel documents into existing Azure Cosmos DB for NoSQL containers and run vector similarity queries. The sample uses `DefaultAzureCredential` for Azure Cosmos DB and the Azure OpenAI client, so you don't need API keys.

The sample is data-plane only. It assumes `azd up` already created the database, the `hotels_diskann` container, and the `hotels_quantizedflat` container with vector policies and indexes.

Find the sample code on GitHub: `nosql-create-index-java/` in `Azure-Samples/cosmos-db-vector-samples`.

## Prerequisites

- An Azure subscription. If you don't have one, create a [free account](https://azure.microsoft.com/free/).
- An Azure Cosmos DB for NoSQL account provisioned by the sample repo's Bicep templates:
  - Vector search enabled
  - Serverless enabled
  - `Hotels` database created
  - `hotels_diskann` and `hotels_quantizedflat` containers created with `/HotelId` as the partition key path
- Microsoft Entra ID role assignments for your identity:
  - **Cosmos DB Built-in Data Contributor**
  - **Cognitive Services OpenAI User**
- An Azure OpenAI resource with a `text-embedding-3-small` deployment.
- [Java 17 LTS](https://learn.microsoft.com/java/openjdk/download) or later
- [Apache Maven 3.9](https://maven.apache.org/download.cgi) or later
- [!INCLUDE [Azure CLI](~/reusable-content/azure-cli/azure-cli-prepare-your-environment-no-header.md)]

## Clone the repository

```bash
git clone https://github.com/Azure-Samples/cosmos-db-vector-samples.git
cd cosmos-db-vector-samples/nosql-create-index-java
```

## Understand what the sample does

Azure Cosmos DB for NoSQL follows an infra-first pattern for vector indexes:

| Layer | Tool | Responsibility |
|---|---|---|
| Provisioning | `azd up` + Bicep | Creates the Azure Cosmos DB account, database, containers, vector policies, and RBAC |
| Runtime | Java sample | Loads documents, generates a query embedding, and runs `VectorDistance()` queries |

The Java code does **not** create containers or indexes. Vector indexes for Azure Cosmos DB for NoSQL are provisioned when the containers are created.

## Configure environment variables

1. Copy the template file.

```powershell
Copy-Item sample.env .env
```

1. Update `.env` with your Azure resource values.

```dotenv
AZURE_COSMOSDB_ENDPOINT="https://<your-account>.documents.azure.com:443/"
AZURE_COSMOSDB_DATABASENAME="Hotels"
AZURE_COSMOSDB_CONTAINER_NAME=""
AZURE_OPENAI_EMBEDDING_ENDPOINT="https://<your-openai-resource>.openai.azure.com/"
AZURE_OPENAI_EMBEDDING_DEPLOYMENT="text-embedding-3-small"
AZURE_OPENAI_EMBEDDING_API_VERSION="2024-08-01-preview"
VECTOR_ALGORITHM=""
DATA_FILE_WITH_VECTORS="..\data\HotelsData_toCosmosDB_Vector.json"
```

Leave `AZURE_COSMOSDB_CONTAINER_NAME` and `VECTOR_ALGORITHM` empty to run both containers. Set `VECTOR_ALGORITHM` to `diskann` or `quantizedflat` if you want to target one algorithm.
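
The `DATA_FILE_WITH_VECTORS` value is a relative path written with backslashes, which only act as separators on Windows. The following is a minimal sketch of one way to resolve such a value portably. The class name `DataPathSketch` and the method `resolveDataFile` are hypothetical and are not taken from the sample's `Config.java`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class DataPathSketch {
    // Hypothetical helper: swaps backslashes for forward slashes so the same
    // .env value works on Windows, Linux, and macOS, then resolves the result
    // against the sample folder and collapses the ".." segment.
    static Path resolveDataFile(Path sampleDir, String configuredValue) {
        String portable = configuredValue.replace('\\', '/');
        return sampleDir.resolve(portable).normalize();
    }

    public static void main(String[] args) {
        // Assumes the program is started from the sample folder.
        Path sampleDir = Paths.get("").toAbsolutePath();
        System.out.println(resolveDataFile(sampleDir, "..\\data\\HotelsData_toCosmosDB_Vector.json"));
    }
}
```

Normalizing the separator before resolving keeps one `.env` value usable across operating systems, at the cost of not supporting file names that legitimately contain a backslash.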

## Build and run the sample

Compile the sample:

```powershell
mvn compile
```

Run it:

```powershell
mvn exec:java
```

The sample performs these steps:

1. Loads configuration from `.env` and validates required values.
1. Creates one `DefaultAzureCredential`, passes it directly to `CosmosClient`, and reuses it for the Azure OpenAI client.
1. Reads `..\data\HotelsData_toCosmosDB_Vector.json`.
1. Bulk-upserts documents into `hotels_diskann` and `hotels_quantizedflat`.
1. Uses the Azure OpenAI client to generate a query embedding.
1. Executes a parameterized `VectorDistance()` query and prints the top matches.
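
The first step in the list, validating required values, might look like the following sketch. The key names come from `sample.env`, but treating exactly these four as required is an assumption, and `RequiredConfigSketch` and `missingKeys` are hypothetical names rather than the sample's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RequiredConfigSketch {
    // Key names appear in sample.env; which of them are mandatory is an assumption.
    static final List<String> REQUIRED = List.of(
        "AZURE_COSMOSDB_ENDPOINT",
        "AZURE_COSMOSDB_DATABASENAME",
        "AZURE_OPENAI_EMBEDDING_ENDPOINT",
        "AZURE_OPENAI_EMBEDDING_DEPLOYMENT");

    // Returns the required keys that are absent or blank, in declaration order.
    static List<String> missingKeys(Map<String, String> values) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = values.get(key);
            if (value == null || value.isBlank()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Checks only the process environment here; the sample also reads .env.
        List<String> missing = missingKeys(System.getenv());
        System.out.println(missing.isEmpty()
            ? "All required settings are present."
            : "Missing required settings: " + String.join(", ", missing));
    }
}
```

Collecting every missing key before reporting lets you fix the whole `.env` file in one pass instead of one failed run per setting.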

## Review the Java project structure

```text
nosql-create-index-java/
├── .env.example
├── output/
│   └── sample-output.txt
├── pom.xml
├── README.md
├── sample.env
└── src/main/java/com/azure/cosmos/createindex/
    ├── App.java
    ├── Config.java
    └── DataPlane.java
```

### App.java

`App.java` orchestrates the sample. It loads configuration, creates the shared credential, verifies embedding dimensions, ingests the hotel dataset, and runs vector queries for each target container.
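
The embedding dimension check can be pictured with the sketch below. It assumes the check compares the length of one embedding against an expected size of 1536, the value shown in `output/sample-output.txt`. The names `DimensionCheckSketch` and `requireDimensions` are hypothetical, and the zero-filled array stands in for a vector returned by the Azure OpenAI client.

```java
final class DimensionCheckSketch {
    // 1536 is the "Expected" value printed in output/sample-output.txt.
    static final int EXPECTED_DIMENSIONS = 1536;

    // Fails fast when the deployment returns vectors of a different size
    // than the one the containers are assumed to be provisioned for.
    static void requireDimensions(float[] embedding, int expected) {
        if (embedding.length != expected) {
            throw new IllegalStateException(
                "Embedding has " + embedding.length + " dimensions, expected " + expected);
        }
    }

    public static void main(String[] args) {
        // Placeholder vector; a real run would use an embedding from the service.
        float[] embedding = new float[EXPECTED_DIMENSIONS];
        requireDimensions(embedding, EXPECTED_DIMENSIONS);
        System.out.println("Actual: " + embedding.length);
        System.out.println("Expected: " + EXPECTED_DIMENSIONS);
    }
}
```

Running this check before ingestion surfaces a mismatched embedding deployment early, rather than as a query-time failure.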

### Config.java

`Config.java` loads environment variables from the shell or `.env`, resolves the shared dataset path, and maps `VECTOR_ALGORITHM` values to the existing container names.
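
As an illustration of the mapping, the following sketch turns the two settings into a list of target containers. It is not the sample's `Config.java`: the class and method names are hypothetical, and giving `AZURE_COSMOSDB_CONTAINER_NAME` precedence over `VECTOR_ALGORITHM` is an assumption.

```java
import java.util.List;
import java.util.Locale;

final class ContainerSelectionSketch {
    // Resolves which existing containers to ingest into and query.
    static List<String> targetContainers(String containerName, String algorithm) {
        // Assumption: an explicit container name wins over the algorithm setting.
        if (containerName != null && !containerName.isBlank()) {
            return List.of(containerName.trim());
        }
        String key = algorithm == null ? "" : algorithm.trim().toLowerCase(Locale.ROOT);
        switch (key) {
            case "":
                // Empty value: run both containers.
                return List.of("hotels_diskann", "hotels_quantizedflat");
            case "diskann":
                return List.of("hotels_diskann");
            case "quantizedflat":
                return List.of("hotels_quantizedflat");
            default:
                throw new IllegalArgumentException(
                    "VECTOR_ALGORITHM must be diskann, quantizedflat, or empty: " + algorithm);
        }
    }

    public static void main(String[] args) {
        System.out.println(targetContainers(
            System.getenv("AZURE_COSMOSDB_CONTAINER_NAME"),
            System.getenv("VECTOR_ALGORITHM")));
    }
}
```

Rejecting unknown values instead of silently falling back to both containers makes a typo in `.env` visible immediately.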

### DataPlane.java

`DataPlane.java` contains the Azure Cosmos DB and Azure OpenAI client factories plus the data-plane operations:

- bulk upsert using `executeBulkOperations()`
- embedding generation with `EmbeddingsOptions`
- field-name validation before interpolating the embedding field into `VectorDistance()`
- parameterized SQL queries for the embedding vector and `TOP` value
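
The last two items can be sketched together. A property name can't be supplied as a query parameter, so it is checked against an allow-list pattern before it is placed in the query text, while the vector and the `TOP` value stay as `@embedding` and `@topK` parameters. The pattern, the projected columns (taken from the fields shown in `output/sample-output.txt`), the field name `DescriptionVector`, and the names `VectorQuerySketch` and `buildQuery` are illustrative assumptions, not the sample's actual code.

```java
import java.util.regex.Pattern;

final class VectorQuerySketch {
    // Assumed allow-list: letters, digits, and underscores, not starting with a digit.
    private static final Pattern SAFE_FIELD = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Builds the query text; @topK and @embedding are bound separately as parameters.
    static String buildQuery(String embeddingField) {
        if (embeddingField == null || !SAFE_FIELD.matcher(embeddingField).matches()) {
            throw new IllegalArgumentException("Unsafe embedding field name: " + embeddingField);
        }
        return "SELECT TOP @topK c.HotelId, c.HotelName, c.Description, "
            + "VectorDistance(c." + embeddingField + ", @embedding) AS score "
            + "FROM c ORDER BY VectorDistance(c." + embeddingField + ", @embedding)";
    }

    public static void main(String[] args) {
        // "DescriptionVector" is a hypothetical field name used for illustration.
        System.out.println(buildQuery("DescriptionVector"));
    }
}
```

Validating the one interpolated identifier keeps the query safe from injection even though that part of the text can't be parameterized.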

## Expected output

The sample prints embedding validation, ingestion status, and query results for each container. A representative output file is included in `output/sample-output.txt`.

## Next steps

- Learn more about [Azure Cosmos DB for NoSQL vector search](/azure/cosmos-db/nosql/vector-search).
- Review the full sample repo for other languages and scenarios.
- If you haven't provisioned the shared infrastructure yet, run `azd up` from the repo root before rerunning the Java sample.
16 changes: 16 additions & 0 deletions nosql-create-index-java/sample.env
@@ -0,0 +1,16 @@
# Azure Cosmos DB for NoSQL
AZURE_COSMOSDB_ENDPOINT="https://<your-account>.documents.azure.com:443/"
AZURE_COSMOSDB_DATABASENAME="Hotels"
AZURE_COSMOSDB_CONTAINER_NAME=""

# Azure OpenAI embedding configuration
AZURE_OPENAI_EMBEDDING_ENDPOINT="https://<your-openai-resource>.openai.azure.com/"
AZURE_OPENAI_EMBEDDING_DEPLOYMENT="text-embedding-3-small"
AZURE_OPENAI_EMBEDDING_API_VERSION="2024-08-01-preview"

# Vector query selection
# Set diskann or quantizedflat. Leave empty to run both containers.
VECTOR_ALGORITHM=""

# Shared repo-root dataset, referenced relative to this sample folder
DATA_FILE_WITH_VECTORS="..\data\HotelsData_toCosmosDB_Vector.json"