Chris McKenzie

Jun 25, 2025 • 7 min read

Setting Up Your First ChromaDB Server

Guide for JavaScript Engineers who want to start using vector databases

Have you ever wondered how Spotify suggests songs that you might like? Or how Netflix knows which movies to recommend? Enter Vector Databases. One such database is ChromaDB.

Background

ChromaDB offers JavaScript developers a concise API for a powerful vector database. It prioritizes productivity and simplicity, allowing the storage of embeddings with their relevant metadata. The database, written in Python, has an intuitive and robust JavaScript client library for seamless document embedding and querying.

Embeddings

If you’re not familiar with the topic of embeddings, I highly recommend you learn more about them before diving into ChromaDB.

TL;DR

Embeddings convert data into fixed-size vectors, preserving its semantic meaning. These vectors can capture intricate relationships, making them pivotal for machine learning tasks such as search or recommendations. Using embeddings, the words “dog” and “puppy” might be translated into similar numerical arrays, allowing systems to recognize their semantic closeness.
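To make the "dog" and "puppy" idea a bit more concrete, here is a minimal sketch that compares vectors with cosine similarity. The three-dimensional vectors are numbers I made up for illustration; real embeddings come from a model and are much longer. The names toyEmbeddings and cosineSimilarity are mine, not part of any library.

```javascript
// Toy 3-dimensional vectors, invented for illustration only.
// Real embeddings come from a model and have far more dimensions.
const toyEmbeddings = {
  dog: [0.9, 0.8, 0.1],
  puppy: [0.85, 0.9, 0.15],
  car: [0.1, 0.2, 0.95],
};

// Cosine similarity: dot product divided by the product of the vector lengths.
// Values near 1 mean the vectors point in nearly the same direction.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log("dog vs puppy:", cosineSimilarity(toyEmbeddings.dog, toyEmbeddings.puppy));
console.log("dog vs car:", cosineSimilarity(toyEmbeddings.dog, toyEmbeddings.car));
```

With these toy numbers, "dog" and "puppy" score much closer to 1 than "dog" and "car", which is the kind of signal a vector database relies on.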

For more on embeddings I recommend the following resources:

Prerequisites

Setup Server

For this, we’re just going to use a locally hosted server. However, you can host ChromaDB on AWS (or other cloud providers) by following their docs.

The easiest way to install the ChromaDB server is with pip. If you don’t feel comfortable with that, you can run the server with Docker instead.

pip install chromadb

Run Server

Let’s start the server!

chroma run --host localhost --port 8000

That’s it! You should now have a ChromaDB server running locally. You can verify this by visiting http://localhost:8000/ in your browser. You should see a response like this:

{"detail":"Not Found"}

If you don’t see this, check the logs or visit the troubleshooting page.

Setup Client

For this guide, I’ve created a starting point so you don’t have to set up the boilerplate. Clone this repo and install dependencies.

I suggest you clone the repo to a different folder than the server to avoid confusion.

git clone git@github.com:kenzic/chromadb-demo.git
cd chromadb-demo
git fetch --all --tags
git checkout tags/basic-demo -b sandbox

Install dependencies:

yarn add chromadb openai

Great! Now, let’s start adding code to upload.js

We’ll start by importing the ChromaDB client and creating a new instance.

import { ChromaClient } from 'chromadb'
const client = new ChromaClient();

Next we’ll create a new collection with getOrCreateCollection. A collection is a group of embeddings. For example, you might have a collection of product embeddings and another collection of user embeddings.

getOrCreateCollection takes a name, and an optional embeddingFunction.

  • name must: contain only valid URL characters, be between 3 and 63 characters long, be unique, not contain two consecutive dots, not be an IP address, and start and end with a lowercase letter or digit.

  • If you provide an embeddingFunction you will need to supply that every time you get the collection.
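If you want to catch a bad name before it reaches the server, the rules above can be sketched as a small local check. This is a hypothetical helper of my own, not ChromaDB’s validation code. I’m assuming "valid URL characters" means letters, digits, dots, underscores, and hyphens, and the uniqueness rule can’t be checked locally because only the server knows which names are taken.

```javascript
// Hypothetical helper: a local pre-check based on the naming rules listed above.
// It is not ChromaDB's own validation and cannot check uniqueness.
function isValidCollectionName(name) {
  if (typeof name !== "string") return false;
  // Between 3 and 63 characters.
  if (name.length < 3 || name.length > 63) return false;
  // Assumption: "valid URL characters" means letters, digits, dot, underscore, hyphen.
  if (!/^[A-Za-z0-9._-]+$/.test(name)) return false;
  // No two consecutive dots.
  if (name.includes("..")) return false;
  // Must start and end with a lowercase letter or digit.
  if (!/^[a-z0-9]/.test(name) || !/[a-z0-9]$/.test(name)) return false;
  // Reject dotted-quad strings that look like IPv4 addresses.
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(name)) return false;
  return true;
}

console.log(isValidCollectionName("nasaArticles"));
console.log(isValidCollectionName("192.168.0.1"));
```

The server still has the final say, so treat this as a convenience rather than a guarantee.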

You could also call createCollection on the client, but we’re going to use getOrCreateCollection instead. It creates the collection if it doesn’t exist, and returns the existing collection if it does.

We will also use OpenAI’s Embedding API. To do so, you’ll need an API key, which you can obtain here.

import { OpenAIEmbeddingFunction } from 'chromadb'
const embedder = new OpenAIEmbeddingFunction({openai_api_key: "apiKey", openai_model: "text-embedding-3-small"})


async function main() {
  const collection = await client.getOrCreateCollection({
    name: "nasaArticles",
    embeddingFunction: embedder
  });
}
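The snippet above uses the placeholder string "apiKey". In a real project you probably don’t want the key in source control, so here is a minimal sketch of reading it from the environment instead. The helper requireEnv and the variable name OPENAI_API_KEY are my own assumptions; use whatever naming your setup already has.

```javascript
// Hypothetical helper: read a required setting from the environment and fail
// early with a clear message if it is missing or empty.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage sketch, assuming the key is exported as OPENAI_API_KEY:
// const embedder = new OpenAIEmbeddingFunction({
//   openai_api_key: requireEnv("OPENAI_API_KEY"),
//   openai_model: "text-embedding-3-small",
// });
```

Failing early like this tends to give a clearer error than a rejected request from the embedding API later on.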

Add Documents

One of the features that makes ChromaDB easy to use is that you can add your documents directly to the database, and ChromaDB will handle the embedding for you. A document is just plain text that you want to store and vectorize for later retrieval. Included in the repo are 5 articles from NASA’s Blog for our demo data.

// add data import
import data from "./data";

// update main
async function main() {
  const embedder = new OpenAIEmbeddingFunction({openai_api_key: "apiKey", openai_model: "text-embedding-3-small"});
  const collection = await client.getOrCreateCollection({
    name: "nasaArticles",
    embeddingFunction: embedder
  });

  // add the following:
  const ids = [];
  const documents = [];
  const metadatas = [];
  data.forEach((article) => {
      ids.push(article.id);
      documents.push(article.document);
      metadatas.push({
        title: article.title,
        url: article.url
      });
  });

  // Add documents to collection
  await collection.add({
      ids,
      documents,
      metadatas
  });

  console.log("Uploaded!");
}

Run the upload script:

npx babel-node src/upload.js
> Uploaded!

If you already have embeddings, you can store those directly by including the embeddings option in the call to collection.add.
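Here is a rough sketch of how that could look. The helper toAddPayload is hypothetical, and so is the embedding field on each article; the demo data in the repo may not have one. The idea is to build the same parallel arrays as before and only attach embeddings when every record has a vector.

```javascript
// Hypothetical helper: turn article records into the parallel arrays passed to
// collection.add. If every record carries a precomputed vector in an
// `embedding` field (an assumed field name), include them as `embeddings`.
function toAddPayload(articles) {
  const payload = {
    ids: articles.map((article) => article.id),
    documents: articles.map((article) => article.document),
    metadatas: articles.map((article) => ({
      title: article.title,
      url: article.url,
    })),
  };
  const allHaveVectors =
    articles.length > 0 &&
    articles.every((article) => Array.isArray(article.embedding));
  if (allHaveVectors) {
    payload.embeddings = articles.map((article) => article.embedding);
  }
  return payload;
}

// Usage sketch inside main():
// await collection.add(toAddPayload(data));
```

Keeping this as a plain function also makes it easy to test the payload shape without a running server.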

Now that we have our documents added, let’s query them!

Query Documents

Add the following to query.js:

import { ChromaClient, OpenAIEmbeddingFunction } from 'chromadb'

const client = new ChromaClient();

const embedder = new OpenAIEmbeddingFunction({openai_api_key: "apiKey", openai_model: "text-embedding-3-small"})

async function main() {
  const collection = await client.getCollection({
    name: "nasaArticles",
    embeddingFunction: embedder
  });
}

Next we’ll create a query. A query is just a piece of text that we want to find similar documents to. For this we’ll ask a question about the space station.

// add to `main` function just under `const collection = ...`
const results = await collection.query({
  nResults: 1,
  queryTexts: ["What's happening on the space station?"]
});
console.log(JSON.stringify(results, null, 2));

Before running this, let’s take a look at the options we’re passing to query:

  • nResults: This is the number of results we want to return. In this case we’re asking for 1.

  • queryTexts: This is an array of documents we want to find similar documents to. In this case we’re only passing one document.
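To build some intuition for what nResults controls, here is a conceptual sketch of a nearest-neighbor lookup done by brute force. This is not how ChromaDB works internally. I’m assuming squared Euclidean distance, and the two-dimensional vectors and ids are made up for the example.

```javascript
// Squared Euclidean distance between two vectors of the same length.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// Brute-force lookup: score every item, sort by distance, keep the closest nResults.
function nearest(queryVector, items, nResults) {
  return items
    .map((item) => ({
      id: item.id,
      distance: squaredDistance(queryVector, item.vector),
    }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, nResults);
}

// Made-up ids and vectors, for illustration only.
const toyItems = [
  { id: "iss-update", vector: [0.9, 0.1] },
  { id: "mars-rover", vector: [0.2, 0.8] },
  { id: "spacewalk", vector: [0.8, 0.3] },
];

console.log(nearest([1, 0], toyItems, 2));
```

Raising nResults simply keeps more of the sorted list, so the extra results are always less similar than the ones before them.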

Now let’s run the query:

npx babel-node src/query.js

Nice! The result is the most similar document to our query. Change the query to see how it changes the results.

Three important fields to note:

  • distances: This is the distance between the query and the result. The lower the distance the more similar the result is to the query.

  • documents: This is the document whose embedded representation is closest to the query.

  • embeddings: This is null by default. Embeddings are large and can be expensive to return. If you want them returned, add "embeddings" to the list of fields in the include option.
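The fields in a query response are nested: each one holds an inner array per query text. Here is a small sketch that zips them into one object per match. The sample values are made up, and matchesForQuery is a hypothetical helper of mine, so check the output of your own query before relying on the exact shape.

```javascript
// Made-up sample with the nesting described above: one inner array per query text.
const sampleResults = {
  ids: [["article-1"]],
  distances: [[0.31]],
  documents: [["Example document text"]],
  metadatas: [[{ title: "Example title", url: "https://example.com/article" }]],
  embeddings: null,
};

// Hypothetical helper: zip the parallel arrays for one query into match objects.
// Fields that were not returned (for example embeddings) come back as null.
function matchesForQuery(results, queryIndex = 0) {
  const ids = results.ids?.[queryIndex] ?? [];
  return ids.map((id, i) => ({
    id,
    distance: results.distances?.[queryIndex]?.[i] ?? null,
    document: results.documents?.[queryIndex]?.[i] ?? null,
    metadata: results.metadatas?.[queryIndex]?.[i] ?? null,
    embedding: results.embeddings?.[queryIndex]?.[i] ?? null,
  }));
}

console.log(JSON.stringify(matchesForQuery(sampleResults), null, 2));
```

Working with one object per match is usually easier than indexing into several parallel arrays.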

Final Thoughts

This was a very basic guide to setting up your first ChromaDB server and client. There are many more features that ChromaDB offers. I highly recommend you check out the docs to learn more. Things you’ll want to check out:

  • Embedding Functions — ChromaDB supports a number of different embedding functions, including OpenAI’s API, Cohere, Google PaLM, and Custom Embedding Functions.

  • Collections — There are a lot of methods and options for collections we didn’t cover. Before building your first app I recommend spending some time here.

  • Querying — Querying in ChromaDB is much more powerful than what we covered here. You can filter by metadata and document content, query directly with pregenerated embeddings, and choose which fields are included in the result.

We’ve just started to scratch the surface. Now, it’s your turn. Set up your own ChromaDB server, experiment with its capabilities, and share your experiences in the comments.

I will follow up this guide with a more in-depth one on building a YouTube search engine and recommendation system.


To stay connected and share your journey, feel free to reach out through the following channels:

  • 👨‍💼 LinkedIn: Join me for more insights into AI development and tech innovations.

  • 🤖 JavaScript + AI: Join the JavaScript and AI group and share what you’re working on.

  • 💻 GitHub: Explore my projects and contribute to ongoing work.

  • 📚 Medium: Follow my articles for more in-depth discussions on LangSmith, LangChain, and other AI technologies.
