From prototype to production: Vector databases in generative AI applications

Since the rise of ChatGPT, the general public has realized that generative artificial intelligence (GenAI) could potentially transform our lives. The availability of large language models (LLMs) has also changed how developers build AI-powered applications and has led to the emergence of various new developer tools. Although vector databases have been around long before ChatGPT, they have become an integral part of the GenAI technology stack, as vector databases can address some of LLMs’ key limitations, such as hallucinations and lack of long-term memory

This article first introduces vector databases and their use cases. Next, you will learn more about how vector databases are designed to help developers get started with building GenAI applications quickly. As a developer advocate at Weaviate, an open-source vector database, I will use Weaviate to demonstrate relevant concepts as we go along. In the final discussion, you will learn how they can address the challenges enterprises face when moving these prototypes to production.

Vector databases store and provide access to structured and unstructured data, such as text or images, alongside their vector embeddings. Vector embeddings are the data’s numerical representation as a long list of numbers that captures the original data object’s semantic meaning. Usually, machine learning models are used to generate the vector embeddings.

Because similar objects are close together in vector space, the similarity of data objects can be calculated based on the distance between the data object’s vector embeddings. This opens the door to a new type of search technique called vector search that retrieves objects based on similarity. In contrast to traditional keyword-based search, semantic search offers a more flexible way to search for items.

A three dimensional representation of a semantic vector space where the query, kitten, is close to cat, dog, and wolf, but far away from apple and banana.

While many traditional databases support storing vector embeddings to enable vector search, vector databases are AI-native, which means they are optimized to conduct lightning-fast vector searches at scale. Because vector search requires the calculation of the distances between the search query and every data object, a classical K-Nearest-Neighbor algorithm is computationally expensive. Vector databases use vector indexing to pre-calculate the distances to enable faster retrieval at query time. Thus, vector databases allow users to find and retrieve similar objects quickly at scale in production.

Traditionally, vector databases have been used in various applications in the search domain. However, with the rise of ChatGPT, it has become more apparent that vector databases can enhance LLMs’ capabilities.

Traditionally, vector databases are used to unlock natural-language searches. They enable semantic searches that are robust to different terminologies or even typos. Vector searches can be performed on and across any modalities, such as images, video, audio, or even their combinations. This, in turn, enables varied and powerful use cases for vector databases, even where traditional databases could not be used at all.

For example, vector databases are used in recommendation systems as a special use case of search. Also, Stack Overflow recently showcased how they used Weaviate to improve customer experiences with better search results.

With the rise of LLMs, vector databases have shown that they can enhance LLM capabilities by acting as an external memory. For example, enterprises use customized chatbots as a first line of customer support or as technical or financial assistants to improve customer experiences. But for a conversational AI to be successful, it needs to meet three criteria:

It needs to generate human language/reasoning.
It needs to be able to remember what was said earlier to hold a proper conversation.
It needs to be able to query factual information outside of the general knowledge.

An AI assistant dialogue that reads Hey, how can I help you? My printer is not printing... again. I understand. Do you see any error messages?

While general-purpose LLMs can cover the first criterion, they need support for the other two. This is where vector databases can come into play:

Give LLMs state: LLMs are stateless. That means that once an LLM is trained, its knowledge is frozen. Although you can fine-tune LLMs to extend their knowledge with further information, once the fine-tuning is done, the LLM is in a frozen state again. Vector databases can effectively give LLMs state because you can easily create and update the information in a vector database.
Act as an external knowledge database: LLMs, like GPT-3, generate confident-sounding answers independently of their factual accuracy. Especially if you move outside of the general knowledge into domain-specific areas where the relevant facts may not have been a part of the training data, they can start to “hallucinate” (a phenomenon where LLMs generate factually incorrect answers). To combat hallucinations, you can use a vector search engine to retrieve the relevant factual knowledge and pipe it into the LLM’s context window. This practice is known as retrieval-augmented generation (RAG) and helps LLMs generate factually accurate results.

Being able to rapidly prototype is important not only in a hackathon setting but to test out new ideas and derive faster decisions in any fast-paced environment. As an integral part of the technology stack, vector databases should help accelerate the development of GenAI applications. This section covers how vector databases enable developers to do rapid prototyping by addressing setup, vectorization, search, and results.

A map of items invloved in LLMs
Client contains Data which contains Text, Images, and Audio
Vector database contains Vectorization, Search, and Results
Vectorization contains Providers and Own Vectors
Search contains BM25, Semantic, and Hybrid
Results contains Q&A, Generate, and Reranking

In our example, we use Weaviate as it is simple to get started with and only requires a few lines of code (not to mention that we are very familiar with it).

To enable rapid prototyping, vector databases are usually easy to set up in a few lines of code. In this example, the setup consists of connecting your Weaviate client to your vector database instance. If you use embedding models or LLMs from providers, such as OpenAI, Cohere, or Hugging Face, you will provide your API key in this step to enable their integrations.

import weaviate

client = weaviate.Client(
  url = "<https://your-weaviate-endpoint>",
  additional_headers = {
    "X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY"
  }
)

Vector databases store and query vector embeddings that are generated from embedding models. That means data must be (manually or automatically) vectorized at import and query time. While you can use vector databases stand-alone (a.k.a. bring your own vectors), a vector database that enables rapid prototyping will take care of vectorization automatically so that you don’t have to write boilerplate code to vectorize your data and queries.

In this example, you define a data collection called MyCollection that provides the structure for your data within your vector database after the initial setup. In this step, you can configure further modules, such as a vectorizer that automatically vectorizes all data objects during import and query time (in this case, text2vec-openai). You can omit this line of code if you want to use the vector database standalone and provide your own vectors.

class_obj = {
  "class": "MyCollection",
  "vectorizer": "text2vec-openai",
}

client.schema.create_class(class_obj)

To populate the data collection MyCollection, import data objects in batches, as shown below. The data objects are vectorized automatically with the defined vectorizer.

client.batch.configure(batch_size=100) *# Configure batch

# Initialize batch process*
with client.batch as batch:
  batch.add_data_object(
    class_name="MyCollection",
    data_object={ "some_text_property": "foo", "some_number_property": 1 }
  )

The define vectorizer also vectorizes the query at search time, as shown in the next section.

The key use of vector databases is to enable semantic similarity search. In this example, once the vector database is set up and populated, you can retrieve data from it based on the similarity to the search query ("My query here"). If you defined a vectorizer in the previous step, it will also vectorize the query and retrieve data closest to it in the vector space.

response = (
  client.query
  .get("MyCollection", ["some_text"])
  .with_near_text({"concepts": ["My query here"]})
  .do()
)

However, lexical and semantic search are not mutually exclusive concepts. Vector databases also store the original data objects alongside their vector embeddings. This not only eliminates the need for a secondary database to host your original data objects but also enables keyword-based searches (BM25). The combination of keyword-based search and vector search as a hybrid search can improve search results. For example, Stack Overflow has implemented hybrid search with Weaviate to achieve better search results.

Because vector databases have become an integral part of the GenAI technology stack, they must be tightly integrated with the other components. For example, an integration between a vector database and an LLM will relieve developers from having to write separate pieces of boilerplate code to retrieve information from the vector database and then to feed it to the LLM. Instead, it will enable developers to do this in just a few lines of code.

For example, Weaviate’s modular ecosystem enables you to integrate state-of-the-art generative models from providers, such as OpenAI,Cohere, or Hugging Face, by defining a generative module (in this case, generative-openai). This enables you to extend the semantic search query (with the .with_generate() method) to a retrieval-augmented generative query. The .with_near_text() method first retrieves the relevant context for the property some_text, which is then used in the prompt "Summarize {some_text} in a tweet".

class_obj = {
  "class": "MyCollection",
  "vectorizer": "text2vec-openai",
  "moduleConfig": {
    "text2vec-openai": {},
    "generative-openai": {}
  }
}

# ...

response = (
  client.query
  .get("MyCollection", ["some_text"])
  .with_near_text({"concepts": ["My query here"]})
  .with_generate(
    single_prompt="Summarize {some_text} in a tweet."
  )
  .do()
)

Although it is easy to build impressive prototypes for GenAI applications, moving them to production comes with its own challenges regarding deployment and access management. This section discusses concepts that you need to take into consideration when moving GenAI solutions from prototype to production successfully.

While the amount of data in a prototype may not even require the search capabilities of a full-blown vector database, the amount of handled data can be drastically different in production. To anticipate the amount of data in production, vector databases must be able to scale into billions of data objects according to various needs, such as maximum ingestion, largest possible dataset size, maximum of queries per second, etc.

To enable lightning-fast vector searches at scale, vector databases use vector indexing. Vector indexing is what sets vector databases apart from other vector-capable databases that support vector search but are not optimized for it. For example, Weaviate uses hierarchical navigable small world (HNSW) algorithms for vector indexing in combination with product quantization on compressed vectors to unlock reduced memory usage and lightning-fast vector search even with filters. It typically performs nearest-neighbor searches of millions of objects in less than 100ms.

A vector database should be able to address different deployment requirements of various production environments. For example, Stack Overflow required a vector database that had to be open-source and not hosted so that it could be run on their existing Azure infrastructure.

To address such requirements, different vector databases come with different deployment options:

Managed services: Most vector databases offer fully managed data infrastructure. Additionally, some vector databases offer hybrid SaaS options to manage the data plane inside your own cloud environment.
Bring Your Own Cloud: Some vector databases are also available on different cloud marketplaces, allowing you to deploy the vector database cluster inside your own cloud environment.
Self-hosted: Many vector databases are also available as an open-source download, which runs on and scales via Kubernetes or Docker Compose.

While choosing the right deployment infrastructure is an essential part of ensuring data protection, access management, and resource isolation are as important to meet compliance regulations and ensure data protection. E.g., if user A uploads a document, only user A should be able to interact with it. Weaviate uses a multi-tenancy concept allows you to comply with regulatory data handling requirements (e.g., GDPR).

This article provides an overview of vector databases and their use cases. It highlights the importance of vector databases in improving search and enhancing LLM capabilities by giving them access to an external knowledge database to generate factually accurate results.

The article also showcases how vector databases can enable rapid prototyping of GenAI applications. Aside from an easy setup, vector databases can help developers if they handle vectorization of data automatically at import and query time, enable better search not only with vector search but in addition to keyword-based searches, and seamlessly integrate with other components of the technology stack. Additionally, the article discusses how vector databases can support enterprises in moving these prototypes to production by addressing concerns of scalability, deployment, and data protection.

If you are interested in using an open-source vector database for your GenAI application, head over to Weaviate’s Quickstart Guide and try it out yourself.

From prototype to production: Vector databases in generative AI applications

What are vector databases?

Use cases of vector databases

Natural-language search

Enhancing LLM capabilities

Prototyping with vector databases

Easy setup

Automatic vectorization

Enable better search

Integration with the technology stack

Considerations for vector databases in production

Horizontal scalability

Deployment

Data protection

Summary

Add to the discussion