
Turn Your Internal Docs into an AI-Powered Knowledge Base


Summary

  • RAG makes documentation searchable by combining semantic retrieval with an LLM that answers using only the retrieved document context.
  • The process has three main steps: chunk and index documents, retrieve the most relevant chunks with vector search, then generate a grounded answer from those chunks.
  • Dense vector search works by turning text into embeddings so semantically similar content can be found even when the wording is different.
  • This approach is useful for technical docs, support knowledge bases, compliance search, onboarding, transcript search, and product catalogs.
  • The main benefit is fast, grounded answers from your own content, especially when the whole pipeline runs on your own infrastructure for privacy and control.

The Documentation Problem

Every organization with a product, a process, or a compliance requirement faces the same challenge: knowledge exists, but it is not accessible. A support engineer on a call cannot search through 400 pages of documentation in real time. A new employee cannot absorb three years of internal guides in their first week. A customer cannot find the specific configuration setting buried on page 247 of a PDF.

Traditional search helps—but it requires exact keywords. Ask “How do I fix error 12” when the document says “status code 12: MicroKernel cannot find the specified file,” and the keyword search fails. You need semantic search—search that understands meaning, not just words.

Key insight: The answer already exists in your documentation. The challenge is making it findable—instantly, accurately, and without hallucination.

How RAG Works

Retrieval-Augmented Generation (RAG) separates two problems that LLMs struggle to solve: knowing your specific content, and reasoning about it. RAG handles the knowledge problem with retrieval. The LLM handles the reasoning problem with generation.

The three-step pipeline:

  1. Index your documents. Documents are split into chunks. Each chunk is converted to a vector embedding by a language model, and both the vector and the original text are stored in a vector database.
  2. Retrieve relevant context. When a user asks a question, the question is embedded using the same model. The vector database finds the most semantically similar document chunks, regardless of exact wording.
  3. Generate a grounded answer. The top matching chunks are passed to an LLM as context. The model reads them and answers only from what it was given. No hallucination—just reasoning over your actual documents.

Privacy advantage: With Actian VectorAI and Ollama, the entire pipeline runs on your own servers. Your documents, queries, and answers never leave your infrastructure.

When an embedding model processes text, it converts it to a list of numbers—a vector—where the position in high-dimensional space encodes semantic meaning. Text with similar meaning ends up at similar positions. This is the foundation of dense search.

nomic-embed-text, the model used in this guide, converts any text into a vector of 768 numbers. These dimensions are not human-readable. They are learned representations that emerge from training on billions of text examples. The model learns that “cannot find file” and “file not found” should produce similar vectors, even though they share none of the same words.

At search time, the query vector is compared against every stored document vector using cosine similarity—a measure of the angle between two vectors. Smaller angle = higher similarity = more relevant result. The HNSW index makes this comparison fast, even across millions of vectors.

Input                                      Similarity
Query: “fix status 12”                     (query vector)
Doc: “12: MicroKernel cannot find…”        0.94 — very relevant
Doc: “Crystal Reports for Zen…”            0.12 — not relevant
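
For illustration, here is a minimal, dependency-free sketch of cosine similarity in Python; in practice VectorAI DB computes this inside the HNSW index, so you never do it by hand:

import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors: close to 1.0 means semantically similar."""
    dot    = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With the embed() helper defined later in this post:
# cosine_similarity(embed("fix status 12"),
#                   embed("12: MicroKernel cannot find the specified file"))
# would score near 1.0, while an unrelated document scores near 0.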

System Architecture

The dense RAG pipeline has two phases. The indexing phase runs once, or on a schedule when documents change. The query phase runs on every user request, typically completing in under 200ms.

[Figure: system architecture of the VectorAI DB dense RAG pipeline]

Notice the critical detail: the same embedding model is used for both indexing and querying. If you index with nomic-embed-text, you must query with nomic-embed-text. The vectors only make sense relative to the model that created them. Swapping models requires a full reindex.
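
One simple safeguard is to define the model name and dimension once and assert on every call. Below is a minimal sketch of that idea; the constant names are my own, not part of the product API, and the plain embed() helper used in the pipeline later does the same thing without the guard:

import requests

EMBED_MODEL = "nomic-embed-text"   # one constant, used for BOTH indexing and querying
EMBED_DIM   = 768                  # must match the collection's vector size

def embed(text: str) -> list:
    """Variant of the embed() helper used below, with a guard against model drift."""
    r = requests.post("http://localhost:11434/api/embeddings",
                      json={"model": EMBED_MODEL, "prompt": text})
    vec = r.json()["embedding"]
    assert len(vec) == EMBED_DIM, "embedding dimension mismatch - wrong model? reindex required"
    return vec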

How VectorAI Stores Your Data

Actian VectorAI DB stores vectors in a high-performance columnar storage engine using FAISS for indexing. Each vector is stored as 32-bit floating point numbers—the industry standard for ML embeddings. At 768 dimensions, that is exactly 3,088 bytes per vector, confirmed from the engine log:

# Confirmed from vde.log on a production deployment
[VectorStore::CreateFile] Created file: vectors.db
(dim=768, record_len=3088)

# The math:
768 dimensions x 4 bytes (float32) = 3,072 bytes
+ 16 bytes overhead
= 3,088 bytes per vector  (confirmed)

Storage facts—verified from production logs:

Parameter                Value                 Notes
Index algorithm          FAISS HNSW            Facebook AI Similarity Search
Storage backend          Vector DB files       Proven enterprise storage
Vector format            float32 (4 bytes)     Industry standard for ML
Record size (dim=768)    3,088 bytes           768 x 4 + 16 overhead
Record size (dim=384)    1,552 bytes           384 x 4 + 16 overhead
Segment limit            2 GB per segment      Auto-extends to new segments
File size limit          None                  Limited only by disk space
HNSW efSearch            64                    Candidates examined per query
HNSW efConstruct         200                   Candidates during index build
HNSW M                   32                    Connections per node

The storage engine splits files into 2 GB segments automatically, so there is no practical vector count limit. Storage capacity is determined entirely by available disk space. At 3,088 bytes per vector, one terabyte of storage holds roughly 324 million vectors.
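
The same arithmetic is easy to script for capacity planning; a small sketch using the record layout confirmed in the log above:

def record_bytes(dim: int, overhead: int = 16) -> int:
    """Bytes per stored vector: dim float32 values (4 bytes each) plus fixed record overhead."""
    return dim * 4 + overhead

def vectors_per_terabyte(dim: int) -> int:
    """Approximate vector capacity of one decimal terabyte (10**12 bytes)."""
    return 10**12 // record_bytes(dim)

print(record_bytes(768))           # 3088 bytes, matching the vde.log entry
print(record_bytes(384))           # 1552 bytes
print(vectors_per_terabyte(768))   # ~324 million vectors per TB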

Real-World Use Cases

Technical Documentation Search. Index product manuals, API references, and configuration guides. Users ask questions in plain language and get answers with exact document citations and page references.

Support Knowledge Base. Index historical support tickets and resolutions. When a new case arrives, surface similar past cases and their solutions automatically, reducing resolution time significantly.

Compliance and Policy Search. Make legal documents and compliance policies instantly searchable. The system always cites the specific clause or section that applies to the user’s question.

Employee Onboarding. New hires ask questions about HR policies, processes, and tools. The system answers from your actual internal documentation—personalised, accurate, and always up to date.

Video and Audio Search. Transcribe training videos and meetings with Whisper, index the transcripts, and search spoken content by meaning, with deep links to the exact timestamp.

Product Catalogue. Index product specifications, compatibility matrices, and release notes. Sales teams get instant, accurate answers during customer calls without leaving the conversation.

Building the Pipeline

Step 1 — Chunk Your Documents

Character-based chunking at 1,000 characters with 100-character overlap works well for technical documentation. The overlap means text near a chunk boundary appears in both neighbouring chunks, so a sentence split by the boundary is still intact in at least one of them:

def chunk_text(text: str, chunk_size=1000, overlap=100) -> list:
    """Split text into overlapping character-based chunks."""
    chunks, i = [], 0
    while i < len(text):
        chunk = text[i : i + chunk_size].strip()
        if len(chunk) > 80:     # skip near-empty chunks
            chunks.append(chunk)
        i += chunk_size - overlap
    return chunks
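
A quick sanity check of the chunker (the file name doc.txt is just a placeholder for whatever document you are indexing):

with open("doc.txt", encoding="utf-8") as f:
    chunks = chunk_text(f.read())

print(f"{len(chunks)} chunks; first chunk starts with: {chunks[0][:60]}...")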

Step 2 — Create a Collection and Index

from actian_vectorai import VectorAIClient, VectorParams, Distance, PointStruct
import requests

def embed(text: str) -> list:
    r = requests.post("http://localhost:11434/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return r.json()["embedding"]

with VectorAIClient("192.168.x.x:6574") as client:
    client.collections.create(
        "my_docs",
        vectors_config=VectorParams(size=768, distance=Distance.Cosine)
    )
    for i, chunk in enumerate(chunks):
        client.points.upsert("my_docs", [PointStruct(
            id      = i,
            vector  = embed(chunk),
            payload = {"source": "doc.pdf", "chunk": i, "text": chunk}
        )])
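
The loop above upserts one point per call for readability. Because points.upsert already takes a list of PointStruct objects, batching several chunks per call is a natural optimisation; a sketch, assuming the call accepts more than one point at a time:

with VectorAIClient("192.168.x.x:6574") as client:
    points = [
        PointStruct(id=i, vector=embed(chunk),
                    payload={"source": "doc.pdf", "chunk": i, "text": chunk})
        for i, chunk in enumerate(chunks)
    ]
    client.points.upsert("my_docs", points)   # one round trip instead of one per chunk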

The Complete RAG Query Function

import requests
from actian_vectorai import VectorAIClient

VECTOR_SERVER = "192.168.x.x:6574"
OLLAMA        = "http://localhost:11434"
COLLECTION    = "my_docs"

def rag_query(question: str, top_n: int = 5) -> dict:
    # Step 1: Embed the question (embed() is the helper defined in Step 2)
    query_vec = embed(question)

    # Step 2: Search VectorAI DB
    with VectorAIClient(VECTOR_SERVER) as c:
        results = c.points.search(
            COLLECTION, vector=query_vec,
            limit=top_n, with_payload=True,
        )

    # Step 3: Build context from retrieved chunks
    context = "\n\n---\n\n".join(
        r.payload["text"] for r in results)

    # Step 4: Ask the LLM with grounded context
    prompt = f"""You are a knowledgeable assistant.
Answer using ONLY the documentation excerpts below.
Documentation: {context}
Question: {question}
Answer:"""

    resp = requests.post(f"{OLLAMA}/api/chat", json={
        "model": "llama3:8b", "stream": False,
        "options": {"temperature": 0.1, "num_predict": 1024},
        "messages": [{"role": "user", "content": prompt}]
    })
    answer = resp.json()["message"]["content"].strip()

    # Step 5: Return answer + citations
    sources = list({r.payload["source"] for r in results})
    return {"answer": answer, "sources": sources}

Why temperature=0.1? For factual Q&A, you want deterministic, accurate answers. Low temperature keeps the LLM focused on what the documentation actually says rather than extrapolating or embellishing.
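
A quick end-to-end check (the question is just an example against the documentation indexed earlier):

result = rag_query("How do I fix status code 12?")
print(result["answer"])
print("Sources:", ", ".join(result["sources"]))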

Key Stats — Production Deployment

Metric                    Value
Embedding dimensions      768
Bytes per vector          3,088
Index algorithm           HNSW (FAISS)
Embedding model           nomic-embed-text (Ollama)
LLM                       llama3:8b (Ollama)
gRPC port                 6574
Infrastructure            100% on-premise

Answers Grounded in Your Content

Dense vector RAG with Actian VectorAI DB turns static documentation into an instantly queryable knowledge base. The pipeline is simple: chunk your documents, embed them with nomic-embed-text, store them in VectorAI’s FAISS-HNSW index, and let the LLM answer from whatever context the search retrieves.

Because everything runs on your own infrastructure, your documents never leave your network. Every answer is grounded in your actual content, not in what a model learned during training.

Find out more about Actian VectorAI DB.

 

Built with Actian VectorAI DB  ·  FAISS-HNSW  ·  High-Performance Storage  ·  Ollama  ·  llama3:8b
All queries run 100% on-premises. No document, query, or answer leaves your infrastructure.