
Turn Your Internal Docs into an AI-Powered Knowledge Base


Summary

  • RAG makes documentation searchable by combining semantic retrieval with an LLM that answers using only the retrieved document context.
  • The process has three main steps: chunk and index documents, retrieve the most relevant chunks with vector search, then generate a grounded answer from those chunks.
  • Dense vector search works by turning text into embeddings so semantically similar content can be found even when the wording is different.
  • This approach is useful for technical docs, support knowledge bases, compliance search, onboarding, transcript search, and product catalogs.
  • The main benefit is fast, grounded answers from your own content, especially when the whole pipeline runs on your own infrastructure for privacy and control.

The Documentation Problem

Every organization with a product, a process, or a compliance requirement faces the same challenge: knowledge exists, but it is not accessible. A support engineer on a call cannot search through 400 pages of documentation in real time. A new employee cannot absorb three years of internal guides in their first week. A customer cannot find the specific configuration setting buried on page 247 of a PDF.

Traditional search helps—but it requires exact keywords. Ask “How do I fix error 12” when the document says “status code 12: MicroKernel cannot find the specified file,” and the keyword search fails. You need semantic search—search that understands meaning, not just words.

Key insight: The answer already exists in your documentation. The challenge is making it findable—instantly, accurately, and without hallucination.

How RAG Works

Retrieval-Augmented Generation (RAG) separates two problems that LLMs struggle to solve: knowing your specific content, and reasoning about it. RAG handles the knowledge problem with retrieval. The LLM handles the reasoning problem with generation.

The three-step pipeline:

  1. Index your documents. Documents are split into chunks. Each chunk is converted to a vector embedding by a language model, and both the vector and the original text are stored in a vector database.
  2. Retrieve relevant context. When a user asks a question, the question is embedded using the same model. The vector database finds the most semantically similar document chunks, regardless of exact wording.
  3. Generate a grounded answer. The top matching chunks are passed to an LLM as context. The model reads them and answers only from what it was given. No hallucination—just reasoning over your actual documents.

Privacy advantage: With Actian VectorAI and Ollama, the entire pipeline runs on your own servers. Your documents, queries, and answers never leave your infrastructure.

When an embedding model processes text, it converts it to a list of numbers—a vector—where the position in high-dimensional space encodes semantic meaning. Text with similar meaning ends up at similar positions. This is the foundation of dense search.

nomic-embed-text, the model used in this guide, converts any text into a vector of 768 numbers. These dimensions are not human-readable. They are learned representations that emerge from training on billions of text examples. The model learns that “cannot find file” and “file not found” should produce similar vectors, even though they share none of the same words.

At search time, the query vector is compared against every stored document vector using cosine similarity—a measure of the angle between two vectors. Smaller angle = higher similarity = more relevant result. The HNSW index makes this comparison fast, even across millions of vectors.

Input                                      Similarity
Query: “fix status 12”                     (query vector)
Doc: “12: MicroKernel cannot find…”        0.94 — very relevant
Doc: “Crystal Reports for Zen…”            0.12 — not relevant
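
For illustration, here is a minimal, dependency-free sketch of cosine similarity in Python; in practice VectorAI DB computes this inside the HNSW index, so you never do it by hand:

import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors: close to 1.0 means semantically similar."""
    dot    = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With the embed() helper defined later in this post:
# cosine_similarity(embed("fix status 12"),
#                   embed("12: MicroKernel cannot find the specified file"))
# would score near 1.0, while an unrelated document scores near 0.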

System Architecture

The dense RAG pipeline has two phases. The indexing phase runs once, or on a schedule when documents change. The query phase runs on every user request, typically completing in under 200ms.

[Figure: system architecture of the VectorAI DB dense RAG pipeline]

Notice the critical detail: the same embedding model is used for both indexing and querying. If you index with nomic-embed-text, you must query with nomic-embed-text. The vectors only make sense relative to the model that created them. Swapping models requires a full reindex.
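
One simple safeguard is to define the model name and dimension once and assert on every call. Below is a minimal sketch of that idea; the constant names are my own, not part of the product API, and the plain embed() helper used in the pipeline later does the same thing without the guard:

import requests

EMBED_MODEL = "nomic-embed-text"   # one constant, used for BOTH indexing and querying
EMBED_DIM   = 768                  # must match the collection's vector size

def embed(text: str) -> list:
    """Variant of the embed() helper used below, with a guard against model drift."""
    r = requests.post("http://localhost:11434/api/embeddings",
                      json={"model": EMBED_MODEL, "prompt": text})
    vec = r.json()["embedding"]
    assert len(vec) == EMBED_DIM, "embedding dimension mismatch - wrong model? reindex required"
    return vec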

How VectorAI Stores Your Data

Actian VectorAI DB stores vectors in a high-performance columnar storage engine using FAISS for indexing. Each vector is stored as 32-bit floating point numbers—the industry standard for ML embeddings. At 768 dimensions, that is exactly 3,088 bytes per vector, confirmed from the engine log:

# Confirmed from vde.log on a production deployment
[VectorStore::CreateFile] Created file: vectors.db
(dim=768, record_len=3088)

# The math:
768 dimensions x 4 bytes (float32) = 3,072 bytes
+ 16 bytes overhead
= 3,088 bytes per vector  (confirmed)

Storage facts—verified from production logs:

Parameter                Value                 Notes
Index algorithm          FAISS HNSW            Facebook AI Similarity Search
Storage backend          Vector DB files       Proven enterprise storage
Vector format            float32 (4 bytes)     Industry standard for ML
Record size (dim=768)    3,088 bytes           768 x 4 + 16 overhead
Record size (dim=384)    1,552 bytes           384 x 4 + 16 overhead
Segment limit            2 GB per segment      Auto-extends to new segments
File size limit          None                  Limited only by disk space
HNSW efSearch            64                    Candidates examined per query
HNSW efConstruct         200                   Candidates during index build
HNSW M                   32                    Connections per node

The storage engine splits files into 2 GB segments automatically, so there is no practical vector count limit. Storage capacity is determined entirely by available disk space. At 3,088 bytes per vector, one terabyte of storage holds roughly 324 million vectors.
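
The same arithmetic is easy to script for capacity planning; a small sketch using the record layout confirmed in the log above:

def record_bytes(dim: int, overhead: int = 16) -> int:
    """Bytes per stored vector: dim float32 values (4 bytes each) plus fixed record overhead."""
    return dim * 4 + overhead

def vectors_per_terabyte(dim: int) -> int:
    """Approximate vector capacity of one decimal terabyte (10**12 bytes)."""
    return 10**12 // record_bytes(dim)

print(record_bytes(768))           # 3088 bytes, matching the vde.log entry
print(record_bytes(384))           # 1552 bytes
print(vectors_per_terabyte(768))   # ~324 million vectors per TB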

Real-World Use Cases

Technical Documentation Search. Index product manuals, API references, and configuration guides. Users ask questions in plain language and get answers with exact document citations and page references.

Support Knowledge Base. Index historical support tickets and resolutions. When a new case arrives, surface similar past cases and their solutions automatically, reducing resolution time significantly.

Compliance and Policy Search. Make legal documents and compliance policies instantly searchable. The system always cites the specific clause or section that applies to the user’s question.

Employee Onboarding. New hires ask questions about HR policies, processes, and tools. The system answers from your actual internal documentation—personalised, accurate, and always up to date.

Video and Audio Search. Transcribe training videos and meetings with Whisper, index the transcripts, and search spoken content by meaning, with deep links to the exact timestamp.

Product Catalogue. Index product specifications, compatibility matrices, and release notes. Sales teams get instant, accurate answers during customer calls without leaving the conversation.

Building the Pipeline

Step 1 — Chunk Your Documents

Character-based chunking at 1,000 characters with 100-character overlap works well for technical documentation. The overlap means text near a chunk boundary appears in both neighbouring chunks, so a sentence split by the boundary is still intact in at least one of them:

def chunk_text(text: str, chunk_size=1000, overlap=100) -> list:
    """Split text into overlapping character-based chunks."""
    chunks, i = [], 0
    while i < len(text):
        chunk = text[i : i + chunk_size].strip()
        if len(chunk) > 80:     # skip near-empty chunks
            chunks.append(chunk)
        i += chunk_size - overlap
    return chunks
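
A quick sanity check of the chunker (the file name doc.txt is just a placeholder for whatever document you are indexing):

with open("doc.txt", encoding="utf-8") as f:
    chunks = chunk_text(f.read())

print(f"{len(chunks)} chunks; first chunk starts with: {chunks[0][:60]}...")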

Step 2 — Create a Collection and Index

from actian_vectorai import VectorAIClient, VectorParams, Distance, PointStruct
import requests

def embed(text: str) -> list:
    r = requests.post("http://localhost:11434/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return r.json()["embedding"]

with VectorAIClient("192.168.x.x:6574") as client:
    client.collections.create(
        "my_docs",
        vectors_config=VectorParams(size=768, distance=Distance.Cosine)
    )
    for i, chunk in enumerate(chunks):
        client.points.upsert("my_docs", [PointStruct(
            id      = i,
            vector  = embed(chunk),
            payload = {"source": "doc.pdf", "chunk": i, "text": chunk}
        )])
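
The loop above upserts one point per call for readability. Because points.upsert already takes a list of PointStruct objects, batching several chunks per call is a natural optimisation; a sketch, assuming the call accepts more than one point at a time:

with VectorAIClient("192.168.x.x:6574") as client:
    points = [
        PointStruct(id=i, vector=embed(chunk),
                    payload={"source": "doc.pdf", "chunk": i, "text": chunk})
        for i, chunk in enumerate(chunks)
    ]
    client.points.upsert("my_docs", points)   # one round trip instead of one per chunk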

The Complete RAG Query Function

import requests
from actian_vectorai import VectorAIClient

VECTOR_SERVER = "192.168.x.x:6574"
OLLAMA        = "http://localhost:11434"
COLLECTION    = "my_docs"

def rag_query(question: str, top_n: int = 5) -> dict:
    # Step 1: Embed the question (embed() is the helper defined in Step 2)
    query_vec = embed(question)

    # Step 2: Search VectorAI DB
    with VectorAIClient(VECTOR_SERVER) as c:
        results = c.points.search(
            COLLECTION, vector=query_vec,
            limit=top_n, with_payload=True,
        )

    # Step 3: Build context from retrieved chunks
    context = "\n\n---\n\n".join(
        r.payload["text"] for r in results)

    # Step 4: Ask the LLM with grounded context
    prompt = f"""You are a knowledgeable assistant.
Answer using ONLY the documentation excerpts below.
Documentation: {context}
Question: {question}
Answer:"""

    resp = requests.post(f"{OLLAMA}/api/chat", json={
        "model": "llama3:8b", "stream": False,
        "options": {"temperature": 0.1, "num_predict": 1024},
        "messages": [{"role": "user", "content": prompt}]
    })
    answer = resp.json()["message"]["content"].strip()

    # Step 5: Return answer + citations
    sources = list({r.payload["source"] for r in results})
    return {"answer": answer, "sources": sources}

Why temperature=0.1? For factual Q&A, you want deterministic, accurate answers. Low temperature keeps the LLM focused on what the documentation actually says rather than extrapolating or embellishing.
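
A quick end-to-end check (the question is just an example against the documentation indexed earlier):

result = rag_query("How do I fix status code 12?")
print(result["answer"])
print("Sources:", ", ".join(result["sources"]))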

Key Stats — Production Deployment

Metric                    Value
Embedding dimensions      768
Bytes per vector          3,088
Index algorithm           HNSW (FAISS)
Embedding model           nomic-embed-text (Ollama)
LLM                       llama3:8b (Ollama)
gRPC port                 6574
Infrastructure            100% on-premise

Answers Grounded in Your Content

Dense vector RAG with Actian VectorAI DB turns static documentation into an instantly queryable knowledge base. The pipeline is simple: chunk your documents, embed them with nomic-embed-text, store them in VectorAI’s FAISS-HNSW index, and let the LLM answer from whatever context the search retrieves.

Because everything runs on your own infrastructure, your documents never leave your network. Every answer is grounded in your actual content, not in what a model learned during training.

Find out more about Actian VectorAI DB.

 

Built with Actian VectorAI DB  ·  FAISS-HNSW  ·  High-Performance Storage  ·  Ollama  ·  llama3:8b
All queries run 100% on-premises. No document, query, or answer leaves your infrastructure.