Process

Four steps to efficiency

From document burden to intelligent generation. A flow designed for procurement technicians who value their time.

Upload your document history

Connect LicitadIA with your existing repositories: SharePoint, Nextcloud, local folders. The platform automatically indexes all your historical tender documentation.

Define the tender

Complete a simple form with basic tender data: document type, subject, CPV, procedure, and amount. LicitadIA immediately understands the context.

Generate the draft

The RAG system retrieves the most relevant documents from your history and, combined with the form information, generates a complete and coherent draft in minutes.

Review and adjust

The generated document is an intelligent draft, not a final product. Edit it freely, request specific changes from the conversational assistant, and export when satisfied.

Technical anatomy

How a RAG system works

The architecture that lets a language model consult your own documentation in every answer. A step-by-step guide through the two phases of the process —indexing and query— and through the maths that hold them together.

The conceptual backbone of LicitadIA, explained without shortcuts.


· Offline indexing
· Online query
· Cosine similarity
· Recursive chunking

The starting point

Three limitations of language models

An LLM on its own is a very smart student taking an exam from memory. Without access to reference material, it fails in exactly the cases that matter most.

Limitation 01

Frozen knowledge

A model trained in 2024 knows nothing of what happened after its cut-off date. Laws change, precedents evolve, specifications get updated — the LLM has no idea.

Limitation 02

It doesn't know your information

The LLM has never read your organisation's files, nor the historical specifications, nor the clauses your team has drafted over years. It completely lacks institutional memory.

Limitation 03

A tendency to hallucinate

When the model doesn't know something, it doesn't stay silent: it invents plausible-sounding but false answers. In an administrative context —where a wrong clause is a real risk— that is unacceptable.

The solution

A RAG solves all three problems at once: it gives the model, on every query, the exact context it needs to answer with precision, currency and traceability.

The contrast

Same question, two different answers

Before going into the technical detail, look at what changes when an LLM does — or does not — have access to your organisation's actual documentation.

Question: "What is the return deadline under our framework contract with ACME?"

LLM without RAG

Plausible. Generic. Made up. The model hasn't read your contract — it extrapolates from the industry average.

LLM with RAG

"Clause 14.2: the customer has 30 calendar days from receipt of the product to exercise the right of return."

"Clause 14.3: returns must include the original packaging and all accessories."

Precise. Specific. Traceable. The model cites the exact clauses — the answer is auditable.

Two phases, one shared space

Architecture overview

Every RAG breaks down into two clearly distinct phases. One runs once per document, asynchronously, when the document is added. The other runs every time a user asks a question.

1 Indexing phase · offline

Done once

01 You gather the source documents.
02 You split them into coherent chunks.
03 Each chunk goes through an embedding model.
04 Vectors and text are stored in the vector database.

2 Query phase · online

Repeated on every question

01 The user's question is vectorised.
02 The most similar vectors are searched.
03 The top-k fragments are retrieved.
04 The augmented prompt is assembled.
05 The LLM generates the final answer.

Phase 01 · offline

How your institutional memory gets indexed

It happens once for every document you add. It is slow, it runs in the background and produces something invisible but essential: a numerical representation of every idea, ready to be queried in milliseconds.

Step 01

Source documents

Historical specifications, framework contracts, justification memoranda, regional regulations. Everything your organisation has produced over the years enters the system.

Step 02

Recursive chunking

Each document is broken into chunks of 200–1000 tokens, respecting natural boundaries (paragraph, sentence). A 10–20% overlap prevents losing information at the edges.
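The chunking step can be sketched in a few lines. This is a minimal illustration, not LicitadIA's implementation: it assumes tokens can be approximated by whitespace-separated words, and the names `split_recursive` and `chunk_text` are hypothetical. It tries paragraph boundaries first, then sentences, then single words, and repeats the tail of each chunk at the start of the next one.

```python
import re

# Separators from coarse to fine: paragraph, sentence, word.
SEPARATORS = (r"\n\s*\n", r"(?<=[.!?])\s+", r"\s+")


def split_recursive(text, budget, separators=SEPARATORS):
    """Return pieces of at most `budget` words, preferring coarse boundaries."""
    words = text.split()
    if len(words) <= budget:
        return [" ".join(words)] if words else []
    if not separators:
        # No boundary left to try: hard cut every `budget` words.
        return [" ".join(words[i:i + budget]) for i in range(0, len(words), budget)]
    pieces = []
    for part in re.split(separators[0], text):
        pieces.extend(split_recursive(part, budget, separators[1:]))
    return pieces


def chunk_text(text, max_tokens=200, overlap=0.15):
    """Pack pieces into chunks of at most `max_tokens` words with overlap."""
    keep = round(max_tokens * overlap)  # words repeated between neighbours
    # Reserve room for the overlap so no chunk exceeds max_tokens.
    pieces = split_recursive(text, max_tokens - keep)
    chunks, current = [], []
    for piece in pieces:
        piece_words = piece.split()
        if current and len(current) + len(piece_words) > max_tokens:
            chunks.append(" ".join(current))
            current = current[-keep:] if keep else []
        current.extend(piece_words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A production chunker would count tokens with the embedding model's own tokenizer and keep paragraph breaks inside each chunk; this sketch flattens whitespace for brevity.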

Step 03

Embedding vectors

Each chunk goes through an embedding model — typically a specialised transformer — that returns a vector of hundreds of dimensions encoding its meaning.
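To make the idea of "text in, fixed-length vector out" concrete, here is a deliberately naive stand-in. It is an assumption for illustration only: `toy_embed` hashes words into buckets, so it captures shared vocabulary, not meaning. A real embedding model is a trained transformer and would place paraphrases close together, which this toy cannot do.

```python
import hashlib
import math
import re


def toy_embed(text, dim=64):
    """Hash each word into one of `dim` buckets and L2-normalise the counts.

    Illustrative only: this is a bag-of-words hash, not a semantic embedding.
    """
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        # md5 gives a hash that is stable across runs, unlike built-in hash().
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec
```

Whatever model is used, the contract is the same: every chunk maps to a vector of the same length, so all chunks live in one shared space.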

Step 04

Vector database

Vectors, text and metadata are stored and indexed with HNSW or another ANN algorithm. Ready to answer, in milliseconds, any future query.
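The storage step can be pictured with a small in-memory store. This is a sketch under stated assumptions: the class `InMemoryVectorStore` is hypothetical, and it scans every record exactly instead of using HNSW or another ANN index, which is what makes real vector databases fast at scale.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Record:
    vector: list
    text: str
    metadata: dict = field(default_factory=dict)


class InMemoryVectorStore:
    """Keeps vector, text and metadata together; search is an exact linear scan."""

    def __init__(self):
        self.records = []

    def add(self, vector, text, metadata=None):
        self.records.append(Record(list(vector), text, metadata or {}))

    def search(self, query_vector, k=3):
        # Score every record, then return the k best as (score, record) pairs.
        scored = [(cosine(query_vector, r.vector), r) for r in self.records]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

Storing the text and metadata next to each vector is what later allows the answer to cite its source.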

Pause · vector space

Meanings turned into geometry

Each fragment becomes a vector of hundreds of dimensions. Here we project it into two dimensions to visualise it: texts about the same topic sit close together; texts about different topics sit far apart. That geometric proximity is semantic similarity.

How to read it

Each dot is a fragment. The real space has hundreds of dimensions; this is a 2D projection meant to convey the idea.

"The cat eats fish"
→ [0.0234, -0.187, ..., 0.002]

"A feline consumes seafood"
→ [0.0301, -0.165, ..., -0.011]

// Vectors close together in space

Phase 02 · online

What happens when someone asks

Five steps, executed in less than a second. The question goes through the same embedding model, sweeps the vector database, retrieves the relevant fragments and ends in a traceable answer.

Step 01

Vectorising the question

The user's question is converted into a vector using exactly the same embedding model the documents were indexed with.

Step 02

Similarity search

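The vector database scores the question vector against the indexed vectors by the cosine of the angle between them. With an ANN index such as HNSW it does not need to compare against every vector, which is how it can search millions in milliseconds.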

Step 03

Top-k fragments

The k most similar vectors (typically 3, 5 or 10) are returned, ordered from highest to lowest semantic proximity.

Step 04

Augmented prompt

A prompt is assembled that includes a system instruction, the retrieved fragments as context and the user's original question.
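As a rough sketch of that assembly, the function below joins a system instruction, numbered fragments and the question into one string. The function name `build_prompt`, the wording of the instruction and the `source` field are assumptions made for illustration; the actual prompt template is not described in this page.

```python
DEFAULT_INSTRUCTION = (
    "Answer using only the context below. Cite the fragment number for every claim. "
    "If the context does not contain the answer, say so."
)


def build_prompt(question, fragments, system_instruction=DEFAULT_INSTRUCTION):
    """Combine instruction, retrieved fragments and the user's question."""
    context = "\n\n".join(
        f"[{i}] (source: {frag['source']})\n{frag['text']}"
        for i, frag in enumerate(fragments, start=1)
    )
    return f"{system_instruction}\n\nContext:\n{context}\n\nQuestion: {question}"


# Usage with the ACME example; the file name is a hypothetical placeholder.
fragments = [
    {"source": "acme_framework_contract", "text": "Clause 14.2: the customer has 30 calendar days from receipt of the product to exercise the right of return."},
    {"source": "acme_framework_contract", "text": "Clause 14.3: returns must include the original packaging and all accessories."},
]
prompt = build_prompt("What is the return deadline under our framework contract with ACME?", fragments)
```

Numbering the fragments gives the model something concrete to cite, which is what makes the answer traceable.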

Step 05

Answer generation

The LLM generates the final answer based solely on that context, able to cite the exact source of every claim.

The maths of similarity

Cosine similarity, step by step

Two vectors form an angle. The cosine of that angle is the metric most RAG systems use to decide what is "similar" and what isn't. It ignores magnitude and looks only at direction, which is why it works equally well for short and long texts.

The formula

sim(A, B) = cos θ = (A · B) / (‖A‖ · ‖B‖)

Dot product of the vectors, divided by the product of their norms.

Try it

A numerical example

A = [1, 2, 3] · B = [2, 3, 4]
// dot product
A · B = 1·2 + 2·3 + 3·4 = 20
// norms
‖A‖ = √14 ≈ 3.742
‖B‖ = √29 ≈ 5.385
// similarity
sim = 20 / (3.742 · 5.385) ≈ 0.9926
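The worked example above can be checked with a few lines of standard-library Python. The function name `cosine_similarity` is illustrative; vector databases ship their own optimised versions.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(round(cosine_similarity([1, 2, 3], [2, 3, 4]), 4))  # 0.9926
print(cosine_similarity([1, 0], [0, 1]))                  # 0.0 (perpendicular)
print(cosine_similarity([1, 2], [-1, -2]))                # close to -1.0 (opposite)
```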

1.0: same direction

0: perpendicular

−1.0: opposite directions

Beyond vanilla RAG

Six techniques that change the result

What we've described so far is a basic RAG, already useful. In real-world projects, and in LicitadIA, further improvements are layered on top that raise precision and sharply reduce hallucinations.

Hybrid search

Combines vector search with classic BM25. Semantics captures meaning; lexical search finds codes, proper names and exact technical terms. Reciprocal Rank Fusion merges them.
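Reciprocal Rank Fusion is short enough to show in full. The sketch below assumes each retriever returns an ordered list of chunk ids; the ids are made up, and the constant `k=60` is a commonly used default, not a value stated by this page.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked id lists: each hit adds 1 / (k + rank) to its id's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["c1", "c2", "c3"]  # from semantic search
bm25_hits = ["c3", "c1", "c4"]    # from lexical search
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))  # ['c1', 'c3', 'c2', 'c4']
```

Because only ranks are used, the two retrievers' raw scores never need to be put on a common scale.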

Re-ranking

Retrieve 20 candidates and reorder them with a more expensive but more precise cross-encoder. Keep the best 5. Drastic relevance gain, controlled latency.
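The retrieve-then-reorder pattern looks roughly like this. The scorer here, `overlap_score`, is a toy stand-in chosen so the example runs without a model; in practice `score_pair` would call a cross-encoder that reads the question and the passage together. Both names are hypothetical.

```python
import re


def overlap_score(question, passage):
    """Toy stand-in for a cross-encoder: fraction of question words found in the passage."""
    q_words = set(re.findall(r"\w+", question.lower()))
    p_words = set(re.findall(r"\w+", passage.lower()))
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def rerank(question, candidates, score_pair=overlap_score, keep=5):
    """Score every (question, candidate) pair and keep the best `keep` candidates."""
    ranked = sorted(candidates, key=lambda c: score_pair(question, c), reverse=True)
    return ranked[:keep]
```

The expensive scorer only ever sees the short candidate list, which is why latency stays under control.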

Query expansion

An LLM rephrases the question into variants, or you apply HyDE: generate a hypothetical answer and vectorise that, not the question. It works because a hypothetical answer resembles the indexed documents more closely than the question does.
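Both variants can be sketched with two small functions. They assume two callables supplied by the caller: `generate(prompt)` returns text from an LLM and `embed(text)` returns a vector. These names, and the prompt wording, are placeholders, not a specific library's API.

```python
def hyde_vector(question, generate, embed):
    """HyDE: embed a hypothetical answer to the question, not the question itself."""
    hypothetical_answer = generate(
        f"Write a short passage that answers this question:\n{question}"
    )
    return embed(hypothetical_answer)


def expand_query(question, generate, n=3):
    """Return the original question plus up to n rephrasings, one per line of the reply."""
    reply = generate(
        f"Rephrase the following question in {n} different ways, one per line:\n{question}"
    )
    variants = [line.strip() for line in reply.splitlines() if line.strip()]
    return [question] + variants[:n]
```

With query expansion, each variant is searched separately and the result lists are merged; with HyDE, the single returned vector replaces the question vector in the search.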

Metadata filtering

Before searching, restrict the space: only documents from the past year, only from file X, only in Spanish. Reduces noise and multiplies relevance.
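A minimal version of filter-then-search is shown below. It assumes each record is a dict with `vector`, `text` and `metadata` keys and that filters are exact matches; the field names `language` and `year` are examples, not a documented schema.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(records, query_vector, where, k=3):
    """Keep records whose metadata matches every key in `where`, then rank by cosine."""
    allowed = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in where.items())
    ]
    allowed.sort(key=lambda r: cosine(query_vector, r["vector"]), reverse=True)
    return allowed[:k]
```

Filtering first means a near-identical chunk from the wrong file or language can never outrank a relevant one.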

Parent context

Index small chunks (better search precision) but when retrieving one, pass the LLM the larger fragment containing it. Best of both worlds.
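One way to picture this is a lookup from each small chunk to the larger section it came from. The ids and the two dictionaries below are hypothetical; a real system would store the parent reference in each chunk's metadata.

```python
def expand_to_parents(hit_ids, parent_of, parents):
    """Replace each retrieved child chunk with its parent section, without duplicates."""
    seen, sections = set(), []
    for child_id in hit_ids:
        parent_id = parent_of[child_id]
        if parent_id not in seen:  # two children of one parent yield one section
            seen.add(parent_id)
            sections.append(parents[parent_id])
    return sections


parent_of = {"c1": "clause14", "c2": "clause14", "c3": "clause15"}
parents = {"clause14": "Full text of clause 14", "clause15": "Full text of clause 15"}
print(expand_to_parents(["c2", "c1", "c3"], parent_of, parents))
# ['Full text of clause 14', 'Full text of clause 15']
```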

Agentic RAG

An agent runs iterative searches, refining the query as it discovers more, until it has enough information. Slower; unbeatable on complex questions.
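The control loop behind this idea can be sketched as follows. It assumes two callables supplied by the caller: `search(query)` returns a list of fragments, and `next_query(question, evidence)` returns a refined query or `None` once the evidence is judged sufficient. In a real agent an LLM would play the role of `next_query`; all names here are illustrative.

```python
def agentic_retrieve(question, search, next_query, max_steps=4):
    """Search, inspect the evidence, refine the query, and repeat until done."""
    evidence, query = [], question
    for _ in range(max_steps):  # hard cap so the loop always terminates
        for fragment in search(query):
            if fragment not in evidence:
                evidence.append(fragment)
        query = next_query(question, evidence)
        if query is None:  # the agent decided it has enough information
            break
    return evidence
```

Each extra step costs another search and another model call, which is where the added latency comes from.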

Common challenges

Mis-calibrated chunks

There is no universal recipe; you have to experiment with sizes and overlaps.

Multi-document questions

Classic RAG struggles when the answer spans several sources. Common remedies: hierarchical summarisation and agentic RAG.

Evaluation

Recall@k, faithfulness and relevance. Frameworks like RAGAS automate the measurement.
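Of these metrics, Recall@k is the simplest to compute by hand: the share of known-relevant chunks that appear in the top k results. The sketch below assumes you have a labelled set of relevant chunk ids per question; the ids are made up. Faithfulness and relevance need a judge model and are not shown.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear among the first k retrieved ids."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


retrieved = ["a", "b", "c", "d"]
relevant = {"a", "d", "x"}
print(recall_at_k(retrieved, relevant, k=3))  # one of three relevant ids found
print(recall_at_k(retrieved, relevant, k=4))  # two of three relevant ids found
```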

Costs and privacy

Cache frequent queries to control cost; apply metadata-based access control to sensitive data.

THIS IS WHY LICITADIA

Your institutional memory, browsable like a library.

LicitadIA applies everything you've just seen — recursive chunking, specialised embeddings, hybrid search, re-ranking and per-file filtering — over the specifications, memoranda and clauses of your own organisation. The AI stops inventing and starts citing.

less time drafting specifications

traceability of cited sources

hallucinations on private data