Building a Secure RAG System for Internal DevOps Documentation

May 6, 2026

It’s 2am. You’re paged for a SEV1. Your runbooks are scattered across Confluence pages, SharePoint libraries, GitHub wikis, and a legacy shared drive that nobody admits still exists. You search three systems, find conflicting procedures with different dates, and end up calling the person who wrote the original playbook—who is, naturally, asleep. A Retrieval-Augmented Generation (RAG) system built on your internal docs solves this. But only if the architecture ensures that a query about your payment service doesn’t leak context from your restricted security procedures to someone who was never authorized to read them.

Most RAG implementations reach for the simplest path: public APIs, shared indexes, no access controls. That works fine for public documentation. Your internal DevOps docs are a different problem. They contain architecture diagrams with subscription IDs, credential rotation procedures, and network topology details that must not cross team boundaries. A secure RAG system requires private networking, identity-based access, and document-level security trimming.

1. RAG Architecture for Internal DevOps Documentation

Unlike fine-tuning, which “bakes” knowledge into model weights at training time, RAG retrieves relevant document chunks at query time. For DevOps teams, where runbooks change with every incident and architecture decision, RAG’s freshness is decisive. A new playbook added Monday is available to your on-call engineers Monday afternoon, without model retraining.

The Secure RAG Component Stack

The architecture consists of five core layers:

  1. Azure Blob Storage: The document corpus containing raw Markdown, PDFs, or post-mortems.
  2. Azure AI Search: The vector engine storing document embeddings and metadata.
  3. Azure OpenAI (Embeddings): Converts document chunks and queries into 1536-dimensional vectors.
  4. Azure OpenAI (Chat): Constructs the final answer using the retrieved context.
  5. Azure Functions (Orchestrator): Handles ingestion triggers and query-time security filtering.

Every component sits inside a Virtual Network. Traffic between AI Search and Azure OpenAI never leaves the Microsoft backbone, and your orchestrator communicates with the models via Private Endpoints.

2. Private Networking for the RAG Stack

The foundation of a secure knowledge base is removing public internet exposure. Configure Azure AI Search and Blob Storage with Private Endpoints, just like the OpenAI resource described in [Cluster Post 1].

A common architectural oversight is the ingestion path. If your AI Search indexer needs to call Azure OpenAI to vectorize documents, that call must be secure. In 2026, we use Shared Private Link Resources to enable outbound connectivity from AI Search to other private Azure services.

// Outbound Shared Private Link from Search to OpenAI
resource searchToOpenAiLink 'Microsoft.Search/searchServices/sharedPrivateLinkResources@2024-03-01-preview' = {
  parent: searchService
  name: 'link-to-openai'
  properties: {
    privateLinkResourceId: openAiResourceId // Full ARM resource ID of OpenAI
    groupId: 'openai_account' // Specific group for outbound Azure OpenAI calls
    requestMessage: 'Enable secure vectorization for DevOps indexer.'
  }
}

Note: Once deployed, this connection enters a “Pending” status and MUST be approved in the target Azure OpenAI resource’s Networking blade before the indexer can communicate.

[Diagram: Azure AI Search indexer inside the VNet → outbound Shared Private Link (status: Approved) → Azure OpenAI embedding deployment via Private Endpoint, with all traffic staying on the Microsoft backbone]

3. Document-Level Security Trimming

A shared RAG index containing documents from multiple teams creates a real data boundary risk. Without security trimming, a user could craft a query to retrieve documents they aren’t authorized to read. The LLM won’t refuse—it will happily summarize whatever context you hand it.

The Identity Filter Pattern

The correct solution is the Identity Filter Pattern. During ingestion, map source document permissions (e.g., Confluence Space permissions) to Entra ID Group Object IDs. Store these IDs in a Collection(Edm.String) field named allowedGroups in your AI Search index.
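
During ingestion, each chunk then carries its ACL with it. A minimal sketch of an upload payload (chunk_text, embedding, the field names, and the group ID are illustrative, not defaults):

# Ingestion sketch: each chunk is uploaded with its permission metadata.
document = {
    "id": "runbook-node-drain-001",
    "title": "Node Drain Procedure",
    "content": chunk_text,       # produced by the chunking step
    "contentVector": embedding,  # from the embeddings deployment
    "allowedGroups": [
        "0b6e2e3d-0000-0000-0000-000000000000",  # Entra ID Group Object ID (placeholder)
    ],
}
search_client.upload_documents(documents=[document])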

At query time, your orchestrator retrieves the user’s active group memberships via the Microsoft Graph API and applies them as a hard OData filter.

# Query-time security trimming using the search.in function
def get_security_filter(user_groups: list[str]) -> str:
    # 'search.in' is optimized for matching against large lists of Group Object IDs
    group_string = ",".join(user_groups)
    return f"allowedGroups/any(g: search.in(g, '{group_string}'))"

# AI Search execution with security trimming
# (search_client, active_groups, and vector_query are built by the orchestrator)
results = search_client.search(
    search_text=user_query,
    filter=get_security_filter(active_groups),
    vector_queries=[vector_query],
)
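
The group lookup itself is a single Graph call. A minimal sketch against the Microsoft Graph v1.0 transitiveMemberOf endpoint, assuming the orchestrator has acquired a Graph token on behalf of the signed-in user:

import requests

def get_user_groups(graph_token: str) -> list[str]:
    # Transitive membership resolves nested groups, which a flat /memberOf call would miss
    url = "https://graph.microsoft.com/v1.0/me/transitiveMemberOf/microsoft.graph.group?$select=id"
    headers = {"Authorization": f"Bearer {graph_token}"}
    group_ids: list[str] = []
    while url:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        payload = response.json()
        group_ids.extend(item["id"] for item in payload["value"])
        url = payload.get("@odata.nextLink")  # follow paging for users with many groups
    return group_ids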

Identity Filter Pattern (Security Trimming)

[Diagram: 1. Engineer asks “Node Drain?” → 2. Orchestrator fetches the user’s groups from Microsoft Graph/Entra ID → 3. Constructs the OData filter (allowedGroups + search.in) → 4. Executes hybrid search with the filter → 5. Azure AI Search returns only the matched chunks as LLM context]

By filtering at the retrieval layer, the LLM never receives context blocks that the user isn’t permitted to see. Security is enforced at the data source, not the model prompt.

4. Ingestion Pipeline and Deletion Detection

For DevOps content, Markdown-aware chunking is superior to arbitrary character counts. Runbooks are structured by headings. Splitting on ## ensures that a “Node Drain Procedure” stays as a single, coherent context block.
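
A minimal sketch of heading-aware chunking (a hand-rolled splitter for illustration; library splitters such as LangChain’s MarkdownHeaderTextSplitter cover the same ground):

import re

def chunk_runbook(markdown: str) -> list[dict]:
    # Split at every line starting with '## ' so each procedure stays one coherent block
    sections = re.split(r"(?m)^(?=## )", markdown)
    chunks = []
    for section in sections:
        if not section.strip():
            continue
        first_line = section.splitlines()[0].strip()
        heading = first_line.lstrip("#").strip() if first_line.startswith("##") else "Preamble"
        chunks.append({"heading": heading, "content": section.strip()})
    return chunks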

Deletion Policy: No Ghost Runbooks

A RAG system that retains deleted runbooks is dangerous. If an incident playbook is archived because it contains a flaw, the AI must stop suggesting it immediately—not at the next scheduled sync, immediately. Use the Native Blob Soft Delete pattern. When a file is removed from Blob Storage, the AI Search indexer detects the soft-deleted state and automatically purges the corresponding chunks from the index during the next run.

// AI Search Indexer Deletion Policy
"dataDeletionDetectionPolicy": {
    "@odata.type": "#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy"
}
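
The same policy can be attached when creating the data source through the Python SDK. A sketch, assuming a blob container named runbooks and existing endpoint/credential variables:

from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    NativeBlobSoftDeleteDeletionDetectionPolicy,
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection,
)

data_source = SearchIndexerDataSourceConnection(
    name="runbooks-datasource",
    type="azureblob",
    connection_string=blob_connection_string,
    container=SearchIndexerDataContainer(name="runbooks"),
    # Purge index chunks when the source blob is soft-deleted
    data_deletion_detection_policy=NativeBlobSoftDeleteDeletionDetectionPolicy(),
)
SearchIndexerClient(search_endpoint, credential).create_or_update_data_source_connection(data_source)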

5. Hybrid Search for Exact Technical Terms

Technical documentation relies on exact terms: service names, error codes, and CLI flags. Pure vector search (semantic similarity) can miss these exact matches.

Why Hybrid Search + RRF?

Use Hybrid Search, which executes BM25 (keyword) and Vector search in parallel. Azure AI Search uses Reciprocal Rank Fusion (RRF) to merge these results. If a document matches an exact error code like 0x80040154 via keyword search, it is promoted to the top, even if the semantic similarity is lower.

In 2026, we also enable the Semantic Ranker as a second-pass (L2) re-ranker. This deep-learning model, adapted from Bing, re-scores the top 50 results so that the most relevant procedure is actually the first context block the LLM reads.
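
Putting the retrieval pieces together at query time (a sketch; contentVector and devops-semantic are placeholder names from this post, not service defaults):

from azure.search.documents.models import VectorizedQuery

results = search_client.search(
    search_text=user_query,                        # BM25 keyword leg
    vector_queries=[VectorizedQuery(
        vector=query_embedding,                    # from the embeddings deployment
        k_nearest_neighbors=50,
        fields="contentVector",
    )],
    filter=get_security_filter(active_groups),     # security trimming still applies
    query_type="semantic",                         # enables the L2 Semantic Ranker
    semantic_configuration_name="devops-semantic",
    top=5,
)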

6. Response Generation and Citations

The final step is instructing the LLM to act as a grounded DevOps assistant. Your system prompt must constrain the model to answer only from the provided context and, crucially, to cite its sources.

System Prompt Constraint

“You are a Platform Engineering assistant. Answer queries using only the provided runbooks. If the answer is not in the context, state that you do not know. Every answer must include a citation in the format [Source Name - Section].”

By including source URLs and last-modified dates in your index metadata, you can pass these to the LLM. This lets an engineer click directly into the original Confluence page if they need to see the full architecture diagram or verify the original author.
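
A sketch of how the retrieved chunks might be assembled into the grounded context (sourceUrl and lastModified are illustrative index metadata fields):

def build_context(results) -> str:
    # Each block carries the metadata the model needs for [Source Name - Section] citations
    blocks = []
    for doc in results:
        blocks.append(
            f"[{doc['title']} - {doc['heading']}]\n"
            f"Source: {doc['sourceUrl']} (last modified {doc['lastModified']})\n"
            f"{doc['content']}"
        )
    return "\n\n---\n\n".join(blocks)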

Hands-On Example: The Platform “DevOps Brain”

To deploy a secure RAG stack for your team:

  1. Provision AI Search with publicNetworkAccess: 'Disabled'.
  2. Assign Managed Identity roles: grant your Function App Search Index Data Contributor and Storage Blob Data Reader.
  3. Map Permissions: Create a lookup table (e.g., GitHub Team -> Entra Group ID) to populate the allowedGroups field.
  4. Connect Sources: Use the GitHub API or Microsoft Graph to sync runbooks to Blob Storage.
  5. Enable Streaming: Use stream=True in your Azure OpenAI chat completion call (see the sketch after this list). Sub-second “time-to-first-token” is critical when an engineer is under pressure during a SEV1 incident.
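
A streaming sketch with the openai package and Managed Identity auth (the endpoint, the gpt-4o deployment name, and the SYSTEM_PROMPT/user_query variables are assumptions):

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
client = AzureOpenAI(
    azure_endpoint="https://my-openai.openai.azure.com",  # resolves privately inside the VNet
    api_version="2024-06-01",
    azure_ad_token_provider=token_provider,  # Managed Identity, no API keys
)

stream = client.chat.completions.create(
    model="gpt-4o",  # your chat deployment name
    stream=True,
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},  # the constraint prompt from section 6
        {"role": "user", "content": f"Context:\n{build_context(results)}\n\nQuestion: {user_query}"},
    ],
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)  # sub-second first token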

Key Takeaways

  1. Security at the Source: Use OData filters in AI Search to enforce document-level access. Never rely on the LLM to “ignore” unauthorized context.
  2. Hybrid is the Standard: Combine BM25 keyword search with Vector search for technical documentation. Exact matches for procedure names matter.
  3. Outbound Security: Use Shared Private Links to keep your ingestion pipeline (AI Search to OpenAI) on the private backbone.
  4. Automate Deletion: Implement a deletion detection policy to ensure archived or dangerous runbooks are purged from the AI’s memory.

Next Steps:

  • Read [Cluster Post 6] to place an AI Gateway (APIM) in front of your RAG query endpoint for PII scrubbing and centralized audit logging.
  • Return to the [Pillar Post] to see how secure RAG fits into the overall AI DevOps Blueprint.
