Your team completes a three-month migration to an air-gapped environment to satisfy new compliance requirements. Two security audits. Significant budget. Two weeks after go-live, someone discovers the developers are using personal phone hotspots to access GitHub Copilot, because the local alternatives weren’t fast enough and nobody had provisioned a better one.
Shadow AI is not a behavior problem. It’s an infrastructure gap. And it’s a more serious compliance violation than the one you just spent three months fixing.
The Hidden Risks of Cloud AI Coding
Contextual Exposure
When a cloud AI coding assistant sends a completion request, it doesn’t just send the current line — it sends context. The file header. The import block. The function signatures above and below. Sometimes an entire file. The model needs this context to generate relevant suggestions, and the IDE plugin collects it automatically, without developer action.
GitHub Copilot’s data retention policy varies: Enterprise plans offer zero retention for IDE completions, while Individual plans may retain data for up to 28 days depending on the feature. But regardless of the plan, the data traverses Microsoft’s infrastructure over HTTPS. For a financial institution protecting proprietary trading algorithms, a biotech firm working with pre-patent research, or any organization under export control regulations, “trusted third-party infrastructure” is still third-party infrastructure. TLS protects the data in transit, but the session terminates at Microsoft’s ingestion layer, where the content is decrypted and processed outside your control.
The second risk is less visible: accidental secret transmission. When an IDE indexes a project for AI context, it may sweep .env files, configuration files containing API keys, or internal hostnames. You didn’t intend to share those — they were caught in the context sweep. By the time anyone notices, the data has already left the machine.
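One way to make the accidental-secret risk concrete is a pre-flight check on whatever text is about to be used as AI context. The following is a minimal sketch, not a feature of any particular IDE plugin: the function name and the three patterns are illustrative assumptions, and a real deployment would rely on a maintained secret-scanning ruleset.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not an exhaustive ruleset).
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "env_style_secret": re.compile(
        r"(?im)^\s*[A-Z0-9_]*(?:SECRET|TOKEN|PASSWORD|API_KEY)[A-Z0-9_]*\s*=\s*\S+"
    ),
}


def find_secret_candidates(text: str) -> list[str]:
    """Return the names of the patterns that match anywhere in the text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]
```

A wrapper could refuse to include a file in the context when this returns a non-empty list. Pattern matching like this produces both false positives and false negatives, so it complements excluding .env and configuration files from indexing rather than replacing it.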
Compliance and Regulatory Exposure
GDPR, HIPAA, and the EU AI Act (with most obligations applying from August 2026) all create liability around data processed by AI tools. The specific failure mode: a developer pastes patient records into an AI tool to write an anonymization function. The records are transmitted to a cloud API. The organization has potentially disclosed protected health information to a vendor without a HIPAA Business Associate Agreement, one it may not have known it needed.
Cloud AI providers offer enterprise-tier plans with zero-retention policies and signed BAAs. These address many regulatory concerns. But compliance auditing becomes harder when inference happens in a vendor’s infrastructure. You can’t inspect what the model processed. You can’t verify data handling within the vendor’s systems. You’re trusting the policy document.
A local model running on 127.0.0.1 doesn’t require trust in a policy document. The packet never leaves the machine. You can prove this with a network capture.
The Local-First Solution: Zero-Trust AI
How Local Inference Works
When OpenCode sends a prompt to Ollama, the request goes to http://127.0.0.1:11434. The loopback interface never touches a network adapter. There’s no HTTPS, no DNS resolution — it’s a process-to-process socket connection on the local machine. The model runs on the local GPU, generates tokens, and returns the response to the same socket.
From a security architecture standpoint, this is as controlled as it gets. The data never leaves the machine. There is no vendor to audit. There is no retention policy to verify. The evidence of containment is a packet capture showing zero egress.
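The loopback request described above can be sketched with nothing but the Python standard library. This is an illustration under stated assumptions, not how OpenCode itself talks to Ollama: the helper names are hypothetical, the model tag is the one used elsewhere in this article, and the endpoint is Ollama's /api/generate with streaming disabled. The guard refuses any target that is not a literal loopback address, and proxies are disabled so an environment proxy setting cannot reroute the call.

```python
import ipaddress
import json
import urllib.request
from urllib.parse import urlparse

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # loopback address from the text


def is_loopback_url(url: str) -> bool:
    """True only when the URL's host is a literal loopback IP address."""
    host = urlparse(url).hostname or ""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Hostnames (even "localhost") would need a resolver lookup, so reject them.
        return False


def build_request(prompt: str, model: str = "opencoder:8b",
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request, refusing non-loopback targets."""
    if not is_loopback_url(url):
        raise ValueError(f"refusing to send a prompt to non-loopback URL: {url}")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def generate(prompt: str, model: str = "opencoder:8b") -> str:
    """Send the prompt to the local server and return the generated text."""
    # An empty ProxyHandler disables any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Calling generate() only works when an Ollama server with that model is already running on the machine; the two helper functions can be exercised without one.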
Diagram: the zero-trust local AI inference architecture, showing how the IDE plugin and inference engine communicate exclusively over the loopback interface, physically preventing data egress.
Visual Notes:
- The entire process is contained within your workstation boundary.
- Communication between OpenCode and Ollama occurs over 127.0.0.1.
- The absence of external network connections guarantees data privacy.
Verify Ollama is bound only to localhost:
# Windows (PowerShell)
netstat -an | Select-String "11434"
# Expected output: TCP 127.0.0.1:11434 0.0.0.0:0 LISTENING
# If you see 0.0.0.0:11434, Ollama is exposed to the network — fix this immediately
# Linux
sudo ss -tlnp | grep 11434
# Or: sudo netstat -tlnp | grep 11434
# macOS
sudo lsof -i :11434
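For fleet-wide checks, the output of the commands above could be fed to a small script instead of being read by eye. The sketch below is one hypothetical way to do that. It assumes the listing contains only listening sockets and that addresses use the colon-separated host:port form printed by ss, lsof, and Windows netstat; the function name is an illustration, not part of any tool.

```python
import ipaddress


def exposed_listeners(listing: str, port: int = 11434) -> list[str]:
    """Return local addresses bound to `port` that are not loopback-only."""
    suffix = f":{port}"
    exposed = []
    for line in listing.splitlines():
        for token in line.split():
            if not token.endswith(suffix):
                continue
            host = token[: -len(suffix)].strip("[]")
            if host == "localhost":
                continue
            try:
                if ipaddress.ip_address(host).is_loopback:
                    continue
            except ValueError:
                pass  # a wildcard such as "*" is not an IP literal; treat it as exposed
            exposed.append(token)
    return exposed
```

An empty result means every listener on the port is loopback-only. Anything else, such as 0.0.0.0:11434 or *:11434, is the exposure the comments above warn about.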
Deploying Local AI Across a Team
The provisioning challenge: getting Ollama and approved models onto developer machines without requiring internet access at setup time.
The pattern: while connected to a trusted internal network, pull the approved models once, package them with the Ollama binary and a standardized configuration, and distribute the package through your existing software deployment tooling.
# Pull approved models while on a trusted internal network
ollama pull opencoder:8b
# Confirm the model is cached locally
ollama list
# Windows (PowerShell): lock Ollama to localhost for the current session
$env:OLLAMA_HOST = "127.0.0.1:11434"
# Set this permanently for the user (restart Ollama for it to take effect)
[Environment]::SetEnvironmentVariable("OLLAMA_HOST", "127.0.0.1:11434", "User")
For Windows environments, the Ollama models directory (%USERPROFILE%\.ollama\models) is portable. Copy the whole directory, manifests and blobs together, from a provisioning machine to developer machines and Ollama recognizes the models immediately, with no internet connection required. The model weights are stored as self-contained GGUF data.
For Linux, package the models directory into your standard dev container image or use an internal artifact repository to distribute the GGUF files alongside the Ollama binary and a standardized Modelfile.
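Copying multi-gigabyte model files between machines invites silent truncation, so an integrity check after distribution may be worth the few seconds it takes. The sketch below assumes the models directory contains a blobs subdirectory whose files are named sha256-<hex digest> after their own content hash. That layout is an assumption about the installed Ollama version and should be confirmed on a provisioning machine first; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large blobs are never loaded fully into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def corrupted_blobs(models_dir: Path) -> list[str]:
    """Return names of blobs whose content hash disagrees with their file name."""
    bad = []
    # Assumed layout: <models_dir>/blobs/sha256-<hex digest>
    for blob in sorted((models_dir / "blobs").glob("sha256-*")):
        expected = blob.name.removeprefix("sha256-")
        if sha256_of(blob) != expected:
            bad.append(blob.name)
    return bad
```

Running corrupted_blobs(Path.home() / ".ollama" / "models") on a developer machine after the copy should return an empty list; any name it does return points at a file to re-copy.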
Hands-On Example: The Air-Gapped Workflow
Scenario: You’re working on a financial application that processes PII. You need AI assistance for a payment processing module while guaranteeing zero network transmission during the coding session.
Step 1: Pre-provision the model on a trusted network
# On a secured internal network
ollama pull opencoder:8b
# Verify
ollama list
# Output: opencoder:8b sha256:... 4.5 GB 2 minutes ago
Step 2: Disconnect and enforce network isolation
Physically disconnect from the network. Optionally, enforce at the OS level:
# Windows: Change default outbound action to Block
Set-NetFirewallProfile -Profile Domain,Public,Private -DefaultOutboundAction Block
# Note: Changing default outbound action is disruptive; perform with caution
Step 3: Work with the sensitive module
# Start Ollama (Linux/macOS)
ollama serve &
# Windows: Ollama starts automatically as a background service
cd ~/projects/payment-service
opencode
In OpenCode:
/add src/payment_processor.py
Refactor the charge() function to validate card number format before calling the gateway. Implement the Luhn algorithm for validation. Do not log the card number at any point, not even partially.
The model generates the refactored function. No packets leave the machine.
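For reference, the validation step requested in that prompt might look something like the following. This is a hand-written sketch of the Luhn checksum, not actual model output, and the function name and the 12 to 19 digit length bounds are assumptions. In keeping with the prompt, it never logs the number and never includes it in an error message.

```python
def luhn_valid(card_number: str) -> bool:
    """Return True when the number passes the Luhn checksum.

    The card number is never logged or echoed back, not even partially.
    """
    cleaned = card_number.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()) or not 12 <= len(cleaned) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

A charge() function would call this before contacting the gateway and reject the request on False. Whatever the model actually produces deserves the same review and test coverage as this sketch.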
Step 4: Verify zero egress
# Linux: capture all non-localhost traffic during generation
sudo tcpdump -i any -n 'not (src host 127.0.0.1 or dst host 127.0.0.1)'
# macOS: capture all non-localhost traffic (replace en0 with your active interface)
sudo tcpdump -i en0 -n 'not (src host 127.0.0.1 or dst host 127.0.0.1)'
# If nothing appears while OpenCode is generating, all traffic was local
On Windows, Wireshark with the display filter "not ip.addr == 127.0.0.1" shows the same result: no packets during generation.
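If the capture is saved as text for audit evidence, a short script can summarize which destinations appeared. The sketch below assumes lines in the default text format printed by tcpdump -n (for example "IP 10.0.0.5.54321 > 203.0.113.10.443: ..."), where a trailing ".port" is appended to each endpoint. The function names are hypothetical and the parser is deliberately conservative: anything it cannot classify as loopback is reported.

```python
import ipaddress
import re

# Matches the "IP src > dst:" or "IP6 src > dst:" portion of a tcpdump -n line.
FLOW = re.compile(r"\bIP6? (\S+) > (\S+):")


def host_of(endpoint: str) -> str:
    """Strip the trailing ".port" that tcpdump appends to TCP/UDP endpoints."""
    try:
        ipaddress.ip_address(endpoint)
        return endpoint  # already a bare address, e.g. on ICMP lines
    except ValueError:
        return endpoint.rsplit(".", 1)[0]


def egress_destinations(capture_text: str) -> set[str]:
    """Collect destination hosts that are not loopback addresses."""
    found = set()
    for line in capture_text.splitlines():
        match = FLOW.search(line)
        if not match:
            continue
        dst = host_of(match.group(2))
        try:
            if ipaddress.ip_address(dst).is_loopback:
                continue
        except ValueError:
            pass  # unparseable endpoint: report it rather than hide it
        found.add(dst)
    return found
```

An empty set for a capture taken during generation is the evidence of containment; any address in the set is a destination to investigate.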
Best Practices
Bind Ollama to 127.0.0.1, not 0.0.0.0. The default is already 127.0.0.1, but verify after updates. An accidental OLLAMA_HOST=0.0.0.0 setting exposes the unauthenticated API to every machine on your local network.
Audit telemetry settings for every AI tool in the stack. OpenCode and some underlying frameworks send crash reports or usage analytics independently of where the inference runs. Check settings for each tool and disable telemetry explicitly. Several “local AI” tools in 2025 were found to send usage metadata to cloud services even when inference was fully local.
Scan AI-generated code with SAST before committing. A model running on localhost can still produce code with SQL injection vulnerabilities, hardcoded credentials, or insecure cryptographic implementations. Run the same static analysis pipeline on AI-generated code that you run on human-written code — preferably in CI before merge.
Standardize developer hardware. The shadow AI problem in regulated environments almost always traces to performance. If the approved local model generates at 3 tokens/sec on underpowered hardware, developers find a faster cloud alternative. MacBook Pros with 36GB unified memory or workstations with 16GB VRAM GPUs run 8B models at 50+ tokens/sec. The performance must compete with cloud alternatives or the policy fails in practice.
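Whether a given machine clears the performance bar can be measured rather than guessed. Ollama's generate response reports eval_count (tokens generated) and eval_duration (in nanoseconds), so throughput is a single division. The sketch below takes an already-parsed response as a dict; the function names and the 40 tokens/sec default threshold are assumptions chosen for illustration.

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation speed from eval_count and eval_duration (nanoseconds)."""
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] / (duration_ns / 1e9)


def meets_target(response: dict, minimum_tps: float = 40.0) -> bool:
    """True when the measured speed reaches the chosen threshold."""
    return tokens_per_second(response) >= minimum_tps
```

For example, 200 tokens generated in 4 seconds works out to 50 tokens/sec, while 12 tokens in the same time is 3 tokens/sec, the kind of result that sends developers looking for a cloud alternative.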
Troubleshooting
Problem: An AI tool is sending traffic even when the model is local.
Cause: IDE plugins, telemetry frameworks, and update checks often have network calls independent of the inference backend. The local model being local doesn’t make the wrapper tool local.
Fix: Capture all outbound traffic during a coding session with Wireshark or tcpdump. Inspect any non-localhost destinations. For each AI-related extension or tool, find the telemetry setting and disable it. Consider running the development environment in a container with explicit egress firewall rules to enforce zero external traffic at the infrastructure level.
Problem: Developers bypass the local AI policy because performance is too slow.
Cause: The hardware budget doesn’t support local inference at useful speeds. An 8B model does not fit in 4GB of VRAM, so most of its layers are offloaded to system RAM and run on the CPU, and generation drops to 2–4 tokens/sec. That’s not fast enough to replace a cloud subscription in a developer’s daily workflow.
Fix: The minimum viable hardware for acceptable local AI performance is 8GB dedicated VRAM or Apple Silicon with 16GB+ unified memory. Below that, the policy will fail in practice regardless of enforcement. If hardware refresh isn’t immediately feasible, a shared local inference server is a viable interim approach: one high-spec Linux workstation running Ollama with OLLAMA_HOST=0.0.0.0 on a trusted subnet, with a host firewall restricting port 11434 to developer machines on that subnet. Prompts then cross the internal network, so the loopback guarantee no longer applies and the server itself falls within audit scope.
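The VRAM figures in this troubleshooting entry follow from simple arithmetic on the size of the weights. The sketch below is a rough estimate only: the 4.5 bits per weight (a stand-in for a typical 4-bit quantization) and the 1 GB allowance for the KV cache and runtime buffers are assumptions, and real usage varies with quantization scheme and context length.

```python
def weight_footprint_gb(parameters: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in decimal gigabytes."""
    return parameters * bits_per_weight / 8 / 1e9


def fits_in_vram(parameters: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 1.0) -> bool:
    """True when weights plus an assumed overhead fit in the given VRAM."""
    return weight_footprint_gb(parameters, bits_per_weight) + overhead_gb <= vram_gb
```

Under these assumptions an 8B model needs about 4.5 GB for weights alone, which does not fit in 4GB of VRAM but fits comfortably in 8GB.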
Key Takeaways
Cloud AI coding tools present a genuine compliance risk for organizations handling regulated data or proprietary IP — not because vendors are untrustworthy, but because the data leaves the machine. Local AI with Ollama and OpenCode eliminates that risk by design: inference runs on 127.0.0.1, packets never traverse a network adapter, and you can prove this empirically.
The operational challenge is hardware and provisioning. Developers need machines fast enough to run 8B models at 40+ tokens/sec, or they’ll route around the policy. Pre-provision models via internal distribution tooling. Lock Ollama to localhost. Audit telemetry on every tool in the stack. Most of the EU AI Act’s obligations apply from August 2026. If your organization uses cloud AI tools with regulated data and hasn’t evaluated the data processing agreement implications, that’s the timeline to work with.
