September 27, 2025

Enterprise setups that work: Opencode with demos and hosted models

Here is how I run CLI-first AI coding in real teams without drama. One realistic enterprise pattern, plus a simple demo setup anyone can try.

AI · CLI · Enterprise · Software Development · Opencode

I use Opencode because it is open source, provider-agnostic, and speaks to local providers like Ollama or LM Studio.
For teams that need more control, there is a clean enterprise path with SSO and self-hosting. Rules live in the repo, not in my head.


Why enterprise setups matter

Running LLMs inside a company is not about chasing the newest model: it is about trust and control.
Enterprise use requires more than just an API key:

  • Data privacy: especially in the EU, teams must ensure code and prompts never leave the company boundary.
  • Model control: ability to pin model versions, roll back upgrades, and know what weights are running.
  • Custom training and fine-tuning: adapting to domain language or code conventions.
  • Authentication and SSO: align access with company identity systems.
  • Monitoring and observability: track usage, latency, and errors like any other service.
  • Support and continuity: someone needs to own updates, scaling, and security patches.
  • Integration with CI/CD: interactive coding sessions should map cleanly to automated checks.

These requirements are what distinguish an enterprise pattern from a weekend experiment.


Quick demo: Local edge box (Jetson Orin + Ollama + Qwen Coder)

This is the easiest way to try the setup without hosting big infrastructure. Think of it as a proof-of-concept or a hackathon box, not an enterprise solution.

Why use it

  • Fast feedback in the terminal.
  • No egress: code and prompts stay on the device.
  • Zero cluster work, so anyone can demo it.

Setup

  1. Install Ollama on Jetson Orin (native or Docker).
  2. Pull a coding model: qwen2.5-coder:7b or qwen3-coder:30b if VRAM allows.
  3. Point Opencode to the local OpenAI-compatible endpoint (a config sketch follows this list).
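
A minimal sketch of that last step, assuming Ollama's OpenAI-compatible API on its default port and Opencode's custom-provider config; the provider name, model ID, and exact schema are placeholders to check against the current Opencode docs:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (local)",
      "options": { "baseURL": "http://localhost:11434/v1" },
      "models": {
        "qwen2.5-coder:7b": { "name": "Qwen2.5 Coder 7B" }
      }
    }
  }
}

Drop this into opencode.json at the repo root so the whole team points at the same endpoint.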

Quick smoke test

opencode auth login
opencode models
opencode run "list all REST endpoints in this repo"

Limitations

  • Not for multi-team scale or compliance.
  • Context and runtime limits vary by device.
  • Best used to explore, not to deploy.

Enterprise: Internal hosted service (vLLM + Kubernetes + Opencode)

This is the pattern that scales: it mimics a hosted LLM API, but it runs inside your network. Security teams can review it, and engineering teams can share it.

Why use it

  • Consistent behavior across teams and CI.
  • Centralized quotas, logs, and upgrades.
  • Private DNS only: clean boundary for reviews.
  • Meets enterprise needs: privacy, SSO, monitoring, and model control.

Setup

  1. Deploy vLLM with a GPU-backed Deployment and an internal Service.
  2. Expose an OpenAI-compatible endpoint inside the network.
  3. Point Opencode to that endpoint. Add SSO and a policy to disable sharing. A quick endpoint check follows this list.
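
A quick way to confirm the endpoint from inside the network, assuming the Service is named vllm in an llm namespace; adjust the DNS name and token variable to your own setup:

curl -s http://vllm.llm.svc.cluster.local:8000/v1/models \
  -H "Authorization: Bearer $LLM_TOKEN"

If that returns your model ID, Opencode can use the same base URL and token.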

Design choices

  • One namespace per environment.
  • Token-based access. Rotate regularly.
  • Per-model quotas: keep requests predictable.
  • No public ingress by default (a NetworkPolicy sketch follows this list).
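
A sketch of that last point as a NetworkPolicy, assuming the vLLM pods carry an app: vllm label in an llm namespace and that your CNI enforces NetworkPolicy; swap in your own labels and namespaces:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vllm-internal-only
  namespace: llm
spec:
  podSelector:
    matchLabels:
      app: vllm
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}   # only pods in the same namespace reach vLLM

Anything outside the namespace, including public ingress controllers, is denied by default.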

Quick smoke test

opencode auth login
opencode models
opencode run "summarize the changes in this branch"

What I commit to the repo

  • AGENTS.md with a primary agent and one focused subagent (db-migrator). Tools are minimal. Rules are short.
  • .aider.conf.yml if the team also uses Aider. Model, file globs, and provider URL go here; a sketch follows this list.
  • A tiny loop at the end of every session: ask the agent what it learned and how it would change the system prompt. If it is useful, fold one or two lines back into AGENTS.md in a small PR.
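
For the Aider entry above, a sketch of what .aider.conf.yml might hold, assuming the OpenAI-compatible keys from Aider's sample config; the model name, URL, and key are placeholders:

# Placeholder values; see Aider's sample config for the full key list.
model: openai/qwen2.5-coder:7b
openai-api-base: http://localhost:11434/v1
openai-api-key: not-used-locally
read:
  - AGENTS.md

The read entry keeps AGENTS.md in context without letting the agent edit it.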

My end-of-session prompt

  1. From this session, what did you learn about this repo and our patterns?
  2. Rewrite the system prompt for this project with those insights in under 200 words.
  3. Name three concrete guardrails we should add to avoid mistakes next time.

Runbooks I actually use

Demo: Jetson Orin + Ollama + Qwen

  1. Install Ollama. Start the server.
  2. ollama pull qwen2.5-coder:7b or a larger coder model if VRAM allows.
  3. Increase context to match your workflow. Confirm tool use support. A Modelfile sketch follows this list.
  4. opencode auth login → set local endpoint → opencode models
  5. Commit AGENTS.md with minimal tools. Work in feature branches.
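
For step 3, one way to raise the context window on Ollama is a small Modelfile, then registering it under a new name; the 16384 value is an assumption, size it to your VRAM:

# Modelfile: same weights, larger context window
FROM qwen2.5-coder:7b
PARAMETER num_ctx 16384

ollama create qwen2.5-coder-16k -f Modelfile

Then select the new model name in Opencode.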

Enterprise: vLLM + Kubernetes + Opencode

  1. Apply the Kubernetes manifest below or your own version.
  2. Restrict the Service to internal networks. Add token auth and limits.
  3. Point Opencode at the internal endpoint. Confirm from CI.
  4. Add one read-only MCP server later if you need docs search. Start small.
  5. Consider SSO and self-host options when more teams join.
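
A minimal sketch of that manifest, assuming the vllm/vllm-openai image, one GPU, and a Hugging Face model ID; names, namespace, and sizes are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
  namespace: llm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "Qwen/Qwen2.5-Coder-7B-Instruct"]
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1
---
apiVersion: v1
kind: Service
metadata:
  name: vllm
  namespace: llm
spec:
  type: ClusterIP   # internal only, no public ingress
  selector:
    app: vllm
  ports:
    - port: 8000
      targetPort: 8000

For step 2, vLLM's --api-key flag is one simple way to add token auth in front of this if you do not already have a gateway.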
