September 27, 2025

Enterprise setups that work: Opencode with demos and hosted models

Here is how I run CLI-first AI coding in real teams without drama. One realistic enterprise pattern, plus a simple demo setup anyone can try.

AI · CLI · Enterprise · Software Development · Opencode

I use Opencode because it is open source, provider-agnostic, and speaks to local providers like Ollama or LM Studio.
For teams that need more control, there is a clean enterprise path with SSO and self-hosting. Rules live in the repo, not in my head.


Why enterprise setups matter

Running LLMs inside a company is not about chasing the newest model: it is about trust and control.
Enterprise use requires more than just an API key:

  • Data privacy: especially in the EU, teams must ensure code and prompts never leave the company boundary.
  • Model control: ability to pin model versions, roll back upgrades, and know what weights are running.
  • Custom training and fine-tuning: adapting to domain language or code conventions.
  • Authentication and SSO: align access with company identity systems.
  • Monitoring and observability: track usage, latency, and errors like any other service.
  • Support and continuity: someone needs to own updates, scaling, and security patches.
  • Integration with CI/CD: interactive coding sessions should map cleanly to automated checks.

These requirements are what distinguish an enterprise pattern from a weekend experiment.


Quick demo: Local edge box (Jetson Orin + Ollama + Qwen Coder)

This is the easiest way to try the setup without hosting big infrastructure. Think of it as a proof-of-concept or a hackathon box, not an enterprise solution.

Why use it

  • Fast feedback in the terminal.
  • No egress: code and prompts stay on the device.
  • Zero cluster work, so anyone can demo it.

Setup

  1. Install Ollama on Jetson Orin (native or Docker).
  2. Pull a coding model: qwen2.5-coder:7b or qwen3-coder:30b if VRAM allows.
  3. Point Opencode to the local OpenAI-compatible endpoint (a config sketch follows this list).
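
A minimal sketch of that last step, assuming Ollama's OpenAI-compatible API on its default port and Opencode's custom-provider config; the provider name, model ID, and exact schema are placeholders to check against the current Opencode docs:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (local)",
      "options": { "baseURL": "http://localhost:11434/v1" },
      "models": {
        "qwen2.5-coder:7b": { "name": "Qwen2.5 Coder 7B" }
      }
    }
  }
}

Drop this into opencode.json at the repo root so the whole team points at the same endpoint.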

Quick smoke test

opencode auth login
opencode models
opencode run "list all REST endpoints in this repo"

Limitations

  • Not for multi-team scale or compliance.
  • Context and runtime limits vary by device.
  • Best used to explore, not to deploy.

Enterprise: Internal hosted service (vLLM + Kubernetes + Opencode)

This is the pattern that scales: it mimics a hosted LLM API, but it runs inside your network. Security teams can review it, and engineering teams can share it.

Why use it

  • Consistent behavior across teams and CI.
  • Centralized quotas, logs, and upgrades.
  • Private DNS only: clean boundary for reviews.
  • Meets enterprise needs: privacy, SSO, monitoring, and model control.

Setup

  1. Deploy vLLM with a GPU-backed Deployment and an internal Service.
  2. Expose an OpenAI-compatible endpoint inside the network.
  3. Point Opencode to that endpoint. Add SSO and a policy to disable sharing. A quick endpoint check follows this list.
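
A quick way to confirm the endpoint from inside the network, assuming the Service is named vllm in an llm namespace; adjust the DNS name and token variable to your own setup:

curl -s http://vllm.llm.svc.cluster.local:8000/v1/models \
  -H "Authorization: Bearer $LLM_TOKEN"

If that returns your model ID, Opencode can use the same base URL and token.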

Design choices

  • One namespace per environment.
  • Token-based access. Rotate regularly.
  • Per-model quotas: keep requests predictable.
  • No public ingress by default (a NetworkPolicy sketch follows this list).
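
A sketch of that last point as a NetworkPolicy, assuming the vLLM pods carry an app: vllm label in an llm namespace and that your CNI enforces NetworkPolicy; swap in your own labels and namespaces:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vllm-internal-only
  namespace: llm
spec:
  podSelector:
    matchLabels:
      app: vllm
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}   # only pods in the same namespace reach vLLM

Anything outside the namespace, including public ingress controllers, is denied by default.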

Quick smoke test

opencode auth login
opencode models
opencode run "summarize the changes in this branch"

What I commit to the repo

  • AGENTS.md with a primary agent and one focused subagent (db-migrator). Tools are minimal. Rules are short.
  • .aider.conf.yml if the team also uses Aider. Model, file globs, and provider URL go here; a sketch follows this list.
  • A tiny loop at the end of every session: ask the agent what it learned and how it would change the system prompt. If it is useful, fold one or two lines back into AGENTS.md in a small PR.
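
For the Aider entry above, a sketch of what .aider.conf.yml might hold, assuming the OpenAI-compatible keys from Aider's sample config; the model name, URL, and key are placeholders:

# Placeholder values; see Aider's sample config for the full key list.
model: openai/qwen2.5-coder:7b
openai-api-base: http://localhost:11434/v1
openai-api-key: not-used-locally
read:
  - AGENTS.md

The read entry keeps AGENTS.md in context without letting the agent edit it.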

My end-of-session prompt

  1. From this session, what did you learn about this repo and our patterns?
  2. Rewrite the system prompt for this project with those insights in under 200 words.
  3. Name three concrete guardrails we should add to avoid mistakes next time.

Runbooks I actually use

Demo: Jetson Orin + Ollama + Qwen

  1. Install Ollama. Start the server.
  2. ollama pull qwen2.5-coder:7b or a larger coder model if VRAM allows.
  3. Increase context to match your workflow. Confirm tool use support. A Modelfile sketch follows this list.
  4. opencode auth login → set local endpoint → opencode models
  5. Commit AGENTS.md with minimal tools. Work in feature branches.
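
For step 3, one way to raise the context window on Ollama is a small Modelfile, then registering it under a new name; the 16384 value is an assumption, size it to your VRAM:

# Modelfile: same weights, larger context window
FROM qwen2.5-coder:7b
PARAMETER num_ctx 16384

ollama create qwen2.5-coder-16k -f Modelfile

Then select the new model name in Opencode.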

Enterprise: vLLM + Kubernetes + Opencode

  1. Apply the Kubernetes manifest below or your own version.
  2. Restrict the Service to internal networks. Add token auth and limits.
  3. Point Opencode at the internal endpoint. Confirm from CI.
  4. Add one read-only MCP server later if you need docs search. Start small.
  5. Consider SSO and self-host options when more teams join.
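
A minimal sketch of that manifest, assuming the vllm/vllm-openai image, one GPU, and a Hugging Face model ID; names, namespace, and sizes are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
  namespace: llm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "Qwen/Qwen2.5-Coder-7B-Instruct"]
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1
---
apiVersion: v1
kind: Service
metadata:
  name: vllm
  namespace: llm
spec:
  type: ClusterIP   # internal only, no public ingress
  selector:
    app: vllm
  ports:
    - port: 8000
      targetPort: 8000

For step 2, vLLM's --api-key flag is one simple way to add token auth in front of this if you do not already have a gateway.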
