<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>K@tooling on Martin Sukany</title><link>https://sukany.cz/tags/k@tooling/</link><description>Recent content in K@tooling on Martin Sukany</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Mon, 23 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://sukany.cz/tags/k@tooling/index.xml" rel="self" type="application/rss+xml"/><item><title>LLMs in Emacs: My Actual gptel Setup</title><link>https://sukany.cz/blog/2026-03-23-emacs-gptel-setup/</link><pubDate>Mon, 23 Mar 2026 00:00:00 +0000</pubDate><guid>https://sukany.cz/blog/2026-03-23-emacs-gptel-setup/</guid><description>&lt;p&gt;I&amp;rsquo;ve been using gptel daily for three months now. This isn&amp;rsquo;t a review — it&amp;rsquo;s a field report from someone running LLMs inside Emacs on a corporate macOS machine with a MITM proxy, compliance requirements, and zero patience for black-box tooling.&lt;/p&gt;
&lt;h2 id="why-emacs-for-llm-work"&gt;Why Emacs for LLM Work&lt;/h2&gt;
&lt;p&gt;gptel is a thin client. It sends text to an API, gets text back. That&amp;rsquo;s it. No hidden prompt injection, no telemetry you can&amp;rsquo;t inspect, no magic. You see exactly what goes over the wire.&lt;/p&gt;
&lt;p&gt;I came from VS Code&amp;rsquo;s Copilot Chat. It works fine until you need to understand what it&amp;rsquo;s actually doing. Which model is it using right now? What&amp;rsquo;s in the system prompt? Can I route this through a different backend? The answer is always: you can&amp;rsquo;t, or you need an extension that half-works.&lt;/p&gt;
&lt;p&gt;gptel gives you full control because there&amp;rsquo;s nothing to control. It&amp;rsquo;s Emacs — the config &lt;em&gt;is&lt;/em&gt; the product. Every backend, every model, every parameter is an elisp variable you can inspect and change at runtime.&lt;/p&gt;
&lt;p&gt;The corporate context matters here. I&amp;rsquo;m on a work macOS machine with a MITM proxy that intercepts TLS. Compliance says data must not be retained by third parties. I need to know exactly where my prompts go. With gptel, I do.&lt;/p&gt;
&lt;p&gt;Three months in, I can say: gptel is not the most polished LLM interface. It is the most transparent one.&lt;/p&gt;
&lt;h2 id="one-config-to-rule-them-all"&gt;One Config to Rule Them All&lt;/h2&gt;
&lt;p&gt;The first thing I did was centralize. One elisp file controls both gptel and aidermacs. One variable switches the default backend:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-emacs-lisp" data-lang="emacs-lisp"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;;; One line to switch the default for both gptel and aidermacs:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;defvar&lt;/span&gt; &lt;span class="nv"&gt;my/llm-default-backend&lt;/span&gt; &lt;span class="s"&gt;&amp;#34;Copilot&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;;; (defvar my/llm-default-backend &amp;#34;Claude-Max&amp;#34;) ; personal machine&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The second piece is a preference list. Backends expose different models &amp;mdash; Copilot gives you Claude, GPT-5, and Gemini through one API. The preference list picks the best available model automatically:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-emacs-lisp" data-lang="emacs-lisp"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;defvar&lt;/span&gt; &lt;span class="nv"&gt;my/gptel-model-preferences&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;claude-opus-4.6&lt;/span&gt; &lt;span class="nv"&gt;claude-opus-4.5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nv"&gt;claude-sonnet-4.6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nv"&gt;gpt-5.4&lt;/span&gt; &lt;span class="nv"&gt;gpt-5.2&lt;/span&gt; &lt;span class="nv"&gt;gpt-4o&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nv"&gt;gemini-3.1-pro-preview&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s"&gt;&amp;#34;First match from dynamically fetched models wins.&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When I switch machines or a model disappears from an API, the preference list falls through to the next option. No breakage, no manual editing. This pattern scales to any number of backends — everything downstream (gptel, aidermacs, org-babel helpers) reads from the same source.&lt;/p&gt;
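&lt;p&gt;The fallthrough itself needs almost no machinery: a first-match scan over the preference list. A minimal sketch, assuming model names are already comparable symbols &amp;mdash; the function name is mine, not part of gptel:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-emacs-lisp" data-lang="emacs-lisp"&gt;;; Sketch: return the first preferred model a backend actually offers.
;; AVAILABLE is the list of model symbols fetched from the backend&amp;#39;s API.
(defun my/pick-preferred-model (available)
  &amp;#34;Return the first entry of `my/gptel-model-preferences&amp;#39; found in AVAILABLE.&amp;#34;
  (seq-find (lambda (pref) (memq pref available))
            my/gptel-model-preferences))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;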
&lt;h2 id="github-copilot-for-business-as-primary-backend"&gt;GitHub Copilot for Business as Primary Backend&lt;/h2&gt;
&lt;p&gt;Why Copilot? Compliance. GitHub Copilot for Business does not retain prompts or completions — that&amp;rsquo;s contractual, not just a policy page. For a corporate environment where data retention matters, this is the deciding factor.&lt;/p&gt;
&lt;p&gt;The bonus is access. One Copilot subscription gives you Claude, GPT-5, Gemini, and others through a single API. No separate billing, no individual API keys. IT signs one contract, I get a model zoo.&lt;/p&gt;
&lt;p&gt;The auth flow uses a two-stage token exchange. You start with an OAuth token stored locally by the GitHub Copilot VS Code extension in &lt;code&gt;~/.config/github-copilot/apps.json&lt;/code&gt;. That token gets exchanged for a short-lived session token via GitHub&amp;rsquo;s API:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-emacs-lisp" data-lang="emacs-lisp"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;;; OAuth token from ~/.config/github-copilot/apps.json&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;;; -&amp;gt; exchanged for short-lived session token (TTL ~30 min)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;;; -&amp;gt; used against api.business.githubcopilot.com&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;defun&lt;/span&gt; &lt;span class="nv"&gt;my/copilot-get-session-token&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s"&gt;&amp;#34;Exchange OAuth token for Copilot session token. Cached for 30 min.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;and&lt;/span&gt; &lt;span class="nv"&gt;my/copilot-session-token&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;my/copilot-session-expires&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;float-time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nv"&gt;my/copilot-session-token&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;;; exchange via api.github.com/copilot_internal/v2/token&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;;; ... (see full config in repo)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;my/copilot-do-token-exchange&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The session token expires in roughly 30 minutes. The wrapper caches it and refreshes automatically with a 5-minute buffer. You never think about auth after initial setup.&lt;/p&gt;
&lt;p&gt;One gotcha that cost me an afternoon: model name normalization. Copilot&amp;rsquo;s API returns model names with dots (&lt;code&gt;claude-opus-4.6&lt;/code&gt;), while Anthropic&amp;rsquo;s convention uses dashes (&lt;code&gt;claude-opus-4-6&lt;/code&gt;). The preference list needs to match against both:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-emacs-lisp" data-lang="emacs-lisp"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;defun&lt;/span&gt; &lt;span class="nv"&gt;my/model-normalize&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s"&gt;&amp;#34;Normalize model NAME: dots-&amp;gt;dashes, strip date suffix.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;let&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nv"&gt;s&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;symbolp&lt;/span&gt; &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;symbol-name&lt;/span&gt; &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;setq&lt;/span&gt; &lt;span class="nv"&gt;s&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;replace-regexp-in-string&lt;/span&gt; &lt;span class="s"&gt;&amp;#34;\\.&amp;#34;&lt;/span&gt; &lt;span class="s"&gt;&amp;#34;-&amp;#34;&lt;/span&gt; &lt;span class="nv"&gt;s&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;replace-regexp-in-string&lt;/span&gt; &lt;span class="s"&gt;&amp;#34;-[0-9]\\{8\\}$&amp;#34;&lt;/span&gt; &lt;span class="s"&gt;&amp;#34;&amp;#34;&lt;/span&gt; &lt;span class="nv"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Dots become dashes and trailing date stamps get stripped, on both the preference side and the API side. Without a shared normal form, a preference written as &lt;code&gt;claude-opus-4.6&lt;/code&gt; silently fails to match the same model advertised under the other convention, and the list quietly falls through to a worse default.&lt;/p&gt;
&lt;h2 id="multiple-backends-dynamic-discovery"&gt;Multiple Backends, Dynamic Discovery&lt;/h2&gt;
&lt;p&gt;Copilot is the primary, but not the only backend. I have three others:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Claude-Max&lt;/strong&gt; &amp;mdash; a proxy to Anthropic&amp;rsquo;s API running on internal infrastructure, no per-token billing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenWebUI&lt;/strong&gt; &amp;mdash; self-hosted, open models for experimentation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Daneel&lt;/strong&gt; &amp;mdash; a custom agent system with its own API&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each backend fetches its available models from the API at startup and caches the result:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-emacs-lisp" data-lang="emacs-lisp"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;defun&lt;/span&gt; &lt;span class="nv"&gt;my/setup-gptel-backends&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s"&gt;&amp;#34;Create all gptel backends with dynamically fetched models.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;when&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;member&lt;/span&gt; &lt;span class="s"&gt;&amp;#34;Copilot&amp;#34;&lt;/span&gt; &lt;span class="nv"&gt;my/llm-enabled-backends&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt; &lt;span class="nf"&gt;#&amp;#39;&lt;/span&gt;&lt;span class="nv"&gt;gptel-make-gh-copilot&lt;/span&gt; &lt;span class="s"&gt;&amp;#34;Copilot&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt; &lt;span class="nb"&gt;:host&lt;/span&gt; &lt;span class="s"&gt;&amp;#34;api.business.githubcopilot.com&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;:models&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;my/fetch-copilot-models&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;;; Claude-Max, OpenWebUI, Daneel similarly...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The preference list picks the best model across all backends. If Copilot is down, Claude-Max takes over automatically. &lt;code&gt;SPC o l R&lt;/code&gt; refreshes all backends. A new model appears on Copilot&amp;rsquo;s API, I hit refresh, and if it ranks higher in preferences, it&amp;rsquo;s already the default.&lt;/p&gt;
&lt;h2 id="daily-workflows-rewrite-and-chat"&gt;Daily Workflows: Rewrite and Chat&lt;/h2&gt;
&lt;p&gt;Two workflows cover 90% of my LLM usage: rewrite and chat.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;gptel-rewrite&lt;/strong&gt; is the daily driver. Select a region, type an instruction, and the model rewrites the selection in place. The key addition is dispatch mode &amp;mdash; after a rewrite completes, you get a menu: Accept, Reject, Diff, or Merge:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-emacs-lisp" data-lang="emacs-lisp"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;;; After rewrite completes: show Accept/Reject/Diff/Merge menu&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;after!&lt;/span&gt; &lt;span class="nv"&gt;gptel-rewrite&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;setq&lt;/span&gt; &lt;span class="nv"&gt;gptel-rewrite-default-action&lt;/span&gt; &lt;span class="ss"&gt;&amp;#39;dispatch&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Accept replaces the region. Reject restores the original. Diff opens ediff. Merge lets you pick hunks. This single setting turned gptel-rewrite from &amp;ldquo;interesting&amp;rdquo; to &amp;ldquo;indispensable.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Chat buffers&lt;/strong&gt; use org-mode. Every conversation is a structured document I can export, search, refile. For batch work and scripting, a CLI helper wraps gptel for use in org-babel blocks:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-org" data-lang="org"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c"&gt;#+begin_src &lt;/span&gt;&lt;span class="cs"&gt;elisp&lt;/span&gt;&lt;span class="c"&gt; :results raw
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;my/gptel-cli&lt;/span&gt; &lt;span class="s"&gt;&amp;#34;Summarize this error log&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c"&gt;#+end_src&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This makes LLM calls composable with other org-babel languages. Shell block produces output, LLM block processes it, Python block handles the result. Pipelines, not chat.&lt;/p&gt;
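&lt;p&gt;A concrete pipeline might look like this &amp;mdash; the log path is a placeholder and &lt;code&gt;my/gptel-cli&lt;/code&gt; is the helper from my config, but the chaining via &lt;code&gt;:var&lt;/code&gt; is plain org-babel:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-org" data-lang="org"&gt;#+name: raw-log
#+begin_src shell :results output
tail -n 50 /path/to/error.log
#+end_src

#+begin_src elisp :var log=raw-log :results raw
(my/gptel-cli (concat &amp;#34;Summarize this error log:\n&amp;#34; log))
#+end_src
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;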
&lt;h2 id="tool-use-and-mcp"&gt;Tool Use and MCP&lt;/h2&gt;
&lt;p&gt;gptel supports tool use &amp;mdash; the model can call functions, not just generate text:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-emacs-lisp" data-lang="emacs-lisp"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;setq&lt;/span&gt; &lt;span class="nv"&gt;gptel-use-tools&lt;/span&gt; &lt;span class="no"&gt;t&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nv"&gt;gptel-confirm-tool-calls&lt;/span&gt; &lt;span class="no"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;; ask before each call&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I keep confirmation on. Letting a model execute arbitrary functions without review defeats the purpose of a transparent setup.&lt;/p&gt;
&lt;p&gt;The tool ecosystem has three layers. &lt;strong&gt;llm-tool-collection&lt;/strong&gt; provides filesystem and shell access &amp;mdash; read files, run commands. &lt;strong&gt;ragmacs&lt;/strong&gt; adds Emacs introspection &amp;mdash; the model can query buffers and read documentation. &lt;strong&gt;gptel-got&lt;/strong&gt; works with org structures.&lt;/p&gt;
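&lt;p&gt;Writing your own tool is cheap. This is close to the canonical &lt;code&gt;gptel-make-tool&lt;/code&gt; example from gptel&amp;rsquo;s documentation &amp;mdash; a buffer-reading function the model can call; with confirmation on, each invocation still asks first:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-emacs-lisp" data-lang="emacs-lisp"&gt;;; A minimal custom tool alongside the collections above.
(gptel-make-tool
 :name &amp;#34;read_buffer&amp;#34;
 :description &amp;#34;Return the contents of an Emacs buffer&amp;#34;
 :args (list &amp;#39;(:name &amp;#34;buffer&amp;#34;
               :type string
               :description &amp;#34;The name of the buffer to read&amp;#34;))
 :function (lambda (buffer)
             (unless (buffer-live-p (get-buffer buffer))
               (error &amp;#34;Buffer %s is not live&amp;#34; buffer))
             (with-current-buffer buffer
               (buffer-substring-no-properties (point-min) (point-max))))
 :category &amp;#34;emacs&amp;#34;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;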
&lt;p&gt;Then there&amp;rsquo;s MCP (Model Context Protocol). gptel bridges to MCP servers through mcp-hub:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-emacs-lisp" data-lang="emacs-lisp"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;setq&lt;/span&gt; &lt;span class="nv"&gt;mcp-hub-servers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;fetch&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;:command&lt;/span&gt; &lt;span class="s"&gt;&amp;#34;uvx&amp;#34;&lt;/span&gt; &lt;span class="nb"&gt;:args&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;mcp-server-fetch&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;sequential-thinking&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;:command&lt;/span&gt; &lt;span class="s"&gt;&amp;#34;npx&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;:args&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;-y&amp;#34;&lt;/span&gt; &lt;span class="s"&gt;&amp;#34;@modelcontextprotocol/server-sequential-thinking&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)))))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;mcp-server-fetch&lt;/code&gt; lets the model pull web content. &lt;code&gt;sequential-thinking&lt;/code&gt; provides a scratchpad for multi-step reasoning. Agent mode (&lt;code&gt;SPC o l A&lt;/code&gt;) combines tool use with a planning loop. It works for well-scoped tasks; don&amp;rsquo;t expect it to handle more than five or six tool calls reliably yet.&lt;/p&gt;
&lt;h2 id="aidermacs-pair-programming"&gt;Aidermacs: Pair Programming&lt;/h2&gt;
&lt;p&gt;For actual code changes across multiple files, gptel-rewrite isn&amp;rsquo;t enough. Aidermacs brings Aider into Emacs &amp;mdash; architect/editor pair programming where one model designs and another applies changes:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-emacs-lisp" data-lang="emacs-lisp"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;setq&lt;/span&gt; &lt;span class="nv"&gt;aidermacs-default-model&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;my/aider-architect-model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nv"&gt;aidermacs-default-chat-mode&lt;/span&gt; &lt;span class="ss"&gt;&amp;#39;architect&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nv"&gt;aidermacs-extra-args&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="o"&gt;`&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;--editor-model&amp;#34;&lt;/span&gt; &lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;my/aider-editor-model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s"&gt;&amp;#34;--editor-edit-format&amp;#34;&lt;/span&gt; &lt;span class="s"&gt;&amp;#34;diff&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s"&gt;&amp;#34;--no-auto-commits&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The architect model (typically Opus) proposes changes. The editor model (typically Haiku &amp;mdash; fast and cheap) applies them as diffs. This split keeps costs reasonable while maintaining quality for the planning phase.&lt;/p&gt;
&lt;p&gt;Aidermacs shares the Copilot auth flow. The same token exchange function provides credentials &amp;mdash; no separate auth setup. An auto-generated &lt;code&gt;.aider.model.settings.yml&lt;/code&gt; sets the Copilot IDE headers required by the business endpoint.&lt;/p&gt;
&lt;p&gt;The corporate proxy needs extra attention. Aider is a Python tool, and Python&amp;rsquo;s requests library needs its own CA bundle:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;REQUESTS_CA_BUNDLE=/path/to/corporate-ca-bundle.crt
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;SSL_CERT_FILE=/path/to/corporate-ca-bundle.crt
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;These environment variables get set in the aidermacs process environment. Without them, every Aider request fails with a TLS verification error.&lt;/p&gt;
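&lt;p&gt;In elisp terms, that means exporting both variables before aidermacs spawns the Aider process &amp;mdash; a sketch, with the bundle path as a placeholder:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-emacs-lisp" data-lang="emacs-lisp"&gt;;; Point Python&amp;#39;s TLS stack at the corporate CA bundle.
;; Must run before the Aider subprocess starts.
(let ((ca-bundle &amp;#34;/path/to/corporate-ca-bundle.crt&amp;#34;))
  (setenv &amp;#34;REQUESTS_CA_BUNDLE&amp;#34; ca-bundle)
  (setenv &amp;#34;SSL_CERT_FILE&amp;#34; ca-bundle))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;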
&lt;h2 id="corporate-proxy-the-elephant-in-the-room"&gt;Corporate Proxy: The Elephant in the Room&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;re on a corporate network with a MITM proxy, you already know the pain. The proxy terminates TLS, re-signs with its own CA, and every HTTPS tool needs to know about it.&lt;/p&gt;
&lt;p&gt;For Emacs itself:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-emacs-lisp" data-lang="emacs-lisp"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;;; Trust corporate MITM proxy (adds intermediate CA)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;setq&lt;/span&gt; &lt;span class="nv"&gt;gnutls-verify-error&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nv"&gt;tls-checktrust&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nv"&gt;network-security-level&lt;/span&gt; &lt;span class="ss"&gt;&amp;#39;low&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;;; curl handles proxy better than url.el&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;setq&lt;/span&gt; &lt;span class="nv"&gt;gptel-use-curl&lt;/span&gt; &lt;span class="no"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;gptel-use-curl t&lt;/code&gt; matters. Emacs&amp;rsquo;s built-in &lt;code&gt;url.el&lt;/code&gt; has inconsistent proxy support. curl picks up the system proxy configuration reliably and handles streaming better. The &lt;code&gt;gnutls-verify-error nil&lt;/code&gt; settings are a known security trade-off &amp;mdash; on a corporate machine where IT controls the network anyway, this is the pragmatic choice.&lt;/p&gt;
&lt;h2 id="three-months-in-what-i-d-change"&gt;Three Months In: What I&amp;rsquo;d Change&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;What works:&lt;/strong&gt; gptel-rewrite with dispatch is the single most valuable feature. Multi-backend setup with dynamic discovery means I never worry about model availability. The Copilot integration is solid once the auth plumbing is in place.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What doesn&amp;rsquo;t:&lt;/strong&gt; Copilot token refresh occasionally has a race condition &amp;mdash; two simultaneous requests can both trigger an exchange, and one gets a stale token. MCP is early: the ecosystem is small, and agent mode falls apart on complex tasks. The corporate proxy config breaks after macOS updates and needs manual fixes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Recommendation:&lt;/strong&gt; Start with gptel and one backend. Get comfortable with gptel-rewrite. Add aidermacs when you have a concrete use case. Add tools and MCP only when you&amp;rsquo;ve hit the ceiling of what chat alone can do. The config described here took weeks to build incrementally &amp;mdash; don&amp;rsquo;t start there.&lt;/p&gt;
&lt;p&gt;The full configuration is in my &lt;a href="https://git.apps.sukany.cz/martin/emacs-doom"&gt;doom-emacs repository&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Day 5 with Daneel: Headless Browsers, Document Pipelines, and the Numbers So Far</title><link>https://sukany.cz/blog/2026-02-20-day5-browsers-documents-numbers/</link><pubDate>Fri, 20 Feb 2026 00:00:00 +0000</pubDate><guid>https://sukany.cz/blog/2026-02-20-day5-browsers-documents-numbers/</guid><description>&lt;p&gt;Day 5 was the most varied day yet. Not in complexity—some earlier days had harder problems—but in range. The work touched browser automation, document tooling, and enough small fixes that by evening I had a reason to look at the numbers.&lt;/p&gt;
&lt;h2 id="running-a-browser-without-a-screen"&gt;Running a Browser Without a Screen&lt;/h2&gt;
&lt;p&gt;One of the things an AI assistant can do is interact with web pages—read content, check status, fill forms. But this particular setup runs on a headless Linux server. No display, no window manager, no user session.&lt;/p&gt;
&lt;p&gt;The obvious approach—install Chrome via Snap—doesn&amp;rsquo;t work from a systemd service. Snap packages assume a user session with D-Bus and a display server. Running headless from a system service hits permission errors before Chrome even starts.&lt;/p&gt;
&lt;p&gt;The fix: install Chrome directly from Google&amp;rsquo;s .deb repository, bypassing Snap entirely. Then wrap it in a dedicated systemd service that launches Chrome with remote debugging enabled on a fixed port. The AI framework connects via Chrome DevTools Protocol in attach-only mode—it doesn&amp;rsquo;t launch Chrome, it connects to the already-running instance.&lt;/p&gt;
&lt;p&gt;Three components, each solving one problem: the .deb package avoids Snap&amp;rsquo;s session requirements, the systemd service ensures Chrome survives reboots and can be managed like any other daemon, and the attach-only configuration means the framework doesn&amp;rsquo;t need to manage browser lifecycle.&lt;/p&gt;
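&lt;p&gt;For reference, the unit is only a few lines. This is an illustrative sketch &amp;mdash; the port, user, and paths are placeholders, not the exact unit from the server:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;[Unit]
Description=Headless Chrome with remote debugging
After=network.target

[Service]
User=chrome
ExecStart=/usr/bin/google-chrome --headless --remote-debugging-port=9222 --user-data-dir=/var/lib/chrome-headless
Restart=on-failure

[Install]
WantedBy=multi-user.target
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;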
&lt;p&gt;The result is invisible when it works. Pages load, content is extracted, the browser runs quietly in the background consuming minimal resources. The interesting part was how many things had to be wrong before the right approach became obvious.&lt;/p&gt;
&lt;h2 id="from-org-files-to-printed-documents"&gt;From Org Files to Printed Documents&lt;/h2&gt;
&lt;p&gt;A separate thread involved document generation. The workflow: write structured content in Emacs Org mode, export to LaTeX, compile to PDF. The goal was a reusable template that produces clean, professional documents without manual formatting.&lt;/p&gt;
&lt;p&gt;The template handles the things that usually require tweaking: Czech language support with proper hyphenation, tables that span pages without breaking layout, consistent typography, a styled title page. The technical details—font selection, column width calculation, alternating row colors—are defined once in the template and applied automatically during export.&lt;/p&gt;
&lt;p&gt;What made this worth the setup time is the authoring experience afterward. Write content in a plain text file with minimal markup. Run one export command. Get a formatted PDF. No intermediate steps, no manual adjustments, no &amp;ldquo;fix the table on page 3&amp;rdquo; cycles.&lt;/p&gt;
&lt;p&gt;An Elisp hook handles the part that would otherwise require per-document boilerplate: detecting tables in the document and automatically adding the correct LaTeX attributes based on column count. The author doesn&amp;rsquo;t need to think about LaTeX at all.&lt;/p&gt;
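&lt;p&gt;The shape of that hook, sketched &amp;mdash; the real version also inspects column counts to compute widths; this shows only the mechanism, and the function name and attribute are illustrative:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-emacs-lisp" data-lang="emacs-lisp"&gt;;; Runs in the temporary buffer copy that org-export creates, so the
;; inserted attribute lines never touch the source file.
(defun my/org-auto-table-attrs (backend)
  &amp;#34;Add a LaTeX attribute above each bare table on LaTeX export.&amp;#34;
  (when (org-export-derived-backend-p backend &amp;#39;latex)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward &amp;#34;^[ \t]*|&amp;#34; nil t)
        (forward-line 0)
        ;; Skip tables that already carry an explicit attribute.
        (unless (save-excursion
                  (forward-line -1)
                  (looking-at &amp;#34;[ \t]*#\\+ATTR_LATEX&amp;#34;))
          (insert &amp;#34;#+ATTR_LATEX: :environment longtable\n&amp;#34;))
        ;; Move past the rest of this table before searching again.
        (while (looking-at &amp;#34;^[ \t]*|&amp;#34;) (forward-line 1))))))

(add-hook &amp;#39;org-export-before-processing-functions #&amp;#39;my/org-auto-table-attrs)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;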
&lt;h2 id="five-days-in-numbers"&gt;Five Days in Numbers&lt;/h2&gt;
&lt;p&gt;Day 5 felt like a good point to measure what&amp;rsquo;s accumulated.&lt;/p&gt;
&lt;p&gt;The memory system—the files that let the assistant maintain context across restarts—has grown to over 190 KB across 26 files. That includes daily operational logs, architectural analysis documents, per-session summaries, and the curated long-term memory file that gets reviewed and pruned every three days.&lt;/p&gt;
&lt;p&gt;The workspace contains 13 custom scripts covering everything from calendar integration to email processing to automated backups. Each one exists because a manual workflow was repeated enough times to justify automation.&lt;/p&gt;
&lt;p&gt;There are 24 git commits in the workspace repository over five days—roughly five per day, tracking configuration changes, new scripts, and memory updates.&lt;/p&gt;
&lt;p&gt;The cron system runs scheduled jobs: morning briefings, email monitoring, news digests, weekly reviews, infrastructure checks. Each job was added incrementally as a pattern emerged—something done manually twice became a candidate for automation on the third occurrence.&lt;/p&gt;
&lt;p&gt;68 session logs exist from this period. Each represents a conversation or automated task. Some are brief status checks; others span hours of technical work. The session architecture evolved during these five days too—from a single shared session to isolated per-channel sessions, each maintaining its own context.&lt;/p&gt;
&lt;h2 id="what-the-numbers-don-t-show"&gt;What the Numbers Don&amp;rsquo;t Show&lt;/h2&gt;
&lt;p&gt;The raw counts are less interesting than what they represent: five days of iterative refinement where each day&amp;rsquo;s problems inform the next day&amp;rsquo;s automation.&lt;/p&gt;
&lt;p&gt;The memory system exists because the assistant forgot things after restarts. The backup scripts exist because I asked &amp;ldquo;what happens if this machine dies?&amp;rdquo; The browser automation exists because a web interaction failed and the root cause was architectural, not a bug.&lt;/p&gt;
&lt;p&gt;None of this was planned on day one. The roadmap was: set up the assistant, give it access, see what happens. The infrastructure that exists now is the answer to &amp;ldquo;what happens&amp;rdquo;—an accumulation of solved problems, each one making the next problem easier to solve.&lt;/p&gt;
&lt;p&gt;Five days is not enough to draw conclusions about long-term value. It&amp;rsquo;s enough to see the pattern: capability compounds. Each tool built, each script written, each memory file maintained makes the next task faster. Whether that curve continues or plateaus is the question for the next five days.&lt;/p&gt;
&lt;p&gt;M&amp;gt;&lt;/p&gt;</description></item></channel></rss>