K@emacs on Martin Sukany

LLMs in Emacs: My Actual gptel Setup

Mon, 23 Mar 2026 00:00:00 +0000

I’ve been using gptel daily for three months now. This isn’t a review — it’s a field report from someone running LLMs inside Emacs on a corporate macOS machine with a MITM proxy, compliance requirements, and zero patience for black-box tooling.

Why Emacs for LLM Work

gptel is a thin client. It sends text to an API, gets text back. That’s it. No hidden prompt injection, no telemetry you can’t inspect, no magic. You see exactly what goes over the wire.

I came from VS Code’s Copilot Chat. It works fine until you need to understand what it’s actually doing. Which model is it using right now? What’s in the system prompt? Can I route this through a different backend? The answer is always: you can’t, or you need an extension that half-works.

gptel gives you full control because there’s nothing to control. It’s Emacs — the config is the product. Every backend, every model, every parameter is an elisp variable you can inspect and change at runtime.

The corporate context matters here. I’m on a work macOS with a MITM proxy that intercepts TLS. Compliance says data must not be retained by third parties. I need to know exactly where my prompts go. With gptel, I do.

Three months in, I can say: gptel is not the most polished LLM interface. It is the most transparent one.

One Config to Rule Them All

The first thing I did was centralize. One elisp file controls both gptel and aidermacs. One variable switches the default backend:

;; One line to switch the default for both gptel and aidermacs:
(defvar my/llm-default-backend "Copilot")
;; (defvar my/llm-default-backend "Claude-Max") ; personal machine

The second piece is a preference list. Backends expose different models — Copilot gives you Claude, GPT-5, Gemini through one API. The preference list picks the best available model automatically:

(defvar my/gptel-model-preferences
  '(claude-opus-4.6 claude-opus-4.5
    claude-sonnet-4.6
    gpt-5.4 gpt-5.2 gpt-4o
    gemini-3.1-pro-preview)
  "First match from dynamically fetched models wins.")

When I switch machines or a model disappears from an API, the preference list falls through to the next option. No breakage, no manual editing. This pattern scales to any number of backends — everything downstream (gptel, aidermacs, org-babel helpers) reads from the same source.

GitHub Copilot for Business as Primary Backend

Why Copilot? Compliance. GitHub Copilot for Business does not retain prompts or completions — that’s contractual, not just a policy page. For a corporate environment where data retention matters, this is the deciding factor.

The bonus is access. One Copilot subscription gives you Claude, GPT-5, Gemini, and others through a single API. No separate billing, no individual API keys. IT signs one contract, I get a model zoo.

The auth flow uses a two-stage token exchange. You start with a long-lived OAuth token that a Copilot editor plugin stores locally in ~/.config/github-copilot/apps.json. That token gets exchanged for a short-lived session token via GitHub’s API:

;; OAuth token from ~/.config/github-copilot/apps.json
;; -> exchanged for short-lived session token (TTL ~30 min)
;; -> used against api.business.githubcopilot.com
(defun my/copilot-get-session-token ()
  "Exchange OAuth token for Copilot session token.  Cached for 30 min."
  (if (and my/copilot-session-token
           ;; reuse the cached token only while more than 300 s remain
           (> my/copilot-session-expires (+ (float-time) 300)))
      my/copilot-session-token
    ;; exchange via api.github.com/copilot_internal/v2/token
    ;; ... (see full config in repo)
    (my/copilot-do-token-exchange)))

The session token expires in roughly 30 minutes. The wrapper caches it and refreshes automatically with a 5-minute buffer. You never think about auth after initial setup.
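The caching rule itself does not depend on the HTTP details. Here is a minimal, language-neutral sketch in Python. The `exchange` callable is hypothetical and stands in for the real token exchange, and `TokenCache` and `REFRESH_MARGIN` are illustrative names that do not appear in the actual config:

```python
import time
from typing import Callable, Optional, Tuple

REFRESH_MARGIN = 300  # seconds; refresh once less than 5 minutes remain


class TokenCache:
    """Cache a short-lived token and refresh it shortly before expiry."""

    def __init__(self, exchange: Callable[[], Tuple[str, float]],
                 clock: Callable[[], float] = time.time) -> None:
        self._exchange = exchange  # hypothetical: returns (token, expires_at)
        self._clock = clock        # injectable so the rule can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Reuse the cached token only while it outlives the safety margin.
        if (self._token is not None
                and self._expires_at > self._clock() + REFRESH_MARGIN):
            return self._token
        self._token, self._expires_at = self._exchange()
        return self._token
```

The margin means a request never starts with a token that is about to expire mid-flight. Note that this sketch is not safe against two callers refreshing at the same time.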

One gotcha that cost me an afternoon: model name normalization. Copilot’s API returns model names with dots (claude-opus-4.6), while Anthropic’s convention uses dashes (claude-opus-4-6). The preference list needs to match against both:

(defun my/model-normalize (name)
  "Normalize model NAME: dots->dashes, strip date suffix."
  (let ((s (if (symbolp name) (symbol-name name) name)))
    (setq s (replace-regexp-in-string "\\." "-" s))
    (replace-regexp-in-string "-[0-9]\\{8\\}$" "" s)))

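Dots become dashes, trailing date stamps get stripped. Both the preference entry and the name reported by the backend go through the same function before they are compared. Without this, a preference written as claude-opus-4.6 silently never matches a backend that reports claude-opus-4-6 or appends a date stamp.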
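To make the matching rule concrete, here is a small Python sketch of "normalize both sides, first preference wins". It mirrors the elisp normalizer above but is not the actual config. `pick_model` is an illustrative name, and the dated model name in the usage note is made up for the example:

```python
import re
from typing import Dict, Iterable, Optional


def normalize(name: str) -> str:
    """Dots become dashes; a trailing 8-digit date stamp is dropped."""
    return re.sub(r"-[0-9]{8}$", "", name.replace(".", "-"))


def pick_model(preferences: Iterable[str],
               available: Iterable[str]) -> Optional[str]:
    """Return the available model matching the earliest preference, or None."""
    by_norm: Dict[str, str] = {}
    for model in available:
        # Keep the backend's own spelling so it can be sent back unchanged.
        by_norm.setdefault(normalize(model), model)
    for wanted in preferences:
        match = by_norm.get(normalize(wanted))
        if match is not None:
            return match
    return None
```

With preferences `["claude-opus-4.6", "claude-sonnet-4.6", "gpt-4o"]` and a backend that offers only `"gpt-4o"` and `"claude-sonnet-4-6-20260101"`, the call returns the backend's spelling of the Sonnet entry. The first preference has no match, and the second matches after normalization.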

Multiple Backends, Dynamic Discovery

Copilot is the primary, but not the only backend. I have three others:

Claude-Max — a proxy to Anthropic’s API running on internal infrastructure, no per-token billing
OpenWebUI — self-hosted, open models for experimentation
Daneel — a custom agent system with its own API

Each backend fetches its available models from the API at startup and caches the result:

(defun my/setup-gptel-backends ()
  "Create all gptel backends with dynamically fetched models."
  (when (member "Copilot" my/llm-enabled-backends)
    (apply #'gptel-make-gh-copilot "Copilot"
           (list :host "api.business.githubcopilot.com"
                 :models (my/fetch-copilot-models ...))))
  ;; Claude-Max, OpenWebUI, Daneel similarly...
  )

The preference list picks the best model across all backends. If Copilot is down, Claude-Max takes over automatically. SPC o l R refreshes all backends. A new model appears on Copilot’s API, I hit refresh, and if it ranks higher in preferences, it’s already the default.

Daily Workflows: Rewrite and Chat

Two workflows cover 90% of my LLM usage: rewrite and chat.

gptel-rewrite is the daily driver. Select a region, type an instruction, and the model rewrites the selection in place. The key addition is dispatch mode — after a rewrite completes, you get a menu: Accept, Reject, Diff, or Merge:

;; After rewrite completes: show Accept/Reject/Diff/Merge menu
(after! gptel-rewrite
  (setq gptel-rewrite-default-action 'dispatch))

Accept replaces the region. Reject restores the original. Diff opens ediff. Merge lets you pick hunks. This single setting turned gptel-rewrite from “interesting” to “indispensable.”

Chat buffers use org-mode. Every conversation is a structured document I can export, search, refile. For batch work and scripting, a CLI helper wraps gptel for use in org-babel blocks:

#+begin_src elisp :results raw
(my/gptel-cli "Summarize this error log")
#+end_src

This makes LLM calls composable with other org-babel languages. Shell block produces output, LLM block processes it, Python block handles the result. Pipelines, not chat.

Tool Use and MCP

gptel supports tool use — the model can call functions, not just generate text:

(setq gptel-use-tools t
      gptel-confirm-tool-calls t) ; ask before each call

I keep confirmation on. Letting a model execute arbitrary functions without review defeats the purpose of a transparent setup.

The tool ecosystem has three layers. llm-tool-collection provides filesystem and shell access — read files, run commands. ragmacs adds Emacs introspection — the model can query buffers and read documentation. gptel-got works with org structures.

Then there’s MCP (Model Context Protocol). gptel bridges to MCP servers through mcp-hub:

(setq mcp-hub-servers
      '(("fetch"
         . (:command "uvx" :args ("mcp-server-fetch")))
        ("sequential-thinking"
         . (:command "npx"
            :args ("-y" "@modelcontextprotocol/server-sequential-thinking")))))

mcp-server-fetch lets the model pull web content. sequential-thinking provides a scratchpad for multi-step reasoning. Agent mode (SPC o l A) combines tool use with a planning loop. It works for well-scoped tasks; don’t expect it to handle more than five or six tool calls reliably yet.

Aidermacs: Pair Programming

For actual code changes across multiple files, gptel-rewrite isn’t enough. Aidermacs brings Aider into Emacs — architect/editor pair programming where one model designs and another applies changes:

(setq aidermacs-default-model (my/aider-architect-model)
      aidermacs-default-chat-mode 'architect
      aidermacs-extra-args
      `("--editor-model" ,(my/aider-editor-model)
        "--editor-edit-format" "diff"
        "--no-auto-commits"))

The architect model (typically Opus) proposes changes. The editor model (typically Haiku — fast and cheap) applies them as diffs. This split keeps costs reasonable while maintaining quality for the planning phase.

Aidermacs shares the Copilot auth flow. The same token exchange function provides credentials — no separate auth setup. An auto-generated .aider.model.settings.yml sets the Copilot IDE headers required by the business endpoint.

The corporate proxy needs extra attention. Aider is a Python tool, and Python’s requests library needs its own CA bundle:

REQUESTS_CA_BUNDLE=/path/to/corporate-ca-bundle.crt
SSL_CERT_FILE=/path/to/corporate-ca-bundle.crt

These environment variables get set in the aidermacs process environment. Without them, every Aider request fails with a TLS verification error.

Corporate Proxy: The Elephant in the Room

If you’re on a corporate network with a MITM proxy, you already know the pain. The proxy terminates TLS, re-signs with its own CA, and every HTTPS tool needs to know about it.

For Emacs itself:

;; Relax TLS checks so the corporate MITM proxy's re-signed certificates
;; are accepted.  This turns checks off; it does not add the proxy CA.
(setq gnutls-verify-error nil
      tls-checktrust nil
      network-security-level 'low)

;; curl handles proxy better than url.el
(setq gptel-use-curl t)

gptel-use-curl t matters. Emacs’s built-in url.el has inconsistent proxy support, whereas curl picks up the system proxy configuration reliably and handles streaming better. The gnutls-verify-error, tls-checktrust, and network-security-level settings are a known security trade-off: they relax certificate checking for every TLS connection Emacs makes, not only the ones to LLM backends. On a corporate machine where IT controls the network anyway, this is the pragmatic choice.

Three Months In: What I’d Change

What works: gptel-rewrite with dispatch is the single most valuable feature. Multi-backend setup with dynamic discovery means I never worry about model availability. The Copilot integration is solid once the auth plumbing is in place.

What doesn’t: Copilot token refresh occasionally has a race condition — two simultaneous requests can both trigger an exchange, and one gets a stale token. MCP is early: the ecosystem is small, and agent mode falls apart on complex tasks. The corporate proxy config breaks after macOS updates and needs manual fixes.

Recommendation: Start with gptel and one backend. Get comfortable with gptel-rewrite. Add aidermacs when you have a concrete use case. Add tools and MCP only when you’ve hit the ceiling of what chat alone can do. The config described here took weeks to build incrementally — don’t start there.

The full configuration is in my doom-emacs repository.

Bridging reMarkable and Emacs Org-mode

Thu, 12 Mar 2026 00:00:00 +0000

Why I still write on paper

There’s something that happens when I pick up a pen that doesn’t happen when I open a new buffer. The thinking is different — less filtered, less structured, more honest. Ideas that would never survive the friction of forming a heading and choosing a tag actually get written down. First drafts of decisions, rough task lists, things I’m trying to work out, all of it lands on paper before it’s ready to be digital.

I’ve tried replacing this with digital tools. Org-capture is excellent for structured input, but for capturing a fleeting thought mid-meeting or sketching out a problem while commuting, I still reach for paper. The reMarkable is my compromise: it’s close enough to writing on paper that it doesn’t disrupt the thinking, and close enough to a computer that the notes don’t stay trapped on dead wood.

The problem is that notes on a reMarkable and notes in Org-mode don’t naturally talk to each other. This post is about the pipeline I built to close that gap.

The two tools and why they complement each other

The reMarkable is good at one thing: letting you write without getting in the way. The e-ink display doesn’t glow, doesn’t notify you, doesn’t tempt you to check anything. The battery lasts days. The pen latency is low enough to feel like paper. Cloud sync happens automatically in the background — you don’t think about it. For first-pass capture of any kind of thinking, it’s hard to beat.

Org-mode is good at different things. It’s plain text, version-controllable, programmable. It integrates with agenda, GTD-style workflows, time tracking, archiving. When information is in an .org file, the full Emacs ecosystem is available — you can schedule it, tag it, refile it, clock time on it, link it to other notes. For organizing and acting on information, it’s where I want everything to end up.

The gap is obvious. reMarkable is where I write things down. Org-mode is where things become actionable. Without a bridge, I was manually transcribing notes — which defeats most of the point of capturing on the device in the first place.

How the sync pipeline works

The pipeline has three stages: download, recognize, and structure.

Download is handled by rmapi, a command-line client for the reMarkable cloud API. It downloads notebooks as .rmdoc files — the native reMarkable format, which is essentially a zip archive containing per-page binary stroke data and metadata. I run this for each notebook I want to sync.
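Because an .rmdoc file is a zip archive, it can be inspected with Python's standard library before any recognition happens. This is a small sketch only. It assumes the per-page stroke files use the `.rm` extension, which the post does not confirm, and `list_page_files` is an illustrative name:

```python
import zipfile
from typing import List


def list_page_files(rmdoc_path: str) -> List[str]:
    """Return the names of per-page stroke files inside an .rmdoc archive."""
    with zipfile.ZipFile(rmdoc_path) as archive:
        # Assumption: page stroke data is stored in members ending in ".rm";
        # everything else (metadata and the like) is ignored here.
        return sorted(n for n in archive.namelist() if n.endswith(".rm"))
```

Listing the members first is a cheap way to see how many pages a notebook holds before spending any recognition quota on it.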

Recognition is where handwriting becomes text. I use the MyScript Cloud API, which accepts raw reMarkable page data and returns recognized text. The API is HMAC-authenticated, accepts batched requests, and handles Latin-script handwriting well enough for practical use. The free tier covers 2000 pages per month, which is more than I need.

One piece of engineering worth calling out: hash-based deduplication. Before sending a page to the API, the script computes a hash of the page content and compares it against a local cache. If the page hasn’t changed since the last run, it’s skipped. This matters in practice — a 20-page notebook where you’ve written 2 new pages today sends 2 pages to the API, not 20. Quota is preserved, and the run takes seconds.
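A minimal sketch of the deduplication idea, not the actual script. It assumes SHA-256 as the hash and a JSON file as the cache. The `recognize` callable is hypothetical and stands in for the call to the recognition API:

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict


def load_cache(path: Path) -> Dict[str, str]:
    """Load the hash -> recognized-text cache, or start empty."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_cache(path: Path, cache: Dict[str, str]) -> None:
    """Persist the cache so the next run can skip unchanged pages."""
    path.write_text(json.dumps(cache, ensure_ascii=False, indent=2),
                    encoding="utf-8")


def recognize_cached(page_bytes: bytes, cache: Dict[str, str],
                     recognize: Callable[[bytes], str]) -> str:
    """Call `recognize` only for page content not seen on an earlier run."""
    digest = hashlib.sha256(page_bytes).hexdigest()
    if digest not in cache:
        cache[digest] = recognize(page_bytes)  # the only place quota is spent
    return cache[digest]
```

Keying on the content hash rather than the page number means reordered or renamed pages are still recognized as unchanged.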

Structuring is handled by a Python script that takes the raw recognized text from MyScript and organizes it into Org format. Each notebook becomes an .org file; pages become headings or entries under headings, depending on how the notebook is organized. The results land in emacs-org/remarkable/, organized by notebook name. A notebook called “Projects” on the device becomes emacs-org/remarkable/Projects.org on disk.
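The structuring step can be sketched in a few lines of Python. This is an illustration under simple assumptions (one heading per page, a `#+TITLE` line per notebook), not the author's script. `notebook_to_org` and `org_path` are illustrative names:

```python
from pathlib import Path
from typing import List


def notebook_to_org(notebook: str, pages: List[str]) -> str:
    """Render recognized page texts as one Org document, one heading per page."""
    lines = [f"#+TITLE: {notebook}", ""]
    for number, text in enumerate(pages, start=1):
        lines.append(f"* Page {number}")
        for raw in text.strip().splitlines():
            # A recognized line starting with "*" at column 0 would be parsed
            # as a headline; indenting it by one space prevents that.
            lines.append(" " + raw if raw.startswith("*") else raw)
        lines.append("")
    return "\n".join(lines)


def org_path(org_root: Path, notebook: str) -> Path:
    """Map a notebook name to <org_root>/remarkable/<notebook>.org."""
    return org_root / "remarkable" / f"{notebook}.org"
```

For a notebook named "Projects" and an Org root of `emacs-org`, `org_path` yields `emacs-org/remarkable/Projects.org`, matching the layout described above.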

The whole thing runs nightly via cron at 02:00. By the time I open Emacs in the morning, yesterday’s handwritten notes are already there.

Notebook structure on the device

I keep three notebooks on the reMarkable that feed into this pipeline: Projects, Areas, and Quick Sheets.

Projects holds working notes tied to specific projects — meeting notes, rough plans, things I’m working through. Areas holds reference material and ongoing concerns that don’t have a deadline. Quick Sheets is the capture inbox: whatever I write when I don’t want to think about where it belongs yet. Random ideas, to-do items, things to look up, fragments of thought.

After nightly sync, Quick Sheets becomes the Org file I triage first. Something like “Call Jan re: contract renewal — deadline Friday” appears as a plain text entry. Turning it into a scheduled TODO in Emacs takes ten seconds. The capture happened on paper; the action lives in the agenda.

What works well

Handwriting recognition (HWR) accuracy for clean, reasonably paced handwriting is high enough to be useful without heavy editing. It is not perfect: there are occasional word-level errors, but the recognized text is readable and the meaning is recoverable from context even when individual words are off.

The automation itself is reliable. Once the cron job is set up, I don’t think about it. Notes appear. The hash deduplication means I can re-run the script manually without worrying about duplicates accumulating.

The Org integration is the payoff. Once notes are in .org files, everything Emacs offers is available. Tags, scheduling, refiling into project files, linking to related notes — none of that required any special work on the reMarkable side. The device just needed to get the text into a file.

Limitations and honest trade-offs

The Czech diacritics problem is real. Fast or slightly sloppy handwriting, especially with Czech-specific characters like ě, š, č, ř, produces more recognition errors than clean Latin script. “Přečíst knihu o Kafkovi” might come out as “Precist knihu o Kafkovi”. Readable and context-restorable, but not clean. For notes that matter, a proofread pass is necessary.

Symbols don’t transfer. Diagrams, arrows, flowcharts, mathematical notation — anything that isn’t text is either garbled or empty in the output. A page with a hand-drawn architecture diagram produces only the text labels, if anything. This is a known constraint of the HWR approach, not a bug in the implementation.

The pipeline is one-directional and always will be. reMarkable is a capture device. You write on it; the notes flow to Org. Nothing flows back. This is fine for my workflow, but worth stating clearly.

There are two external dependencies worth noting. rmapi requires reMarkable cloud credentials — if you don’t want to use the cloud, this pipeline doesn’t work as described. MyScript is a third-party API that requires registration and could change pricing or availability. The free tier has been stable, but it’s not self-hosted.

Sync is nightly, not real-time. If I write something at 10pm and need it in Emacs immediately, I run the script by hand. But the default cadence is once a night, and for most of what I write, that’s enough.

How to replicate this

The setup requires some comfort with Python, command-line tools, and cron. It’s not complex, but it’s also not a one-click install.

Tools you need:

rmapi — CLI for reMarkable cloud. Handles authentication and .rmdoc download. (GitHub: juruen/rmapi)
MyScript Cloud API — Handwriting recognition. Register at developer.myscript.com for a free API key. The batch endpoint accepts reMarkable page data directly.
Python 3 — For the post-processing script that structures recognized text into Org format. Standard library is sufficient; no exotic dependencies.
cron — For nightly scheduling.

The rough flow: configure rmapi with your reMarkable credentials, register for a MyScript API key, wire up the Python script to call both, point it at your Org directory, and schedule it. The deduplication cache is a simple JSON file mapping page hashes to recognized text — straightforward to implement.

I’m not publishing the script as a ready-made package because it’s too tied to my specific notebook structure and Org conventions. But the components are all documented, the API is straightforward, and the overall architecture is simple enough to re-implement in an afternoon.

The result: I write on paper and my agenda knows about it by morning. Not magic — just a cron job and a working API key.

Fixing macOS Zoom "Follow Keyboard Focus" in GNU Emacs

Mon, 23 Feb 2026 00:00:00 +0000

I run macOS Accessibility Zoom at 16× magnification. Not occasionally — all the time, every day. Apple’s built-in screen magnifier has a mode called “Follow keyboard focus” that’s supposed to track your text cursor as you type, keeping it visible on screen. Every app I use does this correctly. Terminal, VS Code, Safari, iTerm2 — they all work. Emacs did not.

For years.

I type something. The cursor moves. The Zoom viewport doesn’t follow. I have to scroll manually to find it again. Then type another character. Repeat. If you’re reading this without needing magnification, the description might sound like a minor inconvenience. It isn’t. It’s the kind of friction that makes a tool feel broken — and I use Emacs all day.

So I finally decided to fix it properly.

The Obvious First Attempt

The “Follow keyboard focus” feature in macOS Zoom is event-driven. When a focused UI element changes, Zoom picks it up via the Accessibility API and moves the viewport to where that element is on screen. The standard mechanism for announcing these changes is NSAccessibilityPostNotification().

Seemed straightforward: after each cursor draw, post a notification telling Zoom the selection changed.

NSAccessibilityPostNotification(view, NSAccessibilitySelectedTextChangedNotification);
NSAccessibilityPostNotification(view, NSAccessibilityFocusedUIElementChangedNotification);

I added this to ns_draw_window_cursor() in nsterm.m, rebuilt, tested.

Nothing.

The viewport didn’t move at all.

Here’s why. When Zoom receives AXSelectedTextChanged or AXFocusedUIElementChanged, it doesn’t just accept the notification and move on — it queries back. It calls AXBoundsForRange on the focused element to find out where the cursor actually is. To answer that query, the view needs to conform to the NSAccessibility protocol and implement accessibilityBoundsForRange:.

EmacsView — the main Emacs drawing surface in nsterm.m — is a subclass of NSView. It does not declare NSAccessibility protocol conformance. There’s no @interface EmacsView () <NSAccessibility>, no implementation of accessibilityBoundsForRange:. So when Zoom posts the query, it gets nothing back. No bounds. Zoom shrugs and does nothing.

The notification fires. Zoom hears it. Zoom asks “okay, so where is the cursor?” Emacs cannot answer. The viewport stays put.

I could have gone down the road of implementing proper NSAccessibility conformance on EmacsView. That would technically work. It would also be a massive undertaking — you’d need a full accessibility tree, element hierarchy, all the associated protocol methods. A multi-month project, not a patch. I needed something more surgical.

Finding the Real Answer

When you’re stuck on an obscure macOS API problem, the most useful thing you can do is read the source code of other apps that solved the same problem. iTerm2 is open source. So is Chromium.

Both of them have exactly the same situation as Emacs: a custom NSView for terminal or browser rendering that doesn’t expose a full accessibility tree. And both of them needed Zoom to follow the text cursor. I went looking for how they handled it.

In iTerm2’s PTYTextView.m, there’s a method called refreshAccessibility. It calls a function I hadn’t seen before: UAZoomChangeFocus().

Chromium’s render_widget_host_view_mac.mm does the same thing in its cursor tracking code.

UAZoomChangeFocus() is part of HIServices/UniversalAccess.h, accessible via Carbon/Carbon.h. It’s a Carbon-era API that speaks directly to the Zoom subsystem — bypassing the Accessibility notification infrastructure entirely. No protocol conformance required. No callback. No “where is the cursor?” query. You just call it with the cursor’s screen coordinates and it moves the viewport.

The signature:

OSStatus UAZoomChangeFocus(const CGRect *focusedItemBounds,
                           const CGRect *caretBounds,
                           UAZoomChangeFocusType focusType);

The focusType argument is kUAZoomFocusTypeInsertionPoint, which tells Zoom this is a text cursor — triggering exactly the keyboard focus tracking behavior I needed.

This was the real fix. Not a notification, not an accessibility protocol — a direct API call with explicit coordinates.

The Fix, and a Coordinate Problem

The implementation goes into ns_draw_window_cursor() in nsterm.m, inside the #ifdef NS_IMPL_COCOA block. When Emacs draws the cursor, we know exactly where it is in view-local coordinates. From there it’s a coordinate conversion chain:

1. Convert cursor rect from view-local to window coordinates
2. Convert window coordinates to screen coordinates (AppKit convention)
3. Convert to CGRect
4. Call UAZoomChangeFocus()

Steps 1–3 are standard AppKit. Step 4 would have been trivial — except for a coordinate system mismatch that took me a while to sort out.

macOS has two coordinate conventions. AppKit (NSView, NSWindow) uses bottom-left as the origin, with y increasing upward. CoreGraphics (CGRect, HIServices) uses top-left as the origin, with y increasing downward. UAZoomChangeFocus() expects CoreGraphics screen coordinates.

The natural way to do this conversion is accessibilityConvertScreenRect:, which handles the y-flip for you. But — and here’s the catch — that method is declared on objects that conform to the NSAccessibility protocol. Which, as we’ve established, EmacsView does not.

I tried calling it anyway. Compilation error.

So: manual y-flip. The primary screen height is the key reference point:

CGFloat primaryH = [[[NSScreen screens] firstObject] frame].size.height;
cgRect.origin.y = primaryH - cgRect.origin.y - cgRect.size.height;

This converts the AppKit screen y-coordinate to the CoreGraphics y-coordinate using simple arithmetic. No protocol conformance needed.
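The arithmetic is easy to check in isolation. Here is a tiny worked example in Python, with made-up numbers (a 1080-point-high primary screen and a 20-point-high cursor rect). `appkit_to_cg_y` is an illustrative name:

```python
def appkit_to_cg_y(primary_height: float, y: float, height: float) -> float:
    """Flip a rect's y from bottom-left origin (AppKit) to top-left (CoreGraphics)."""
    # In AppKit, y is the rect's bottom edge measured up from the screen
    # bottom, so its top edge sits at y + height.  Measured down from the
    # top of the primary screen, that top edge is primary_height - (y + height).
    return primary_height - y - height


# A cursor rect 100 points above the bottom of the screen...
flipped = appkit_to_cg_y(1080.0, 100.0, 20.0)
# ...and applying the same flip again recovers the original value.
restored = appkit_to_cg_y(1080.0, flipped, 20.0)
```

Here `flipped` is 960.0 and `restored` is 100.0. Subtracting the height as well as y matters: without it, the result would point at the rect's bottom edge instead of its top-left corner.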

The full implementation:

if (UAZoomEnabled()) {
  /* View-local cursor rect -> window -> screen (AppKit, bottom-left origin). */
  NSRect windowRect = [view convertRect:r toView:nil];
  NSRect screenRect = [[view window] convertRectToScreen:windowRect];
  CGRect cgRect = NSRectToCGRect(screenRect);
  /* Flip y to the CoreGraphics top-left origin, relative to the primary screen. */
  CGFloat primaryH = [[[NSScreen screens] firstObject] frame].size.height;
  cgRect.origin.y = primaryH - cgRect.origin.y - cgRect.size.height;
  UAZoomChangeFocus(&cgRect, &cgRect, kUAZoomFocusTypeInsertionPoint);
}

The UAZoomEnabled() check at the top is important — it avoids any overhead when Zoom isn’t active, so there’s no performance cost for the common case.

One build note: after patching and rebuilding Emacs.app, you need to re-grant Accessibility permission in System Settings. The binary hash changes, macOS treats it as a new application, and Accessibility permissions are keyed to the binary.

It Works

After the rebuild, I enabled Zoom at 16×, opened Emacs, started typing. The viewport followed the cursor. Every character, every line movement, every jump across the file — Zoom tracked it.

I typed in Emacs for ten minutes just to make sure I wasn’t imagining it.

The fix itself is small — about fifteen lines of Objective-C in ns_draw_window_cursor(). The journey to find it was longer: trying the notification approach, understanding why it failed, going through the Accessibility API documentation, reading iTerm2 and Chromium source, finding UAZoomChangeFocus(), working through the coordinate system issue, hitting the compilation error on accessibilityConvertScreenRect:, figuring out the manual y-flip. That’s the actual work. The patch is just the result of it.

The patch has been submitted to the GNU Emacs developers. Hopefully it lands in a future release so nobody else has to track this down. This was a long-standing problem — Emacs being the one editor on macOS that didn’t work with “Follow keyboard focus” — and it’s finally resolved. Everything works beautifully now.

If you’re building a custom NSView on macOS and need Zoom compatibility, skip the accessibility notification approach and go straight to UAZoomChangeFocus(). That’s the right tool for the job.