Editing Video by Talking to Hermes

No timeline to drag. No Premiere or Resolve to learn. You describe what you want in plain language, and Hermes probes the video, finds the dead air, checks the frames, and renders the final file.

"One sentence to edit a video."

1 What You Need to Get Started

Two free tools do all the heavy lifting. You install them once, and Hermes checks that they are present automatically.

Install the two engines

Open Terminal once and run this. After that, you never touch the terminal again.

brew install ffmpeg mlt
  • ffmpeg — fast cutting and silence detection, great for short clips.
  • mlt (the melt command) — a timeline engine for long videos that won't drift out of sync.
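If you want to confirm the install worked, a check along these lines is enough. This is only a sketch of the kind of lookup involved: how Hermes runs its own check isn't described here, and the function name find_engines is made up for this example.

```python
import shutil

def find_engines(names=("ffmpeg", "melt")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

if __name__ == "__main__":
    for name, path in find_engines().items():
        print(f"{name}: {path or 'missing (try: brew install ffmpeg mlt)'}")
```

Both lines should show a path. If either says "missing", rerun the brew command above.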

The whole idea

Everything happens inside the Hermes Desktop app chat. You type what you want — "clean up this recording" — and Hermes does the rest. No commands to memorize, no copy-paste, no terminal scrolling by.

The cutting and rendering run locally and cost no tokens. Only the visual frame analysis uses tokens, and that can run on your Claude Code or Codex subscription. More on that below.

2 How Hermes Actually Edits

Behind every request, Hermes follows the same proven recipe. You don't run any of this yourself — it's just useful to know what's happening.

Raw Recording → Find the Silence → Keep the Good Parts → Stitch Together → Final Video
                                    ↓
                        Visual Frame Check (at every cut point)

Hermes loads a video editing skill — a built-in instruction set with all the commands, common mistakes, and verification steps. It isn't guessing; it's following a recipe that's already been tested.

Which engine for which job?

Tool   | When it's used                                       | Why
ffmpeg | Short clips (under 2 min), quick trims, frame grabs  | Fast, with built-in silence detection and hardware acceleration
melt   | Long-form, screen recordings, variable frame rate    | Keeps audio in sync and handles tricky source files cleanly
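The choice in the table can be pictured as a small decision rule. The sketch below is one reading of the table, not Hermes's actual logic: the function name pick_engine and its flags are hypothetical, and only the 2-minute threshold comes from this guide.

```python
SHORT_CLIP_LIMIT_S = 120  # "under 2 min", taken from the table

def pick_engine(duration_s, variable_frame_rate=False, screen_recording=False):
    """Return "ffmpeg" for short, simple clips and "melt" for everything else."""
    # Tricky sources go to melt regardless of length (an assumption based on the table).
    if variable_frame_rate or screen_recording:
        return "melt"
    return "ffmpeg" if duration_s < SHORT_CLIP_LIMIT_S else "melt"
```

So a 45-second phone clip would go to ffmpeg, while a 12-minute tutorial or any variable-frame-rate capture would go to melt.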

3 Quick Trims — Short Clips under 2 minutes

Best for quick cleanups and one-off trims. Just point Hermes at the file.

Trim the dead air from this recording: ~/Desktop/demo-clip.mp4

What Hermes does

  1. Inspects the video to read its length, size, and frame rate.
  2. Listens for silent stretches.
  3. Flags any gap of 2 seconds or longer.
  4. Builds a list of the parts worth keeping.
  5. Renders the clean version.
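Steps 2 to 4 can be sketched in a few lines. ffmpeg's silencedetect filter (for example: ffmpeg -i input.mp4 -af silencedetect=noise=-30dB:d=2 -f null -) prints silence_start and silence_end lines, and the code below turns those into a keep list. The -30dB threshold, the function names, and the sample log are assumptions for illustration; only the 2-second rule comes from this guide.

```python
import re

# Patterns for the lines ffmpeg's silencedetect filter writes to its log.
_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")

def parse_silences(log_text):
    """Return [(start, end), ...] in seconds from silencedetect log output."""
    silences, start = [], None
    for line in log_text.splitlines():
        match = _START.search(line)
        if match:
            start = float(match.group(1))
            continue
        match = _END.search(line)
        if match and start is not None:
            silences.append((start, float(match.group(1))))
            start = None
    return silences

def keep_segments(duration, silences, min_gap=2.0):
    """Return the (start, end) spans left after dropping gaps of min_gap or longer."""
    keep, cursor = [], 0.0
    for start, end in sorted(silences):
        if end - start < min_gap:
            continue  # a short pause, leave it in
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep

# Made-up log excerpt in the shape silencedetect produces.
SAMPLE_LOG = """\
[silencedetect @ 0x0] silence_start: 4.5
[silencedetect @ 0x0] silence_end: 9.0 | silence_duration: 4.5
[silencedetect @ 0x0] silence_start: 20.25
[silencedetect @ 0x0] silence_end: 21.0 | silence_duration: 0.75
"""
```

For a 30-second clip with this sample log, the 4.5-second gap is cut and the 0.75-second pause is left alone, so the keep list has two spans.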

You'll see something like: "53 seconds of dead air, gone. One sentence."

4 Long Recordings — Full Tutorials over 2 minutes

Best for full tutorials and screen recordings. Same request, but Hermes switches to the sturdier engine automatically.

Clean up this screen recording: ~/Desktop/hermes-demo-full.mp4

Why a different engine for long video?

On long screen recordings — especially high-frame-rate 4K captures — the fast method can hang or let the audio slowly drift out of sync. You won't spot it in the file details; you'll hear it on playback. The melt engine avoids both problems, so Hermes reaches for it on anything substantial.

The #1 mistake to know about

This is the single most important gotcha. When telling melt where to cut, write times in clock format, hh:mm:ss.ms. A bare decimal like 15.826 is not treated as seconds: melt reads it as frame 15, and your 12-minute video collapses into 6 seconds. Hermes handles this for you, but it's the thing people trip on most.

✗ Wrong
in=15.826
read as frame 15 — video destroyed
✓ Right
in=00:00:15.826
read as 15.8 seconds — correct
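If you ever build these values yourself, a small formatter removes the risk. The sketch below always emits hh:mm:ss.ms and then assembles a melt argument list from a keep list. The helper names to_melt_clock and melt_args are hypothetical, and the command shape (source file followed by in= and out=, then an avformat consumer) is a minimal example rather than the exact command Hermes runs.

```python
def to_melt_clock(seconds):
    """Format seconds as hh:mm:ss.ms so melt reads the value as time, not frames."""
    if seconds < 0:
        raise ValueError("time must be non-negative")
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"

def melt_args(source, segments, output):
    """Build a melt argument list that plays each kept segment back to back."""
    args = ["melt"]
    for start, end in segments:
        args += [source, f"in={to_melt_clock(start)}", f"out={to_melt_clock(end)}"]
    args += ["-consumer", f"avformat:{output}"]
    return args
```

Working in whole milliseconds before splitting into hours, minutes, and seconds avoids floating-point surprises in the last digit.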

Quality settings, in plain terms

Setting           | What it means
Visually lossless | Looks identical to the source (the default)
Balanced          | Good trade-off between speed and quality
Faster draft      | Quicker render, slightly larger file (good for previews)
Maximum quality   | Slowest render, smallest high-quality file

Just say what you want ("render a quick draft" or "max quality, take your time") and Hermes picks the right settings.
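Under the hood, a setting like this usually comes down to a couple of encoder options. The mapping below is purely illustrative: the CRF numbers and preset names are assumptions picked to match the descriptions in the table, not the values Hermes actually uses. The flags themselves (-c:v, -crf, -preset) are standard ffmpeg options for the libx264 encoder.

```python
# Illustrative x264 settings only; these numbers are assumptions, not Hermes's real presets.
QUALITY_PROFILES = {
    "visually lossless": {"crf": 18, "preset": "medium"},
    "balanced": {"crf": 23, "preset": "medium"},
    "faster draft": {"crf": 23, "preset": "veryfast"},
    "maximum quality": {"crf": 18, "preset": "veryslow"},
}

def encoder_args(profile="visually lossless"):
    """Return ffmpeg encoder flags for a named quality profile."""
    settings = QUALITY_PROFILES[profile.lower()]
    return ["-c:v", "libx264", "-crf", str(settings["crf"]), "-preset", settings["preset"]]
```

Lower CRF means higher quality, and slower presets spend more time to squeeze the file smaller at the same quality.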

Long renders don't block you

Hermes runs big renders in the background and notifies you when they're done. Keep working on other things in the meantime.
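The general pattern behind a non-blocking render is simple: start the job on a worker thread and fire a callback when it exits. This is a minimal sketch of that pattern, not Hermes's implementation. The name run_in_background is hypothetical, and the demo uses a do-nothing child process as a stand-in for a real render command.

```python
import subprocess
import sys
import threading

def run_in_background(cmd, on_done):
    """Run cmd in a worker thread and call on_done(returncode) when it exits."""
    def worker():
        result = subprocess.run(cmd, capture_output=True, text=True)
        on_done(result.returncode)
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

if __name__ == "__main__":
    # Stand-in for a long render: a child Python process that exits immediately.
    job = run_in_background(
        [sys.executable, "-c", "pass"],
        lambda code: print(f"Render finished with exit code {code}"),
    )
    print("Render started; you can keep working.")
    job.join()
```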

5 Frame Analysis — Hermes Can Actually See

This is the part a traditional editor can't do. Silence detection finds where the gaps are; vision analysis confirms whether the cut actually looks clean.

What costs tokens, and what doesn't

The cutting itself — trimming, stitching, rendering — runs locally on your machine and costs no tokens at all. The only step that uses tokens is vision analysis (looking at frames and scenes). And even that can be handed off to a subscription you already pay for: point the skill at your Claude Code plan ("use Claude to analyze") or your Codex plan. Set it once in the skill and the frame-checking runs on your subscription instead of per-token billing.

Check that your cuts are clean

Check the cut points — are they clean?

Hermes grabs a frame at each cut and looks at it, then reports back in the chat:

"Frame at 2:20 — terminal window, command finished, cursor blinking. Clean cut point. ✓"

This catches cuts that land mid-word or on a loading screen that's about to finish — things audio detection alone would miss.
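Grabbing the frames is the cheap, local half of this check. A sketch of how the grab commands might be built is below: one ffmpeg call per cut point, each saving a single JPEG. The function name, the output folder, and the file naming are assumptions; the flags (-ss to seek, -frames:v 1 for one frame, -q:v for JPEG quality) are standard ffmpeg options.

```python
def frame_grab_commands(source, cut_points, out_dir="frames"):
    """Build one ffmpeg command per cut point; each saves a single JPEG frame."""
    commands = []
    for i, t in enumerate(cut_points):
        commands.append([
            "ffmpeg", "-y",
            "-ss", f"{t:.3f}",   # seek to the cut point, in seconds
            "-i", source,
            "-frames:v", "1",    # write exactly one video frame
            "-q:v", "2",         # high JPEG quality
            f"{out_dir}/cut_{i:03d}.jpg",
        ])
    return commands
```

A cut at 2:20 is 140 seconds, so frame_grab_commands("demo.mp4", [140.0]) produces one command that seeks to 140.000 and writes frames/cut_000.jpg. The saved images are what the vision step then looks at.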

Break down a whole video without watching it

Analyze this cooking video — give me the scene breakdown with timestamps: 2026-5-18-Cooking Video.mp4

For a 44-minute video, Hermes samples a frame roughly every minute, identifies what's happening in each, cross-checks it against the spoken audio, and builds a timeline:

00:00 — Ingredients setup, strawberries on counter
01:00 — Gelatin blooming in bowl
03:00 — Real strawberry closeup
05:00 — Blender assembly
07:00 — Transfer mixture to pot
09:00 — Stovetop cooking
11:00 — Filling bear molds

44 minutes of video, broken into scenes, without watching a second of it.
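The sampling itself is plain arithmetic. As a rough sketch, assuming a fixed one-minute interval (the guide only says "roughly every minute") and hypothetical helper names:

```python
def sample_times(duration_s, every_s=60):
    """Return the timestamps, in seconds, at which to grab a frame."""
    return list(range(0, int(duration_s), every_s))

def label(seconds):
    """Format seconds as mm:ss, matching the timeline style above."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"
```

For a 44-minute video that is 44 frames, from 00:00 to 43:00, which is why the vision pass stays cheap compared with analyzing every frame.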

Get your publishing metadata too

The same pass can generate a title, description, and timestamped chapters for YouTube — written straight to a file on your Desktop. One conversation.
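Chapters are just the scene list in a specific text shape. The sketch below formats a scene list into chapter lines and enforces the one rule that most often breaks them: YouTube expects the first chapter to start at 00:00. The function name and the shortened scene titles are illustrative.

```python
def youtube_chapters(scenes):
    """Turn [(seconds, title), ...] into chapter lines for a video description."""
    scenes = sorted(scenes)
    if not scenes or scenes[0][0] != 0:
        raise ValueError("the first chapter must start at 00:00")
    lines = []
    for seconds, title in scenes:
        minutes, secs = divmod(int(seconds), 60)
        lines.append(f"{minutes:02d}:{secs:02d} {title}")
    return "\n".join(lines)

# Shortened from the cooking-video timeline above.
SCENES = [(0, "Ingredients setup"), (60, "Gelatin blooming"), (300, "Blender assembly")]
```

Paste the resulting lines into the video description and YouTube turns them into clickable chapters.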

6 Edit From Anywhere

The same agent and the same skills work on Desktop and through Telegram. You don't have to be at your desk.

Hermes Desktop

You watch the agent's reasoning in the chat — what it found, the plan it made, frame previews appearing inline, and a final report like "Done. Output is 9:25, removed 4 gaps totaling 53s." No terminal window anywhere on screen.
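That final report is easy to reproduce from the list of removed gaps. In the sketch below the function name and the gap times are invented, chosen only so the numbers add up to the sample message: a 618-second source with four gaps totaling 53 seconds.

```python
def summarize(original_s, removed_gaps):
    """Build a one-line report from the source length and the removed (start, end) gaps."""
    removed = sum(end - start for start, end in removed_gaps)
    minutes, secs = divmod(round(original_s - removed), 60)
    return (f"Done. Output is {minutes}:{secs:02d}, "
            f"removed {len(removed_gaps)} gaps totaling {round(removed)}s.")

if __name__ == "__main__":
    gaps = [(10, 25), (100, 112), (300, 316), (500, 510)]  # made-up gap times
    print(summarize(618, gaps))
```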

Telegram

Someone sends you a video on your phone? Forward it to Hermes and say "clean this up." Same agent, same pipeline, same result — no app to open, no desk required.

Why this beats running commands yourself

On your own                | With Hermes
You remember the commands  | The agent knows the workflow
You debug errors alone     | The agent avoids the known pitfalls
You check frames manually  | Visual checks happen inline
You write the metadata     | Titles, descriptions, and chapters are generated

7 The Three Things to Remember

1. Time format on long edits

Always hh:mm:ss.ms, never a bare decimal — a stray decimal turns minutes into seconds. Hermes handles it, but it's the most common manual error.

2. Long recordings need the sturdier engine

The fast method can hang or drift out of sync on long, high-frame-rate screen recordings. Hermes switches engines automatically for anything substantial.

3. Vision-check your cuts

Silence detection is audio-only. Your eyes — and Hermes's frame checks — verify that a cut actually looks clean.

Quick Reference

Situation                         | What to say to Hermes
Short clip, quick trim            | "Trim the dead air from this clip: [file]"
Long tutorial or screen recording | "Clean up this screen recording: [file]"
Verify the cuts look right        | "Check the cut points — are they clean?"
Understand a long video fast      | "Give me a scene breakdown with timestamps: [file]"
Get publishing info               | "Write a YouTube title, description, and chapters for this."

One-time setup checklist

  • Run brew install ffmpeg mlt once in Terminal
  • Open the Hermes Desktop app (or use Telegram)
  • Have your video file ready (Desktop is easiest)
  • Describe what you want in plain language — that's it