No timeline to drag. No Premiere or Resolve to learn. You describe what you want in plain language, and Hermes probes the video, finds the dead air, checks the frames, and renders the final file.
Two free tools do all the heavy lifting. Hermes checks for them automatically, so you only install them once.
Open Terminal once and run this. After that, you never touch the terminal again.
```bash
brew install ffmpeg mlt
```
- ffmpeg: the fast engine for short clips, quick trims, frame grabs, and silence detection.
- MLT (the `melt` command): a timeline engine for long videos that won't drift out of sync.

Everything happens inside the Hermes Desktop app chat. You type what you want ("clean up this recording") and Hermes does the rest. No commands to memorize, no copy-paste, no terminal scrolling by.
The cutting and rendering run locally and cost no tokens. Only the visual frame analysis does — and that can run on your Claude Code or Codex subscription. More on that below.
Behind every request, Hermes follows the same proven recipe. You don't run any of this yourself — it's just useful to know what's happening.
Raw Recording → Find the Silence → Keep the Good Parts → Stitch Together → Final Video, with a Visual Frame Check at every cut point.
Hermes loads a video editing skill — a built-in instruction set with all the commands, common mistakes, and verification steps. It isn't guessing; it's following a recipe that's already been tested.
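The first two steps of the recipe, "Find the Silence" and "Keep the Good Parts", can be sketched as below. This is an illustration rather than the skill's actual code. It assumes the silence log comes from a command such as `ffmpeg -i input.mp4 -af silencedetect=noise=-30dB:d=0.5 -f null -` (the -30 dB threshold and 0.5 s minimum are assumed values). The function names and the `pad` default are hypothetical.

```python
import re

# ffmpeg's silencedetect filter logs lines like these to stderr:
#   [silencedetect @ 0x...] silence_start: 12.5
#   [silencedetect @ 0x...] silence_end: 15.0 | silence_duration: 2.5
START_RE = re.compile(r"silence_start:\s*(-?[\d.]+)")
END_RE = re.compile(r"silence_end:\s*([\d.]+)")


def parse_silences(log: str) -> list[tuple[float, float]]:
    """Return (start, end) pairs for every silence that has both ends logged."""
    silences = []
    start = None
    for line in log.splitlines():
        if m := START_RE.search(line):
            start = max(0.0, float(m.group(1)))  # clamp tiny negative starts
        elif (m := END_RE.search(line)) and start is not None:
            silences.append((start, float(m.group(1))))
            start = None
    return silences


def keep_segments(silences, duration, pad=0.15):
    """Invert silences into the (start, end) ranges worth keeping.

    `pad` (a hypothetical default) leaves a little silence on each side of
    every cut so the first and last syllables are not clipped.
    """
    keeps = []
    cursor = 0.0
    for s_start, s_end in silences:
        cut_start, cut_end = s_start + pad, s_end - pad
        if cut_end <= cut_start:  # gap too short to cut once padded
            continue
        if cut_start > cursor:
            keeps.append((round(cursor, 3), round(cut_start, 3)))
        cursor = cut_end
    if cursor < duration:
        keeps.append((round(cursor, 3), round(duration, 3)))
    return keeps
```

The padding is a design choice. Cutting exactly at the detected boundaries tends to sound abrupt, so a sketch like this trims slightly less than the full gap.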
| Tool | When it's used | Why |
|---|---|---|
| ffmpeg | Short clips (under 2 min), quick trims, frame grabs | Fast, with built-in silence detection and hardware acceleration |
| melt | Long-form, screen recordings, variable frame rate | Keeps audio in sync and handles tricky source files cleanly |
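The table's routing rule can be written as a small function. The sketch below is hypothetical: the 2-minute cutoff comes from the table, but the 1% variable-frame-rate heuristic is an assumed rule. The inputs are the `r_frame_rate` and `avg_frame_rate` fields that ffprobe reports for a video stream.

```python
SHORT_CLIP_SECONDS = 120  # "under 2 min", from the table above


def parse_rate(rate: str) -> float:
    """Turn an ffprobe rate such as "30000/1001" into frames per second."""
    num, _, den = rate.partition("/")
    denominator = float(den or 1)
    return float(num) / denominator if denominator else 0.0


def choose_engine(duration_s: float, r_frame_rate: str, avg_frame_rate: str,
                  screen_recording: bool = False) -> str:
    """Hypothetical rule: ffmpeg for short, steady clips; melt otherwise."""
    nominal, average = parse_rate(r_frame_rate), parse_rate(avg_frame_rate)
    # Heuristic: if the nominal and average rates disagree by more than 1%,
    # treat the file as variable frame rate. "0/0" means ffprobe could not tell.
    variable = nominal == 0 or average == 0 or abs(nominal - average) / nominal > 0.01
    if duration_s < SHORT_CLIP_SECONDS and not variable and not screen_recording:
        return "ffmpeg"
    return "melt"
```

When the rule is unsure, it falls back to melt, the sturdier engine.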
Best for quick cleanups and one-off trims. Just point Hermes at the file.
You'll see something like: "53 seconds of dead air, gone. One sentence."
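On the fast path, one common way to keep several ranges and join them is a single ffmpeg `-filter_complex` graph built from `trim`/`atrim` plus `concat`. The sketch below assumes that technique. The skill may use a different ffmpeg approach, and the function names are hypothetical. It takes a list of (start, end) ranges to keep, in seconds.

```python
def build_concat_filter(keeps):
    """Build an ffmpeg -filter_complex string that keeps only `keeps`.

    Each (start, end) range is trimmed from the video and audio streams,
    timestamps are reset with setpts/asetpts, and concat joins the pieces.
    """
    parts, labels = [], []
    for i, (start, end) in enumerate(keeps):
        parts.append(f"[0:v]trim=start={start}:end={end},setpts=PTS-STARTPTS[v{i}]")
        parts.append(f"[0:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS[a{i}]")
        labels.append(f"[v{i}][a{i}]")
    parts.append(f"{''.join(labels)}concat=n={len(keeps)}:v=1:a=1[outv][outa]")
    return ";".join(parts)


def build_trim_command(src, dst, keeps):
    """Full ffmpeg argument list; run it with subprocess.run(cmd, check=True)."""
    return [
        "ffmpeg", "-i", src,
        "-filter_complex", build_concat_filter(keeps),
        "-map", "[outv]", "-map", "[outa]",
        dst,
    ]
```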
Best for full tutorials and screen recordings. Same request, but Hermes switches to the sturdier engine automatically.
On long screen recordings — especially high-frame-rate 4K captures — the fast method can hang or let the audio slowly drift out of sync. You won't spot it in the file details; you'll hear it on playback. The melt engine avoids both problems, so Hermes reaches for it on anything substantial.
This is the single most important gotcha. When telling melt where to cut, time must be written as hh:mm:ss.ms. If you pass a bare decimal like 15.826, melt reads it as frame 15 — and your 12-minute video collapses into 6 seconds. Hermes handles this for you, but it's the thing people trip on most.
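The conversion Hermes has to get right can be sketched in a few lines. This minimal illustration assumes melt's `in=`/`out=` producer properties and its `avformat` consumer. `to_clock` and `build_melt_command` are hypothetical names, and encoder options are left out.

```python
def to_clock(seconds: float) -> str:
    """Format seconds as hh:mm:ss.mmm, the form melt reads as a time."""
    total_ms = round(seconds * 1000)  # integer milliseconds avoids float drift
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_melt_command(src, dst, keeps):
    """One producer per kept (start, end) range, stitched into one output."""
    args = ["melt"]
    for start, end in keeps:
        # in=/out= apply to the producer just before them. A bare decimal
        # here would be read as a frame number, not seconds.
        args += [src, f"in={to_clock(start)}", f"out={to_clock(end)}"]
    args += ["-consumer", f"avformat:{dst}"]
    return args
```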
- Wrong: `in=15.826` (read as frame 15)
- Right: `in=00:00:15.826` (read as 15.826 seconds)

| Setting | What it means |
|---|---|
| Visually lossless | Looks identical to the source (the default) |
| Balanced | Good trade-off between speed and quality |
| Faster draft | Quicker render, slightly larger file — good for previews |
| Maximum quality | Slowest render, smallest high-quality file |
Just say what you want ("render a quick draft" or "max quality, take your time") and Hermes picks the right settings.
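To make the quality table concrete, here is one plausible way such settings could map onto real libx264 options (`-crf` and `-preset`). The CRF and preset values are assumptions. Hermes's actual choices are not documented here. A lower CRF means higher quality, and slower presets spend more time to shrink the file at the same CRF.

```python
# Hypothetical mapping from the table's settings to libx264 options.
QUALITY_PRESETS = {
    "visually lossless": {"crf": 18, "preset": "medium"},
    "balanced": {"crf": 20, "preset": "fast"},
    "faster draft": {"crf": 18, "preset": "veryfast"},   # quick, bigger file
    "maximum quality": {"crf": 18, "preset": "veryslow"},  # slow, smaller file
}


def encode_args(setting: str) -> list[str]:
    """ffmpeg video-encoder arguments for a named quality setting."""
    opts = QUALITY_PRESETS[setting.lower()]
    return ["-c:v", "libx264", "-crf", str(opts["crf"]), "-preset", opts["preset"]]
```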
Hermes runs big renders in the background and notifies you when they're done. Keep working on other things in the meantime.
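A background render with a completion notice can be sketched with the standard library alone. This is an illustrative pattern, not how Hermes is implemented. `render_in_background` and the `on_done` callback are hypothetical names.

```python
import subprocess
import threading


def render_in_background(cmd, on_done):
    """Start `cmd` without blocking and call on_done(returncode) when it exits."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

    def wait_and_notify():
        on_done(proc.wait())  # blocks only this watcher thread

    watcher = threading.Thread(target=wait_and_notify, daemon=True)
    watcher.start()
    return watcher
```

The caller gets control back immediately. The callback is where a chat message like "Done." would be sent.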
This is the part a traditional editor can't do. Silence detection finds where the gaps are; vision analysis confirms whether the cut actually looks clean.
The cutting itself — trimming, stitching, rendering — runs locally on your machine and costs no tokens at all. The only step that uses tokens is vision analysis (looking at frames and scenes). And even that can be handed off to a subscription you already pay for: point the skill at your Claude Code plan ("use Claude to analyze") or your Codex plan. Set it once in the skill and the frame-checking runs on your subscription instead of per-token billing.
Hermes grabs a frame at each cut and looks at it, then reports back in the chat:
"Frame at 2:20 — terminal window, command finished, cursor blinking. Clean cut point. ✓"
This catches cuts that land mid-word or on a loading screen that's about to finish — things audio detection alone would miss.
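Grabbing a frame at each cut comes down to two pieces: finding the joins, and asking ffmpeg for one frame at a timestamp. The sketch below assumes the kept ranges are (start, end) pairs in seconds and uses ffmpeg's real `-ss` and `-frames:v` options. The function names and output file names are hypothetical.

```python
def cut_points(keeps):
    """The joins in the output: where one kept range ends and the next begins."""
    return [(end, next_start) for (_, end), (next_start, _) in zip(keeps, keeps[1:])]


def frame_grab_command(src: str, at_seconds: float, out_png: str) -> list[str]:
    """ffmpeg args that save the single frame at `at_seconds` as an image."""
    return [
        "ffmpeg", "-ss", f"{at_seconds:.3f}", "-i", src,
        "-frames:v", "1", "-y", out_png,
    ]
```

Checking a frame on each side of a join covers both halves of the cut: the last thing before it and the first thing after it.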
For a 44-minute video, Hermes samples a frame roughly every minute, identifies what's happening in each, cross-checks it against the spoken audio, and builds a timeline:
- 00:00: Ingredients setup, strawberries on counter
- 01:00: Gelatin blooming in bowl
- 03:00: Real strawberry closeup
- 05:00: Blender assembly
- 07:00: Transfer mixture to pot
- 09:00: Stovetop cooking
- 11:00: Filling bear molds
44 minutes of video, broken into scenes, without watching a second of it.
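The once-a-minute sampling schedule is simple arithmetic. This sketch only computes the timestamps and their labels. ffmpeg can also dump one frame per minute directly with `ffmpeg -i input.mp4 -vf fps=1/60 scene_%03d.png`. The function names and the 60-second default are assumptions matching "roughly every minute".

```python
def sample_times(duration_s: float, every_s: float = 60.0) -> list[float]:
    """Timestamps (seconds) at which to grab a frame, starting at 0."""
    times = []
    t = 0.0
    while t < duration_s:
        times.append(t)
        t += every_s
    return times


def mmss(seconds: float) -> str:
    """Label a timestamp as mm:ss, like the scene list above."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"
```

For a 44-minute video this yields 44 frames, one per minute from 00:00 to 43:00.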
The same pass can generate a title, description, and timestamped chapters for YouTube — written straight to a file on your Desktop. One conversation.
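Turning a scene list into YouTube chapters is a formatting step plus a few checks. YouTube only shows chapters when the first timestamp is 00:00, there are at least three, and each chapter lasts at least 10 seconds. The function below is a hypothetical sketch of that step. Writing the result to a file on your Desktop is left out.

```python
def youtube_chapters(scenes: list[tuple[int, str]]) -> str:
    """Format (seconds, title) pairs as YouTube chapter lines."""
    if len(scenes) < 3 or scenes[0][0] != 0:
        raise ValueError("YouTube chapters need at least 3 entries, starting at 0:00")
    for (t1, _), (t2, _) in zip(scenes, scenes[1:]):
        if t2 - t1 < 10:
            raise ValueError("each chapter must be at least 10 seconds long")
    lines = []
    for seconds, title in scenes:
        minutes, secs = divmod(seconds, 60)
        lines.append(f"{minutes:02d}:{secs:02d} {title}")
    return "\n".join(lines)
```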
The same agent and the same skills work on Desktop and through Telegram. You don't have to be at your desk.
You watch the agent's reasoning in the chat — what it found, the plan it made, frame previews appearing inline, and a final report like "Done. Output is 9:25, removed 4 gaps totaling 53s." No terminal window anywhere on screen.
Someone sends you a video on your phone? Forward it to Hermes and say "clean this up." Same agent, same pipeline, same result — no app to open, no desk required.
| On your own | With Hermes |
|---|---|
| You remember the commands | The agent knows the workflow |
| You debug errors alone | The agent avoids the known pitfalls |
| You check frames manually | Visual checks happen inline |
| You write the metadata | Titles, descriptions, and chapters are generated |
Always hh:mm:ss.ms, never a bare decimal — a stray decimal turns minutes into seconds. Hermes handles it, but it's the most common manual error.
The fast method can hang or drift out of sync on long, high-frame-rate screen recordings. Hermes switches engines automatically for anything substantial.
Silence detection is audio-only. Your eyes — and Hermes's frame checks — verify that a cut actually looks clean.
| Situation | What to say to Hermes |
|---|---|
| Short clip, quick trim | "Trim the dead air from this clip: [file]" |
| Long tutorial or screen recording | "Clean up this screen recording: [file]" |
| Verify the cuts look right | "Check the cut points — are they clean?" |
| Understand a long video fast | "Give me a scene breakdown with timestamps: [file]" |
| Get publishing info | "Write a YouTube title, description, and chapters for this." |
Run `brew install ffmpeg mlt` once in Terminal.