Welcome to Today’s AIography!

Good afternoon, AI filmmakers.

The text-to-video arena ran a fresh round of blind voting this week, and the standings shifted in a way most of the AI press is going to miss. xAI's Grok Imagine is now in the top five. Veo 3.1 fell to seventeenth. The four-way fight at the top of the board is HappyHorse-1.0, Seedance 2.0, Kling 3.0, and Grok.

That is a different leaderboard than most people in this audience are tooling for.

The other thing that happened this month: agentic video editors stopped being one product and became a category. Two of them shipped, with completely different architectures, in the same window. That is what category formation looks like.

This week's letter is a tools letter. New leaderboard, new CLI, new agentic-editing category, new workflow case study, new venue for the conference that matters. One closing thought on the cost of staying loyal to the wrong tool.

Let me walk you through what I saw.

In today’s AIography:

  • What I’m Thinking About

    • The Arena Leaderboard moved. Here’s what it actually says

    • nbcraft just dropped: one CLI, four video and image backends.

    • Two agentic editors dropped this month. The pattern is the news.

    • A 50-minute film built on a phone strapped to a bike helmet.

    • AI on the Lot 2026 is on a real studio lot now.

  • Action Item

  • Short Takes

  • Essential Tools

  • Final Thoughts

Read time: About 8 minutes

WHAT I’M THINKING ABOUT

The Artificial Analysis Video Arena uses blind comparisons and Elo scoring: two videos from the same prompt, two anonymous models, and a human vote for the better one. As of this week:

  1. HappyHorse-1.0 (Alibaba). Elo 1,354.

  2. Dreamina Seedance 2.0 720p. Elo 1,271.

  3. Kling 3.0 1080p Pro. Elo 1,249.

  4. Kling 3.0 Omni 1080p Pro. Elo 1,233.

  5. Grok Imagine. Elo 1,233.

Veo 3.1 sits at seventeenth. Runway Gen-4.5 sits at eleventh.

If you came up reading the same press cycles I did, your gut model of the leaderboard is probably Veo on top, Kling close behind, Runway hanging in there. That model is wrong now. Half the AI filmmaking creators on X are still treating Veo as the gold standard. The blind-comparison data says viewers prefer the Alibaba model to the Google one by a 146-point Elo gap. That gap is enormous.
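For calibration, here is the standard Elo arithmetic, assuming the arena uses the conventional 400-point logistic scale (the usual convention, though the ranking page does not spell it out):

  # Expected preference rate implied by an Elo gap on the standard
  # 400-point logistic scale.
  def expected_win_rate(elo_gap: float) -> float:
      return 1 / (1 + 10 ** (-elo_gap / 400))

  print(f"{expected_win_rate(146):.0%}")  # ~70%

A 146-point gap means the higher-rated model is expected to win roughly seven out of ten blind matchups.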

The Grok entry is the part that should make you sit up. xAI was not in the top twenty last month. They are tied for fourth this week. Their first video model debuted strong and the audience voted with it. Whether you like Elon's company or not is beside the point. The model is competitive on output, and the audience saw it without knowing the brand.

Here is what to do this week. Pull the same prompt you trust most, the one you use to test whether a new model is worth your time. Run it through Grok Imagine, HappyHorse, and Seedance side by side. Watch the result the way you watch dailies. The model that wins your eye on your prompt is the model you should be routing to for that kind of shot. Stop treating model selection as a brand decision. It is a frame decision. The arena is telling us the brand answers are stale.

If you have ever sat in front of three browser tabs, three API keys, and three slightly different prompt boxes trying to do the same thing across Gemini, Seedance, and gpt-image-2, this is for you.

nbcraft is a small Python CLI that fronts four backends from a single command line. Google Gemini for stills. Alibaba DashScope for Wan image models. ByteDance Volcengine Ark for both Seedream stills and Seedance video. OpenAI for gpt-image-2. One install, one config, one syntax.

The flag set covers what most working creators actually use. Aspect ratio. Resolution up to 4K where the backend supports it. Reference images, with limits that vary by backend (Gemini and Seedream allow up to 14, OpenAI goes to 16, DashScope caps at 9). Repeat batches. Prompt-from-file with the @prompt.txt notation.

Here is the honest part. nbcraft does not expose explicit seed flags or universal negative-prompt flags in its current docs. Negative prompts are documented as backend-specific. DashScope handles them natively, the others fold them into the main prompt or ignore them. If you depend on seed-locked iterations for shot consistency across a sequence, this CLI is not yet your tool. If you depend on running the same brief across multiple backends to compare output quickly, it absolutely is.

The license is MIT. Install is one pip command. The repo crossed fifty stars in a week. Worth a Wednesday afternoon to test.

For working creators, the question is whether the wrapper saves more time than it costs. The honest answer is: yes for prompt comparison passes, no for production-locked shots. Use it the way you would use a digital asset manager. To find the version you want before you commit to the version you ship.
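If you want to script that comparison pass rather than type it four times, the harness is a few lines. The backend and flag names below are illustrative assumptions, not nbcraft's documented syntax, so check the repo's README and swap in the real options before running it; the @prompt.txt notation is the one piece the docs do describe.

  import subprocess

  # Run the same brief across every backend nbcraft fronts, dropping each
  # result into its own folder so the outputs can be reviewed side by side.
  # NOTE: --backend, --ar, and --out are illustrative stand-ins, not
  # nbcraft's documented flag names. Check the README before running.
  BACKENDS = ["gemini", "seedream", "dashscope", "openai"]

  for backend in BACKENDS:
      subprocess.run(
          ["nbcraft", "--backend", backend, "--ar", "16:9",
           "--out", f"compare/{backend}", "@prompt.txt"],
          check=True,
      )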

Image generated with ChatGPT Image 2

This is the section I have been waiting to write.

Two agentic video editors shipped on GitHub inside the last month, with completely different design choices, from completely different teams. When two architecturally opposite solutions land in the same window, you are not watching a tool launch. You are watching a category form.

Product one is video-use, from the team behind browser-use. The browser-use crew is a known shipping team whose browser-automation agent framework has become real infrastructure inside the agent-tools community. Their video editor sits on top of any coding agent with shell access. Claude Code, Codex, Hermes, OpenClaw. The agent reads your raw footage, runs FFmpeg under the hood, transcribes through ElevenLabs Scribe, removes filler words, applies color grading, adds thirty-millisecond audio fades at cut points, burns in subtitles, generates animation overlays through HyperFrames or Remotion or Manim or PIL. It self-evaluates rendered output at cut boundaries. You feed it a folder, you give it a free-form prompt, you get a final.mp4. Six thousand five hundred stars in twenty-four days. That is roughly two hundred and seventy stars a day, which puts it in the top tier of agent-tooling repos this year. The framework move.
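To make one of those steps concrete: the thirty-millisecond audio fades at cut points are a standard FFmpeg afade pass. video-use's exact filter chain is not quoted here, so read this as a minimal sketch of the technique rather than the repo's code.

  import json
  import subprocess

  def clip_duration(path: str) -> float:
      # Ask ffprobe for the clip's duration in seconds.
      probe = subprocess.run(
          ["ffprobe", "-v", "quiet", "-print_format", "json",
           "-show_format", path],
          capture_output=True, text=True, check=True)
      return float(json.loads(probe.stdout)["format"]["duration"])

  def fade_cut_points(src: str, dst: str, fade: float = 0.03) -> None:
      # 30 ms audio fade-in at the head and fade-out at the tail of a clip,
      # leaving the video stream untouched. Clips treated this way can be
      # concatenated without audible clicks at the cuts.
      dur = clip_duration(src)
      audio_filter = (f"afade=t=in:st=0:d={fade},"
                      f"afade=t=out:st={dur - fade}:d={fade}")
      subprocess.run(["ffmpeg", "-y", "-i", src, "-af", audio_filter,
                      "-c:v", "copy", dst], check=True)

  fade_cut_points("cut_03.mp4", "cut_03_faded.mp4")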

Product two is agentic-video-editor, from a solo developer named poseljacob. Different architecture, same goal. agentic-video-editor is a hard-coded four-agent pipeline using Gemini specifically. A Director agent picks shots from your footage and produces an EditPlan. A Trim Refiner agent tightens shot boundaries. An Editor agent (FFmpeg and MoviePy) renders the EditPlan. A Reviewer agent scores the output across five dimensions and triggers a retry if the score is below the threshold. The output is versioned: name_v1.mp4, name_v2.mp4, and so on until the Reviewer is satisfied. The input is a structured JSON brief: product name, target audience, tone, duration, optional style reference. Around four hundred stars. Specifically built for commercial ad cutting. The vertical-app move.

Same problem. Opposite architectures. Both shipping in the same month.

What is common to both is what working filmmakers should be paying attention to. Both lean on FFmpeg as the substrate, the open-source video toolkit that has been the spine of post for more than two decades. Both use a Reviewer-loop pattern where the agent scores its own work and reruns when it falls short. Both treat the editor's craft as a sequence of decisions an agent can be trained to imitate, hand back to a human, and iterate on.
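That Reviewer-loop pattern is simple enough to sketch. What follows is an illustration of the shape, not code from either repo; the brief fields mirror what agentic-video-editor documents, and the render and score callables stand in for the Editor and Reviewer agents.

  from pathlib import Path

  def reviewer_loop(brief, render, score, threshold=0.8, max_retries=3):
      # Render a cut, let the Reviewer score it across its dimensions,
      # and retry (with versioned output) until every dimension clears
      # the bar or the retry budget runs out.
      out, notes = None, []
      for version in range(1, max_retries + 1):
          out = Path(f"{brief['product_name']}_v{version}.mp4")
          render(brief, notes, out)         # Editor agent renders this version
          scores = score(out)               # Reviewer agent: dimension -> 0..1
          notes = [dim for dim, s in scores.items() if s < threshold]
          if not notes:                     # everything passed; ship it
              return out
      return out                            # best effort after the retry budget

  # Hypothetical brief mirroring the documented fields:
  brief = {"product_name": "trail_shoe", "target_audience": "weekend trail runners",
           "tone": "energetic", "duration_seconds": 30, "style_reference": None}
  # final_cut = reviewer_loop(brief, render=editor_agent, score=reviewer_agent)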

The category emerging here is not "the model generates the cut." It is "the agent operates the editor." That is the architectural piece that matters. The model that generates the shot is upstream. The agent that operates the editor is the layer that working post pros are actually going to integrate into their pipelines.

Here is what to do this week. Pick the one whose framing fits your work. video-use if you cut a wide variety of content and want a flexible agent layer. agentic-video-editor if you cut commercial ads and want a structured brief format. Run a recent project through it. Watch the cut the agent gives you. Read the work it did and the work it skipped. Then think about which parts of your assistant-editor pipeline this could plausibly cover, and which parts it cannot touch yet.

The next year is going to be the year these patterns land inside Premiere and Resolve as panels. When that happens, the conversation about AI in editorial gets very different. It stops being "the model generated something." It starts being "the agent stack delivered a passing first cut." Both repos are worth bookmarking now so you understand the shape of the panel before it ships.

Image generated with ChatGPT Image 2

While you were watching the arena fight, a small studio called Inferstudio published the workflow for "Daughter of the Inner Stars" on the Unreal Engine spotlight blog. Fifty minutes of symphonic orchestral content with live narration, animation, and a cinematic score. A young girl, Fia, searching for her father in a cosmic war.

The production is the story. Not the score, not the score's emotional arc, not the visual identity, though all of those land. The production.

For facial performance capture, the team used Live Link Face. That is the free iPhone app that ships with Unreal Engine. They mounted the phone on a bike helmet using a selfie stick, ankle weights, and what Production Designer Nathan Su calls "a bit too much duct tape." For body capture, Autodesk Flow Studio. For character creation, MetaHuman and MetaHuman Animator. For mesh customization (eye scaling, custom hair groom on the lead character), Blender. For final stylization, a custom Stable Diffusion AnimateDiff pass at 30 to 50 percent opacity over the Unreal Engine render.

That last step is the craft layer. The Stable Diffusion pass is not a transformation. It is a controlled blending operation designed to remove the digital tells of a real-time render (aliasing, blurring artifacts, texture seams) without overhauling the image. Painterly diffusion brushed onto cinematic geometry. Restraint applied to a tool that begs for excess.

Nour Hassoun, the Founder and CEO of IMPERSONAS (the digital human studio collaborating with Inferstudio), said the line that sums up the moment. "Performance capture was something I never would have dreamed a team of our size and skillset could tackle, even only a year before we took on this project."

That is a workflow you can lift. The bike-helmet rig produces facial capture data MetaHuman Animator can read. You probably own the parts. The Stable Diffusion light-pass at 30 to 50 percent opacity is a technique you can implement with any local Stable Diffusion install and your own AnimateDiff config. The Blender modifications to MetaHuman geometry are the craft signature that elevates the result above the default look.
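Inferstudio's exact comp setup is not published in the spotlight post, but the blend step itself is a few lines once you have matching frame sequences exported for the raw Unreal render and the AnimateDiff pass. A minimal sketch, assuming per-frame PNG sequences with matching filenames:

  from pathlib import Path
  from PIL import Image

  def blend_pass(render_dir, stylized_dir, out_dir, opacity=0.4):
      # Composite each AnimateDiff-stylized frame over the matching Unreal
      # render frame at 30-50 percent opacity (0.4 here): enough to soften
      # the real-time-render tells, not enough to overhaul the image.
      out = Path(out_dir)
      out.mkdir(parents=True, exist_ok=True)
      for render_frame in sorted(Path(render_dir).glob("*.png")):
          base = Image.open(render_frame).convert("RGB")
          style = (Image.open(Path(stylized_dir) / render_frame.name)
                   .convert("RGB").resize(base.size))
          Image.blend(base, style, opacity).save(out / render_frame.name)

  blend_pass("frames/unreal", "frames/animatediff", "frames/final", opacity=0.4)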

If you have been waiting for a hybrid pipeline case study to study before committing to one of your own, this is the one.

Image credit: AI on the Lot

The fourth annual AI on the Lot conference returns May 27 and 28 in Culver City. Keynotes and panels at the Culver Theater. Workshops, demos, and lounging at Culver Labs. The Culver Studios lot, which Amazon MGM owns, is the host environment.

Last year was a hotel ballroom production. This year is a working studio lot. That move is not cosmetic.

Amazon MGM Studios is presenting on the integration of generative AI into active production workflows. That is the first time Amazon MGM has spoken publicly about its internal use of these tools. The session list also includes a startup showcase (ten companies pitching investors), a panel on the economics of synthetic character creators, a session on AI studio approaches across six production houses, and a closing session on user interface for creativity.

For a working filmmaker, the read is simple. AI in production stopped being conference-circuit material the moment a real studio lot opened its gates to host the conference. The buildings vote on the seriousness of an idea. The buildings just voted.

If you can get there May 27 to 28, the panel I am watching for is the Amazon MGM session. Not because Amazon MGM is going to spill the receipts. They will not. Because the room they assemble for that panel will be the people in working production who are integrating AI inside the tentpole pipeline. Watch the room as much as the stage.

If you cannot get there, the live coverage and the post-conference recap from competitors who attend will be your input. Curious Refuge, GenHQ, and a handful of working creators will publish their post-mortems within seventy-two hours. That secondary coverage is genuinely useful this year because the conference is genuinely different.

ACTION ITEM

Image credit: XPRIZE

Google and XPRIZE opened the Future Vision film competition this week. Three and a half million dollars in prize money, in partnership with Range Media Partners. Submission deadline August 15, 2026. The brief is short films and trailers about an "optimistic, technology-forward future" produced with live-action, animation, AI tools, or any mix.

The grand prize is the part to read carefully. The winner gets to develop their three-minute submission into a full-length feature, with Google providing creative and production support through 100 ZEROS.

Fourteen weeks. Three minutes of finished work. A real partnership for a feature. If you have a story sitting on your desk, this is the venue.

SHORT TAKES



  • Stereoscopic Fusion prompt template (LudovicCreator on X). The two-color depth cue is doing the heavy lifting. Most stereoscopic prompts fail because the model cannot compose dimensions it cannot see. Color contrast is the proxy that works. Useful for mood-board and concept-art passes.

  • Seedance 2 fifteen-second MMA cage fight test. Sustained action geometry past second eight, where most models break. Worth studying which prompt structure preserved the chain-link geometry across the full fifteen seconds.

  • GPT-Image-2 Driven Prompt Engine. Pure documentation repo. Two hundred stars in four days. A community-curated reference library for the model most of us are already routing to for hero stills.

  • parlor on-device multimodal. Voice and vision running entirely local. No cloud calls. Sixteen hundred stars in thirty days. Worth a test if you work in environments where cloud round-trips are not acceptable.

  • AI on the Lot Seedance commute (gabemichael_ai). A short Seedance 2.0 piece previewing the trip to the conference. Light, well-cut.

  • Matt Wolfe's Coinbase comment. The vibe-coded-bank meme made a real point. Vibe-coding is fine for prototypes and not fine for the conform pass on a delivery deadline.

ESSENTIAL TOOLS

AI Filmmaking & Content Creation Tools Database

Check out the Alpha version of our AI Tools Database. We will be adding to it on a regular basis.

Got a tip about a great new tool? Send it along to us at: [email protected]

FINAL THOUGHTS


The cost of supervising your tools

On X this week, Matt Wolfe posted what a lot of us are quietly thinking. After four months of using OpenClaw, he wrote: "I feel like lately I'm spending more time troubleshooting issues with it and telling it what it's doing wrong than I am actually getting valuable use from it."

The post landed. As of this morning's read, it had over five hundred likes, nine retweets, two hundred and fifty replies, and one hundred and eighty thousand views.

The reason it landed is not the specific tool. The reason it landed is the experience. A lot of us are sitting at our desks running a tool that we adopted six or twelve months ago because it solved a real problem, and we are now spending more time supervising the tool than working through the problem.

That is the moment to stop.

When the time you spend telling a tool what it is doing wrong exceeds the time it saves you, the tool is not pulling its weight anymore. It is no longer a tool. It is a teammate, and not a great one. Tools are supposed to compress your work. Teammates expand it.

For a working creator, this is a permission slip to put a tool down. You are not a worse craftsperson if Cursor or OpenClaw or your favorite AI assistant has stopped earning its keep on your projects this month. You are a better craftsperson for noticing.

Here is what I would do this week. Pick one tool you currently use that you have been quietly frustrated with. Pull thirty minutes of time you would have spent inside it and instead spend that time working without it on the same problem. See which version of you produced the better cut, the better still, the better edit. The answer is not always going to be the human-only version. Sometimes the AI tool wins. Sometimes you win. Either way, the test is the test.

The tools that pull their weight stay. The tools that drain you go on the bench. That has been the deal with every editing system, plugin, and accelerator since the linear bays gave way to digital. AI is not exempt. The leaderboard moved this week. Your toolkit can move too.

Stay sharp. Keep creating.

— Larry

10x the context. Half the time.

Speak your prompts into ChatGPT or Claude and get detailed, paste-ready input that actually gives you useful output. Wispr Flow captures what you'd cut when typing. Free on Mac, Windows, and iPhone.

What did you think of today's newsletter?

Vote to help us make it better for you.


If you have specific feedback or anything interesting you’d like to share, please let us know by replying to this email.

AIography may earn a commission for products purchased through some links in this newsletter. This doesn't affect our editorial independence or influence our recommendations—we're just keeping the AI lights on!
