MODELS
The 22 mainstream AI models PromptCraft tracks across 5 categories (image / video / music / chat / audio-transcription). Current versions, prompt counts, and version-change notes. AI models ship a generation every six months — when your prompt stops working, this is the page to check first.
◆ ImageImage(6)
Midjourney
v7LatestMidjourney, Inc. · 2026-03The dominant image-generation model. v7 sharpens photorealism and layout control. Still runs via Discord bot + web app.
Recent changev7 hardens --style raw, makes --sref style reference more stable, mostly grammar-compatible with v6.Flux Pro
1.1 UltraLatestBlack Forest Labs · 2026-02Natural-language friendly, no weight syntax needed, strong photographic photorealism. Ultra pushes resolution to 4MP.
Recent change1.1 Ultra is much more stable on fine details (eyes / hands / text) vs 1.0. Schnell remains the free fast tier.Stable Diffusion
3.5 LargeLatestStability AI · 2026-01The only mainstream model with negative-prompt + full CFG / steps / sampler control. Pick SD for customisation, local runs, or ComfyUI pipelines.
Recent change3.5 Large supersedes 3 Medium as the flagship. SDXL 1.0 remains the community's main base model.Ideogram
3.0LatestIdeogram · 2026-04Currently the only model that renders Chinese + English text 100% reliably. Best for quote cards, poster headlines, wordmark logos.
Recent change3.0 is a major leap on CJK text rendering quality. Magic Prompt auto-adds photoreal modifiers (toggle off if not wanted).Recraft V3
V3StableRecraft · 2025-10Best pick for vector style, icon sets, and illustration series (where style consistency matters).
Adobe Firefly
Image 4 UltraStableAdobe · 2026-02Cleanest commercial-licensing story among image models (training data is Adobe Stock). Built into Photoshop / Illustrator.
◆ VideoVideo(6)
Sora
2LatestOpenAI · 2026-0320s clips + physics consistency + image-to-video. Gated behind ChatGPT Plus / Pro / API access.
Recent changev2 fixes v1's character-consistency drift; camera-motion phrasing actually lands now.Veo
3.1LatestGoogle DeepMind · 2026-038s clips with built-in audio + ambient sound. Callable from Gemini App / Vertex AI / Flow.
Recent change3.1 adds reference-image support; lip-sync is improved.Kling
2.0 MasterLatestKuaishou · 2026-03China-built, Chinese prompts welcome. 10s clips with fine-grained camera control.
Recent change2.0 Master narrows the motion-smoothness gap with Sora.Runway Gen-4
Gen-4StableRunway · 2025-12Veteran creative-pro pick, deepest tool-chain (green-screen, compositing, motion-track control).
Pika 2.2
2.2StablePika Labs · 2025-11Short-form specialist; sketch-to-video under 10s is a sweet spot. Discord bot + web app.
Seedance 2.0
2.0 ProLatestByteDance · 2026-021080p / 5s clips; motion + character consistency leads its price tier.
◆ MusicMusic(2)
Suno
v5.5LatestSuno · 2026-04The dominant music-generation model. 4-minute length + custom mode for lyrics, structure, instruments, vocal style.
Recent changev5.5 improves Chinese-lyric pronunciation; cover-song mode is more stable. Stem separation is a Pro feature.Udio
v1.5StableUdio · 2026-01Other major music-gen tool. Some genres (jazz, classical, world music) are more nuanced than Suno.
◆ ChatChat / Writing / Coding(3)
ChatGPT (GPT-5)
GPT-5LatestOpenAI · 2026-02General chat, writing, coding, agentic. Voice mode, Canvas, Custom GPTs all available.
Claude 4.5
Sonnet 4.5LatestAnthropic · 2026-03Strongest at long-form / editing / nuanced-instruction following. Claude Code, Projects, and Computer Use all available.
Gemini 2.5 Pro
2.5 Pro Deep ThinkLatestGoogle · 2026-041M token context + multimodal (image, audio, video, code) + 3 access tiers (Gemini App / AI Studio / Vertex).
◆ AudioAudio / Transcription (LRC Sync)(4)
Gemini 2.5 Flash (Audio)
2.5 Flash multimodalLatestGoogle · 2026-04Powers the ☆ Gemini mode in LRC Sync v2. Trained on music understanding — does what OpenAI's entire audio family can't: actually listens to AI vocoder / Suno-generated vocals. $1/M audio tokens (40x cheaper than gpt-4o-audio).
Recent changeSupports up to 9.5h audio, 20MB inline, generateContent + audioTimestamp structured JSON output.GPT-4o Audio Preview
gpt-4o-audio-previewStableOpenAI · 2025-03Powers the ★ HD mode in LRC Sync v2. Multimodal chat completions, accepts audio input + reasons about timestamps. But hallucinates evenly-spaced timestamps for AI vocoder songs (CV<5% detection auto-refunds). $40/M audio tokens.
GPT-4o Mini Transcribe
gpt-4o-mini-transcribeStableOpenAI · 2025-03In LRC Sync v2 standard mode, runs in parallel with whisper-1 (dual-model hybrid) to get more accurate text. Lower WER than whisper-1 but no word-level timestamps (text-only). $0.003/min.
OpenAI Whisper
whisper-1LegacyOpenAI · 2022-09Timing source for LRC Sync v2 standard mode. The only OpenAI model with word-level timestamps (newer models dropped this feature). Accurate on human recordings, poor on AI vocoder / Suno-generated vocals (industry-known blind spot). $0.006/min.
Recent changeOld but the only word-level-timestamp ASR in the industry; OpenAI keeps it but unlikely to update.
◆ Don't see the model you use?
PromptCraft tracks 7 major AI model families. If you use Lumalabs Dream Machine, Hailuo, Kling Pro Plus or similar, email promptcraft@prompt.luvai.net — enough requests and we'll add it.