Model tracker

MODELS

The 21 mainstream AI models PromptCraft tracks across 5 categories (image / video / music / chat / audio-transcription), with current versions, prompt counts, and version-change notes. AI models ship a new generation roughly every six months. When your prompt stops working, this is the page to check first.

Last updated: 2026-04 · Next planned review: 2026-05

Image (6)

  • Midjourney

    v7 · Latest
    Midjourney, Inc. · 2026-03

    The dominant image-generation model. v7 sharpens photorealism and layout control. Still runs via Discord bot + web app.

    Recent change: v7 strengthens --style raw and makes --sref style references more stable; prompt syntax is mostly compatible with v6.
  • Flux Pro

    1.1 Ultra · Latest
    Black Forest Labs · 2026-02

    Natural-language friendly, no weight syntax needed, strong photorealism. Ultra pushes resolution to 4MP.

    Recent change: 1.1 Ultra is much more stable than 1.0 on fine details (eyes / hands / text). Schnell remains the free fast tier.
  • Stable Diffusion

    3.5 Large · Latest
    Stability AI · 2026-01

    The only mainstream model with negative-prompt + full CFG / steps / sampler control. Pick SD for customisation, local runs, or ComfyUI pipelines.

    Recent change: 3.5 Large supersedes 3 Medium as the flagship. SDXL 1.0 remains the community's main base model.
  • Ideogram

    3.0 · Latest
    Ideogram · 2026-04

    Currently the only model that renders Chinese + English text 100% reliably. Best for quote cards, poster headlines, wordmark logos.

    Recent change: 3.0 is a major leap in CJK text rendering quality. Magic Prompt auto-adds photoreal modifiers (toggle it off if not wanted).
  • Recraft V3

    V3 · Stable
    Recraft · 2025-10

    Best pick for vector style, icon sets, and illustration series (where style consistency matters).

  • Adobe Firefly

    Image 4 Ultra · Stable
    Adobe · 2026-02

    Cleanest commercial-licensing story among image models (trained on licensed content such as Adobe Stock). Built into Photoshop / Illustrator.

Video (6)

  • Sora

    2 · Latest
    OpenAI · 2026-03

    20s clips + physics consistency + image-to-video. Gated behind ChatGPT Plus / Pro / API access.

    Recent change: v2 fixes v1's character-consistency drift; camera-motion phrasing actually lands now.
  • Veo

    3.1 · Latest
    Google DeepMind · 2026-03

    8s clips with built-in audio + ambient sound. Callable from Gemini App / Vertex AI / Flow.

    Recent change: 3.1 adds reference-image support; lip-sync is improved.
  • Kling

    2.0 Master · Latest
    Kuaishou · 2026-03

    China-built, Chinese prompts welcome. 10s clips with fine-grained camera control.

    Recent change: 2.0 Master narrows the motion-smoothness gap with Sora.
  • Runway Gen-4

    Gen-4 · Stable
    Runway · 2025-12

    Veteran creative-pro pick with the deepest toolchain (green-screen, compositing, motion-track control).

  • Pika 2.2

    2.2 · Stable
    Pika Labs · 2025-11

    Short-form specialist; sketch-to-video under 10s is a sweet spot. Discord bot + web app.

  • Seedance 2.0

    2.0 Pro · Latest
    ByteDance · 2026-02

    1080p / 5s clips; motion + character consistency leads its price tier.

Music (2)

  • Suno

    v5.5 · Latest
    Suno · 2026-04

    The dominant music-generation model. 4-minute length + custom mode for lyrics, structure, instruments, vocal style.

    Recent change: v5.5 improves Chinese-lyric pronunciation; cover-song mode is more stable. Stem separation is a Pro feature.
  • Udio

    v1.5 · Stable
    Udio · 2026-01

    The other major music-generation tool. Its output in some genres (jazz, classical, world music) is more nuanced than Suno's.

Chat / Writing / Coding (3)

  • ChatGPT (GPT-5)

    GPT-5 · Latest
    OpenAI · 2026-02

    General chat, writing, coding, and agentic tasks. Voice mode, Canvas, and Custom GPTs are all available.

  • Claude 4.5

    Sonnet 4.5 · Latest
    Anthropic · 2026-03

    Strongest at long-form / editing / nuanced-instruction following. Claude Code, Projects, and Computer Use all available.

  • Gemini 2.5 Pro

    2.5 Pro Deep Think · Latest
    Google · 2026-04

    1M token context + multimodal (image, audio, video, code) + 3 access tiers (Gemini App / AI Studio / Vertex).

Audio / Transcription (LRC Sync) (4)

  • Gemini 2.5 Flash (Audio)

    2.5 Flash multimodal · Latest
    Google · 2026-04

    Powers the ☆ Gemini mode in LRC Sync v2. Trained on music understanding — does what OpenAI's entire audio family can't: actually listens to AI vocoder / Suno-generated vocals. $1/M audio tokens (40x cheaper than gpt-4o-audio).

    Recent change: Supports up to 9.5h audio, 20MB inline, and generateContent + audioTimestamp structured JSON output.
  • GPT-4o Audio Preview

    gpt-4o-audio-preview · Stable
    OpenAI · 2025-03

    Powers the ★ HD mode in LRC Sync v2. Multimodal chat completions; accepts audio input and reasons about timestamps. However, it hallucinates evenly spaced timestamps for AI-vocoder songs (a CV < 5% check detects this and triggers an automatic refund). $40/M audio tokens.

  • GPT-4o Mini Transcribe

    gpt-4o-mini-transcribe · Stable
    OpenAI · 2025-03

    In LRC Sync v2 standard mode, runs in parallel with whisper-1 (dual-model hybrid) to get more accurate text. Lower WER than whisper-1 but no word-level timestamps (text-only). $0.003/min.

  • OpenAI Whisper

    whisper-1 · Legacy
    OpenAI · 2022-09

    Timing source for LRC Sync v2 standard mode. The only OpenAI model with word-level timestamps (newer models dropped this feature). Accurate on human recordings, poor on AI vocoder / Suno-generated vocals (industry-known blind spot). $0.006/min.

    Recent change: Old, but still the only OpenAI ASR model with word-level timestamps; OpenAI keeps it available but is unlikely to update it.
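
The CV < 5% check mentioned under GPT-4o Audio Preview could look roughly like the sketch below. This is an illustration only, not LRC Sync's actual code. It assumes that CV means the coefficient of variation (standard deviation divided by mean) of the gaps between consecutive line timestamps. The function names, the use of population standard deviation, and the sample timestamps are all hypothetical.

```python
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Return the coefficient of variation of gaps between consecutive timestamps.

    Assumption: timestamps are line start times in seconds, in playback order.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps to measure spread")
    average_gap = mean(gaps)
    if average_gap <= 0:
        raise ValueError("timestamps must increase overall")
    # Population standard deviation is an arbitrary choice for this sketch.
    return pstdev(gaps) / average_gap


def looks_evenly_spaced(timestamps, threshold=0.05):
    """Flag timestamp lists whose gaps vary by less than the threshold (5% here)."""
    return interval_cv(timestamps) < threshold


# Hypothetical inputs: one suspiciously regular list, one irregular list.
suspicious = [0.0, 4.0, 8.0, 12.0, 16.0]
plausible = [0.0, 2.1, 7.4, 9.0, 15.8]
flag_suspicious = looks_evenly_spaced(suspicious)
flag_plausible = looks_evenly_spaced(plausible)
```

The reasoning behind such a check is that sung lines rarely start at perfectly regular intervals, so near-identical gaps suggest the timestamps were generated rather than heard.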

◆ Don't see the model you use?

PromptCraft tracks 7 major AI model families. If you use Luma Labs Dream Machine, Hailuo, Kling Pro Plus, or similar, email promptcraft@prompt.luvai.net. With enough requests, we'll add it.