name: video-generation
description: Use this skill when the user asks to generate, create, or imagine videos. Supports structured prompts and a reference image for guided generation.

Video Generation Skill

Overview

This skill generates high-quality videos using structured prompts and a Python script. The workflow consists of creating a JSON-formatted prompt and running video generation through the RunningHub API.

Core Capabilities

  • Create structured JSON prompts for AIGC video generation
  • Generate videos through the RunningHub Vidu model (text-to-video-q3-turbo)
  • Support videos of up to 16 seconds, with audio
  • Automatic camera switching and dialogue generation

Workflow

Step 1: Understand Requirements

When a user requests video generation, identify:

  • Subject/content: What should be in the video
  • Style preferences: Art style, mood, color palette
  • Technical specs: Aspect ratio, resolution, duration
  • Audio requirements: Background music, dialogue, sound effects

Step 2: Create Structured Prompt

Generate a structured JSON file in /mnt/user-data/workspace/ using the naming pattern {descriptive-name}.json

The prompt should include visual descriptions, camera movements, and audio specifications in a natural language format.
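A minimal sketch of this step in Python, assuming the prompt is held as a dictionary. The helper name `write_prompt` and its lowercase, hyphen-separated slug rule are illustrative assumptions, not part of the skill; only the workspace directory and the {descriptive-name}.json pattern come from the text above.

```python
import json
import re
from pathlib import Path

# Workspace directory named in this skill; pass another directory when testing locally.
WORKSPACE = Path("/mnt/user-data/workspace")


def write_prompt(prompt: dict, name: str, workspace: Path = WORKSPACE) -> Path:
    """Save a prompt as {descriptive-name}.json and return the file path.

    Hypothetical helper: the slug rule below is an assumption, not a skill requirement.
    """
    # Collapse anything that is not a lowercase letter or digit into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    if not slug:
        raise ValueError("name must contain at least one letter or digit")
    workspace.mkdir(parents=True, exist_ok=True)
    path = workspace / f"{slug}.json"
    path.write_text(json.dumps(prompt, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

Returning the path makes it easy to pass the same value to --prompt-file in the next step.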

Step 3: Execute Generation

Call the Python script:

python /mnt/skills/public/video-generation/scripts/generate.py \
  --prompt-file /mnt/user-data/workspace/prompt-file.json \
  --output-file /mnt/user-data/outputs/generated-video.mp4 \
  --aspect-ratio 16:9

Parameters:

  • --prompt-file: Absolute path to JSON prompt file (required)
  • --output-file: Absolute path to output video file (required)
  • --aspect-ratio: Aspect ratio of the generated video (optional, default: 16:9)

[!NOTE] Do NOT read the Python file; just call it with the parameters.
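If the call is issued from Python rather than a shell, it might look like the sketch below. The script path and the three flags are the ones documented above; the function names `build_command` and `run_generation` are hypothetical wrappers introduced only for illustration.

```python
import subprocess

# Script path as documented in this skill.
SCRIPT = "/mnt/skills/public/video-generation/scripts/generate.py"


def build_command(prompt_file: str, output_file: str, aspect_ratio: str = "16:9") -> list[str]:
    """Build the argument list for generate.py using only the documented flags."""
    return [
        "python", SCRIPT,
        "--prompt-file", prompt_file,
        "--output-file", output_file,
        "--aspect-ratio", aspect_ratio,
    ]


def run_generation(prompt_file: str, output_file: str, aspect_ratio: str = "16:9") -> None:
    """Run the script; check=True raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(prompt_file, output_file, aspect_ratio), check=True)
```

Passing the arguments as a list avoids shell quoting problems when a file name contains spaces.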

Environment Variables

Set the following environment variable before running the script:

  • RUNNINGHUB_API_KEY: Your RunningHub API key

Example:

export RUNNINGHUB_API_KEY="your-api-key"
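A caller may want to fail early when the variable is missing instead of waiting for the script to report it. The sketch below assumes nothing about how generate.py itself reads the key; `require_api_key` is a hypothetical helper.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return RUNNINGHUB_API_KEY from the environment, or raise if it is unset or blank."""
    key = env.get("RUNNINGHUB_API_KEY", "").strip()
    if not key:
        raise RuntimeError("RUNNINGHUB_API_KEY is not set; export it before running generate.py")
    return key
```

Accepting the environment as a parameter keeps the check easy to exercise without touching the real process environment.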

Video Generation Example

User request: "Generate a short video clip depicting the opening scene from 'The Chronicles of Narnia: The Lion, the Witch and the Wardrobe'."

Step 1: Create a JSON prompt file with the following content:

{
  "title": "The Chronicles of Narnia - Train Station Farewell",
  "background": {
    "description": "World War II evacuation scene at a crowded London train station. Steam and smoke fill the air as children are being sent to the countryside to escape the Blitz.",
    "era": "1940s wartime Britain",
    "location": "London railway station platform"
  },
  "characters": ["Mrs. Pevensie", "Lucy Pevensie"],
  "camera": {
    "type": "Close-up two-shot",
    "movement": "Static with subtle handheld movement",
    "angle": "Profile view, intimate framing",
    "focus": "Both faces in focus, background soft bokeh"
  },
  "dialogue": [
    {
      "character": "Mrs. Pevensie",
      "text": "You must be brave for me, darling. I'll come for you... I promise."
    },
    {
      "character": "Lucy Pevensie",
      "text": "I will be, mother. I promise."
    }
  ],
  "audio": [
    {
      "type": "Train whistle blows (signaling departure)",
      "volume": 1
    },
    {
      "type": "Strings swell emotionally, then fade",
      "volume": 0.5
    },
    {
      "type": "Ambient sound of the train station",
      "volume": 0.5
    }
  ]
}
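Before generating, a quick consistency pass over a prompt shaped like the one above can catch simple slips. This is only a sketch: `check_prompt` is a hypothetical helper, and both rules it applies (every dialogue speaker is listed under "characters", and every audio "volume" lies between 0 and 1) are assumptions inferred from this example, not documented requirements of the skill or the model.

```python
def check_prompt(prompt: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    characters = set(prompt.get("characters", []))
    # Assumed rule: each dialogue line is spoken by a listed character.
    for line in prompt.get("dialogue", []):
        speaker = line.get("character")
        if speaker not in characters:
            problems.append(f"dialogue speaker not in characters: {speaker!r}")
    # Assumed rule: volume is a number in the range 0 to 1, as in the example.
    for cue in prompt.get("audio", []):
        volume = cue.get("volume")
        if not isinstance(volume, (int, float)) or not 0 <= volume <= 1:
            problems.append(f"audio volume outside 0..1: {volume!r}")
    return problems
```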

Step 2: Use the generate.py script to generate the video

python /mnt/skills/public/video-generation/scripts/generate.py \
  --prompt-file /mnt/user-data/workspace/narnia-farewell-scene.json \
  --output-file /mnt/user-data/outputs/narnia-farewell-scene.mp4 \
  --aspect-ratio 16:9

Do NOT read the Python file; just call it with the parameters.

Output Handling

After generation:

  • Videos are typically saved in /mnt/user-data/outputs/
  • Share generated videos with the user using the present_files tool
  • Provide a brief description of the generation result
  • Offer to iterate if adjustments are needed
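Before sharing a result, it can help to confirm that the output file exists and is not empty. The sketch below is one way to do that; `describe_output` is a hypothetical helper, and the one-line summary it returns is just an illustrative starting point for the brief description.

```python
from pathlib import Path


def describe_output(output_file: str) -> str:
    """Return a short summary such as 'clip.mp4 (2.0 KiB)', or raise if the file is unusable."""
    path = Path(output_file)
    if not path.is_file():
        raise FileNotFoundError(f"no video was written to {path}")
    size = path.stat().st_size
    if size == 0:
        raise ValueError(f"{path} is empty; generation may have failed")
    return f"{path.name} ({size / 1024:.1f} KiB)"
```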

Notes

  • Always use English for prompts regardless of the user's language
  • JSON format ensures structured, parsable prompts
  • The RunningHub Vidu model supports videos of up to 16 seconds
  • Audio is generated automatically, including dialogue and sound effects
  • The model has "director thinking" capability for automatic camera switching
  • Iterative refinement is normal for optimal results