FLUX.1 [schnell] Redux is a high-performance endpoint for the FLUX.1 [schnell] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

Reimagine existing images with Ideogram V2's remix feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance.
Default parameters with automated optimizations and quality improvements.
An endpoint for re-lighting photos and changing their backgrounds according to a given description.
Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels
Utilize FLUX.1 [pro] ControlNet with a fine-tuned LoRA to generate high-quality images with precise control over composition, style, and structure through advanced edge detection and guidance mechanisms.
Clone any person's voice and make it say anything using Zonos' voice cloning.
Upscale your images with DRCT-Super-Resolution.
A fast and expressive Hindi text-to-speech model with clear pronunciation and accurate intonation.
Generate high-speed text-to-speech audio using ElevenLabs TTS Turbo v2.5.
Generate high-quality images from text prompts using MiniMax. Longer text prompts will result in better-quality images.
Swap faces of one or two people at once, while preserving user and scene details!
Create depth maps using Marigold depth estimation.
Automatically retouches faces to smooth skin and remove blemishes.
Predict the probability of an image being NSFW.
Run Any Stable Diffusion model with customizable LoRA weights.
Run SDXL at the speed of light
MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech).
Create illusions conditioned on image.
Generate short video clips from your images using SVD v1.1
Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
FLUX LoRA Image-to-Image is a high-performance endpoint that transforms existing images using FLUX models, leveraging LoRA adaptations to enable rapid and precise image style transfer, modifications, and artistic variations.
SAM 2 is a model for segmenting images automatically. It can return individual masks or a single mask for the entire image.
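The per-object versus whole-image mask distinction above can be sketched with plain boolean grids; this toy helper (a stand-in for whatever bitmap format the endpoint actually returns, which is an assumption here) shows how individual object masks combine into a single mask:

```python
# Toy illustration of combining per-object masks into one whole-image mask,
# mirroring SAM 2's "individual masks or a single mask" output options.
# Masks here are lists of lists of bools, not the model's real output format.

def union_masks(masks: list[list[list[bool]]]) -> list[list[bool]]:
    """OR together individual object masks into one combined mask."""
    h, w = len(masks[0]), len(masks[0][0])
    return [[any(m[y][x] for m in masks) for x in range(w)] for y in range(h)]

cat = [[True, False], [False, False]]   # hypothetical mask for one object
dog = [[False, False], [False, True]]   # hypothetical mask for another
combined = union_masks([cat, dog])      # one mask covering both objects
```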
Recraft V3 Create Style is capable of creating unique styles for Recraft V3 based on your images.
Hunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. This endpoint generates videos from text descriptions.
Enhances a given raster image with the 'clarity upscale' tool, increasing image resolution and making the image sharper and cleaner.
Kling Kolors Virtual TryOn v1.5 is a high-quality, image-based try-on endpoint suitable for commercial use.
DeepSeek Janus-Pro is a novel text-to-image model that unifies multimodal understanding and generation through an autoregressive framework
Bria Background Replace allows for efficient swapping of backgrounds in images via text prompts or reference image, delivering realistic and polished results. Trained exclusively on licensed data for safe and risk-free commercial use
This endpoint delivers seamlessly localized videos by generating lip-synced dubs in multiple languages, ensuring natural and immersive multilingual experiences
MoonDreamNext is a multimodal vision-language model for captioning, gaze detection, bbox detection, point detection, and more.
FLUX.1 [pro] Fill Fine-tuned is a high-performance endpoint for the FLUX.1 [pro] model with a fine-tuned LoRA that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
Wan-2.1 1.3B is a text-to-video model that generates videos with high visual quality and motion diversity from text prompts at faster speeds.
Generate video clips that follow natural language descriptions more accurately, with camera movement instructions for shot control.
A fast and high quality model for image background removal.
Veo 2 creates videos with realistic motion and high quality output. Explore different styles and find your own with extensive camera controls.
Wan-2.1 Pro is a premium text-to-video model that generates high-quality 1080p videos at 30fps with up to 6 seconds duration, delivering exceptional visual quality and motion diversity from text prompts
Hyper-charge SDXL's performance and creativity.
Animate Your Drawings with Latent Consistency Models!
Run Stable Diffusion XL with customizable LoRA weights.
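As a rough sketch of what "customizable LoRA weights" can look like in practice: a request typically carries a list of LoRA references, each with a strength. The `loras` list-of-`{path, scale}` shape and the URL below are illustrative assumptions, not a documented schema:

```python
# Hypothetical shape for attaching LoRA weights to a diffusion request.
# Field names ("loras", "path", "scale") are assumptions for illustration.

def with_loras(payload: dict, loras: list[tuple[str, float]]) -> dict:
    """Return a copy of the request with LoRA entries attached.

    A scale of 0 effectively disables a LoRA; 1.0 applies it at full strength.
    """
    out = dict(payload)
    out["loras"] = [{"path": path, "scale": scale} for path, scale in loras]
    return out

req = with_loras(
    {"prompt": "isometric city at dusk"},
    [("https://example.com/pixel-style.safetensors", 0.8)],  # hypothetical URL
)
```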
A LLaVA model fine-tuned from microsoft/Phi-3-mini-4k-instruct and CLIP-ViT-Large-patch14-336 with ShareGPT4V-PT and InternVL-SFT by XTuner.
A multimodal conversational AI model that can chat with users about images and text. It's optimized for multi-image reasoning, where interleaved text and images can be fed as input to generate responses.
A model fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct and CLIP-ViT-Large-patch14-336 with LLaVA-Pretrain and LLaVA-Instruct by XTuner.
Generate short video clips from your prompts
FLUX.1 [schnell] is a 12 billion parameter flow transformer that generates high-quality images from text in 1 to 4 steps, suitable for personal and commercial use.
Super fast endpoint for the FLUX.1 [dev] inpainting model with LoRA support, enabling rapid and high-quality image inpainting using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
FLUX.1 [dev] Fill is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
Transform existing images with Ideogram V2's editing capabilities. Modify, adjust, and refine images while maintaining high fidelity and realistic outputs with precise prompt control.
Bria's Text-to-Image model, trained exclusively on licensed data for safe and risk-free commercial use. Also available as source code and weights. For access to weights: https://bria.ai/contact-us
Bria's Text-to-Image model with perfect harmony of latency and quality. Trained exclusively on licensed data for safe and risk-free commercial use. Also available as source code and weights. For access to weights: https://bria.ai/contact-us
Bria RMBG 2.0 enables seamless removal of backgrounds from images, ideal for professional editing tasks. Trained exclusively on licensed data for safe and risk-free commercial use. Model weights for commercial use are available here: https://share-eu1.hsforms.com/2GLpEVQqJTI2Lj7AMYwgfIwf4e04?utm_campaign=RMBG%202.0&utm_source=RMBG%20image%20and%20video%20page&utm_medium=button&utm_content=rmbg%20image%20pricing%20form
A powerful model that generates novel multiview images, with normals, from a single input image.
Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
MuseTalk is a real-time high quality audio-driven lip-syncing model. Use MuseTalk to animate a face with your own audio.
FLUX.1 [dev] Redux is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
Generate videos from images using LTX Video
Generate high quality video clips from text and image prompts using PixVerse v3.5
Lumina-Image-2.0 is a 2 billion parameter flow-based diffusion transformer that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
Compose videos from multiple media sources using FFmpeg API.
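An API endpoint like this would wrap something close to a plain ffmpeg invocation. As a hedged sketch (the file names are hypothetical, and this shows the underlying ffmpeg concat filter rather than the endpoint's own interface):

```python
# Sketch of composing multiple sources with ffmpeg's concat filter on the
# command line; an API would wrap something like this. Files are hypothetical.

def concat_command(inputs: list[str], output: str) -> list[str]:
    """Build an ffmpeg argv that concatenates the inputs' video and audio."""
    cmd = ["ffmpeg"]
    for path in inputs:
        cmd += ["-i", path]
    n = len(inputs)
    # e.g. "[0:v][0:a][1:v][1:a]concat=n=2:v=1:a=1[v][a]"
    filtergraph = "".join(f"[{i}:v][{i}:a]" for i in range(n))
    filtergraph += f"concat=n={n}:v=1:a=1[v][a]"
    cmd += ["-filter_complex", filtergraph, "-map", "[v]", "-map", "[a]", output]
    return cmd

cmd = concat_command(["intro.mp4", "main.mp4"], "out.mp4")
```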
Imagen 3 is a high-quality text-to-image model that generates realistic images from text prompts.
GOT-OCR2 works on a wide range of tasks, including plain document OCR, scene text OCR, formatted document OCR, and even OCR for tables, charts, mathematical formulas, geometric shapes, molecular formulas and sheet music.
Use NAFNet to fix issues like blurriness and noise in your images. This model specializes in image restoration and can help enhance the overall quality of your photography.
Accelerated image generation with Ideogram V2A Turbo. Create high-quality visuals, posters, and logos with enhanced speed while maintaining Ideogram's signature quality.
Automatically generates text captions for your videos from the audio, following your text colour/font specifications.
Transform text into hyper-realistic videos with Haiper 2.5. Experience industry-leading resolution, fluid motion, and rapid generation for stunning AI videos.
Add custom LoRAs to Wan-2.1, an image-to-video model that generates videos with high visual quality and motion diversity from images.
DiffRhythm is a blazing-fast model for transforming lyrics into full songs, capable of generating a full song in less than 30 seconds.
Kokoro is a lightweight text-to-speech model that delivers comparable quality to larger models while being significantly faster and more cost-efficient.
Generate video clips that follow an initial image and natural language descriptions more accurately, with camera movement instructions for shot control.
Zero-shot Identity-Preserving Generation in Seconds
Generate short video clips from your prompts using SVD v1.1
Produce high-quality images with minimal inference steps. Optimized for 512x512 input image size.
Collection of SDXL Lightning models.
FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.
Stable Diffusion 3 Medium (Text to Image) is a Multimodal Diffusion Transformer (MMDiT) model that improves image quality, typography, prompt understanding, and efficiency.
Create animated videos from keyframe images.
Use any vision language model from our selected catalogue (powered by OpenRouter)
Interpolate between image frames
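The simplest possible baseline for frame interpolation is a linear cross-fade between two frames; real interpolation models estimate motion instead, but the in-between parameter `t` works the same way. A minimal sketch on flattened frames:

```python
# Minimal frame-interpolation baseline: a linear cross-fade between two
# frames, represented here as flat lists of pixel intensities. Learned
# interpolators estimate motion instead, but use the same t parameter.

def lerp_frame(a: list[float], b: list[float], t: float) -> list[float]:
    """Blend two frames; t=0 returns a, t=1 returns b."""
    return [pa * (1 - t) + pb * t for pa, pb in zip(a, b)]

mid = lerp_frame([0.0, 1.0], [1.0, 0.0], 0.5)  # the halfway frame
```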
Generate video clips from your prompts using MiniMax model
FLUX1.1 [pro] Redux is a high-performance endpoint for the FLUX1.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
Generate high-quality images from depth maps using the FLUX.1 [dev] depth estimation model. The model produces accurate depth representations for scene understanding and 3D visualization.
FLUX LoRA training optimized for portrait generation, with bright highlights, excellent prompt following and highly detailed results.
Multimodal vision-language model for video understanding
Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization.
Image to Video for the Hunyuan Video model using a custom trained LoRA.
MoonDreamNext Detection is a multimodal vision-language model for gaze detection, bbox detection, point detection, and more.
Get waveform data from audio files using FFmpeg API.
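"Waveform data" of the kind such an endpoint might return is typically a list of per-bucket peak amplitudes suitable for drawing a waveform. A hedged sketch of that reduction over raw PCM samples (the bucket-peak format is an assumption, not the endpoint's documented output):

```python
# Sketch of reducing raw PCM samples to per-bucket peak amplitudes,
# the kind of compact waveform data used to draw audio waveforms.

def waveform_peaks(samples: list[float], buckets: int) -> list[float]:
    """Reduce samples to `buckets` peak values for a waveform display."""
    size = max(1, len(samples) // buckets)
    return [
        max(abs(s) for s in samples[i * size:(i + 1) * size] or [0.0])
        for i in range(buckets)
    ]

peaks = waveform_peaks([0.1, -0.9, 0.3, 0.2, -0.4, 0.8], 2)
```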
SkyReels V1 is the first and most advanced open-source human-centric video foundation model, created by fine-tuning HunyuanVideo on O(10M) high-quality film and television clips.
Transform text into stunning videos with TransPixar - an AI model that generates both RGB footage and alpha channels, enabling seamless compositing and creative video effects.