Shap-E
This is the official code and model release for Shap-E: Generating Conditional 3D Implicit Functions. The repository is available at https://github.com/openai/shap-e.
