Stable Diffusion Deluxe Features

Advanced AI All-in-One App to Create Multi-Media Magic…

Feature “Short” List:

  • Enhanced HuggingFace Diffusers in Material UI Flutter/Flet GUI with Themes & SoundFX
  • Advanced Prompts List with overrides: Batch list with the ability to override any parameter per prompt in the queue.
  • Stable Diffusion v2.1 & lower, plus many bonus Pipelines.
  • Finetuned Community Models: Thousands to try, more added regularly, add your own.
  • Dreambooth Library: A community collection filled with pre-trained models to explore and use.
  • Long Prompt Weighting: Emphasize (positive) & [negative] word strengths with more tokens.
  • Walk Interpolation: Create transitions through the latent space between different prompts.
  • Centipede Prompts as init-images: Feed your prompts list as initial images down the line.
  • CLIP-Guided: Guides diffusion with alternative LAION & OpenAI CLIP ViT models.
  • Textual-Inversion Conceptualizer: Loads specially trained concept models to include in prompt with token.
  • Dual Guided Versatile Diffusion: Multi-flow model that handles both image and text data streams, conditioned on both text and image inputs.
  • Image Variation: Creates a new version of your picture, without a prompt.
  • iMagic: Edit your image according to the prompted instructions like magic.
  • Depth2Image: Uses Depth-map of init image for text-guided image to image generation.
  • Composable: Craft your prompts with | precise | weights, components composed together with AND, | and AND NOT negatives.
  • Self-Attention Guidance: Intelligent guidance that can be plugged into any diffusion model using its self-attention map.
  • Attend & Excite: Provides textual Attention-Based Semantic Guidance control over the image generation.
  • Panorama: Generate panorama-like wide images by fusing diffusion paths for controlled image generation.
  • Safe Pipeline: Use a content quality tuned safety model, providing levels of NSFW protection.
  • Stable Cascade: The latest Stability.ai diffusion technique, using Würstchen 3 for coherent generation.
  • DeepFloyd-IF: State-of-the-art text-to-image model with a high degree of photorealism and language understanding.
  • unCLIP Generator: Hierarchical Text-Conditional Image Generation with CLIP Latents.
  • unCLIP Image Variations: Generate Variations from an input image using unCLIP.
  • unCLIP Interpolation & Image Interpolation: Takes two prompts (or images) and interpolates between them using spherical interpolation.
  • Text-to-Video: Modelscope’s Text-to-video-synthesis Model to Animate Diffusion.
  • Text2Video-Zero: Text-to-Image Diffusion Models for Zero-Shot Video Generators.
  • AnimateDiff: Create smooth coherent videos with an SD Motion-Model animator.
  • Potat1 Text-to-Video: CamenDuru’s Open-Source 1024×576 Text-To-Video Model.
  • ControlNet Video2Video: Apply Stable Diffusion to a video, while maintaining frame-to-frame consistency.
  • Video-to-Video: Init video-synthesis Model to Reanimate short video clips.
  • ROOP Face Swapper: Take a Video or Image and Replace the Face in it with a face of your choice.
  • Stable Animation: Use Stability.ai API Credits for Advanced Video Generation, similar to Deforum & Disco Diffusion.
  • Stability-API: Use DreamStudio.com servers to create images without a local GPU.
  • Stable Horde-API: Use the free AIHorde.net crowdsourced cloud to create images without a local GPU.
  • SD2 4X Upscale: Allows you to enlarge images with prompts for greater detail.
  • Real-ESRGAN Upscaling: Recommended to enlarge & sharpen all images as they’re made.
  • Prompt Writer: Construct your Art descriptions with random artists & styles, with all the extras you need to engineer perfect prompts faster.
  • GPT-3, ChatGPT, Gemini & Claude 3 Prompt Generator: Enter a phrase each prompt should start with and the number of prompts to generate.
  • Prompt Remixer: Enter a complete prompt you’ve written that is well worded and descriptive, and get variations of it.
  • Prompt Brainstormer: Get new ideas from prompts you’ve written with various descriptive modes.
  • Negatives Builder: Create well structured Negative Prompt Text with common categories you don’t want.
  • Prompt Styler: Provide a base prompt and create many stylized prompts with well-formed artistic descriptors.
  • Image2Text: Interrogate an image with Fuyu, Moondream2, GPT-4 Vision, Gemini Vision, Claude 3 Vision, AIHorde or BLIP.
  • GPT-2 Magic Prompt: Generates new Image Prompts made for Stable Diffusion with a specially trained GPT-2 Text AI.
  • Distil GPT-2: Generates new Image Prompts with a model trained on 2,470,000 descriptive Stable Diffusion prompts.
  • DreamBooth Trainer: Provide a collection of images to conceptualize into your personalized model.
  • LoRA & Textual Inversion Trainer: Training with Low-Rank Adaptation of Large Language Models.
  • Model Converter & Merger: Lets you Convert Format of Model Checkpoints to work with Diffusers.
  • SD2 Image Variations: Creates a new version of your picture, without a prompt.
  • EDICT Editor: Text-guided image editing. Exact Diffusion Inversion via Coupled Transformations.
  • DiffEdit: Zero-shot Diffusion-based Semantic Image Editing with Mask Guidance.
  • MagicMix: Semantic mixing of an image and a text prompt.
  • RePainter: Fills in masked areas of picture with what it thinks it should be, without a prompt.
  • Paint-by-Example: Image-guided Inpainting using an Example Image to Transfer Subject to Masked area.
  • Instruct Pix2Pix: Text-Based Image Editing – Learning to Follow Image Editing Instructions.
  • ControlNet Multi: Add Multiple Input Conditions To Pretrained Text-to-Image Diffusion Control Models.
  • ControlNet QRCode: Img2Img for Inpainting QR Code with Prompt and/or Init Image.
  • Reference-Only: ControlNet Pipeline for Transferring Ref Subject to new images.
  • Re-Segment Anything: ControlNet on Meta’s Segment-Anything to write a prompt, and generate images from segments.
  • CLIP-Styler: Transfers a Text Guided Style onto your Image From Prompt Description.
  • Semantic Guidance: Latent Editing prompts to apply or remove multiple concepts from an image with advanced controls.
  • Material Diffusion: Create Seamless Tiled Textures with your Prompt with Replicate.com API.
  • DreamFusion 3D: Create experimental 3D Rendered Models and Videos from prompt.
  • Point-E 3D: Provide a Prompt or Image to render Point Cloud from a CLIP ViT-L/14 diffusion model.
  • Shap-E 3D: Provide a Prompt or Image to Generate Conditional 3D PLY Models.
  • TripoSR: Generate realistic 3D meshes from an image with good details.
  • Meshy.ai: Make detailed 3D meshes from prompts or input images.
  • Instant-NGP: Convert series of images into 3D Models with Multiresolution Hash Encoding.
  • Tortoise TTS: Voice Modeling that Reads your text in a realistic AI voice, train your own to mimic vocal performances.
  • HarmonAI Dance Diffusion: Create experimental music or sounds with HarmonAI trained audio models.
  • Audio Diffusion: Converts Audio Samples to and from Mel Spectrogram Images so you can tweak them.
  • Bark: Text-to-Audio Generation for Multilingual Speech, Music and Sound Effects.
  • Riffusion: Spectrogram Sound Modeling with Stable Diffusion for real-time music generation.
  • AudioLDM Text2Sound: Text-to-Audio Generation with Latent Diffusion Model.
  • AudioCraft MusicGen: Simple and Controllable Music Generation with Audio tokenization model.
  • ZETA Editing: Provide music or audio clip and ask to make audio edits with source & target prompts.
  • Mubert Music: AI music is generated by Mubert API. Pretty good grooves.
  • Whisper STT: Generate Text Transcriptions from Speech Recordings, then optionally process text with GPT.
  • VoiceFixer: Speech Restoration with a Neural Vocoder that cleans up bad vocals and removes unwanted noise.
  • DALL-E 2 & 3 API: Generates Images using your OpenAI API Key.
  • Kandinsky 2.1 & Fuser: Latent Diffusion model with two Multilingual text encoders, Mix multiple Images and Prompts together.
  • DeepDaze: An alternative method using OpenAI’s CLIP and Siren. Older but still fascinating.
  • Background Remover: A deep learning approach to clear the background of most images to isolate subject.
  • Metadata in png, smart filenames: Inclusion of Parameters metadata in files and filenames from prompt or time.
  • Batch Upscaler: Real-ESRGAN AI Upscale Enlarging on one or more files, or give a path to an image or folder.
  • Prompt Retriever: Retrieve previously used Prompts from Image Metadata.
  • Cache Manager: Manage your Cache Directory Saved Models, so you can trim the fat as needed.
  • Init Images from Folder or Video: Generate Prompts with initial images from a folder or a video.

The list is extensive and the app is continuously updated to add more features.
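Several of the interpolation features above (Walk Interpolation, unCLIP Interpolation, LCM Interpolation) blend between prompt or image embeddings using spherical interpolation ("slerp"). As a rough, hypothetical sketch of the idea (the vectors below are illustrative stand-ins for real CLIP latents, not the app's actual code):

```python
# Minimal slerp sketch: interpolate along the great circle between two
# embeddings, which keeps intermediate points at a sensible magnitude
# (unlike plain linear interpolation, which cuts through the sphere).
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray) -> np.ndarray:
    """Spherically interpolate between v0 (t=0) and v1 (t=1)."""
    v0n = v0 / np.linalg.norm(v0)
    v1n = v1 / np.linalg.norm(v1)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    theta = np.arccos(dot)            # angle between the embeddings
    if np.isclose(theta, 0.0):        # nearly parallel: fall back to lerp
        return (1 - t) * v0 + t * v1
    return (np.sin((1 - t) * theta) * v0 + np.sin(t * theta) * v1) / np.sin(theta)

# Toy 2D stand-ins for embedding vectors:
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
mid = slerp(0.5, a, b)  # halfway point stays on the unit circle
```

Sampling `t` over many small steps between 0 and 1, then decoding each interpolated embedding, is what produces the smooth transition frames these features generate.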

Detailed Feature List:

  • Prompt Helpers:
    • πŸ“œ Advanced Prompt Writer with Noodle Soup Prompt random variables – Construct your Art descriptions easier, with all the extras you need to engineer perfect prompts faster. Randomly add hundreds of artists and styles.
    • 🧠 OpenAI GPT-3/4/Gemini Prompt Generator – Enter a phrase each prompt should start with and the number of prompts to generate. Just experiment, AI will continue to surprise.
    • πŸ”„ Prompt Remixer – GPT-3/4/Gemini AI Helper – Enter a complete prompt you’ve written that is well worded and descriptive, and get variations of it with our AI Friend. Experiment.
    • πŸ€” Prompt Brainstormer – TextSynth GPT-J-6B, OpenAI GPT, Gemini & HuggingFace Bloom AI – Get Inspiration on Prompt Engineering with Rewrite, Edit, Story, Description, Details, etc.
    • πŸ‘“ Prompt Styler – Generate your Prompts with Premade Style Templates.
    • 🚫 Negative Prompt Builder – Generate your Negatives with ease to subtract what you don’t want in your images.
    • πŸ˜Άβ€πŸŒ«οΈ Image2Text CLIP-Interrogator – Create text prompts by describing input images…
    • 🎩 Magic Prompt Generator – GPT-2 AI Helper – Generates new Image Prompts made for Stable Diffusion with a specially trained GPT-2 Text AI by Gustavosta…
    • 🦸 SuperPrompt v1 Detailer – Generates more detailed prompts in a 77M Parameter custom trained Google Flan-T5…
    • βš—οΈ Distilled GPT-2 Generator – GPT-2 AI Helper – Generates new Image Prompts with a model trained on 2,470,000 descriptive Stable Diffusion prompts…
    • πŸ“° Retrieve Prompts from Image Metadata – Give it images made here and it returns all the parameters used to recreate them. Either upload png file(s) or paste a path to an image, folder or config.json to revive your dreams.
    • πŸ“‚ Generate Prompts from Folder as Init Images – Provide a Folder with a collection of images that you want to automatically add to prompts list with init_image overrides…
    • πŸŽ₯ Generate Prompts from Video File Frames – Provide a short video clip to automatically add sequence to prompts list with init_image overrides…
    • 🀳 BLIP2 Image2Text Examiner – Create prompts by describing input images…
  • Image AIs:
    • 🏜️ Instruct-Pix2Pix – Text-Based Image Editing – Follow Image Editing Instructions…
    • πŸ•ΈοΈ ControlNet Image+Text-to-Image+Video-to-Video – Adding Input Conditions To Pretrained Text-to-Image Diffusion Models…
    • πŸ•· ControlNet SDXL Image+Text-to-Image – Adding Input Conditions To Pretrained Text-to-Image Diffusion Models…
    • πŸ•‹ ControlNet Stable Diffusion 3 Image+Text-to-Image – Adding Input Conditions To Pretrained Text-to-Image Diffusion Models…
    • πŸ•Έ ControlNet-XS Image+Text-to-Image – Faster & Smaller Controlnet Image Conditioning Text-to-Image Diffusion Models…
    • 🎎 Kandinsky 3.0 – A Latent Diffusion model with two Multilingual text encoders, supports 100+ languages…
    • πŸ’£ Kandinsky 2.2 Fuse – Mix multiple Images and Prompts together to Interpolate. A Latent Diffusion model with two Multilingual text encoders, supports 100+ languages…
    • 🏩 Kandinsky 2.2 ControlNet Text+Image-to-Image – Image-to-Image Generation with ControlNet Conditioning and depth-estimation from transformers.
    • πŸŒ€ FLUX.1 by Black Forest Labs – 12B param rectified flow transformer distilled from FLUX.1 [pro]…
    • πŸ”— ControlNet QRCode Art Generator – ControlNet Img2Img for Inpainting QR Code with Prompt and/or Init Image…
    • πŸ‡ OpenAI DALLβ€’E 3 – Generates Images using your OpenAI API Key. Note: Uses same credits as official website.
    • 🌷 Stable Cascade – Efficient Text-to-Image Synthesis using Würstchen 3 Enhanced… Excellent prompt understanding & text writing.
    • 🌭 Würstchen – Text-to-Image Synthesis uniting competitive performance, cost-effectiveness and ease of training on constrained hardware.
    • πŸ‰οΈ Tencent Hunyuan-DiT – Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding from Tencent Hunyuan….
    • 🌞 Lumina-Next-DiT (under construction) – Next-generation Diffusion Transformer that Enhances Text-to-Image Generation, Multilingual, Multitasked Performance…
    • 🎑 aMUSEd Open-MUSE – Lightweight and Fast vqVAE Masked Generative Transformer Model to make many images quickly at once…
    • 🧚 PixArt-Σ (Sigma) – Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation… Note: Uses a lot of RAM & Space, may run out.
    • 🧚 PixArt-α (Alpha) – Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis… Note: Uses a lot of RAM & Space, may run out.
    • πŸ«₯ Layer Diffusion SDXL – Transparent Image Layer Generation using Latent Transparency…
    • πŸ™ˆ Differential Diffusion SDXL & SD3 Image2Image – Modifies an image according to a text prompt, and according to a map that specifies the amount of change in each region…
    • πŸ’£ DemoFusion – Democratising High-Resolution Image Generation With No $$$. SDXL with Clean Upscaling, 3 Phase Denoising/Decoding, slow but real quality…
    • 🌈 DeepFloyd IF (under construction, may not work) – A new AI image generator that achieves state-of-the-art results on numerous image-generation tasks…
    • πŸ† LMD+ LLM-grounded Diffusion – Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models.
    • πŸ’» Latent Consistency Model (LCM) – Synthesizing High-Resolution Images with Few-Step Inference.
    • πŸ‘ͺ LCM Interpolation – Transition the Latent Consistency between multiple text prompts… Good fast results in only 4 Steps!
    • ⚑️ InstaFlow One-Step – Ultra-Fast One-Step High-Quality Diffusion-Based Text-to-Image Generation…
    • πŸ’« Perturbed-Attention Guidance (PAG) – Self-Rectifying Diffusion Sampling. Uses SD Model in Installation settings…
    • 🌐 unCLIP Text-to-Image Generator – Hierarchical Text-Conditional Image Generation with CLIP Latents. Similar results to DALL-E 2…
    • 🌌 unCLIP Text Interpolation Generator – Takes two prompts and interpolates between the two input prompts using spherical interpolation…
    • πŸ€– unCLIP Image Interpolation Generator – Pass two images and produces in-betweens while interpolating between their image-embeddings…
    • πŸŽ† unCLIP Image Variation Generator – Generate Variations from an input image using unCLIP…
    • πŸͺ© Image Variations of any Init Image – Creates a new version of your picture, without a prompt…
    • πŸ“‘ BLIP-Diffusion by Salesforce – Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing…
    • πŸ’… HD-Painter by Picsart AI-Research – Prompt-Faithful and High-Resolution (up to 2k) Text-Guided Image Inpainting with Diffusion Models…
    • 🦊 IP-Adapter – Image Prompting capabilities to Transfer Subject with or without Prompt…
    • 🎩 Reference-Only Image with Prompt – ControlNet Pipeline for Transferring Ref Subject to new images…
    • 🧱 Replicate Material Diffusion – Create Seamless Tiled Textures with your Prompt. Requires account at Replicate.com and your Key.
    • πŸ₯Έ ControlNet on Meta’s Segment-Anything – Upload an Image, Segment it with Segment Anything, write a prompt, and generate images…
    • πŸ”€ LEDITS++ (under construction) – Limitless Image Editing using Text-to-Image Models…
    • πŸ˜‘ Null-Text Inversion Image Editing – Editing Real Images using Guided Diffusion Models… Exact Diffusion Inversion via Coupled Transformations. Prompt-to-prompt image editing with cross attention control.
    • 🀹 EDICT Image Editing – Diffusion pipeline for text-guided image editing… Exact Diffusion Inversion via Coupled Transformations.
    • πŸ’ DiffEdit Image Editing – Zero-shot Diffusion-based Semantic Image Editing with Mask Guidance…
    • πŸ”€ AnyText (under construction) – Multilingual Visual Text Generation and Text Editing…
    • πŸ§‘β€πŸ’»οΈ TaskMatrix Visual ChatGPT (under construction) – Talking, Drawing and Editing with Visual Foundation Models. Conversational requests for image editing & creating using OpenAI brain…
    • πŸ’… RePainter masked areas of an image – Fills in areas of picture with what it thinks it should be, without a prompt…
    • 🧚 MagicMix Init Image with Prompt – Diffusion Pipeline for semantic mixing of an image and a text prompt…
    • 🦁 Paint-by-Example – Image-guided Inpainting using an Example Image to Transfer Subject to Masked area…
    • 😎 CLIP-Styler – Transfers a Text Guided Style onto your Image From Prompt Description…
    • 🧩 Semantic Guidance for Diffusion Models – SEGA – Text-to-Image Generation with Latent Editing to apply or remove multiple concepts from an image with advanced controls…
    • ⚧️ DiT Models with Transformers Class-to-Image Generator – Scalable Diffusion Models with Transformers…
    • πŸ‘€ DeepDaze Text-to-Image Generator – An alternative method using OpenAI’s CLIP and Siren. Made a few years ago but still fascinating results…
  • Video AIs:
    • πŸ‘« AnimateDiff Enhanced Text-to-Video – Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning…
    • πŸ₯ Stable Animation SDK – Use Stability.ai API Credits for Advanced Video Generation, similar to Deforum & Disco Diffusion.
    • πŸƒ Stable Video Diffusion Image-To-Video (uses a lot of VRAM) – Generate high resolution (576×1024) 2-4 second videos conditioned on the input image…
    • πŸ‰ AnimateDiff Image/Video-to-Video – Bring an Image or Video Clip to Life! Similar to Stable Video Diffusion, with more control…
    • 🀯 AnimateDiff SDXL – Create Video Clips from Text Prompt with SDXL Model and Beta Motion Module…
    • πŸ”₯ DiffSynth Studio – Diffusion engine with multiple optimized modes, restructured architectures including Text Encoder, UNet, VAE, among others…
    • πŸ“· EasyAnimate v3 Text/Image-to-Video – End-to-End Solution for High-Resolution and Long Video Generation, by Alibaba…
    • 🌴 Open-Sora-Plan v1.2 Text-To-Video Synthesis – Transformer-based Text-to-Video Diffusion trained at 720p 93 Frames on Text Embeddings from mT5-xxl… (uses A LOT of VRAM & drive space)
    • 🍿 Cinemo Image-to-Video – Consistent and Controllable Image Animation with Motion Diffusion Models…
    • 🧬 I2VGen-XL Image-to-Video – High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models…
    • 🎬 Personalized Image Animator Image-to-Video – Image-to-Video Generation with PIA via Plug-and-Play Modules in Text-to-Image Models…
    • πŸŽ₯ Text-To-Video Synthesis – Modelscope’s Text-to-video-synthesis Model to Animate Diffusion
    • πŸŽ₯ Text-To-Video Zero – Text-to-Image Diffusion Models for Zero-Shot Video Generators
    • β˜•οΈ Latte-1 Text-To-Video Synthesis – Latent Diffusion Transformer for Video Generation…
    • πŸ₯” Potat1️⃣ Text-To-Video Synthesis – CamenDuru’s Open-Source 1024×576 Text-To-Video Model
    • 🎭 ROOP Face Swapper – Take a Video or Image and Replace the Face in it with a face of your choice, no dataset, no training needed…
    • πŸ‘„ Video ReTalking – Audio-based Lip Synchronization for Talking Head Video Editing in the Wild…
    • πŸ‘§ LivePortrait – Portrait Animation with Stitching and Retargeting Control… Transfers Face Movements to Source Image.
    • πŸ” Infinite Zoom Text-to-Video – Animate your Keyframe Prompts with an Endless Zooming Effect…
    • 🌿 FRESCO Video-To-Video – Spatial-Temporal Correspondence for Zero-Shot Video Translation…
    • πŸ‘— StyleCrafter Text-to-Video-or-Image – Enhancing Stylized Video or Image Generation with Style Adapter…
    • πŸ¦β€β¬› RAVE Video-to-Video – Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models…
    • πŸ“Ή Rerender-a-Video – Zero-Shot Text-Guided Video-to-Video Translation… (Note: May need 24GB VRAM to run)
    • 🌞 TokenFlow Video-To-Video (under construction) – Consistent Diffusion Features for Consistent Video Editing…
    • πŸ”₯ Hotshot-XL Text-To-GIF with SDXL – Generate Animated GIFs with any fine-tuned SDXL model… (Work in Progress)
    • πŸ€ͺ ControlNet Video2Video – Apply Stable Diffusion to a video, while maintaining frame-to-frame consistency with motion estimator & compensator…
    • πŸ“½ Video-To-Video Synthesis – Video-to-video-synthesis Model to Reanimate Video Clips. Note: Uses more than 16GB VRAM, may crash session.
    • βŒ› Controlnet TemporalNet-XL – Video2Video ControlNet model designed to enhance the temporal consistency of video frames…
    • πŸ•ΈοΈ ControlNet Image+Text-to-Image+Video-to-Video – Adding Input Conditions To Pretrained Text-to-Image Diffusion Models…
  • 3D AIs:
    • πŸ—Ώ DreamFusion 3D Model and Video – Provide a prompt to render a model. Warning: May take over an hour to run the training…
    • πŸ‘† Point-E 3D Point Clouds – Provide a Prompt or Image to render from a CLIP ViT-L/14 diffusion model…
    • 🧊 Shap-E 3D Mesh – Provide a Prompt or Image to Generate Conditional 3D PLY Models…
    • πŸ₯ ZoeDepth 3D Depth Model from Init Image – Zero-shot Transfer by Combining Relative and Metric Depth…
    • πŸͺ· Marigold Depth Estimation – Monocular depth estimator that delivers accurate & sharp predictions in the wild… Based on SD.
    • πŸ–οΈ Tripo.ai Image-to-3D – State-of-the-art open-source model for fast feedforward 3D reconstruction from a single image…
    • ⚑️ InstantMesh Image-to-3D (under construction) – Single Image to 3D Textured Mesh with LRM/Instant3D architecture…
    • πŸ—Ώ Splatter Image 3D (under construction) – Ultra-Fast Single-View 3D Reconstruction…
    • πŸˆβ€β¬› CRM Image-to-3D (under construction) – Single Image to 3D Textured Mesh with Convolutional Reconstruction Model…
    • πŸ‹ Latent Diffusion Model for 3D (LDM3D) – Generate RGB Images and 3D Depth Maps given a text prompt… Made with Intel.
    • πŸŽ‘ Instant Neural Graphics Primitives by NVidia – Convert series of images into 3D Models with Multiresolution Hash Encoding…
    • πŸ„ Meshy.ai 3D Generation API – Uses credits from their servers to create quality mesh models. Can take 3-15 minutes per, but gives great results…
    • πŸŒ” LumaLabs Video-to-3D API – Costs $1 per Model, takes ~30min, but well worth it for these NeRF and meshing models in their cloud…
  • Audio AIs:
    • 🐒 Tortoise Text-to-Speech Voice Modeling – Reads your text in a realistic AI voice, train your own to mimic vocal performances…
    • 🎸 MusicLDM Song Modeling – Text-to-Music Generation: Enhancing Novelty in Beat-Synchronous Mixup…
    • 🫒 Stable Audio Diffusers – Generate Variable-Length Stereo Audio at 44.1kHz from Text Prompts using StabilityAI Open Model…
    • 🦻 Audio LDM Modeling – Text-to-Audio Generation with Latent Diffusion Model…
    • πŸ“’ Audio LDM-2 Modeling – Holistic Audio Generation with Self-supervised Pretraining…
    • 🎧 ZETA Audio Editing with LDM-2 – Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion…
    • 🐢 Bark AI – Text-to-Audio Generation for Multilingual Speech, Music and Sound Effects…
    • πŸ’½ Riffusion Spectrogram Sound Modeling – Stable Diffusion for real-time music generation…
    • 🎢 Audio Diffusion Modeling – Converts Audio Samples to and from Mel Spectrogram Images…
    • 🎹 MusicLang Predict – Controllable Symbolic Music Generation with MusicLang Predict…
    • πŸͺ— Meta Audiocraft MusicGen (might be broken) – Simple and Controllable Music Generation with Audio tokenization model…
    • 🧏 OpenAI Whisper-AI Speech-To-Text – Generate Text Transcriptions from Speech Recordings, then optionally process text with GPT…
    • 🦜 OpenAI Text-to-Speech Voice Modeling – Turn text into lifelike spoken audio… Uses your OpenAI credits.
    • πŸ’¬ Voice Fixer – Speech Restoration with Neural Vocoder – Cleans up bad vocals and fixes unwanted noise…
    • πŸ‘― Dance Diffusion – Create experimental music or sounds with HarmonAI trained audio models. Tools to train a generative model on arbitrary audio samples…
    • 🎼 Mubert Music Generator – AI music is generated by Mubert API. Pretty good grooves…(may not work if API maxed)
  • AI Trainers:
    • πŸŒ‡ Training with Low-Rank Adaptation of Large Language Models (LoRA DreamBooth) – Provide a collection of images to train. Adds on to the currently loaded Model Checkpoint…
    • 🌫️ Training Text-to-Image Low-Rank Adaptation of Large Language Models (LoRA) – Provide a collection of images to train. Smaller sized. Adds on to the currently loaded Model Checkpoint…
    • πŸ˜Άβ€πŸŒ«οΈ Create Custom DreamBooth Concept Model – Provide a collection of images to conceptualize.
    • πŸ˜Άβ€πŸŒ«οΈ Create Cusom Textual-Inversion Concept Model – Provide a collection of images to conceptualize.
    • πŸ”€ Model Converter Tool – Lets you Convert Format of Model Checkpoints to work with Diffusers…
    • πŸ‘₯ Checkpoint Merger Tool – Combine together two or more custom models to create a mixture of weights…
  • Extras:
    • ↕️  Real-ESRGAN AI Upscale Enlarging – Select one or more files, or give path to image or folder. Save to your Google Drive and/or Download.
    • 🧧 Manage your Saved Custom Models – Add or Edit your favorite models from HuggingFace, URL or Local Path
    • πŸ—‚οΈ Manage your Cache Directory Saved Models – If you’re cacheing your model files, it can fill up your drive space quickly, so you can trim the fat as needed… Redownloads when used.
    • πŸ–Ό MODNet Background Remover – A deep learning approach to clear the background of most images to isolate subject…
    • β›ˆοΈ AI-Horde Worker reGen Server – Share your GPU in the AI-Horde SD Cloud and earn Kudos… Give back to the Stable Horde, thanks db0.