Huggingface Model Matrix

The Huggingface models supported by SoloDesk AI are those that have been tested on a specific reference machine (REF2026) and found to be compatible with consumer-grade hardware.


The model information matrix gets updated periodically as new models are added and new testing is performed.

The last update was on March 9, 2026.

| Model Name | Disk (GB) | GPU Memory (GB) | Source | Parameters | Description | Supported | Notes |
|---|---|---|---|---|---|---|---|
| Stable Diffusion 1.5 | 5.1 | 3.2 | stable-diffusion-v1-5/stable-diffusion-v1-5 | 983M | Text+image+mask to image. | Yes | The foundational Stable Diffusion model. The basis of many derivatives |
| Stable Diffusion XL 1.0 Base | 13.2 | 8.2 | stabilityai/stable-diffusion-xl-base-1.0 | 2.6B | Text+image+mask to image. | Yes | The foundational Stable Diffusion XL model. The basis of many derivatives |
| Amused | 3.27 | 3.4 | amused/amused-512 | | Lightweight text+image+mask to image. | Yes | Fast and light weight. Marginal quality. Deprecated library support. |
| Lumina 2 | 19.7 | 12.5 | Alpha-VLLM/Lumina-Image-2.0 | 2B | High quality text-to-image generator. | Yes | Use this model when text is desired. |
| Sana | 15 | 12.4 | Efficient-Large-Model/Sana_1600M_1024px_diffusers | 1.6B | Fast text-to-image model from Nvidia | Yes | Among fastest of 1024x1024 renders. Marginal quality. |
| PixArt Alpha | 20.3 | 11.6 | PixArt-alpha/PixArt-XL-2-1024-MS | 1.2B | High quality text-to-image generator. | Yes | Seems unnecessary when Pixart Sigma exists. |
| PixArt Sigma | 20.3 | 11.6 | PixArt-alpha/PixArt-Sigma-XL-2-1024-MS | 1.2B | High quality text-to-image generator. | Yes | High quality 1536x768 renders. |
| Kolors | 16.5 | 6.1 | Kwai-Kolors/Kolors-diffusers | | Text/image-to-image generator. | Yes | Now runs smoothly with recent memory management changes. |
| Illustrious | 6.8 | 6.7 | OnomaAIResearch/Illustrious-XL-v2.0 | 3.5B | Text+image to image. | Yes | Based on SDXL architecture. Used for cartoon-style illustrations. |
| Pony Diffusion V6 | 6.5 | 8.3 | stablediffusionapi/Pony-Diffusion-V6-XL | 3.5B | Text+image to image. | Yes | Based on SDXL architecture. |
| Ultra Epic AI Realism | 6.5 | 8.2 | stablediffusionapi/ultraepicairealism-v10 | 2.6B | Text+image+mask-to-image. | Yes | A realistic and uncensored Stable Diffusion SDXL derivative. |
| Z-Image Base | 19.1 | 14.8 | Tongyi-MAI/Z-Image | 6B | A high quality text+image-to-image model from Alibaba. | Yes | Huggingface version works. Excellent text rendering. Civitai versions are fragmented. |
| Z-Image Turbo | 30.5 | 14.8 | Tongyi-MAI/Z-Image-Turbo | 6B | A high quality text+image-to-image model from Alibaba. | Yes | Huggingface version works. Civitai versions are fragmented. |
| Stable Video Diffusion | 4.2 | 6-23 | stabilityai/stable-video-diffusion-img2vid | 1.7B | 14-frame image-to-video | Yes | Loads about 6GB, runs about 11GB, and has a big memory spike at finish. |
| Stable Video Diffusion XT | 4.2 | 6-23 | stabilityai/stable-video-diffusion-img2vid-xt | | 25-frame image-to-video | Yes | User’s machine should meet the full specs of REF2026 to run without issues. |
| Stable Video Diffusion XT 1.1 | | | stabilityai/stable-video-diffusion-img2vid-xt-1-1 | | Image-to-video. | No | Gated model. Not accessible from downloader. Might work, but never tested. |
| Stable Video 3D (SV3D) | | | stabilityai/sv3d | | Image-to-3D. | No | Gated model. Not accessible from downloader. |
| LTX Video | 26.4 | 5.5 | Lightricks/LTX-Video | 2B | Text+image to video. | Yes | Renders 49 frames in 33 seconds on REF2025, 22 seconds on REF2026. |
| LTX-2 Video | >26.2 | | Lightricks/LTX-2 | 19B | Text+image to video. | No | Extra large model will require special hacks to run on consumer grade hardware. |
| Wan Video 2.1 | 27 | 13.1 | Wan-AI/Wan2.1-T2V-1.3B-Diffusers | 1.3B | Text-to-video. | Yes | Renders 640x480 at about 4 seconds/frame on REF2026. |
| Wan Video 2.2 | | | Wan-AI/Wan2.2-TI2V-5B-Diffusers | 5B | Text+Image to video. | Yes | Uses an image prompt and a text prompt to guide the motion. |
| AnimateDiff | 1.6 | | guoyww/animatediff-motion-adapter-v1-5-3 | | Motion adapter. Makes SD 1.5 models do short videos. | Yes | Designed for 16 frame renders. |
| AnimateDiffXL | | | guoyww/animatediff-motion-adapter-sdxl-beta | | Motion adapter. Makes SDXL models do short videos. | Yes | Designed for 16 frame renders. |
| Sky Reels V2 | | | Skywork/SkyReels-V2-I2V-1.3B-540P | 1.3B | Image-to-Video | No | Planned for testing. |
| Sky Reels V2 | | | Skywork/SkyReels-V2-DF-1.3B-540P | 1.3B | Text-to-video | No | Planned for testing. |
| Audio LDM2 | 4.2 | 3.1 | cvssp/audioldm2 | 1.1B | Text-to-audio. | Yes | Light weight and fast render. |
| Audio LDM2 Large | 5.9 | 3.9 | cvssp/audioldm2-large | 1.5B | Text-to-audio. | Yes | Light weight and fast render despite “large” model characterization. |
| Music LDM2 | 4.2 | 3.1 | cvssp/audioldm2-music | 1.1B | Text-to-audio music model. | Yes | Can generate 20 seconds of audio in 26 seconds on REF2025, 13 seconds on REF2026. |
| Stable Audio Open 1.0 | | | stabilityai/stable-audio-open-1.0 | | Text-to-audio | No | Gated model. Not accessible from downloader. |
| MusicGen-Melody | | | facebook/musicgen-melody | 1.5B | Text-to-audio | No | To be tested. |
| MusicGen-Medium | 16 | 14.9 | facebook/musicgen-medium | 1.5B | Text-to-audio music model. | Yes | Renders 30 seconds of audio in 49 seconds on REF2026. |
| MusicGen-Small | 2.3 | 3.9 | facebook/musicgen-small | 300M | A fast text-to-audio music model. | Yes | Renders 30 seconds of audio in 17 seconds on REF2026. |
| MusicGen-Stereo-Medium | 4.1 | 9.4 | facebook/musicgen-stereo-medium | 1.5B | A stereophonic version of MusicGen-Medium and faster. | Yes | Renders 30 seconds of audio in 30 seconds on REF2026. |
| MusicGen-Stereo-Small | 1.2 | 2.4 | facebook/musicgen-stereo-small | 300M | A stereophonic version of MusicGen-Small | Yes | Renders 30 seconds of audio in 19 seconds on REF2026. |
| CSM 1B | | | sesame/csm-1b | 1B | Text-to-speech | No | Gated model. Not accessible from downloader. |
| Marvis TTS | 2.9 | 3.5 | Marvis-AI/marvis-tts-250m-v0.2-transformers | 250M | Text-to-speech | Yes | Based on the gated Sesame CSM-1B model. Currently supports only default speaker. |
| Suno Bark | 4.2 | 5.7 | suno/bark | | Text-to-speech | Yes | Supports default speaker as well as speaker embedding files. |
| VibeVoice | | | microsoft/VibeVoice-1.5B | 1.5B | Text-to-voice | No | Planned for testing pending Transformers integration. |
| ShapE Text | 3.3 | 3.3 – 4.6 | openai/shap-e | | Text-to-3D mesh. | Yes | Fast and light weight. Marginal quality. |
| ShapE Image | 4 | 3.6 | openai/shap-e-img2img | | Image-to-3D mesh. | Yes | Fast and light weight. Marginal quality. |
| Hunyuan3D 2.0 Single View | 4.9 | 7.1 | tencent/Hunyuan3D-2 | | Image to 3D | Yes | 3D mesh only. Textures not yet supported. |
| Hunyuan3D 2.0 Multiview | 4.9 | 7.1 | tencent/Hunyuan3D-2mv | | Image to 3D | Yes | 3D mesh only. Textures not yet supported. |
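
One way to use the matrix is to shortlist models whose measured GPU memory fits a given card. The sketch below is only an illustration: it copies a handful of rows from the matrix above, and the function name `models_within` is hypothetical. The figures were measured on REF2026, so actual usage on other machines may differ.

```python
# A few rows copied from the matrix: (model name, GPU memory in GB).
MATRIX = [
    ("Stable Diffusion 1.5", 3.2),
    ("Stable Diffusion XL 1.0 Base", 8.2),
    ("Lumina 2", 12.5),
    ("MusicGen-Small", 3.9),
    ("Z-Image Base", 14.8),
]


def models_within(budget_gb, rows=MATRIX):
    """Return the names of models whose listed GPU memory fits the budget."""
    return [name for name, gpu_gb in rows if gpu_gb <= budget_gb]
```

For example, `models_within(8.0)` keeps only the rows listed at 8.0 GB or less.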


For the Huggingface models, the user can download the model files directly from the Huggingface repositories or use a downloader utility to obtain them. The downloader utility is invoked from a menu in the SoloDesk user interface. Models are downloaded in the Huggingface diffusers format, which exists on disk as a Windows folder containing multiple files and subfolders. Alternatively, the user can use a third-party downloader of their choice, since SoloDesk can use any model stored in the Huggingface diffusers format. This format allows users to store their model archive on a drive that is separate from the Windows operating system drive.
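
As a rough illustration of the third-party route, the sketch below fetches a complete model repository as a folder using the `snapshot_download` function from the `huggingface_hub` Python package. This is not SoloDesk's own downloader. The helper names, the owner/name folder layout, and the archive path in the usage note are hypothetical, and the sketch assumes `huggingface_hub` is installed.

```python
from pathlib import Path


def model_folder(archive_root: str, repo_id: str) -> Path:
    """Map a repo id such as 'cvssp/audioldm2' to a folder under the archive root.

    The owner/name layout is an assumption made for this example.
    """
    owner, name = repo_id.split("/", 1)
    return Path(archive_root) / owner / name


def download_model(archive_root: str, repo_id: str) -> Path:
    """Download every file of the repository into its own folder."""
    # Imported here so the path helper works without the package installed.
    from huggingface_hub import snapshot_download

    target = model_folder(archive_root, repo_id)
    target.mkdir(parents=True, exist_ok=True)
    # local_dir places the files and subfolders directly in the target folder.
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Calling `download_model("D:/ModelArchive", "cvssp/audioldm2")` would place the model on a drive other than the operating system drive, assuming a `D:` drive exists.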


Some of the models on Huggingface are "gated" models. This means that the user must give some personal information to the owner of the model repository before they can access the model. The SoloDesk model downloader tool cannot access gated models at this time due to technical limitations. Therefore, gated models are currently not supported.
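
Whether a repository is gated can be checked before attempting a download. The sketch below reads the `gated` field of the metadata returned by `model_info` in the `huggingface_hub` package, which is `False` for open repositories and `"auto"` or `"manual"` for gated ones. The function names are hypothetical, and this is not how SoloDesk itself detects gated models.

```python
def is_gated(gated_field) -> bool:
    """Interpret the `gated` field of a model's Hub metadata.

    Open repositories report False (or no value); gated ones report
    the string "auto" or "manual".
    """
    return gated_field in ("auto", "manual")


def repo_is_gated(repo_id: str) -> bool:
    """Look up a repository on the Hub and report whether it is gated."""
    # Imported here so is_gated works without the package installed.
    from huggingface_hub import model_info

    return is_gated(model_info(repo_id).gated)
```

A third-party download script could call `repo_is_gated` first and skip gated repositories instead of failing partway through a download.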



