Huggingface Model Matrix

The Huggingface models supported by SoloDesk AI are those that have been tested on a specific reference machine (REF2026) and found to be compatible with consumer-grade hardware.


The model information matrix is updated periodically as new models are added and new tests are performed.

The last update was on January 5, 2026.

| Model Name | Disk (GB) | GPU Memory (GB) | Source | Parameters | Description | Supported | Issues |
|---|---|---|---|---|---|---|---|
| Stable Diffusion 1.5 | 5.1 | 3.2 | stable-diffusion-v1-5/stable-diffusion-v1-5 | 983M | Text+image+mask-to-image. | Yes | One of the primary base models for many other derivatives. |
| Stable Diffusion XL 1.0 Base | 13.2 | 8.2 | stabilityai/stable-diffusion-xl-base-1.0 | 3.5B | Text+image+mask-to-image. | Yes | One of the primary base models for many other derivatives. |
| Amused | 3.27 | 3.4 | amused/amused-512 | | Lightweight text+image+mask-to-image. | Yes | Fast and lightweight. Marginal quality. Deprecated library support. |
| Lumina 2 | 19.7 | 11.7 | Alpha-VLLM/Lumina-Image-2.0 | 2B | High quality text-to-image generator. | Yes | Use this model when text is desired. |
| Sana | 15 | 11.7 | Efficient-Large-Model/Sana_1600M_1024px_diffusers | 1.6B | Fast text-to-image model from Nvidia. | Yes | Tested. |
| PixArt Alpha | 20.3 | 11.6 | PixArt-alpha/PixArt-XL-2-1024-MS | | High quality text-to-image generator. | Yes | Seems unnecessary when PixArt Sigma exists. |
| PixArt Sigma | 20.3 | 11.6 | PixArt-alpha/PixArt-Sigma-XL-2-1024-MS | | High quality text-to-image generator. | Yes | High quality 1536x768 renders. |
| Kolors | 16.5 | | Kwai-Kolors/Kolors-diffusers | | Text/image-to-image generator. | No | Fails to load with newest Diffusers library. |
| Ultra Epic AI Realism | 6.5 | 8.2 | stablediffusionapi/ultraepicairealism-v10 | | Text+image+mask-to-image. | Yes | A realistic and uncensored Stable Diffusion SDXL derivative. Use fp16 file variant. |
| ZimageTurbo | | | Tongyi-MAI/Z-Image-Turbo | | A new English/Chinese T2I model from Alibaba. | No | Planned for testing. |
| Stable Video Diffusion | 4.2 | 6-23 | stabilityai/stable-video-diffusion-img2vid | 1.7B | 14-frame image-to-video. | Yes | Loads about 6GB, runs about 11GB, and has a big memory spike at finish. |
| Stable Video Diffusion XT | 4.2 | 6-23 | stabilityai/stable-video-diffusion-img2vid-xt | | 25-frame image-to-video. | Yes | User’s machine should meet the full specs of REF2026 to run without issues. |
| Stable Video Diffusion XT 1.1 | | | stabilityai/stable-video-diffusion-img2vid-xt-1-1 | | Image-to-video. | No | Gated model. Not accessible from downloader. |
| Stable Video 3D (SV3D) | | | stabilityai/sv3d | | Image to 3D. | No | Gated model. Not accessible from downloader. |
| LTX Video | 26.4 | 5.5 | Lightricks/LTX-Video | 2B | Text/image to video. | Yes | Renders 49 frames in 33 seconds on REF2025, 22 seconds on REF2026. |
| Wan Video | 27 | 11.7 | Wan-AI/Wan2.1-T2V-1.3B-Diffusers | 1.3B | Text-to-video. | Yes | Renders 640x480 at about 4 seconds/frame on REF2026. |
| AnimateDiff | 1.6 | | guoyww/animatediff-motion-adapter-v1-5-3 | | Motion adapter. Makes SD 1.5 models do short videos. | Yes | Designed for 16 frame renders. |
| Sky Reels V2 | | | Skywork/SkyReels-V2-I2V-1.3B-540P | 1.3B | Image-to-video. | No | Planned for testing. |
| Sky Reels V2 | | | Skywork/SkyReels-V2-DF-1.3B-540P | 1.3B | Text-to-video. | No | Planned for testing. |
| Audio LDM2 | 4.2 | 3.1 | cvssp/audioldm2 | 1.1B | Text-to-audio. | Yes | Lightweight and fast render. |
| Audio LDM2 Large | 5.9 | 3.9 | cvssp/audioldm2-large | 1.5B | Text-to-audio. | Yes | Lightweight and fast render despite “large” model characterization. |
| Music LDM2 | 4.2 | 3.1 | cvssp/audioldm2-music | 1.1B | Text-to-audio music model. | Yes | Can generate 20 seconds of audio in 26 seconds on REF2025, 13 seconds on REF2026. |
| Stable Audio Open 1.0 | | | stabilityai/stable-audio-open-1.0 | | Text-to-audio. | No | Gated model. Not accessible from downloader. |
| MusicGen-Melody | | | facebook/musicgen-melody | 1.5B | Text-to-audio. | No | To be tested. |
| MusicGen-Medium | 16 | 14.9 | facebook/musicgen-medium | 1.5B | Text-to-audio music model. | Yes | Renders 30 seconds of audio in 49 seconds on REF2026. |
| MusicGen-Small | 2.3 | 3.9 | facebook/musicgen-small | 300M | A fast text-to-audio music model. | Yes | Renders 30 seconds of audio in 17 seconds on REF2026. |
| MusicGen-Stereo-Medium | 4.1 | 5.9 | facebook/musicgen-stereo-medium | 1.5B | A stereophonic version of MusicGen-Medium and faster. | Yes | Renders 30 seconds of audio in 30 seconds on REF2026. |
| MusicGen-Stereo-Small | 1.2 | 2.4 | facebook/musicgen-stereo-small | 300M | A stereophonic version of MusicGen-Small. | Yes | Renders 30 seconds of audio in 19 seconds on REF2026. |
| VibeVoice | | | microsoft/VibeVoice-1.5B | 1.5B | Text-to-voice. | No | Planned for testing. |
| ShapE Text | 3.3 | 3.3-4.6 | openai/shap-e | | Text-to-3D mesh. | Yes | Fast and lightweight. Marginal quality. |
| ShapE Image | 4 | 3.6 | openai/shap-e-img2img | | Image-to-3D mesh. | Yes | Fast and lightweight. Marginal quality. |
| Hunyuan3D 2.0 Single View | 4.9 | 6.4 | tencent/Hunyuan3D-2 | | Image to 3D. | Yes | 3D mesh only. Textures not yet supported. |
| Hunyuan3D 2.0 Multiview | 4.9 | 6.6 | tencent/Hunyuan3D-2mv | | Image to 3D. | Yes | 3D mesh only. Textures not yet supported. |
| LDM3D | | | Intel/ldm3d | | | No | Outputs only a depth map image. No actual 3D mesh. |
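
One practical use of the matrix is to shortlist models that fit a given machine. The sketch below is illustrative only and is not part of SoloDesk: it copies a handful of rows from the matrix by hand, and the names `ModelRow`, `ROWS`, and `models_that_fit` are hypothetical. Where the matrix gives a GPU memory range (such as 6-23 for Stable Video Diffusion), the sketch assumes the upper bound is the figure to budget for.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelRow:
    name: str
    disk_gb: float
    gpu_gb_peak: float  # upper bound when the matrix lists a range
    supported: bool


# Hand-copied subset of the matrix; not an official data export.
ROWS = [
    ModelRow("Stable Diffusion 1.5", 5.1, 3.2, True),
    ModelRow("Stable Diffusion XL 1.0 Base", 13.2, 8.2, True),
    ModelRow("Lumina 2", 19.7, 11.7, True),
    ModelRow("Stable Video Diffusion", 4.2, 23.0, True),
    ModelRow("LTX Video", 26.4, 5.5, True),
    ModelRow("MusicGen-Small", 2.3, 3.9, True),
]


def models_that_fit(
    rows: List[ModelRow],
    gpu_budget_gb: float,
    disk_budget_gb: Optional[float] = None,
) -> List[str]:
    """Return names of supported models within the GPU (and optional disk) budget."""
    fits = []
    for row in rows:
        if not row.supported:
            continue
        if row.gpu_gb_peak > gpu_budget_gb:
            continue
        if disk_budget_gb is not None and row.disk_gb > disk_budget_gb:
            continue
        fits.append(row.name)
    return fits


if __name__ == "__main__":
    # Models from the subset that should run on an 8 GB GPU.
    print(models_that_fit(ROWS, gpu_budget_gb=8.0))
```

Budgeting against the peak figure is the conservative choice: a model that loads in 6 GB but spikes higher at the end of a render can still fail on a smaller card.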


To obtain Huggingface models, the user runs the Huggingface downloader utility, which is invoked from a menu in the SoloDesk user interface. Models are downloaded in the Huggingface Diffusers format: each model is stored as a Windows folder containing multiple files and subfolders.
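
To make the folder layout concrete, here is a minimal sketch for inspecting a downloaded model folder. It assumes the typical Diffusers pipeline layout, in which a `model_index.json` file at the top level names the pipeline class and each component (for example `unet` or `vae`) lives in its own subfolder. Not every repository in the matrix necessarily follows this layout, and the function name `describe_diffusers_folder` is hypothetical; it is not part of SoloDesk or the Diffusers library.

```python
import json
from pathlib import Path


def describe_diffusers_folder(folder):
    """Return (pipeline_class, component_subfolders) for a Diffusers-style folder.

    Returns None when the folder has no top-level model_index.json.
    """
    root = Path(folder)
    index_path = root / "model_index.json"
    if not index_path.is_file():
        return None

    index = json.loads(index_path.read_text(encoding="utf-8"))

    # Keys beginning with "_" are metadata (such as "_class_name"); the other
    # keys name components. Only components that actually exist as
    # subfolders on disk are reported.
    components = sorted(
        key
        for key in index
        if not key.startswith("_") and (root / key).is_dir()
    )
    return index.get("_class_name"), components
```

A result of `None`, or a component list that is shorter than expected, may indicate an incomplete download or a model that uses a different on-disk layout.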


Some of the models on Huggingface are "gated" models: the user must provide some personal information to the owner of the model repository before gaining access to the model. Because of technical limitations, the SoloDesk model downloader tool cannot access gated models at this time, so gated models are currently not supported.
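
A user who wants to know in advance whether a repository is gated can query its public metadata. The sketch below assumes the separately installed `huggingface_hub` Python package, whose `HfApi.model_info` call returns an object with a `gated` field. It is an independent illustration and does not describe how the SoloDesk downloader works internally; the helper names `is_gated` and `fetch_model_info` are hypothetical.

```python
def is_gated(info) -> bool:
    """Return True when repository metadata marks the model as gated.

    Assumption: `gated` is False or None for open repositories and a
    non-empty string such as "auto" or "manual" for gated ones.
    """
    return bool(getattr(info, "gated", False))


def fetch_model_info(repo_id: str):
    """Fetch public metadata for a repository (requires network access)."""
    # Imported lazily so is_gated() can be used without the package installed.
    from huggingface_hub import HfApi

    return HfApi().model_info(repo_id)


if __name__ == "__main__":
    # Two sources taken from the matrix above.
    for repo_id in ("stabilityai/sv3d", "facebook/musicgen-small"):
        status = "gated" if is_gated(fetch_model_info(repo_id)) else "open"
        print(f"{repo_id}: {status}")
```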



