Huggingface Model Matrix

The Huggingface models supported by SoloDesk AI are those that have been tested on a specific reference machine (REF2026) and found to be compatible with consumer-grade hardware.

The model information matrix is updated periodically as new models are added and new tests are performed.

The last update was on April 28, 2026.

Model Name	Disk (GB)	GPU Memory (GB)	Source	Parameters	Description	Supported	Notes
Stable Diffusion 1.5	5.1	3.2	stable-diffusion-v1-5/stable-diffusion-v1-5	983M	Text+image+mask to image.	Yes	The foundational Stable Diffusion 1.5 model. The basis of many derivatives.
Stable Diffusion XL 1.0 Base	13.2	8.2	stabilityai/stable-diffusion-xl-base-1.0	2.6B	Text+image+mask to image.	Yes	The foundational Stable Diffusion XL model. The basis of many derivatives.
Amused	3.27	3.4	amused/amused-512		Lightweight text+image+mask to image.	Yes	Fast and lightweight. Marginal quality. Deprecated library support.
Lumina 2	19.7	12.5	Alpha-VLLM/Lumina-Image-2.0	2B	High quality text-to-image generator.	Yes	Use this model when text is desired.
Sana	15	12.4	Efficient-Large-Model/Sana_1600M_1024px_diffusers	1.6B	Fast text-to-image model from Nvidia	Yes	Among fastest of 1024x1024 renders. Marginal quality.
PixArt Alpha	20.3	11.6	PixArt-alpha/PixArt-XL-2-1024-MS	1.2B	High quality text-to-image generator.	Yes	Seems unnecessary now that PixArt Sigma exists.
PixArt Sigma	20.3	11.6	PixArt-alpha/PixArt-Sigma-XL-2-1024-MS	1.2B	High quality text-to-image generator.	Yes	High quality 1536x768 renders.
Kolors	16.5	6.1	Kwai-Kolors/Kolors-diffusers		Text/image-to-image generator.	Yes	Now runs smoothly with recent memory management changes.
Illustrious	6.8	6.7	OnomaAIResearch/Illustrious-XL-v2.0	2.6B	Text+image to image.	Yes	Based on SDXL architecture. Used for cartoon-style illustrations.
Pony Diffusion V6	6.5	8.3	stablediffusionapi/Pony-Diffusion-V6-XL	2.6B	Text+image to image.	Yes	Based on SDXL architecture.
Ultra Epic AI Realism	6.5	8.2	stablediffusionapi/ultraepicairealism-v10	2.6B	Text+image+mask to image.	Yes	A realistic and uncensored SDXL derivative.
Z-Image Base	19.1	14.8	Tongyi-MAI/Z-Image	6B	A high quality text+image-to-image model from Alibaba.	Yes	Huggingface version works. Excellent text rendering. Civitai versions are fragmented.
Z-Image Turbo	30.5	14.8	Tongyi-MAI/Z-Image-Turbo	6B	A high quality text+image-to-image model from Alibaba.	Yes	Huggingface version works. Civitai versions are fragmented.
Stable Video Diffusion	4.2	6 (23 spike)	stabilityai/stable-video-diffusion-img2vid	1.7B	14-frame image-to-video	Yes	Loads about 6GB, runs about 11GB, and has a big memory spike at finish.
Stable Video Diffusion XT	4.2	6 (23 spike)	stabilityai/stable-video-diffusion-img2vid-xt		25-frame image-to-video	Yes	User’s machine should meet the full specs of REF2026 to run without issues.
Stable Video Diffusion XT 1.1			stabilityai/stable-video-diffusion-img2vid-xt-1-1		Image-to-video.	No	Gated model. Not accessible from downloader. Might work, but never tested.
Stable Video 3D (SV3D)			stabilityai/sv3d		Image-to-3D.	No	Gated model. Not accessible from downloader.
LTX Video	26.4	5.5	Lightricks/LTX-Video	2B	Text+image to video.	Yes	Renders 49 frames in 33 seconds on REF2025, 22 seconds on REF2026.
LTX-2 Video	>26.2		Lightricks/LTX-2	19B	Text+image to video.	No	Extra-large model that will require special workarounds to run on consumer-grade hardware.
Wan Video 2.1	27	13.1	Wan-AI/Wan2.1-T2V-1.3B-Diffusers	1.3B	Text-to-video.	Yes	Renders 640x480 at about 4 seconds/frame on REF2026.
Wan Video 2.2			Wan-AI/Wan2.2-TI2V-5B-Diffusers	5B	Text+image to video.	Yes	Uses an image prompt and a text prompt to guide the motion.
SANA-Video 480p	13.1	8.1 (16.8 spike)	Efficient-Large-Model/SANA-Video_2B_480p_diffusers	2B	Text+image-to-video	Yes	Renders 49 frames at 832x480 in 150 seconds.
AnimateDiff	1.6		guoyww/animatediff-motion-adapter-v1-5-3		Motion adapter. Lets SD 1.5 models render short videos.	Yes	Designed for 16-frame renders.
AnimateDiffXL			guoyww/animatediff-motion-adapter-sdxl-beta		Motion adapter. Lets SDXL models render short videos.	Yes	Designed for 16-frame renders.
Sky Reels V2			Skywork/SkyReels-V2-I2V-1.3B-540P	1.3B	Image-to-Video	No	Planned for testing.
Sky Reels V2			Skywork/SkyReels-V2-DF-1.3B-540P	1.3B	Text-to-video	No	Planned for testing.
Audio LDM2	4.2	3.1	cvssp/audioldm2	1.1B	Text-to-audio.	Yes	Lightweight and fast to render.
Audio LDM2 Large	5.9	3.9	cvssp/audioldm2-large	1.5B	Text-to-audio.	Yes	Lightweight and fast to render despite the “large” label.
Music LDM2	4.2	3.1	cvssp/audioldm2-music	1.1B	Text-to-audio music model.	Yes	Can generate 20 seconds of audio in 26 seconds on REF2025, 13 seconds on REF2026.
Stable Audio Open 1.0			stabilityai/stable-audio-open-1.0		Text-to-audio	No	Gated model. Not accessible from downloader.
MusicGen-Melody			facebook/musicgen-melody	1.5B	Text-to-audio	No	To be tested.
MusicGen-Medium	16	14.9	facebook/musicgen-medium	1.5B	Text-to-audio music model.	Yes	Renders 30 seconds of audio in 49 seconds on REF2026.
MusicGen-Small	2.3	3.9	facebook/musicgen-small	300M	A fast text-to-audio music model.	Yes	Renders 30 seconds of audio in 17 seconds on REF2026.
MusicGen-Stereo-Medium	4.1	9.4	facebook/musicgen-stereo-medium	1.5B	A faster, stereophonic version of MusicGen-Medium.	Yes	Renders 30 seconds of audio in 30 seconds on REF2026.
MusicGen-Stereo-Small	1.2	2.4	facebook/musicgen-stereo-small	300M	A stereophonic version of MusicGen-Small	Yes	Renders 30 seconds of audio in 19 seconds on REF2026.
CSM 1B			sesame/csm-1b	1B	Text-to-speech	No	Gated model. Not accessible from downloader.
Marvis TTS	2.9	3.5	Marvis-AI/marvis-tts-250m-v0.2-transformers	250M	Text-to-speech	Yes	Based on the gated Sesame CSM-1B model. Currently supports only default speaker.
Suno Bark	4.2	5.7	suno/bark		Text-to-speech	Yes	Supports default speaker as well as speaker embedding files.
VibeVoice			microsoft/VibeVoice-1.5B	1.5B	Text-to-voice	No	Planned for testing pending Transformers integration.
ShapE Text	3.3	3.3 – 4.6	openai/shap-e		Text-to-3D mesh.	Yes	Fast and lightweight. Marginal quality.
ShapE Image	4	3.6	openai/shap-e-img2img		Image-to-3D mesh.	Yes	Fast and lightweight. Marginal quality.
Hunyuan3D 2.0 Single View	4.9	7.1	tencent/Hunyuan3D-2		Image to 3D	Yes	3D mesh only. Textures not yet supported.
Hunyuan3D 2.0 Multiview	4.9	7.1	tencent/Hunyuan3D-2mv		Image to 3D	Yes	3D mesh only. Textures not yet supported.
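
As a rough illustration of how the GPU Memory column might be used, the sketch below picks the models whose peak memory fits within a given budget. It is not part of SoloDesk; the helper names `peak_gpu_gb` and `models_that_fit` are hypothetical, and the rows are a small sample copied from the matrix. The sketch assumes that the largest number in a cell (including any "spike" value) is the figure that matters and that an empty cell means the requirement is unknown.

```python
import re

# Sample rows copied from the matrix: (model name, GPU Memory cell as written).
ROWS = [
    ("Stable Diffusion 1.5", "3.2"),
    ("Stable Video Diffusion", "6 (23 spike)"),
    ("ShapE Text", "3.3 – 4.6"),
    ("LTX-2 Video", ""),
    ("Wan Video 2.1", "13.1"),
]


def peak_gpu_gb(cell):
    """Return the largest number found in a GPU Memory cell, or None if it is empty."""
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", cell)]
    return max(numbers) if numbers else None


def models_that_fit(rows, budget_gb):
    """List the models whose peak GPU memory is known and within the budget."""
    fits = []
    for name, cell in rows:
        peak = peak_gpu_gb(cell)
        # Assumption: models with no published figure are skipped, not guessed at.
        if peak is not None and peak <= budget_gb:
            fits.append(name)
    return fits


print(models_that_fit(ROWS, 8.0))
```

With an 8 GB budget, Stable Video Diffusion is excluded because of its 23 GB spike, even though it loads in about 6 GB.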

For the Huggingface models, the user can download the model files directly from the Huggingface repositories or use a downloader utility to obtain them. The downloader utility is invoked from a menu in the SoloDesk user interface. The models are downloaded in the Huggingface diffusers format, which is stored as a Windows folder containing multiple files and subfolders. Alternatively, the user can use a third-party downloader of their choice, since SoloDesk can load any model stored in the Huggingface diffusers format. This format allows users to store their model archive on a drive that is separate from the Windows operating system drive.
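
For readers who prefer a scripted download, the following is a minimal sketch of one possible third-party approach using the `huggingface_hub` Python package, which is not part of SoloDesk. The `owner/name` folder layout, the helper names, and the `D:/models` path are assumptions made for illustration; SoloDesk's own downloader may organize folders differently.

```python
from pathlib import Path


def model_folder(archive_root, repo_id):
    """Map a repository ID such as 'cvssp/audioldm2' to a folder under the archive root."""
    owner, name = repo_id.split("/", 1)
    return Path(archive_root) / owner / name


def download_model(repo_id, archive_root):
    """Download every file of a repository into its folder and return that folder."""
    # Imported here so the path helper above works without the package installed.
    from huggingface_hub import snapshot_download

    target = model_folder(archive_root, repo_id)
    # local_dir places the files in a plain folder instead of the default cache.
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Calling `download_model("cvssp/audioldm2", "D:/models")` would place the files on the D: drive, keeping the model archive off the operating system drive.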

Some of the models on Huggingface are "gated" models. This means that the user must give some personal information to the owner of the model repository before they can access the model. The SoloDesk model downloader tool is not able to access gated models at this time due to technical limitations. Therefore, gated models are currently not supported.
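
One way to find out in advance whether a repository is gated is to read its metadata with the `huggingface_hub` Python package. The sketch below is independent of SoloDesk and its function names are hypothetical. It assumes the metadata's `gated` field is `False` for open repositories and `"auto"` or `"manual"` for gated ones.

```python
def describe_gating(gated):
    """Translate the repository's `gated` metadata value into a short label."""
    if gated is None or gated is False:
        return "open"
    if gated == "auto":
        return "gated (automatic approval)"
    if gated == "manual":
        return "gated (manual approval)"
    # Any other value is reported as-is rather than guessed at.
    return f"gated ({gated})"


def check_repo(repo_id):
    """Look up a repository on Huggingface and describe its gating status."""
    # Imported here so describe_gating works without the package installed.
    from huggingface_hub import HfApi

    info = HfApi().model_info(repo_id)
    return describe_gating(info.gated)
```

For example, `check_repo("sesame/csm-1b")` requires network access and reports whether that repository is currently gated.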
