Huggingface Model Matrix
SoloDesk AI supports the Huggingface models that have been tested on a specific reference machine (REF2026) and found to be compatible with consumer-grade hardware.
The model information matrix gets updated periodically as new models are added and new testing is performed.
The last update was on January 5, 2026.
| Model Name | Disk (GB) | GPU Memory (GB) | Source | Parameters | Description | Supported | Issues |
|---|---|---|---|---|---|---|---|
| Stable Diffusion 1.5 | 5.1 | 3.2 | stable-diffusion-v1-5/stable-diffusion-v1-5 | 983M | Text+image+mask-to-image. | Yes | One of the primary base models for many other derivatives. |
| Stable Diffusion XL 1.0 Base | 13.2 | 8.2 | stabilityai/stable-diffusion-xl-base-1.0 | 3.5B | Text+image+mask-to-image. | Yes | One of the primary base models for many other derivatives. |
| Amused | 3.27 | 3.4 | amused/amused-512 | | Lightweight text+image+mask-to-image | Yes | Fast and lightweight. Marginal quality. Deprecated library support. |
| Lumina 2 | 19.7 | 11.7 | Alpha-VLLM/Lumina-Image-2.0 | 2B | High quality text-to-image generator. | Yes | Use this model when text is desired. |
| Sana | 15 | 11.7 | Efficient-Large-Model/Sana_1600M_1024px_diffusers | 1.6B | Fast text-to-image model from Nvidia | Yes | Tested. |
| PixArt Alpha | 20.3 | 11.6 | PixArt-alpha/PixArt-XL-2-1024-MS | | High quality text-to-image generator. | Yes | Seems unnecessary when PixArt Sigma exists. |
| PixArt Sigma | 20.3 | 11.6 | PixArt-alpha/PixArt-Sigma-XL-2-1024-MS | | High quality text-to-image generator. | Yes | High quality 1536x768 renders. |
| Kolors | 16.5 | | Kwai-Kolors/Kolors-diffusers | | Text/image-to-image generator. | No | Fails to load with newest Diffusers library. |
| Ultra Epic AI Realism | 6.5 | 8.2 | stablediffusionapi/ultraepicairealism-v10 | | Text+image+mask-to-image. | Yes | A realistic and uncensored Stable Diffusion SDXL derivative. Use fp16 file variant. |
| ZimageTurbo | | | Tongyi-MAI/Z-Image-Turbo | | A new English/Chinese T2I model from Alibaba. | No | Planned for testing. |
| Stable Video Diffusion | 4.2 | 6-23 | stabilityai/stable-video-diffusion-img2vid | 1.7B | 14-frame image-to-video | Yes | Loads about 6GB, runs about 11GB, and has a big memory spike at finish. |
| Stable Video Diffusion XT | 4.2 | 6-23 | stabilityai/stable-video-diffusion-img2vid-xt | | 25-frame image-to-video | Yes | User’s machine should meet the full specs of REF2026 to run without issues. |
| Stable Video Diffusion XT 1.1 | | | stabilityai/stable-video-diffusion-img2vid-xt-1-1 | | Image-to-Video | No | Gated model. Not accessible from downloader. |
| Stable Video 3D (SV3D) | | | stabilityai/sv3d | | Image to 3D | No | Gated model. Not accessible from downloader. |
| LTX Video | 26.4 | 5.5 | Lightricks/LTX-Video | 2B | Text/image to video. | Yes | Renders 49 frames in 33 seconds on REF2025, 22 seconds on REF2026. |
| Wan Video | 27 | 11.7 | Wan-AI/Wan2.1-T2V-1.3B-Diffusers | 1.3B | Text-to-video. | Yes | Renders 640x480 at about 4 seconds/frame on REF2026. |
| AnimateDiff | 1.6 | | guoyww/animatediff-motion-adapter-v1-5-3 | | Motion adapter. Makes SD 1.5 models do short videos. | Yes | Designed for 16 frame renders. |
| Sky Reels V2 | | | Skywork/SkyReels-V2-I2V-1.3B-540P | 1.3B | Image-to-Video | No | Planned for testing. |
| Sky Reels V2 | | | Skywork/SkyReels-V2-DF-1.3B-540P | 1.3B | Text-to-video | No | Planned for testing. |
| Audio LDM2 | 4.2 | 3.1 | cvssp/audioldm2 | 1.1B | Text-to-audio. | Yes | Lightweight and fast to render. |
| Audio LDM2 Large | 5.9 | 3.9 | cvssp/audioldm2-large | 1.5B | Text-to-audio. | Yes | Lightweight and fast to render despite the “large” model characterization. |
| Music LDM2 | 4.2 | 3.1 | cvssp/audioldm2-music | 1.1B | Text-to-audio music model. | Yes | Can generate 20 seconds of audio in 26 seconds on REF2025, 13 seconds on REF2026. |
| Stable Audio Open 1.0 | | | stabilityai/stable-audio-open-1.0 | | Text-to-audio | No | Gated model. Not accessible from downloader. |
| MusicGen-Melody | | | facebook/musicgen-melody | 1.5B | Text-to-audio | No | To be tested. |
| MusicGen-Medium | 16 | 14.9 | facebook/musicgen-medium | 1.5B | Text-to-audio music model. | Yes | Renders 30 seconds of audio in 49 seconds on REF2026. |
| MusicGen-Small | 2.3 | 3.9 | facebook/musicgen-small | 300M | A fast text-to-audio music model. | Yes | Renders 30 seconds of audio in 17 seconds on REF2026. |
| MusicGen-Stereo-Medium | 4.1 | 5.9 | facebook/musicgen-stereo-medium | 1.5B | A stereophonic version of MusicGen-Medium and faster. | Yes | Renders 30 seconds of audio in 30 seconds on REF2026. |
| MusicGen-Stereo-Small | 1.2 | 2.4 | facebook/musicgen-stereo-small | 300M | A stereophonic version of MusicGen-Small | Yes | Renders 30 seconds of audio in 19 seconds on REF2026. |
| VibeVoice | | | microsoft/VibeVoice-1.5B | 1.5B | Text-to-voice | No | Planned for testing. |
| ShapE Text | 3.3 | 3.3 – 4.6 | openai/shap-e | | Text-to-3D mesh. | Yes | Fast and lightweight. Marginal quality. |
| ShapE Image | 4 | 3.6 | openai/shap-e-img2img | | Image-to-3D mesh. | Yes | Fast and lightweight. Marginal quality. |
| Hunyuan3D 2.0 Single View | 4.9 | 6.4 | tencent/Hunyuan3D-2 | | Image to 3D | Yes | 3D mesh only. Textures not yet supported. |
| Hunyuan3D 2.0 Multiview | 4.9 | 6.6 | tencent/Hunyuan3D-2mv | | Image to 3D | Yes | 3D mesh only. Textures not yet supported. |
| LDM3D | | | Intel/ldm3d | | | No | Outputs only a depth map image. No actual 3D mesh. |
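As an illustration of how the matrix can be read, the following minimal Python sketch picks the supported models whose GPU memory figure fits within a given budget. It is not part of SoloDesk. The `ModelEntry` class and the function names are hypothetical, and the sample rows are copied by hand from the matrix above. Where a cell gives a range (such as `6-23`), the sketch conservatively uses the upper bound. The figures were measured on the reference machine, so actual usage on other hardware may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelEntry:
    """One row of the matrix (hypothetical structure, for illustration only)."""
    name: str
    gpu_gb: Optional[str]  # as written in the matrix, e.g. "3.2" or "6-23"; None if empty
    supported: bool


def peak_gpu_gb(entry: ModelEntry) -> Optional[float]:
    """Return the upper bound of the GPU memory cell, or None if the cell is empty."""
    if not entry.gpu_gb:
        return None
    # Ranges in the matrix are written with a hyphen or an en dash.
    parts = entry.gpu_gb.replace("–", "-").split("-")
    return max(float(p) for p in parts)


def models_that_fit(entries: List[ModelEntry], budget_gb: float) -> List[str]:
    """Names of supported models whose peak GPU memory is within the budget."""
    fits = []
    for entry in entries:
        peak = peak_gpu_gb(entry)
        if entry.supported and peak is not None and peak <= budget_gb:
            fits.append(entry.name)
    return fits


# A few rows copied from the matrix above.
MATRIX_SAMPLE = [
    ModelEntry("Stable Diffusion 1.5", "3.2", True),
    ModelEntry("Stable Diffusion XL 1.0 Base", "8.2", True),
    ModelEntry("Stable Video Diffusion", "6-23", True),
    ModelEntry("ShapE Text", "3.3 – 4.6", True),
    ModelEntry("ZimageTurbo", None, False),
]

if __name__ == "__main__":
    print(models_that_fit(MATRIX_SAMPLE, budget_gb=8))
```

Using the upper bound of a range is a deliberate choice here: the Stable Video Diffusion notes mention a large memory spike at the end of a render, so the lower figure alone would be misleading.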
Users obtain Huggingface models by running the Huggingface downloader utility, which is invoked from a menu in the SoloDesk user interface. Models are downloaded in the Huggingface Diffusers format, which is stored as a Windows folder containing multiple files and subfolders.
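To make the folder layout concrete, here is a minimal Python sketch that checks whether a local folder looks like a Diffusers pipeline. It relies on the Diffusers convention that a pipeline folder has a `model_index.json` file at its top level, with the pipeline class recorded under `_class_name`, and keeps each component (for example `unet` or `vae`) in its own subfolder. The function names are hypothetical and this is not how SoloDesk itself validates downloads. Some repositories in the matrix, such as motion adapters, may not include this file.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def looks_like_diffusers_folder(folder) -> bool:
    """Return True if `folder` has a readable model_index.json naming a pipeline class."""
    index_file = Path(folder) / "model_index.json"
    if not index_file.is_file():
        return False
    try:
        index = json.loads(index_file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False
    # Diffusers pipelines record their pipeline class under "_class_name".
    return isinstance(index, dict) and "_class_name" in index


def list_components(folder) -> List[str]:
    """Sorted names of the subfolders found directly inside the model folder."""
    return sorted(p.name for p in Path(folder).iterdir() if p.is_dir())


if __name__ == "__main__":
    # Build a tiny stand-in folder so the sketch can run without downloading a model.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "unet").mkdir()
        (root / "vae").mkdir()
        (root / "model_index.json").write_text(
            json.dumps({"_class_name": "StableDiffusionPipeline"}), encoding="utf-8"
        )
        print(looks_like_diffusers_folder(root))
        print(list_components(root))
```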
Some of the models on the Huggingface site are "gated" models: the user must give some personal information to the owner of the model repository before they can access the model. Due to technical limitations, the SoloDesk model downloader cannot access gated models at this time, so gated models are currently not supported.