Generative AI from the desktop. Not from the cloud.
Your machine. Your data. Your business.
Generate Images from Text Prompts
Use text prompts to generate images with Stable Diffusion, Sana, Lumina 2, PixArt Sigma and more on a local personal computer, fully offline. AI models can be obtained from their corresponding repositories on the Huggingface website using the built-in downloader, or directly from the Civitai website.
Once a model has been downloaded, the machine can be disconnected from the internet. The user then sets the file path to the downloaded model in the model grid. Once configured, the model is simply dragged and dropped from the model grid into the parameters tab, where it is initialized and ready to generate an unlimited number of images in the privacy of a local desktop machine.
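For readers curious about what happens under the hood, the following is a minimal sketch of the same offline text-to-image flow using the Huggingface diffusers library directly; the local model path, prompt, and generation settings are illustrative assumptions, not SoloDesk's actual internals.

    import torch
    from diffusers import StableDiffusionPipeline

    # Load a previously downloaded checkpoint from a local folder,
    # so no internet connection is needed at generation time.
    pipe = StableDiffusionPipeline.from_pretrained(
        "/models/stable-diffusion-v1-5",  # illustrative local path
        torch_dtype=torch.float16,
    )
    pipe.to("cuda")

    # Generate an image from a text prompt and save it locally.
    image = pipe(
        "a lighthouse on a rocky coast at sunset",
        num_inference_steps=30,
        guidance_scale=7.5,
    ).images[0]
    image.save("lighthouse.png")

Because the checkpoint lives in a local folder, the pipeline never reaches out to the network, which is what makes fully offline generation possible.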
While image generation itself is free and unlimited, the user's machine must meet a minimum performance specification. For SoloDesk version 26.x, a reference configuration, REF2026, has been defined: an Nvidia RTX 5060 Ti 16GB, 64GB of system RAM, and at least a 2TB SSD if the user chooses to install all of the supported Huggingface 26.x models.
Generate Images from other Images
Some AI models support generating images from other images. For models that support this capability, the option appears in a drop-down list when initializing the model. Once the model is initialized, the user can browse for an image on their local machine or paste an image from the system clipboard. The image, in conjunction with a text prompt, is then used to generate a new image by clicking the Generate button.
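A rough sketch of the equivalent image-to-image step with the diffusers library is shown below; the file names and the strength value are illustrative assumptions.

    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from diffusers.utils import load_image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "/models/stable-diffusion-v1-5",  # illustrative local path
        torch_dtype=torch.float16,
    ).to("cuda")

    # The input image can come from a file or the clipboard.
    init_image = load_image("sketch.png").resize((768, 512))

    # strength controls how far the result may drift from the input:
    # 0.0 returns the input unchanged, 1.0 ignores it almost entirely.
    image = pipe(
        prompt="a detailed oil painting of a mountain village",
        image=init_image,
        strength=0.6,
        guidance_scale=7.5,
    ).images[0]
    image.save("village.png")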
Perform Inpainting on Images
In some cases the user might wish to regenerate only part of an image while leaving the rest untouched. This is where inpainting (or conversely, outpainting) comes into play. Inpainting is driven by a mask image: typically a black-and-white image that defines the region(s) to be regenerated.
SoloDesk includes a utility for creating mask images, the Mask Editor. Once a mask has been created, both images (the prompt image and the mask image) are sent to the generator to produce the inpainted result. Only certain models support inpainting; those that do present an inpaint option to select during model initialization.
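As an illustration of how the two images travel together, here is a minimal inpainting sketch using the diffusers library; the model path, file names, and prompt are assumptions for illustration only.

    import torch
    from diffusers import StableDiffusionInpaintPipeline
    from diffusers.utils import load_image

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "/models/stable-diffusion-inpainting",  # illustrative local path
        torch_dtype=torch.float16,
    ).to("cuda")

    prompt_image = load_image("portrait.png")
    mask_image = load_image("portrait_mask.png")  # white = regenerate, black = keep

    # Only the white region of the mask is repainted from the prompt.
    image = pipe(
        prompt="a red scarf",
        image=prompt_image,
        mask_image=mask_image,
    ).images[0]
    image.save("portrait_inpainted.png")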
Create Masks with the Mask Editor
To use inpainting, a mask must first be created. This can be done quickly and easily in SoloDesk with the Mask Editor. Once a prompt image has been loaded, clicking a button on the image prompt toolbar opens a new window with the prompt image shown as a ghost image at 50% opacity. From there the user can use the dabber tools to draw the desired mask. When finished, clicking the “Create Mask” button generates the mask and closes the Mask Editor.
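Because a mask is just an ordinary black-and-white image, it can also be produced outside the Mask Editor. The sketch below builds a simple elliptical mask with the Pillow library; the image size and region are arbitrary examples.

    from PIL import Image, ImageDraw

    # Start with an all-black mask the same size as the prompt image
    # (black = keep), then paint the region to regenerate in white.
    width, height = 512, 512
    mask = Image.new("L", (width, height), 0)
    draw = ImageDraw.Draw(mask)
    draw.ellipse((140, 100, 380, 340), fill=255)  # white ellipse = inpaint here
    mask.save("portrait_mask.png")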
Generate Videos from Text Prompts
Some models can use text prompts to generate video. The user loads a video-capable model, specifies the video file to be created, and generates the video based on the configured settings. Generating video requires more computational resources than generating images. While the supported video models have been tested and found to run on REF2026, the user will likely get a better experience on more capable hardware with more video RAM.
As of version 26.1, the supported video models are Stable Video Diffusion, LTX Video and Wan Video. There is also a motion adapter called AnimateDiff that can create 16-frame animations from Stable Diffusion 1.5 models.
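As an example of the AnimateDiff workflow, the sketch below pairs the motion adapter with a local Stable Diffusion 1.5 checkpoint using the diffusers library to produce a 16-frame animation; the adapter ID, model path, prompt, and scheduler settings are illustrative assumptions, not SoloDesk's internal code.

    import torch
    from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
    from diffusers.utils import export_to_gif

    # Pair the AnimateDiff motion adapter with an SD 1.5 checkpoint.
    adapter = MotionAdapter.from_pretrained(
        "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
    )
    pipe = AnimateDiffPipeline.from_pretrained(
        "/models/stable-diffusion-v1-5",  # illustrative local path
        motion_adapter=adapter,
        torch_dtype=torch.float16,
    )
    pipe.scheduler = DDIMScheduler.from_config(
        pipe.scheduler.config,
        beta_schedule="linear",
        clip_sample=False,
        timestep_spacing="linspace",
    )
    pipe.to("cuda")

    # Generate 16 frames and export them as an animated GIF.
    output = pipe(
        prompt="a paper boat drifting down a stream",
        num_frames=16,
        num_inference_steps=25,
        guidance_scale=7.5,
    )
    export_to_gif(output.frames[0], "boat.gif")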
Generate Audio from Text Prompts
Some models can generate audio from text prompts. Similar to video, the user loads a model, specifies the audio file to be created, and generates the audio based on the configured settings.
As of version 26.1, the supported text-to-audio models are Audio LDM2, Music LDM2 and MusicGen (small, medium, small-stereo, medium-stereo).
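A minimal sketch of text-to-audio generation with the diffusers AudioLDM2 pipeline is shown below; the local model path, prompt, and clip length are illustrative assumptions.

    import scipy.io.wavfile
    import torch
    from diffusers import AudioLDM2Pipeline

    pipe = AudioLDM2Pipeline.from_pretrained(
        "/models/audioldm2",  # illustrative local path
        torch_dtype=torch.float16,
    ).to("cuda")

    # Generate ten seconds of audio from a text prompt.
    audio = pipe(
        "gentle rain on a tin roof",
        num_inference_steps=200,
        audio_length_in_s=10.0,
    ).audios[0]

    # AudioLDM2 produces 16 kHz mono audio.
    scipy.io.wavfile.write("rain.wav", rate=16000, data=audio)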
Generate 3D Objects from Text Prompts
SoloDesk has a 3D viewport for models that can generate a 3D mesh from a text prompt or an image prompt. As of SoloDesk version 26.1, the supported models are OpenAI Shap-E text-to-3D, Shap-E image-to-3D, Hunyuan3D single-view and Hunyuan3D multi-view.
A rendered 3D model can be rotated on all three axes. The viewport can be switched between perspective and orthographic view modes, and the model can be viewed up close or at a distance by adjusting the camera distance slider.
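For a sense of the underlying mechanics, the sketch below generates a mesh from a text prompt with the diffusers Shap-E pipeline and exports it as a PLY file that a 3D viewport could load and render; the model path and prompt are illustrative assumptions.

    import torch
    from diffusers import ShapEPipeline
    from diffusers.utils import export_to_ply

    pipe = ShapEPipeline.from_pretrained(
        "/models/shap-e",  # illustrative local path
        torch_dtype=torch.float16,
    ).to("cuda")

    # Ask the pipeline for a mesh rather than rendered frames,
    # then export it as a PLY file for viewing.
    mesh = pipe(
        "a small wooden rowboat",
        guidance_scale=15.0,
        num_inference_steps=64,
        output_type="mesh",
    ).images[0]
    export_to_ply(mesh, "rowboat.ply")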
View Supported Huggingface Models