5 Open-source Local AI Tools for Image Generation I Found Interesting

Ever since I realized that AI was shaping the future, I’ve been fascinated by its endless possibilities.

I’m someone who enjoys testing large language models (LLMs) on my devices, and the open-source approach to data has always been my preference.

Why? Because open-source projects empower us to have control, privacy, and customization, which is essential in today's data-driven world.

When I decided to explore AI image generation, it felt like a natural extension of this mindset. Why rely on proprietary models when open-source alternatives offer powerful features and flexibility?

Now, I’ll admit - I don’t have the ideal hardware to run these models locally at blazing speeds, but where there’s a will, there’s a way! Sure, CPU inference is painfully slow, but it gets the job done eventually (and hey, patience builds character, right?).

During my research, I stumbled upon several fascinating projects. Some are fully ripe and ready to use, while others are still budding and need more time to mature.

This article is a combined list of some of the best open-source AI image generators that you can run locally. If I’ve missed any gems, feel free to let me know in the comments!

1. Stable Diffusion 1.5 (paired with stable-diffusion-webui)

Stable Diffusion v1.5 is a powerful latent text-to-image diffusion model designed to generate photo-realistic images from textual prompts.

Developed as an evolution of earlier versions, it was fine-tuned on a large-scale dataset, "LAION-Aesthetics v2 5+", to enhance its capabilities.

This model is particularly well-suited for artistic, creative, and research purposes, offering impressive results with relatively modest computational requirements.

Key features

Unlock high-quality text-to-image generation with its latent diffusion process, achieving impressive results with reduced computational overhead.
Fine-tuned on a large-scale dataset to improve its ability to generate visually appealing images.
Supports multiple platforms and tools, including Diffusers Library for seamless integration into Python workflows, ComfyUI, Automatic1111, SD.Next, and InvokeAI for local usage.
Enjoy efficient weight options like EMA-only weights for inference or EMA + non-EMA weights for fine-tuning tasks.
Uses a fixed, pretrained CLIP text encoder, an approach suggested in Google's Imagen paper, to robustly understand text prompts.
Generate artwork, design prototypes, and educational visuals with its creative applications, ideal for artistic and research purposes.
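If you'd rather script things than click through a web UI, here is a minimal sketch of how Stable Diffusion 1.5 could be driven through the Diffusers library on a CPU. The `GenerationSettings` class, the helper names, and the model repository ID are my own assumptions for illustration, so point `model_id` at whichever SD 1.5 checkpoint you actually use.

```python
from dataclasses import dataclass


@dataclass
class GenerationSettings:
    """Hypothetical container for the knobs most SD 1.5 front ends expose."""
    prompt: str
    negative_prompt: str = ""
    steps: int = 20             # fewer steps = faster, which matters on CPU
    guidance_scale: float = 7.5
    width: int = 512            # SD 1.5 was fine-tuned at 512x512
    height: int = 512
    seed: int = 0


def to_pipeline_kwargs(settings: GenerationSettings) -> dict:
    """Translate the settings into keyword arguments for the pipeline call."""
    if settings.width % 8 or settings.height % 8:
        raise ValueError("width and height must be divisible by 8")
    if settings.steps < 1:
        raise ValueError("steps must be at least 1")
    return {
        "prompt": settings.prompt,
        "negative_prompt": settings.negative_prompt or None,
        "num_inference_steps": settings.steps,
        "guidance_scale": settings.guidance_scale,
        "width": settings.width,
        "height": settings.height,
    }


def generate(settings: GenerationSettings,
             model_id: str = "stable-diffusion-v1-5/stable-diffusion-v1-5"):
    """Run one generation on the CPU and return a PIL image.

    Needs `pip install diffusers transformers torch`. The default model_id
    is an assumption; replace it with the checkpoint you have.
    """
    # Imported lazily so the helpers above work without the heavy dependencies.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
    pipe = pipe.to("cpu")
    generator = torch.Generator("cpu").manual_seed(settings.seed)
    result = pipe(**to_pipeline_kwargs(settings), generator=generator)
    return result.images[0]
```

With the dependencies installed, something like `generate(GenerationSettings(prompt="a watercolor painting of a penguin")).save("penguin.png")` should write an image to disk. Expect it to take a while on a CPU, and fixing the seed keeps results repeatable while you tweak the prompt.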

Stable Diffusion 1.5

2. Invoke AI

InvokeAI server web UI | Source: InvokeAI

InvokeAI is a robust, open-source image generation project that builds upon Stable Diffusion, offering users a highly customizable experience for creating unique visuals.

Whether you're looking to generate artwork, photorealistic images, or something more abstract, InvokeAI provides a powerful toolkit with an easy-to-use interface.

Its flexibility is perfect for those who want more control over the creative process, especially for those working with specific intellectual property or requiring tailored workflows.

Key Features

Create highly detailed prompts with options for both positive and negative guidance to guide the generation process.
Generate images based on textual descriptions, with numerous customization options for finer control.
Use an existing image as a reference to help guide the AI in maintaining specific colors, structures, or themes.
Access a unified canvas that enables users to modify images by regenerating certain elements, editing content or colors (inpainting), and extending the image (outpainting).
Experiment with different models, each trained to generate specific styles or outputs, providing flexibility to match your creative needs.
Utilize advanced customization options like Low-Rank Adaptations (LoRAs) and Textual Inversion Embeddings to focus on specific characters, styles, or concepts.
Customize the number of de-noising steps and choose from different schedulers to optimize the generation process for quality and speed.
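To make the "de-noising steps" setting a bit more concrete, here is a rough sketch of how a step count is commonly mapped onto a model's training timesteps. It assumes 1,000 training timesteps and simple even spacing. This is an illustration of the idea rather than InvokeAI's own code, and real schedulers differ in the details.

```python
def pick_timesteps(num_steps: int, train_timesteps: int = 1000) -> list[int]:
    """Return evenly spaced timesteps, ordered from noisiest to cleanest."""
    if not 1 <= num_steps <= train_timesteps:
        raise ValueError("num_steps must be between 1 and train_timesteps")
    stride = train_timesteps // num_steps
    return [i * stride for i in range(num_steps)][::-1]


print(pick_timesteps(5))         # [800, 600, 400, 200, 0]
print(len(pick_timesteps(30)))   # 30
```

Each timestep costs one pass through the model, so 30 steps take roughly six times as long as 5. That's a trade-off you feel very quickly on a CPU.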

Invoke AI

3. OpenJourney

OpenJourney is a powerful, open-source text-to-image AI art generator that allows users to create stunning visuals from text prompts.

Launched in November 2022 by PromptHero, it quickly gained popularity as a free alternative to Midjourney.

Built on Stable Diffusion, OpenJourney was fine-tuned using thousands of images from Midjourney's v4 update, as well as output from other AI models like DALL-E 2.

OpenJourney excels at generating photorealistic and artistic images, and its open-source nature ensures it remains accessible to a wide audience.

Key Features

Create stunning visuals from text prompts with its powerful text-to-image generation capabilities.
Enjoy photorealistic and artistic images, perfect for artists, designers, and anyone looking to generate high-quality content.
Access a library of curated prompt ideas to inspire your creativity and get started with generating art.
Customize the style and content of your generated images by crafting specific prompts that fit your vision.
Benefit from OpenJourney's Stable Diffusion-based architecture and additional training on Midjourney images for enhanced capabilities.
Take advantage of its wide accessibility, available for free download on Hugging Face as part of a broader ecosystem of open-source AI models.

Openjourney

4. LocalAI (all-rounder)

Image generation inside a Telegram bot created using LocalAI | Source: LocalAI

LocalAI is an open-source, free alternative to OpenAI that enables local AI inferencing on consumer-grade hardware.

It acts as a drop-in replacement for OpenAI's API specifications, allowing you to run large language models (LLMs), generate images, audio, and more without the need for a GPU.

LocalAI API WebUI | Source: LocalAI-frontend

Created and maintained by Ettore Di Giacinto, LocalAI provides a flexible and cost-effective solution for running AI models on-premise.

Key Features

It offers compatibility with OpenAI API specifications, making integration straightforward for developers.
The platform operates on consumer-grade hardware, eliminating the need for a GPU.
Supports a wide range of models and platforms, including Llama, Hugging Face, and Ollama, for diverse applications.
Enables advanced text generation using backends like llama.cpp and transformers.
Allows users to generate images from text prompts for creative projects.
Includes audio features such as text-to-audio and audio-to-text with whisper.cpp.
Facilitates embedding generation for vector database tasks like semantic search.
Offers peer-to-peer inferencing for distributed AI processing across multiple devices.
Integrates voice activity detection using Silero-VAD for improved audio task accuracy.
Provides an easy-to-use WebUI for managing models without technical expertise.
Features a model gallery for browsing and downloading models directly from platforms like Hugging Face.
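Because LocalAI follows the OpenAI API layout, asking it for an image comes down to a single HTTP POST. The sketch below uses only Python's standard library. The base URL (port 8080), the default size, and the helper names are assumptions for illustration, and you need an image model already set up in your LocalAI instance.

```python
import json
import urllib.request


def build_image_request(prompt: str, size: str = "256x256",
                        base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST request for the OpenAI-style image generation endpoint."""
    payload = {"prompt": prompt, "size": size}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_image(prompt: str, **kwargs) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_image_request(prompt, **kwargs)
    # CPU-only generation can take minutes, hence the generous timeout.
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.load(response)
```

Calling `generate_image("a cute baby sea otter")` against a running instance should return an OpenAI-style JSON body. How the image itself comes back depends on your setup, so inspect the response before building anything on top of it.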

Local AI

5. Fooocus (Editor's choice)

Fooocus web UI generating a cat picture | Source: Fooocus

Fooocus caught my attention as one of the most user-friendly and innovative open-source image generators out there.

I was especially drawn to its ability to run on modest hardware (like my poor laptop) and to handle diverse styles, thanks to its compatibility with various models.

It’s like having a Swiss Army knife for image generation!

Key features

Fooocus boasts its own inpainting algorithm and inpainting models, which deliver superior results for editing and completing images.
With the ability to use multiple prompts simultaneously, Fooocus enriches creative possibilities and output diversity, opening up new avenues of artistic expression.
Fooocus supports a vast array of SDXL models, accommodating styles from artistic to photorealistic, giving users endless options for experimentation.
Users can specify aspect ratios for tailor-made image generation, ensuring that every output meets their unique requirements.
Advanced style controls, including contrast, sharpness, and color adjustments, empower users to fine-tune generated images with precision.
Fooocus utilizes A1111's reweighting algorithm, enhancing the influence of specific elements within prompts for more targeted results.
The platform incorporates InsightFace technology for precise face swapping, ideal for creating personalized avatars or modifications.
Optimized for performance across a wide range of hardware configurations, Fooocus ensures accessibility and speed, regardless of the user's setup.
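The reweighting mentioned above refers to the prompt emphasis syntax popularized by A1111, where `(word:1.3)` asks the model to pay more attention to `word`. As a simplified sketch (my own toy parser, handling only the explicit `(text:weight)` form and not nesting, bare parentheses, or square brackets), this is roughly how such a prompt splits into weighted chunks:

```python
import re

# Matches "(some text:1.3)": text without parentheses or colons, then a number.
EMPHASIS = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def parse_emphasis(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) chunks; unmarked text gets weight 1.0."""
    chunks = []
    position = 0
    for match in EMPHASIS.finditer(prompt):
        # Plain text between the previous match and this one keeps weight 1.0.
        if match.start() > position:
            chunks.append((prompt[position:match.start()], 1.0))
        chunks.append((match.group(1), float(match.group(2))))
        position = match.end()
    if position < len(prompt):
        chunks.append((prompt[position:], 1.0))
    return chunks


print(parse_emphasis("a (fluffy:1.3) cat on a (sofa:0.8)"))
# [('a ', 1.0), ('fluffy', 1.3), (' cat on a ', 1.0), ('sofa', 0.8)]
```

What happens to those weights afterwards (how they scale the text embeddings) is where the actual reweighting algorithm comes in, and that part is not shown here.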

Fooocus

Conclusion

And there you have it! From Stable Diffusion to Fooocus, these are some of the open-source projects you can host or deploy locally to create stunning images right on your hardware.

While I won't dive into the murky waters of how these models get trained (support your favorite creators, and remember, stealing is bad!), I can tell you this: each project offers unique capabilities and tons of creative potential.

I like exploring local AI tools. For example, take a look at this list of open-source AI tools for documents.

Now, before I get lost in a sea of stunning visuals and my laptop's fan decides to take off, I have a tiny request for you.

What do you think? Have any hidden gems that I missed? Do you agree with my not-so-secret affection for LocalAI and Fooocus?

Dive into the comments section and let me know your thoughts. Who knows? Your suggestion might just be the next project I test out (if my CPU allows it, of course)!

Until next time, keep generating and keep dreaming!
