Meet Auraflow: A Truly Open Source AI Image Generator Aiming to Beat Stable Diffusion 3

2024-07-23 03:55:17

There's a new contender for the title of king of open-source AI image generators: Auraflow. Released last week by the generative media company Fal AI, Auraflow is gaining traction with its standard Apache 2.0 license, which feels like a breath of fresh air compared to the restrictive licensing that Stability AI used to release Stable Diffusion 3 (SD3).

Advocates argue that open-source projects can rapidly speed up development cycles in competitive industries, since they free developers from licensing and other legal constraints. In the absence of licensing fees, communities frequently form around competent open-source projects, and developers can tweak, modify, train and even profit from their work.

"We are excited to present you [with] the first release of our Auraflow model series, the largest yet completely open-sourced flow-based generation model capable of text-to-image generation," Fal AI said in a blog post. The San Francisco-based company, which was co-founded in 2021 by Burkay Gur and Gorkem Yurtseven, engineers who worked at Coinbase and Amazon respectively, warned that open-source AI is in jeopardy. “Some even boldly announced that open-source AI is dead,” they said. “Not so fast!”

Over more than four weeks of intensive compute time, Auraflow underwent rigorous training, including pretraining on images at different resolutions (256x256, 512x512, and 1024x1024) and aspect ratios (square, landscape, portrait, etc.). The result? A GenEval score of 0.64, with a boost to 0.703 using a prompt-enhancement pipeline similar to the one used by DALL-E 3.

Generations made with Auraflow. Image shared by Fal AI

In other words, the model provided high-quality results when tested using synthetic benchmarks. However, as good as it is, Auraflow is still just a beta, as Fal considers it version 0.1 rather than a stable release.

The model is a VRAM eater, though. It requires a beefy GPU with around 12 GB of VRAM to run its fp16 version. For reference, Stable Diffusion 3 runs fine on just 6 GB of VRAM. However, the company claims that a more manageable model is in the works. “Smaller models or MoE’s might be more efficient for consumer GPU cards, which have a limited amount of compute power, so follow closely for a mini version of [this] model that is still as powerful yet much much faster to run,” Fal AI said.
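To see where VRAM figures like these come from, a back-of-the-envelope estimate multiplies a model's parameter count by the number of bytes each parameter occupies at a given precision. The sketch below is illustrative only: the 6.8 billion parameter count is an assumption that is not stated in this article, the helper name weights_gib is hypothetical, and the estimate covers the weights alone, ignoring the text encoder, the VAE, and activation memory.

```python
# Bytes needed to store one parameter at each numeric precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gib(num_params: float, precision: str) -> float:
    """Return the memory needed for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


# ASSUMPTION: illustrative parameter count, not taken from the article.
ASSUMED_PARAMS = 6.8e9

for precision in BYTES_PER_PARAM:
    size = weights_gib(ASSUMED_PARAMS, precision)
    print(f"{precision}: {size:.1f} GiB for weights only")
```

Under that assumption, halving the bytes per parameter halves the weight footprint, which is why lower-precision builds are the usual route to fitting a model on consumer cards.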

Auraflow is available for download on Hugging Face and can be run in ComfyUI with a custom node, which is also available in the ComfyUI Manager.

Auraflow represents a formidable alternative to SD3, but is it good enough to beat it? We compared the two base models and tested their performance across various art styles and prompts. You can be the judge of which is most likely to win the hearts of AI artists around the world, as we share our observations.

Prompt: "A detailed painting of a sunset over a tranquil lake, the sky filled with hues of orange, pink, and purple, a wooden pier extending into the water, a person sitting at the end of the pier with a fishing rod, surrounded by tall grasses and wildflowers, the overall style is impressionistic with bold brushstrokes and vibrant colors."

Auraflow:

SD3 Medium:

Winner: It's a tie. Auraflow follows the impressionistic style more closely, but SD3 is more detailed and structured.

Prompt: “A high-resolution photograph of a bustling city street at night, neon signs illuminating the scene, people walking along the sidewalks, cars driving by, a street vendor selling hot dogs, reflections of lights on wet pavement, the overall style is hyper-realistic with attention to detail and lighting, a neon sign says ‘Decrypt.’”

Auraflow:

SD3 Medium:

Winner: SD3 Medium offers a more detailed and hyper-realistic image, making it the better model for this prompt.

Prompt: “Hand-drawn illustration of a giant spider chasing a woman in the jungle, extremely scary, anguish, dark and creepy scenery, horror, hints of analog photography influence, sketch.”

Auraflow:

SD3 Medium:

Winner: SD3 Medium provides a more frightening and detailed illustration, making it the better model for this prompt.

Prompt: “A surreal digital artwork of a floating island in the sky, the island covered in lush greenery and waterfalls cascading into the clouds below, a small castle at the center of the island, bridges made of light connecting to other floating islands, the sky is filled with colorful hot air balloons and mythical creatures, the overall style is fantastical with dreamy elements and glowing effects.”

Auraflow:

SD3 Medium:

Winner: Auraflow captured all the elements in the prompt, making it the better model for this prompt.

Prompt: “A dog standing on top of a TV showing the word ‘Decrypt’ on the screen. On the left there is a a woman in a business suit holding a coin, on the right there is a robot standing on top of a first aid box. The overall scenery is surreal.”

Auraflow:

SD3 Medium:

Winner: Tie. SD3 Medium offers better clarity, but Auraflow also rendered all the elements of the prompt and showed a good level of spatial understanding.

Prompt: “A female ninja fighting against a strong samurai in ancient Japan, anime, manga, highly detailed, colorful, dynamic.”

Auraflow:

SD3 Medium:

Winner: SD3 Medium provides a more detailed and dynamic depiction, making it the better model for this prompt. Both lacked key elements in terms of prompt adherence.

Auraflow excels in capturing impressionistic, fantastical, and whimsical styles, while SD3 Medium is better at providing detailed, hyper-realistic, and dynamic scenes.

Both models' weaknesses can be addressed with fine-tuning, and this is where law beats tech. Auraflow's Apache 2.0 open-source license makes it attractive for fine-tuners, allowing free use, reproduction, and distribution under the license terms, unlike SD3, which is more restrictive in that regard. Therefore, it may be easier to start working on Auraflow. But until the community actually produces those fine-tunes, this is just a strategic advantage that hasn't yet been realized.

However, Auraflow requires a lot of VRAM to run, with some reports indicating up to 35 GB. That is significantly more than SD3, which requires only 6 GB of VRAM. For reference, a 24 GB RTX 4090 costs up to $1700 on Amazon, whereas a 6 GB RTX 3050 capable of running SD3 can be found for less than $200. This is a tangible advantage that SD3 has over Auraflow right now.

Considering this, SD3 Medium is currently the better model in this comparison, serving a broader user base due to its lower hardware requirements and comparable results in terms of quality.

Nonetheless, Auraflow shows great promise. If a pruned (smaller) or quantized (less precise) version is developed in the future that reduces its hardware demands, Auraflow could become a strong contender and potentially challenge Stability's long-standing dominance with its Stable Diffusion models.
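As a rough illustration of what "quantized (less precise)" means, the sketch below applies simple symmetric 8-bit quantization to a handful of toy numbers. It is not how Fal AI or anyone else would necessarily quantize Auraflow: the values are made up, the function names quantize_int8 and dequantize are hypothetical, and real tools use more sophisticated schemes. It only shows the basic trade: each value shrinks from four bytes to one, at the cost of a small rounding error.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = array("b", (round(w / scale) for w in weights))
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]


# Toy values standing in for model weights (assumption, not real data).
weights = array("f", [0.12, -0.5, 0.33, 0.9, -0.77, 0.05])

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(f"bytes per value: {weights.itemsize} -> {quantized.itemsize}")
print(f"largest rounding error: {max_error:.5f} (bound: {scale / 2:.5f})")
```

The rounding error per value stays within half of the scale factor, which is why quantized models can often trade a modest quality loss for a much smaller memory footprint.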