Using ChatGPT to Create Stable Diffusion Prompts

Mar 7, 2023

I can’t keep up with the pace of change in Stable Diffusion, or even find all the details on how to write prompts. Much of the content seems geared toward specific website tools, while I am using the command line and a Python script: txt2img.py from CompVis/stable-diffusion: A latent text-to-image diffusion model (github.com). You need at least 10GB of VRAM for this approach. My GPU is an Nvidia RTX 2060 with 12GB of VRAM.

Maybe I can use AI to create more AI-generated content. Let’s combine them all!

Why This Matters

ChatGPT has knowledge gaps. If you can learn what they are for a given domain, you can make it better. Maybe you perform additional training (transfer learning). Maybe you augment it with outside knowledge, provided by either a human or another system. In the examples below, I take the augmentation approach and fill some gaps in ChatGPT’s knowledge of Stable Diffusion prompts. The next step in making a better system is to use an AI to analyze the output and provide feedback that modifies the inputs. This is harder with non-deterministic systems, but good feedback can still raise the average output quality.

My ChatGPT Prompt

I need 6 stable diffusion prompts. Please provide six samples that will generate exciting, sci-fi inspired imagery of a beautiful world of tomorrow in vibrant color. Include cities and spaceships. Robots and flying cars.

My Stable-Diffusion Command

python scripts/txt2img.py --skip_grid --plms --from-file scifi1.txt --n_iter 3 --n_samples 1 --precision full --outdir ./outputs/scifi
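For anyone repeating this, here is a minimal sketch of preparing the prompt file in Python. It assumes that --from-file treats each line of the file as one prompt, so multi-line ChatGPT responses are collapsed onto a single line. The helper names to_prompt_lines and write_prompt_file are illustrative and not part of the repository, and the sample prompts are shortened placeholders.

```python
def to_prompt_lines(prompts):
    """Return one cleaned prompt per entry; blank entries are dropped."""
    # " ".join(p.split()) collapses newlines and repeated spaces into single spaces.
    return [" ".join(p.split()) for p in prompts if p.strip()]


def write_prompt_file(prompts, path):
    """Write one prompt per line to `path` and return how many were written."""
    lines = to_prompt_lines(prompts)
    with open(path, "w", encoding="utf-8") as f:
        f.write("".join(line + "\n" for line in lines))
    return len(lines)


if __name__ == "__main__":
    # Placeholder prompts (assumption: paste the ChatGPT responses here).
    sample = [
        "In the bustling city of tomorrow,\ntowering skyscrapers stretch towards the sky.",
        "",
        "The spaceship glides through the vast expanse of space.",
    ]
    for line in to_prompt_lines(sample):
        print(line)
```

Calling write_prompt_file(sample, "scifi1.txt") would then produce a file in the shape the command above expects.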

The responses and the images are below. As you can see, the bots have not taken over yet!

The Responses

In the bustling city of tomorrow, towering skyscrapers stretch towards the sky while sleek, flying cars whiz through the air. As you walk down the street, you pass by robots who happily carry out their duties, their metallic bodies gleaming in the sunlight.

These are skyscrapers, but much of the prompt is missing from the images.

The spaceship glides through the vast expanse of space, the stars glittering like diamonds all around. Inside, the crew works tirelessly, their hands moving deftly across glowing control panels as they navigate through the galaxy.

Pretty accurate, but what overwhelming controls! NASA human factors engineers would never approve.

The city of the future is a marvel of architecture and technology, with intricate networks of glowing neon lights and holographic displays adorning every surface. The streets are alive with robots and cyborgs, each performing their designated tasks with precision and grace.

Classic Cyberpunk color scheme, and I should know, having written and recorded the original ’90s soundtrack to the game.

As you gaze out at the vast, verdant landscape from the hovering spacecraft, you can’t help but feel a sense of wonder and awe. Lush forests and shimmering lakes stretch out as far as the eye can see, untouched by the ravages of pollution and climate change.

Boringly accurate, except there should be more lakes.

The robots in the factory hum along, their whirring gears and flashing lights a testament to the power of advanced engineering. Each one is meticulously crafted and programmed to perform its assigned task with maximum efficiency and minimal error.

These are factory robots of today, which makes sense since the prompt said nothing of the future. ChatGPT lost track of the original context.

The spaceship hurtles towards the shimmering, crystal-like space station in orbit around the distant planet. Inside, the crew prepares for a tense negotiation with the mysterious alien race that calls the station home. The fate of humanity hangs in the balance.

These look like covers from Analog Magazine. Accurate, except for the bizarro-world text.

The Human in the Loop

Can I do better with some manipulation of the prompt? I sure hope so, or I will soon be out of a job. Full disclosure: I will cheat a little, too, since I know what I really want and AI systems can’t read my mind yet.

My cheats will be multiple runs plus cherry-picking, prompt re-engineering, using negative prompts, artist styles, and other quality and style modifiers. Clearly, ChatGPT knows little about the details of Stable Diffusion prompt engineering. I may also use a model specifically for awesome robots.

Here come the new prompts! Finding details on prompt syntax was harder than I expected, and I found many conflicting bits of information. For Stable Diffusion, though, the best information was from www.prompthero.com.

For the code I am using, the best way to achieve weighting is to put the important words first. That is a little odd in that it breaks the flow of a natural text prompt; however, this little experiment has shown that natural-language prompts are not always the best, especially when they are complex.
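The "important words first" idea can be sketched as a small helper. This is only an illustration of the ordering trick: build_prompt and the numeric importance scores are my own hypothetical names and values, not anything the script itself understands.

```python
def build_prompt(weighted_terms, modifiers=()):
    """Join subject terms most-important-first, then append style/quality modifiers.

    weighted_terms: iterable of (term, importance) pairs; higher means more important.
    modifiers: style or quality keywords kept at the end in the order given.
    """
    # sorted() is stable, so terms with equal importance keep their input order.
    ranked = sorted(weighted_terms, key=lambda pair: pair[1], reverse=True)
    return ", ".join([term for term, _ in ranked] + list(modifiers))


if __name__ == "__main__":
    prompt = build_prompt(
        [("skyscrapers", 2), ("flying cars", 3), ("robots", 2)],
        modifiers=["vibrant", "4k"],
    )
    print(prompt)  # flying cars, skyscrapers, robots, vibrant, 4k
```

Keeping the subject terms separate from the modifiers makes it easy to reshuffle priorities between runs without retyping the whole prompt.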

Because I did so much prompt re-engineering, the prompts below don’t necessarily cover each image created, but they are close.

flying cars!!!, skyscrapers, blue and gold robots!!, metallic, vibrant, global illumination, 4k, panoramic, in the style of “Zaha Hadid”

spaceship, astronaut, crew using the control panels, stars!, muted colors, in the style of “Frederic Church”, 4k

people!!, robots!!, transparent billboards!, futuristic city in the style of “Frank Gehry”, high-tech, neon lights, translucency, octane render, vibrant, 4k, analog photo

giant spaceship floating over a green landscape!, lakes!, breathtaking, trees, hills, octane render, vibrant, 4k, panoramic, radiant light rays, in the style of “Thomas Kinkade”

factory!, nousr gold robot, assembly line!, in the style of “Jack Kirby”, octane render, vibrant, 4k

crystalline, spacestation!!, spaceship!!!, crew looking out giant windows, in the style of “Andre Norton”, planet!, stars!, octane render, vibrant, 4k

I Win

I can say with confidence that the images generated from my handcrafted prompts are much better looking and also do a better job of matching the text and concept of the ChatGPT prompts. That said, I modified the prompts many times, and in at least one case I generated 50+ images to get three really good ones. What is “good”? That’s a great question, and there are AIs that can be trained to help you make those decisions. ;)

Challenges

Know your artists! Some artists appear to have done only commercial or industrial art (or book covers and comics), so it is impossible to get samples that don’t look like a weird poster or a comic page, complete with fake lettering. Other pictures were great, just not what I asked for. For example, a known photographer will give you a portrait of an astronaut.

The most difficult imagery to create was the combination of astronauts and spaceships. Or really the combination of anything with anything else. From what I have seen, compound images are often missing an element, so multiple runs are needed. That is why I like the command line and batch scripts: I can run multiple sets of commands (all in order) and generate a ton of images to review later.
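A batch run like that could be sketched in Python as follows. The flags are the ones from my command earlier in this post; build_command and run_batch are illustrative helper names, and the sketch assumes it is started from the root of the stable-diffusion checkout. It defaults to a dry run that only prints the commands.

```python
import subprocess


def build_command(prompt_file, outdir, n_iter=3):
    """Build the txt2img.py argument list for one prompt file."""
    return [
        "python", "scripts/txt2img.py",
        "--skip_grid", "--plms",
        "--from-file", prompt_file,
        "--n_iter", str(n_iter),
        "--n_samples", "1",
        "--precision", "full",
        "--outdir", outdir,
    ]


def run_batch(jobs, dry_run=True):
    """Run each job in order; with dry_run=True, only print the commands."""
    commands = [build_command(**job) for job in jobs]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the batch if one run fails (e.g. out of memory).
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    jobs = [
        {"prompt_file": "scifi1.txt", "outdir": "./outputs/scifi", "n_iter": 3},
        {"prompt_file": "scifi2v2.txt", "outdir": "./outputs/scifi", "n_iter": 7},
    ]
    run_batch(jobs, dry_run=True)
```

Switching dry_run to False would actually launch the runs one after another, leaving a folder of images to cherry-pick from later.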

Why 512x512? You will run out of memory on larger sizes with the version of the software I used. You can always use an AI to scale the images up. There are several. Mirwaisse/SRCNN: Implementation of SRCNN in PyTorch (github.com) is good.

A Favorite Prompt

The system really likes making astronauts, but often leaves out their ship! Sometimes I think it loses track of the artist style, too, because of many cases of complete style change. This is still a good prompt. I think angelic is interpreted as “halo,” since I get many circular objects.

spaceman inside a spaceship, spaceman using control panels, stars!, in the style of “Aleksey Savrasov”, 4k, angelic

Some images from this prompt, generated with this command:

python scripts/txt2img.py --skip_grid --dpm_solver --ckpt ./models/ldm/stable-diffusion-v1/f222.ckpt --from-file scifi2v2.txt --n_iter 7 --n_samples 1 --precision full --outdir ./outputs/scifi

More From Original Prompts

In the spirit of fairness, I re-ran the original prompts (7 times each). Again, they didn’t hit the prompt requirements, but I did get a few images that weren’t too bad. Actually, I totally win again! ChatGPT just needs some more training to handle prompt creation for people like me, who are new to Stable Diffusion. Take writing prompts seriously.