Science fiction, fantasy, and general nerdery – since 2007

The Un-Guide to Midjourney

Image by Magnus Dahl using Midjourney and Photoshop.

This is not a guide to Midjourney or any other generative AI tool. There are hundreds of tutorials and how-tos available, just a Google away. Go look at them to learn about parameters, commands, and such.

No, this is just me, Magnus Dahl, trying to understand my creative process working with Midjourney. I am struggling to articulate my thoughts on “prompting” (a horrible word) and “prompt engineering” (an awful expression), and I often think best when I’m writing.

Let’s start with the horrible words. A “prompt” is a text written by a human and given to an AI in the hope that it will return the desired output. When I, the human, write “Painting of a hortensia, American modernism” in the Midjourney input field, I hope the AI will give me an image that looks like a painting of a hortensia.

Maybe something like this:

Painting of a hortensia in the style of American modernism

Or perhaps like this?

Painting of a hortensia in the style of American modernism

Midjourney is a random image generator. A random image generator that you can steer in the direction you want, but still a random image generator.

A prompt is a wish disguised as a computer system command. It is a manifestation of human intent, offered up to a machine. The word “prompt” gives a false sense of control. The expression “prompt engineering” is even more devious, as it hints that generative AI use is a science. Something you can control with mechanical precision. It is not.

Creativity, even machine-aided, is not about control. It is about empathy and dialogue.

“Prompt engineering” is an expression of the human desire for control. We have created a machine that can do amazing things, so we must control it. The goal of prompt engineering is to reduce the amount of chaos in the AI output and make it predictable. But the methods of control we have today are based on hearsay, rumors, and sales talk. No one – not even the creators of the tools – fully knows what words, phrases, and strategies will actually work. Control is an illusion.

And it does not matter because creativity, even machine-aided creativity, is not about control. It is about empathy and dialogue. It is about giving and taking and sharing. 

Working with Midjourney is an associative process, an exchange of words and images between a human and a machine. It is organic, chaotic, and often non-intuitive. A stream of consciousness that is hard to explain to others.

But I will try.

Start with an idea

First, there is an idea. The idea can be a word, sentence, or paragraph. It can be a feeling, a memory, or just an impulse to create something, anything. 

Here’s an idea: a purple ladybug

Second, I write my idea into the Midjourney input field. When my fingers meet the keyboard, the idea changes. This transition from thought to prompt is fascinating to me. It is not unique to Gen AI; it happens when I write anything with any tool. My thoughts change as I write them down.

The difference when using Midjourney, ChatGPT, or any other AI tool with prompt-based input is that the change is directly linked to my knowledge of how the generative model works. I try to fit my idea into a mold that I, probably incorrectly, believe is the best way to interact with the AI.

I often challenge myself to write prompts that are as far from “best practice” as possible, but this time, I failed. I just wrote a basic, boring prompt.

3D-movie animation of an evil pink and purple ladybug

Cute. But what will happen if I use my first hortensia image as a style reference with the ladybug prompt?

As the hortensia/ladybug renders, another idea suddenly comes to me: a supersonic blast in a clear sky.

Again, the words change a bit as I put them into Midjourney.

hand-drawn illustration of a supersonic blast in a cloudy sky

That is not how I imagined a supersonic blast, but OK. Now my hortensia/ladybug is done as well!

I like this one, even though it doesn’t look like any 3D animation I have seen. But maybe I can mix it with the supersonic image somehow? That could be interesting. But how? Well, on a whim, I use the picture above as a character reference and the supersonic one as a style reference.

After three variations, it turns out like this:

3D cartoon ladybug flying at supersonic speeds through a cloudy sky.

Shiny! And boring. Let’s rerun the prompt, add some --weird, and see what happens.

3D cartoon ladybug flying at supersonic speeds through a cloudy sky. --weird 1500

The ladybug looks like it is made out of painted wood! What would a chair in the same style look like? 

Let’s find out.

photo of a wooden workshop chair in an empty artist’s studio, shot with a Fujica ST605.

I added a camera model just for fun. The Fujica ST605 is a budget household camera from the 1970s. I keep a list of cameras from different eras to have something to work with. Sometimes, if you want your image to have the vibe of a specific period, it is easier to specify a camera model that was ubiquitous during that era than to use phrases like “in the style of the 1970s” or whatever. Sometimes, not always. Random image generator, remember?
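For the tinkerers: a camera list like that could live in a small lookup table. The following is only a rough sketch in Python, not a description of any actual setup. The names `CAMERAS_BY_ERA` and `add_camera` are made up for illustration, and only the Fujica ST605 comes from this post; the other two cameras are stand-in examples.

```python
import random
from typing import Optional

# Hypothetical era-to-camera lookup. Only the Fujica ST605 comes from the
# post; the other entries are illustrative stand-ins.
CAMERAS_BY_ERA = {
    "1950s": ["Leica M3"],
    "1960s": ["Kodak Instamatic"],
    "1970s": ["Fujica ST605"],
}


def add_camera(prompt: str, era: str, rng: Optional[random.Random] = None) -> str:
    """Append 'shot with a <camera>' to a prompt, picking a camera from the era."""
    rng = rng or random.Random()
    camera = rng.choice(CAMERAS_BY_ERA[era])
    # Strip trailing spaces, periods, and commas so the joined sentence reads cleanly.
    return f"{prompt.rstrip(' .,')}, shot with a {camera}."


print(add_camera("photo of a wooden workshop chair in an empty artist's studio", "1970s"))
```

Whether the camera name changes anything in the output is still up to the dice. All this does is save some typing.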

I’m unsure how much the Fujica affected the result, but I like the chair image. Very cool floor and lovely lighting. The chair in itself, though, is perhaps the most unsafe-for-kids piece of furniture I have ever seen.

But – I have no use for an image of a chair. I am trying to generate some sort of cartoon ladybug!

Fast forward 4 weeks

Suddenly, I need a picture of a chair. I remember the ladybug chair and look it up in my Midjourney archive. It is close to what I’m looking for but not spot on. So I do some experiments with the chair picture. I use it again as a style reference and as an image prompt. I do a lot of variations. Remixes. I try some different prompts. I won’t show all of them here, but after a while, I get this:

Professional studio photo of a wooden chair on a well-lit, white background. --stylize 300

Pretty nice. The hardest part was generating OK-looking legs.

Ladybug goes to space

So, what happened to the ladybug? I returned to it after a while, inspired by the loading screen from the 1983 Atari video game MULE, to make this picture:

C64 Loading screen, dithering

Why did I mix the ladybug with the loading screen from a 41-year-old Atari video game? I’m still trying to figure that out, but the idea came to me after a friend texted me about the game out of the blue.

I downloaded the image and opened it in Photoshop to remove those weird things in the sky and adjust the colors somewhat. 

The photoshopped version.

Next, I uploaded the modified image to Midjourney again, used it as a reference, and prompted away.

evil but cute pink and purple ladybug, 3D character concept art in the style of animated children’s movies

Wow – a spacefaring ladybug robot!

Frankenstein’s prompting

As you can see, my process is a Frankenstein’s monster of ideas generating pictures generating ideas generating pictures… Everything is based on something else. Which, I guess, is the essence of generative AI?

Questions like “What prompt did you use to make this picture?” are largely meaningless because, most of the time, an AI image is not the result of a single, easy-to-show prompt. Instead, it is the result of a long, meandering, associative brainstorming session between human and AI.

A prompt can actually be misleading. Take this image of a robot, for example. The final prompt was “Manga drawing of a mecha in combat”:

Manga drawing of a mecha in combat

While it is correct that the prompt that generated the image read that way, to truly understand the process, one must rewind almost two months.

When you look at someone’s AI artwork, it is essential to remember how much chance plays into the result. Using Midjourney as an artistic or creative tool is akin to action painting – where artists randomly throw paint on a canvas. In action painting, the artist chooses the paint, the canvas, and the location. The artist controls the setting, so to speak, but the end result is inherently random.

“Prompt engineering” is throwing paint over and over again until you get what you want. Frankly, it is not engineering at all, and we should stop calling it that.

It’s not engineering, it’s not math, it is not coding, it’s not mechanics.

Let’s call it what it is: creativity.
