Why Midjourney is way more frustrating than you expect

by Frank Prendergast | Mar 21, 2024

Generating images on Midjourney can be really frustrating. 

You see stunning images on social media, and it seems like everybody else can whisper a dream to Midjourney and have it realised in moments. 

Meanwhile, you put in a prompt and get something wildly different to what you had in your head.

I’m going to let you in on a big secret.

Half of the images you see were happy accidents. The user had no initial vision in mind and stumbled upon something stunning. 

Nothing wrong with that. I rely on happy accidents a LOT.  But it can give new users the impression that by using a three-word prompt, Midjourney can reach into your brain and construct what you’re imagining. 

The other half of the images involve a creative process you don’t get to see, which can lead you to believe they happened effortlessly on the first prompt. In fact, there could have been a very long road of experimentation and prompt-crafting to get there.

That’s what I want to talk about today. The creative process.

Because Midjourney is not a precision tool. 

It’s pretty difficult to dream something up and then get exactly what you had in your head from it. Especially if you’ve imagined something complicated with lots of different things going on in the image. 

Despite that, Midjourney is still the best AI image generator we have for realising your vision. 

And Midjourney has gotten a LOT better at handling complex prompts and multiple subjects than it used to be. If you have a specific image in your head that you want to create, you might have to compromise. But there are tips to help you get as close as you can. 

Let’s have a look at how I approach a complex and specific idea in Midjourney version 6.0.

When I’m developing a prompt, I like to break things down into subjects, concept, setting, and style. And then build up details in each area as I work. 

Say I wanted an image to represent people who are excited about AI vs people who are more fearful and resistant. 

I might imagine a robot working on a laptop, with two women observing. One woman is delighted with the robot, the other woman is horrified by it. 

Here’s how I’d break down my idea:

  • Subjects: two women and a small futuristic white robot doing work for them on a laptop
  • Concept: one woman is delighted with the robot, the other is horrified
  • Setting: office
  • Style: a natural looking photograph

Let’s type that in pretty much as it is and see what we get. I’m going to use --ar 16:9 to get a widescreen aspect ratio instead of a square image. And --v 6.0 will make sure we’re using Midjourney version 6.0, the latest version at the time of writing.

two women and a small futuristic white robot doing work for them on a laptop, one woman is delighted with the robot, the other is horrified, a natural looking photograph in an office setting --ar 16:9 --v 6.0
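
If you expect to tweak a prompt many times, it can help to keep the four parts separate and only join them at the end. Here is a minimal Python sketch of that idea. The function name `build_prompt` and its arguments are made up for illustration; nothing here talks to Midjourney, it only produces the text you would paste in.

```python
def build_prompt(subjects: str, concept: str, style: str, setting: str, **params: str) -> str:
    """Join the four prompt areas, then append parameters such as --ar and --v.

    Hypothetical helper for illustration only; it just builds a string.
    """
    # Style and setting read naturally as one phrase, so they are joined with a space.
    description = ", ".join([subjects, concept, f"{style} {setting}"])
    # Each parameter is written with two plain hyphens, e.g. "--ar 16:9".
    flags = " ".join(f"--{name} {value}" for name, value in params.items())
    return f"{description} {flags}".strip()


prompt = build_prompt(
    subjects="two women and a small futuristic white robot doing work for them on a laptop",
    concept="one woman is delighted with the robot, the other is horrified",
    style="a natural looking photograph",
    setting="in an office setting",
    ar="16:9",
    v="6.0",
)
print(prompt)
```

Keeping the parts separate means a later change to, say, the setting does not require retyping the whole prompt.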

Image: Midjourney output, 4 options for the prompt above.

OK, so none of these are what I’m looking for, but I can now use them to start figuring out what I need to add to the prompt.

I like the images where the robot is in the middle. That’s what was in my head, but I didn’t put it in my prompt, so let’s add that to the prompt. Notice also that both women have the same facial expression. 

Unfortunately, Midjourney (MJ) is just not great at giving different characters separate expressions – but not to worry, we can fix that later. 

So we’ll make some small tweaks to our prompt. The key changes: each subject now has a position in the frame and its own expression.

two women and a small futuristic white robot doing work for them on a laptop. Woman 1, on the left, has a huge smile on her face. Woman 2, on the right, is crying. The robot, in the middle, has a neutral expression. A natural looking photograph in an office setting --ar 16:9 --v 6.0
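
As prompts get longer, it gets harder to see what changed between one attempt and the next. A word-level diff can point it out. The sketch below uses `difflib` from Python's standard library; the function name `changed_runs` is hypothetical, and the two strings are simply the first two prompts from this article.

```python
import difflib


def changed_runs(old_prompt: str, new_prompt: str) -> list[str]:
    """Return runs of words in new_prompt that were inserted or replaced."""
    old_words = old_prompt.split()
    new_words = new_prompt.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    runs = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both mean new_words[j1:j2] is new material.
        if tag in ("insert", "replace"):
            runs.append(" ".join(new_words[j1:j2]))
    return runs


first = (
    "two women and a small futuristic white robot doing work for them on a laptop, "
    "one woman is delighted with the robot, the other is horrified, "
    "a natural looking photograph in an office setting --ar 16:9 --v 6.0"
)
second = (
    "two women and a small futuristic white robot doing work for them on a laptop. "
    "Woman 1, on the left, has a huge smile on her face. "
    "Woman 2, on the right, is crying. "
    "The robot, in the middle, has a neutral expression. "
    "A natural looking photograph in an office setting --ar 16:9 --v 6.0"
)

for run in changed_runs(first, second):
    print("+", run)
```

The comparison is by whole words, so a word that only gains a full stop or a capital letter shows up as changed. That is crude, but it is enough to see which phrases are new.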

Image: Midjourney output, 4 options for the prompt above.

Great, now we have our subjects where we want them. 

But I’m noticing that MJ skews most of the subjects quite young. Also, while MJ has gotten way better at including diverse characters, it’s pretty random. So we could be more specific in our prompt. I like the top right image, so I’m going to prompt for subjects like that. 

I also want to have a consistent look, and one that isn’t quite so much like stock photography. You can look up all kinds of camera models and lenses, but I’m going to keep it simple. I own a Sony ZV-1, and I like taking photos with the focal length set to 50mm, so I’ll put that in my prompt. 

You might also notice some janky looking hands in those images. MJ V6 has gotten better at a lot of things, but seems to have gotten worse at consistently good hands. Not much we can do about that in our prompt 🙁

Here’s what we have now:

two women and a small futuristic white robot doing work for them on a laptop. 40-year-old black woman, on the left, has a huge smile on her face. 40-year-old caucasian woman, on the right, is crying. The robot, in the middle, has a neutral expression. In a modern and minimalist office setting. Taken with a Sony ZV-1 at 50mm focal length --ar 16:9 --v 6.0

Image: Midjourney output, 4 options for the prompt above.

If you compare this to the results from the first prompt, you can see we’re already getting much more consistent results, and definitely much closer to what I had in my head. 

Although we’re not getting the contrasting expressions, I want to see if I can add physical gestures that will push that concept even further. 

At this point, MJ started to get confused with all the different subjects and details. I trimmed any unnecessary words from our prompt to simplify, and managed to get some pretty good results. The image grid wasn’t consistently correct with all the details, but there were usually some contenders in there.

two women and a small futuristic white robot doing work for them on a laptop. 40-year-old black woman, left, huge smile on her face and giving the thumbs up sign. 40-year-old caucasian woman, right, crying with her hands on her head. The robot, in the middle, has a neutral expression. In a modern and minimalist office setting. Taken with a Sony ZV-1 at 50mm focal length --ar 16:9 --v 6.0

Image: Midjourney output, 4 options for the prompt above.

That bottom right image is starting to shape up nicely. 

Now I’m noticing that the images I like have more depth to them, and have warm, darker colours. So I’m going to add that to the office setting part of our prompt.

You can see clearly that the things we haven’t been specific about are being randomly generated by MJ – look at all the different backgrounds in the image grid above. This is great because it gives me inspiration. So my new prompt is:

two women and a small futuristic white robot doing work for them on a laptop. 40-year-old black woman, left, huge smile on her face and giving the thumbs up sign. 40-year-old caucasian woman, right, crying with her hands on her head. The robot, in the middle, has a neutral expression. In a modern and minimalist office setting, the office stretches back into the distance. Warm, dark colours, with orange lighting in the background. Taken with a Sony ZV-1 at 50mm focal length --ar 16:9 --v 6.0

Now we are getting pretty consistent results, but we’re asking quite a lot of MJ with this prompt, so you can see it doesn’t always deliver everything we asked for. 

But that second image looks good. It even has contrasting expressions on the two women.

Most of the images don’t have the contrasting expressions though, so let’s say we wanted to use the image on the bottom left of the image grid. Upscale that image, and then hit the ‘Vary (Region)’ button. Now use the lasso tool to select the woman’s face, and you can input a new prompt. I kept it pretty simple, but I included a negative prompt to say we didn’t want any smiles or happiness. MJ will try to avoid generating anything that comes after --no.

very unhappy woman --no smile happy
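
If you write a lot of these region prompts, a tiny helper can keep the description and the things to avoid separate. This is a sketch only: `with_negative_prompt` is a made-up name, and the space-separated default simply mirrors the prompt above. I believe Midjourney's own documentation shows the terms separated by commas, so the `separator` argument is there to let you try either form. Check the current docs before relying on one.

```python
def with_negative_prompt(description: str, avoid: list[str], separator: str = " ") -> str:
    """Append a --no parameter listing the terms to steer away from.

    Hypothetical helper: it only builds the prompt text.
    """
    # Drop empty entries so we never emit a dangling "--no".
    terms = [term.strip() for term in avoid if term.strip()]
    if not terms:
        return description.strip()
    return f"{description.strip()} --no {separator.join(terms)}"


region_prompt = with_negative_prompt("very unhappy woman", ["smile", "happy"])
print(region_prompt)

# The comma-separated variant, in case that form works better for you.
print(with_negative_prompt("very unhappy woman", ["smile", "happy"], separator=", "))
```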

Image: the woman's face on the right has been changed from a happy expression to an unhappy one.

Now we have something very close to my original vision. 

And hopefully you have a better idea of how to build up a complex prompt, and what to experiment with. 

It’s important to remember that it’s not an exact science. But by working up your prompt slowly, and experimenting as you go, you can get SO much closer to your creative vision. 

I do want to point something out though, because I see a lot of content on social media that makes it look as though it’s super easy to do this in just a few steps. 

Usually that’s because the process has been massively simplified for social media. 

There are 6 example images in this article. I generated closer to 60. Trying things out, tweaking the prompt, and hitting the reroll or the variations button. 

So don’t be fooled by simplified tutorials like this one, or by fancy LinkedIn carousels.

If you have a complex idea, and you want to get close to specific details you have in your head, it takes work. I say this because there’s nothing more frustrating than thinking everyone else has magic prompts that get results quickly. 

That’s just not how it works. 

That’s not to say that every image I create in MJ takes this amount of effort – this is the process for generating an image from a very specific and complex idea. 

Most of the images I create in Midjourney don’t require this level of work.

The images for my blog or my emails are more broadly themed, and I don’t need them to be specific or complex. I can pop a quick prompt in, choose one I like and be done.

But when you need to, you can certainly push the tool further. 

Break down your idea into subjects, concept, setting, and style, and then work up your prompt from there.

This post was originally published in Rise Above The Blah: AI Edition weeks before it appeared here. To get my best content hot off the press sign up here.

Frank Prendergast

I've over two decades of experience helping businesses with their online presence. I'm also the owner of the most-talked-about moustache in the marketing world and I'm the Frank half of the award-winning digital marketing team Frank and Marci. Follow on LinkedIn