Funny Stories

DALL·E has no inherent reality filter. This is intentional, because it is often used to render fantasy, or things like an avocado that is also a chair. You describe what you want, and it does its best to deliver it by merging the patterns it has learned for the individual elements from its training sets. If you want both photo-realism and a reasonable approximation of the actual world, your best strategy is to let DALL·E do as much of the scene layout as possible. But even if you don’t intervene, some of DALL·E’s inherent biases can clash, producing renderings, sometimes humorous ones, of scenes out of step with what is possible, or advisable, in the real world.

Your best shot at reality is when you have one or two main characters, interacting, if at all, only with each other, and your description of the background, and of any anonymous people, is very generic. DALL·E will first try to position your main character(s) in the foreground facing the camera, and then fill in the rest of the environment to match this orientation and perspective. It starts to diverge from reality when you are too specific about the environment, because it then does some crazy things to reconcile your description with how it has already decided to place your main characters. Also, if your environment contains objects like trains, bridges, and specific rooms, for which DALL·E already has some built-in biases, those biases may conflict with realistic placement in the real world.

Bridges plus people are almost always a problem. After fighting this bias over many days, during which DALL·E would rearrange whole cityscapes to get people looking over one railing at a parallel river, I tried a simple experiment. I asked DALL·E to render just a stone bridge over a small river – nothing about the railings, people, or what’s on the banks. Just give me a bridge over a river, and lay it out however you deem necessary. What I got was a small river running vertically between two grassy banks, with a stone bridge also running vertically up the middle of the river. At the far end, toward the perspective vanishing point, the bridge descended into the water. DALL·E has a built-in problem with bridges and water.

Another humorous anomaly often occurs with fireplaces. When DALL·E rearranges your room to fit how it wants to place the characters, it treats a fireplace just like a table or a bookshelf – a movable element that can be rearranged. It is oblivious to the fire, and will often relocate a fireplace – with a blazing fire in it – to places where, in the real world, it would surely burn the room down.

Since DALL·E treats trains like main characters (featured elements in scene composition), when your scene has both people and a train – as most train station scenes do – it ignores the interaction between the train and the people. So luggage and passengers are often placed on the tracks with the locomotive barreling toward them. And in a humorous reversal of the bridge bias, DALL·E will often render the view out of a moving train’s window with the tracks perpendicular to the train.

DALL·E achieves photo-realistic depth perspective in a scene by rendering near objects larger than far objects, and by gradually blurring the far objects. When you place anonymous characters in front of main characters, you can confuse DALL·E, because it expects the main characters to be larger. So it will sometimes render the anonymous people with strange proportions: large heads on small bodies.

You really get into trouble when your scene has two main characters interacting in pairs with two anonymous characters – each main character interacting separately with one of the anonymous characters. That happened in the banner image above, where we also had a train. It took many re-renderings over two days to finally get this scene right. Ilya and Katerinya were each supposed to be interacting independently with an anonymous colleague before boarding a waiting train. Together with the text-AI, we tried to separate the two interactions in space – by changing the station, adding pillars, moving the tracks, and moving the train – but each time DALL·E would snap back to all four characters in a close huddle, doing the strangest things with each other. All of these failed scenes were humorous, but the best was the banner image above, where Ilya’s colleague is shaking hands with Ilya’s back, Ilya is shaking hands with a third arm coming out of Katerinya’s side, and the train is about to crush them all, along with their luggage.

My favorite, though, is a failed scene of Ilya being captured by four soldiers on horseback. In the iMovie, you will see five horses, one for each of the four soldiers and one for Ilya. I had to make this adjustment to eventually get the scene to render realistically. In the novel, there are only four horses, so Ilya has to ride with one of the soldiers. The novel’s text says that Ilya rode “doubled up behind one of the soldiers on the same horse.” ChatGPT-4o, in summarizing what it was about to submit to DALL·E, said something like “OK, Ilya is doubled over the fourth soldier’s horse.” No, I replied, he is fine, and fully conscious. He is doubled up behind the fourth soldier on his horse. “OK, got it. He’s doubled up on the fourth horse,” came the reply. Apparently the idiom ‘doubled up’ is not well represented in the training sets of either ChatGPT or DALL·E. What I got was an image of four horses, with Ilya riding behind the fourth soldier, and a second Ilya riding on the first Ilya’s shoulders. He was “doubled up.”