Trains, Bridges, and Rooms

There are a few persistent rooms, one pivotal bridge, and many trains in this iMovie. (When you read the novel, you will understand why there are so many trains.) DALL·E does a reasonable job of laying out the background of a scene if you let it compose the scene according to its own framing biases. A problem arises when you need to show the same background location across multiple scenes, so that the viewer will recognize it as the same place. If you let DALL·E have its way, you will get a different-looking location each time. So you first have to build a characterless image of the background location – often by layering in one object at a time – that you can feed as input to DALL·E when it comes time to populate the scene with characters. This input image constrains how DALL·E lays out the scene, so that it matches the familiar location. But DALL·E will only honor your constraints if the way it wants to position the characters fits your pre-established contours. If not, it still renders your background, but then hallucinates duplicate or new objects, such as desks, tables, chairs, and windows, to go with where it wants to put the characters. The results can be absurd, as when it relocates a fireplace into the middle of a set of bookshelves.

There are two persistent locations in the iMovie: Ilya’s office and Katerinya’s apartment at 10 Tallinnskaya St. The apartment shows up a lot. DALL·E proved to be completely incompetent at preserving the identity of rooms on a single floor of a building, or even at rendering them correctly on a single try. And this was before characters were added. If you watch the iMovie before reading the novel, you may be surprised to find that in the novel the apartment at 10 Tallinnskaya St. is not open-plan, as it is in the iMovie. The only way I could get DALL·E to render the apartment interior consistently from scene to scene was to take it out of the business of rendering internal walls and doorways. So the apartment became one big room – and even this was a struggle. It’s a nice touch, though. If I had thought of this at the beginning, I might have written the novel this way.

There is only one bridge in the iMovie, but in rendering it I discovered a strange bias that DALL·E has for bridges. It almost always places the bridge parallel to the waterway it is supposed to cross, rather than perpendicular (and thus the bridge doesn’t cross the waterway). It also wants there to be only one pedestrian railing on the bridge, as opposed to one railing on each side of the pedestrians. ChatGPT-4o surmises this is because DALL·E’s training set is dominated by images of urban promenades next to bodies of water. Whatever the reason, DALL·E would not give me a bridge that crosses the urban river in the novel. I had to fool it by first asking it to render a photo-realistic image of an actual bridge – the Larz Anderson Bridge over the Charles River at Harvard. Presumably because there are lots of images of this bridge in its training set, it gave me a real bridge crossing an urban river. I then successively changed the facade of the bridge and the cityscape on either shore, and added railings and pedestrians. It still insisted on only one balustrade railing, usually in front of the pedestrian looking over the side. If instructed to put a railing behind the pedestrian as well, it would move the one balustrade behind the pedestrian and suspend the person in the air in front of it. I had to resort to some manual compositing to get two railings for the bridge scenes.
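For readers who would rather script this kind of fix than do it by hand in an image editor, here is a minimal sketch of the idea, assuming Python with the Pillow library. It is not how the bridge scenes in the iMovie were actually made. The function name `add_near_railing`, the railing color, and all dimensions are hypothetical, and a flat blue placeholder stands in for a real DALL·E render. The sketch draws a near-side railing on a transparent layer and composites it over the scene, so the railing ends up in front of whatever was rendered.

```python
from PIL import Image, ImageDraw


def add_near_railing(scene: Image.Image, rail_top: int, rail_height: int,
                     post_spacing: int = 40) -> Image.Image:
    """Overlay a hypothetical near-side railing on top of a rendered scene."""
    base = scene.convert("RGBA")
    # Transparent layer the same size as the scene; only the railing is opaque.
    overlay = Image.new("RGBA", base.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    stone = (120, 120, 120, 255)  # assumed railing color
    # Top rail spanning the full width of the frame.
    draw.rectangle([0, rail_top, base.width - 1, rail_top + 5], fill=stone)
    # Evenly spaced posts running from the top rail down to the railing's base.
    for x in range(0, base.width, post_spacing):
        draw.rectangle([x, rail_top, x + 7, rail_top + rail_height - 1], fill=stone)
    # "Over" compositing: opaque railing pixels cover the scene,
    # transparent pixels let the scene show through.
    return Image.alpha_composite(base, overlay)


# Stand-in for a DALL·E render: a flat blue placeholder image.
scene = Image.new("RGB", (320, 200), (70, 110, 160))
result = add_near_railing(scene, rail_top=150, rail_height=50)
```

In practice the railing layer would more likely be cut from another render than drawn with rectangles, but the compositing step is the same: the far railing and the pedestrian come from the render, and the near railing is layered on top.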

DALL·E, it seems, has some very rigid rendering biases for trains. It takes a train to be the featured object in a scene, and thus always renders it locomotive-first, moving toward the camera. You can angle the track so that the train approaches the viewer from the left or the right, but you can never render the train moving away from the viewer. This posed a particular problem for the long sequence of scenes involving westward and then eastward train travel. The narrative naturally tells the viewer/reader where the train is going, not where it has been. But with DALL·E, you can’t render a scene in which you see what is ahead of the train (a mountain, a section of unplowed track). So I had to resort to alternating scenes: one in which the track, without any train, reveals what’s ahead, and then the next in which the train arrives at that target. To give a sense of eastward vs. westward travel, I tried to manipulate the angle of the sun, but hit another hard constraint. DALL·E always puts the shadow cast by the train on the side closest to the viewer, so it arbitrarily relocates the sun to make this happen. Google’s Gemini has no problem rendering trains moving away from the viewer, so this appears to be a DALL·E-specific problem. I briefly looked into switching, but found that I preferred OpenAI’s rendering for many other reasons (it is much easier to control than Gemini’s).