Facial Expressions

I alternated between two methods for describing a scene to the renderer. The first was to give the AI the actual section of text from the novel and let it set up the details. Sometimes, though, the scene was too complicated for the renderer to get right, so I had to play director and explicitly lay out the scene and the camera angles, position the characters, and describe their expressions and emotions. The first method was preferable by far – when you could get it. Given several paragraphs from the novel that set the tone and the events leading up to the scene, the AI, if it can handle the scene at all, is very good at absorbing the emotional nuances of the novel. It renders the faces of not just the foreground characters but also the out-of-focus background extras with subtle expressions that would otherwise be hard to describe in text.

The scene in the banner image above is an example of this. The renderer actually could not handle this scene directly from the novel. It took a lot of “directoring” to frame this scene with the main characters in the mid-ground behind a window, and anonymous characters in the foreground. I took many shots at this, from both sides of the glass, trying to explicitly describe the discreet, passing interest of the pedestrians in the couple in the window. Without the novel text, I got a lot of alarming, voyeuristic stares from passersby. Once I managed to get a stable framing, I added the novel text back in to convey the aspirational, as opposed to voyeuristic, interest. And voilà! Every character, including the anonymous background extras reflected in the glass, has an appropriate expression that conveys the mood of the scene. It’s a good thing that Katerinya is rendered from behind, because a mis-rendering of her face would almost certainly have cost me the success of the wider scene.

In many cases, I don’t know what words I would have used to describe some of the spot-on emotions that DALL·E figured out from the broader novel text. Another example of this is the banner image from the parent of this page – the one with Ilya and Katerinya being presented at a formal Christmas party. This was a first-shot, straight-from-the-novel rendering. I worried a lot about this scene, because it is pivotal to the novel: it is where a change in Katerinya’s bearing first becomes evident, as she rises to the occasion of her formal coming-out. So I gave the AI several paragraphs from the novel, both before and after the scene, to create the abstract concept. I was particularly worried about how Katerinya’s facial identity would hold up, let alone whether she would have the appropriate expression. The AI did its own thing, completely disregarding the hosts who were supposed to be greeting them and the footman who was announcing them, and putting the couple right in front of the camera with perfect identities and the perfect expressions for the moment.

This is the positive side of facial rendering. The negative side is that DALL·E struggles to relate a foreground character’s attention and gaze to other people and objects in the scene. Because of its bias toward foreground characters facing the camera, it wants the character to look at the camera, so sometimes two people sitting at a table, nominally having a conversation, will be looking past each other. Their emotional expressions are correct, reflecting the content of the conversation, but there is no eye contact. You’ll see a lot of this.