It’s breathtaking. It’s a digital masterpiece. Why do its hands look like that?
In July 2022 OpenAI, an artificial intelligence (AI) company, introduced DALL-E 2, one of the first AI image generators widely available to the public. Users could type in a prompt—anything from “Beyoncé eating pizza” to “a Renaissance portrait of a poodle” to “the Statue of Liberty skateboarding”—and DALL-E 2 responded with a corresponding image set. DALL-E 2, however, created images that were imperfect, often distorted or unrelated to the user’s prompt. And it had competition: about the same time, two other AI companies, Stability AI and Midjourney, both released their own image-generating AI programs. Stability AI launched Stable Diffusion, and Midjourney introduced a self-named tool. By August, Midjourney’s AI image generator was so advanced that one of its images won an art contest at a state fair.
But when users gave any of these generators a prompt that included people, they started to notice a recurring bug. Like many beginning artists, the AI tools couldn’t draw hands.
An AI-generated hand might have nine fingers or fingers sticking out of its palm. In some images hands appear as if floating, unattached to a human body. Elsewhere, two or more hands are fused at the wrists.
Why?
There are a few reasons that AI struggles with hands and fingers. One is, simply, that hands are a small part of the human body. In real photographs of people, hands aren’t generally the focus. Notably, AI programs tend to have the same issues with human teeth and ears that they do with hands. AI-generated teeth are often small, overcrowded, and even pointed, while ears are frequently depicted without lobes. Hands, teeth, and ears are all parts of the human body that are both small and highly variable: when scanning a photograph of a person with a missing tooth, for instance, an AI may conclude that all smiles have that same gap. In a January 2023 interview with BuzzFeed News, a spokesperson from Stability AI explained that “within AI datasets, human images display hands less visibly than they do faces.” To successfully depict hands and fingers, AI would need more reference photos with hands as the main focus.
Another issue is that AI doesn’t actually know what a hand is. In two-dimensional images, hands can appear in dozens of different positions: waving, flexing, holding an object, clenching a fist, or poking out of a pants pocket, partially hidden from view. Humans know that these different appearances all show the same hand at work, because they understand how a hand moves. AI, without access to the three-dimensional world, knows only how a hand appears. Identifying a fist, thumbs-up, or peace sign as a hand is an impressive feat for AI, and we can hardly blame it for assuming a real hand could be a combination of the three.
Some users have found the quirks of AI-generated hands to be a feature, not a bug. Often, the anomalies serve as a quick way to distinguish between authentic images and AI-generated pictures: a fake image of former U.S. president Donald Trump being arrested, for instance, betrays itself as an AI-generated image thanks to a police officer’s hand melting into Trump’s body. The same holds true for photos of an alleged “extreme sunburn competition,” in which one competitor’s fingers look more like hot dogs than digits; another contestant’s hand has at least seven interlocking fingers. “Looking at gnarled A.I. hands,” The New Yorker wrote in March 2023, “we fall into the uncanny valley and experience a visceral sense of disgust.…The machine’s failure is comforting, in a way.” Perhaps AI can’t understand human hands, The New Yorker and BuzzFeed News have wondered, because it can’t understand what it is like to be human.
But even if AI’s struggle with hands can be seen as a positive, the problem may not persist for much longer. In March 2023 Midjourney released an update to its program intended to make its hands more realistic. Experts suspect Midjourney adjusted its datasets to prioritize clearer images of hands and deprioritize images where hands are hidden or only partially visible. Though the resulting images still aren’t perfect—the aforementioned image of Trump’s arrest was generated after the update—users generally agree that they have improved. As artificial intelligence companies compete to have the best image generator on the market, it is likely that DALL-E, Stable Diffusion, and the rest will follow suit. It’s a race to the perfect artificial hand.