AI tools for manipulating images keep getting more capable, as a new research paper on a system called DragGAN shows. Although it is still at the research stage, DragGAN lets users change the appearance of elements in a picture simply by clicking and dragging them.
At first glance, this may not sound especially exciting, but the examples from the research team show what the system can do. Users can change the dimensions of a car or turn a smile into a frown with a simple click and drag. The system can even rotate a picture's subject as if it were a 3D model, changing the direction someone is facing. One demo shows a user adjusting the reflections on a lake and the height of a mountain range with just a few clicks.
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
paper page: https://t.co/Gjcm1smqfl pic.twitter.com/XHQIiMdYOA
— AK (@_akhaliq) May 19, 2023
The team's homepage, which hosts these videos, has crashed under the flood of traffic from Twitter, much of it driven by @_akhaliq, an AI enthusiast who highlights interesting AI papers. The research paper itself is available on arXiv.
According to the researchers, the real significance of this work lies not only in the image manipulation itself but also in the user interface. AI models such as generative adversarial networks (GANs) have been able to produce realistic images for some time, but most ways of controlling what they generate lack flexibility and precision. Ask an AI image generator to "create a picture of a lion stalking through the savannah," for example, and you will get a picture, but not necessarily one with the exact pose or composition you had in mind.
DragGAN offers a promising answer to this limitation. Its interface closely resembles traditional image-warping tools, but instead of simply distorting the existing pixels, the model generates the subject anew. As the researchers explain, their approach can even "hallucinate occluded content, like the teeth inside a lion's mouth, and can deform following the object's rigidity, like the bending of a horse leg."
For now, this is only a demonstration, which makes the technology hard to evaluate fully. For instance, it is unclear how realistic the final images are, because the videos released so far are low-resolution. Even so, DragGAN is another step toward making image manipulation more accessible and intuitive through AI.