Alibaba upgrades its Qwen image model with visual and semantic image editing


Alibaba has updated its Qwen image model with new editing tools for both visual and semantic changes. 

Qwen-Image-Edit is built on Alibaba's 20-billion-parameter Qwen-Image model and combines two processing strategies: Qwen2.5-VL handles semantic control, while a Variational Autoencoder (VAE) manages the visual appearance. Alibaba hasn't shared detailed technical information about the architecture yet.

According to Alibaba, the system can handle everything from simple touch-ups to complex semantic edits. Appearance editing lets users change specific areas while keeping the rest of the image untouched. Semantic editing may change pixels across the entire image while keeping the main subject consistent.



Two editing modes for different workflows

For semantic editing, Alibaba demonstrates how the model can create new IP content featuring its Capybara mascot. Even when most pixels change, the character remains recognizable.

Qwen Image Edit generates new versions of the Capybara mascot that can be used as stickers in messenger apps and other formats. | Image: Alibaba

Other use cases include generating new perspectives with 90- or 180-degree object rotations and using style transfer for avatar creation, such as converting portraits into Studio Ghibli-style images.

The model generates new viewpoints for people, animals, and objects. | Image: Alibaba

Qwen Image Edit can also add signs with realistic reflections, remove stray hairs, change letter colors, and edit backgrounds or clothing.

Qwen Image Edit places a wooden sign reading "Welcome to Penguin Beach" in front of a penguin colony and generates natural shadows. | Image: Alibaba

Bilingual text editing with step-by-step correction

One of Qwen Image Edit's main strengths is its ability to edit text in both Chinese and English. The system can add, remove, or change text directly in images while preserving the original font, size, and style.

Qwen Image Edit updates Scrabble tiles from "Health Insurance" to "Financial Planning," maintaining the original look. | Image: Alibaba

Users can draw bounding boxes around incorrect or unwanted text. The model then updates those marked areas. While it sometimes struggles with rare or unusual characters like "稽," users can make step-by-step edits, marking specific spots and having the model refine the results until they are satisfied.


The tool replaces incorrect characters and lets users directly mark the areas that need changes. | Image: Alibaba
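
The box-marking workflow described above can be pictured with a small toy sketch. This is not Alibaba's implementation, and the article gives no detail on how Qwen Image Edit processes marked regions internally. The names `boxes_to_mask`, `apply_region_edit`, and `invert` are hypothetical, and a simple pixel inversion stands in for the model. The sketch only illustrates the intended outcome: pixels inside the user's boxes change, and everything outside them stays identical.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale image as rows of 0-255 values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def boxes_to_mask(width: int, height: int, boxes: List[Box]) -> List[List[bool]]:
    """Turn user-drawn bounding boxes into a per-pixel boolean mask."""
    mask = [[False] * width for _ in range(height)]
    for left, top, right, bottom in boxes:
        # Clamp each box to the image bounds before marking it.
        for y in range(max(0, top), min(height, bottom)):
            for x in range(max(0, left), min(width, right)):
                mask[y][x] = True
    return mask


def apply_region_edit(image: Image, boxes: List[Box],
                      edit_fn: Callable[[Image], Image]) -> Image:
    """Keep edited pixels only inside the marked boxes; copy the rest unchanged."""
    height = len(image)
    width = len(image[0]) if height else 0
    mask = boxes_to_mask(width, height, boxes)
    edited = edit_fn(image)  # stand-in for a real editing model (assumption)
    return [
        [edited[y][x] if mask[y][x] else image[y][x] for x in range(width)]
        for y in range(height)
    ]


def invert(image: Image) -> Image:
    """Hypothetical placeholder edit: invert every pixel value."""
    return [[255 - p for p in row] for row in image]


original = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
]
# Mark a 2x2 region covering columns 1-2 of the first two rows.
result = apply_region_edit(original, [(1, 0, 3, 2)], invert)
print(result)
```

A step-by-step correction, as the article describes it, would correspond to calling `apply_region_edit` repeatedly with new boxes on the previous result until the output looks right.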

Alibaba says Qwen Image Edit delivers state-of-the-art performance on public image editing benchmarks, though it hasn't shared specific numbers. The model is available through Qwen Chat's "Image Editing" feature and can also be found on GitHub, Hugging Face, and ModelScope.

Qwen Image Edit reflects just how quickly targeted image editing and text rendering are advancing. Until recently, it was difficult for AI to change only specific parts of an image without disrupting everything else.

Black Forest Labs has also entered the space with Flux.1 Kontext, a model that combines text-to-image generation and image editing. But Flux.1 Kontext still shows visible artifacts in longer editing chains and sometimes has trouble following prompts accurately.
