JanusCoder unites programming and visual design in one multimodal system



JanusCoder is a new AI model that combines code generation and visual output in a single system. The goal is to close the gap between text-based programming and visual representation, so developers can build apps that blend code and visuals without having to switch tools.

Most AI models have treated coding and visuals as separate tasks, forcing developers to rely on different solutions for each job. Researchers from Hong Kong, China, and the US created JanusCoder and its variant, JanusCoderV, to handle all of these tasks through a single interface.

Visual programming interface for web UI editing, visualizations, chart code, and animations.

Instead of juggling separate models for chart-to-code, web UI building, or animation, JanusCoder handles it all in one place. This unified approach makes it easier to keep things consistent, like using the same color palette throughout a project.

JanusCoder supports several programming languages and can write code for Matplotlib plots, interactive web apps, scientific demos, and mathematical animations. It works with both text prompts and visual input, like screenshots or diagrams, and turns them into working code.

How JanusCoder was trained

The model is trained on the JanusCode-800K dataset, which the team says is the largest multimodal code intelligence dataset so far. They built it using a custom toolkit that combines different strategies for generating and improving training data.

Shares in JanusCode-800K: text-centered 50.9%, vision-centered 49.1%.

A big part of JanusCoder’s approach is its use of cross-domain learning: skills from one area can help in another. For example, training with R code can improve results for Mathematica problems, and outputs from Python visualizations can boost chart-to-code accuracy.

But just running code isn’t enough to guarantee good visuals. To fix this, the researchers built a quality control process that uses vision-language models to check four things: task relevance, completeness, code quality, and visual clarity. Only the best samples make it into the final dataset.
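A filtering step like this can be sketched in a few lines of Python. This is only an illustration, not the team's toolkit: the four criteria come from the description above, while the `Sample` class, the 1-to-5 scoring scale, the `min_score` threshold, and the idea that a sample must pass on every criterion are assumptions made for the example. In practice the scores would come from a vision-language model judging the rendered output.

```python
from dataclasses import dataclass

# The four criteria named in the article. The 1-5 scale and the
# threshold used below are assumptions for illustration only.
CRITERIA = ("task_relevance", "completeness", "code_quality", "visual_clarity")


@dataclass
class Sample:
    """Hypothetical training sample with scores from a VLM judge."""
    instruction: str
    code: str
    scores: dict  # criterion name -> integer score


def passes_quality_check(sample, min_score=4):
    """Keep a sample only if every criterion meets the threshold."""
    return all(sample.scores.get(name, 0) >= min_score for name in CRITERIA)


def filter_samples(samples, min_score=4):
    """Return the samples that pass the check on all four criteria."""
    return [s for s in samples if passes_quality_check(s, min_score)]


candidates = [
    Sample(
        "Plot a sine wave",
        "plt.plot(x, np.sin(x))",
        {"task_relevance": 5, "completeness": 5, "code_quality": 4, "visual_clarity": 5},
    ),
    Sample(
        "Draw a bar chart",
        "plt.bar([], [])",  # runs fine, but renders an empty chart
        {"task_relevance": 4, "completeness": 3, "code_quality": 4, "visual_clarity": 2},
    ),
]

kept = filter_samples(candidates)
print([s.instruction for s in kept])
```

The second sample shows why execution alone is not enough: its code runs, but the low visual clarity score keeps it out of the final set.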

How JanusCoder performs against commercial models

In tests, JanusCoder models with 7B to 14B parameters match or outperform much larger leading commercial models. On Python visualization benchmarks, JanusCoder-14B hits a 9.7 percent error rate, on par with GPT-4o.
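To make the error-rate figure concrete, here is a rough Python sketch of how such a metric could be computed. It assumes the error rate is the share of generated scripts that fail to run, which the article does not spell out. The helper names and the toy snippets are hypothetical, and a real benchmark harness would run the code in an isolated sandbox rather than with a bare `exec`.

```python
def runs_without_error(code):
    """Return True if the snippet compiles and executes without raising."""
    try:
        # Fresh namespace per snippet so runs do not affect each other.
        exec(compile(code, "<generated>", "exec"), {})
        return True
    except Exception:
        return False


def error_rate(snippets):
    """Percentage of snippets that fail to execute."""
    failures = sum(1 for code in snippets if not runs_without_error(code))
    return 100.0 * failures / len(snippets)


# Toy stand-ins for model-generated scripts (hypothetical examples).
snippets = [
    "x = [i * i for i in range(5)]",
    "total = sum([1, 2, 3])",
    "y = undefined_name + 1",  # fails with NameError
    "z = 1 / 0",               # fails with ZeroDivisionError
]

rate = error_rate(snippets)
print(f"Error rate: {rate:.1f}%")
```

Under this reading, a 9.7 percent error rate would mean roughly one generated script in ten fails to run.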

JanusCoderV stands out in chart-to-code tasks, even beating GPT-4o on ChartMimic, but it’s not always ahead on web page generation. Still, when it comes to generating web pages from screenshots and building scientific demos, JanusCoder makes big gains in both visual quality and code structure.

The models also hold their own in general coding tests, and even surpass some data visualization specialists like VisCoder.

JanusCoder combines WebUI editing, visual artifacts, demo generation, dynamic visuals, Chart2Code, and animation generation.

Experiments show the importance of the model’s design. If you remove any data categories from training, performance drops, which highlights the value of cross-domain learning. Skipping the visual quality checks also leads to worse results. The team found that their approach works across different base models, from Qwen3 to InternVL, and across various sizes. All benefit from the JanusCode-800K dataset.

JanusCoder is open source on GitHub and is intended to be a standard for multimodal code intelligence. It’s aimed at developers who want to build complex visual apps without having to jump between multiple AI tools.

This fits into a bigger trend in AI. Companies like Meta are taking a similar direction with new models that go beyond just generating correct code. They're also built to understand how that code fits into real-world applications.
