Sber has upgraded its text-to-image AI model. Trained on a dataset of images, Kandinsky 3.1 is built around a different core architecture, which has enhanced the quality of generation. A restricted group of users, including artists, designers, and bloggers, has been given first access to Kandinsky 3.1.
Alexander Vedyakhin, First Deputy Chairman of the Executive Board, Sberbank:
“Today marks exactly one year since the release of Kandinsky 2.1. We continue to develop our model, which helps people generate new images and gives everyone a phenomenal arsenal for creativity. Compared to its predecessor, Kandinsky 3.1 is faster, more user-friendly, and more realistic. Moreover, this version understands both text-based and image-based prompts. Kandinsky 3.1 is an agile, versatile, and fully free tool that can turn anyone into an artist and creator. New features will become available soon. As always, the neural network will be free of charge and will work on all kinds of devices.”
The model understands text prompts better, which makes interacting with it more intuitive, and it generates images faster. Apart from text, the neural network can now take visual prompts as input and supports high-resolution generation.
The most popular features of the previous model have been carried over to the new one. Users can generate image variations, mix text and images, create sticker packs, and edit parts of an image while keeping the background intact (ControlNet).
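For readers who want to try the model family programmatically: earlier Kandinsky checkpoints are published by the kandinsky-community organization on Hugging Face, so a minimal text-to-image sketch with the diffusers library might look like the one below. This is a sketch under assumptions, not Sber's documented API: the repo id shown is the Kandinsky 3.0 checkpoint, used as a stand-in until a 3.1 checkpoint is published the same way.

```python
# Minimal text-to-image sketch with Hugging Face diffusers.
# Assumption: Kandinsky 3.1 will ship like Kandinsky 3.0; the repo id below
# is the 3.0 checkpoint and is a placeholder for a future 3.1 release.
import torch
from diffusers import AutoPipelineForText2Image

# Load the pipeline in half precision to fit on consumer GPUs.
pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",  # placeholder: Kandinsky 3.0 repo id
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()  # move submodules to the GPU only when needed

prompt = "A painting of Moscow in the style of Wassily Kandinsky"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("kandinsky_sample.png")
```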
A new Kandinsky Video 1.1 model will also become available in the near future to generate videos from text descriptions. Sber's team significantly improved generation quality by expanding the training dataset of text-video pairs and by making architectural improvements to the model. These changes also made it possible to double the video resolution compared to Kandinsky Video 1.0.