Amazon introduces Nova, its new family of AI Models with text, image, and video generation capabilities

This range of AIs is accessible from AWS Bedrock and is optimized for use in 15 languages, including Spanish.
Imagen que muestra una obra creada con Amazon Nova Canvas de un dinosaurio en una taza de porcelana. A la izquierda de la imagen, que va dando paso a un degradado negro, se lee "Introducing Amazon Nova"
December 4, 2024
Copiar enlace

Amazon has taken advantage of its AWS re:Invent event to unveil Nova, a new series of multimodal generative AI models that will be available on AWS Bedrock, Amazon’s platform for AI development. This lineup includes four models for text generation (Micro, Lite, Pro, and Premier) and two models for creative content generation (Nova Canvas and Nova Reel).

Rohit Prasad, who is the Senior Vice President of General Artificial Intelligence at Amazon, explained: «At Amazon, we are running about 1,000 generative AI applications, and we have gained a comprehensive understanding of the challenges that application developers still encounter.

Our new Amazon Nova models are designed to assist both internal and external developers in overcoming these challenges, delivering powerful intelligence and content generation. These models also provide significant improvements in latency, cost-effectiveness, personalization, Retrieval-Augmented Generation (RAG), and autonomous capabilities».

Text Generation with Micro, Lite, Pro, and Premier

Amazon has developed four text generation models under Nova, optimized for 15 languages, including Spanish, with English being the primary language. Each model is tailored to meet the diverse needs and resources of its clients. These models are:

  • Amazon Nova Micro: This model focuses solely on text input and output, offering low-latency responses at a very affordable cost. It has a context window that can accommodate up to 128,000 tokens, equivalent to roughly 100,000 words.
  • Amazon Nova Lite: An inexpensive multimodal model that quickly processes inputs from images, videos, and texts. It features a context window of 300,000 tokens, which equates to about 225,000 words, 15,000 lines of computer code, or 30 minutes of footage.
  • Amazon Nova Pro: A high-capacity multimodal model that, according to Amazon, offers the optimal balance of accuracy, speed, and cost for a broad range of tasks. Its context window is identical to that of Lite.
  • Amazon Nova Premier: The most advanced of Amazon’s multimodal models, designed for complex reasoning tasks and to serve as a foundation for creating customized models.

Micro, Lite, and Pro models are already available to AWS (Amazon Web Services) customers. As for Amazon Nova Premier, it is scheduled for release in the first quarter of 2025.

Throughout 2025, Amazon plans to continue refining these models to enhance their capabilities. One of their objectives is to expand the context window for several of these AIs to support over 2 million tokens.

Image Generation with Nova Canvas and Video with Nova Reel

In addition to these four text models, Nova includes two creative content generation AIs that are already accessible. First, there is Nova Canvas, which specializes in creating and editing images based on text prompts and existing images. It is highly effective for tasks such as background removal and offers controls for color schemes and the final design of generated works.

Two images generated with Nova Canvas AI. The left one was created with the description 'a very elegant French restaurant' and the right one with the prompt 'Black and white photography, character study, multiple angles'
Images created with Amazon Nova Canvas

Meanwhile, Nova Reel can generate videos up to 6 seconds long in about 3 minutes, using written prompts or reference images. It also allows users to specify camera movement controls (such as pans, 360° rotations, zooms) through natural language inputs. Amazon aims to eventually extend the duration of videos generated by Nova Reel to up to 2 minutes.

Amazon is gearing up for the release of a “voice-to-voice” model and a “multimodal-to-multimodal” model within the Nova series by 2025.

Next year, Amazon faces the challenge of introducing a voice-to-voice model in the Nova series by the first quarter of 2025. According to Amazon, this model aims to revolutionize conversational AI by understanding spoken language in real-time, picking up on verbal and non-verbal cues like tone and rhythm, and providing seamless, human-like interactions with minimal delay.

In addition, Amazon is developing a model that can process inputs of text, images, audio, and video and generate outputs in any of these formats, anticipated to be available by mid-2025. This Amazon Nova model, featuring true multimodal-to-multimodal capabilities—or “any-to-any” modality capabilities—will simplify application development, enabling a single model to handle diverse tasks such as translating content across different media, editing content, and supporting AI agents that can comprehend and produce in all formats.

Photo: Amazon

Other articles related to

Published by

Content Manager in Marketing4eCommerce
Content Manager in Marketing4eCommerce, which translates to: writer, editor, and absolute fan of generating images with AI.

Stay up to date!

 
Únete a nuestro canal de Telegram

All you need to know!

Sign up for our newsletter and receive our best articles on eCommerce and digital marketing in your email for free.