Amazon used its AWS re:Invent conference to unveil Nova, a new family of multimodal generative AI models available through Amazon Bedrock, its platform for AI development. The lineup includes four text generation models (Micro, Lite, Pro, and Premier) and two creative content generation models (Nova Canvas and Nova Reel).
Rohit Prasad, Senior Vice President of General Artificial Intelligence at Amazon, explained: "At Amazon, we are running about 1,000 generative AI applications, and we have gained a comprehensive understanding of the challenges that application developers still encounter. Our new Amazon Nova models are designed to help both internal and external developers overcome these challenges, delivering powerful intelligence and content generation. These models also provide significant improvements in latency, cost-effectiveness, personalization, Retrieval-Augmented Generation (RAG), and autonomous capabilities."
Amazon has developed four text generation models under Nova, optimized for 15 languages, including Spanish, with English as the primary language. The four tiers are Micro, Lite, Pro, and Premier, each tailored to different customer needs and budgets.
Micro, Lite, and Pro models are already available to AWS (Amazon Web Services) customers. As for Amazon Nova Premier, it is scheduled for release in the first quarter of 2025.
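Since the text models are served through Amazon Bedrock, they can be called with the Bedrock Runtime Converse API. The sketch below builds such a request; the model identifier and parameter names are assumptions based on Bedrock's usual conventions, so verify them against the console before use.

```python
# Hypothetical model identifier; check the Bedrock console for the exact ID.
NOVA_MICRO_ID = "amazon.nova-micro-v1:0"


def build_converse_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build the keyword arguments for a Bedrock Runtime `converse` call."""
    return {
        "modelId": NOVA_MICRO_ID,
        "messages": [
            {"role": "user", "content": [{"text": prompt}]},
        ],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.7},
    }


def ask_nova(prompt: str) -> str:
    """Send the prompt to Bedrock (requires boto3 and AWS credentials)."""
    import boto3  # imported here: not needed just to build the request

    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.converse(**build_converse_request(prompt))
    return response["output"]["message"]["content"][0]["text"]
```

Because the request builder is separate from the network call, the payload shape can be inspected and tested without AWS credentials.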
Throughout 2025, Amazon plans to continue refining these models to enhance their capabilities. One of their objectives is to expand the context window for several of these AIs to support over 2 million tokens.
In addition to these four text models, Nova includes two creative content generation AIs that are already accessible. First, there is Nova Canvas, which specializes in creating and editing images based on text prompts and existing images. It is highly effective for tasks such as background removal and offers controls for color schemes and the final design of generated works.
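As a rough illustration of how an image-generation request to Nova Canvas might look through Bedrock's `invoke_model` interface, here is a hedged sketch; the task type and field names are assumptions modeled on Bedrock image-model request bodies, not a confirmed schema.

```python
import base64


def build_canvas_request(prompt: str, width: int = 1024, height: int = 1024) -> dict:
    """Assumed request body for a Nova Canvas text-to-image call.

    Field names are illustrative; confirm them in the current
    Amazon Bedrock documentation before relying on them.
    """
    return {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": prompt},
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "width": width,
            "height": height,
        },
    }


def decode_image(response_body: dict) -> bytes:
    """Decode a base64-encoded image from an assumed `images` response field."""
    return base64.b64decode(response_body["images"][0])
```

The body would be JSON-serialized and passed to `invoke_model` on a `bedrock-runtime` client, with the returned image decoded from base64 before saving to disk.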
Images created with Amazon Nova Canvas
Meanwhile, Nova Reel can generate videos up to 6 seconds long in about 3 minutes, working from written prompts or reference images. Users can also specify camera movements (such as pans, 360° rotations, and zooms) through natural language. Amazon aims to eventually extend Nova Reel's output to videos of up to 2 minutes.
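Since camera movement is specified in natural language, a video request is essentially a text prompt plus generation settings. The sketch below shows one plausible shape for such a request; the task type and configuration keys are assumptions for illustration, not Amazon's documented schema.

```python
def build_reel_request(prompt: str, duration_seconds: int = 6) -> dict:
    """Assumed model input for a Nova Reel text-to-video job.

    Camera directions (pans, rotations, zooms) go directly in the
    prompt text, per Amazon's description of natural-language controls.
    Field names here are illustrative assumptions.
    """
    return {
        "taskType": "TEXT_VIDEO",
        "textToVideoParams": {"text": prompt},
        "videoGenerationConfig": {
            "durationSeconds": duration_seconds,
            "fps": 24,
            "dimension": "1280x720",
        },
    }


# Usage: camera movement is part of the prompt itself.
request = build_reel_request(
    "A drone shot over a rocky coastline at sunset; slow 360-degree rotation"
)
```

Because a 6-second clip takes around 3 minutes to render, video generation would likely run as an asynchronous job rather than a blocking call, with the client polling for completion.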
Amazon is also preparing a "voice-to-voice" model and a "multimodal-to-multimodal" model for the Nova series in 2025.
The voice-to-voice model is planned for the first quarter of 2025. According to Amazon, it aims to transform conversational AI by understanding spoken language in real time, picking up on verbal and non-verbal cues such as tone and rhythm, and delivering seamless, human-like interactions with minimal delay.
In addition, Amazon is developing a model that can accept text, images, audio, and video as input and generate output in any of those formats, anticipated by mid-2025. This Amazon Nova model, with true multimodal-to-multimodal (or "any-to-any") capabilities, is intended to simplify application development: a single model could handle diverse tasks such as translating content across media, editing content, and powering AI agents that understand and produce every format.
Photo: Amazon