Danish Kapoor

Google Gemini 1.5 Pro will now be able to listen to audio files

Google's pace of innovation in artificial intelligence shows no sign of slowing. Gemini 1.5 Pro, first introduced in February, is now being made available to the public for the first time through Google's Vertex AI platform, the company announced at its Google Next event. The update draws attention especially for the model's ability to understand audio files and extract information from them. Users will now be able to upload audio from earnings calls or videos directly to the model, which speeds up their work by removing the need for a written transcript.

Gemini 1.5 Pro is described as the middleweight model of the Gemini family, yet Google claims it outperforms even the family's most powerful member, Gemini Ultra. According to Google, it can understand complex instructions and eliminates the need to fine-tune models.

However, use of Gemini 1.5 Pro is currently limited to users with access to Vertex AI. Most people encounter Gemini language models through the Gemini chatbot. Gemini Ultra, known for its powerful capabilities and its capacity to understand long instructions, powers the Gemini Advanced chatbot; however, it lags behind Gemini 1.5 Pro in speed.

Gemini 1.5 Pro is not the only major Google AI model being updated; Imagen 2 is as well. Imagen 2, a text-to-image model that supports Gemini's image creation capabilities, is gaining inpainting and outpainting features that let users add or remove elements from images. Additionally, the SynthID digital watermark feature is available for all images created with Imagen models. SynthID adds a watermark that is invisible to the viewer but signals the image's origin when examined with a detection tool.

Imagen's new features, specifically inpainting and outpainting, are already available in other text-to-image models, such as Stable Cascade by Stability AI and Generative AI by Getty's iStock. They are also available to a wide range of consumers on newer Samsung Galaxy phones.

Google is also making publicly available a way to ground AI responses in Google Search, so that answers draw on up-to-date information. That the answers produced by large language models are not always up to date is sometimes a conscious choice; for example, Google is deliberately preventing Gemini from responding to questions about the 2024 US elections.

Gemini has recently come under criticism for producing historically inaccurate images of people. Nevertheless, Google continues to roll out new developments in artificial intelligence and to push the boundaries of the technology.

Audio files are now more understandable with Google's artificial intelligence innovations

Beyond offering Gemini 1.5 Pro through the Vertex AI platform, Google's developments in other areas of artificial intelligence are also attracting attention. In particular, the ability to extract information from audio content expands the areas in which artificial intelligence can be used and enriches the user experience. These developments reinforce Google's position in technology and artificial intelligence, while also increasing the opportunities it offers to users and the business world.
