xAI's Grok-1.5V model can now process images

Founded by Elon Musk and known as a rival to OpenAI, xAI has introduced its first Grok model that can process visual information. The new Grok-1.5V is a multimodal model that can handle not only text but also documents, diagrams, charts, screenshots, and photos. It arrives just a few weeks after the company's previous release, Grok-1.5, which brought improvements in coding and mathematics.

Highlights of Grok-1.5V

At the launch of Grok-1.5V, xAI gave several examples of how the model could be used in the real world. For example, you can show it a photo of a flowchart and ask it to translate the diagram into Python code, ask it to write a story based on a drawing, or have it explain a "meme" (a viral social media image) that you don't understand. Keeping up with the ever-changing content of the internet can be difficult for anyone, and Grok is meant to help in such cases.

Grok-1.5V will soon be available to xAI's early testers and existing Grok users. However, the company did not give an exact date for the rollout. The new visual capabilities, combined with Grok-1.5's ability to process longer contexts, allow the model to draw on more sources of information to better understand specific queries.

xAI has also published a benchmark dataset called RealWorldQA. The dataset contains more than 700 images that can be used to evaluate AI models. Each image comes with a question and an answer that is easy to verify but may still challenge multimodal models. xAI claimed that Grok-1.5V achieved the highest score when it was tested on RealWorldQA against rivals such as OpenAI's GPT-4V and Google's Gemini Pro 1.5.
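To make the idea of scoring a model on a question-and-answer benchmark concrete, here is a minimal Python sketch. It is not xAI's evaluation code: the `QAItem` structure, the `normalize` helper, and the exact-match scoring rule are all assumptions chosen for illustration, and the sample items are invented placeholders, not entries from RealWorldQA.

```python
from dataclasses import dataclass


@dataclass
class QAItem:
    """Hypothetical benchmark item: one image, one question, one reference answer."""
    image_path: str
    question: str
    answer: str


def normalize(text: str) -> str:
    """Lowercase, trim, drop a trailing period, and collapse whitespace."""
    return " ".join(text.lower().strip().rstrip(".").split())


def accuracy(items: list[QAItem], predictions: list[str]) -> float:
    """Fraction of predictions that exactly match the reference answer
    after normalization (an assumed scoring rule, not xAI's)."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(item.answer)
        for item, pred in zip(items, predictions)
    )
    return correct / len(items)


# Invented placeholder items, for illustration only.
sample_items = [
    QAItem("img_001.jpg", "Which way is the arrow pointing?", "Left"),
    QAItem("img_002.jpg", "How many cars are visible?", "3"),
    QAItem("img_003.jpg", "Is the traffic light red?", "Yes"),
    QAItem("img_004.jpg", "Which object is larger?", "The truck"),
]
sample_predictions = ["left.", "3", "No", "the  truck"]

print(f"accuracy = {accuracy(sample_items, sample_predictions):.2f}")
```

In this sketch, three of the four placeholder predictions match their reference answers after normalization. Real evaluations often use more forgiving matching than exact string comparison, so published scores should not be assumed to follow this rule.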

The new Grok-1.5V looks set to raise users' expectations of artificial intelligence further. The model shows how far AI can go not only in text-based tasks but also in understanding visual data. The range of uses for such versatile models is expected to expand in the future.