Microsoft's new small language model can look at images and describe their contents

Microsoft has announced Phi-3-vision, a new version of its Phi-3 small language model that can look at images and describe what they contain. As a multimodal model, Phi-3-vision can read both text and images, and it is best suited to mobile devices.

Microsoft says Phi-3-vision, available now in preview, is a 4.2 billion-parameter model (the parameter count indicates how complex a model is and how much it has learned from its training) that can handle common visual reasoning tasks, such as answering questions about graphics.

Phi-3-vision is much smaller than other image-focused AI models, such as OpenAI's DALL-E or Stability AI's Stable Diffusion. Unlike those models, Phi-3-vision does not generate images; it interprets what is in an image and analyzes it for the user.

The software giant announced Phi-3 in April with the launch of Phi-3-mini, the smallest model in the family at 3.8 billion parameters. The Phi-3 family has two other members: Phi-3-small (7 billion parameters) and Phi-3-medium (14 billion parameters).

AI model developers are introducing small, lightweight models like Phi-3 as demand grows for more affordable, less compute-intensive AI services. Smaller models can power AI features on devices like phones and laptops without taking up too much memory. In addition to Phi-3 and its predecessor, Phi-2, Microsoft has released other small models. Orca-Math, a math problem-solving model, reportedly answers math questions better than larger counterparts like Google's Gemini Pro.
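
To give a rough sense of why parameter count matters for on-device use, the sketch below estimates how much memory each model's weights alone would occupy. The parameter counts are the ones quoted in this article; the storage precisions (16-bit and 4-bit per parameter) are assumptions chosen for illustration, not figures from Microsoft, and the estimate ignores the extra memory a model needs while running.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Parameter counts come from the article; the precisions below are
# illustrative assumptions, not published deployment settings.

PARAMS_BILLIONS = {
    "Phi-3-mini": 3.8,
    "Phi-3-vision": 4.2,
    "Phi-3-small": 7.0,
    "Phi-3-medium": 14.0,
}

# Assumed storage formats: 2 bytes (16-bit) or 0.5 bytes (4-bit) per parameter.
BYTES_PER_PARAM = {"16-bit": 2.0, "4-bit": 0.5}


def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, size in PARAMS_BILLIONS.items():
        estimates = ", ".join(
            f"{label}: {weight_memory_gb(size, nbytes):.1f} GB"
            for label, nbytes in BYTES_PER_PARAM.items()
        )
        print(f"{name} ({size}B parameters) -> {estimates}")
```

Under these assumptions, the weights of a model with a few billion parameters come to a few gigabytes, which is the scale at which running on a phone or laptop becomes plausible.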

Microsoft adds members of the Phi-3 family to Azure's model library

Phi-3-vision is now available in preview. Other members of the Phi-3 family (Phi-3-mini, Phi-3-small, and Phi-3-medium) are now available through Azure's model library.