Microsoft has released the new Phi-3.5 models:
- Phi-3.5-MoE-instruct,
- Phi-3.5-mini-instruct, and
- Phi-3.5-vision-instruct.
The Phi-3.5-mini-instruct, with 3.82 billion parameters, is built for basic and quick reasoning tasks.
The Phi-3.5-MoE-instruct, with 41.9 billion parameters, handles more advanced reasoning.
The Phi-3.5-vision-instruct, with 4.15 billion parameters, is designed for vision tasks such as image and video analysis.
Phi-3.5-MoE-instruct
Phi-3.5-MoE-instruct is a 42-billion-parameter open-source model.
It features 16 experts, of which two are activated during generation, so roughly 6.6 billion parameters are engaged in each inference.
It demonstrates significant improvements in reasoning capabilities, outperforming models such as Llama 3.1 8B and Gemma 2 9B across various benchmarks.
The model supports multilingual applications and extends its context length to 128,000 tokens.
The specific languages covered, however, are unclear.
Phi-3.5-MoE falls slightly behind GPT-4o-mini but surpasses Gemini 1.5 Flash in benchmarks.
The model is intended for use in memory and compute-constrained environments and latency-sensitive scenarios.
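To make the sparse design above concrete, here is a minimal, hypothetical sketch of top-2 expert routing in PyTorch. This is not Microsoft's implementation: the class name, dimensions, and gating details are illustrative, and only the 16-experts, two-active pattern mirrors the published description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Toy mixture-of-experts layer: a router picks 2 of n_experts
    feed-forward experts per token, so only a small fraction of the
    total parameters is active for any given input (illustrative only)."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 16, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
             for _ in range(n_experts)]
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        logits = self.router(x)                            # (n_tokens, n_experts)
        gate_vals, gate_idx = logits.topk(self.top_k, -1)  # top-2 experts per token
        gate_vals = F.softmax(gate_vals, dim=-1)           # normalise the two gate scores
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = gate_idx[:, k] == e                 # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += gate_vals[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# 16 experts with 2 active per token, echoing Phi-3.5-MoE's routing pattern
layer = Top2MoELayer(d_model=64, d_ff=256)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```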
Key use cases for Phi-3.5-MoE include:
- general-purpose AI systems,
- applications requiring strong reasoning in code, mathematics, and logic, and
- as a foundational component for generative AI-powered features.
Phi-3.5-mini-instruct
With 3.8 billion parameters, this model is lightweight yet powerful. It outperforms larger models such as Llama 3.1 8B and Mistral 7B.
It supports a 128K token context length, significantly more than its main competitors, which typically support only up to 8K.
In long-context tasks such as document summarisation and information retrieval, it outperforms several larger models, including Llama-3.1-8B-instruct and Mistral-Nemo-12B-instruct-2407, on various benchmarks.
The model is intended for:
- commercial and research use, particularly in memory and compute-constrained environments,
- latency-bound scenarios, and
- applications requiring strong reasoning in code, math, and logic.
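As an illustration of how such a model might be used, here is a minimal sketch of loading and prompting it through the Hugging Face transformers library. The repository id microsoft/Phi-3.5-mini-instruct, the generation settings, and the prompt are assumptions for illustration, not an official quick-start.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repository id for the mini model
model_id = "microsoft/Phi-3.5-mini-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Chat-format prompt, as the instruct models expect
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarise the key idea of mixture-of-experts models in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```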
Phi-3.5-vision-instruct
Phi-3.5-vision-instruct is a 4.2-billion-parameter model that excels in multi-frame image understanding and reasoning.
It has shown improved performance in benchmarks like MMMU, MMBench, and TextVQA, demonstrating its capability in visual tasks.
It even outperforms OpenAI GPT-4o on several benchmarks.
The model integrates an image encoder, connector, projector, and the Phi-3 Mini language model.
With a context length of 128K tokens, it supports both text and image inputs and is optimised for prompts in a chat format.
The model was trained over 6 days using 256 A100-80G GPUs, processing 500 billion tokens that include both vision and text data.
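A multi-frame, chat-format prompt could look like the following minimal sketch. It assumes the transformers remote-code interface and the numbered <|image_1|>, <|image_2|> placeholder convention used by the Phi-3 vision family; the repository id, file names, and question are illustrative assumptions.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Assumed Hugging Face repository id for the vision model
model_id = "microsoft/Phi-3.5-vision-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Two frames from a video, referenced by numbered image placeholders in the prompt
images = [Image.open("frame_1.jpg"), Image.open("frame_2.jpg")]
messages = [
    {"role": "user",
     "content": "<|image_1|>\n<|image_2|>\nWhat changes between these two frames?"},
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(prompt, images, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
answer = processor.batch_decode(
    output[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```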
The Phi-3.5 models are now available on the AI platform Hugging Face under an MIT license.
They are accessible for a wide range of applications.
The release of the Phi-3.5 models aligns with Microsoft’s commitment to providing open-source AI tools that are both efficient and versatile.