The Scout configuration focuses on inference efficiency, using a leaner expert configuration optimized for latency-critical applications. It suits workloads such as routing, lightweight decision-making, and fast interactive chat, where responses must be both quick and reasonably accurate. This makes Scout a natural front-line model in cascaded systems, handing off to heavier models only when needed, which helps teams manage costs while preserving a good user experience.
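The front-line handoff can be sketched as a simple confidence-gated cascade. This is a minimal illustration, not a real API: the model callables, the `ModelReply` type, and the confidence threshold are all assumptions introduced here for the example.

```python
# Minimal sketch of a model cascade (hypothetical interfaces throughout).
# A lightweight front-line model answers first; the request escalates to a
# heavier model only when the front-line confidence falls below a threshold.

from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelReply:
    text: str
    confidence: float  # self-reported confidence in [0, 1] (an assumption)

def cascade(prompt: str,
            scout: Callable[[str], ModelReply],
            heavy: Callable[[str], ModelReply],
            threshold: float = 0.8) -> ModelReply:
    """Return the scout's reply if it is confident enough, else escalate."""
    reply = scout(prompt)
    if reply.confidence >= threshold:
        return reply
    return heavy(prompt)

# Toy stand-ins for real model endpoints, purely for demonstration:
def toy_scout(prompt: str) -> ModelReply:
    easy = len(prompt) < 40  # pretend short prompts are easy
    return ModelReply("scout answer", 0.95 if easy else 0.3)

def toy_heavy(prompt: str) -> ModelReply:
    return ModelReply("heavy answer", 0.99)

print(cascade("short question", toy_scout, toy_heavy).text)
print(cascade("a much longer, harder question that needs escalation",
              toy_scout, toy_heavy).text)
```

In practice the gate might instead be a token-level logprob threshold, a learned router, or an explicit "I don't know" signal; the cost savings come from the heavy model running only on the minority of requests the scout cannot handle.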
