Free
Premium
Thanks to our energy-based dynamic pricing, you get access to cheaper Premium AI Models.
Browse the largest catalog of free LLMs or filter for CLōD-hosted models to get the best rates for your workflow.
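If CLōD follows the common OpenAI-compatible chat-completions convention (an assumption here, not something the catalog states), switching between the models below is just a matter of changing the model ID in the request body. A minimal sketch, with a hypothetical model ID:

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> str:
    """Serialize a minimal OpenAI-style chat-completion request body.

    The endpoint shape and the model ID passed in are assumptions for
    illustration; check the provider's API reference for actual values.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(body)

# Swapping models only changes the "model" field of the payload:
payload = build_chat_request("glm-4.5-air", "Summarize this ticket in one line.")
```

The same payload would then be POSTed to the provider's chat-completions endpoint with an API key in the `Authorization` header.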
GLM-4.5-Air is an efficient 106B-parameter model offering 128K context and hybrid reasoning for cost-conscious deployments.
GLM-5 is an open-source model for architect-level, long-horizon systems engineering and programming.
An enhanced Grok 4 model with extended context capacity, offering reasoning support for programming and technology domains.
A Grok 3 model optimized for conversational AI tasks, emphasizing responsive dialogue and tool support.
Thinking-optimized 80B variant with stronger multi-step reasoning.
Ultra-large coding-focused model built for advanced software and debugging tasks.
Reasoning-optimized 235B model with high analytical strength.
Next-gen 80B model offering strong reasoning and versatile knowledge at lower cost.
High-capability Turbo model delivering strong general-purpose performance.
Fast and cost-efficient 7B Turbo model for lightweight language tasks.
Open-source large-scale language model for high-quality reasoning and generation.
Cost-optimized variant of the 235B model for high-throughput general instruction tasks.
Throughput-optimized 235B model delivering high-volume performance at reduced cost.
A third-generation, 32B-parameter Qwen model optimized for tasks in the health category, supporting complex reasoning and workflows.
The fastest, cheapest version of GPT-5. It's great for summarization and classification tasks.
The faster, more cost-efficient version of GPT-5. It's great for well-defined tasks and precise prompts.
OpenAI's flagship model for coding and agentic tasks across industries.
GPT-OSS-20B is a 21B-parameter, Apache 2.0-licensed model delivering strong reasoning with efficient single-GPU performance.
A turbocharged GPT-4 variant focused on high-speed text generation with enhanced tool integration and large context handling.
An intelligent reasoning model for coding and agentic tasks with configurable reasoning effort.
A multimodal GPT model with strong finance domain focus, capable of handling text and image inputs for complex information processing.
Kimi K2.5 is Moonshot AI's top open-source reasoning model, offering SOTA performance and 2× faster inference via INT4 quantization.
Reasoning-oriented variant optimized for multi-step problem solving.
A smaller, versatile GPT model optimized for multimodal finance, marketing, academic, SEO, and translation tasks with efficient outputs.
A GPT-4.1 model with extensive multi-domain support, optimized for programming, science, technology, legal, health, and academic tasks.
A 24B model matching top compact models, with larger versions up to 70B—ideal for chat, support, translation, and summarization.
General-purpose instruction-tuned model for practical task execution.
MiniMax M2.5 is a state-of-the-art coding and agentic AI model, scoring 80.2% on SWE-Bench and running 37% faster than M2.1.
Lower-cost 8B model tuned for lightweight chat and content tasks.
A high-context Llama 3.1 turbo variant supporting tool integrations and optimized for fast, responsive trivia and general text generation.
Pretrained generative sparse Mixture-of-Experts model designed for efficient, high-performance instruction following.
Large-scale 405B model optimized for high-quality reasoning and professional tasks.
Balanced 17B model tuned for broad capability and usefulness.
A Llama 3.3 instruct turbo model tailored for concise marketing applications, offering efficient generation and prompt responses.
A powerful Llama 3.3 large-scale instruct model designed to generate marketing-focused content with high coherence and clarity.
Efficiency-focused 17B model optimized for fast inference.
An instruction-tuned Llama 3.1 model optimized for guided text generation with focused performance in trivia and knowledge retrieval tasks.
Lightweight 3B model with good speed and cost-efficiency for simpler tasks.
A high-throughput Llama 3.1 8B model by Cerebras, designed for trivia, rapid answer generation, and real-time question answering.
Compact general-purpose instruction-tuned model with balanced capabilities.
A Gemini model supporting large context inputs with strong multimodal reasoning and broad category applications.
An advanced Gemini 2.5 Pro model designed for comprehensive multimodal reasoning and extensive category coverage including finance and legal.
Efficient 4-bit quantized Gemma 3 variant with strong reasoning and tool-use.
Optimized variant of R1 for faster, high-volume inference.
Gemini 3 Flash Preview is a low-latency, cost-efficient multimodal reasoning model for agentic workflows and coding.
Updated DeepSeek model balancing capability with improved efficiency.
Advanced reasoning model focused on complex, chain-of-thought tasks.
Large-scale Cogito model focused on deep reasoning and complex knowledge tasks.
A DeepSeek model that combines exceptional computational efficiency with advanced reasoning and strong agent capabilities.
Sparse MoE architecture offering cost-efficient intelligence at scale.
70B Cogito V2 variant emphasizing factuality and applied reasoning.
A high-capacity Claude Opus 4.0 model optimized for programming tasks, complex reasoning, and extended dialogue handling.
26B-parameter (3B active) sparse MoE model optimized for long-context inference, function calling, and multi-step agents.
Anthropic's fastest model with near-frontier intelligence.
A versatile Claude Sonnet 4.0 model designed for broad application across programming, finance, marketing, science, and roleplay domains.
Anthropic's smart model for complex agents and coding.
Anthropic's premium model combining maximum intelligence with practical performance.