AI Models
ai.KMITL provides access to multiple state-of-the-art AI models from different providers. Each model has unique strengths and characteristics. This guide will help you choose the right model for your task.
Context Limit
All hosted models currently run with a 12,000 token context window. If a vendor model advertises a larger window, assume it is capped to 12K tokens inside ai.KMITL until further notice.
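Not sure whether a long prompt will fit? A common rule of thumb is roughly four characters per token for English text. The helper below is a minimal sketch based on that heuristic only; it is not the tokenizer the platform actually uses, so treat the numbers as estimates.

```python
# Rough token estimate using the ~4 characters-per-token heuristic for English text.
# This is an approximation, not the tokenizer the platform actually uses.

CONTEXT_LIMIT = 12_000  # tokens available per request on ai.KMITL

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, reserved_for_reply: int = 2_000) -> bool:
    """Check whether a prompt plus a reply budget stays under the 12K cap."""
    return estimate_tokens(prompt) + reserved_for_reply <= CONTEXT_LIMIT

long_prompt = "Summarize the attached report section by section. " * 300
print(estimate_tokens(long_prompt), fits_in_context(long_prompt))
```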
Premium Usage
Some models are marked as premium and can consume more than 1 quota unit per message. Use the quota comparison table below to double-check costs before running long conversations.
Available Models
Google Gemini & Gemma
- Gemini 2.5 Pro — Premium reasoning, vision, PDF, and effort-control support. Best for heavy research and complex drafting.
- Gemini 2.5 Flash / Flash Lite — Fast, balanced models for day-to-day chats, product copy, and support flows.
- Gemini 2.0 Flash — Stable default for most text + vision use cases with low quota cost.
- Gemini 2.0 Flash Lite — Lightweight fallback when you need maximum throughput.
- Google Gemma 3 (27B) — Tuned LLM for quick brainstorming with an open-source flavor.
OpenAI GPT & o-series
- GPT 5.1 / GPT 5 / GPT 5 Codex variants — Flagship premium models focused on reasoning, code generation, and multimodal use.
- GPT 4.1 family (Standard, Mini, Nano) — Balanced for structured outputs and product UX flows.
- GPT 4o & 4o Mini — Good trade-off between creativity and cost for interactive chat UIs.
- GPT OSS 20B / 120B — Open-weight variants hosted through OpenAI endpoints for experimentation.
- o3 / o3 mini / o3 pro — Reasoning-specialized models with adjustable “effort” controls; choose pro for the most rigorous analysis.
- o4 mini / o1 — Fast reasoning-first experiences when latency matters.
Anthropic Claude
- Claude Sonnet 4.5 / 4 / 3.7 / 3.5 — High-quality default for instructions, analysis, and code; Sonnet 4.5 adds an optional reasoning toggle.
- Claude Haiku 4.5 — Cost-effective, fast responses, handy for support or summarization.
- Claude Opus 4 — Most capable Claude option for deep reasoning and nuanced planning.
xAI Grok
- Grok 4 — Premium reasoning + multimodal support for opinionated, creative drafting.
- Grok 4 Fast — Budget-friendly, reasoning-capable alternative suited for rapid back-and-forth chats.
DeepSeek
- DeepSeek 3.1 — Efficient reasoning model with function-calling support.
- DeepSeek R1 — Premium reasoning-first variant for algorithmic planning and math.
Meta Llama
- Llama 4 Maverick & Scout — Vision-savvy assistants ideal for creative brainstorming or design reviews.
- Llama 3.1 8B Instant — Lightweight instruct model for commodity text generation.
Image & Multimodal Creation
- GPT Image 1 — Photorealistic and illustrative assets with multiple canvas sizes.
- Gemini 2.5 Flash Image — Text in and out with optional embedded image generation.
- Google Nano Banana — Edit images with text.
- Google Imagen 3 & Imagen 4 — High-quality illustration.
OpenRouter
Access to various models through OpenRouter:
- DeepSeek
- Mixtral
- And more specialized models
How to Choose a Model
By Task Type
| Task | Recommended Model | Why |
|---|---|---|
| Coding help | Claude, GPT-5, GPT Codex | Strong at code understanding |
| Math/Science | GPT-5, Claude Opus | Excellent reasoning |
| Creative writing | Claude Opus, GPT-5 | Creative and articulate |
| Quick questions | Gemini Flash, Llama 3.1 8B Instant | Very fast responses |
| Document analysis | Gemini 2.5 Pro | Strong PDF and vision handling |
| General chat | Claude | Best all-around |
By Speed
Fastest: Llama 3.1 8B Instant, Gemini Flash
Balanced: Claude Sonnet, GPT-5.1
Slower but thorough: Claude Opus, GPT-5, o3
By Context Length
All models are capped at 12K tokens in ai.KMITL. Treat vendor-advertised maximums as future capabilities.
Switching Models
You can switch models at any time:
- Click the model selector at the top of the chat
- Browse or search for a model
- Click to select it
- Continue your conversation
Model Memory
When you switch models, the new model receives the conversation history so far, so it has context of everything discussed (subject to the 12,000-token context limit).
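Conceptually, the history travels as a running list of messages that gets handed to whichever model is currently selected. The sketch below assumes an OpenAI-style {role, content} message format and a placeholder send_to_model() function; the platform's real internals are not documented here.

```python
# Sketch: switching models mid-conversation while keeping history.
# Assumes an OpenAI-style list of {"role", "content"} messages; the platform's
# internal format may differ. send_to_model() is a placeholder for illustration.

history = [
    {"role": "user", "content": "Explain photosynthesis in two sentences."},
    {"role": "assistant", "content": "Plants use light, water, and CO2 to make sugar and oxygen..."},
]

# The user switches models; the new model gets the same history plus the new turn.
history.append({"role": "user", "content": "Now give me a bullet-point breakdown."})

def send_to_model(model_name: str, messages: list[dict]) -> str:
    """Placeholder for the platform's chat call, shown only to illustrate data flow."""
    return f"[{model_name}] reply built from {len(messages)} prior messages"

print(send_to_model("claude-sonnet-4.5", history))
```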
Model Capabilities
Text Generation
All models can generate text, but with different styles:
- Claude: Natural, conversational, detailed
- GPT: Structured, analytical, clear
- Gemini: Factual, comprehensive, thorough
Code Understanding
Best models for coding:
- Claude Sonnet - Excellent explanations
- GPT - Strong debugging
- Claude Opus - Complex algorithms
Reasoning
Best models for complex reasoning:
- Claude Opus - Deep analysis
- GPT - Structured thinking
- Gemini Pro - Comprehensive evaluation
Multimodal (Images, Files)
Most major models support:
- ✅ Image analysis
- ✅ PDF reading
- ✅ Document understanding
- ✅ Code in images
Special Features
Claude Extended Thinking
Some Claude models support extended reasoning for complex problems. The model will "think" through the problem step by step.
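If you call Claude directly with your own key (see Bring Your Own Key below), extended thinking is switched on with a request parameter. The sketch below follows Anthropic's published Messages API at the time of writing; the model ID and token budgets are examples, so verify parameter names against the current SDK docs.

```python
# Sketch: enabling Claude extended thinking via Anthropic's Messages API (BYOK only).
# Parameter names follow Anthropic's public docs at the time of writing; verify
# against the current SDK. Requires: pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",  # example model ID; use one your key can access
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 4000},  # token budget for reasoning
    messages=[{"role": "user", "content": "Plan a fault-tolerant job scheduler."}],
)

# The reply interleaves "thinking" blocks with the final "text" blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)
```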
GPT Vision
GPT-4o, GPT-4.1, and GPT-5 models have strong vision capabilities for analyzing images, diagrams, and screenshots.
Gemini Long Context
Gemini Pro is built for very long documents, such as entire books or large codebases. Inside ai.KMITL, however, the 12,000-token cap still applies, so treat full-length document analysis as a future capability.
Usage Tips
When to Use Each Model
Starting a new topic?
Use Claude Sonnet or GPT-5 (reliable, well-rounded)
Need it fast?
Use Claude Haiku or Gemini Flash (speed optimized)
Complex problem?
Use Claude Opus or GPT-5 (deep reasoning)
Long document?
Use Gemini 2.5 Pro (strong document and PDF handling; the 12K-token cap still applies)
Simple question?
Use Haiku or Gemini Flash Lite (quick and efficient)
Model Comparison Examples
Question: "What is photosynthesis?"
- Claude Sonnet: Detailed, educational explanation
- GPT-5: Structured, clear breakdown
- Gemini Flash: Quick, accurate summary
- Haiku: Concise, efficient answer
All correct, different styles!
Quota Considerations
Message Counting
Each message you send counts toward your monthly quota of 1,000 units. Non-premium models use 1 unit per message, while premium models use more (see the comparison table below). Choose faster, cheaper models if you're having a long conversation!
Quota Usage Comparison
| Model | Category | Quota Cost | Premium? | Notes |
|---|---|---|---|---|
| Gemini 2.5 Pro | Text | 5x | Yes | Full reasoning + vision |
| Gemini 2.0 Flash | Text | 2x | No | Default balanced pick |
| Gemini 2.5 Flash Lite | Text | 1x | No | Throughput friendly |
| Google Gemma 3 (27B) | Text | 1x | No | Open-source tuned |
| GPT 5.1 | Text | 5x | Yes | Flagship reasoning |
| GPT 5.1 Codex Mini | Text | 3x | Yes | Coding-focused |
| GPT 4o | Text | 1x | No | General-purpose multimodal |
| o3 Pro | Text | 20x | Yes | Max-effort reasoning |
| Grok 4 | Text | 10x | Yes | Creative + multimodal |
| Grok 4 Fast | Text | 1x | No | Low-latency chats |
| DeepSeek R1 | Text | 3x | Yes | Math & planning |
| DeepSeek 3.1 | Text | 1x | No | Efficient reasoning |
| Llama 3.1 8B Instant | Text | 1x | No | Lightweight responses |
| GPT Image 1 | Image | 15x | Yes | Photorealistic renders |
| Gemini 2.5 Flash Image | Image | 10x | Yes | Embedded image generation |
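To budget a session before you start it, multiply each planned message by its model's quota cost from the table above. A minimal sketch (the costs are hard-coded from the table and the model labels are informal, so update them if the table changes):

```python
# Sketch: estimating quota usage for a planned session, using costs from the table above.
# Model labels are informal; update the numbers if the table changes.
QUOTA_COST = {
    "gemini-2.0-flash": 2,
    "gemini-2.5-flash-lite": 1,
    "gpt-5.1": 5,
    "o3-pro": 20,
    "gpt-image-1": 15,
}

MONTHLY_QUOTA = 1_000  # quota units per month

def session_cost(planned_messages: list[str]) -> int:
    """Sum the quota cost of each planned message, keyed by model label."""
    return sum(QUOTA_COST[model] for model in planned_messages)

plan = ["gemini-2.0-flash"] * 10 + ["gpt-5.1"] * 2 + ["gpt-image-1"]
used = session_cost(plan)
print(f"{used} units (~{used / MONTHLY_QUOTA:.1%} of the monthly quota)")
```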
Quota-Friendly Strategies
- Use faster models for simple questions
- Use powerful models when you need accuracy
- Combine models: Ask Gemini Flash first, then GPT-5 for details (see the sketch after this list)
- Edit prompts before sending to get it right the first time
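The combine-models tip above can be scripted as a simple two-step flow: a cheap, fast model drafts an answer, and a premium model refines it only when needed. The ask() helper here is hypothetical and stands in for whatever chat call you actually use.

```python
# Sketch of the "combine models" strategy: cheap draft first, premium refinement only if needed.
# ask() is a hypothetical helper standing in for whatever chat call you actually use.

def ask(model: str, prompt: str) -> str:
    """Hypothetical chat call; replace with the real API or UI workflow you use."""
    return f"[{model}] answer to: {prompt[:40]}..."

def answer_with_budget(question: str, needs_depth: bool) -> str:
    draft = ask("gemini-2.0-flash", question)  # 2x quota per message
    if not needs_depth:
        return draft  # stop here for simple questions
    # Escalate only when the quick draft isn't enough (5x quota per message).
    return ask("gpt-5.1", f"Improve and expand this answer:\n{draft}\n\nQuestion: {question}")

print(answer_with_budget("What is photosynthesis?", needs_depth=False))
```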
Bring Your Own Key (BYOK)
If you have your own API keys:
- Use any supported model without limits
- No monthly message limit
- Full control over costs
- See BYOK Guide for setup
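With your own key you can also call a provider directly, outside the ai.KMITL quota. The sketch below uses OpenAI's official Python SDK as one example; it assumes OPENAI_API_KEY is set in your environment and that the model name shown is available on your account. How ai.KMITL itself stores and uses your key is handled in the BYOK settings, not in code.

```python
# Sketch: calling a provider directly with your own key (OpenAI SDK shown as one example).
# Assumes OPENAI_API_KEY is set in the environment. Requires: pip install openai
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model; use any model your key can access
    messages=[{"role": "user", "content": "Give me three names for a study-group app."}],
    temperature=0.7,
)

print(response.choices[0].message.content)
```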
Custom Models
If you've added your own API keys, you can also:
- Use beta models
- Access newest releases immediately
- Configure custom parameters
Frequently Asked Questions
Can I use multiple models in one conversation?
Yes! Switch models anytime. The conversation history transfers over.
Which model is best?
For most users: Gemini 2.5 Flash is the best starting point. It's fast, capable, and handles most tasks well.
Do different models cost different amounts?
- All non-premium models count equally (1 message = 1 quota unit).
- Premium models cost more (view Quota Usage Comparison Table above).
- With BYOK, actual costs vary by provider.
Why does model X sometimes give better answers than model Y?
Each model has different training, strengths, and characteristics. Try a few to find what works best for your needs.
Can I request new models?
Yes! Contact support with model requests.
Experiment!
Don't be afraid to try different models. You'll quickly learn which ones work best for your specific needs.
Next Steps
- Learn about Chat Interface features
- Enable Web Search & Tools
- Read Writing Great Prompts
