AI Models
ai.KMITL provides access to multiple state-of-the-art AI models from different providers. Each model has unique strengths and characteristics. This guide will help you choose the right model for your task.
Context Limit
All hosted models currently run with a 12,000 token context window. If a vendor model advertises a larger window, assume it is capped to 12K tokens inside ai.KMITL until further notice.
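Not sure whether a long prompt will fit? A common rule of thumb is roughly four characters per token for English text. The helper below is a minimal sketch based on that heuristic only; it is not the tokenizer the platform actually uses, so treat the numbers as estimates.

```python
# Rough token estimate using the ~4 characters-per-token heuristic for English text.
# This is an approximation, not the tokenizer the platform actually uses.

CONTEXT_LIMIT = 12_000  # tokens available per request on ai.KMITL

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, reserved_for_reply: int = 2_000) -> bool:
    """Check whether a prompt plus a reply budget stays under the 12K cap."""
    return estimate_tokens(prompt) + reserved_for_reply <= CONTEXT_LIMIT

long_prompt = "Summarize the attached report section by section. " * 300
print(estimate_tokens(long_prompt), fits_in_context(long_prompt))
```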
Premium Usage
Some models are marked as premium and can consume more than 1 quota unit per message. Use the quota comparison table below to double-check costs before running long conversations.
Available Models
Google Gemini & Gemma
- Gemini 2.5 Pro — Premium reasoning, vision, PDF, and effort-control support. Best for heavy research and complex drafting.
- Gemini 2.5 Flash / Flash Lite — Fast, balanced models for day-to-day chats, product copy, and support flows.
- Gemini 2.0 Flash — Stable default for most text + vision use cases with low quota cost.
- Gemini 2.0 Flash Lite — Lightweight fallback when you need maximum throughput.
- Google Gemma 3 (27B) — Tuned LLM for quick brainstorming with an open-source flavor.
OpenAI GPT & o-series
- GPT 5.1 / GPT 5 / GPT 5 Codex variants — Flagship premium models focused on reasoning, code generation, and multimodal use.
- GPT 4.1 family (Standard, Mini, Nano) — Balanced for structured outputs and product UX flows.
- GPT 4o & 4o Mini — Good trade-off between creativity and cost for interactive chat UIs.
- GPT OSS 20B / 120B — Open-weight variants hosted through OpenAI endpoints for experimentation.
- o3 / o3 mini / o3 pro — Reasoning-specialized models with adjustable “effort” controls; choose pro for the most rigorous analysis.
- o4 mini / o1 — Fast reasoning-first experiences when latency matters.
Anthropic Claude
- Claude Sonnet 4.5 / 4 / 3.7 / 3.5 — High-quality default for instructions, analysis, and code; Sonnet 4.5 adds an optional reasoning toggle.
- Claude Haiku 4.5 — Cost-effective, fast responses, handy for support or summarization.
- Claude Opus 4 — Most capable Claude option for deep reasoning and nuanced planning.
xAI Grok
- Grok 4 — Premium reasoning + multimodal support for opinionated, creative drafting.
- Grok 4 Fast — Budget-friendly, reasoning-capable alternative suited for rapid back-and-forth chats.
DeepSeek
- DeepSeek 3.1 — Efficient reasoning model with function-calling support.
- DeepSeek R1 — Premium reasoning-first variant for algorithmic planning and math.
Meta Llama
- Llama 4 Maverick & Scout — Vision-savvy assistants ideal for creative brainstorming or design reviews.
- Llama 3.1 8B Instant — Lightweight instruct model for commodity text generation.
Image & Multimodal Creation
- GPT Image 1 — Photorealistic and illustrative assets with multiple canvas sizes.
- Gemini 2.5 Flash Image — Text in and out with optional embedded image generation.
- Google Nano Banana — Edit images with text.
- Google Imagen 3 & Imagen 4 — High-quality illustration.
OpenRouter
Access to various models through OpenRouter:
- DeepSeek
- Mixtral
- And more specialized models
How to Choose a Model
By Task Type
| Task | Recommended Model | Why |
|---|---|---|
| Coding help | Claude, GPT-5, GPT Codex | Strong at code understanding |
| Math/Science | GPT-5, Claude Opus | Excellent reasoning |
| Creative writing | Claude Opus, GPT-5 | Creative and articulate |
| Quick questions | Gemini Flash, Llama 3.1 8B Instant | Very fast responses |
| Document analysis | Gemini 2.5 Pro | Strong PDF and vision handling |
| General chat | Claude | Best all-around |
By Speed
Fastest: Llama 3.1 8B Instant, Gemini Flash
Balanced: Claude Sonnet, GPT-5.1
Slower but thorough: Claude Opus, GPT-5, o3
By Context Length
All models are capped at 12K tokens in ai.KMITL. Treat vendor-advertised maximums as future capabilities.
Switching Models
You can switch models at any time:
- Click the model selector at the top of the chat
- Browse or search for a model
- Click to select it
- Continue your conversation
Model Memory
When you switch models, the new model receives the conversation history so far, so it has context of everything discussed (subject to the 12,000-token context limit).
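Conceptually, the history travels as a running list of messages that gets handed to whichever model is currently selected. The sketch below assumes an OpenAI-style {role, content} message format and a placeholder send_to_model() function; the platform's real internals are not documented here.

```python
# Sketch: switching models mid-conversation while keeping history.
# Assumes an OpenAI-style list of {"role", "content"} messages; the platform's
# internal format may differ. send_to_model() is a placeholder for illustration.

history = [
    {"role": "user", "content": "Explain photosynthesis in two sentences."},
    {"role": "assistant", "content": "Plants use light, water, and CO2 to make sugar and oxygen..."},
]

# The user switches models; the new model gets the same history plus the new turn.
history.append({"role": "user", "content": "Now give me a bullet-point breakdown."})

def send_to_model(model_name: str, messages: list[dict]) -> str:
    """Placeholder for the platform's chat call, shown only to illustrate data flow."""
    return f"[{model_name}] reply built from {len(messages)} prior messages"

print(send_to_model("claude-sonnet-4.5", history))
```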
Model Capabilities
Text Generation
All models can generate text, but with different styles:
- Claude: Natural, conversational, detailed
- GPT: Structured, analytical, clear
- Gemini: Factual, comprehensive, thorough
Code Understanding
Best models for coding:
- Claude Sonnet - Excellent explanations
- GPT - Strong debugging
- Claude Opus - Complex algorithms
Reasoning
Best models for complex reasoning:
- Claude Opus - Deep analysis
- GPT - Structured thinking
- Gemini Pro - Comprehensive evaluation
Multimodal (Images, Files)
Most major models support:
- ✅ Image analysis
- ✅ PDF reading
- ✅ Document understanding
- ✅ Code in images
Special Features
Claude Extended Thinking
Some Claude models support extended reasoning for complex problems. The model will "think" through the problem step by step.
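If you call Claude directly with your own key (see Bring Your Own Key below), extended thinking is switched on with a request parameter. The sketch below follows Anthropic's published Messages API at the time of writing; the model ID and token budgets are examples, so verify parameter names against the current SDK docs.

```python
# Sketch: enabling Claude extended thinking via Anthropic's Messages API (BYOK only).
# Parameter names follow Anthropic's public docs at the time of writing; verify
# against the current SDK. Requires: pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",  # example model ID; use one your key can access
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 4000},  # token budget for reasoning
    messages=[{"role": "user", "content": "Plan a fault-tolerant job scheduler."}],
)

# The reply interleaves "thinking" blocks with the final "text" blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)
```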
GPT Vision
GPT-4o, GPT-4.1, and GPT-5 models have strong vision capabilities for analyzing images, diagrams, and screenshots.
Gemini Long Context
Gemini Pro is built for very long documents, such as entire books or large codebases. Inside ai.KMITL, however, the 12,000-token cap still applies, so treat full-length document analysis as a future capability.
Usage Tips
When to Use Each Model
Starting a new topic?
Use Claude Sonnet or GPT-5 (reliable, well-rounded)
Need it fast?
Use Claude Haiku or Gemini Flash (speed optimized)
Complex problem?
Use Claude Opus or GPT-5 (deep reasoning)
Long document?
Use Gemini 2.5 Pro (strong document and PDF handling; the 12K-token cap still applies)
Simple question?
Use Haiku or Gemini Flash Lite (quick and efficient)
Model Comparison Examples
Question: "What is photosynthesis?"
- Claude Sonnet: Detailed, educational explanation
- GPT-5: Structured, clear breakdown
- Gemini Flash: Quick, accurate summary
- Haiku: Concise, efficient answer
All correct, different styles!
Quota Considerations
Message Counting
Each message you send counts toward your monthly quota of 1,000 units. Non-premium models use 1 unit per message, while premium models use more (see the comparison table below). Choose faster, cheaper models if you're having a long conversation!
Quota Usage Comparison
| Model | Category | Quota Cost | Premium? | Notes |
|---|---|---|---|---|
| Gemini 2.5 Pro | Text | 5x | Yes | Full reasoning + vision |
| Gemini 2.0 Flash | Text | 2x | No | Default balanced pick |
| Gemini 2.5 Flash Lite | Text | 1x | No | Throughput friendly |
| Google Gemma 3 (27B) | Text | 1x | No | Open-source tuned |
| GPT 5.1 | Text | 5x | Yes | Flagship reasoning |
| GPT 5.1 Codex Mini | Text | 3x | Yes | Coding-focused |
| GPT 4o | Text | 1x | No | General-purpose multimodal |
| o3 Pro | Text | 20x | Yes | Max-effort reasoning |
| Grok 4 | Text | 10x | Yes | Creative + multimodal |
| Grok 4 Fast | Text | 1x | No | Low-latency chats |
| DeepSeek R1 | Text | 3x | Yes | Math & planning |
| DeepSeek 3.1 | Text | 1x | No | Efficient reasoning |
| Llama 3.1 8B Instant | Text | 1x | No | Lightweight responses |
| GPT Image 1 | Image | 15x | Yes | Photorealistic renders |
| Gemini 2.5 Flash Image | Image | 10x | Yes | Embedded image generation |
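To budget a session before you start it, multiply each planned message by its model's quota cost from the table above. A minimal sketch (the costs are hard-coded from the table and the model labels are informal, so update them if the table changes):

```python
# Sketch: estimating quota usage for a planned session, using costs from the table above.
# Model labels are informal; update the numbers if the table changes.
QUOTA_COST = {
    "gemini-2.0-flash": 2,
    "gemini-2.5-flash-lite": 1,
    "gpt-5.1": 5,
    "o3-pro": 20,
    "gpt-image-1": 15,
}

MONTHLY_QUOTA = 1_000  # quota units per month

def session_cost(planned_messages: list[str]) -> int:
    """Sum the quota cost of each planned message, keyed by model label."""
    return sum(QUOTA_COST[model] for model in planned_messages)

plan = ["gemini-2.0-flash"] * 10 + ["gpt-5.1"] * 2 + ["gpt-image-1"]
used = session_cost(plan)
print(f"{used} units (~{used / MONTHLY_QUOTA:.1%} of the monthly quota)")
```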
Quota-Friendly Strategies
- Use faster models for simple questions
- Use powerful models when you need accuracy
- Combine models: Ask Gemini Flash first, then GPT-5 for details (see the sketch after this list)
- Edit prompts before sending to get it right the first time
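The combine-models tip above can be scripted as a simple two-step flow: a cheap, fast model drafts an answer, and a premium model refines it only when needed. The ask() helper here is hypothetical and stands in for whatever chat call you actually use.

```python
# Sketch of the "combine models" strategy: cheap draft first, premium refinement only if needed.
# ask() is a hypothetical helper standing in for whatever chat call you actually use.

def ask(model: str, prompt: str) -> str:
    """Hypothetical chat call; replace with the real API or UI workflow you use."""
    return f"[{model}] answer to: {prompt[:40]}..."

def answer_with_budget(question: str, needs_depth: bool) -> str:
    draft = ask("gemini-2.0-flash", question)  # 2x quota per message
    if not needs_depth:
        return draft  # stop here for simple questions
    # Escalate only when the quick draft isn't enough (5x quota per message).
    return ask("gpt-5.1", f"Improve and expand this answer:\n{draft}\n\nQuestion: {question}")

print(answer_with_budget("What is photosynthesis?", needs_depth=False))
```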
Bring Your Own Key (BYOK)
If you have your own API keys:
- Use any supported model without limits
- No monthly message limit
- Full control over costs
- See BYOK Guide for setup
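With your own key you can also call a provider directly, outside the ai.KMITL quota. The sketch below uses OpenAI's official Python SDK as one example; it assumes OPENAI_API_KEY is set in your environment and that the model name shown is available on your account. How ai.KMITL itself stores and uses your key is handled in the BYOK settings, not in code.

```python
# Sketch: calling a provider directly with your own key (OpenAI SDK shown as one example).
# Assumes OPENAI_API_KEY is set in the environment. Requires: pip install openai
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model; use any model your key can access
    messages=[{"role": "user", "content": "Give me three names for a study-group app."}],
    temperature=0.7,
)

print(response.choices[0].message.content)
```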
Custom Models
If you've added your own API keys, you can also:
- Use beta models
- Access newest releases immediately
- Configure custom parameters
Frequently Asked Questions
Can I use multiple models in one conversation?
Yes! Switch models anytime. The conversation history transfers over.
Which model is best?
For most users: Gemini 2.5 Flash is the best starting point. It's fast, capable, and handles most tasks well.
Do different models cost different amounts?
- All non-premium models count equally (1 message = 1 quota unit).
- Premium models cost more (view Quota Usage Comparison Table above).
- With BYOK, actual costs vary by provider.
Why does model X sometimes give better answers than model Y?
Each model has different training, strengths, and characteristics. Try a few to find what works best for your needs.
Can I request new models?
Yes! Contact support with model requests.
Experiment!
Don't be afraid to try different models. You'll quickly learn which ones work best for your specific needs.
Next Steps
- Learn about Chat Interface features
- Enable Web Search & Tools
- Read Writing Great Prompts
