Overview
Transitioning from static content to conversational AI requires complex technical infrastructure. Multilingual chatbots must seamlessly detect intent, route queries, retrieve relevant legal statutes across language barriers, and generate safe, fluent responses.
System Architecture
┌─────────────────────────────────────────────────────────────────┐
│                         USER INTERFACE                          │
│    (Web/Mobile with language selection, voice input support)    │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                       LANGUAGE DETECTION                        │
│  fastText / CLD3 / mBERT → Confidence score → Route or prompt   │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                       INPUT PREPROCESSING                       │
│  • Diacritic normalization (Vietnamese)                         │
│  • Tokenization (Chinese: jieba)                                │
│  • Code-switching detection (Spanglish, Chinglish)              │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    MULTILINGUAL RAG PIPELINE                    │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐           │
│   │    Query    │   │ Cross-Lang  │   │   Re-rank   │           │
│   │  Embedding  │ → │  Retrieval  │ → │  & Filter   │           │
│   │ (Multilang) │   │ (Vector DB) │   │  (by lang)  │           │
│   └─────────────┘   └─────────────┘   └─────────────┘           │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                       RESPONSE GENERATION                       │
│  • Language-appropriate LLM selection                           │
│  • Prompt with retrieved context                                │
│  • Post-processing (grammar, hallucination check)               │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                       CONVERSATION MEMORY                       │
│  • Multilingual history management                              │
│  • Summarization for context window                             │
│  • Language switch handling                                     │
└─────────────────────────────────────────────────────────────────┘
Language Detection
Detection Methods
| Method | Speed | Accuracy | Best For |
|---|---|---|---|
| fastText | Very fast | Good | Real-time detection |
| CLD3 | Fast | Good | General purpose |
| langdetect | Medium | Good | Python native |
| mBERT | Slow | Excellent | Complex cases |
Code-Switching Challenges
| Input Example | Challenge |
|---|---|
| "Mi hermano was detained by ICE ayer" | Mixed Spanish-English |
| "我的H-1B visa快要expire了" | Mixed Chinese-English |
| "Toi can gap lawyer ve visa" | Mixed Vietnamese-English (no diacritics) |
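A cheap first-pass signal for cross-script mixing is to classify characters by Unicode script family. This is only a heuristic sketch (function names are illustrative): it flags Chinese-English mixing, but same-script cases like Spanglish look uniformly Latin and need token-level language identification instead.

```python
import unicodedata

def script_mix(text):
    """Count alphabetic characters per script family (rough heuristic)."""
    counts = {"latin": 0, "cjk": 0, "other": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # digits, punctuation, whitespace carry no signal
        name = unicodedata.name(ch, "")
        if name.startswith("CJK") or "HIRAGANA" in name or "HANGUL" in name:
            counts["cjk"] += 1
        elif "LATIN" in name:
            counts["latin"] += 1
        else:
            counts["other"] += 1
    return counts

def looks_mixed(text):
    """True when more than one script family appears in the input."""
    counts = script_mix(text)
    return sum(1 for v in counts.values() if v > 0) > 1
```

This catches the Chinese-English example in the table; the Spanish-English and Vietnamese-English rows stay single-script and must fall through to the confidence-based detector instead.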
Detection Strategy
```python
from langdetect import detect_langs, LangDetectException

def detect_with_fallback(text, threshold=0.8):
    """Detect language with confidence-based fallback."""
    try:
        results = detect_langs(text)
        top_lang = results[0]
        if top_lang.prob >= threshold:
            return top_lang.lang, "detected"
        # Low confidence - likely code-switching
        return None, "prompt_user"
    except LangDetectException:
        # Empty input or no detectable features
        return None, "prompt_user"

def route_language(text, user_preference=None):
    """Route to appropriate language pipeline."""
    if user_preference:
        return user_preference
    lang, status = detect_with_fallback(text)
    if status == "prompt_user":
        return "ask_user"  # Trigger language selection UI
    return lang
```
User Preference Override
| Scenario | Action |
|---|---|
| User explicitly selects language | Always use preference |
| Detection conflicts with preference | Use preference |
| No preference, high-confidence detection | Use detection |
| No preference, low-confidence detection | Ask user |
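The four override rules above reduce to a small resolution function (a sketch; the signature is illustrative):

```python
def resolve_language(user_preference, detected_lang, confidence, threshold=0.8):
    """Apply the preference-override rules from the table above."""
    if user_preference:
        # Explicit selection always wins, even over confident detection
        return user_preference
    if detected_lang and confidence >= threshold:
        return detected_lang  # High-confidence detection
    return "ask_user"  # Low confidence: trigger the selection UI
```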
Multilingual RAG Systems
Vector Store Architecture
| Approach | Description | Trade-offs |
|---|---|---|
| Unified store | All languages in single vector space | Simple, good for cross-lingual |
| Partitioned | Separate stores per language | Better precision, more complex |
| Hybrid | Unified with language metadata filters | Balanced approach |
Multilingual Embedding Models
| Model | Languages | Dimensions | Notes |
|---|---|---|---|
| OpenAI text-embedding-3-large | 100+ | 3072 | Best cross-lingual |
| Cohere Embed v3 | 100+ | 1024 | Good multilingual |
| BGE-M3 | 100+ | 1024 | Open source |
| E5-multilingual | 100+ | 768 | Open source |
| Qwen2.5-Embedding | CJK-focused | Varies by size | Best for Chinese |
Cross-Lingual Retrieval
Enable queries in one language to retrieve documents in another:
```python
from sentence_transformers import SentenceTransformer
import chromadb

# Initialize multilingual embedder and vector store
embedder = SentenceTransformer("BAAI/bge-m3")
client = chromadb.Client()
collection = client.get_or_create_collection("legal_docs")

# Query in Spanish: "What are my rights if ICE comes to my house?"
query = "¿Cuáles son mis derechos si ICE viene a mi casa?"
query_embedding = embedder.encode(query)

# Retrieve from both the English and Spanish language partitions
results = collection.query(
    query_embeddings=[query_embedding.tolist()],
    n_results=10,
    where={"$or": [{"language": "en"}, {"language": "es"}]},
)
```
Quota-Based Retrieval
Counteract English dominance in retrieval:
```python
def balanced_retrieval(query_embedding, n_per_language=5):
    """Retrieve an equal number of documents from each language partition."""
    results = {}
    for lang in ["en", "es", "zh", "vi"]:
        results[lang] = collection.query(
            query_embeddings=[query_embedding],
            n_results=n_per_language,
            where={"language": lang},
        )
    # Combine and re-rank (combine_results and rerank_by_relevance
    # are application-specific helpers)
    all_results = combine_results(results)
    return rerank_by_relevance(all_results)
```
Chunking Strategies
Language-Specific Requirements
| Language | Chunking Approach | Preprocessing |
|---|---|---|
| English | Semantic/recursive character | Standard tokenization |
| Spanish | Semantic/recursive character | Handle text expansion |
| Chinese | Morphological segmentation | jieba word boundaries |
| Vietnamese | Syllable-merge + normalize | underthesea normalization |
Chunk Size Recommendations
| Parameter | English | Spanish | Chinese | Vietnamese |
|---|---|---|---|---|
| Chunk size | 512 tokens | 512 tokens | 384 tokens | 384 tokens |
| Overlap | 64 tokens | 64 tokens | 96 tokens | 64 tokens |
| Min chunk | 100 tokens | 100 tokens | 75 tokens | 75 tokens |
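The parameters above can be captured as a config table driving a simple overlapping splitter. This is a sketch operating on a pre-tokenized list (the tokenizer itself is language-specific, e.g. jieba for Chinese); the helper names are illustrative.

```python
# Per-language chunking parameters from the table above
CHUNK_PARAMS = {
    "en": {"size": 512, "overlap": 64, "min": 100},
    "es": {"size": 512, "overlap": 64, "min": 100},
    "zh": {"size": 384, "overlap": 96, "min": 75},
    "vi": {"size": 384, "overlap": 64, "min": 75},
}

def chunk_tokens(tokens, lang):
    """Split a token list into overlapping chunks using per-language params."""
    p = CHUNK_PARAMS[lang]
    step = p["size"] - p["overlap"]
    chunks = [tokens[i:i + p["size"]] for i in range(0, len(tokens), step)]
    # Merge a trailing fragment below the minimum into the previous chunk;
    # only the non-duplicated remainder is appended.
    if len(chunks) > 1 and len(chunks[-1]) < p["min"]:
        tail = chunks.pop()
        chunks[-1] = chunks[-1] + tail[p["overlap"]:]
    return chunks
```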
Token Bloat Consideration
Non-Latin scripts consume more tokens per semantic unit:
| Language | "Immigration law" | Tokens |
|---|---|---|
| English | "immigration law" | 2 |
| Chinese | "移民法" | 4-6 |
| Vietnamese | "luật di trú" | 4-5 |
Impact: Effective context window is 60-75% smaller for Asian languages.
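The budget math follows directly: divide the raw window by the observed bloat factor to get English-equivalent capacity (a back-of-envelope sketch; actual bloat varies by tokenizer and domain).

```python
def effective_context(window_tokens, bloat_factor):
    """English-equivalent capacity of a context window under token bloat."""
    return int(window_tokens / bloat_factor)

# At the roughly 2.5-4x bloat seen for CJK legal text, an 8192-token
# window carries about 2048-3276 tokens of English-equivalent content,
# i.e. a 60-75% reduction.
```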
Response Generation
Language-Appropriate Model Routing
```python
MODEL_ROUTING = {
    "es": "llama-3.3-8b",  # Strong Spanish
    "zh": "qwen2.5-14b",   # Native Chinese
    "vi": "qwen3-32b",     # Good Vietnamese
    "en": "llama-3.3-8b",  # Default
}

def get_model_for_language(detected_lang):
    return MODEL_ROUTING.get(detected_lang, MODEL_ROUTING["en"])
```
Multilingual Prompt Engineering
```python
# Each prompt states the same role and legal caveat in the target language
SYSTEM_PROMPT_TEMPLATE = {
    "en": """You are an immigration legal information assistant.
    Provide accurate, helpful information while clearly stating you are not a lawyer.
    Always recommend consulting with a qualified immigration attorney.""",
    "es": """Eres un asistente de información legal de inmigración.
    Proporciona información precisa y útil, indicando claramente que no eres abogado.
    Siempre recomienda consultar con un abogado de inmigración calificado.""",
    "zh": """你是一位移民法律信息助手。
    提供准确、有帮助的信息,同时明确说明你不是律师。
    始终建议咨询合格的移民律师。""",
    "vi": """Bạn là trợ lý thông tin pháp lý về di trú.
    Cung cấp thông tin chính xác, hữu ích, đồng thời nêu rõ bạn không phải là luật sư.
    Luôn khuyên người dùng tham khảo ý kiến luật sư di trú có trình độ.""",
}
```
Hallucination Prevention
| Strategy | Implementation |
|---|---|
| Grounding | Require citations to retrieved documents |
| Confidence signals | "Based on the information provided..." |
| Uncertainty acknowledgment | "I'm not certain about..." |
| Escalation triggers | Detect when to recommend attorney |
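Escalation triggers can start as a simple keyword screen before graduating to a tuned classifier. The keyword lists below are hypothetical examples, not a vetted legal taxonomy.

```python
# Hypothetical trigger phrases; a production system would use
# attorney-reviewed lists per language and a classifier.
ESCALATION_KEYWORDS = {
    "en": ["deportation order", "court date", "detained", "asylum deadline"],
    "es": ["orden de deportación", "fecha de corte", "detenido"],
}

def needs_escalation(query, lang):
    """Flag queries that should be routed toward a qualified attorney."""
    lowered = query.lower()
    return any(kw in lowered for kw in ESCALATION_KEYWORDS.get(lang, []))
```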
Post-Processing
| Check | Purpose | Tool |
|---|---|---|
| Grammar correction | Fix minor LLM errors | Language-specific models |
| Hallucination detection | Verify claims against sources | NLI models |
| Terminology verification | Match official glossary | Dictionary lookup |
| Disclaimer presence | Ensure legal caveats included | Regex/template check |
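Of these checks, disclaimer presence is the easiest to implement. A regex sketch follows; the patterns are illustrative and should mirror the disclaimer wording of each language's system prompt.

```python
import re

# Illustrative patterns, one per supported language
DISCLAIMER_PATTERNS = {
    "en": r"not\s+a\s+lawyer|not\s+legal\s+advice",
    "es": r"no\s+(?:soy|es)\s+(?:un\s+)?abogad",
}

def has_disclaimer(response, lang):
    """Verify the legal caveat survived generation and post-processing."""
    pattern = DISCLAIMER_PATTERNS.get(lang)
    return bool(pattern and re.search(pattern, response, re.IGNORECASE))
```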
Conversation Memory
Context Window Challenges
| Issue | Impact | Mitigation |
|---|---|---|
| Token bloat | Non-Latin scripts use 2-4x tokens | Aggressive summarization |
| Language switches | Context confusion | Maintain language tags |
| Long conversations | Context overflow | Rolling summary |
Memory Management Strategy
```python
from datetime import datetime

class MultilingualConversationMemory:
    def __init__(self, max_tokens=4000):
        self.max_tokens = max_tokens
        self.messages = []
        self.language_history = []

    def add_message(self, role, content, language):
        self.messages.append({
            "role": role,
            "content": content,
            "language": language,
            "timestamp": datetime.now(),
        })
        self.language_history.append(language)
        self._maybe_summarize()

    def _maybe_summarize(self):
        """Compress history if approaching the token limit."""
        # _count_tokens and _generate_summary are application-specific
        # (e.g. tokenizer-based counting and an LLM summarization call)
        if self._count_tokens() > self.max_tokens * 0.8:
            old_messages = self.messages[:-4]  # Keep the 4 most recent
            summary = self._generate_summary(old_messages)
            self.messages = [
                {"role": "system",
                 "content": f"Previous conversation summary: {summary}"}
            ] + self.messages[-4:]

    def handle_language_switch(self, new_language):
        """Handle mid-conversation language change."""
        if self.language_history and self.language_history[-1] != new_language:
            # Generate a bridge summary in the new language
            bridge = self._translate_summary(new_language)
            self.messages.append({
                "role": "system",
                "content": f"[Language switched to {new_language}] Summary: {bridge}",
            })
```
Language Switch Handling
| Scenario | Action |
|---|---|
| User switches language | Acknowledge, continue in new language |
| Mixed input | Respond in dominant language |
| Explicit request | Switch and summarize context |
Error Handling
Graceful Degradation
| Failure | Fallback |
|---|---|
| Language detection fails | Prompt for selection |
| RAG retrieval empty | Use general knowledge + disclaimer |
| Model timeout | Queue and notify user |
| Translation error | Show English + apology |
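A thin dispatch layer keeps these fallbacks in one place (the failure and action names here are illustrative):

```python
# Illustrative failure-to-fallback mapping from the table above
FALLBACK_ACTIONS = {
    "detection_failed": "prompt_language_selection",
    "retrieval_empty": "general_answer_with_disclaimer",
    "model_timeout": "queue_and_notify_user",
    "translation_error": "show_english_with_apology",
}

def degrade(failure):
    """Resolve a failure mode to its graceful-degradation action."""
    return FALLBACK_ACTIONS.get(failure, "show_generic_error")
```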
Error Messages by Language
```python
ERROR_MESSAGES = {
    "en": {
        "retry": "I'm having trouble understanding. Could you please rephrase?",
        "technical": "I'm experiencing technical difficulties. Please try again.",
        "escalate": "This question requires a qualified attorney. Here are resources...",
    },
    "es": {
        "retry": "Tengo dificultades para entender. ¿Podría reformular su pregunta?",
        "technical": "Estoy experimentando dificultades técnicas. Por favor intente de nuevo.",
        "escalate": "Esta pregunta requiere un abogado calificado. Aquí hay recursos...",
    },
    # ... other languages
}
```
Implementation Checklist
Phase 1: Detection & Routing
- [ ] Implement language detection pipeline
- [ ] Set up user preference storage
- [ ] Configure fallback to user selection
- [ ] Test code-switching scenarios
Phase 2: RAG Pipeline
- [ ] Select multilingual embedding model
- [ ] Configure vector store with language partitions
- [ ] Implement balanced retrieval
- [ ] Set up language-specific chunking
Phase 3: Generation
- [ ] Configure model routing by language
- [ ] Create language-specific system prompts
- [ ] Implement post-processing pipeline
- [ ] Add hallucination detection
Phase 4: Memory & Polish
- [ ] Build conversation memory manager
- [ ] Implement summarization for token management
- [ ] Test language switching scenarios
- [ ] Configure error handling
Next Steps
- Review language-specific guides for preprocessing details
- Design multilingual UX for conversation interface
- Set up translation workflow for knowledge base
- Plan full implementation timeline