Overview
Transitioning from static content to conversational AI requires complex technical infrastructure. Multilingual chatbots must seamlessly detect intent, route queries, retrieve relevant legal statutes across language barriers, and generate safe, fluent responses.
System Architecture
┌─────────────────────────────────────────────────────────────────┐
│                         USER INTERFACE                          │
│    (Web/Mobile with language selection, voice input support)    │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                       LANGUAGE DETECTION                        │
│  fastText / CLD3 / mBERT → Confidence score → Route or prompt   │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                       INPUT PREPROCESSING                       │
│  • Diacritic normalization (Vietnamese)                         │
│  • Tokenization (Chinese: jieba)                                │
│  • Code-switching detection (Spanglish, Chinglish)              │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    MULTILINGUAL RAG PIPELINE                    │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐           │
│   │    Query    │   │ Cross-Lang  │   │   Re-rank   │           │
│   │  Embedding  │ → │  Retrieval  │ → │  & Filter   │           │
│   │ (Multilang) │   │ (Vector DB) │   │  (by lang)  │           │
│   └─────────────┘   └─────────────┘   └─────────────┘           │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                       RESPONSE GENERATION                       │
│  • Language-appropriate LLM selection                           │
│  • Prompt with retrieved context                                │
│  • Post-processing (grammar, hallucination check)               │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                       CONVERSATION MEMORY                       │
│  • Multilingual history management                              │
│  • Summarization for context window                             │
│  • Language switch handling                                     │
└─────────────────────────────────────────────────────────────────┘
Language Detection
Detection Methods
| Method | Speed | Accuracy | Best For |
|---|---|---|---|
| fastText | Very fast | Good | Real-time detection |
| CLD3 | Fast | Good | General purpose |
| langdetect | Medium | Good | Python native |
| mBERT | Slow | Excellent | Complex cases |
Code-Switching Challenges
| Input Example | Challenge |
|---|---|
| "Mi hermano was detained by ICE ayer" | Mixed Spanish-English |
| "我的H-1B visa快要expire了" | Mixed Chinese-English |
| "Toi can gap lawyer ve visa" | Mixed Vietnamese-English (no diacritics) |
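A cheap first-pass signal for cross-script mixing is to classify characters by Unicode script family. This is only a heuristic sketch (function names are illustrative): it flags Chinese-English mixing, but same-script cases like Spanglish look uniformly Latin and need token-level language identification instead.

```python
import unicodedata

def script_mix(text):
    """Count alphabetic characters per script family (rough heuristic)."""
    counts = {"latin": 0, "cjk": 0, "other": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # digits, punctuation, whitespace carry no signal
        name = unicodedata.name(ch, "")
        if name.startswith("CJK") or "HIRAGANA" in name or "HANGUL" in name:
            counts["cjk"] += 1
        elif "LATIN" in name:
            counts["latin"] += 1
        else:
            counts["other"] += 1
    return counts

def looks_mixed(text):
    """True when more than one script family appears in the input."""
    counts = script_mix(text)
    return sum(1 for v in counts.values() if v > 0) > 1
```

This catches the Chinese-English example in the table; the Spanish-English and Vietnamese-English rows stay single-script and must fall through to the confidence-based detector instead.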
Detection Strategy
```python
from langdetect import detect_langs, LangDetectException

def detect_with_fallback(text, threshold=0.8):
    """Detect language with confidence-based fallback."""
    try:
        results = detect_langs(text)
        top_lang = results[0]
        if top_lang.prob >= threshold:
            return top_lang.lang, "detected"
        # Low confidence - likely code-switching
        return None, "prompt_user"
    except LangDetectException:
        # Empty input or no detectable features
        return None, "prompt_user"

def route_language(text, user_preference=None):
    """Route to appropriate language pipeline."""
    if user_preference:
        return user_preference
    lang, status = detect_with_fallback(text)
    if status == "prompt_user":
        return "ask_user"  # Trigger language selection UI
    return lang
```
User Preference Override
| Scenario | Action |
|---|---|
| User explicitly selects language | Always use preference |
| Detection conflicts with preference | Use preference |
| No preference, high-confidence detection | Use detection |
| No preference, low-confidence detection | Ask user |
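The four override rules above reduce to a small resolution function (a sketch; the signature is illustrative):

```python
def resolve_language(user_preference, detected_lang, confidence, threshold=0.8):
    """Apply the preference-override rules from the table above."""
    if user_preference:
        # Explicit selection always wins, even over confident detection
        return user_preference
    if detected_lang and confidence >= threshold:
        return detected_lang  # High-confidence detection
    return "ask_user"  # Low confidence: trigger the selection UI
```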
Multilingual RAG Systems
Vector Store Architecture
| Approach | Description | Trade-offs |
|---|---|---|
| Unified store | All languages in single vector space | Simple, good for cross-lingual |
| Partitioned | Separate stores per language | Better precision, more complex |
| Hybrid | Unified with language metadata filters | Balanced approach |
Multilingual Embedding Models
| Model | Languages | Dimensions | Notes |
|---|---|---|---|
| OpenAI text-embedding-3-large | 100+ | 3072 | Best cross-lingual |
| Cohere Embed v3 | 100+ | 1024 | Good multilingual |
| BGE-M3 | 100+ | 1024 | Open source |
| E5-multilingual | 100+ | 768 | Open source |
| Qwen2.5-Embedding | CJK-focused | Varies by size | Best for Chinese |
Cross-Lingual Retrieval
Enable queries in one language to retrieve documents in another:
```python
from sentence_transformers import SentenceTransformer
import chromadb

# Initialize multilingual embedder and vector store
embedder = SentenceTransformer("BAAI/bge-m3")
client = chromadb.Client()
collection = client.get_or_create_collection("legal_docs")

# Query in Spanish: "What are my rights if ICE comes to my house?"
query = "¿Cuáles son mis derechos si ICE viene a mi casa?"
query_embedding = embedder.encode(query)

# Retrieve from both the English and Spanish language partitions
results = collection.query(
    query_embeddings=[query_embedding.tolist()],
    n_results=10,
    where={"$or": [{"language": "en"}, {"language": "es"}]},
)
```
Quota-Based Retrieval
Counteract English dominance in retrieval:
```python
def balanced_retrieval(query_embedding, n_per_language=5):
    """Retrieve an equal number of documents from each language partition."""
    results = {}
    for lang in ["en", "es", "zh", "vi"]:
        results[lang] = collection.query(
            query_embeddings=[query_embedding],
            n_results=n_per_language,
            where={"language": lang},
        )
    # Combine and re-rank (combine_results and rerank_by_relevance
    # are application-specific helpers)
    all_results = combine_results(results)
    return rerank_by_relevance(all_results)
```
Chunking Strategies
Language-Specific Requirements
| Language | Chunking Approach | Preprocessing |
|---|---|---|
| English | Semantic/recursive character | Standard tokenization |
| Spanish | Semantic/recursive character | Handle text expansion |
| Chinese | Morphological segmentation | jieba word boundaries |
| Vietnamese | Syllable-merge + normalize | underthesea normalization |
Chunk Size Recommendations
| Parameter | English | Spanish | Chinese | Vietnamese |
|---|---|---|---|---|
| Chunk size | 512 tokens | 512 tokens | 384 tokens | 384 tokens |
| Overlap | 64 tokens | 64 tokens | 96 tokens | 64 tokens |
| Min chunk | 100 tokens | 100 tokens | 75 tokens | 75 tokens |
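The parameters above can be captured as a config table driving a simple overlapping splitter. This is a sketch operating on a pre-tokenized list (the tokenizer itself is language-specific, e.g. jieba for Chinese); the helper names are illustrative.

```python
# Per-language chunking parameters from the table above
CHUNK_PARAMS = {
    "en": {"size": 512, "overlap": 64, "min": 100},
    "es": {"size": 512, "overlap": 64, "min": 100},
    "zh": {"size": 384, "overlap": 96, "min": 75},
    "vi": {"size": 384, "overlap": 64, "min": 75},
}

def chunk_tokens(tokens, lang):
    """Split a token list into overlapping chunks using per-language params."""
    p = CHUNK_PARAMS[lang]
    step = p["size"] - p["overlap"]
    chunks = [tokens[i:i + p["size"]] for i in range(0, len(tokens), step)]
    # Merge a trailing fragment below the minimum into the previous chunk;
    # only the non-duplicated remainder is appended.
    if len(chunks) > 1 and len(chunks[-1]) < p["min"]:
        tail = chunks.pop()
        chunks[-1] = chunks[-1] + tail[p["overlap"]:]
    return chunks
```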
Token Bloat Consideration
Non-Latin scripts consume more tokens per semantic unit:
| Language | "Immigration law" | Tokens |
|---|---|---|
| English | "immigration law" | 2 |
| Chinese | "移民法" | 4-6 |
| Vietnamese | "luật di trú" | 4-5 |
Impact: Effective context window is 60-75% smaller for Asian languages.
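The budget math follows directly: divide the raw window by the observed bloat factor to get English-equivalent capacity (a back-of-envelope sketch; actual bloat varies by tokenizer and domain).

```python
def effective_context(window_tokens, bloat_factor):
    """English-equivalent capacity of a context window under token bloat."""
    return int(window_tokens / bloat_factor)

# At the roughly 2.5-4x bloat seen for CJK legal text, an 8192-token
# window carries about 2048-3276 tokens of English-equivalent content,
# i.e. a 60-75% reduction.
```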
Response Generation
Language-Appropriate Model Routing
```python
MODEL_ROUTING = {
    "es": "llama-3.3-8b",  # Strong Spanish
    "zh": "qwen2.5-14b",   # Native Chinese
    "vi": "qwen3-32b",     # Good Vietnamese
    "en": "llama-3.3-8b",  # Default
}

def get_model_for_language(detected_lang):
    return MODEL_ROUTING.get(detected_lang, MODEL_ROUTING["en"])
```
Multilingual Prompt Engineering
```python
# Each prompt states the same role and legal caveat in the target language
SYSTEM_PROMPT_TEMPLATE = {
    "en": """You are an immigration legal information assistant.
    Provide accurate, helpful information while clearly stating you are not a lawyer.
    Always recommend consulting with a qualified immigration attorney.""",
    "es": """Eres un asistente de información legal de inmigración.
    Proporciona información precisa y útil, indicando claramente que no eres abogado.
    Siempre recomienda consultar con un abogado de inmigración calificado.""",
    "zh": """你是一位移民法律信息助手。
    提供准确、有帮助的信息,同时明确说明你不是律师。
    始终建议咨询合格的移民律师。""",
    "vi": """Bạn là trợ lý thông tin pháp lý về di trú.
    Cung cấp thông tin chính xác, hữu ích, đồng thời nêu rõ bạn không phải là luật sư.
    Luôn khuyên người dùng tham khảo ý kiến luật sư di trú có trình độ.""",
}
```
Hallucination Prevention
| Strategy | Implementation |
|---|---|
| Grounding | Require citations to retrieved documents |
| Confidence signals | "Based on the information provided..." |
| Uncertainty acknowledgment | "I'm not certain about..." |
| Escalation triggers | Detect when to recommend attorney |
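Escalation triggers can start as a simple keyword screen before graduating to a tuned classifier. The keyword lists below are hypothetical examples, not a vetted legal taxonomy.

```python
# Hypothetical trigger phrases; a production system would use
# attorney-reviewed lists per language and a classifier.
ESCALATION_KEYWORDS = {
    "en": ["deportation order", "court date", "detained", "asylum deadline"],
    "es": ["orden de deportación", "fecha de corte", "detenido"],
}

def needs_escalation(query, lang):
    """Flag queries that should be routed toward a qualified attorney."""
    lowered = query.lower()
    return any(kw in lowered for kw in ESCALATION_KEYWORDS.get(lang, []))
```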
Post-Processing
| Check | Purpose | Tool |
|---|---|---|
| Grammar correction | Fix minor LLM errors | Language-specific models |
| Hallucination detection | Verify claims against sources | NLI models |
| Terminology verification | Match official glossary | Dictionary lookup |
| Disclaimer presence | Ensure legal caveats included | Regex/template check |
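Of these checks, disclaimer presence is the easiest to implement. A regex sketch follows; the patterns are illustrative and should mirror the disclaimer wording of each language's system prompt.

```python
import re

# Illustrative patterns, one per supported language
DISCLAIMER_PATTERNS = {
    "en": r"not\s+a\s+lawyer|not\s+legal\s+advice",
    "es": r"no\s+(?:soy|es)\s+(?:un\s+)?abogad",
}

def has_disclaimer(response, lang):
    """Verify the legal caveat survived generation and post-processing."""
    pattern = DISCLAIMER_PATTERNS.get(lang)
    return bool(pattern and re.search(pattern, response, re.IGNORECASE))
```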
Conversation Memory
Context Window Challenges
| Issue | Impact | Mitigation |
|---|---|---|
| Token bloat | Non-Latin scripts use 2-4x tokens | Aggressive summarization |
| Language switches | Context confusion | Maintain language tags |
| Long conversations | Context overflow | Rolling summary |
Memory Management Strategy
```python
from datetime import datetime

class MultilingualConversationMemory:
    def __init__(self, max_tokens=4000):
        self.max_tokens = max_tokens
        self.messages = []
        self.language_history = []

    def add_message(self, role, content, language):
        self.messages.append({
            "role": role,
            "content": content,
            "language": language,
            "timestamp": datetime.now(),
        })
        self.language_history.append(language)
        self._maybe_summarize()

    def _maybe_summarize(self):
        """Compress history if approaching the token limit."""
        # _count_tokens and _generate_summary are application-specific
        # (e.g. tokenizer-based counting and an LLM summarization call)
        if self._count_tokens() > self.max_tokens * 0.8:
            old_messages = self.messages[:-4]  # Keep the 4 most recent
            summary = self._generate_summary(old_messages)
            self.messages = [
                {"role": "system",
                 "content": f"Previous conversation summary: {summary}"}
            ] + self.messages[-4:]

    def handle_language_switch(self, new_language):
        """Handle mid-conversation language change."""
        if self.language_history and self.language_history[-1] != new_language:
            # Generate a bridge summary in the new language
            bridge = self._translate_summary(new_language)
            self.messages.append({
                "role": "system",
                "content": f"[Language switched to {new_language}] Summary: {bridge}",
            })
```
Language Switch Handling
| Scenario | Action |
|---|---|
| User switches language | Acknowledge, continue in new language |
| Mixed input | Respond in dominant language |
| Explicit request | Switch and summarize context |
Error Handling
Graceful Degradation
| Failure | Fallback |
|---|---|
| Language detection fails | Prompt for selection |
| RAG retrieval empty | Use general knowledge + disclaimer |
| Model timeout | Queue and notify user |
| Translation error | Show English + apology |
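A thin dispatch layer keeps these fallbacks in one place (the failure and action names here are illustrative):

```python
# Illustrative failure-to-fallback mapping from the table above
FALLBACK_ACTIONS = {
    "detection_failed": "prompt_language_selection",
    "retrieval_empty": "general_answer_with_disclaimer",
    "model_timeout": "queue_and_notify_user",
    "translation_error": "show_english_with_apology",
}

def degrade(failure):
    """Resolve a failure mode to its graceful-degradation action."""
    return FALLBACK_ACTIONS.get(failure, "show_generic_error")
```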
Error Messages by Language
```python
ERROR_MESSAGES = {
    "en": {
        "retry": "I'm having trouble understanding. Could you please rephrase?",
        "technical": "I'm experiencing technical difficulties. Please try again.",
        "escalate": "This question requires a qualified attorney. Here are resources...",
    },
    "es": {
        "retry": "Tengo dificultades para entender. ¿Podría reformular su pregunta?",
        "technical": "Estoy experimentando dificultades técnicas. Por favor intente de nuevo.",
        "escalate": "Esta pregunta requiere un abogado calificado. Aquí hay recursos...",
    },
    # ... other languages
}
```
Implementation Checklist
Phase 1: Detection & Routing
- [ ] Implement language detection pipeline
- [ ] Set up user preference storage
- [ ] Configure fallback to user selection
- [ ] Test code-switching scenarios
Phase 2: RAG Pipeline
- [ ] Select multilingual embedding model
- [ ] Configure vector store with language partitions
- [ ] Implement balanced retrieval
- [ ] Set up language-specific chunking
Phase 3: Generation
- [ ] Configure model routing by language
- [ ] Create language-specific system prompts
- [ ] Implement post-processing pipeline
- [ ] Add hallucination detection
Phase 4: Memory & Polish
- [ ] Build conversation memory manager
- [ ] Implement summarization for token management
- [ ] Test language switching scenarios
- [ ] Configure error handling
Next Steps
- Review language-specific guides for preprocessing details
- Design multilingual UX for conversation interface
- Set up translation workflow for knowledge base
- Plan full implementation timeline