Emergency Hotline: Call 1-844-363-1423 (United We Dream Hotline)
ICE Encounter

Overview

Transitioning from static content to conversational AI requires complex technical infrastructure. Multilingual chatbots must detect language and intent, route queries, retrieve relevant legal content across language barriers, and generate safe, fluent responses.


System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        USER INTERFACE                           │
│  (Web/Mobile with language selection, voice input support)      │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    LANGUAGE DETECTION                           │
│  fastText / CLD3 / mBERT → Confidence score → Route or prompt  │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    INPUT PREPROCESSING                          │
│  • Diacritic normalization (Vietnamese)                         │
│  • Tokenization (Chinese: jieba)                                │
│  • Code-switching detection (Spanglish, Chinglish)              │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                  MULTILINGUAL RAG PIPELINE                      │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐         │
│  │ Query       │    │ Cross-Lang  │    │ Re-rank     │         │
│  │ Embedding   │ →  │ Retrieval   │ →  │ & Filter    │         │
│  │ (Multilang) │    │ (Vector DB) │    │ (by lang)   │         │
│  └─────────────┘    └─────────────┘    └─────────────┘         │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                  RESPONSE GENERATION                            │
│  • Language-appropriate LLM selection                           │
│  • Prompt with retrieved context                                │
│  • Post-processing (grammar, hallucination check)               │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    CONVERSATION MEMORY                          │
│  • Multilingual history management                              │
│  • Summarization for context window                             │
│  • Language switch handling                                     │
└─────────────────────────────────────────────────────────────────┘
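The stages in the diagram can be wired together as a thin orchestrator. The sketch below is illustrative only: each stage function (`detect`, `preprocess`, `retrieve`, `generate`) is a hypothetical placeholder for the components described in the sections that follow, and `memory` is a plain list standing in for the conversation memory manager.

```python
def run_pipeline(text, detect, preprocess, retrieve, generate, memory):
    """Chain the pipeline stages for a single user turn (illustrative sketch)."""
    lang = detect(text)                    # language detection stage
    cleaned = preprocess(text, lang)       # normalization / tokenization stage
    docs = retrieve(cleaned, lang)         # multilingual RAG retrieval stage
    reply = generate(cleaned, docs, lang)  # language-appropriate generation stage
    memory.append({"user": text, "bot": reply, "lang": lang})
    return reply
```

Each stage can then be swapped independently, e.g. replacing the detection callable without touching retrieval.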

Language Detection

Detection Methods

Method      Speed      Accuracy   Best For
----------  ---------  ---------  -------------------
fastText    Very fast  Good       Real-time detection
CLD3        Fast       Good       General purpose
langdetect  Medium     Good       Python native
mBERT       Slow       Excellent  Complex cases

Code-Switching Challenges

Input Example                          Challenge
-------------------------------------  ----------------------------------------
"Mi hermano was detained by ICE ayer"  Mixed Spanish-English
"我的H-1B visa快要expire了"            Mixed Chinese-English
"Toi can gap lawyer ve visa"           Mixed Vietnamese-English (no diacritics)

Detection Strategy

from langdetect import detect_langs
from langdetect.lang_detect_exception import LangDetectException

def detect_with_fallback(text, threshold=0.8):
    """Detect language with confidence-based fallback."""
    try:
        results = detect_langs(text)
        top_lang = results[0]

        if top_lang.prob >= threshold:
            return top_lang.lang, "detected"
        else:
            # Low confidence - likely code-switching
            return None, "prompt_user"
    except LangDetectException:
        # Empty or non-linguistic input
        return None, "prompt_user"

def route_language(text, user_preference=None):
    """Route to appropriate language pipeline."""
    if user_preference:
        return user_preference

    lang, status = detect_with_fallback(text)

    if status == "prompt_user":
        return "ask_user"  # Trigger language selection UI

    return lang

User Preference Override

Scenario                                  Action
----------------------------------------  ---------------------
User explicitly selects language          Always use preference
Detection conflicts with preference       Use preference
No preference, high-confidence detection  Use detection
No preference, low-confidence detection   Ask user
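The policy in this table reduces to a few lines; `resolve_language` below is a hypothetical helper name, not an established API.

```python
def resolve_language(user_preference, detected_lang, confidence, threshold=0.8):
    """Apply the preference-override policy: preference > confident detection > ask."""
    if user_preference:
        # Explicit preference always wins, even over conflicting detection
        return user_preference
    if detected_lang and confidence >= threshold:
        return detected_lang
    return "ask_user"  # trigger the language-selection UI
```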

Multilingual RAG Systems

Vector Store Architecture

Approach     Description                             Trade-offs
-----------  --------------------------------------  ------------------------------
Unified      All languages in single vector space    Simple, good for cross-lingual
Partitioned  Separate stores per language            Better precision, more complex
Hybrid       Unified with language metadata filters  Balanced approach

Multilingual Embedding Models

Model                          Languages      Dimensions  Notes
-----------------------------  -------------  ----------  ------------------
OpenAI text-embedding-3-large  100+           3072        Best cross-lingual
Cohere Embed v3                100+           1024        Good multilingual
BGE-M3                         100+           1024        Open source
E5-multilingual                100+           768         Open source
Qwen2.5-Embedding              Excellent CJK  Variable    Best for Chinese

Cross-Lingual Retrieval

Enable queries in one language to retrieve documents in another:

from sentence_transformers import SentenceTransformer
import chromadb

# Initialize multilingual embedder and an in-memory vector store
embedder = SentenceTransformer('BAAI/bge-m3')
client = chromadb.Client()
collection = client.get_or_create_collection("legal_docs")

# Query in Spanish: "What are my rights if ICE comes to my house?"
query = "¿Cuáles son mis derechos si ICE viene a mi casa?"
query_embedding = embedder.encode(query).tolist()

# Retrieve from both English and Spanish collections
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=10,
    # Include both language partitions
    where={"$or": [{"language": "en"}, {"language": "es"}]}
)

Quota-Based Retrieval

Counteract English dominance in retrieval:

def balanced_retrieval(query_embedding, n_per_language=5):
    """Retrieve an equal number of documents from each language partition."""
    results = {}

    for lang in ["en", "es", "zh", "vi"]:
        lang_results = collection.query(
            query_embeddings=[query_embedding],
            n_results=n_per_language,
            where={"language": lang}
        )
        results[lang] = lang_results

    # Merge per-language hits and re-rank by similarity
    all_results = combine_results(results)
    return rerank_by_relevance(all_results)
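The `combine_results` and `rerank_by_relevance` helpers above are left undefined; one minimal interpretation, assuming Chroma-style result dicts (`ids` and `distances` as nested lists), merges the per-language hits and sorts by ascending distance:

```python
def combine_results(results_by_lang):
    """Flatten per-language Chroma-style results into (id, distance, lang) tuples."""
    merged = []
    for lang, res in results_by_lang.items():
        for doc_id, dist in zip(res["ids"][0], res["distances"][0]):
            merged.append((doc_id, dist, lang))
    return merged

def rerank_by_relevance(merged):
    """Sort merged hits by ascending distance (smaller = more similar)."""
    return sorted(merged, key=lambda hit: hit[1])
```

A production re-ranker would typically use a cross-encoder instead of raw vector distance, but the distance sort is enough to interleave languages fairly.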

Chunking Strategies

Language-Specific Requirements

Language    Chunking Approach             Preprocessing
----------  ----------------------------  -------------------------
English     Semantic/recursive character  Standard tokenization
Spanish     Semantic/recursive character  Handle text expansion
Chinese     Morphological segmentation    jieba word boundaries
Vietnamese  Syllable-merge + normalize    underthesea normalization

Chunk Size Recommendations

Parameter   English     Spanish     Chinese     Vietnamese
----------  ----------  ----------  ----------  ----------
Chunk size  512 tokens  512 tokens  384 tokens  384 tokens
Overlap     64 tokens   64 tokens   96 tokens   64 tokens
Min chunk   100 tokens  100 tokens  75 tokens   75 tokens
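A minimal chunker driven by the parameters above might look like the sketch below. It operates on a pre-tokenized list (e.g. jieba output for Chinese), and `CHUNK_CONFIG` is an assumed structure, not an established API.

```python
CHUNK_CONFIG = {
    # (chunk_size, overlap, min_chunk) in tokens, from the table above
    "en": (512, 64, 100),
    "es": (512, 64, 100),
    "zh": (384, 96, 75),
    "vi": (384, 64, 75),
}

def chunk_tokens(tokens, lang):
    """Split a pre-tokenized document into overlapping chunks.

    Chunks shorter than min_chunk are folded into the previous chunk
    (only their new, non-overlapping tokens) rather than emitted alone.
    """
    size, overlap, min_chunk = CHUNK_CONFIG[lang]
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunk = tokens[start:start + size]
        if chunks and len(chunk) < min_chunk:
            # Tail too short: append only its new tokens to the previous chunk
            chunks[-1].extend(chunk[overlap:])
            break
        chunks.append(chunk)
        if start + size >= len(tokens):
            break
    return chunks
```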

Token Bloat Consideration

Non-Latin scripts consume more tokens per semantic unit:

Language "Immigration law" Tokens
English "immigration law" 2
Chinese "移民法" 4-6
Vietnamese "luật di trú" 4-5

Impact: Effective context window is 60-75% smaller for Asian languages.
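The reduction can be computed directly: with a 3x bloat factor, an 8192-token window holds roughly the content of a 2730-token English window. A one-line helper (hypothetical name) makes the budget explicit:

```python
def effective_window(context_tokens, bloat_factor):
    """English-equivalent capacity of a context window under token bloat."""
    return int(context_tokens / bloat_factor)

# A 3-4x bloat factor shrinks an 8192-token window to roughly
# 2048-2730 English-equivalent tokens, i.e. a 67-75% reduction
```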


Response Generation

Language-Appropriate Model Routing

MODEL_ROUTING = {
    "es": "llama-3.1-8b",       # Strong Spanish
    "zh": "qwen2.5-14b",        # Native Chinese
    "vi": "qwen3-32b",          # Good Vietnamese
    "en": "llama-3.1-8b",       # Default
}

def get_model_for_language(detected_lang):
    return MODEL_ROUTING.get(detected_lang, MODEL_ROUTING["en"])

Multilingual Prompt Engineering

SYSTEM_PROMPT_TEMPLATE = {
    "en": """You are an immigration legal information assistant.
Provide accurate, helpful information while clearly stating you are not a lawyer.
Always recommend consulting with a qualified immigration attorney.""",

    "es": """Eres un asistente de información legal de inmigración.
Proporciona información precisa y útil, indicando claramente que no eres abogado.
Siempre recomienda consultar con un abogado de inmigración calificado.""",

    "zh": """你是一位移民法律信息助手。
提供准确、有帮助的信息,同时明确说明你不是律师。
始终建议咨询合格的移民律师。""",

    "vi": """Bạn là trợ lý thông tin pháp lý về di trú.
Cung cấp thông tin chính xác, hữu ích, đồng thời nêu rõ bạn không phải là luật sư.
Luôn khuyên người dùng tham khảo ý kiến luật sư di trú có trình độ."""
}
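Selecting the template and splicing in retrieved context can be done with a small helper. The message format below assumes an OpenAI-style chat API and is illustrative only; `system_prompts` is a dict shaped like SYSTEM_PROMPT_TEMPLATE above.

```python
def build_messages(system_prompts, lang, retrieved_docs, user_question):
    """Assemble a chat prompt: language-specific system prompt + numbered context.

    Falls back to the English template when the language has no entry.
    """
    system = system_prompts.get(lang, system_prompts["en"])
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {user_question}"},
    ]
```

Numbering the context passages also gives the model stable handles for citations, which the hallucination checks below can verify.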

Hallucination Prevention

Strategy                    Implementation
--------------------------  ----------------------------------------
Grounding                   Require citations to retrieved documents
Confidence signals          "Based on the information provided..."
Uncertainty acknowledgment  "I'm not certain about..."
Escalation triggers         Detect when to recommend attorney
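The grounding row can be enforced mechanically if the generation prompt instructs the model to cite context passages as [1], [2], and so on (an assumed convention, not part of any library). A minimal check:

```python
import re

def is_grounded(response, source_docs):
    """True iff the response cites at least one valid retrieved passage.

    Citations are assumed to appear as bracketed numbers [1], [2], ...
    referring to the numbered context passages in the prompt.
    """
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", response)}
    valid = set(range(1, len(source_docs) + 1))
    return bool(cited) and cited <= valid
```

Responses that fail this check can be regenerated or routed to the escalation path.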

Post-Processing

Check                     Purpose                        Tool
------------------------  -----------------------------  ------------------------
Grammar correction        Fix minor LLM errors           Language-specific models
Hallucination detection   Verify claims against sources  NLI models
Terminology verification  Match official glossary        Dictionary lookup
Disclaimer presence       Ensure legal caveats included  Regex/template check

Conversation Memory

Context Window Challenges

Issue               Impact                      Mitigation
------------------  --------------------------  ------------------------
Token bloat         Non-Latin uses 3-4x tokens  Aggressive summarization
Language switches   Context confusion           Maintain language tags
Long conversations  Context overflow            Rolling summary

Memory Management Strategy

from datetime import datetime

class MultilingualConversationMemory:
    def __init__(self, max_tokens=4000):
        self.max_tokens = max_tokens
        self.messages = []
        self.language_history = []

    def add_message(self, role, content, language):
        self.messages.append({
            "role": role,
            "content": content,
            "language": language,
            "timestamp": datetime.now()
        })
        self.language_history.append(language)
        self._maybe_summarize()

    def _maybe_summarize(self):
        """Compress history if exceeding token limit."""
        # _count_tokens, _generate_summary, _translate_summary are assumed
        # helpers (model tokenizer plus summarization/translation calls)
        current_tokens = self._count_tokens()

        if current_tokens > self.max_tokens * 0.8:
            # Summarize older messages
            old_messages = self.messages[:-4]  # Keep recent
            summary = self._generate_summary(old_messages)

            self.messages = [
                {"role": "system", "content": f"Previous conversation summary: {summary}"}
            ] + self.messages[-4:]

    def handle_language_switch(self, new_language):
        """Handle mid-conversation language change."""
        if self.language_history and self.language_history[-1] != new_language:
            # Generate bridge summary in new language
            bridge = self._translate_summary(new_language)
            self.messages.append({
                "role": "system",
                "content": f"[Language switched to {new_language}] Summary: {bridge}"
            })

Language Switch Handling

Scenario                Action
----------------------  -------------------------------------
User switches language  Acknowledge, continue in new language
Mixed input             Respond in dominant language
Explicit request        Switch and summarize context

Error Handling

Graceful Degradation

Failure                   Fallback
------------------------  ----------------------------------
Language detection fails  Prompt for selection
RAG retrieval empty       Use general knowledge + disclaimer
Model timeout             Queue and notify user
Translation error         Show English + apology
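These fallbacks can be centralized in a small dispatch table so every component degrades the same way; the action names below are placeholders for real handlers, not an existing API.

```python
FALLBACKS = {
    # failure type -> fallback action (placeholder handler names)
    "detection_failed": "prompt_language_selection",
    "retrieval_empty": "general_knowledge_with_disclaimer",
    "model_timeout": "queue_and_notify",
    "translation_error": "show_english_with_apology",
}

def degrade(failure):
    """Map a failure type from the table above to its fallback action."""
    return FALLBACKS.get(failure, "escalate_to_human")
```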

Error Messages by Language

ERROR_MESSAGES = {
    "en": {
        "retry": "I'm having trouble understanding. Could you please rephrase?",
        "technical": "I'm experiencing technical difficulties. Please try again.",
        "escalate": "This question requires a qualified attorney. Here are resources..."
    },
    "es": {
        "retry": "Tengo dificultades para entender. ¿Podría reformular su pregunta?",
        "technical": "Estoy experimentando dificultades técnicas. Por favor intente de nuevo.",
        "escalate": "Esta pregunta requiere un abogado calificado. Aquí hay recursos..."
    },
    # ... other languages
}

Implementation Checklist

Phase 1: Detection & Routing

  • [ ] Implement language detection pipeline
  • [ ] Set up user preference storage
  • [ ] Configure fallback to user selection
  • [ ] Test code-switching scenarios

Phase 2: RAG Pipeline

  • [ ] Select multilingual embedding model
  • [ ] Configure vector store with language partitions
  • [ ] Implement balanced retrieval
  • [ ] Set up language-specific chunking

Phase 3: Generation

  • [ ] Configure model routing by language
  • [ ] Create language-specific system prompts
  • [ ] Implement post-processing pipeline
  • [ ] Add hallucination detection

Phase 4: Memory & Polish

  • [ ] Build conversation memory manager
  • [ ] Implement summarization for token management
  • [ ] Test language switching scenarios
  • [ ] Configure error handling

Next Steps

  1. Review language-specific guides for preprocessing details
  2. Design multilingual UX for conversation interface
  3. Set up translation workflow for knowledge base
  4. Plan full implementation timeline

Legal Disclaimer

This website does not provide legal advice. The information provided on this site is for general informational and educational purposes only. It does not create an attorney-client relationship.

Information on this website may not be current or accurate. Immigration law is complex and varies by jurisdiction and individual circumstances. Always consult with a qualified immigration attorney for advice specific to your situation.

Neither ICE Encounter, its developers, partners, nor any contributors shall be liable for any actions taken or not taken based on information from this site. Use of this site is subject to our Terms of Use and Privacy Policy.