Orlando’s AI Assistant — RAG Demo | Case Study by Orlando Ascanio

The Problem

Standard chatbots tend to either hallucinate or refuse to engage. I wanted to demonstrate what a grounded, production-quality AI assistant looks like: one that pulls from a real knowledge base, streams answers in real time, and handles edge cases gracefully across multiple languages.

The Solution

I built a custom RAG pipeline in which documents about my background, skills, and projects are chunked, embedded, and retrieved at query time. The Next.js frontend renders tokens as they stream in from the Gemini API, so the answer starts appearing as soon as the first tokens arrive instead of only after the full response has been generated.
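
The streaming half of this design can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: `fakeTokenStream` and `renderStream` are hypothetical names, and the token source is a local stand-in rather than a call to the Gemini API. The pattern it shows is that the consumer accumulates pieces as they arrive and reports the partial answer after each one.

```typescript
// Hypothetical stand-in for a model's streaming response: yields text pieces
// one at a time. A real client would yield pieces as the network delivers them.
async function* fakeTokenStream(pieces: string[]): AsyncGenerator<string> {
  for (const piece of pieces) {
    // Resolve asynchronously between pieces to simulate asynchronous arrival.
    await Promise.resolve();
    yield piece;
  }
}

// Consumes any async source of text pieces and reports the accumulated answer
// after each piece, which is the point where a UI would re-render.
async function renderStream(
  source: AsyncIterable<string>,
  onUpdate: (partialAnswer: string) => void,
): Promise<string> {
  let answer = "";
  for await (const piece of source) {
    answer += piece;
    onUpdate(answer);
  }
  return answer;
}
```

In a React component, `onUpdate` would plausibly be a state setter, so each arriving piece triggers a re-render with the longer partial answer.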

Key decisions:

Custom RAG over fine-tuning — retrieval keeps the knowledge base updatable without model retraining
15+ context categories — knowledge is structured by domain (experience, skills, projects, values) so retrieval is precise
Multi-language support — the system detects query language and responds in kind, tested across 8 locales
Safety-aware logic — queries outside the knowledge scope return a calibrated refusal, not a hallucination
Keyboard shortcuts — power users can navigate and submit without reaching for the mouse
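
The retrieval-side decisions in this list (category-scoped lookup and refusing when nothing relevant is found) can be sketched roughly as follows. This is an illustration under stated assumptions, not the demo's implementation: the `Category` values are the four named above, `embed` is a toy word-count vector standing in for a real embedding model, and `answerContext`, `minScore`, and the refusal message are hypothetical. Entries are embedded on every call for brevity; a real pipeline would embed them once at indexing time.

```typescript
type Category = "experience" | "skills" | "projects" | "values";
type Entry = { category: Category; text: string };
type Answer =
  | { kind: "grounded"; context: Entry[] }
  | { kind: "refusal"; message: string };

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Toy stand-in for an embedding model (assumption): a word-count vector.
function embed(text: string): Map<string, number> {
  const vector = new Map<string, number>();
  for (const token of tokenize(text)) {
    vector.set(token, (vector.get(token) ?? 0) + 1);
  }
  return vector;
}

function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((weight, term) => {
    dot += weight * (b.get(term) ?? 0);
    normA += weight * weight;
  });
  b.forEach((weight) => {
    normB += weight * weight;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Returns the best-matching entries, or a refusal when no entry clears minScore.
function answerContext(
  knowledgeBase: Entry[],
  query: string,
  options: { category?: Category; topK: number; minScore: number },
): Answer {
  const queryVector = embed(query);
  const hits = knowledgeBase
    // Optional category scoping narrows the search before any scoring happens.
    .filter((entry) => options.category === undefined || entry.category === options.category)
    .map((entry) => ({ entry, score: cosineSimilarity(queryVector, embed(entry.text)) }))
    .filter((hit) => hit.score >= options.minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, options.topK);

  if (hits.length === 0) {
    // Nothing relevant retrieved: refuse instead of letting the model guess.
    return { kind: "refusal", message: "That is outside what this assistant has information about." };
  }
  return { kind: "grounded", context: hits.map((hit) => hit.entry) };
}
```

The design choice worth noting is that the refusal is decided from retrieval scores before any text is generated, so an out-of-scope question never reaches the model with empty or irrelevant context.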

The Outcome

A fully interactive demo, deployed to production, that shows RAG pipeline implementation, streaming UI patterns, multi-language support, and safety-aware prompt design working together. It demonstrates that grounded AI can feel both capable and trustworthy.