RAG systems
Chatbots and search that answer from your data: document ingestion, chunking, embeddings, vector search (pgvector, Pinecone) and grounded retrieval with citations.
I build production AI on top of real products: retrieval-augmented generation (RAG) over your own documents, autonomous and multi-agent systems, and LLM features wired end to end into web and mobile apps. I work across the OpenAI, Anthropic and Gemini APIs with the Vercel AI SDK and LangChain — and I treat reliability, grounding and evaluation as part of the build, not an afterthought.
Autonomous and multi-agent systems that take actions, run on a schedule, and integrate with your tools — like the dual-agent pipeline behind TechBlog AI Agent.
Generation, summarization, classification and chat added to an existing app, end to end — data and retrieval, API layer, and the web or mobile UI.
Structured outputs, schema validation, guardrails, evaluation sets and human-in-the-loop where it matters — so the feature is trustworthy in front of real users.
B2B platform for AI conversational agents with a vector-search knowledge base and 3D avatars. Implements retrieval-augmented generation (RAG) over each client’s documents, with multichannel delivery across chat, voice, WhatsApp and embeddable web widgets.
Autonomous dual-agent system that discovers AI/tech news from 20+ RSS feeds, rewrites it in Spanish, and publishes automatically every three hours. Uses PostgreSQL-backed deduplication and scheduled execution.
Cross-platform Enneagram personality app with AI-generated reports and personalized coaching, integrating an LLM to turn assessment data into tailored narrative output.
SaaS credit-analysis platform with an AI chatbot that drafts FCRA dispute letters and explains credit concepts, generating structured, document-ready output.
Health and wellness platform with an AI chatbot and a natural-remedies knowledge library, answering user questions grounded in a curated content base.
Yes. Ramón builds retrieval-augmented generation (RAG) systems: ingesting and chunking your documents, generating embeddings, storing them in a vector database (such as pgvector or Pinecone), and retrieving the right context at query time so the model answers from your data instead of guessing. He shipped exactly this in Clona, a B2B platform whose conversational agents answer from a vector-search knowledge base across chat, voice and WhatsApp.
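As a rough illustration of the flow described above (chunk, embed, store, retrieve, then answer with citations), here is a minimal sketch. It makes no claim about how Clona is implemented. The `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory `index` list stands in for a vector database such as pgvector or Pinecone, and every function name and sample document is hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    index: int
    text: str


def chunk_text(doc_id: str, text: str, max_words: int = 40) -> list[Chunk]:
    """Split a document into fixed-size word windows (no overlap, for brevity)."""
    words = text.split()
    return [
        Chunk(doc_id, i // max_words, " ".join(words[i:i + max_words]))
        for i in range(0, len(words), max_words)
    ]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, index: list[tuple[Chunk, Counter]], k: int = 2) -> list[Chunk]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, chunks: list[Chunk]) -> str:
    """Assemble a grounded prompt in which each source carries a citable id."""
    sources = "\n".join(f"[{c.doc_id}#{c.index}] {c.text}" for c in chunks)
    return (
        "Answer using only the sources below and cite them by their bracketed id.\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


# Hypothetical sample documents, embedded once at ingestion time.
docs = {
    "refunds": "Refunds are issued within 14 days of purchase when the order is unused.",
    "shipping": "Standard shipping takes 3 to 5 business days within the country.",
}
index = [(c, embed(c.text)) for doc_id, text in docs.items() for c in chunk_text(doc_id, text)]

question = "How long does shipping take?"
top = retrieve(question, index, k=1)
prompt = build_prompt(question, top)  # this string is what would be sent to the model
```

In a real build the similarity search would run inside the vector store and the prompt would go to whichever model fits the use case; the shape of the pipeline stays the same.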
Ramón works with the OpenAI, Anthropic, and Gemini APIs, and routes across models with OpenRouter. On the application side he uses the Vercel AI SDK and LangChain for orchestration, plus vector stores and embeddings for retrieval. He picks the model and tooling per use case — cost, latency, and quality — rather than defaulting to one provider.
Yes. Ramón has built autonomous and multi-agent systems in production. TechBlog AI Agent is a dual-agent pipeline that discovers news from 20+ RSS feeds, rewrites it in Spanish, and publishes automatically every three hours, with PostgreSQL-backed deduplication and scheduled execution. These are agents doing real work on a schedule, not a demo.
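To make the deduplication idea concrete, here is one way a scheduled run could skip articles it has already handled. This is a sketch under stated assumptions, not the actual TechBlog AI Agent code: it uses an in-memory SQLite database from the Python standard library as a stand-in for PostgreSQL (where the equivalent statement would be `INSERT ... ON CONFLICT DO NOTHING`), and the table name, the URL normalization rule and the helper names are all hypothetical.

```python
import hashlib
import sqlite3
from urllib.parse import urlsplit


def fingerprint(url: str) -> str:
    """Hash a normalized URL: lowercased host plus path, ignoring query and trailing slash."""
    parts = urlsplit(url.strip())
    canonical = f"{parts.netloc.lower()}{parts.path.rstrip('/')}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def claim_article(conn: sqlite3.Connection, url: str) -> bool:
    """Record the article; return True only the first time its fingerprint is seen."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO seen_articles (fingerprint, url) VALUES (?, ?)",
        (fingerprint(url), url),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows inserted means the fingerprint already existed


# SQLite stand-in for the PostgreSQL table; the primary key enforces uniqueness.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seen_articles (fingerprint TEXT PRIMARY KEY, url TEXT NOT NULL)")

# Hypothetical feed items: the first two point at the same story.
feed_items = [
    "https://example.com/news/new-model/",
    "https://EXAMPLE.com/news/new-model?utm_source=rss",
    "https://example.com/news/other-story",
]
fresh = [url for url in feed_items if claim_article(conn, url)]
```

Letting the database's uniqueness constraint decide, rather than checking first and inserting second, keeps the step safe even if two scheduled runs overlap.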
The core technique is grounding: RAG so answers come from real sources, structured outputs and schema validation so responses are machine-checkable, and guardrails plus fallbacks for when the model is uncertain. Where it matters, he adds evaluation sets to measure quality across changes and keeps a human in the loop for high-stakes actions. The goal is an AI feature you can trust in front of real users, not just a working prompt.
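A small sketch of what "machine-checkable" can mean in practice: the model is asked for JSON, and the application validates the shape and the citations before anything reaches the user, falling back to a safe answer flagged for review when a check fails. The field names, the `Answer` type and the fallback wording are hypothetical choices for illustration, and a production build might use a schema library instead of hand-written checks.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    text: str
    citations: tuple[str, ...]
    needs_human_review: bool = False


# Safe default returned whenever the model output fails validation.
FALLBACK = Answer(
    "I could not find this in the provided documents.",
    (),
    needs_human_review=True,
)


def parse_answer(raw: str, allowed_sources: set[str]) -> Answer:
    """Validate raw model output; return FALLBACK on any violation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return FALLBACK
    if not isinstance(data, dict):
        return FALLBACK

    text = data.get("text")
    citations = data.get("citations")
    if not isinstance(text, str) or not text.strip():
        return FALLBACK
    if not isinstance(citations, list) or not citations:
        return FALLBACK
    # Guardrail: every citation must point at a source that was actually retrieved.
    if not all(isinstance(c, str) and c in allowed_sources for c in citations):
        return FALLBACK
    return Answer(text.strip(), tuple(citations))


retrieved = {"shipping#0"}
good = parse_answer(
    '{"text": "Shipping takes 3 to 5 business days.", "citations": ["shipping#0"]}',
    retrieved,
)
bad = parse_answer('{"text": "It ships overnight.", "citations": ["made-up#9"]}', retrieved)
```

Here `good` passes through unchanged, while `bad` cites a source that was never retrieved and is replaced by the fallback with `needs_human_review` set, which is where a human-in-the-loop step could pick it up.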
It depends on scope, but a focused AI feature — say a RAG chatbot or a generation flow on top of an existing app — often ships in around 2 to 5 weeks. Pricing is quoted per project once the scope is clear rather than as a fixed rate, so the first step is a short call to define the use case, the data involved, and how reliability will be measured.
Yes — most AI work Ramón does sits on top of an existing product rather than starting from scratch. Because he works full-stack across React, Next.js, React Native and the backend, he can wire an LLM feature end to end: data and retrieval, the API layer, and the web or mobile UI, without coordinating separate contractors.
More than a decade of shipping product across web, mobile and AI has left me with something more valuable than a stack: judgment. If your team is stuck on a technical decision, is evaluating a stack, or wants a second opinion before sinking months into a direction, let's talk.