Project Overview
In this project, I built an e-commerce chatbot powered by RAG that can both answer FAQ-style questions and help users navigate a product catalog. The bot sits behind a simple Streamlit app interface (our UI) where a user types a query. Under the hood, the system has two knowledge sources: product FAQs and a product database. I compiled common product-related FAQs into a text document and indexed them using LangChain and ChromaDB (a vector database). Meanwhile, I scraped a sample of product data from Flipkart (product names, prices, categories, etc.) and loaded it into a local SQLite database for quick querying. The idea is that if the user asks a factual question (e.g. “What is the return policy?”), the bot will retrieve the answer from the indexed FAQ document. If the user asks to find or compare products (e.g. “Show me red sneakers under $50”), the bot will query the SQLite product DB to find matching items.
To summarize the data workflow: I first performed offline indexing for the FAQs – splitting the document into chunks and embedding them (using a model like MiniLM), then storing those embeddings in a Chroma vector store. ChromaDB is an open-source, Python-native vector database that integrates cleanly with LangChain workflows.
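To make the indexing step concrete, here is a minimal sketch of what it could look like, assuming the FAQs live in a plain-text file. The file path, chunk sizes, and the all-MiniLM-L6-v2 model name are illustrative choices, and LangChain import paths vary slightly between versions:

```python
# Offline indexing sketch: split the FAQ document into chunks, embed them
# with a MiniLM sentence-transformer, and persist the vectors in Chroma.
# File path, chunk sizes, and model name are illustrative assumptions.
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load and chunk the FAQ document
docs = TextLoader("data/faq.txt").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# Embed the chunks and store them in a persistent Chroma collection
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_db")
```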
The routing logic is a key part of the design. After the user submits a question, a custom router.py module inspects the query. It uses simple keyword checks or a small classification chain (following the LangChain routing pattern; see python.langchain.com) to decide: is this question about browsing/searching products, or is it a general FAQ? If it's product-related, the router sends the query to the SQL workflow; if it's informational, it goes to the RAG/FAQ workflow. This ensures the chatbot "knows" whether to run a database search or a retrieval-augmented chain depending on the intent.
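A minimal keyword-based version of that check might look like the following; the keyword list and function name are illustrative, not the exact contents of router.py:

```python
# Sketch of a keyword-based intent router: decide whether a query should go
# to the SQL product lookup or the RAG/FAQ chain. Keyword list is illustrative.
import re

PRODUCT_KEYWORDS = {"price", "under", "below", "cheapest", "buy", "show me"}

def route(query: str) -> str:
    """Return 'sql' for catalog/browsing queries, 'faq' for everything else."""
    text = query.lower()
    mentions_number = re.search(r"\d+", text) is not None          # e.g. "under 50000"
    mentions_keyword = any(kw in text for kw in PRODUCT_KEYWORDS)
    return "sql" if (mentions_number or mentions_keyword) else "faq"

print(route("Show me red sneakers under $50"))   # -> sql
print(route("What is the return policy?"))       # -> faq
```

A small classifier chain (an LLM call that labels the intent) could later replace the keyword check without touching the rest of the pipeline.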
At query time, the chatbot converts the user question into an embedding, retrieves the most similar FAQ chunks from Chroma, and appends them to the user’s prompt. The combined prompt goes into the LLM to generate the answer. For product queries, the bot runs SQL queries on the SQLite data (like filtering by category or price) and returns formatted product info.
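A query-time sketch under the same assumptions (persisted Chroma store, MiniLM embeddings, and an OpenAI chat model via LangChain; the prompt wording and model name are illustrative):

```python
# Query-time RAG sketch: retrieve the most similar FAQ chunks from Chroma and
# prepend them to the prompt sent to the LLM. Prompt text and model name are
# illustrative, not the exact template used in faq.py.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma(persist_directory="chroma_db", embedding_function=embeddings)

question = "What is the return policy?"
top_chunks = vectorstore.similarity_search(question, k=3)          # retrieve
context = "\n\n".join(doc.page_content for doc in top_chunks)

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
answer = ChatOpenAI(model="gpt-4o-mini").invoke(prompt).content    # generate
print(answer)
```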
Introduction to RAG
In simple terms, RAG means giving a language model an “open book” to look things up in while it answers a question. First you ingest or index a set of documents (e.g. news articles, PDFs, your own knowledge base), converting them into embeddings and storing them in a vector database. Then at query time, when a user asks something, the system retrieves the most relevant documents (based on vector similarity) and passes that context along with the question to the LLM to generate an answer. In practice this often means taking a few top-matching text chunks and prepending them to the prompt sent to the model. The result is an answer that’s grounded in real data rather than purely the model’s internal “imagination.”
Researchers describe RAG as a way to “ground” LLM outputs in actual facts by fetching relevant information from an external knowledge base. For example, instead of trusting the model’s internal knowledge (which might be outdated or incomplete), we let it see fresh content. RAG can reduce hallucinations because the model answers “with the most current, reliable facts” instead of making things up.
Tech Stack
- Python 3: The glue language for everything (scraping, database, ML, etc.).
- Streamlit: For building the web front-end. I used Streamlit’s chat components to create the interactive UI so users can type questions and see bot replies in real time.
- LangChain: To orchestrate the RAG pipeline. LangChain provides utilities for splitting documents, creating vectorstores, and chaining LLM prompts. I used LangChain’s document loaders and embedding interfaces during indexing.
- ChromaDB: The vector database for storing FAQ embeddings. Chroma is open-source and doesn’t require external credentials, so it was easy to embed into the project. We used it to quickly retrieve FAQ chunks similar to the query.
- SQLite3: A lightweight relational database for product info. After scraping Flipkart, I dumped the product details (name, price, link, etc.) into a SQLite file. The sql.py module runs queries against this DB to fetch product recommendations. SQLite’s ease of use makes it ideal for small demo databases.
- (Additional tools: BeautifulSoup/requests for scraping Flipkart, and OpenAI’s GPT via LangChain for LLM inference.)
Using this stack, the flow works like this: Streamlit collects the user’s text ➔ router.py classifies it ➔ either faq.py (RAG with Chroma + LLM) or sql.py (SQL lookup) is called ➔ the answer is sent back to Streamlit and displayed.
Architecture and Routing Logic
The overall architecture is modular. The main interface (main.py) is the Streamlit app. It handles user input and displays chat messages. When the user enters a query, main.py passes it to the router (router.py). The router’s logic is quite simple: for example, it might check if the question contains product-related keywords (“price”, “under”, “category”) or matches a regex for product queries. If so, it treats it as a catalog/navigation request; otherwise, it assumes it’s an informational FAQ. (Conceptually this follows LangChain’s idea of a router chain, which uses the output of one step to decide the next.)
- If it’s a product query, router.py calls a function in sql.py. That module connects to the SQLite DB and runs a SQL query like SELECT * FROM products WHERE category=? AND price<?. The results (product names, prices, maybe image URLs) are formatted into a user-friendly response. For example, if you ask “show me laptops under 50000”, the bot might list matching laptops with links. This is all deterministic SQL logic (a sketch of such a lookup function appears after this list).
- If it’s an FAQ, router.py calls the RAG pipeline in faq.py. In faq.py, I use LangChain to load the Chroma vectorstore containing the FAQ embeddings. The module runs a similarity search with the user question to retrieve, say, the top 3 relevant FAQ snippets. These snippets are then combined with the user’s question as context and sent to the LLM (via LangChain’s chain/run interface). The LLM produces a concise answer that’s grounded in the retrieved text. This two-step process (retrieve then generate) matches the RAG approach.
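Below is a sketch of what one such lookup function in sql.py could look like; the table name, column names, and function signature are assumptions rather than the real schema:

```python
# Hypothetical product lookup against the scraped SQLite database.
# Table and column names (products, title, price, link) are assumptions.
import sqlite3

def search_products(db_path: str, category: str, max_price: float, limit: int = 5) -> str:
    """Return a formatted list of up to `limit` products in a category under a price cap."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT title, price, link FROM products "
            "WHERE category = ? AND price < ? ORDER BY price LIMIT ?",
            (category, max_price, limit),
        ).fetchall()
    finally:
        conn.close()
    # Format the rows into a user-friendly reply
    return "\n".join(f"{title} - {price} ({link})" for title, price, link in rows)

# Example mirroring "show me laptops under 50000":
# print(search_products("products.db", "laptop", 50000))
```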
In practice, the routing logic ensures smooth user experience: I don’t have to rely on the LLM to figure out navigation or filtering intent. By preprocessing the question type, the bot avoids irrelevant retrievals or missing out on database queries. This hybrid design blends the best of both worlds – structured product lookups and flexible FAQ answering. A simplified diagram (not shown here) would depict main.py at the top calling either the SQL flow or the RAG flow and then returning the output.
Key Files and Code Structure
The code is organized into separate modules, each with a clear role:
- main.py: The Streamlit entry point. In my code, I set up the chat interface here (text input, chat history). This file imports the router and handles new user messages: on each submit, main.py calls the router with the query and then displays the returned answer. It also manages Streamlit session state so that the chat history persists. Essentially, main.py ties together the UI and backend logic (a stripped-down sketch of this wiring appears after this list).
- faq.py: The FAQ/RAG workflow. This module handles all retrieval-augmented generation. It initializes (or loads) the Chroma vectorstore that was built from the FAQ document. When given a question, it runs a vector similarity search to get relevant FAQ chunks. It then constructs a prompt (using a template) that includes the question and those chunks as context, and calls the LLM (via LangChain) to generate an answer. Any prompt engineering (instructions to the model) happens here. Finally, it returns the LLM’s answer to main.py.
- sql.py: The product database query module. It opens a connection to the SQLite file (containing data scraped from Flipkart) and defines functions to query it. For example, there are functions to search by keyword, filter by category, or retrieve details of a specific product ID. When the router determines the user wants product info, it invokes one of these functions with parameters (extracted from the user query). The results (often a few product rows) are formatted into text (or a structured list) and returned.
- router.py: The query router. This file contains logic to decide the question type. In my implementation, it examines the text for certain keywords or patterns. If the query looks like a product search (e.g. it mentions a number, a price range, or product names), it routes to sql.py; otherwise it defaults to faq.py. One could also implement this as a LangChain RouterChain (classifier chain) – for instance, using a small LLM call to label the intent. Either way, router.py returns the output from the chosen module so main.py can display it.
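To show how these modules connect, here is a stripped-down sketch of the main.py wiring using Streamlit’s chat components; the imported function names are hypothetical stand-ins for the real interfaces of router.py, faq.py, and sql.py:

```python
# Stripped-down main.py sketch: a Streamlit chat UI that dispatches each query
# through the router. The imported function names are hypothetical stand-ins.
import streamlit as st
from router import route            # hypothetical: returns "sql" or "faq"
from faq import ask_faq             # hypothetical: RAG answer for a question
from sql import search_from_query   # hypothetical: SQL lookup from free text

st.title("E-commerce Chatbot")

# Keep the chat history across Streamlit reruns
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

# Handle a new user message: route it, call the right backend, display the answer
if query := st.chat_input("Ask about products or policies..."):
    st.chat_message("user").write(query)
    answer = search_from_query(query) if route(query) == "sql" else ask_faq(query)
    st.chat_message("assistant").write(answer)
    st.session_state.messages += [
        {"role": "user", "content": query},
        {"role": "assistant", "content": answer},
    ]
```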
By separating concerns into these files, the code stays clean and maintainable. Each file has a single responsibility, and the flow is easy to follow: the user query is handed off from the UI to the router, then to the appropriate service.
At this stage, the core chatbot functionality is in place: it can answer FAQs and run product searches dynamically. Future work might involve deploying the app or optimizing the chains, but the design and implementation above cover the main technical challenges and architecture without diving into deployment details.
