Step 1: Import Necessary Libraries and Set Up Your Environment
The first step in building a RAG-enhanced chatbot involves importing the required libraries and setting up your development environment.
For this example, we will be using Python and Google Colab, which provides a free and accessible environment for running Python code.
First, we need to make sure the required libraries are installed. This is done via the following code:
!pip install -q openai tenacity pandas
This command installs OpenAI, Tenacity, and Pandas, the three libraries the chatbot depends on. Each one plays a specific role:
- OpenAI: Provides access to powerful language models like GPT-3.5 Turbo.
- Tenacity: Simplifies adding retry logic to API calls, ensuring robust performance.
- Pandas: Enables data manipulation and analysis, particularly for handling CSV files.
Next, import the libraries:
import openai
import os
import json
import pandas as pd
from tenacity import retry, wait_random_exponential, stop_after_attempt
Run the install command before the imports. If a library is not installed yet, its import statement fails with a ModuleNotFoundError.
Step 2: Extract Data from Google Drive
As mentioned previously, the RAG technique uses a local data file to help answer the user's queries, so the next step is to access the data stored in Google Drive.
This involves the following code:
from google.colab import drive
drive.mount('/content/drive')
After you run this, Google Colab will prompt you to grant permission to access your files. Accept the prompt; otherwise the mount fails and you won't be able to continue.
Then add the following code block to switch to the folder that holds the dataset:
os.chdir('/content/drive/MyDrive/Upgrade/ShopAssist_Data')
!ls
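Mounting Google Drive makes your Drive files available under /content/drive. The os.chdir call then switches the working directory to the "ShopAssist_Data" folder, so the dataset can be opened by its file name alone, and !ls lists the folder contents so you can confirm the file is there.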
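The steps so far make the Drive folder the working directory, but the later steps also expect the dataset in a pandas DataFrame named df. A minimal sketch of that loading step follows. The file name laptop_data.csv and the load_dataset helper are assumptions for illustration, so substitute the name that !ls prints for your own file.

```python
import pandas as pd

def load_dataset(csv_path):
    """Load the laptop dataset from a CSV file into a DataFrame."""
    df = pd.read_csv(csv_path)
    # Strip stray whitespace from column names so lookups such as df["Price"] work
    df.columns = df.columns.str.strip()
    return df

# Hypothetical file name: replace it with the CSV that `!ls` lists in your folder
# df = load_dataset("laptop_data.csv")
```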
Step 3: Data transformation
Once the data is loaded, a bit of cleaning is required: strip units and special characters from the values so that each column has a consistent numeric format.
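def clean_data(df):
    """Extract numeric values from columns with units (e.g., "8GB" -> 8)."""
    # expand=False makes str.extract return a Series rather than a one-column DataFrame
    df["RAM Size"] = df["RAM Size"].str.extract(r'(\d+)', expand=False).astype(float)
    df["Clock Speed"] = df["Clock Speed"].str.extract(r'([\d\.]+)', expand=False).astype(float)
    df["Laptop Weight"] = df["Laptop Weight"].str.extract(r'([\d\.]+)', expand=False).astype(float)
    # Remove thousands separators before converting prices to numbers
    df["Price"] = df["Price"].str.replace(',', '').astype(float)
    return df

# df must already hold the dataset loaded from the Drive folder (e.g., with pd.read_csv)
df = clean_data(df)
df.head()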
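To see what the regular expressions in the cleaning step do, the sketch below applies the same kind of extraction to a tiny hand-made table. The sample values are invented for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Invented sample rows that mimic the raw column formats
sample = pd.DataFrame({
    "RAM Size": ["8GB", "16 GB"],
    "Clock Speed": ["2.4 GHz", "3.1GHz"],
    "Price": ["35,000", "120,500"],
})

# Pull out the first run of digits (or digits and dots) and convert to float
sample["RAM Size"] = sample["RAM Size"].str.extract(r"(\d+)", expand=False).astype(float)
sample["Clock Speed"] = sample["Clock Speed"].str.extract(r"([\d.]+)", expand=False).astype(float)
# Drop thousands separators before converting prices
sample["Price"] = sample["Price"].str.replace(",", "", regex=False).astype(float)

print(sample.dtypes)
```

After the conversion every column holds floating-point numbers, so comparisons such as Price <= budget behave numerically rather than as string comparisons.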
Step 4: Connect to OpenAI using key
This is the core feature that makes the chatbot work. The OpenAI API is where most of the magic happens. The first step is connecting your code to it:
# Read the OpenAI API key from a text file in the current working directory
with open("OpenAI_API_Key.txt", "r", encoding="utf-8") as key_file:
    openai.api_key = key_file.read().strip()
os.environ["OPENAI_API_KEY"] = openai.api_key
client = openai
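Make sure the file OpenAI_API_Key.txt is in the current working directory (the Drive folder selected in Step 2). If it is missing, open() raises a FileNotFoundError, and every later API call will fail without a valid key.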
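Keeping the key in a text file works, but it is easy to misplace the file or to share it by accident. One possible alternative, sketched below, is to prefer an environment variable and fall back to the file. The helper name load_api_key is an assumption for illustration and is not part of the OpenAI library.

```python
import os

def load_api_key(path="OpenAI_API_Key.txt", env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, or from a text file as a fallback."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"No key in {env_var} and no file at {path!r}; check the working directory."
        )
    with open(path, "r", encoding="utf-8") as key_file:
        return key_file.read().strip()
```

The result could then be assigned in the same way as before, for example openai.api_key = load_api_key().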
Step 5: Create the LLM function
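Next, we wrap the API call in a single helper function. It sends the conversation to the model and, through the Tenacity decorator, retries failed requests with randomized exponential backoff. Here is the sample code: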
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
def get_chat_response(conversation):
    """Send conversation to OpenAI API and retrieve chatbot response."""
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=conversation,
        # JSON mode: the messages must mention JSON explicitly
        response_format={"type": "json_object"},
        seed=1234,
        n=1  # Only the first choice is used below, so request a single response
    )
    return response.choices[0].message.content  # Text of the first (and only) choice
The LLM will help with extracting relevant information from the user's prompt and providing a reply based on it. Because the request asks for a JSON object, the function returns the reply as a JSON string that the next step can parse.
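The retry decorator hides the retry loop itself. To make the idea concrete, here is a simplified sketch of retrying with randomized exponential backoff that uses only the standard library. It approximates the behaviour and is not Tenacity's actual implementation; the call_with_backoff helper and its parameters are hypothetical.

```python
import random
import time

def call_with_backoff(func, max_attempts=6, base=1.0, cap=20.0, sleep=time.sleep):
    """Call func until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error
            # The upper bound doubles on each attempt but never exceeds cap
            upper = min(cap, base * 2 ** (attempt - 1))
            # A random wait spreads out retries from many clients
            sleep(random.uniform(0, upper))
```

The random wait is what keeps many clients from retrying at the same instant after a shared outage, which is the main reason to prefer it over a fixed delay.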
Step 6: Write a function to extract user preferences from user input
A user's request can touch on many different fields, such as the type of memory, the CPU model, clock speed, the desired laptop brand, and so on. A function that pulls these preferences out of the user's message makes the chatbot more effective at determining an ideal choice for your client. To do that, we give the model a system prompt that names the fields to extract, so the bot has the proper context and returns structured JSON that the rest of the code can use:
def extract_user_preferences(user_message):
    """Extracts user intent (GPU, RAM, budget, display quality, etc.) from user input."""
    # Plain string (no f prefix needed): the prompt contains no placeholders
    system_prompt = """Extract the following preferences from the user input and return them in a valid JSON format:
- GPU Model (e.g., NVIDIA RTX 3070, GTX 1650, AMD Radeon RX 6800)
- Display Type (e.g., OLED, IPS, LCD)
- RAM Size (numeric value in GB)
- Storage Type (SSD or HDD, optional)
- Budget (numeric value in USD)
If any preference is missing, return an empty string for that key."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message}
    ]
    extracted_preferences = get_chat_response(messages)
    try:
        preferences = json.loads(extracted_preferences)
    except json.JSONDecodeError:
        # Fall back to an empty dict if the reply is not valid JSON
        preferences = {}
    return preferences
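Because the prompt asks the model to return an empty string for missing fields, and models sometimes return numbers as text, it can help to normalize the parsed dictionary before using it. The following is a sketch under those assumptions; normalize_preferences is a hypothetical helper, not part of the original code.

```python
import json

NUMERIC_KEYS = ("RAM Size", "Budget")

def normalize_preferences(raw_json):
    """Parse the model's JSON reply, dropping empty fields and coercing numbers."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    cleaned = {}
    for key, value in data.items():
        if value is None or value == "":
            continue  # Treat empty strings and nulls as "not specified"
        if key in NUMERIC_KEYS:
            try:
                # Accept values such as "1,500" or 16
                value = float(str(value).replace(",", ""))
            except ValueError:
                continue  # Skip numbers that cannot be parsed
        cleaned[key] = value
    return cleaned

print(normalize_preferences('{"GPU Model": "", "RAM Size": "16", "Budget": "1,500"}'))
# prints {'RAM Size': 16.0, 'Budget': 1500.0}
```

Dropping the empty fields means later code can rely on dictionary defaults instead of checking for empty strings everywhere.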
Step 7: Coding the actual chatbot functionality
We are almost done. This is where the core work of the chatbot happens: the function below filters the laptop dataset down to the models that match the user's preferences.
def recommend_laptops(user_preferences, df):
    """Finds the best laptop matches based on user preferences."""
    if not user_preferences:
        return "Unable to extract user preferences. Please provide more details."

    def map_category(user_pref, mapping, default):
        if user_pref and str(user_pref).lower() in mapping:
            return mapping[str(user_pref).lower()]
        return default

    def to_number(value, default):
        # Step 6 returns "" for missing fields, so fall back to the default
        try:
            return float(str(value).replace(',', '').replace('$', ''))
        except ValueError:
            return default

    # Define a mapping for categorical values to numerical equivalents
    speed_mapping = {"low": 1.5, "medium": 2.5, "high": 3.5}  # GHz
    weight_mapping = {"light": 1.0, "medium": 2.0, "heavy": 3.0}  # Laptop weight in kg

    df = df.copy()  # Work on a copy so the caller's DataFrame is not modified
    df["Price"] = df["Price"].astype(str).str.replace(',', '').astype(float)
    # Ensure numeric values for comparison (handling missing values)
    df["Clock Speed"] = pd.to_numeric(df["Clock Speed"], errors='coerce')
    df["RAM Size"] = pd.to_numeric(df["RAM Size"], errors='coerce')
    df["Laptop Weight"] = pd.to_numeric(df["Laptop Weight"], errors='coerce')

    # Convert user inputs to numeric values; the keys match the Step 6 prompt
    budget = to_number(user_preferences.get("Budget"), 25000)
    ram_size = to_number(user_preferences.get("RAM Size"), 8)
    # "Processing speed" and "Portability" are not requested by the Step 6 prompt,
    # so the defaults apply unless that prompt is extended to extract them
    clock_speed = map_category(user_preferences.get("Processing speed"), speed_mapping, 1.5)
    weight = map_category(user_preferences.get("Portability"), weight_mapping, 2.0)
    # An empty search string matches every row
    gpu = str(user_preferences.get("GPU Model") or "").lower()
    display = str(user_preferences.get("Display Type") or "").lower()

    filtered_df = df[
        (df["Graphics Processor"].str.lower().str.contains(gpu, na=False, regex=False)) &
        (df["Display Type"].str.lower().str.contains(display, na=False, regex=False)) &
        (df["Laptop Weight"] <= weight) &
        (df["RAM Size"] >= ram_size) &
        (df["Clock Speed"] >= clock_speed) &
        (df["Price"] <= budget)
    ]
    if filtered_df.empty:
        return "No laptops match your preferences. Try adjusting your filters."
    # Return the top rows as a DataFrame so Step 8 can iterate over them
    return filtered_df.head()
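The filter combines several boolean masks with &, keeping only the rows where every condition holds. The sketch below shows the same pattern on a small invented table; the rows and thresholds are made up for illustration.

```python
import pandas as pd

laptops = pd.DataFrame({
    "Brand": ["A", "B", "C"],
    "Graphics Processor": ["NVIDIA RTX 3070", "Intel UHD", "NVIDIA GTX 1650"],
    "RAM Size": [16.0, 8.0, 8.0],
    "Price": [1800.0, 600.0, 900.0],
})

gpu_keyword = "nvidia"  # An empty string would match every row
min_ram = 8
budget = 1000

# Each comparison yields a boolean Series; & keeps rows where all are True
mask = (
    laptops["Graphics Processor"].str.lower().str.contains(gpu_keyword, na=False, regex=False)
    & (laptops["RAM Size"] >= min_ram)
    & (laptops["Price"] <= budget)
)
matches = laptops[mask]
print(matches["Brand"].tolist())  # ['C']
```

Laptop A fails the budget check and laptop B fails the GPU check, so only laptop C remains. Each condition needs its own parentheses because & binds more tightly than the comparison operators.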
Step 8: Finalize function for better responses
The goal of this step is to present the extracted preferences and the recommendations in a human-readable format:
def format_chatbot_response(preferences, recommendations):
    """Formats chatbot output for better readability."""
    # Fix Budget Display (Ensure Correct Format)
    try:
        budget = f"${float(preferences.get('Budget')):,.0f}"  # Adds comma formatting (e.g., 2,000)
    except (TypeError, ValueError):
        budget = "N/A"  # Budget missing or not numeric
    # Display Extracted Preferences in a Readable Format
    formatted_preferences = f"""
Your Preferences:
- GPU Model: {preferences.get('GPU Model') or 'Not Specified'}
- Display Type: {preferences.get('Display Type') or 'Not Specified'}
- RAM Size: {preferences.get('RAM Size') or 'Not Specified'} GB
- Storage Type: {preferences.get('Storage Type') or 'Not Specified'}
- Budget: {budget}
"""
    # If recommendations is a plain message (e.g., no matches), append it as-is
    if isinstance(recommendations, str):
        return formatted_preferences + f"\n{recommendations}"
    # Display Recommendations in a Structured Format
    formatted_recommendations = """Based on your preferences, I recommend:
"""
    for _, row in recommendations.iterrows():
        formatted_recommendations += f"""
**{row['Brand']} {row['Model Name']}**
- Processor: {row['Core']} {row['CPU Manufacturer']} @ {row['Clock Speed']} GHz
- Display: {row['Display Type']} ({row['Display Size']}")
- Graphics: {row['Graphics Processor']}
- Battery Life: {row['Average Battery Life']} hours
- Weight: {row['Laptop Weight']} kg
- Storage: {row['Storage Type']}
- Special Features: {row['Special Features']}
- Price: ${row['Price']:,.2f}
- Description: {str(row['Description'])[:100]}...
"""
    return formatted_preferences + formatted_recommendations
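The steps above define the individual functions but do not show them working together. One way to wire them up is sketched below. To keep the sketch self-contained, the three step functions are passed in as arguments and replaced by simple stand-ins in the demo; chat_turn and the stand-ins are hypothetical and not part of the original code.

```python
def chat_turn(user_message, df, extract, recommend, format_response):
    """Run one chatbot turn: extract preferences, find matches, format the reply."""
    preferences = extract(user_message)
    recommendations = recommend(preferences, df)
    return format_response(preferences, recommendations)

# Stand-ins so the sketch runs without an API key or the real dataset
def fake_extract(message):
    return {"Budget": 1000} if "1000" in message else {}

def fake_recommend(preferences, rows):
    return [row for row in rows if row["Price"] <= preferences.get("Budget", 0)]

def fake_format(preferences, recommendations):
    names = ", ".join(row["Model Name"] for row in recommendations) or "none"
    return f"Budget {preferences.get('Budget', 'N/A')}: {names}"

rows = [{"Model Name": "X1", "Price": 900}, {"Model Name": "Z9", "Price": 1500}]
reply = chat_turn("A laptop under 1000", rows, fake_extract, fake_recommend, fake_format)
print(reply)  # Budget 1000: X1
```

In the notebook, the call would instead pass extract_user_preferences, recommend_laptops, and format_chatbot_response along with the cleaned DataFrame.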