Fine-tuning an OpenAI model - AI Advisor integrated with custom data

After exploring what RAG brought to the table for our custom chatbots, I decided to take a detour into the world of fine-tuning and the Mixture of Experts approach. Why? Because simplicity is key. Let’s dive into how we can train our custom “budgeting expert” and what we can get from this approach.

Link to RAG model - AI Advisor integrated with custom data.

Index

Project definition
1.1 Purpose and Scope
1.2 Background and Objectives
1.3 Product Description
1.3.2 Key Features
Solution
2.1 Scope phasing
2.2 Procedure
Results
Conclusions
Recommendation
APPENDIX 1: Comparative Analysis Table RAG vs. Fine-Tuned Model

1. Project definition

1.1 Purpose and Scope

The primary purpose of this project is to enhance the quality of information provided by our chatbot, focusing on creating a less complex yet effective solution. Initially, a Retrieval Augmented Generation (RAG) model was developed, but its escalating complexity led to exploring the Mixture of Experts (MoE) approach. The scope encompasses the development and fine-tuning of a specialized Large Language Model (LLM) within the MoE structure, such as a “Budgeting Expert.” This single expert model will serve as a test case to evaluate the MoE approach’s effectiveness in delivering high-quality, accurate responses compared to the previous RAG model and the generalist base model. The project aims to achieve a balance between system simplicity and information quality, aiming to enhance the chatbot’s performance with a more manageable and efficient model architecture.

Hypothesis: the creation of topic-specific experts within a MoE will enhance data quality in chatbot interactions. This approach is expected to provide accurate responses tailored to user requirements while maintaining a stable level of system complexity, even as more experts are added for diverse topics.

1.2 Background and Objectives

Problem Statement: The primary objective is to elevate the quality of client interactions with our chatbot by ensuring the provision of accurate, reliable, and data-driven responses. At present, a major impediment to achieving this goal is the AI’s propensity for ‘hallucinating’ or generating incorrect information. In our initial approach, we developed a RAG model to mitigate this issue. Nevertheless, as the project scaled up to accommodate a wide range of data types, encompassing both structured and unstructured forms across various topics, the RAG model’s complexity escalated significantly. This complexity is further amplified when the RAG model is tasked with handling user queries differently each time, depending on the type of document retrieved. For instance, a query might be addressed one way when pulling information from structured data such as CSV, and another way when referring to unstructured text. This dynamic approach adds layers of complexity to the RAG model as it needs to be adept at processing and responding to varied data types in contextually relevant ways. In light of these challenges, we are proposing a strategy: fine-tuning a LLM across specific knowledge domains. Additionally, we intend to implement an intelligent question-routing algorithm called orchestrator, which will discern the subject matter of each query and steer it towards the most suitable, specialized LLM. This method, known as the MoE approach, is designed to enhance the precision and reliability of the chatbot’s responses, ultimately fostering greater client trust and satisfaction.
Note: Additionally, we can maintain the utilization of the RAG model by integrating it as one of the experts within the MoE.
Objectives:
- To enhance the quality of the chatbot’s responses by fine-tuning a single expert model within the MoE.
- To conduct a comparative analysis of the fine-tuned expert model, the original base model, and the RAG model, in order to validate the effectiveness of the MoE approach.

1.3 Product Description

Our product features a cutting-edge chatbot system, utilizing a MoE. This innovative approach marks a significant departure from the RAG method. The MoE excels in delivering customized and adaptable solutions for chatbot knowledge requirements. It leverages specialized LLMs across diverse knowledge domains, ensuring precise and relevant responses to user inquiries.

1.3.2 Key Features

Specialized Expert Modules: The system comprises fine-tuned LLMs, each an expert in a specific domain like personal finance, budgeting or student loans.
Scalability and Flexibility: Designed to be scalable, the system can easily accommodate additional expert modules as new needs emerge.

2. Solution

Our strategy involves refining a specialized LLM to function as an expert model, thereby enhancing the response quality of our chatbot. We plan to undertake a thorough comparative analysis, measuring the performance of this fine-tuned expert model against both the current base model and the RAG. This targeted method is designed to directly boost both the accuracy and reliability of the chatbot’s responses. In this section, we will detail our phased approach and the specific procedures for implementing our proposed solution.

2.1 Scope phasing

This project encompasses a series of systematic steps aimed at enhancing the chatbot’s performance through a specialized expert model within the MoE. These steps are designed to ensure the development, integration, and evaluation of the model are executed with precision and effectiveness. The process includes:

Dataset Creation
Dataset Preparation
Model Fine-Tuning
Model Integration
Comparison of Models

2.2 Procedure

To validate our hypothesis, we plan to fine-tune a LLM on a specific topic, create an expert model, and then compare its responses with those of the base model. Specifically, we intend to specialize the LLM in the area of budgeting, thus developing what we call the “Budgeting Expert”. This approach is clarified in the following diagram, which details the structure of the MoE. This structure underpins the customized architecture of our model, illustrating how the different modules within the LLM are optimized to handle different aspects from financial analysis to more complex analyses.

The procedure involves the following steps:

Dataset Creation: We select a specific webpage for scraping to generate a dataset consisting of question-and-answer pairs. Leveraging the scraper developed for the RAG model, we construct a dataset in CSV format containing 1,000 such pairs. To create the questions and answers for this dataset, we implement a logic that utilizes OpenAI. Firstly, OpenAI is employed to generate questions about the content obtained from the scraping process. Subsequently, these questions are fed back into OpenAI, which then generates the corresponding answers. This approach ensures a dynamic and relevant set of question-and-answer pairs based on the scraped content.
Dataset Preparation: OpenAI requires a specific format for training data, namely a JSON file where each entry comprises a system prompt, the user question, and the assistant’s response.
Model Fine-Tuning: We fine-tune an OpenAI model, specifically the gpt-3.5-turbo-1106, utilizing OpenAI’s APIs. The fine-tuning process for 1,000 data points takes approximately two hours, resulting in the fine-tuned model identified as ‘ft:gpt-3.5-turbo-1106:personal::8hpLGpEs’.
Model Integration: To deploy the new model, we replace the reference in our system. Instead of using ‘gpt-4-1106-preview’, we now integrate ‘ft:gpt-3.5-turbo-1106:personal::8hpLGpEs’.

3. Results

To obtain the desired results, we prepared the RAG model by uploading the content scraped from a webpage. For the RAG and the base model, we utilized the “gpt-4-1106-preview” version of OpenAI’s model. Our focus shifted solely to questions related to the content of the webpage, specifically on budgeting. For the fine-tuning model, we only used the content from the same webpage that was utilized for the RAG model. This fine-tuned model was based on the “gpt-3.5-turbo-1106” version of OpenAI’s model. Our comparative analysis then included three variants: the base model, the RAG model, and the specifically fine-tuned model. By adopting this approach, we were able to assess and compare the performance and outputs of each model variant. Accompanying this explanation, we have included a table comparing the answers between the base model, the RAG, and the fine-tuned model. Additionally, the table documents the time each model took to generate each answer. It’s important to note that the response time is subjective as it depends on factors like internet speed, the state of OpenAI’s servers, and the length of the generated answer. However, the key aspect of this addition is to observe the tendency of each model in terms of response time, providing further insight into their operational efficiency.

Note: In this analysis, we only fine-tuned one model of the MoE. Therefore, the documented time reflects only the time taken for the expert model to generate an answer. In future studies, with the implementation of the orchestrator, it will be necessary to include the time taken by the orchestrator to select the appropriate expert model to respond to a user’s question. This will add another layer to our understanding of the operational efficiency of these models, particularly in a multi-expert system.

Base model	RAG	Fine-tuned
give me the 5 Budgeting Misconceptions You Should Stop Believing
Income Too Low to Budget: Every income level can benefit from a budget; it's not just for high earners. Budgeting Restricts Freedom: A budget actually gives financial freedom by identifying where money can be saved or spent. Budgeting Is Time-Consuming: Modern tools and apps can simplify the process, making it quick and efficient. Only for the Financially Struggling: Budgeting is a tool for everyone to reach financial goals faster, regardless of their economic status. Budget Equals Deprivation: It's about smarter spending, not cutting out all enjoyment. For personalized budgeting advice, consider speaking with a financial advisor. Thanks!	It seems you're looking for advice on budgeting misconceptions. Here are five common misconceptions you should be aware of: Budgets are restrictive: They actually provide financial freedom by helping you control spending. You don't need a budget if you make a lot of money: High earnings still require management to avoid unnecessary debt. Budgeting is too time-consuming: Modern tools and apps simplify and expedite the budgeting process. A budget is a one-time setup: Budgets need regular updates as your financial situation changes. I can't have fun if I'm budgeting: Budgeting includes setting aside money for entertainment too. If you encounter obstacles or want to learn more effective budgeting strategies, consider meeting with a financial advisor.	The 5 Budgeting Misconceptions You Should Stop Believing are: Creating a budget is difficult. Budgets are time-consuming. Why do I need a budget when I can work it all out in my head? Unexpected expenses pop up all the time, so I can't budget for them. I make a high salary, so I don't need a budget.
Time 13s	Time 39s	Time 7s
some misconceptions about budgeting
Budgeting is too restrictive: Some believe it limits the freedom to spend on what you enjoy. It's only for people struggling financially: Many think that budgeting isn't necessary for individuals with a comfortable income. It's time-consuming: The belief is that budgeting requires a lot of time and detailed tracking. You can't budget with irregular income: Those with fluctuating incomes often think they can't effectively budget. A single missed target derails the budget: People sometimes feel that if they exceed their budget once, the whole budget is ruined. Actionable Steps: Create flexible budget categories to allow for personal spending. Regardless of income, set financial goals to benefit from budgeting. Simplify the process with budgeting apps to save time. Adjust budget monthly to account for income changes. Expect fluctuations and adjust your budget as needed to stay on track.	Here are some common misconceptions about budgeting that may hinder financial progress: Budgeting is time-consuming: Initially setting up a budget requires some time, but once established, it's quick to maintain. You must be good at math: Budgeting is more about tracking than complex calculations, and many apps can help. Budgets are restrictive: They're meant to give you control, not limit you, helping prevent overspending. Only people with debt need budgets: Budgets are beneficial for everyone, as they help in building savings and managing finances. Budgeting prevents fun: You can allocate funds for entertainment and leisure, ensuring you enjoy life while being financially responsible. Actionable step: Start by listing your income and expenses to overcome the misconception that budgeting is complex or unnecessary. Use budgeting tools to simplify the process.	One of the misconceptions about budgeting is that it's difficult to predict unexpected expenses, however, a portion of a budget can be set aside for such occasions. Another misconception is that creating a budget is time-consuming, but it can be done quickly after initially organizing finances. Lastly, people may believe that the process of budgeting is complicated, but simple applications are available to assist in creating and following a budget.
Time 27s	Time 28s	Time 5s
What is budgeting?
Budgeting refers to: Creating a plan for how to spend your money. Tracking your income and expenses. Ensuring you always have enough for necessities and savings. Helping you avoid overspending and debt accumulation. Actionable Steps: Determine your income. List your monthly expenses. Set savings and debt repayment goals. Adjust expenses to align with your financial goals. Monitor and update your budget regularly.	Budgeting is the process of creating a plan to manage your money, which allows you to: Determine in advance whether you will have enough money to do the things you need to do or would like to do. Itemize your income and expenses, prioritizing spending and focusing on financial goals. Save for emergencies and plan for future expenses like retirement or buying a home. Stay out of unnecessary debt by ensuring you are not spending more than you earn. Actionable step: Start by listing all sources of income. Categorize and list your monthly expenses. Allocate funds to each category based on priority and goals. Track and review your spending to stay within your budget.	Budgeting is the process of creating a plan for your income and expenses, helping you know how much money is coming in and going out of your accounts each month. It involves itemizing your expected expenses and comparing this against your actual income to determine the best ways to save and to adjust your spending.
Time 9s	Time 32s	Time 5s
good reasons to implement a budget
Financial clarity: Understanding where money goes each month. Savings promotion: Encouraging regular saving habits. Debt reduction: Prioritizing and managing repayments more effectively. Spending control: Curb unnecessary expenditure. Goal alignment: Align spending habits with financial goals. Actionable Steps: Start tracking your income and expenses. Set clear financial goals. Create a budget plan to match your goals. Review and adjust regularly. Consider using budgeting tools or apps for assistance.	Good reasons to implement a budget include: Debt Management: Allocate funds to pay off debts, potentially saving on interest. Savings Growth: Systematically save for retirement, emergencies, or specific purchases. Expense Tracking: Gain clarity on spending habits and identify areas for cost reduction. Financial Goals: Sets a clear path to achieving financial goals like buying a home or vacationing. Peace of Mind: Reduce financial anxiety knowing expenses are covered. Actionable step: List Income/Expenses: Organize your finances by listing all sources of income and expenses. Set Financial Goals: Clearly define what you're aiming to achieve with your budget. Track and Adjust: Regularly review your budget, track your progress, and make necessary adjustments. Consider Tools/Apps: Use budgeting tools or apps for easier management and tracking.	Implementing a budget is important for many reasons, including ensuring that you spend less than you earn, allowing you to create an emergency savings fund, making large purchases such as a home or a car, and focusing on your financial goals to avoid additional debt. Budgeting can also help you better understand and manage your spending patterns and save money for future security or retirement.
Time 17s	Time 43s	Time 5s
How do I budget?
To start budgeting effectively: Assess Income: Determine your monthly income after taxes. Track Expenses: List all monthly expenses, fixed and variable. Set Goals: Decide on your financial goals, e.g., saving or debt repayment. Plan Budget: Allocate funds to necessities, savings, debts, then discretionary spending. Monitor Spending: Regularly check your spending against your budget. Adjust as Needed: Update your budget to reflect changes in income or expenses. For a tailored budget plan, consider consulting with a financial advisor.	To create a budget, follow these steps: List income and expenses: Document all sources of income and every expense, fixed and variable. Establish priorities: Determine essential costs (needs) and nonessential spending (wants). Set financial goals: Identify short-term and long-term goals to guide your budgeting framework. Find savings opportunities: Analyze your regular expenses to identify areas where you can reduce spending. Adjust accordingly: Monitor your spending and compare it to your budgeted amounts, adjusting as needed to stay on track. Use tools: Consider budgeting apps or spreadsheets to keep everything organized. Actionable step: Start by listing down your monthly income and expenses in a spreadsheet or a budgeting app to visualize your current financial situation.	To create a successful budget, you need to outline your financial goals, set a timeline for them, and define which expenses are essential and which are not. This involves a basic, intermediate, or advanced method, using a 50/30/20 rule, zero-based budgeting, a pay-yourself-first strategy, or envelope budgeting, depending on which suits your financial situation and preferences. Do you have any expenses that aren't covering your costs?
Time 13s	Time 31s	Time 4s

4. Conclusions

Enhancement of Chatbot Response Quality: The fine-tuning of a single expert model enhanced the quality of the chatbot’s responses, particularly in the domain of budgeting, aligning with the first objective.
Comparative Analysis: The comparative analysis of the fine-tuned expert model against the original base model and the RAG model demonstrated the MoE approach’s effectiveness. This aligns with the second objective of validating the MoE approach through comparative analysis.
Domain-Specific Knowledge Integration: The fine-tuned expert model displayed a notable alignment with the budgeting content from the selected website, indicating successful integration of domain-specific knowledge.
Time Efficiency in Response Generation: the base model outpaces the RAG model in response time due to the latter’s complex information retrieval from vectorDB. However, the fine-tuned model, GPT-3.5-turbo-1106, surpasses even the base model’s speed, which uses GPT-4-1106-preview, showcasing efficiency improvements through fine-tuning.

5. Recommendation

Based on the conclusions, the following recommendations are proposed:

Dataset Quality Improvement: Refine the dataset used for fine-tuning by removing unnecessary textual elements like “according to the content provided” to streamline and enhance the quality of the responses.
User Engagement Enhancement: Incorporate calls to action in the chatbot’s dataset, such as encouraging users to consult EarnUp services. This addition will enable the chatbot to include these prompts in its responses, potentially increasing user engagement and providing more directed assistance.

APPENDIX 1: Comparative Analysis Table RAG vs. Fine-Tuned Model

Aspect	RAG	Fine-tuning Model
Complexity	Moderate (Requires coding and architectural skills)	High (Demands deep understanding of deep learning, NLP, expertise in data preprocessing, model configuration, and evaluation)
Accuracy	Variable (Depends on domain and task)	High (Enhances domain-specific understanding and predictions)
Domain Specificity	Moderate (May not capture domain-specific patterns as effectively)	High (Can impart domain-specific terminology and nuances)
Up-to-date Responses	High (Ensures updated responses via external documents)	Low (Becomes a fixed snapshot of its training dataset, requires regular retraining for evolving data)
Avoidance of Hallucinations	High (Reduces hallucinations by anchoring in retrieved documents)	Moderate (Reduces hallucinations in domain-specific data but unfamiliar queries may still cause errors)
Cost Considerations	Requires compute power for embedding models and vector databases	Involves significant compute power and costs for training, data acquisition, and maintenance
Dataset Importance	Not specified in the sources	Critical, as the quality and relevance of the dataset significantly impact the effectiveness of fine-tuning
Data Dynamics	Excels in dynamic data environments by continuously querying external sources for up-to-date information.	Models become static snapshots of data and may become outdated in dynamic scenarios.
Model Customization	Focuses on information retrieval but may not adapt linguistic style or domain-specificity based on retrieved info.	Allows adaptation of LLM’s behavior, writing style, and domain knowledge to specific nuances and terminologies.
Suitability for Multiple Tasks	Not specifically designed for task switching.	Efficiency in task switching is untested for LoRA in situations requiring fine-tuning on multiple tasks sequentially.