Learn RAG From Scratch – Python AI Tutorial from a LangChain Engineer

Learn how to implement RAG (Retrieval Augmented Generation) from scratch, straight from a LangChain software engineer. This Python course teaches you how to use RAG to combine your own custom data with the power of Large Language Models (LLMs).

💻 Code: https://github.com/langchain-ai/rag-from-scratch
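
To give a flavor of the pipeline the course builds up, here is a minimal indexing, retrieval, and generation sketch in the spirit of the course notebooks. This is a sketch, not the course's exact code: it assumes an OPENAI_API_KEY is set, import paths follow the langchain 0.1.x-era packages, and the blog URL and model name are illustrative stand-ins.

```python
# Minimal RAG sketch: index a web page, retrieve chunks, generate an answer.
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# 1. Indexing: load, split, embed, store.
docs = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/").load()
splits = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

# 2/3. Retrieval and generation chained together.
prompt = ChatPromptTemplate.from_template(
    "Answer the question based only on this context:\n\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Pass only the text of each chunk, not the full Document repr.
    return "\n\n".join(d.page_content for d in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
    | StrOutputParser()
)
print(rag_chain.invoke("What is task decomposition?"))
```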

If you’re completely new to LangChain and want to learn about some fundamentals, check out our guide for beginners: https://www.freecodecamp.org/news/beginners-guide-to-langchain/

✏️ Course created by Lance Martin, PhD.
Lance on X: https://twitter.com/rlancemartin

⭐️ Course Contents ⭐️
⌨️ (0:00:00) Overview
⌨️ (0:05:53) Indexing
⌨️ (0:10:40) Retrieval
⌨️ (0:15:52) Generation
⌨️ (0:22:14) Query Translation (Multi-Query)
⌨️ (0:28:20) Query Translation (RAG Fusion)
⌨️ (0:33:57) Query Translation (Decomposition)
⌨️ (0:40:31) Query Translation (Step Back)
⌨️ (0:47:24) Query Translation (HyDE)
⌨️ (0:52:07) Routing
⌨️ (0:59:08) Query Construction
⌨️ (1:05:05) Indexing (Multi Representation)
⌨️ (1:11:39) Indexing (RAPTOR)
⌨️ (1:19:19) Indexing (ColBERT)
⌨️ (1:26:32) CRAG
⌨️ (1:44:09) Adaptive RAG
⌨️ (2:12:02) The future of RAG

🎉 Thanks to our Champion and Sponsor supporters:
👾 davthecoder
👾 jedi-or-sith
👾 南宮千影
👾 Agustín Kussrow
👾 Nattira Maneerat
👾 Heather Wcislo
👾 Serhiy Kalinets
👾 Justin Hual
👾 Otis Morgan
👾 Oscar Rahnama

Learn to code for free and get a developer job: https://www.freecodecamp.org

Read hundreds of articles on programming: https://freecodecamp.org/news

Comments (44)

  1. Thanks for your tutorial.

  2. Thank you for the video. This is way better than the vast majority of paid courses.

  3. What's the diagram tool used? Does anyone know?

  4. I want to ask: sometimes you passed the docs as the context, but sometimes you passed the retriever. Is there a reason for this, or when should I pass the retriever instead of the docs themselves? Thanks!
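
    The two patterns being contrasted look roughly like this (a sketch, assuming a `retriever`, `prompt`, `llm`, and `format_docs` built as in the course notebooks):

    ```python
    from langchain_core.runnables import RunnablePassthrough

    question = "What is task decomposition?"

    # Pattern A: retrieve once yourself, then pass the docs in directly.
    docs = retriever.invoke(question)
    (prompt | llm).invoke({"context": format_docs(docs), "question": question})

    # Pattern B: wire the retriever into the chain, so retrieval happens
    # on every invocation as part of the pipeline.
    rag_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
    )
    rag_chain.invoke(question)
    ```

    Pattern A suits cases where the docs are already retrieved or filtered; Pattern B makes retrieval part of a reusable chain.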

  6. Hey, thanks for the helpful video. One minor thing that I think would make the tutorial more useful is using a laser pointer or cursor to point at what you're talking about on the slide. I kind of couldn't keep track of what you were talking about while watching the video.

  7. Thanks for the amazing content. Is there any video where he discusses memory in RAG? I try to give a summary of the previous query and response along with the new query to limit token usage, but as the chat continues the LLM hallucinates. I believe there might be a better way to manage old user queries, and a way to "forget" by analyzing whether a new topic has started. Please let me know 🙂

  8. Thank you for this video.

    I have spent two weeks going through the video and coding along. It's really amazing.

  9. This helped a lot.

  10. Where is the video in which he explains how to lay out those basic RAG pipelines?

  11. RAG-Fusion uses RRF to calculate a fused score for retrieved documents that were previously ranked by their respective retrievers. What I don't understand is that here only one retriever is being used, and it's responsible for retrieving a list of documents per query; in such a scenario, query1 might fetch doc1, doc2, doc3, and so on. Yes, it's quite possible that doc1 is ranked 1 for query1 and ranked 2 for query2. But even in such a case, it's only based on a single retriever's ranking. Then isn't it redundant to apply RRF on top of it?
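
    For reference, reciprocal rank fusion is only a few lines, and even with a single retriever it is not redundant: it rewards documents that appear near the top across several rewritten queries, rather than trusting any one query's ranking (a plain-Python sketch; k=60 is the constant from the RRF paper):

    ```python
    def reciprocal_rank_fusion(rankings, k=60):
        """rankings: one ranked list of doc ids per rewritten query."""
        scores = {}
        for ranking in rankings:
            for rank, doc in enumerate(ranking, start=1):
                scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    # doc1 is ranked 1st for query1 and 2nd for query2, so it accumulates
    # score from both lists and beats every doc seen by only one query.
    print(reciprocal_rank_fusion([["doc1", "doc2", "doc3"],
                                  ["doc4", "doc1", "doc5"]]))
    # ['doc1', 'doc4', 'doc2', 'doc3', 'doc5']
    ```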

  12. The hallucination testing part doesn't work very well

  13. The accent is so hard to keep up with

  14. Never stop improving is enough for a successful life 🙏

  15. Dang, at the time of writing there still isn't a way to connect Ollama instead of an OpenAI key. If someone has figured it out before this becomes widely available and you see this, please enlighten us! <3
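
    One way to go fully local is via the Ollama wrappers that ship in langchain_community (a sketch, assuming a local Ollama server with the llama3 and nomic-embed-text models already pulled, and the `splits` from the indexing step):

    ```python
    # Swap the OpenAI pieces for local Ollama models.
    from langchain_community.chat_models import ChatOllama
    from langchain_community.embeddings import OllamaEmbeddings
    from langchain_community.vectorstores import Chroma

    llm = ChatOllama(model="llama3", temperature=0)
    embeddings = OllamaEmbeddings(model="nomic-embed-text")

    vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
    retriever = vectorstore.as_retriever()
    # ...then build the same rag_chain as before with this llm and retriever.
    ```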

  16. Can this be done without LangChain?

  17. Does anyone know why I get a RateLimitError when I try using my OpenAI API Key with my paid ChatGPT account? It says: "You exceeded your current quota, please check your plan and billing details" but I have never done anything with this API Key before. Does anyone know how to fix this? Alternatively, does anyone know how to perform RAG without this? Please help!

  18. Thanks a lot! Really helpful!

  19. love from India, keep doing the great work, Lance <3

  20. So much detail – I had to watch it twice to understand it. Just wow.

  21. 🎯 Key points for quick navigation:

    00:00 📚 Introduction to RAG by Lance Martin, a LangChain engineer.
    00:14 💡 Explanation of how RAG combines custom data with LLMs.
    00:28 🔍 Motivation: Most data is private, but LLMs are trained on public data.
    01:08 🗃️ Context windows in LLMs are growing, allowing more private data to be input.
    01:48 ⚙️ Overview of RAG: Indexing, retrieval, and generation stages.
    02:54 📊 RAG unites LLMs' processing with large-scale private data.
    03:24 🧠 Breakdown of RAG components: Query translation, routing, construction, and more.
    04:46 ⭐ Methods for document retrieval and reranking in RAG.
    05:55 💾 Indexing external documents and converting them to numerical representations.
    08:25 🧩 Splitting documents for embedding due to context window limits.
    10:00 🖥️ Computing fixed-length vectors for documents using embeddings.
    12:45 🔍 Using k-nearest neighbors to find similar documents.
    15:59 📝 Generating answers based on retrieved documents in RAG.
    17:07 📝 Prompt templates for generating answers in LLMs.
    19:02 🔗 Combining prompts, LLMs, and retrievers into chains.
    22:14 🚀 Introduction to advanced query translation in RAG.
    23:07 ✔️ Importance of rewriting queries for effective retrieval.
    24:05 🌐 Multi-query approach: Rewriting questions from different perspectives (see the sketch after this list).
    25:38 🚀 Indexed a blog post on agents in a vector store.
    26:19 🔍 Split question into sub-questions and retrieve relevant documents.
    28:08 🔧 Used LangSmith to trace intermediate and final steps.
    30:42 🗂️ Built a consolidated list from multiple retrievals.
    35:02 🧩 Discussed sub-question decomposition retrieval.
    36:23 🔄 Combined answers to iterative sub-questions for final answer.
    38:18 🔗 Connected question-answer pairs sequentially in prompts.
    41:02 📚 Stepback prompting for generating more abstract questions.
    43:02 🪜 Generated more generic questions to enhance context for retrieval.
    44:45 🔄 Retrieval performed on both original and stepback questions.
    48:50 🌐 HyDE involves converting questions into hypothetical documents for better alignment with document embeddings.
    49:43 🔎 Generated hypothetical documents based on questions for more effective retrieval.
    51:15 📝 Hypothetical Document: Demonstrated hypothetical document generation and retrieval process.
    51:44 🌟 Performance: Using hypothetical document generation can improve retrieval performance.
    52:13 🚦 Routing: Involves translating a query and routing it to appropriate data sources.
    53:48 🔍 Semantic Routing: Embeds and compares questions to prompts for routing.
    56:08 🔗 Routing Mechanism: Connects the intended data source to specific retriever chains.
    58:11 🚀 Semantic Routing Example: Demonstrates choosing a prompt based on semantic similarity.
    59:47 💬 Query Construction: Transforms natural language queries to structured queries for metadata filters.
    01:00:15 🗓️ Example Query: Converts natural questions into structured queries with date filters and metadata.
    01:04:26 📚 Query Optimization: Optimizes retrieval by translating natural language into data-querying domain-specific languages.
    01:11:48 🗄️ Hierarchical Indexing: the RAPTOR technique deals with questions needing detailed and broader information.
    01:12:57 🧩 Hierarchical indexing helps in retrieving more relevant document chunks by clustering and summarizing documents recursively.
    01:14:08 🤏 Summaries provide high-level semantic representations, while raw chunks offer detailed, document-specific insights.
    01:15:04 🧪 Comprehensive studies indicate that hierarchical indexing enhances semantic search by offering better coverage across different question types.
    01:17:19 📇 Process involved embedding, clustering, and recursive summarization to build a tree structure of document information.
    01:20:09 🛠️ Code demonstration included creating a vector store, embedding documents, clustering, summarizing, and managing tokens.
    01:22:22 🔍 ColBERT method enhances semantic search by generating embeddings for every token and computing maximum similarities between question and document tokens.
    01:24:57 🧑‍💻 The RAGatouille library makes it easy to experiment with ColBERT, which shows good performance but requires evaluating production readiness due to possible latency issues.
    01:26:40 🌐 ColBERT demonstrated through LangChain retriever integration, offering an efficient and unique indexing approach.
    01:28:10 🗺️ LangGraph released for building more complex state machines and diverse logical flows in RAG applications.
    01:33:05 🔍 Corrective RAG workflow improved retrieval by re-assessing document relevance and performing web searches for ambiguous results.
    01:35:06 🧩 Functions for state modification in LangGraph illustrated how each state (node) in the flow modifies the document retrieval process.
    01:37:08 🔍 Logical filtering: Use a grading chain to mark documents as relevant or not and perform actions based on the results.
    01:37:32 🚦 Conditional routing: Based on the 'search' value, route the workflow to either transform the query for a web search or proceed to generate a response.
    01:39:13 📑 Document relevance check: Filter documents for relevance before transforming the query and performing a web search.
    01:39:55 🔄 Query transformation: Adjust the query based on information retrieved from a web search to improve relevance.
    01:40:52 📊 Detailed node inspection: Use tools like LangSmith to inspect each node's output to ensure the logical flow is correct.
    01:42:26 🚀 Moving from chains to flows: Transitioning from simple chains to complex flows offers cleaner and more sophisticated workflows.
    01:44:06 🔧 Flow engineering: Flow engineering with LangGraph is intuitive and allows for sophisticated logical reasoning workflows.
    01:45:03 🧩 Integrating ideas: Combining query analysis and adaptive flow engineering improves your RAG pipeline's efficiency.
    01:46:14 📚 Corrective workflows: Use unit tests to ensure smooth corrective workflows during model inference.
    01:48:34 💡 Command R: Uses Command R model with structured output, enabling binary yes/no responses for easier logical flow control.
    01:56:21 ⚙️ Binding functions to nodes: Bind each node in your graph to a specific function to handle different logical decisions and flows.
    01:58:24 🔄 If tool calls are not in the response, a fallback mechanism is triggered to choose the next data source.
    01:59:18 🔍 Different data sources (web search vs. Vector store) are used, and their outputs determine the subsequent nodes in the graph.
    02:00:25 🧾 Conditional edges in the graph handle logic such as document relevance and hallucination checks.
    02:01:05 📊 Functions are defined as nodes and edges in the graph, following a flow that matches a predefined diagram for logic.
    02:03:18 🗂️ The flow diagram for the graph aligns with the logic drawn out earlier, ensuring consistent data routing and processing.
    02:05:10 ⏱️ The implemented RAG system processes questions quickly, demonstrating efficient retrieval and generation handling.
    02:07:15 ⚡ Command R model shows rapid performance and effective handling of relevance, hallucination, and answer usefulness checks within the RAG system.
    02:08:55 🧠 LangGraph provides a reliable, less flexible solution compared to agents, suitable for defined flows and faster implementation.
    02:10:51 🧩 Agents offer more flexibility for open-ended workflows at the cost of reliability, especially when working with smaller LLMs.
    02:11:46 💻 Open-source models like Command R can be run locally, enabling fast inference and practical use for online applications.
    02:12:46 🔧 Practical implementation of RAG systems combines LangGraph with Command R for a fast, reliable solution adaptable for various workflows.
    02:17:09 📉 Tested GPT-4's ability to retrieve and reason over multiple facts within a large context window, showing degradation in performance as complexity and context length increase.
    02:18:27 🧩 Observations included the difficulty of retrieving facts placed at the beginning of a large context window, potentially due to a recency bias.
    02:19:10 🔄 Confirmed that adding reasoning tasks exacerbates retrieval difficulty, highlighting limits within LLMs without a retrieval augmentation system.
    02:19:52 🚩 Be skeptical of single-needle retrievals as they often oversimplify the retrieval problem.
    02:21:00 🎯 Focus on the retrieval of precise document chunks, but be cautious of over-engineering.
    02:22:48 🏗️ Consider document-centric RAG over chunking to simplify retrieval and reduce complexity.
    02:26:30 🧩 Clustering documents and summarizing clusters help to handle queries requiring multiple pieces of information.
    02:28:07 🔍 Use long-context embedding models to embed full documents effectively.
    02:31:33 🖥️ Using open-source models can make RAG systems more accessible and efficient, even on local machines.

    Made with HARPA AI
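
    As a concrete illustration of the multi-query entry at 24:05 above: rewrite the question from several perspectives, retrieve for each rewrite, and keep the unique union of documents. A sketch, assuming the `retriever` from the indexing step and langchain 0.1.x-era imports:

    ```python
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import StrOutputParser
    from langchain_openai import ChatOpenAI

    # Generate several rewrites of the user question, one per line.
    gen_queries = (
        ChatPromptTemplate.from_template(
            "Generate 4 alternative phrasings of this question, one per line:\n{question}"
        )
        | ChatOpenAI(temperature=0)
        | StrOutputParser()
        | (lambda text: [q for q in text.split("\n") if q.strip()])
    )

    def unique_union(doc_lists):
        # Deduplicate documents retrieved across the rewritten queries.
        seen, merged = set(), []
        for docs in doc_lists:
            for d in docs:
                if d.page_content not in seen:
                    seen.add(d.page_content)
                    merged.append(d)
        return merged

    queries = gen_queries.invoke({"question": "What is task decomposition for LLM agents?"})
    docs = unique_union([retriever.invoke(q) for q in queries])
    ```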

  22. Shouldn't we just pass the text from the retrieved docs instead of the string representation of the Document class?
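
    Right: Document objects carry metadata, so their string form is not clean text. A quick way to see the difference (a sketch, assuming the `retriever` from the notebooks):

    ```python
    doc = retriever.invoke("What is task decomposition?")[0]
    print(str(doc)[:80])          # repr-style: page_content plus metadata
    print(doc.page_content[:80])  # just the text, which is what the prompt should see
    ```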

  23. Tbh, this is not "from scratch" if you are using a heavily abstracted framework (LangChain). It's misleading.

  24. How do I get a LangChain API key?
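
    The LANGCHAIN_API_KEY used in the course notebooks is a LangSmith key, created in the LangSmith web app. The notebooks enable tracing with environment variables along these lines (a sketch; the placeholder stays a placeholder):

    ```python
    # LangSmith tracing configuration used by the course notebooks.
    import os
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
    os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-key>"
    ```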

  25. I don't understand how this precious course is available for free. So well explained that too with the code. I am so grateful for Lance and freeCodeCamp. Thank you guys, thank you so much.

  26. Great Video

  27. amazing content on RAG

  28. Kudos that is the best so far

  29. Thank you

  30. Isn't this process cost-prohibitive? It seems like it's taking an input that would cost 10 tokens and generating an input that costs hundreds if not thousands of tokens.

    Perhaps I'm misunderstanding the costs and inner workings of these LLMs.
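
    For a rough sense of scale, here is back-of-the-envelope arithmetic for the multi-query case (illustrative assumptions only: 4 query rewrites, 4 retrieved chunks per rewrite, roughly 250 tokens per chunk, before any deduplication):

    ```python
    # Illustrative token arithmetic, not measured costs.
    rewrites = 4           # rewritten queries per user question
    chunks_per_query = 4   # documents retrieved per rewrite
    tokens_per_chunk = 250
    question_tokens = 10

    context_tokens = rewrites * chunks_per_query * tokens_per_chunk
    print(f"~{question_tokens} question tokens -> up to ~{context_tokens} context tokens")
    # ~10 question tokens -> up to ~4000 context tokens
    ```

    Deduplicating the retrieved documents (the unique-union step) usually brings the real number well below this ceiling.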

  31. Hello there!

    This material is amazing. I would love to read the articles cited in the video. Could someone please list them for me? Thank you!

  32. Really appreciate these kinds of videos <3. Also, can somebody shed some light on all the options that can be used in search_kwargs? I can find only 2 or 3 options on the internet ;-;
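
    The exact options depend on the vector store, but for LangChain's base vector-store retriever these combinations are commonly used (a sketch, assuming a `vectorstore` as in the notebooks):

    ```python
    # Plain top-k similarity search.
    retriever = vectorstore.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 5},
    )

    # Maximal marginal relevance: fetch fetch_k candidates, keep k diverse ones.
    retriever = vectorstore.as_retriever(
        search_type="mmr",
        search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.5},
    )

    # Drop anything below a similarity score threshold.
    retriever = vectorstore.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={"score_threshold": 0.5},
    )
    ```

    Some stores also accept a store-specific "filter" entry in search_kwargs for metadata filtering.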

  33. Maybe not the best explanation, and not easily understandable

  34. Thanks for sharing <3

  35. Very, very good! Can you post the links in the comments? Not all are in the notebooks (e.g., 2 of 4 at the 1:21:38 mark).

  36. 🎯 Key points for quick navigation:

    02:21:13 🔄 RAG Evolution
    02:22:20 ❓ Reconsider Chunking
    02:23:42 📑 Document-Centric RAG
    02:25:20 🔄 Multi-rep Indexing
    02:26:30 📊 Utilize RAPTOR
    02:28:34 🔄 Beyond Single-Shot
    02:30:23 🧠 Enhance with Reasoning
    02:30:38 🎯 Out-of-Scope Queries

    Made with HARPA AI

  37. Is it just me, or are there always errors from changed libraries, even when you simply try to execute their code as-is? It's really frustrating working with LangChain at this point.

  38. Lance uploaded this himself; this seems like a copy and paste, no?

  39. Where can we find this video about Chunking?

  40. Great stuff: great explanations and implementations.

    My concern with this is the relegation of the language model's abilities, and the return to coding these methods into a system instead of training a model to have these desired powers.

    I have personally found the level of output from RAG to be very high, as are these powerful questioning strategies. But we can see from the prompting that these prompts are still telling the model its ROLE; instead we should be commanding the model to provide the desired response and expecting it to follow the instruction pathway we have set. Hence the layered prompting can still be given to a model to perform the whole process internally (in its mind), and we would expect the model to produce this set of processes before producing an output.

    But all of that comes with training, hence LangSmith/LangChain/LlamaIndex etc. We can use these strategies despite them being very slow, since multiple shots are often made to a model before producing the output instead of using a one-shot prompt. We should use these RAG sessions to produce data: examples of the chain of events and the output at each stage, i.e. the step-by-step method used here, and then train the model on that method using the data we have produced. The fact that we need such RAG systems means the model is not following this exact methodology internally, as this is not the way the model achieves its output; training would allow the model to learn it as a repeatable method in the outputs it produces on the way to the final output. Hence your training prompt could be to think step by step, internally produce the steps required to solve the task in its thought space, solve the task in stages there as the method being trained requires, and only then produce the final output. It would recall the required content in its thoughts and use that content to produce its results. By training its thoughts this way, the whole completion would be displayed whenever the same prompt is used; and when a lesser prompt is used, as you must have experienced, there is often a hidden prompt at work: the model chooses the context it requires from its examples, so given a question it will pick the methodology it needs from its trained tasks.

    Hence methodology training, as you have shown with all of these different RAG techniques, is the key to creating a model that can produce these outputs without a RAG system, enabling corpus dumping into the model and even document recall. Any document that fits in the context window can be saved in full; if the document is chunked, not so, as each chunk is mapped to a separate input. Would we then need to train the model on chunks of a book and ask it to recall chunks? That is not quite desirable, since not every book is the Bible; we would want to recall a whole document, so we would need to recall all of the chunks for that book or manual. Hence, chunk by contents and not by size, so that each chunk is meaningful.

    So for data production and interoperability with other knowledge bases, these techniques are great. But the model can store and recall verbatim if chunked correctly, as well as answer these same questions given the correct prompt and training.

    ChatGPT is actually a fraud, as it was using Dialogflow as a top layer and then using the model to produce the outputs. We need to understand how untrained the model actually is; the RAG is what enables the model to perform, along with its intent detector, which prompts the model to produce the correct output (Pydantic) and then formats it for you.

    They also said that their model was over 1,400B parameters but would not let that loose on the public domain (as it is sold privately). Hence designing your own super machine with a RAG system and an open-source model is in truth greater than the model they can offer you. The open-source models will prove to be the most powerful models over the restricted public cloud providers (look at Mistral and their fear and heavy guard-railing, which even ChatGPT does not in truth have; they do, but less).

    Often it's best to make the mistake first and say sorry! <<< American perspective.
    Follow the rules, they are there for a reason! <<<< European perspective.

  41. This felt like a semester condensed into a few hours. This dude reads a lot. I learned about so many interesting things.

  42. Are there any evaluation sets for RAG? If one gets a new RAG method, how can they compare it to other methods out there?
