02 Basic Concepts of LangChain and RAG
LangChain
LangChain is a framework for building applications powered by large language models (LLMs).
Core idea: provide a unified interface to various LLMs and "chain" LLM-related components together.
It mainly provides six categories of functionality:
Prompts: Prompt engineering
Models: Invoking various models
Memory: Managing conversation history
Indexes: Managing and analyzing various documents
Chains: Composing components into executable pipelines (see the sketch after this list)
Agents: Building intelligent agents
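Below is a minimal sketch of chaining a prompt, a model, and an output parser with LangChain's pipe syntax (LCEL). The model name and the OPENAI_API_KEY assumption are illustrative; any chat model LangChain supports would work.

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI  # assumes OPENAI_API_KEY is set

# Prompts: a template with a placeholder for user input
prompt = ChatPromptTemplate.from_template("Explain {topic} in one sentence.")
# Models: a chat-model wrapper (the model name is an assumption)
llm = ChatOpenAI(model="gpt-4o-mini")
# Chains: pipe the components together and invoke them as one unit
chain = prompt | llm | StrOutputParser()
print(chain.invoke({"topic": "RAG"}))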
Deficiencies of General LLMs
Lack of real-time knowledge: information that appeared after the training cutoff cannot be acquired.
Lack of domain-specific knowledge, since such data is underrepresented in the training samples.
Hallucination: answers that sound plausible yet are factually incorrect.
Data security concerns: sensitive or proprietary data cannot safely be handed over to a general-purpose model.
RAG (Retrieval-Augmented Generation) provides a solution to the problems above.
Simply put: RAG = Retrieval technology + LLM
RAG Workflow
User query → retrieve relevant data → assemble the final prompt (prompt template + retrieved data + user query) → the LLM answers based on the reference material
Specifically, it includes the following three steps:
Offline indexing: continuously collect fresh information (e.g., via crawlers), split the extracted text into chunks, embed each chunk as a vector, and store the vectors in a vector database.
Real-time retrieval: instead of submitting the user's query directly to the model, first retrieve the most relevant chunks and combine them with a prompt template to form the final prompt.
Generation: feed the final prompt to the LLM to obtain the output.
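A minimal sketch of the three steps, assuming LangChain with an in-memory FAISS index (faiss-cpu installed) and OpenAI models; the sample chunks and model names are illustrative only.

from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# Step 1 - offline indexing: chunk the source text, vectorize, store
chunks = ["LangChain chains LLM components together.",
          "RAG grounds an LLM's answers in retrieved reference material."]
vector_db = FAISS.from_texts(chunks, OpenAIEmbeddings())

# Step 2 - real-time retrieval: fetch the chunks most similar to the query
query = "What does RAG do?"
docs = vector_db.similarity_search(query, k=1)
context = "\n".join(d.page_content for d in docs)

# Step 3 - generation: pack context + query into the prompt and call the LLM
prompt = ChatPromptTemplate.from_template(
    "Answer using only this reference material:\n{context}\n\nQuestion: {question}")
chain = prompt | ChatOpenAI(model="gpt-4o-mini")
print(chain.invoke({"context": context, "question": query}).content)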
What is a Vector?
A vector converts the semantic information of text into a fixed-length list of numbers, allowing computers to "understand" the meaning of text and perform similarity calculations.
The process of vector embedding typically uses a text embedding model. Through deep learning and other techniques, such models extract semantic features from text and map them into fixed-length numerical sequences.
Once the numerical sequences are obtained, the semantic similarity between two sentences can be computed with the cosine similarity algorithm (the closer to 1, the more similar), which makes semantic matching efficient.
The output dimensionality of a text embedding model is an important indicator. For example, the text-embedding-v1 model generates 1536-dimensional vectors, which can be roughly understood as the text's "score" (intensity) on 1536 abstract semantic "topics".
More dimensions allow more precise semantic matching, but they also put greater pressure on computational performance.
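A hedged sketch of generating such a vector; the DashScopeEmbeddings wrapper and the DASHSCOPE_API_KEY environment variable are assumptions, chosen only because the text-embedding-v1 model mentioned above is served by DashScope.

from langchain_community.embeddings import DashScopeEmbeddings  # assumes DASHSCOPE_API_KEY is set

embedder = DashScopeEmbeddings(model="text-embedding-v1")
vec = embedder.embed_query("LangChain chains LLM components together.")
print(len(vec))   # 1536 - one "score" per abstract semantic feature
print(vec[:5])    # the first few numbers of the fixed-length sequence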
What is Cosine Similarity?
The numbers in a vector determine its direction and length in high-dimensional space. The cosine similarity algorithm removes the influence of length and measures only the angle between directions: the smaller the angle, the more similar the vectors (i.e., the more aligned their directions).
Cosine similarity formula: the dot product of the two vectors divided by the product of their magnitudes
cos_sim = (vec_a · vec_b) / (||vec_a|| × ||vec_b||)
import numpy

def get_dot(vec_a, vec_b):
    """Calculate the dot product of two vectors"""
    if len(vec_a) != len(vec_b):
        raise ValueError("The two vectors must have the same dimension")
    dot_sum = 0
    for a, b in zip(vec_a, vec_b):
        dot_sum += a * b
    return dot_sum

def get_norm(vec):
    """Calculate the magnitude (L2 norm) of a vector"""
    sum_square = 0
    for v in vec:
        sum_square += v * v
    return numpy.sqrt(sum_square)

def cos_similarity(vec_a, vec_b):
    """Cosine similarity: dot product divided by the product of the magnitudes"""
    # Parenthesize the denominator; a / b * c would divide by b and then multiply by c
    return get_dot(vec_a, vec_b) / (get_norm(vec_a) * get_norm(vec_b))
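A quick sanity check with toy vectors (values are illustrative):

print(cos_similarity([1, 2, 3], [2, 4, 6]))  # 1.0: same direction, maximally similar
print(cos_similarity([1, 0], [0, 1]))        # 0.0: orthogonal, unrelated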
Taking two-dimensional space as an example: the closer two vectors' directions are, the more similar their semantics.
