
Natural Language to SQL using AI Agent for QA | LangChain Tutorial

  • Writer: Hackers Realm
  • 2 days ago
  • 9 min read

In today’s data-driven world, accessing information efficiently is just as important as storing it. However, querying databases often requires technical knowledge of SQL, which can be a barrier for non-technical users. Imagine if anyone on your team could simply ask questions in plain English like “Show me the total sales for the last quarter” — and instantly get the correct answer from your database. That’s exactly what Natural Language to SQL (NL2SQL) systems aim to achieve. By combining AI language models with database querying, we can build intelligent agents that understand natural language questions, generate accurate SQL queries, and return meaningful results — all automatically.

Natural Language to SQL using AI Agent for QA

In this tutorial, we’ll explore how to create such an AI-powered Question Answering (QA) Agent using LangChain, one of the most popular frameworks for building applications around large language models (LLMs). You’ll learn how to connect an LLM to a SQL database, use LangChain tools to translate natural language into SQL, and let the agent autonomously reason about your queries to deliver precise answers.


You can watch the video-based tutorial with a step-by-step explanation down below.


Import Modules

!pip install langchain langchain-community langgraph langchain-core

Install the above packages to continue with the rest of the code.

from langchain_community.utilities import SQLDatabase
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain.chat_models import init_chat_model
from langchain import hub
from langgraph.prebuilt import create_react_agent
  • from langchain_community.utilities import SQLDatabase : This helps connect LangChain to an actual SQL database. It acts as a wrapper around your database (like SQLite, MySQL, PostgreSQL, etc.) so LangChain can send SQL queries and receive results easily.

  • from langchain_community.agent_toolkits import SQLDatabaseToolkit : This toolkit bundles together tools that allow an AI agent to interact with your SQL database.

  • from langchain.chat_models import init_chat_model : This function initializes a chat-based LLM, such as GPT-4, Claude, Gemini, etc., in a LangChain-compatible format.

  • from langchain import hub : The LangChain Hub is like a “model repository” for ready-made prompts, agents, and chains.

  • from langgraph.prebuilt import create_react_agent : Creates a ReAct (Reasoning + Acting) agent — one of LangChain’s most powerful agent architectures.


Connecting LangChain to a SQL Database


Before our AI agent can answer questions about data, it first needs access to the database. LangChain makes this step simple with the SQLDatabase utility, which allows us to connect directly to any SQL-compatible database such as SQLite, MySQL, or PostgreSQL.

db = SQLDatabase.from_uri('sqlite:///data/northwind.db')
print(db.dialect)
print(db.get_usable_table_names())

sqlite ['Categories', 'CustomerCustomerDemo', 'CustomerDemographics', 'Customers', 'EmployeeTerritories', 'Employees', 'Order Details', 'Orders', 'Products', 'Regions', 'Shippers', 'Suppliers', 'Territories']

  • SQLDatabase.from_uri('sqlite:///data/northwind.db') : This line establishes a connection to the database using a standard URI format. In this case, we’re connecting to the Northwind sample database, a popular dataset that simulates a company’s sales data (customers, orders, products, etc.). The 'sqlite:///data/northwind.db' string tells LangChain to use SQLite and locate the database file inside the data directory.

  • db.dialect : Once connected, we can print the database dialect, which identifies the type of SQL engine being used (e.g., SQLite, PostgreSQL, MySQL).

    This helps LangChain automatically adapt SQL syntax to the specific database flavor.

  • db.get_usable_table_names() : This command retrieves a list of all available tables in the connected database.


Once the database connection is established, we can verify that everything is working correctly by running a simple SQL query through LangChain’s SQLDatabase object.

# execute sample query
print(db.run("SELECT * FROM Customers LIMIT 3;"))

[('ALFKI', 'Alfreds Futterkiste', 'Maria Anders', 'Sales Representative', 'Obere Str. 57', 'Berlin', 'Western Europe', '12209', 'Germany', '030-0074321', '030-0076545'), ('ANATR', 'Ana Trujillo Emparedados y helados', 'Ana Trujillo', 'Owner', 'Avda. de la Constitución 2222', 'México D.F.', 'Central America', '05021', 'Mexico', '(5) 555-4729', '(5) 555-3745'), ('ANTON', 'Antonio Moreno Taquería', 'Antonio Moreno', 'Owner', 'Mataderos 2312', 'México D.F.', 'Central America', '05023', 'Mexico', '(5) 555-3932', None)]

  • db.run(): The run() method allows us to execute raw SQL commands directly against the connected database. It acts as a simple interface for sending SQL statements and retrieving their results.

  • The query selects all columns (*) from the Customers table but limits the output to just three rows — making it ideal for quickly previewing data without overwhelming the console.
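Under the hood, db.run() is essentially a thin wrapper over a DB-API connection: it executes the SQL and fetches the rows. A minimal stdlib-only sketch of the same preview pattern, using an in-memory SQLite database with a tiny made-up Customers table instead of the real Northwind file:

```python
import sqlite3

# Build a small in-memory stand-in for the Customers table (sample data,
# not the actual Northwind contents).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID TEXT, CompanyName TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("ALFKI", "Alfreds Futterkiste"),
     ("ANATR", "Ana Trujillo Emparedados y helados")],
)

# Preview a few rows, just like db.run("SELECT * FROM Customers LIMIT 3;")
rows = conn.execute("SELECT * FROM Customers LIMIT 3").fetchall()
print(rows)
```

LangChain's db.run() returns the same fetched rows, stringified, which is why the printed output above looks like a Python list of tuples.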


Initializing the Language Model and Toolkit


Now that our database connection is ready, the next step is to equip our system with an AI language model that can understand natural language questions and generate appropriate SQL queries. We also need to give it the right tools to interact with the database.

# initialize llm
llm = init_chat_model('gpt-4o-mini', model_provider='openai')

# initialize the toolkit
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
tools = toolkit.get_tools()
  • llm = init_chat_model('gpt-4o-mini', model_provider='openai') : Here, we initialize a chat-based Large Language Model (LLM) — in this case, OpenAI’s gpt-4o-mini. This model serves as the brain of our system, responsible for:

    • Understanding the user’s natural language questions

    • Reasoning about what information is being asked

    • Generating appropriate SQL queries to retrieve that data

    The init_chat_model() function simplifies LLM initialization by automatically setting up the model and configuration needed for conversational tasks.

  • toolkit = SQLDatabaseToolkit(db=db, llm=llm) : The SQLDatabaseToolkit acts as a bridge between the language model and the database. It bundles together all the tools the agent needs to:

    • Access table schemas

    • Generate SQL queries

    • Execute those queries safely

    • Return structured answers back to the user

    By passing both the db (database connection) and llm (language model) into the toolkit, we give the AI the necessary context to translate natural language into SQL.

  • tools = toolkit.get_tools() : This line extracts the individual tools (or “capabilities”) from the toolkit that the agent can use. Each tool represents a specific action — such as inspecting database schema, running queries, or processing results. These tools will later be supplied to the AI agent so it can decide when and how to use them to answer a user’s question.


After initializing the SQLDatabaseToolkit, we can inspect what tools are available for our AI agent to use.

print(tools)

[QuerySQLDatabaseTool(description="Input to this tool is a detailed and correct SQL query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', use sql_db_schema to query the correct table fields.", db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x000001BC7F9C5970>), InfoSQLDatabaseTool(description='Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables. Be sure that the tables actually exist by calling sql_db_list_tables first! Example Input: table1, table2, table3', db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x000001BC7F9C5970>), ListSQLDatabaseTool(db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x000001BC7F9C5970>), QuerySQLCheckerTool(description='Use this tool to double check if your query is correct before executing it. 
Always use this tool before executing a query with sql_db_query!', db=<langchain_community.utilities.sql_database.SQLDatabase object at 0x000001BC7F9C5970>, llm=ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001BC1C892060>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001BC1C8AC5F0>, root_client=<openai.OpenAI object at 0x000001BC1C87B890>, root_async_client=<openai.AsyncOpenAI object at 0x000001BC1C8920C0>, model_name='gpt-4o-mini', model_kwargs={}, openai_api_key=SecretStr('**********')), llm_chain=LLMChain(verbose=False, prompt=PromptTemplate(input_variables=['dialect', 'query'], input_types={}, partial_variables={}, template='\n{query}\nDouble check the {dialect} query above for common mistakes, including:\n- Using NOT IN with NULL values\n- Using UNION when UNION ALL should have been used\n- Using BETWEEN for exclusive ranges\n- Data type mismatch in predicates\n- Properly quoting identifiers\n- Using the correct number of arguments for functions\n- Casting to the correct data type\n- Using the proper columns for joins\n\nIf there are any of the above mistakes, rewrite the query. If there are no mistakes, just reproduce the original query.\n\nOutput the final SQL query only.\n\nSQL Query: '), llm=ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001BC1C892060>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001BC1C8AC5F0>, root_client=<openai.OpenAI object at 0x000001BC1C87B890>, root_async_client=<openai.AsyncOpenAI object at 0x000001BC1C8920C0>, model_name='gpt-4o-mini', model_kwargs={}, openai_api_key=SecretStr('**********')), output_parser=StrOutputParser(), llm_kwargs={}))]


  • When you print the tools list, you’ll see a set of tool objects that LangChain automatically generates based on the toolkit and database connection.

  • QuerySQLDatabaseTool – executes SQL queries on the connected database.

  • InfoSQLDatabaseTool – provides schema information (like column names and data types).

  • ListSQLDatabaseTool – lists all available tables in the database.

  • QuerySQLCheckerTool – uses the LLM to double-check a generated query for common mistakes before it is executed.

  • These tools are essential for enabling autonomous reasoning and execution. When you later create an AI agent using these tools, the model can dynamically decide when to use each tool.



Setting Up the Prompt Template for Natural Language to SQL


Now that our database, tools, and language model are ready, the next step is to define how the AI should think and respond when converting natural language into SQL queries.

# prompt template for nl2sql
prompt_template = hub.pull('langchain-ai/sql-agent-system-prompt')
prompt_template.messages[0].pretty_print()
Prompt Template

  • prompt_template = hub.pull('langchain-ai/sql-agent-system-prompt') :

    • The LangChain Hub is a shared library of ready-to-use prompts and configurations for common AI tasks.

    • By pulling "langchain-ai/sql-agent-system-prompt", we are fetching a predefined prompt specifically optimized for SQL reasoning.

    • This prompt includes detailed instructions that tell the AI model how to:

      • Interpret user questions written in natural language

      • Think step-by-step to construct valid SQL queries

      • Use the available database tools correctly

      • Return results in a readable, conversational format

  • prompt_template.messages[0].pretty_print() :

    • This line simply prints out the first system message from the prompt template in a nicely formatted way.

    • It allows us to inspect the underlying instructions being sent to the model.

  • The printed output is essentially the guiding prompt — the internal "personality" and logic blueprint of your SQL agent. It ensures the model consistently translates natural language into valid, safe, and meaningful SQL commands.



Creating the SQL AI Agent


After defining the prompt template, we can now bring everything together — the language model, the database tools, and the reasoning prompt — to create our fully functional AI Agent.

This agent will understand natural language questions, generate valid SQL queries, and return accurate answers from the database.

system_message = prompt_template.format(dialect='SQLite', top_k=5)

# create the sql ai agent
sql_agent = create_react_agent(llm, tools, prompt=system_message)
  • system_message = prompt_template.format(dialect='SQLite', top_k=5) :

    • Here, we customize the prompt we pulled earlier from the LangChain Hub.

    • The format() method allows us to inject context-specific parameters into the system prompt:

      • dialect='SQLite' — tells the agent which SQL dialect the database uses. This ensures the generated SQL queries are syntactically correct for SQLite (e.g., handling LIMIT, string quotes, or date functions properly).

      • top_k=5 — sets the number of rows or results the agent should return by default when the query doesn’t specify a limit. This keeps outputs concise and readable during QA interactions.

  • sql_agent = create_react_agent(llm, tools, prompt=system_message) :

    • This is where we actually create the AI Agent — using LangChain’s ReAct (Reasoning + Acting) architecture.

    • The ReAct framework enables the model to:

      • Reason about the user’s question

      • Decide which tools to use (e.g., list tables, inspect schema, run query)

      • Act by executing SQL commands through those tools

      • Observe the results and respond in natural language

    • The three main components here are:

      • llm → The language model (gpt-4o-mini), which provides reasoning and natural language understanding.

      • tools → The SQL-related tools (query execution, schema inspection, etc.) we initialized earlier.

      • prompt=system_message → The guiding system prompt that defines how the agent should behave and respond.
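The placeholder injection performed by prompt_template.format() can be illustrated with plain Python string formatting. The toy template below is an assumption for illustration only — the actual hub prompt is much longer — but the mechanics are the same:

```python
# Toy stand-in for the hub prompt: {dialect} and {top_k} are the same
# placeholder names the real template exposes.
template = (
    "You are an agent designed to interact with a {dialect} database.\n"
    "Unless the user specifies otherwise, limit your query to at most "
    "{top_k} results."
)

# Inject the context-specific parameters, as prompt_template.format() does.
system_message = template.format(dialect="SQLite", top_k=5)
print(system_message)
```

After formatting, the resulting string contains the concrete values ("SQLite", "5") and is ready to be passed as the agent's system prompt.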



Running a Sample Query with the AI Agent


Now that our SQL AI Agent is set up, it’s time to see it in action!

Let’s ask the agent a simple natural language question and watch how it transforms that question into an SQL query, executes it, and returns a result.

# sample execution
query = 'Identify the ten most expensive products (name, unit price).'

# stream the events
for event in sql_agent.stream(
    {"messages": [("user", query)]},
    stream_mode='values'
):
    event['messages'][-1].pretty_print()
Sample query execution result
  • Here, we’re giving the agent a human-friendly question — not a structured SQL statement. The user simply asks “Identify the ten most expensive products”, and the AI agent will take care of understanding, reasoning, and generating the correct SQL behind the scenes.

  • Next, we stream the reasoning process of the AI agent step-by-step — a very useful feature when building or demonstrating AI workflows.

  • The agent receives the message (user → query).

  • It starts thinking through the ReAct framework:

    • Reasoning: “To answer this, I should look at the Products table.”

    • Action: Generates and executes a SQL command like

SELECT ProductName, UnitPrice FROM Products ORDER BY UnitPrice DESC LIMIT 10;
  • Observation: Gets the query results from the database.

  • Response: Converts them into a readable answer for the user.

  • The stream() function allows you to observe each step in real time — which is great for debugging, learning, and transparency.

  • Finally, pretty_print() neatly displays the AI’s intermediate steps and final output in a structured format.
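The Action step above can be checked in isolation: the generated SQL is ordinary SQLite and runs anywhere. Here is a stdlib-only sketch executing that exact query against a tiny in-memory Products table (three sample products with made-up prices, not the full Northwind data):

```python
import sqlite3

# Minimal stand-in for the Products table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, UnitPrice REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [("Chai", 18.0), ("Cote de Blaye", 263.5), ("Mishi Kobe Niku", 97.0)],
)

# The SQL the agent generates for "Identify the ten most expensive products".
sql = "SELECT ProductName, UnitPrice FROM Products ORDER BY UnitPrice DESC LIMIT 10;"
rows = conn.execute(sql).fetchall()
print(rows)  # most expensive product first
```

With only three rows in the toy table, the query simply returns all of them in descending price order; on the real database it would cap the output at ten.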



Let’s look at another sample execution.

# sample execution
query = 'Get the top 5 employees and the number of orders they have processed'

# stream the events
for event in sql_agent.stream(
    {"messages": [("user", query)]},
    stream_mode='values'
):
    event['messages'][-1].pretty_print()
Sample execution result
  • The AI Agent will interpret this natural language request, determine which tables to use (likely Employees and Orders), and figure out how to join them to get the results.

  • Just like before, sql_agent.stream() lets us see the reasoning and SQL generation process in real time.

    As the agent processes this query, it might go through steps like:

    • Identify relevant tables: Employees, Orders

    • Determine relationships: Orders.EmployeeID = Employees.EmployeeID

    • Construct SQL dynamically:

SELECT e.FirstName || ' ' || e.LastName AS EmployeeName, COUNT(o.OrderID) AS OrdersProcessed
FROM Employees e
JOIN Orders o ON e.EmployeeID = o.EmployeeID
GROUP BY e.EmployeeID
ORDER BY OrdersProcessed DESC
LIMIT 5;
  • Execute the query and fetch the results.

  • Return the output in an easy-to-read, conversational format.
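The join logic the agent constructs can likewise be verified on toy data. This stdlib-only sketch builds miniature Employees and Orders tables (two employees and three orders, made up for illustration) and runs the same aggregation query:

```python
import sqlite3

# Miniature stand-ins for the Employees and Orders tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER, FirstName TEXT, LastName TEXT);
CREATE TABLE Orders (OrderID INTEGER, EmployeeID INTEGER);
INSERT INTO Employees VALUES (1, 'Nancy', 'Davolio'), (2, 'Andrew', 'Fuller');
INSERT INTO Orders VALUES (10248, 1), (10249, 1), (10250, 2);
""")

# The SQL the agent constructs: join on EmployeeID, count orders per
# employee, and keep the top 5.
sql = """
SELECT e.FirstName || ' ' || e.LastName AS EmployeeName,
       COUNT(o.OrderID) AS OrdersProcessed
FROM Employees e
JOIN Orders o ON e.EmployeeID = o.EmployeeID
GROUP BY e.EmployeeID
ORDER BY OrdersProcessed DESC
LIMIT 5;
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Nancy processed two orders and Andrew one, so she appears first — the same ranking behavior the agent produces on the full dataset.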



Final Thoughts

  • Natural Language to SQL using AI Agents marks a major step forward in how we interact with data.

  • By combining LangChain, SQLDatabaseToolkit, and powerful LLMs, we can now bridge the gap between human language and structured databases — making data exploration accessible to everyone, not just SQL experts.

  • This tutorial demonstrated how easily an AI Agent can interpret everyday questions, generate optimized SQL queries, and deliver precise answers — all in real time. Whether for data analytics, business intelligence, or QA automation, this approach transforms how teams query and understand their data.


As AI-driven systems continue to evolve, the ability to ask questions in natural language and get meaningful insights instantly will become a cornerstone of modern data workflows.



Thanks for reading the article!!!


Check out more project videos from the YouTube channel Hackers Realm
