Today, we live in an era where we increasingly encounter AIs of various types and uses, whether driven by hype or not: the well-known ChatGPT, Grok, DeepSeek, and others, for simple or complex text or audio chats. What they all have in common is the use of Large Language Models (LLMs), an acronym you have surely come across while studying AI. Amid the hype and euphoria of developers and people outside our tech bubble, however, we often forget the real concept, applications, and limitations of LLMs. In general terms: LLMs are designed to understand and generate natural language text, as well as other forms of content, based on vast amounts of data from diverse sources (files in .txt, .pdf, and other formats).
Therefore, the goal of this blog post is to show you how to run LLMs completely locally on your own computer – no pricey API bills, no sending your data to the cloud, and no usage limits (beyond your machine’s capacity).
We’ll keep everything simple, beginner-friendly, and 100% practical. By the end, you’ll be able to ask questions in plain English (“Show me all customers who bought more than $500 last month”) and have the model instantly write perfect (or almost perfect) SQL for you, along with the results.
Use Case: How can we use an LLM to write SQL queries?
The use case we will solve is creating a simple chat where we can use natural language to query information in an SQLite database. In this chat, when the user asks for data from the database, the response will always include the generated SQL query and a table with the results.
To do this, we will implement and use:
- The Ollama tool, which makes it easy to run LLMs like Gemma 3, DeepSeek-R1, Qwen3, and others locally. In our case, we will use the model qwen2.5:latest, which, as far as I could test, gives the best results in the programming and SQL domains;
- A Model Context Protocol (MCP) server, which implements an open standard that lets AI applications connect to data sources, tools, and workflows, much like USB-C connects electronic devices in a standardized way;
- And finally, a simple SPA built with React.js that will consume our MCP server.
The link to the complete source code for both the MCP server and the frontend is at the end of this post.
The goal throughout this post is to focus on the relevant parts of the server implementation and how to consume it on the front-end side.
Let’s code!
First, we need to install the Ollama tool on our machine. Just access this link and download it for your OS. After the installation is complete, we need to download our LLM, qwen2.5:latest, which is quite simple:

```shell
ollama run qwen2.5:latest
```

After it completes, you should be able to send messages to the downloaded model, like this:

```shell
>>> Send a message (/? for help)
```

Note: Feel free to download and play with other models.
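Before moving on, a quick sanity check can confirm that Ollama is up and the model was pulled. Ollama serves a REST API on http://localhost:11434 by default, and GET /api/tags lists the locally available models. The helper below is a hypothetical sketch (the name listLocalModels is not part of the project), assuming Node 18+ for the built-in fetch:

```typescript
// Hypothetical helper: list the models pulled into the local Ollama instance
// via its REST API (GET /api/tags). Requires Node 18+ for built-in fetch.
export async function listLocalModels(
  host = "http://localhost:11434",
): Promise<string[]> {
  const res = await fetch(`${host}/api/tags`);
  if (!res.ok) throw new Error(`Ollama not reachable: HTTP ${res.status}`);
  const body = (await res.json()) as { models: { name: string }[] };
  return body.models.map((m) => m.name);
}

// Usage (with Ollama running):
// listLocalModels().then(console.log); // e.g. [ "qwen2.5:latest" ]
```

If the call fails, Ollama is probably not running; start it before continuing.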
Once we have downloaded our model, let’s move on to implementing our MCP server. Below is an image that clearly demonstrates how this open protocol works.

The image above illustrates how MCP works: the MCP server consumes external data sources (private or public APIs, databases, documents, and so on) and provides this data to MCP clients such as Claude Desktop, Cursor IDE, and, in our case, our React.js SPA.
We will use Node.js + Express to implement our server, which will consist of just two main files: server.ts and question-to-sql.ts. The latter is the most important because it will handle the use of our LLM and return the SQL query to be executed in our SQLite database as output.
server.ts
The goal of our MCP server is simple: to provide a /mcp endpoint that will be consumed by our MCP client, passing a simple question as text.
Let’s walk through the code.
```typescript
const server = new McpServer({
  name: "demo-server",
  version: "1.0.0",
});
```

1. Create the MCP Server Instance
- McpServer is the core class from the Model Context Protocol SDK.
- Provide it with a name and version; these show up in the LLM’s tool selection UI (e.g. in Claude, Cursor, etc.), so the model knows which server it’s communicating with.
- At this point, the server exists but has no tools yet. So, let’s register a new one!
```typescript
server.registerTool(
  "sql-query-tool",
  {
    title: "SQL Query Tool",
    description:
      "Ask a question in natural language and receive the SQL query and the result",
    inputSchema: { question: z.string() },
    outputSchema: { sql: z.string(), result: z.array(z.any()) },
  },
  async ({ question }) => {
    // ... tool implementation
  },
);
```

2. Register the SQL Query Tool
This is where the magic happens. We register a tool named "sql-query-tool" with the following parameters:
- title and description;
- inputSchema, which uses Zod to declare that the tool expects a single string field called question;
- outputSchema, which defines the structure of the response: an object with sql (a string) and result (an array of rows).
The Actual Tool Handler (the third argument)
```typescript
async ({ question }) => {
  const db = await openDB();
  let output: { sql: string; result: any[] };
  try {
    if (memoCache.has(question)) {
      // Cache hit, reuse the previously generated SQL
      const sql = memoCache.get(question)!;
      const rows = await db.all(sql);
      output = { sql, result: rows };
    } else {
      // Cache miss, ask the LLM (or another service) to turn the question into SQL
      const sql = await questionToSQL(question);
      const rows = await db.all(sql);
      // Store it for next time
      memoCache.set(question, sql);
      output = { sql, result: rows };
    }
    return {
      content: [{ type: "text", text: JSON.stringify(output) }],
      structuredContent: output,
    };
  } finally {
    await db.close(); // Always close the DB, even if something throws
  }
}
```

Let’s check step-by-step what we have inside the tool:
```typescript
const db = await openDB();
```

Opens a connection to the database. It assumes openDB() is a simple function that returns an SQLite database handle.

Next, we check the cache (memoCache, a simple Map declared outside the handler):

- If we have seen this exact question before, we skip the expensive questionToSQL call.
- Otherwise, we generate the SQL once and cache it forever (or until the server restarts).

```typescript
const sql = await questionToSQL(question);
```

This is the key part. questionToSQL is an async function that calls our local LLM with a prompt and returns just the SQL string. The handler then executes that SQL with db.all(sql), getting all rows back as an array of objects.
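For reference, the two helpers the handler relies on are assumed to look roughly like this. This is a sketch with an in-memory stub; in the real server, openDB would open the SQLite file via the sqlite/sqlite3 packages:

```typescript
// Sketch of the helpers assumed by the tool handler. The Db interface mirrors
// just the part of the `sqlite` package's Database that the handler uses.
export interface Db {
  all(sql: string): Promise<Record<string, unknown>[]>;
  close(): Promise<void>;
}

// question text -> generated SQL, shared across requests (cleared on restart)
export const memoCache = new Map<string, string>();

export async function openDB(): Promise<Db> {
  // Real implementation (assumption):
  //   return open({ filename: "./database.sqlite", driver: sqlite3.Database });
  // Stub so this sketch stands alone:
  return {
    all: async (_sql: string) => [],
    close: async () => {},
  };
}
```

Because the cache key is the exact question text, even a one-character change in the question triggers a fresh LLM call.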
Full code:
```typescript
const server = new McpServer({
  name: "demo-server",
  version: "1.0.0",
});

server.registerTool(
  "sql-query-tool",
  {
    title: "SQL Query Tool",
    description:
      "Ask a question in natural language and receive the SQL query and the result",
    inputSchema: { question: z.string() },
    outputSchema: { sql: z.string(), result: z.array(z.any()) },
  },
  async ({ question }) => {
    const db = await openDB();
    let output: { sql: string; result: any[] };
    try {
      if (memoCache.has(question)) {
        const sql = memoCache.get(question)!;
        const rows = await db.all(sql);
        output = { sql, result: rows };
      } else {
        const sql = await questionToSQL(question);
        const rows = await db.all(sql);
        memoCache.set(question, sql);
        output = { sql, result: rows };
      }
      return {
        content: [{ type: "text", text: JSON.stringify(output) }],
        structuredContent: output,
      };
    } finally {
      await db.close();
    }
  },
);

app.post("/mcp", async (req, res) => {
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined,
    enableJsonResponse: true,
  });
  res.on("close", () => {
    transport.close();
  });
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});
```

In summary, our server receives a question, checks whether we already have its SQL cached, generates and executes the query against the database if necessary, and sends the results back.
question-to-sql.ts
In this file, we have the questionToSQL function, which is the core of our app, as it handles generating SQL queries from natural language questions. To do this, the function:
- Defines a prompt with detailed instructions on how to interpret and transform the question into a valid SQL query;
- Uses the generateChatAnswer function to obtain the chat response from our downloaded model, "qwen2.5:latest";
- Processes the response to extract only the content of the SQL query.
````typescript
import { Ollama } from "ollama";

const ollama = new Ollama({ host: "http://localhost:11434" });

const generateChatAnswer = async (prompt: string) => {
  const response = await ollama.chat({
    model: "qwen2.5:latest",
    messages: [{ role: "user", content: prompt }],
  });
  return response.message.content.trim();
};

export async function questionToSQL(question: string): Promise<string> {
  const prompt = `
You are an expert SQL query generator. Your task is to take a natural language question from the user and convert it into a valid SQL query that can be executed on one or more database tables.
You must infer the tables, their schemas, and appropriate join types (LEFT, RIGHT, INNER, OUTER) dynamically based on the context of the question, without relying on a static or predefined table schema.

### Instructions:
- Identify the most relevant table(s) and their likely columns based on the natural language question. Assume standard naming conventions for tables and columns (e.g., "users" for user-related data, "id" for primary keys, "name" for text fields, etc.) if not explicitly provided.
- If the question references specific table names, prioritize them. Otherwise, infer table names from context (e.g., "employees" for staff-related queries, "orders" for purchase-related queries, "departments" for organizational data).
- Assume reasonable column names and data types based on the question's context (e.g., "salary" as DECIMAL for employee pay, "created_at" as DATETIME for timestamps, "user_id" as INTEGER for foreign keys).
- Detect when multiple tables are implied and infer appropriate join types (LEFT, RIGHT, INNER, OUTER) based on the relationships described in the question. Use foreign keys (e.g., "user_id", "order_id") to establish relationships between tables when needed.
- Generate only the SQL query. Do not include explanations, additional text, or comments.
- Ensure the query is syntactically correct and uses standard SQL.
- Handle aggregations (e.g., COUNT, SUM), filters (e.g., WHERE), sorting (e.g., ORDER BY), and joins (LEFT, RIGHT, INNER, OUTER) only if explicitly or implicitly needed based on the inferred schema and question.
- If the question is too ambiguous or cannot be translated into a valid query based on reasonable assumptions, output: "Invalid question for the inferred table(s)."
- Output the query in a code block for easy copying.

User question: ${question}

Generated SQL query output (only the SQL raw query without break lines):
`;
  const sqlContent = await generateChatAnswer(prompt);
  // Strip the ```sql ... ``` fences the model was told to emit
  const codeBlockMatch = sqlContent
    .trim()
    .split("```sql\n")
    .join("")
    .split("\n```")
    .join("");
  return codeBlockMatch;
}
````

Note: The more detailed our prompt is, the more accurate the final response will be. Try new prompts and improvements during your development.
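If you prefer a more explicit cleanup step, the fence stripping could also be written as a small standalone helper. This is an alternative sketch, not from the original code:

````typescript
// Extract the SQL from a ```sql ... ``` fenced block; if the model returned
// plain SQL with no fences, fall back to the raw (trimmed) text.
export function extractSQL(raw: string): string {
  const match = raw.trim().match(/```sql\s*([\s\S]*?)\s*```/);
  return (match ? match[1] : raw).trim();
}
````

Being a pure function, it is trivial to unit-test against the different output shapes the model may produce.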
Practical Example
If the input is a question such as: "Select all users with email with domain @yahoo.com", it will generate something similar to:
```sql
SELECT * FROM users WHERE email LIKE '%@yahoo.com';
```

Frontend app
So, with our MCP server implemented, let's see how to integrate it into our chat app, nicknamed "GPTo Database Chat".
Our app is very straightforward in its features, basically allowing the user to:
- Send a question to our database and obtain the SQL query and the returned data;
- Automatically save the questions asked in the browser’s local storage history.
To implement the features above, we used React.js (with TypeScript) together with React Router.
The part of the app most relevant to this post is the integration with our MCP server's API through the custom React hook useMcpSql: a simple wrapper around the useMcp hook that our UI uses to interact with the MCP server.
```typescript
import { useMcp } from "use-mcp/react";

export interface ChatResponse {
  sql: string;
  result: unknown[];
}

export function useMcpSql() {
  const { state, retry, error, callTool } = useMcp({
    url: "http://localhost:3001/mcp",
    clientName: "MCP-SQL-Chat",
    autoReconnect: true,
  });

  const isError = state === "failed";

  const call = async (question: string) => {
    try {
      const result = await callTool("sql-query-tool", { question });
      return result.structuredContent as ChatResponse;
    } catch (error) {
      console.error(error);
      throw error;
    }
  };

  return {
    retry,
    isError,
    error,
    call,
  };
}
```

The useMcp hook is very useful because it provides a good abstraction for instantiating an MCP client to consume any MCP server (local or remote), with states and helpful methods. We only need these four: state, retry, error, and callTool.
- state: used to check whether the request made with callTool was successful or not;
- retry: used by our UI to repeat the same prompt/question request in case of failure;
- error: the exception content itself;
- callTool: the main method, which calls the tool defined on the server (sql-query-tool) with the question parameter; it is invoked by the "Ask Question" UI button.
Finally, the end result is the rendering of the SQL query and its results in table format for better visualization.
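Rendering the rows is straightforward once you derive the column names from the result objects. A small helper along these lines (an illustrative sketch; the name toTable is an assumption, not from the repo) turns the untyped rows into headers plus string cells:

```typescript
// Derive table headers from the first row's keys and stringify every cell,
// so the UI can render any query result without knowing the schema upfront.
export function toTable(
  rows: Record<string, unknown>[],
): { columns: string[]; data: string[][] } {
  const columns = rows.length > 0 ? Object.keys(rows[0]) : [];
  const data = rows.map((row) => columns.map((col) => String(row[col] ?? "")));
  return { columns, data };
}

// Example:
// toTable([{ id: 1, email: "ana@yahoo.com" }])
// // -> { columns: ["id", "email"], data: [["1", "ana@yahoo.com"]] }
```

The UI can then map columns to a header row and data to body rows in a plain HTML table.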

Wrapping up
It’s simple to use and integrate a local LLM via an MCP server, right? I hope this post has made it clearer what LLMs actually are and how we can go beyond the common cloud offerings (ChatGPT, Grok, Copilot, etc.) by adopting open-source models provided by the community. These can be installed locally to solve everyday tasks, like the natural language SQL queries demonstrated in this post, without depending on giant models that require your credit card!
That’s all for today, folks! Until next time!
Check out the complete code by clicking on this link
We want to work with you. Check out our Services page!

