Era of AI Day
Thursday December 11, 2025
Microsoft's Philadelphia Innovation Hub
Being able to chat with your data gets your users closer to the promise of true self-service analytics. This is known as NL2SQL. These solutions can be created in under an hour against just about any database. However, just having an LLM understand what you are asking and convert the request to SQL statements tends to be underwhelming. What you need are some advanced techniques that most people don't think about, techniques that will make your solution truly valuable for your business users. I'll show you how to embed business logic for YOUR business directly in the process. Then I'll show you how to write QUALITY tests to ensure your users are getting the RIGHT answers from your data, and even how to have your LLM predict similar questions that your users may ask and offer those SUGGESTIONS to your users. Then we'll ask PRESCRIPTIVE questions about your data that are nearly impossible to answer with SQL alone, such as "Why did my sales go down last quarter and how can I prevent that next quarter?" Finally, I'll show you how to build an MCP server that your users can leverage to run their queries.
This is what we will hopefully build today:
This can be modified in just a few hours for YOUR data and use cases.
- dave@davewentzel.com
- davew@microsoft.com
- Website (I haven't posted much here for years)
As Microsoft Innovation Hub's analytics architect, my mantra is "Business Before Technology". At the Innovation Hub (née the Microsoft Technology Center, or MTC) we strive to understand your business problems before deploying technology. I focus on building solutions using data, analytics, and AI. I work with our largest customers, from executives to business people to technologists, using a unique approach that helps "shift left" analytics projects and remove risk. The days of six-month capital projects are over; instead I try to solve business problems quickly to prove value. I build solutions that solve challenging problems prescriptively and repeatably. SUCCESS for me is solving challenging problems for respected companies while working with talented people.
- Quick demo of NL2SQL solutions using Microsoft Fabric (pre-baked, shows the Art of the Possible)
- Why is NL2SQL still underwhelming with respect to "self-service analytics"?
- we can definitely do nl2sql...that's easy, but there are VERY difficult problems we still need to solve
- complex schemas: lots of tables with goofy naming conventions, non-intuitive join keys, and complex filter conditions (like adding `WHERE status = 1` to all queries). We can handle this with judicious views.
- natural language ambiguity: example: "how much" (dollar amounts) vs "how many" (counts) queries
- the LLMs sometimes get confused as to which columns to pick for a given query.
- we can provide few-shot prompts to describe which columns we should use when the query is asking how much vs how many
- understanding intent: the user really wanted something else, but the LLM misinterpreted the question and hence generated the wrong SQL.
- tribal knowledge: understanding that certain phrases mean different things in different companies
  - e.g., `active customer` could mean "customers who bought something in the last 3 months" at one company, but 6 months at another company. We can again solve this with few-shot prompting.
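The few-shot fixes mentioned above (how much vs. how many, tribal knowledge like "active customer") can be baked directly into the system prompt. A minimal sketch, where every table name, column name, and business rule is a hypothetical placeholder:

```python
# Sketch: encoding disambiguation rules ("how much" vs. "how many") and tribal
# knowledge ("active customer") as few-shot examples in the NL2SQL prompt.
# All table/column names and rules below are hypothetical placeholders.

FEW_SHOT_EXAMPLES = [
    {
        "question": "How many customers ordered last month?",
        "sql": "SELECT COUNT(DISTINCT CustomerID) FROM Orders "
               "WHERE OrderDate >= DATEADD(month, -1, GETDATE());",
    },
    {
        "question": "How much did we sell last month?",
        "sql": "SELECT SUM(TotalDue) FROM Orders "
               "WHERE OrderDate >= DATEADD(month, -1, GETDATE());",
    },
]

BUSINESS_RULES = [
    "'how many' questions are COUNTs; 'how much' questions are SUMs of a dollar column.",
    "An 'active customer' is one with an order in the last 3 months.",  # tribal knowledge
]

def build_nl2sql_prompt(question: str, schema_ddl: str) -> str:
    """Assemble the system prompt handed to the LLM: schema + rules + examples."""
    shots = "\n\n".join(
        f"Question: {ex['question']}\nSQL: {ex['sql']}" for ex in FEW_SHOT_EXAMPLES
    )
    rules = "\n".join(f"- {r}" for r in BUSINESS_RULES)
    return (
        "You translate business questions into T-SQL. Return only SQL.\n\n"
        f"Schema:\n{schema_ddl}\n\n"
        f"Business rules:\n{rules}\n\n"
        f"Examples:\n{shots}\n\n"
        f"Question: {question}\nSQL:"
    )
```

Because the rules and examples are plain strings, business users can review (and correct) them without touching any orchestration code.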
We'll start with the simplest solutions to setup and move along the path to solutions that require a bit more work but provide better results.
- The Fastest Time to Value: Fabric Data Agents
The Easiest Approach if you aren't on Fabric: simply write a little code that directs an LLM as to how to query your db, then a little code that executes the SQL.
- Let's look at this in more detail...structured_data_retreival_nltosql.ipynb
- Key Concepts:
- the schema and business logic are hardcoded right in the prompt
- we do few-shot learning (give it a few examples of relevant SQL -- this is VERY helpful to the LLM and is a great technique for describing business logic directly in the SQL, e.g., `WHERE IsActive = 1`)
- after the SQL is executed we store the results in a `pandas` dataframe. This has some benefits, namely:
  - I can display the data as a table
  - I can graph it using something like `matplotlib` or `plotly` in the browser
  - I can pass the results to the next step in the orchestration workflow
  - I can simply use the results to generate an answer to the original question.
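The core loop of the notebook can be sketched in a few lines. This is an illustrative stand-in, not the notebook's exact code: the LLM call is abstracted as a `generate_sql` callable so any chat-completions client fits, and SQLite stands in for the real database:

```python
# Sketch: LLM turns the question into SQL, the SQL is executed, and the
# results land in a pandas DataFrame for display/charting/next steps.
import sqlite3            # stand-in; swap for pyodbc/SQLAlchemy against Fabric or SQL Server
import pandas as pd

SYSTEM_PROMPT = "You convert questions to SQL for the schema below. Return only SQL.\n"

def answer_question(question, conn, generate_sql, schema_ddl=""):
    """generate_sql is any callable (system_prompt, question) -> SQL string,
    e.g. a thin wrapper around your LLM client."""
    sql = generate_sql(SYSTEM_PROMPT + schema_ddl, question)
    df = pd.read_sql_query(sql, conn)   # DataFrame: show as table, chart with
    return sql, df                      # matplotlib/plotly, or pass downstream

# Usage with a stubbed "LLM" against an in-memory database:
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE Sales (Amount REAL); INSERT INTO Sales VALUES (10.0), (32.0);"
)
stub_llm = lambda system, q: "SELECT SUM(Amount) AS Total FROM Sales;"
sql, df = answer_question("How much did we sell?", conn, stub_llm)
```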
- Screenshots of the Demo
Use an orchestrator to build an agentic system: here's an example that is NOT code-complete, but shows how to do this with the LangChain family of tools. Let's walk through what this solution would look like.
- The code will need some tweaking, but shows the Art of the Possible.
- Example Code using the LangChain family of tools
- Full State Machine Approach
- This will be the one we build out together using Semantic Kernel as the agentic orchestrator
- README: use the documentation here to build the solution on your own
- prefer simplicity. If your schema is complex, can views solve a lot of the complexity?
- latency/performance. The more agentic the solution becomes, the more expensive it will become, in both wall-clock time for the user and token costs.
- Feedback loops are important.
- Work with your business users on "evaluations"
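The "prefer simplicity" point above, using views as a semantic tier, can be sketched as follows. SQLite and the SAP-style names are illustrative stand-ins:

```python
# Sketch: a "semantic tier" of views over cryptic source tables. The NL2SQL
# prompt is only ever shown the friendly view, so the LLM never has to guess
# that KNA1 is a customer master or that LOEVM is a deletion flag.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cryptic source table (names modeled on SAP-style conventions):
CREATE TABLE KNA1 (KUNNR TEXT, NAME1 TEXT, LOEVM TEXT);
INSERT INTO KNA1 VALUES ('C1', 'Contoso', ''), ('C2', 'Fabrikam', 'X');

-- The human-readable view handed to the LLM, with the business
-- filter (deletion flag) baked in instead of few-shot prompting:
CREATE VIEW ActiveCustomers AS
SELECT KUNNR AS CustomerID, NAME1 AS CustomerName
FROM KNA1
WHERE LOEVM <> 'X';
""")

rows = conn.execute("SELECT CustomerName FROM ActiveCustomers").fetchall()
```

The view does double duty: it renames columns so the LLM picks the right ones, and it hides filter logic that would otherwise have to live in the prompt.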
How might we make these solutions BETTER?
When is this valuable?
- if you have a huge schema
- if you have a schema that isn't sane. E.g., think of SAP's schema: the table and column names are not "human-readable", so you'll need to augment the NL2SQL solution so the LLM knows which tables/columns have the relevant data to answer the question.
- A simple way to avoid this is to code everything in VIEWS (a semantic tier). Views are your friend
- complex business logic that is difficult to express in SQL
- we can do this instead of few-shot learning examples that we would need to hard-code somewhere
Here is some example code to get you started. Let's discuss how this works together.
This is a variant of the RAG pattern above. If you have a data catalog already that has good schema descriptions and business rules, just query that as an agentic step and pass the information to the next agent in the chain.
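A minimal sketch of that agentic catalog step. The catalog lookup is stubbed as a dict; in practice it would be an API call to your catalog (e.g. Microsoft Purview), and the returned context block is handed to the next (SQL-writing) agent in the chain:

```python
# Sketch: using an existing data catalog as the retrieval step in the chain.
# The catalog entries below are hypothetical placeholders.

CATALOG = {
    "Orders": {
        "description": "One row per customer order.",
        "rules": ["Exclude test orders: WHERE IsTest = 0"],
    },
}

def enrich_prompt_from_catalog(question, tables, catalog=CATALOG):
    """Agentic step: pull schema descriptions and business rules for the
    candidate tables and build a context block for the SQL-writing agent."""
    lines = []
    for t in tables:
        entry = catalog.get(t, {})
        lines.append(f"Table {t}: {entry.get('description', 'no description')}")
        lines.extend(f"  Rule: {r}" for r in entry.get("rules", []))
    return f"Question: {question}\nCatalog context:\n" + "\n".join(lines)
```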
We want the business people that will USE the nl2sql solution to build their own evaluations. Why?
- As they use the solution they will construct NL queries that we haven't thought of that may not work. We want these captured and the correct responses provided so we can ensure that when we make changes to the system in the future (new LLMs, changes to schema, etc) that we are not breaking existing functionality.
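Such an evaluation suite can be as simple as a list of question/expected-answer pairs that gets re-run on every change (new LLM, schema change, prompt tweak). A sketch, with the NL2SQL pipeline stubbed out so it runs standalone:

```python
# Sketch: business users contribute (question, expected answer) pairs; we
# re-run the suite on every change and report regressions.
import sqlite3

EVAL_CASES = [  # captured from real user sessions; expected values verified by the business
    {"question": "How much did we sell?", "expected": 42.0},
]

def run_evals(pipeline, cases=EVAL_CASES):
    """pipeline maps a natural-language question -> a scalar answer."""
    failures = []
    for case in cases:
        got = pipeline(case["question"])
        if got != case["expected"]:
            failures.append((case["question"], case["expected"], got))
    return failures

# Stub pipeline standing in for the full NL2SQL chain:
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE Sales (Amount REAL); INSERT INTO Sales VALUES (10), (32);"
)
pipeline = lambda q: conn.execute("SELECT SUM(Amount) FROM Sales").fetchone()[0]
failures = run_evals(pipeline)
```

Wire this into CI so a model or schema change that breaks an existing question fails loudly before users ever see it.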
Using RAG patterns against a Knowledge Graph
Standard relational dbs can't answer what I call aggregation-style analytics questions (this is SOLELY my term).
Imagine this scenario: I have call center transcripts in a database and I want to ask "probing" questions of the data: "What are the Top 5 topics that our customers call about?" And "Given those topics, what are the Top 5 things our customers RECOMMEND that we do to AVOID them calling us?"
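One way to approximate these aggregation-style questions, even before adding a knowledge graph, is a map-reduce pass over the transcripts: an LLM "map" step tags each transcript with topics, and a "reduce" step aggregates the counts. In this sketch the LLM call is stubbed with a keyword lookup purely so it is runnable:

```python
# Sketch: map-reduce over call transcripts for "Top 5 topics" questions.
from collections import Counter

def extract_topics(transcript: str) -> list[str]:
    """Stand-in for an LLM call like 'list the topics discussed in this call'."""
    keywords = {"bill": "billing", "password": "account access", "ship": "shipping"}
    return [topic for kw, topic in keywords.items() if kw in transcript.lower()]

def top_topics(transcripts, n=5):
    """Reduce step: aggregate per-transcript topics into overall counts."""
    counts = Counter()
    for t in transcripts:
        counts.update(extract_topics(t))
    return counts.most_common(n)

calls = [
    "My bill is wrong again.",
    "I forgot my password.",
    "The bill shipped late.",
]
result = top_topics(calls)
```

The follow-on "what do customers RECOMMEND" question works the same way, with a second map prompt asking for recommendations instead of topics. A knowledge graph makes this richer by letting the topics link back to customers, products, and time periods.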

