Era of AI Day
Thursday December 11, 2025
Microsoft's Philadelphia Innovation Hub
Being able to chat with your data gets your users closer to the promise of true self-service analytics. This is known as NL2SQL. These solutions can be created in under an hour against just about any database. However, just having an LLM understand what you are asking and convert the request to SQL statements tends to be underwhelming. What you need are some advanced techniques that most people don't think about, techniques that will make your solution truly valuable for your business users. I'll show you how to embed business logic for YOUR business directly in the process. Then I'll show you how to write QUALITY tests to ensure your users are getting the RIGHT answers from your data, and even how to have your LLM predict similar questions that your users may ask and offer those SUGGESTIONS to your users. Then we'll ask PRESCRIPTIVE questions about your data that are nearly impossible to answer with SQL alone, such as "Why did my sales go down last quarter and how can I prevent that next quarter?" Finally, I'll show you how to build an MCP server that your users can leverage to run their queries.
This is what we will hopefully build today:
This can be modified in just a few hours for YOUR data and use cases.
- dave@davewentzel.com
- davew@microsoft.com
- Website (I haven't posted much here for years)
As Microsoft Innovation Hub's analytics architect, my mantra is "Business Before Technology". At the Innovation Hub (née the Microsoft Technology Center, or MTC) we strive to understand your business problems before deploying technology. I focus on building solutions using data, analytics, and AI. I work with our largest customers, from executives to business people to technologists, using a unique approach that helps "shift left" analytics projects and remove risk. The days of six-month capital projects are over; instead I try to solve business problems quickly to prove value. I build solutions that solve challenging problems prescriptively and repeatably. SUCCESS for me is solving challenging problems for respected companies while working with talented people.
- Quick demo of NL2SQL solutions using Microsoft Fabric (pre-baked, shows the Art of the Possible)
- Why is NL2SQL still underwhelming with respect to "self-service analytics"?
- we can definitely do nl2sql...that's easy, but there are VERY difficult problems we still need to solve
- complex schemas: lots of tables with goofy naming conventions, non-intuitive join keys, and complex filter conditions (like adding `WHERE status = 1` to all queries). We can handle this with judicious views.
- natural language ambiguity: example: "how much" (dollar amounts) vs "how many" (counts) queries
- the LLMs sometimes get confused as to which columns to pick for a given query.
- we can provide few-shot prompts to describe which columns we should use when the query is asking how much vs how many
- understanding intent: the user really wanted something else, but the LLM misinterpreted the question and hence generated the wrong SQL.
- tribal knowledge: understanding that certain phrases mean different things in different companies
  - e.g., `active customer` could mean "customers who bought something in the last 3 months" at one company, but 6 months at another company. We can again solve this with few-shot prompting.
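The few-shot fixes mentioned above (how much vs. how many, tribal knowledge like "active customer") can be baked directly into the system prompt. A minimal sketch, where every table name, column name, and business rule is a hypothetical placeholder:

```python
# Sketch: encoding disambiguation rules ("how much" vs. "how many") and tribal
# knowledge ("active customer") as few-shot examples in the NL2SQL prompt.
# All table/column names and rules below are hypothetical placeholders.

FEW_SHOT_EXAMPLES = [
    {
        "question": "How many customers ordered last month?",
        "sql": "SELECT COUNT(DISTINCT CustomerID) FROM Orders "
               "WHERE OrderDate >= DATEADD(month, -1, GETDATE());",
    },
    {
        "question": "How much did we sell last month?",
        "sql": "SELECT SUM(TotalDue) FROM Orders "
               "WHERE OrderDate >= DATEADD(month, -1, GETDATE());",
    },
]

BUSINESS_RULES = [
    "'how many' questions are COUNTs; 'how much' questions are SUMs of a dollar column.",
    "An 'active customer' is one with an order in the last 3 months.",  # tribal knowledge
]

def build_nl2sql_prompt(question: str, schema_ddl: str) -> str:
    """Assemble the system prompt handed to the LLM: schema + rules + examples."""
    shots = "\n\n".join(
        f"Question: {ex['question']}\nSQL: {ex['sql']}" for ex in FEW_SHOT_EXAMPLES
    )
    rules = "\n".join(f"- {r}" for r in BUSINESS_RULES)
    return (
        "You translate business questions into T-SQL. Return only SQL.\n\n"
        f"Schema:\n{schema_ddl}\n\n"
        f"Business rules:\n{rules}\n\n"
        f"Examples:\n{shots}\n\n"
        f"Question: {question}\nSQL:"
    )
```

Because the rules and examples are plain strings, business users can review (and correct) them without touching any orchestration code.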
We'll start with the simplest solutions to setup and move along the path to solutions that require a bit more work but provide better results.
- The Fastest Time to Value: Fabric Data Agents
The Easiest Approach if you aren't on Fabric: simply write a little code that directs an LLM as to how to query your db, then a little code that executes the SQL.
- Let's look at this in more detail...structured_data_retreival_nltosql.ipynb
- Key Concepts:
- the schema and business logic are hardcoded right in the prompt
- we do few-shot learning (give it a few examples of relevant SQL -- this is VERY helpful to the LLM and is a great technique for describing business logic directly in the SQL, e.g., `WHERE IsActive = 1`)
- after the SQL is executed we store the results in a `pandas` dataframe. This has some benefits, namely:
  - I can display the data as a table
  - I can graph it using something like `matplotlib` or `plotly` in the browser
  - I can pass the results to the next step in the orchestration workflow
  - I can simply use the results to generate an answer to the original question.
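The core loop of the notebook can be sketched in a few lines. This is an illustrative stand-in, not the notebook's exact code: the LLM call is abstracted as a `generate_sql` callable so any chat-completions client fits, and SQLite stands in for the real database:

```python
# Sketch: LLM turns the question into SQL, the SQL is executed, and the
# results land in a pandas DataFrame for display/charting/next steps.
import sqlite3            # stand-in; swap for pyodbc/SQLAlchemy against Fabric or SQL Server
import pandas as pd

SYSTEM_PROMPT = "You convert questions to SQL for the schema below. Return only SQL.\n"

def answer_question(question, conn, generate_sql, schema_ddl=""):
    """generate_sql is any callable (system_prompt, question) -> SQL string,
    e.g. a thin wrapper around your LLM client."""
    sql = generate_sql(SYSTEM_PROMPT + schema_ddl, question)
    df = pd.read_sql_query(sql, conn)   # DataFrame: show as table, chart with
    return sql, df                      # matplotlib/plotly, or pass downstream

# Usage with a stubbed "LLM" against an in-memory database:
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE Sales (Amount REAL); INSERT INTO Sales VALUES (10.0), (32.0);"
)
stub_llm = lambda system, q: "SELECT SUM(Amount) AS Total FROM Sales;"
sql, df = answer_question("How much did we sell?", conn, stub_llm)
```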
- Screenshots of the Demo
Use an orchestrator to build an agentic system: here's an example that is NOT code-complete, but shows how to do this with the LangChain family of tools. Let's walk through what this solution would look like.
- The code will need some tweaking, but shows the Art of the Possible.
- Example Code using the LangChain family of tools
- Full State Machine Approach
- This will be the one we build out together using Semantic Kernel as the agentic orchestrator
- README: use the documentation here to build the solution on your own
- prefer simplicity. If your schema is complex, can views solve a lot of the complexity?
- latency/performance. The more agentic the solution becomes, the more expensive it will become, in both wall-clock time for the user and token costs.
- Feedback loops are important.
- Work with your business users on "evaluations"
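The "prefer simplicity" point above, using views as a semantic tier, can be sketched as follows. SQLite and the SAP-style names are illustrative stand-ins:

```python
# Sketch: a "semantic tier" of views over cryptic source tables. The NL2SQL
# prompt is only ever shown the friendly view, so the LLM never has to guess
# that KNA1 is a customer master or that LOEVM is a deletion flag.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cryptic source table (names modeled on SAP-style conventions):
CREATE TABLE KNA1 (KUNNR TEXT, NAME1 TEXT, LOEVM TEXT);
INSERT INTO KNA1 VALUES ('C1', 'Contoso', ''), ('C2', 'Fabrikam', 'X');

-- The human-readable view handed to the LLM, with the business
-- filter (deletion flag) baked in instead of few-shot prompting:
CREATE VIEW ActiveCustomers AS
SELECT KUNNR AS CustomerID, NAME1 AS CustomerName
FROM KNA1
WHERE LOEVM <> 'X';
""")

rows = conn.execute("SELECT CustomerName FROM ActiveCustomers").fetchall()
```

The view does double duty: it renames columns so the LLM picks the right ones, and it hides filter logic that would otherwise have to live in the prompt.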
How might we make these solutions BETTER?
When is this valuable?
- if you have a huge schema
- if you have a schema that isn't sane. E.g., think of SAP's schema: the table and column names are not "human-readable", so you'll need to augment the NL2SQL solution so the LLM knows which tables/columns have the relevant data to answer the question.
- A simple way to avoid this is to code everything in VIEWS (a semantic tier). Views are your friend
- complex business logic that is difficult to express in SQL
- we can do this instead of few-shot learning examples that we would need to hard-code somewhere
Here is some example code to get you started. Let's discuss how this works together.
This is a variant of the RAG pattern above. If you have a data catalog already that has good schema descriptions and business rules, just query that as an agentic step and pass the information to the next agent in the chain.
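A minimal sketch of that agentic catalog step. The catalog lookup is stubbed as a dict; in practice it would be an API call to your catalog (e.g. Microsoft Purview), and the returned context block is handed to the next (SQL-writing) agent in the chain:

```python
# Sketch: using an existing data catalog as the retrieval step in the chain.
# The catalog entries below are hypothetical placeholders.

CATALOG = {
    "Orders": {
        "description": "One row per customer order.",
        "rules": ["Exclude test orders: WHERE IsTest = 0"],
    },
}

def enrich_prompt_from_catalog(question, tables, catalog=CATALOG):
    """Agentic step: pull schema descriptions and business rules for the
    candidate tables and build a context block for the SQL-writing agent."""
    lines = []
    for t in tables:
        entry = catalog.get(t, {})
        lines.append(f"Table {t}: {entry.get('description', 'no description')}")
        lines.extend(f"  Rule: {r}" for r in entry.get("rules", []))
    return f"Question: {question}\nCatalog context:\n" + "\n".join(lines)
```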
We want the business people that will USE the nl2sql solution to build their own evaluations. Why?
- As they use the solution they will construct NL queries that we haven't thought of that may not work. We want these captured and the correct responses provided so we can ensure that when we make changes to the system in the future (new LLMs, changes to schema, etc) that we are not breaking existing functionality.
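Such an evaluation suite can be as simple as a list of question/expected-answer pairs that gets re-run on every change (new LLM, schema change, prompt tweak). A sketch, with the NL2SQL pipeline stubbed out so it runs standalone:

```python
# Sketch: business users contribute (question, expected answer) pairs; we
# re-run the suite on every change and report regressions.
import sqlite3

EVAL_CASES = [  # captured from real user sessions; expected values verified by the business
    {"question": "How much did we sell?", "expected": 42.0},
]

def run_evals(pipeline, cases=EVAL_CASES):
    """pipeline maps a natural-language question -> a scalar answer."""
    failures = []
    for case in cases:
        got = pipeline(case["question"])
        if got != case["expected"]:
            failures.append((case["question"], case["expected"], got))
    return failures

# Stub pipeline standing in for the full NL2SQL chain:
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE Sales (Amount REAL); INSERT INTO Sales VALUES (10), (32);"
)
pipeline = lambda q: conn.execute("SELECT SUM(Amount) FROM Sales").fetchone()[0]
failures = run_evals(pipeline)
```

Wire this into CI so a model or schema change that breaks an existing question fails loudly before users ever see it.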
Using RAG patterns against a Knowledge Graph
Standard relational dbs can't answer what I call aggregation-style analytics questions (this is SOLELY my term).
Imagine this scenario: I have call center transcripts in a database and I want to ask "probing" questions of the data: "What are the Top 5 topics that our customers call about?" And "Given those topics, what are the Top 5 things our customers RECOMMEND that we do to AVOID them calling us?"
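One way to approximate these aggregation-style questions, even before adding a knowledge graph, is a map-reduce pass over the transcripts: an LLM "map" step tags each transcript with topics, and a "reduce" step aggregates the counts. In this sketch the LLM call is stubbed with a keyword lookup purely so it is runnable:

```python
# Sketch: map-reduce over call transcripts for "Top 5 topics" questions.
from collections import Counter

def extract_topics(transcript: str) -> list[str]:
    """Stand-in for an LLM call like 'list the topics discussed in this call'."""
    keywords = {"bill": "billing", "password": "account access", "ship": "shipping"}
    return [topic for kw, topic in keywords.items() if kw in transcript.lower()]

def top_topics(transcripts, n=5):
    """Reduce step: aggregate per-transcript topics into overall counts."""
    counts = Counter()
    for t in transcripts:
        counts.update(extract_topics(t))
    return counts.most_common(n)

calls = [
    "My bill is wrong again.",
    "I forgot my password.",
    "The bill shipped late.",
]
result = top_topics(calls)
```

The follow-on "what do customers RECOMMEND" question works the same way, with a second map prompt asking for recommendations instead of topics. A knowledge graph makes this richer by letting the topics link back to customers, products, and time periods.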

