Learning by Building, Part 2: Caltrain Bot, SQL with LLMs, and DSPy
In the previous post, I described what I learned about setting up a Python project and how I plan to set up all my Python projects going forward - including Caltrain Bot - using uv, ruff, and ty. In this second post, I'd like to focus on data and prompting: what I learned while preprocessing the Caltrain schedule into a format suitable for SQL queries, and how I learned to "program" LLMs instead of relying on prompts alone.
GTFS & Caltrain Schedule
When I was just starting to think about this project, I expected I'd need to parse either the schedule page or a PDF file provided by Caltrain. But to my surprise, when I googled "developer-friendly Caltrain schedule," one of the first search results was a link to the schedule in GTFS format.
But what exactly is GTFS? I didn't know either. Apparently, GTFS stands for General Transit Feed Specification - an open standard, originally developed by Google, for distributing transit system information to riders. Caltrain publishes its schedule in GTFS, and the entire schedule comes as a roughly 170KB ZIP archive containing all the information I needed.
Python offers several libraries for working with GTFS. After reviewing ChatGPT's top three suggestions, I decided that pygtfs was the best fit for me. There was one minor inconvenience in the library's implementation: it doesn't work with Path objects from pathlib. I've submitted a PR to fix that and hope it will be merged soon.
But wow, the format is complex... which brings me to an old idea: generating SQL with LLMs.
Preprocessing GTFS for Easier Querying
When I first looked at the tables, I was confused by how complex they were:
$ sqlite3 caltrain.sqlite ".tables"
_feed fare_attributes stop_times
_stop_translations fare_rules stops
_trip_shapes feed_info train_stop_timeline
agency frequencies transfers
calendar routes translations
calendar_dates shapes trips
But that's to be expected, right? There are different service types, routes, directions, weekend and weekday schedules, public holiday schedules, and so on. So, is there a shortcut that would save me from spending hours preprocessing the data into something easier to query? After all, I don't want to become an expert in the GTFS format. It's just a blocker, and I'm not interested in learning the specification or schema in more depth.
As you'd expect, LLMs are really good at generating SQL, and coding agents are very good at running those queries and feeding the results back to the LLM. I simply gave the Codex app access to the SQLite database file and explained what kinds of questions I wanted VIEWS to support.
After 10–15 minutes of the Codex app working on my requests (GPT 5.4, extra-high effort, fast mode), I got almost exactly what I wanted. After a bit of re-prompting and giving it more examples of the kinds of questions I wanted answered with SQL, it generated two views with exactly the data I needed: origin station, destination station, and departure date/time for use with the BETWEEN operator.
SQLite doesn't support materialized views, but I wanted my queries to be extremely fast. Since the database is small, I already keep it in memory. So in the end, I turned the second view into a table and removed the columns I didn't really need. You can see the final result of our joint effort - me and the Codex app - in the repo.
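The pattern itself is plain SQL. Here's a self-contained sketch with made-up table, view, and column names - the real views live in the repo:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # the bot keeps the real DB in memory too
con.executescript("""
-- Stand-in for the GTFS-derived data; names are illustrative.
CREATE TABLE trips_demo (origin TEXT, destination TEXT, departure_dt TEXT);
INSERT INTO trips_demo VALUES ('San Francisco', 'Palo Alto', '2024-06-01 08:15');

-- The kind of view the coding agent generated, greatly simplified.
CREATE VIEW departures_view AS
SELECT origin, destination, departure_dt FROM trips_demo;

-- SQLite has no materialized views, so freeze the view into a table.
CREATE TABLE departures AS SELECT * FROM departures_view;
DROP VIEW departures_view;
""")

row = con.execute(
    "SELECT origin FROM departures "
    "WHERE departure_dt BETWEEN '2024-06-01 08:00' AND '2024-06-01 09:00'"
).fetchone()
print(row)  # ('San Francisco',)
```

CREATE TABLE ... AS SELECT copies the view's current contents into a regular table, which is all "materializing" means here.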
I'm still amazed by how much easier it has become with LLMs and agents to discover, learn, and transform data for specific query patterns.
DSPy
I like to think of LLMs and prompting techniques as tools I can drop into the middle of my program to handle natural language. In a way, some parts of my programs become "coded" in English. However, English is not my native language, and prompt engineering is definitely not one of my strengths.
I had heard about DSPy for a long time, but I couldn't find a good playground that motivated me to learn and practice it. Caltrain Bot turned out to be a perfect use case, because I needed more than just basic prompting techniques - I actually needed ReAct to check whether there was meaningful data in the database. I had reproduced the ReAct prompting technique many times in toy examples, but I had never used it as a real component in one of my projects. After all, most ChatGPT-like UIs already do that for me, so I never really needed to.
It was really interesting and helpful to use the ReAct module and build intuition: reading explicit reasoning traces with tool calls in the LLM's responses, watching multiple turns unfold, and analyzing where and why things went sideways.
I started by taking a short DeepLearning.AI course, "DSPy: Build and Optimize Agentic Apps." Then, within a day or two of finishing it, I had already forgotten the differences between some of the abstractions - which brought me back to the importance of practice.
Once I finished the initial version of the bot, I realized it would not only answer schedule questions, but also happily chat about the meaning of being, and so on. However, when you program prompts rather than just write them, this kind of classification problem becomes trivial:
class QuestionsClassifier(dspy.Signature):
    """Classify whether a question is asking about train schedules or not."""

    question: str = dspy.InputField()
    is_schedule_question: bool = dspy.OutputField(
        desc="Whether the question is about train schedules"
    )
Return a New Type Object
Speaking of signatures, dspy.Signature encourages users to be specific about types. For example, a field might be a string or a number. But what if you want a list of strings to be limited to a specific set of values? That's also possible with the Literal type.
The only catch in my case is that I don't actually know the available literal values ahead of time - the train station names come from a database. Of course, I could copy and paste them and hardcode the values, but this is just a hobby project, and I don't want to maintain it after deployment.
Thankfully, everything in Python is an object, including type objects. As the Python docs put it:
class type(name, bases, dict, /, **kwds)
With three arguments, return a new type object.
To support dynamically generating a Literal from a list of strings pulled from a database, I had to give up some conciseness in the part of the code responsible for the signature. It's a bit messy, but it's easy to understand once you recognize why it's needed:
def build_station_extraction_signature(
    stations: Sequence[str],
) -> type[dspy.Signature]:
    """Build a DSPy signature constrained to the known station names."""
    station_names = tuple(stations)
    if not station_names:
        raise ValueError("At least one station name is required.")
    # Build Literal["a", "b", ...] from DB-loaded station names at runtime.
    # Literal[...] syntax needs the values during annotation construction, so we
    # call __getitem__ explicitly instead of hardcoding the station list.
    station_literal = Literal.__getitem__(station_names)
    return type(
        "ExtractDepartureAndArrivalStations",
        (dspy.Signature,),
        {
            "__doc__": "Extract departure station and arrival station from the question.",
            "__annotations__": {
                "question": str,
                "departure_station": station_literal,
                "arrival_station": station_literal,
                "departure_time": datetime,
            },
            "question": dspy.InputField(),
            "departure_station": dspy.OutputField(desc="Departure station"),
            "arrival_station": dspy.OutputField(desc="Arrival station"),
            "departure_time": dspy.OutputField(desc="Approximate departure time"),
        },
    )
PEP 636 – Structural Pattern Matching
The bot supports two LLM backends: Ollama and OpenRouter. At first, I didn't plan to support OpenRouter, but I'll explain why I changed my mind in the next post. The last thing I want to share from this learning-by-doing process is the match statement.
Python has been my primary programming language forever, and I wrote about that here. For complex branching, I had always used elif chains or matched simple numeric values with match. When I tried to follow my intuition and use match to branch between different LLM providers, I got confused almost immediately. To my surprise, I didn't understand why my code (recommended by ChatGPT) worked:
def _validate_provider(settings: LLMSettings) -> None:
    match settings:
        case OpenRouterSettings():
            if not settings.api_key.strip():
                raise ValueError("OPENROUTER_API_KEY environment variable is required.")
        case OllamaSettings():
            pass
        case _:
            raise ValueError(f"Unsupported LLM provider: {settings.provider}")
So I had to go back to basics and properly learn how Python processes match statements - and why I wasn't calling a function or creating an instance. In short, Python does not execute MyClass() to create an instance. Instead, it interprets MyClass() as a class pattern: essentially an isinstance() check, optionally combined with attribute sub-patterns.
In the next post, I'll share how I failed to run ReAct locally on a Raspberry Pi with a decent success rate, and how hard it is to find meaningful documentation for custom hardware. Stay tuned!