Dima's Blog

Learning by Building, Part 2: Caltrain Bot, SQL with LLMs, and DSPy

In the previous post, I described what I learned about setting up a Python project, and how I plan to set up all my Python projects going forward - including the Caltrain bot - using uv, ruff, and ty. In this second post, I’d like to focus on data and prompting: what I learned while preprocessing the Caltrain schedule into a format suitable for SQL queries, and how I learned to "program" LLMs instead of just relying on prompts.

GTFS & Caltrain Schedule

When I was just starting to think about this project, I expected I’d need to parse either the schedule page or a PDF file provided by Caltrain. But to my surprise, when I googled ā€œdeveloper-friendly Caltrain schedule,ā€ one of the first search results was a link to the schedule in GTFS format.

But what exactly is GTFS? I didn’t know either. Apparently, GTFS stands for General Transit Feed Specification - an open standard, originally developed by Google, for distributing transit system information to riders. Caltrain publishes its schedule in GTFS, and the entire schedule comes as a roughly 170KB ZIP archive containing all the information I needed.
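Under the hood, a GTFS feed is just a ZIP of plain CSV text files (stops.txt, trips.txt, stop_times.txt, and so on), each with a header row. That means you can peek at one with nothing but the standard library. A minimal sketch with an illustrative stops.txt snippet (the IDs and coordinates below are made up, not actual Caltrain data):

```python
import csv
import io

# Illustrative stops.txt content; real GTFS feeds ship this as a file
# inside the ZIP archive.
stops_txt = """stop_id,stop_name,stop_lat,stop_lon
70011,San Francisco Caltrain,37.7764,-122.3947
70021,22nd Street,37.7575,-122.3922
"""

reader = csv.DictReader(io.StringIO(stops_txt))
stops = {row["stop_id"]: row["stop_name"] for row in reader}
print(stops["70021"])  # 22nd Street
```

With a real feed, you’d open the files via `zipfile.ZipFile` instead of a string, but the parsing is the same.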

Python offers several libraries for working with GTFS. After reviewing ChatGPT’s top three suggestions, I decided that pygtfs was the best fit for me. There was one minor inconvenience in the library’s implementation: it doesn’t accept Path objects from pathlib, so I opened a PR and hope it will be merged soon.

But wow, the format is complex... which brings me to an old idea: generating SQL with LLMs.

Preprocessing GTFS for Easier Querying

When I first looked at the tables, I was confused by how complex they were:

$ sqlite3 caltrain.sqlite ".tables"

_feed                fare_attributes      stop_times
_stop_translations   fare_rules           stops
_trip_shapes         feed_info            train_stop_timeline
agency               frequencies          transfers
calendar             routes               translations
calendar_dates       shapes               trips

But that’s to be expected, right? There are different service types, routes, directions, weekend and weekday schedules, public holiday schedules, and so on. So, is there a shortcut that would save me from spending hours preprocessing the data into something that’s easier to query? After all, I don’t want to become an expert in the GTFS format. It’s just a blocker, and I’m not interested in learning the specification or schema in more depth.

As you’d expect, LLMs are really good at generating SQL, and coding agents are very good at running those queries and feeding the results back to the LLM. I simply gave the Codex app access to the SQLite database file and explained what kinds of questions I wanted VIEWS to support.

After 10–15 minutes of the Codex app working on my requests (GPT 5.4, extra-high effort, fast mode), I got almost exactly what I wanted. After a bit of re-prompting and giving it more examples of the kinds of questions I wanted answered with SQL, it generated two views with exactly the data I needed: origin station, destination station, and departure date/time for use with the BETWEEN operator.

SQLite doesn’t support materialized views, but I wanted my queries to be extremely fast. Since the database is small, I already keep it in memory. So in the end, I turned the second view into a table and removed the columns I didn’t really need. You can see the final result of our joint effort - me and the Codex app - in the repo.
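To make the shape concrete, here is a minimal sketch of that kind of denormalized table and a BETWEEN query against an in-memory SQLite database. The column names and data are illustrative; the actual table in the repo is generated from the GTFS feed and looks different.

```python
import sqlite3

# Sketch of a denormalized departures table: one row per trip leg,
# keyed by origin, destination, and an ISO 8601 departure timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE departures (
        origin TEXT,
        destination TEXT,
        departure_at TEXT  -- ISO 8601 sorts lexicographically, so BETWEEN works
    )"""
)
conn.executemany(
    "INSERT INTO departures VALUES (?, ?, ?)",
    [
        ("San Francisco", "Palo Alto", "2025-06-02T08:05:00"),
        ("San Francisco", "Palo Alto", "2025-06-02T09:15:00"),
        ("San Francisco", "Palo Alto", "2025-06-02T17:40:00"),
    ],
)

rows = conn.execute(
    """SELECT departure_at FROM departures
       WHERE origin = ? AND destination = ?
         AND departure_at BETWEEN ? AND ?
       ORDER BY departure_at""",
    ("San Francisco", "Palo Alto",
     "2025-06-02T08:00:00", "2025-06-02T10:00:00"),
).fetchall()
print([r[0] for r in rows])  # ['2025-06-02T08:05:00', '2025-06-02T09:15:00']
```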

I’m still amazed by how much easier it has become with LLMs and agents to discover, learn, and transform data for specific query patterns.

DSPy

I like to think of LLMs and prompting techniques as tools I can drop into the middle of my program to handle natural language. In a way, some parts of my programs become ā€œcodedā€ in English. However, English is not my native language, and prompt engineering is definitely not one of my strengths.

I had heard about DSPy for a long time, but I couldn’t find a good playground that motivated me to learn and practice it. Caltrain Bot turned out to be a perfect use case, because I needed more than just basic prompting techniques - I actually needed ReAct to check whether there was meaningful data in the database. I had reproduced the ReAct prompting technique many times in toy examples, but I had never used it as a real component in one of my projects. After all, most ChatGPT-like UIs already do that for me, so I never really needed to.

It was really interesting and helpful to use the ReAct module and build intuition: reading explicit reasoning traces with tool calls in the LLM’s responses, seeing multiple turns, and analyzing where and why things went sideways.
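The intuition I ended up with: ReAct is just a loop in which the model emits a thought plus either a tool call or a final answer, and the runtime executes the tool and feeds the observation back. Here is a toy sketch of that control flow with a scripted stand-in "model" - this is not DSPy’s actual implementation, which drives a real LLM through typed signatures:

```python
def lookup_departures(origin: str) -> str:
    # Stand-in tool; the real bot queries the SQLite schedule here.
    return "08:05, 09:15" if origin == "San Francisco" else "no data"

def scripted_model(history: list[str]) -> dict:
    # A real LLM decides what to do next; here two turns are scripted.
    if not any(line.startswith("Observation:") for line in history):
        return {"thought": "I should check the database.",
                "tool": "lookup_departures", "args": "San Francisco"}
    return {"thought": "I have the data.",
            "answer": "Next trains: 08:05 and 09:15."}

def react(question: str) -> str:
    history = [f"Question: {question}"]
    for _ in range(5):  # cap the number of turns
        step = scripted_model(history)
        history.append(f"Thought: {step['thought']}")
        if "answer" in step:
            return step["answer"]
        observation = lookup_departures(step["args"])
        history.append(f"Observation: {observation}")
    return "Gave up."

print(react("When do trains leave San Francisco?"))
```

The `history` list is exactly the kind of multi-turn trace that was so useful to read when things went sideways.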

I started by taking a short DeepLearning.AI course, ā€œDSPy: Build and Optimize Agentic Apps.ā€ Then, within a day or two of finishing it, I had already forgotten the differences between some of the abstractions - which brought me back to the importance of practice.

Once I finished the initial version of the bot, I realized it would not only answer schedule questions, but also talk to you about the meaning of being šŸ˜†, and so on. However, when you program prompts rather than just write them, this kind of classification problem becomes trivial:

class QuestionsClassifier(dspy.Signature):
    """Classify whether a question is asking about train schedules or not."""

    question: str = dspy.InputField()
    is_schedule_question: bool = dspy.OutputField(
        desc="Whether the question is about train schedules"
    )

Return a New Type Object

Speaking of signatures, dspy.Signature encourages users to be specific about types. For example, a field might be a string or a number. But what if you want a string field limited to a specific set of values? That’s also possible with the Literal type.
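For instance, a field annotated with Literal only accepts the listed values, and you can even recover those values at runtime for validation. A minimal stdlib sketch (the field name is made up for illustration):

```python
from typing import Literal, get_args

Direction = Literal["northbound", "southbound"]

def validate_direction(value: str) -> Direction:
    # get_args recovers the allowed values at runtime, which is handy
    # for validation even without a static type checker.
    if value not in get_args(Direction):
        raise ValueError(f"Unknown direction: {value}")
    return value  # type: ignore[return-value]

print(validate_direction("northbound"))  # northbound
```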

The only catch in my case is that I don’t actually know the available literal values ahead of time - the train station names come from a database. Of course, I could copy and paste them and hardcode the values, but this is just a hobby project, and I don’t want to maintain it after deployment.

Thankfully, everything in Python is an object, including type objects. As the Python docs put it:
class type(name, bases, dict, /, **kwds)
With three arguments, return a new type object.

To support dynamically generating a Literal from a list of strings pulled from a database, I had to give up some conciseness in the part of the code responsible for the signature. It’s a bit messy, but it’s easy to understand once you recognize why it’s needed:

def build_station_extraction_signature(
    stations: Sequence[str],
) -> type[dspy.Signature]:
    """Build a DSPy signature constrained to the known station names."""
    station_names = tuple(stations)
    if not station_names:
        raise ValueError("At least one station name is required.")

    # Build Literal["a", "b", ...] from DB-loaded station names at runtime.
    # Literal[...] syntax needs the values during annotation construction, so we
    # call __getitem__ explicitly instead of hardcoding the station list.
    station_literal = Literal.__getitem__(station_names)

    return type(
        "ExtractDepartureAndArrivalStations",
        (dspy.Signature,),
        {
            "__doc__": "Extract departure station and arrival station from the question.",
            "__annotations__": {
                "question": str,
                "departure_station": station_literal,
                "arrival_station": station_literal,
                "departure_time": datetime,
            },
            "question": dspy.InputField(),
            "departure_station": dspy.OutputField(desc="Departure station"),
            "arrival_station": dspy.OutputField(desc="Arrival station"),
            "departure_time": dspy.OutputField(desc="Approximate departure time"),
        },
    )

PEP 636 – Structural Pattern Matching

The bot supports multiple LLM backends: Ollama and OpenRouter. At first, I didn’t plan to support OpenRouter, but I’ll explain why I changed my mind in the next post. The last thing I want to share from this learning-by-doing process is the match statement.

Python has been my primary programming language forever, and I wrote about that here. For branching, I had always used elif statements or simple matching on literal values with match. When I tried to follow my intuition and use match for branching between different LLM providers, I got confused almost immediately. To my surprise, I didn’t understand why my code (recommended by ChatGPT) worked:

def _validate_provider(settings: LLMSettings) -> None:
    match settings:
        case OpenRouterSettings():
            if not settings.api_key.strip():
                raise ValueError("OPENROUTER_API_KEY environment variable is required.")
        case OllamaSettings():
            pass
        case _:
            raise ValueError(f"Unsupported LLM provider: {settings.provider}")

So I had to go back to basics and properly learn how Python processes match statements - and why I wasn’t calling a function or creating an instance. In short, Python does not execute MyClass() to create an instance. Instead, it interprets MyClass() as a class pattern.

In the next post, I will share how I failed to run ReAct locally on a Raspberry Pi with a decent success rate, and how hard it is to find meaningful documentation for custom hardware. Stay tuned!