Robot vacuums and self-driving taxis both navigate physical spaces and avoid obstacles. That is a very broad characterization. The more interesting question is why everything else about them is so different.
If you’ve ever installed PostGIS and opened the documentation, you’ve run into the type decision right away: geometry or geography? They look similar, they both store spatial coordinates, and they share many function names. The difference matters more than it first appears. Choosing the wrong one leads to silently incorrect distance calculations.
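To make the "silently incorrect" part concrete, here is a minimal Python sketch of the two ways of measuring. It does not call PostGIS. The planar function stands in for what a Cartesian calculation on raw longitude/latitude values gives you (a number in degrees), and the haversine function is a spherical approximation of the kind of answer a geodetic calculation returns (metres). The function names and the Earth radius constant are my own choices for illustration, and PostGIS's geography type uses a spheroid by default, so its results will differ slightly from this sphere-based version.

```python
import math

# Mean Earth radius in metres (spherical approximation, an assumption of this sketch).
EARTH_RADIUS_M = 6_371_008.8


def planar_distance(lon1, lat1, lon2, lat2):
    """Cartesian distance on raw coordinates. The result is in degrees, not metres."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def great_circle_distance_m(lon1, lat1, lon2, lat2):
    """Haversine distance in metres between two lon/lat points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude, measured at the equator and at 60 degrees north.
planar_equator = planar_distance(0.0, 0.0, 1.0, 0.0)
planar_60n = planar_distance(0.0, 60.0, 1.0, 60.0)
sphere_equator = great_circle_distance_m(0.0, 0.0, 1.0, 0.0)
sphere_60n = great_circle_distance_m(0.0, 60.0, 1.0, 60.0)

print(f"planar:  equator={planar_equator:.3f} deg, 60N={planar_60n:.3f} deg")
print(f"sphere:  equator={sphere_equator:.0f} m, 60N={sphere_60n:.0f} m")
```

The planar calculation reports the same value for both pairs of points, while the spherical one shows the pair at 60° north to be roughly half as far apart as the pair at the equator. Nothing errors out in either case, which is exactly why the mistake is easy to miss.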
I have been writing about shortest-path algorithms and A* heuristics in the context of road networks and pgRouting. But the same graph search concepts show up in a device that millions of people own and never think twice about: the robot vacuum.
In my previous post on routing, I used Dijkstra’s algorithm without much discussion of alternatives. Dijkstra’s algorithm works well for network routing, and for many problems it is the right choice. But pgRouting also ships with pgr_aStar, an implementation of the A* algorithm that can find the same shortest path while exploring fewer edges. The difference comes down to one thing: a heuristic that tells the algorithm which direction to look.
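The effect of the heuristic is easy to see on a toy example. The sketch below is plain Python on a small grid, not pgRouting and not a description of how pgr_aStar is implemented internally. The grid, the `search` function, and the Manhattan-distance heuristic are all assumptions made for illustration. The same search routine behaves like Dijkstra when the heuristic is always zero and like A* when the heuristic estimates the remaining distance to the goal.

```python
import heapq

SIZE = 21          # hypothetical 21 x 21 grid of unit-cost cells
START = (10, 10)   # middle of the grid
GOAL = (18, 10)    # eight steps to the right of START


def grid_neighbors(node):
    """Yield (neighbor, cost) pairs for a 4-connected grid with unit edge costs."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < SIZE and 0 <= ny < SIZE:
            yield (nx, ny), 1.0


def manhattan(node):
    """Admissible, consistent estimate of the remaining cost on this grid."""
    return abs(GOAL[0] - node[0]) + abs(GOAL[1] - node[1])


def search(start, goal, neighbors, heuristic=lambda n: 0.0):
    """Best-first search ordered by g + h.

    With the default zero heuristic this is Dijkstra; with an admissible,
    consistent heuristic it is A*. Returns (path cost, nodes expanded).
    """
    best = {start: 0.0}
    frontier = [(heuristic(start), 0.0, start)]
    closed = set()
    expanded = 0
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node in closed:
            continue
        closed.add(node)
        expanded += 1
        if node == goal:
            return g, expanded
        for nxt, weight in neighbors(node):
            new_g = g + weight
            if new_g < best.get(nxt, float("inf")):
                best[nxt] = new_g
                heapq.heappush(frontier, (new_g + heuristic(nxt), new_g, nxt))
    return float("inf"), expanded


dijkstra_cost, dijkstra_expanded = search(START, GOAL, grid_neighbors)
astar_cost, astar_expanded = search(START, GOAL, grid_neighbors, manhattan)

print(f"Dijkstra: cost={dijkstra_cost}, expanded={dijkstra_expanded}")
print(f"A*:       cost={astar_cost}, expanded={astar_expanded}")
```

Both runs return the same path cost, but the zero-heuristic run expands nodes in every direction around the start, while the run with the heuristic heads toward the goal and expands far fewer. On a real road network the gap depends on how well the heuristic matches actual edge costs.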
In my previous post on pgRouting, I showed how to run shortest-path queries directly inside PostgreSQL. That approach works well when your road data is already in Postgres and your network is moderate-sized. But what happens when you need live traffic data, global coverage, or routing at thousands of queries per second? That is where external routing APIs and dedicated routing engines come in.
If your application needs to answer “what is the fastest route between two points,” you might reach for an external routing API like Mapbox Directions. But if your spatial data is already stored in PostgreSQL, the Postgres extension pgRouting lets you run graph-based routing queries right where the data is.
While many authors have written about database tuning, systems tuning, or code optimization, I haven't seen anyone bring these together to cover the whole stack so comprehensively, for both software engineers and database architects.
Having a shared vocabulary across database, software, and infrastructure teams is critical when working together to diagnose and fix latency issues. I’ve been in many incident rooms where the only report was “the application is slow” and I had to work through a series of questions: What do you mean by slow? Where do you see this? What parts are slow? If everyone in the room had read Enberg’s Latency, solving these kinds of incidents would be much faster.
Query optimization is a critical aspect of database performance tuning. While YugabyteDB’s YSQL API provides powerful tools for analyzing query performance through EXPLAIN plans, sometimes we need to experiment with different indexing strategies without the overhead of actually creating the indexes. This is where HypoPG comes in handy.
In 2018, I wrote about using SQL functions to generate random test data in MySQL. While that approach served its purpose, the landscape of test data generation has evolved significantly. Today, I want to share my experience using the Faker library, which has become my go-to tool for creating realistic test datasets.