Retail site selection consultants charge thousands of dollars for trade area reports. The underlying work is a drive-time polygon overlaid with census demographics, with competitor locations plotted on top. All of it is reproducible with PostGIS, free government data, and open commercial data from Overture Maps.
A geofence is a virtual boundary defined around a real-world geographic area. When a tracked object (a phone, a truck, a piece of farm equipment) crosses that boundary, something triggers: a push notification, a dispatch alert, a compliance log entry, an irrigation valve.
Most database problems are fixable. A slow query gets an index. A hot table gets partitioned. A node runs out of memory and you add another. The feedback loop is tight enough that you can experiment your way to a better configuration without touching application code or migrating data.
The last couple of posts in this series described navigation algorithms: robot vacuums covering a living room floor, then self-driving taxis doing the same at city scale. In that second post, I mentioned that Waymo vehicles sweep lidar continuously to sense the road around them. That’s lidar as a real-time perception tool. There’s another side to it: aerial terrain surveys that produce large static datasets, published and publicly available, and queryable with SQL.
Database performance problems are often mysterious. Queries slow down, CPU usage spikes, or users complain about latency, but pinpointing the cause requires visibility into what your database is actually doing. pg_stat_statements is PostgreSQL’s answer to this challenge.
In my previous post on routing, I used Dijkstra’s algorithm without much discussion of alternatives. The Dijkstra algorithm works for network routing, and for many problems it is the right choice. But pgRouting also ships with pgr_aStar, an implementation of the A* algorithm that can find the same shortest path while exploring fewer edges. The difference comes down to one thing: a heuristic that tells the algorithm which direction to look.
Vacuum robots and self-driving taxi robots navigate physical spaces and avoid obstacles. That’s a super broad characterization. The more interesting question is why everything else about them is so different.
A developer asked me recently which open data sources he could use for real-time traffic in his application. The list is shorter than he expected, and the main reason is licensing.
Query optimization is a critical aspect of database performance tuning. While YugabyteDB’s YSQL API provides powerful tools for analyzing query performance through EXPLAIN plans, sometimes we need to experiment with different indexing strategies without the overhead of actually creating the indexes. This is where HypoPG comes in handy.
In 2018, I wrote about using SQL functions to generate random test data in MySQL. While that approach served its purpose, the landscape of test data generation has evolved significantly. Today, I want to share my experience with using the Faker library, which has become my go-to tool for creating realistic test datasets.