A developer asked me recently which open data sources he could use for real-time traffic in his application. The list is shorter than he expected, and the main reason is licensing.
The last couple of posts in this series described navigation algorithms: robot vacuums covering a living room floor, then self-driving taxis doing the same at city scale. In that second post, I mentioned that Waymo vehicles sweep lidar continuously to sense the road around them. That’s lidar as a real-time perception tool. There’s another side to it: aerial terrain surveys that produce large static datasets, published and publicly available, and queryable with SQL.
If you’ve ever installed PostGIS and opened the documentation, you’ve run into the type decision right away: geometry or geography? They look similar, they both store spatial coordinates, and they share many function names. The difference matters more than it first appears. Choosing the wrong one leads to silently incorrect distance calculations.
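To make the "silently incorrect" part concrete, here is a minimal Python sketch. It is not PostGIS code. It assumes the common case of longitude/latitude data, where a planar (geometry-style) distance comes back in degrees, while a geography-style distance comes back in meters. The function names and the spherical Earth radius are illustrative assumptions; PostGIS itself measures geography distances on a spheroid by default, so its results will differ slightly from this haversine approximation.

```python
import math

# Assumption: a spherical Earth with the mean radius, in meters.
EARTH_RADIUS_M = 6_371_008.8


def planar_distance_degrees(lon1, lat1, lon2, lat2):
    """Straight-line distance treating lon/lat as flat x/y; the result is in degrees."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def great_circle_distance_m(lon1, lat1, lon2, lat2):
    """Haversine distance on a sphere; the result is in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude at the equator and at 60 degrees north.
planar_equator = planar_distance_degrees(10.0, 0.0, 11.0, 0.0)
planar_north = planar_distance_degrees(10.0, 60.0, 11.0, 60.0)
meters_equator = great_circle_distance_m(10.0, 0.0, 11.0, 0.0)
meters_north = great_circle_distance_m(10.0, 60.0, 11.0, 60.0)

print(planar_equator, planar_north)   # identical planar values, in degrees
print(meters_equator, meters_north)   # very different ground distances, in meters
```

The planar calculation reports the same value for both pairs of points, even though the pair at 60 degrees north is roughly half as far apart on the ground. Nothing errors out, which is why the mistake is easy to miss.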
I have been writing about shortest-path algorithms and A* heuristics in the context of road networks and pgRouting. But the same graph search concepts show up in a device that millions of people own and never think twice about: the robot vacuum.
In my previous post on routing, I used Dijkstra’s algorithm without much discussion of alternatives. Dijkstra’s algorithm works for network routing, and for many problems it is the right choice. But pgRouting also ships with pgr_aStar, an implementation of the A* algorithm that can find the same shortest path while exploring fewer edges. The difference comes down to one thing: a heuristic that tells the algorithm which direction to look.
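The effect of the heuristic can be sketched in a few lines of Python. This is not pgRouting's implementation. It is a toy best-first search on an open grid with unit edge costs, and the grid size, the function names, and the Manhattan-distance heuristic are all assumptions chosen for illustration. Passing a heuristic that always returns zero makes it behave like Dijkstra's algorithm; passing the Manhattan distance makes it A*.

```python
import heapq


def grid_search(size, start, goal, heuristic):
    """Search an open size x size grid; return (path cost, nodes expanded)."""
    best_cost = {start: 0}
    # Frontier entries are (cost so far + heuristic estimate, cost so far, node).
    frontier = [(heuristic(start, goal), 0, start)]
    expanded = set()
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if node in expanded:
            continue  # stale entry; this node was already expanded more cheaply
        expanded.add(node)
        if node == goal:
            return cost, len(expanded)
        x, y = node
        for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = neighbor
            if 0 <= nx < size and 0 <= ny < size:
                new_cost = cost + 1
                if new_cost < best_cost.get(neighbor, float("inf")):
                    best_cost[neighbor] = new_cost
                    priority = new_cost + heuristic(neighbor, goal)
                    heapq.heappush(frontier, (priority, new_cost, neighbor))
    return None, len(expanded)


def zero_heuristic(node, goal):
    """No directional hint: the search behaves like Dijkstra's algorithm."""
    return 0


def manhattan_heuristic(node, goal):
    """Never overestimates on a 4-connected unit-cost grid, so A* stays optimal."""
    return abs(node[0] - goal[0]) + abs(node[1] - goal[1])


dijkstra_cost, dijkstra_expanded = grid_search(21, (10, 10), (10, 18), zero_heuristic)
astar_cost, astar_expanded = grid_search(21, (10, 10), (10, 18), manhattan_heuristic)

print("Dijkstra:", dijkstra_cost, "cost,", dijkstra_expanded, "nodes expanded")
print("A*:      ", astar_cost, "cost,", astar_expanded, "nodes expanded")
```

Both runs return the same path cost, but the zero-heuristic run spreads out in every direction from the start, while the Manhattan run is pulled toward the goal and expands far fewer nodes. How large the saving is on a real road network depends on the data and on how tight the heuristic is.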
In my previous post on pgRouting, I showed how to run shortest-path queries directly inside PostgreSQL. That approach works well when your road data is already in Postgres and your network is moderately sized. But what happens when you need live traffic data, global coverage, or routing at thousands of queries per second? That is where external routing APIs and dedicated routing engines come in.
If your application needs to answer “what is the fastest route between two points,” you might reach for an external routing API like Mapbox Directions. But if your spatial data is already stored in PostgreSQL, the Postgres extension pgRouting lets you run graph-based routing queries right where the data is.
In 2018, I wrote about using SQL functions to generate random test data in MySQL. While that approach served its purpose, the landscape of test data generation has evolved significantly. Today, I want to share my experience with using the Faker library, which has become my go-to tool for creating realistic test datasets.
I’ve had the chance to share my database expertise in a variety of venues: speaking at meetups and conferences, leading hands-on workshops, mentoring new technologists, and of course writing.
A database transformation and migration project requires careful planning and testing. I’ve found that three common changes required when transforming a SQL Server database to YugabyteDB YSQL involve syntax, performance, and stored procedures. These will get you started on your transformation project.