Database Engineering

What Lidar Data Looks Like and How to Query It with DuckDB

Valerie Parham-Thompson
The last couple of posts in this series described navigation algorithms: robot vacuums covering a living room floor, then self-driving taxis doing the same at city scale. In that second post, I mentioned that Waymo vehicles sweep lidar continuously to sense the road around them. That’s lidar as a real-time perception tool. There’s another side to it: aerial terrain surveys that produce large static datasets, published and publicly available, and queryable with SQL.

Using pg_stat_statements for Query profiling and performance tuning

pg_stat_statements is an extension that tracks execution statistics for every normalized SQL statement.

Valerie Parham-Thompson
Database performance problems are often mysterious. Queries slow down, CPU usage spikes, or users complain about latency, but pinpointing the cause requires visibility into what your database is actually doing. pg_stat_statements is PostgreSQL’s answer to this challenge.

How A* Improves on Dijkstra (and When It Doesn't)

Valerie Parham-Thompson
In my previous post on routing, I used Dijkstra’s algorithm without much discussion of alternatives. The Dijkstra algorithm works for network routing, and for many problems it is the right choice. But pgRouting also ships with pgr_aStar, an implementation of the A* algorithm that can find the same shortest path while exploring fewer edges. The difference comes down to one thing: a heuristic that tells the algorithm which direction to look.

Query Optimization with HypoPG

Using HypoPG to test hypothetical indexes for query optimization in YugabyteDB

Valerie Parham-Thompson
Query optimization is a critical aspect of database performance tuning. While YugabyteDB’s YSQL API provides powerful tools for analyzing query performance through EXPLAIN plans, sometimes we need to experiment with different indexing strategies without the overhead of actually creating the indexes. This is where HypoPG comes in handy.

Random Data Generation: Then and Now

Modern approaches to generating test data with Python Faker

Valerie Parham-Thompson
In 2018, I wrote about using SQL functions to generate random test data in MySQL. While that approach served its purpose, the landscape of test data generation has evolved significantly. Today, I want to share my experience with using the Faker library, which has become my go-to tool for creating realistic test datasets.