In my previous blog post, I used analyze after defining extended statistics. But why do we run this statement?
Something you probably already know about Postgres: The planner needs to know things about the data to come up with a good access plan. For example, it needs to decide whether to use an index or a sequential scan. It’s looking at things like row count or distinct values when it comes up with this plan.
Most database problems are fixable. A slow query gets an index. A hot table gets partitioned. A node runs out of memory and you add another. The feedback loop is tight enough that you can experiment your way to a better configuration without touching application code or migrating data.
The last couple of posts in this series described navigation algorithms: robot vacuums covering a living room floor, then self-driving taxis doing the same at city scale. In that second post, I mentioned that Waymo vehicles sweep lidar continuously to sense the road around them. That’s lidar as a real-time perception tool. There’s another side to it: aerial terrain surveys that produce large static datasets, published and publicly available, and queryable with SQL.
Having a shared vocabulary across database, software, and infrastructure teams is critical when working together to tune latency issues. I’ve been in many incident rooms where the only report is “the application is slow” and had to unwind a series of questions: What do you mean by slow? Where do you see this? What parts are slow? If everyone in the room had read Enberg’s Latency, solving these kinds of incidents would be much faster.
Database performance problems are often mysterious. Queries slow down, CPU usage spikes, or users complain about latency, but pinpointing the cause requires visibility into what your database is actually doing. pg_stat_statements is PostgreSQL’s answer to this challenge.
In my previous post on routing, I used Dijkstra’s algorithm without much discussion of alternatives. The Dijkstra algorithm works for network routing, and for many problems it is the right choice. But pgRouting also ships with pgr_aStar, an implementation of the A* algorithm that can find the same shortest path while exploring fewer edges. The difference comes down to one thing: a heuristic that tells the algorithm which direction to look.
Vacuum robots and self-driving taxi robots navigate physical spaces and avoid obstacles. That’s a super broad characterization. The more interesting question is why everything else about them is so different.
A developer asked me recently which open data sources he could use for real-time traffic in his application. The list is shorter than he expected, and the main reason is licensing.
Query optimization is a critical aspect of database performance tuning. While YugabyteDB’s YSQL API provides powerful tools for analyzing query performance through EXPLAIN plans, sometimes we need to experiment with different indexing strategies without the overhead of actually creating the indexes. This is where HypoPG comes in handy.