Robot vacuums and self-driving taxis both navigate physical spaces and avoid obstacles. That’s a very broad characterization, though. The more interesting question is why everything else about them is so different.
I have been writing about shortest-path algorithms and A* heuristics in the context of road networks and pgRouting. But the same graph search concepts show up in a device that millions of people own and never think twice about: the robot vacuum.
Today’s global, distributed applications often need to serve user requests in different regions from a single data source. Beyond scaling data and protecting against network outages, they need low-latency access to that data to provide a seamless user experience. YugabyteDB, a distributed SQL database, is designed to handle global data workloads efficiently. In this blog post, I’ll share some techniques to optimize read and write latency in a multi-region YugabyteDB cluster.
Modern distributed databases split large tables into tablets to enable parallel processing and efficient data distribution. Tablet size affects everything from query performance to operational overhead. Let’s explore how to approach tablet sizing systematically to achieve optimal performance.
In both MySQL and Postgres, expiring records after a set period of time takes a couple of timestamps and a little creativity. With Cassandra, or in this case the YugabyteDB YCQL API, TTL (time to live) can handle this for you, simplifying both the table definition and the amount of work your code has to do.
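To make the contrast concrete, here is a minimal sketch of the bookkeeping an application takes on when the database has no TTL: it stores a creation timestamp and filters out stale rows itself. The row layout and the helper names `is_expired` and `live_rows` are hypothetical, chosen only for illustration; they are not part of any YugabyteDB, MySQL, or Postgres API. With YCQL, a clause such as `USING TTL 60` on the insert would let the database do this expiry instead.

```python
from datetime import datetime, timedelta, timezone


def is_expired(created_at, ttl_seconds, now):
    """Return True once ttl_seconds have elapsed since created_at."""
    return now >= created_at + timedelta(seconds=ttl_seconds)


def live_rows(rows, ttl_seconds, now):
    """Keep only unexpired rows, as a query filter would have to without TTL."""
    return [r for r in rows if not is_expired(r["created_at"], ttl_seconds, now)]


# Hypothetical in-memory rows standing in for a table.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
rows = [
    {"id": 1, "created_at": now - timedelta(seconds=30)},  # still live
    {"id": 2, "created_at": now - timedelta(seconds=90)},  # past a 60s TTL
]

print([r["id"] for r in live_rows(rows, ttl_seconds=60, now=now)])
```

Note that this sketch only hides expired rows from reads. Actually deleting them would take a separate cleanup job, which is part of the work a built-in TTL removes.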
I’ve been experimenting with processing data with Pandas this week, specifically historical NOAA weather data, and storing it in a local YugabyteDB cluster. This open data set contains max/min/precipitation readings going back as far as 1750 (not all data points are available for all years or locations). It’s available here: https://www.ncei.noaa.gov/metadata/geoportal/rest/metadata/item/gov.noaa.ncdc:C00861/html
I was recently setting up a demo to show off query logging features. Two common extensions, pg_stat_statements and pg_stat_monitor, store their data locally on each node. In a distributed database, it is helpful to combine the query runtimes from all nodes.
A distributed database is designed to withstand outages to a large degree. Even so, you should maintain backups for “oops” scenarios like a dropped table.
Quick post to share my presentation last week at the YugabyteDB Friday Tech Talk. It was on fuzzy matching, and more generally string searches. Got to nerd out on two of my favorite topics: words (broadly, linguistics and specifically, names) and databases. Check it out!