Distributed Systems

Query Optimization with HypoPG

Query Optimization with HypoPG

Using HypoPG to test hypothetical indexes for query optimization in YugabyteDB

Valerie Parham-Thompson

Query optimization is a critical aspect of database performance tuning. While YugabyteDB’s YSQL API provides powerful tools for analyzing query performance through EXPLAIN plans, sometimes we need to experiment with different indexing strategies without the overhead of actually creating the indexes. This is where HypoPG comes in handy.

Understanding HypoPG

HypoPG is a PostgreSQL extension that allows you to create hypothetical indexes and see how they would affect your query plans without actually creating the indexes. This is particularly useful when:

Finding the Right Yugabyte Api Endpoint

Finding the Right Yugabyte Api Endpoint

Tour through the YugabyteDB YBA API endpoints with a real-world example

Valerie Parham-Thompson

As YugabyteDB continues to evolve, its extensive API ecosystem offers powerful capabilities for database management and automation. However, with hundreds of API endpoints across overlapping categories, locating exactly the right API endpoint can be challenging. In this guide, I’ll walk you through several proven strategies for efficiently finding the API endpoints you need, along with real-world examples and pro tips I’ve learned from working with YugabyteDB’s API ecosystem.

Method 1: Navigating Categories in the API Documentation

The API documentation (api-docs.yugabyte.com) provides a well-organized categorical view of available endpoints. Understanding how to navigate these categories effectively will significantly speed up your API discovery process:

College Scorecard API

College Scorecard API

Mapping College Scorecard data using the API

Valerie Parham-Thompson

After I finished the YugabyteDB universe network mapping example, I started thinking about other things to map. Anything with latitude and longitude will work. College locations from my previous work on the College Scorecard data set were an obvious choice.

Previously, I had exported the data and transformed it to allow for sorting and analysis. That’s still a valid method if you want to play with the pull data set, since the API allows only page size of max 100 at a time. However, with the right filters, that might be enough, and the API is a quicker path to getting the data.

Plotly Network Map

Plotly Network Map

Using the Plotly library to work with geographic data

Valerie Parham-Thompson

I’ve added a new feature to the day 2 ops tool.

With the diagram command, you can create a map of your Yugabyte cluster overlaid on a map of the world. Here’s an example:

Yugabyte Network Map

The Plotly library is very powerful, with a lot of options. I used the network map option, which allows you to define nodes and the edges between the nodes. In this case, the nodes are an abstraction of the database instances in a YugabyteDB cluster, and the edges represent the network connections between them.

Count Large Partitions in YCQL

Count Large Partitions in YCQL

Counting large partitions in the YugabyteDB Cassandra API

Valerie Parham-Thompson

One thing that can really wreck your performance in Cassandra and the similar YugabyteDB YCQL is large partitions due to an imbalanced key. Without the robust nodetool commands of Cassandra, it can be challenging to find these large partitions in YugabyteDB.

dsbulk is a tool used for migrating data, and YugabyteDB has a fork that takes into consideration slight differences from Cassandra. That tool can be leveraged to list the top largest partitions.

Optimizing Read and Write Latency

Optimizing Read and Write Latency

Reducing latency in reads and writes in YugabyteDB

Valerie Parham-Thompson

Today’s global and distributed applications often need to serve user requests from a single data source across different regions. While providing data scaling and protection against network outages, ensuring low-latency access to data is critical for providing a seamless user experience. YugabyteDB, a distributed SQL database, is designed to handle global data workloads efficiently. In this blog post, I’ll share some techniques to optimize read and write latency in a multi-region YugabyteDB cluster.

Tablet Sizing Strategies

Tablet Sizing Strategies

Valerie Parham-Thompson

Modern distributed databases split large tables into tablets to enable parallel processing and efficient data distribution. Finding the right tablet size impacts everything from query performance to operational overhead. Let’s explore how to approach tablet sizing systematically to achieve optimal performance.

Understanding Tablet Impact

Each tablet in your distributed database represents an independent unit of data distribution. When you create tablets, you influence system behavior at multiple levels. The database uses tablets to parallelize operations, manage resources, and handle data growth. Your tablet strategy directly affects query response times, write throughput, and overall system health.

Open Source Database

Open Source Database

What does it mean to be a database engineer of multiple open-source databases?

Valerie Parham-Thompson

I’m an open-source database consultant. But which open-source database? Well, several of them.

I made the decision several years ago to take every opportunity to work with multiple databases. Why?

  1. Learning a new language teaches you more about your own. For example, taking time to understand sstables in Cassandra gave me more insight into how storage works in MySQL. Having these experiences across multiple databases forced me to question what I knew about internals, therefore deepening my understanding overall.

Database Scaling for Seasonal Increases

Database Scaling for Seasonal Increases

Understanding seasonal patterns when planning for database scaling

Valerie Parham-Thompson

Timing of seasonal demand depends on the industry, but a cyclical increase in traffic applies to all industries. Maybe your cycle is shorter than a full year. Or maybe it’s related to things like weather patterns or fashion.

You know when your business gets the most traffic.

  • Retail? Black Friday/Cyber Monday.
  • Health and fitness? New Year’s Day.
  • Pizza restaurant? Halloween and Thanksgiving, surprisingly.

You do know your business well. But do you know how to make sure your database infrastructure can keep up with traffic during these busy periods?

Leveraging time to live (TTL)

Leveraging time to live (TTL)

Using TTL to expire records in YugabyteDB and Cassandra

Valerie Parham-Thompson

In both MySQL and Postgres, expiring records after a set period of time takes a couple of timestamps and a little creativity. With Cassandra, or in this case the YugabyteDB ycql API, TTL (time to live) can be leveraged to handle this functionality, simplifying both the table definition and amount of work required by your code.

Here’s a short test to demonstrate. Reminder that you can set up a quick 3-node cluster using the code here: https://github.com/dataindataout/xtest_ansible.