Performance Tuning

Using pg_stat_statements for query profiling and performance tuning

Valerie Parham-Thompson

Database performance problems are often mysterious. Queries slow down, CPU usage spikes, or users complain about latency, but pinpointing the cause requires visibility into what your database is actually doing. pg_stat_statements is PostgreSQL’s answer to this challenge.

pg_stat_statements is an extension that tracks execution statistics for every normalized (fingerprinted) SQL statement. Instead of logging millions of nearly identical queries, it replaces constants with placeholders, groups statements that share the same normalized form under one fingerprint, and aggregates their execution metrics into a single entry. This approach provides comprehensive query-level insights with minimal performance overhead and storage cost.
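
A minimal sketch of a first look at the collected statistics, assuming PostgreSQL 13 or later (PostgreSQL 12 and earlier name the timing columns total_time and mean_time). Because of normalization, SELECT * FROM orders WHERE id = 42 and SELECT * FROM orders WHERE id = 7 (orders being a hypothetical table) are counted together as SELECT * FROM orders WHERE id = $1.

```sql
-- The library must be preloaded (shared_preload_libraries = 'pg_stat_statements'
-- in postgresql.conf, followed by a restart) before statistics are collected.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statement fingerprints by cumulative execution time.
SELECT queryid,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       rows,
       left(query, 80)                    AS query_text
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Uncomment to discard the collected statistics and start a fresh
-- measurement window (restricted to superusers by default).
-- SELECT pg_stat_statements_reset();
```

Sorting by total_exec_time surfaces the statements that consume the most time overall, which can include cheap queries that run very often; sorting by mean_exec_time instead surfaces the individually slowest statements.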

Query Optimization with HypoPG

Using HypoPG to test hypothetical indexes for query optimization in YugabyteDB

Valerie Parham-Thompson

Query optimization is a critical aspect of database performance tuning. While YugabyteDB’s YSQL API provides powerful tools for analyzing query performance through EXPLAIN plans, sometimes we need to experiment with different indexing strategies without the overhead of actually creating the indexes. This is where HypoPG comes in handy.
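
A minimal sketch of that workflow, assuming the hypopg extension is installed on the server and using a hypothetical orders table: compare the plan before and after hypopg_create_index(). Use plain EXPLAIN here; EXPLAIN ANALYZE actually executes the query, and a hypothetical index cannot be used for real execution.

```sql
CREATE EXTENSION IF NOT EXISTS hypopg;

-- Hypothetical table with enough rows for the planner to weigh an index.
CREATE TABLE orders (
    id          bigint PRIMARY KEY,
    customer_id bigint,
    created_at  timestamptz
);

INSERT INTO orders (id, customer_id, created_at)
SELECT g, (random() * 10000)::int, now() - g * interval '1 minute'
FROM generate_series(1, 100000) AS g;

ANALYZE orders;

-- Baseline plan without a supporting index.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Create a hypothetical index; nothing is built on disk.
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');

-- Plain EXPLAIN can now consider the hypothetical index.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Remove all hypothetical indexes created in this session.
SELECT hypopg_reset();
```

Hypothetical indexes live only in the current session's memory, so hypopg_reset() or disconnecting removes them. Whether the planner picks the hypothetical index depends on table statistics, which is why the sketch loads rows and runs ANALYZE first.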

Understanding HypoPG

HypoPG is a PostgreSQL extension that allows you to create hypothetical indexes and see how they would affect your query plans without actually creating the indexes. This is particularly useful when:

Random Data Generation: Then and Now

Modern approaches to generating test data with Python Faker

Valerie Parham-Thompson

In 2018, I wrote about using SQL functions to generate random test data in MySQL. While that approach served its purpose, the landscape of test data generation has evolved significantly. Today, I want to share my experience using the Faker library, which has become my go-to tool for creating realistic test datasets.

The Traditional SQL Approach

The traditional approach to generating test data relied heavily on SQL functions like RAND() and string manipulation. This method worked but had limitations:

Count Large Partitions in YCQL

Counting large partitions in the YugabyteDB Cassandra API

Valerie Parham-Thompson

One thing that can really wreck performance in Cassandra, and in the similar YugabyteDB YCQL API, is large partitions caused by an imbalanced key. YugabyteDB doesn’t provide Cassandra’s robust nodetool commands, so finding these large partitions can be challenging.

dsbulk is a tool for migrating data, and YugabyteDB maintains a fork that accounts for slight differences from Cassandra. That tool can be used to list the largest partitions.

Optimizing Read and Write Latency

Reducing latency in reads and writes in YugabyteDB

Valerie Parham-Thompson

Today’s global, distributed applications often need to serve user requests from a single data source across different regions. Spreading data across regions provides scaling and protection against network outages, but low-latency access to that data remains critical for a seamless user experience. YugabyteDB, a distributed SQL database, is designed to handle global data workloads efficiently. In this blog post, I’ll share some techniques to optimize read and write latency in a multi-region YugabyteDB cluster.

Tablet Sizing Strategies

Valerie Parham-Thompson

Modern distributed databases split large tables into tablets to enable parallel processing and efficient data distribution. Tablet size affects everything from query performance to operational overhead. Let’s explore how to approach tablet sizing systematically to achieve optimal performance.

Understanding Tablet Impact

Each tablet in your distributed database represents an independent unit of data distribution. When you create tablets, you influence system behavior at multiple levels. The database uses tablets to parallelize operations, manage resources, and handle data growth. Your tablet strategy directly affects query response times, write throughput, and overall system health.
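
In YugabyteDB's YSQL API, one way to set the initial tablet count of a hash-sharded table is the SPLIT INTO ... TABLETS clause, and yb_table_properties() reports the resulting count. A minimal sketch with a hypothetical page_views table and an arbitrary count of 16; the right number depends on data volume and cluster size.

```sql
-- Hypothetical hash-sharded table pre-split into 16 tablets.
CREATE TABLE page_views (
    view_id   uuid,
    viewed_at timestamptz NOT NULL,
    url       text,
    PRIMARY KEY (view_id HASH)
) SPLIT INTO 16 TABLETS;

-- Check how many tablets the table currently has.
SELECT num_tablets, num_hash_key_columns
FROM yb_table_properties('page_views'::regclass);
```

If automatic tablet splitting is enabled, tablets can be split further as the table grows, so a later check may report a higher count.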

Database transformation from SQL Server to YugabyteDB

Migrating data from SQL Server to YugabyteDB

Valerie Parham-Thompson

A database transformation and migration project takes solid planning and testing. I’ve found that three common changes required when transforming a SQL Server database to YugabyteDB YSQL are related to syntax, performance, and stored procedures. These will get you started on your transformation project.

Syntax

Transforming a schema from MS SQL to YugabyteDB requires some minor syntax changes. This is true for any cross-database transformation. The YugabyteDB YSQL API utilizes PostgreSQL syntax.
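
A minimal sketch of a few common translations, using a hypothetical customers table: IDENTITY(1,1) becomes an identity column, NVARCHAR becomes varchar, BIT becomes boolean, DATETIME becomes timestamp, GETDATE() becomes now(), and TOP n becomes LIMIT n. These are typical mappings rather than a complete list.

```sql
-- SQL Server original, shown for reference only:
--   CREATE TABLE dbo.customers (
--       customer_id INT IDENTITY(1,1) PRIMARY KEY,
--       full_name   NVARCHAR(100) NOT NULL,
--       is_active   BIT DEFAULT 1,
--       created_at  DATETIME DEFAULT GETDATE()
--   );
--   SELECT TOP 10 * FROM dbo.customers ORDER BY created_at DESC;

CREATE TABLE customers (
    customer_id integer GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    full_name   varchar(100) NOT NULL,   -- holds Unicode text in a UTF8 database
    is_active   boolean DEFAULT true,    -- BIT 1/0 becomes true/false
    created_at  timestamp DEFAULT now()  -- GETDATE() becomes now()
);

-- TOP n becomes LIMIT n at the end of the query.
SELECT * FROM customers ORDER BY created_at DESC LIMIT 10;
```

GENERATED BY DEFAULT still accepts explicit customer_id values, which is convenient when loading existing rows during a migration; GENERATED ALWAYS is stricter once the data is in place.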

Correct Partition Endpoints

Using the correct endpoints in YugabyteDB database partitioning

Valerie Parham-Thompson

I was recently reviewing a database partitioning definition in YugabyteDB (the PostgreSQL-compatible YSQL API) and realized the partition distribution might not be what the developer intended.

What is database partitioning?

Database partitioning is used to divide large tables into smaller tables (partitions). While the data is physically separate, the application can access the data logically as a single table.

This can help performance through a process called partition pruning. The database planner skips partitions that don’t hold the data. For example, if a table is partitioned on months of the year, a query on a single month only has to access the rows in the single partition for that month.
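
In PostgreSQL, and in YugabyteDB YSQL, which uses the same syntax, a range partition's FROM bound is inclusive and its TO bound is exclusive, so each month's upper bound is the next month's lower bound. A minimal sketch of monthly partitions on a hypothetical sales table, showing where boundary rows land and how a filter on the partition key lets the planner prune.

```sql
CREATE TABLE sales (
    sale_id bigint,
    sold_at timestamp NOT NULL,
    amount  numeric(10, 2)
) PARTITION BY RANGE (sold_at);

-- FROM is inclusive, TO is exclusive: the partitions share an endpoint
-- without overlapping.
CREATE TABLE sales_2024_01 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE sales_2024_02 PARTITION OF sales
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

INSERT INTO sales VALUES (1, '2024-01-31 23:59:59', 10.00),
                         (2, '2024-02-01 00:00:00', 20.00);

-- Show which partition each row landed in.
SELECT sale_id, sold_at, tableoid::regclass AS partition
FROM sales
ORDER BY sale_id;

-- Filtering on the partition key lets the planner skip sales_2024_01.
EXPLAIN SELECT * FROM sales
WHERE sold_at >= '2024-02-01' AND sold_at < '2024-03-01';
```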

Why You Need a Default Partition

Required default partitions to avoid lost data in Postgres and YugabyteDB

Valerie Parham-Thompson

Postgres and YugabyteDB allow you to define partitions of parent tables. Partitions are useful in at least two ways:

  1. You can take advantage of partition pruning. The database doesn’t need to look at partitions it knows won’t meet the parameters of the query.
  2. You can easily archive data by detaching and/or dropping partitions instead of running expensive delete queries.

Here’s one gotcha I ran into recently. What happens if you insert a row into a partitioned table, but there’s no partition for it? The insert fails with an error – see below for a reproduction of this scenario.
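
A minimal reproduction sketch of the failure and the fix, using a hypothetical events table partitioned by year. The DO block catches the error so the rest of the script keeps running.

```sql
CREATE TABLE events (
    event_id   bigint,
    event_date date NOT NULL
) PARTITION BY RANGE (event_date);

CREATE TABLE events_2024 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- No partition covers 2025, so this insert fails with an error such as
--   no partition of relation "events" found for row
DO $$
BEGIN
    INSERT INTO events VALUES (1, '2025-03-15');
EXCEPTION WHEN OTHERS THEN
    RAISE NOTICE 'insert failed: %', SQLERRM;
END $$;

-- A default partition catches rows that match no other partition.
CREATE TABLE events_default PARTITION OF events DEFAULT;

-- The same insert now succeeds and lands in events_default.
INSERT INTO events VALUES (1, '2025-03-15');
```

One trade-off: once the default partition holds 2025 rows, creating a 2025 partition fails until those rows are moved out of the default partition.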

Generate Random Data

Generating random data for testing in YugabyteDB

Valerie Parham-Thompson

I had to create a 10 million row table for testing recently, and put together a query to generate random data for it.

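```sql
-- Assumed table definition; the column types are illustrative.
CREATE TABLE my_table (
    id         int,
    mydatetime timestamp,
    string1    text,
    string2    text
);

INSERT INTO my_table (id, mydatetime, string1, string2)
SELECT
    -- random integer between 10 and 80
    (random() * 70 + 10)::int,
    -- random timestamp within the first 365 days of 2024
    TIMESTAMP '2024-01-01 00:00:00.000000' + interval '1 millisecond' * (random() * 86400 * 1000 * 365),
    -- floor() gives each of the four elements an equal chance;
    -- rounding with ::int alone would under-pick the first and last
    (array['alligator','bear','cat','dog'])[floor(random() * 4 + 1)::int],
    -- random 10-character hex string
    substr(md5(random()::text), 1, 10)
FROM generate_series(1, 10);  -- 10 rows for a quick check; use 10000000 for the full table
```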

The id field is just a random integer in this example, but you’d probably use an identity column.
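
Following that note, a minimal sketch of the same insert against a hypothetical table whose id comes from an identity column, so id is left out of the column list.

```sql
-- Hypothetical table: the identity column generates id values itself.
CREATE TABLE my_table_identity (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    mydatetime timestamp,
    string1    text,
    string2    text
);

-- id is omitted, so each row receives the next identity value.
INSERT INTO my_table_identity (mydatetime, string1, string2)
SELECT
    TIMESTAMP '2024-01-01 00:00:00' + interval '1 millisecond' * (random() * 86400 * 1000 * 365),
    (array['alligator','bear','cat','dog'])[floor(random() * 4 + 1)::int],
    substr(md5(random()::text), 1, 10)
FROM generate_series(1, 10);
```

GENERATED ALWAYS rejects explicit id values unless the insert uses OVERRIDING SYSTEM VALUE, which keeps test data from colliding with generated keys.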