Database Engineering

Using pg_stat_statements for query profiling and performance tuning

pg_stat_statements is an extension that tracks execution statistics for every normalized SQL statement.

Valerie Parham-Thompson

Database performance problems are often mysterious. Queries slow down, CPU usage spikes, or users complain about latency, but pinpointing the cause requires visibility into what your database is actually doing. pg_stat_statements is PostgreSQL’s answer to this challenge.

pg_stat_statements is an extension that tracks execution statistics for every normalized (fingerprinted) SQL statement. Instead of logging millions of nearly identical queries, it groups similar statements together (with constants replaced by placeholders) and aggregates their execution metrics under a single fingerprint. This approach provides comprehensive query-level insights with minimal performance overhead and storage cost.
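As a quick illustration, the view can be queried directly once the extension is loaded. Column names below follow PostgreSQL 13+, where total_exec_time replaced total_time:

```sql
-- Requires shared_preload_libraries = 'pg_stat_statements', then:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 5 statements by cumulative execution time
SELECT queryid,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       rows,
       query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```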

Query Optimization with HypoPG

Using HypoPG to test hypothetical indexes for query optimization in YugabyteDB

Valerie Parham-Thompson

Query optimization is a critical aspect of database performance tuning. While YugabyteDB’s YSQL API provides powerful tools for analyzing query performance through EXPLAIN plans, sometimes we need to experiment with different indexing strategies without the overhead of actually creating the indexes. This is where HypoPG comes in handy.

Understanding HypoPG

HypoPG is a PostgreSQL extension that allows you to create hypothetical indexes and see how they would affect your query plans without actually creating the indexes. This is particularly useful when:
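The basic workflow, sketched here with a placeholder table name, looks like:

```sql
CREATE EXTENSION IF NOT EXISTS hypopg;

-- Create a hypothetical index (no data is written; it exists only in this session)
SELECT * FROM hypopg_create_index(
  'CREATE INDEX ON orders (customer_id)');  -- orders is an illustrative table

-- Plain EXPLAIN (without ANALYZE) will consider the hypothetical index
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- List and clean up hypothetical indexes
SELECT * FROM hypopg_list_indexes();
SELECT hypopg_reset();
```

Because the index is never materialized, only plain EXPLAIN reflects it; EXPLAIN ANALYZE executes the query and so cannot use a hypothetical index.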

Random Data Generation: Then and Now

Modern approaches to generating test data with Python Faker

Valerie Parham-Thompson

In 2018, I wrote about using SQL functions to generate random test data in MySQL. While that approach served its purpose, the landscape of test data generation has evolved significantly. Today, I want to share my experience with using the Faker library, which has become my go-to tool for creating realistic test datasets.

The Traditional SQL Approach

The traditional approach to generating test data relied heavily on SQL functions like RAND() and string manipulation. This method worked but had limitations:

Code as Instructional Technology

Writing an interactive command-line tool as a learning tool for YugabyteDB REST APIs

Valerie Parham-Thompson

I’ve had the chance to share my database expertise in a variety of venues: speaking at meetups and conferences, leading hands-on workshops, mentoring new technologists, and of course writing.

I had been brewing a new idea for sharing content when a great opportunity landed in my lap.

The idea was: share what I know about managing a specific database product in code. Instead of writing a runbook for setting up replication, I would write code that sets up replication. The key is that it would have to be well organized, commented, and documented to be useful to learners. Making it interactive would help users understand the options and parameters as they choose commands and add flags. Even the error messages would give them insight into how it all works.
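A stripped-down sketch of the idea, using Python's `cmd` module. The endpoint and port are assumptions based on the yb-master web UI; this is not the actual tool:

```python
import cmd
import json
from urllib.request import urlopen

class YBShell(cmd.Cmd):
    """Interactive shell for exploring YugabyteDB REST endpoints (illustrative)."""
    intro = "Type 'help' to list commands."
    prompt = "yb> "

    def __init__(self, base_url="http://localhost:7000"):
        super().__init__()
        self.base_url = base_url  # yb-master web UI address (assumption)

    def do_tables(self, arg):
        """tables -- list tables via the master's /api/v1/tables endpoint."""
        try:
            with urlopen(f"{self.base_url}/api/v1/tables") as resp:
                print(json.dumps(json.load(resp), indent=2))
        except OSError as exc:
            # Even the error output is a teaching moment: say what went wrong.
            print(f"Could not reach {self.base_url}: {exc}")

    def do_quit(self, arg):
        """quit -- exit the shell."""
        return True
```

Each command's docstring doubles as its `help` text, so the interface documents itself as the learner explores it.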

Tablet Sizing Strategies

Valerie Parham-Thompson

Modern distributed databases split large tables into tablets to enable parallel processing and efficient data distribution. Finding the right tablet size impacts everything from query performance to operational overhead. Let’s explore how to approach tablet sizing systematically to achieve optimal performance.

Understanding Tablet Impact

Each tablet in your distributed database represents an independent unit of data distribution. When you create tablets, you influence system behavior at multiple levels. The database uses tablets to parallelize operations, manage resources, and handle data growth. Your tablet strategy directly affects query response times, write throughput, and overall system health.
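As a back-of-the-envelope illustration of the trade-off (the numbers and the heuristic are assumptions for discussion, not an official YugabyteDB formula):

```python
import math

def tablets_needed(table_size_gb, target_tablet_gb=10, min_per_node=1, nodes=3):
    """Rough tablet-count estimate: enough tablets to keep each near the
    target size, but at least a minimum per node so every node can
    participate in parallel work (illustrative heuristic)."""
    by_size = math.ceil(table_size_gb / target_tablet_gb)
    return max(by_size, min_per_node * nodes)

print(tablets_needed(250))  # 250 GB table at ~10 GB/tablet -> prints 25
```

Too few tablets limits parallelism; too many multiplies per-tablet overhead, which is why a size-based target is usually bounded on both sides.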

Audit Logging

YugabyteDB Friday Tech Talk on fuzzy matching for string searches

Valerie Parham-Thompson

I'm gearing up for my next YFTT presentation, coming next month. It will be on fuzzy matching, a chance to show off some neat string search features.

Meanwhile, here’s the deck for my last YFTT. The topic was audit logging.

https://info.yugabyte.com/hubfs/YFTT%20Slide%20Decks/2022_12_02_YFTT_Valerie%20Parham-Thompson_Audit%20Logging%20in%20YugabyteDB.pdf

Audit logging is just one of the security features available in YugabyteDB. You can use it to tell you the “who, what, when, where” of actions on your systems. The logs can then be sent to a log analysis system for archiving and correlation with other logs.

Fuzzy matching in YugabyteDB

Techniques for matching similar strings in YugabyteDB

Valerie Parham-Thompson

Fuzzy string matching in YugabyteDB can be done with wildcard lookups, phonetic algorithms (Soundex, Metaphone), and trigram similarity. I’ll show a demo of practical examples using artist names, highlighting the performance differences between wildcard searches and phonetic indices. A combination of indexed double metaphone and trigram methods works best for both speed and precision. Also, while YugabyteDB supports PostgreSQL-style extensions, some indexing optimizations behave differently due to its distributed storage layer.
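These techniques map to standard PostgreSQL extensions that YugabyteDB supports; a quick sketch with an illustrative table and search string:

```sql
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;  -- soundex, metaphone, dmetaphone
CREATE EXTENSION IF NOT EXISTS pg_trgm;        -- trigram similarity

-- Phonetic match: catches misspellings that sound alike
SELECT name FROM artists
WHERE dmetaphone(name) = dmetaphone('Jimmy Hendricks');

-- Trigram similarity: ranked fuzzy matches
SELECT name, similarity(name, 'Jimmy Hendricks') AS score
FROM artists
WHERE name % 'Jimmy Hendricks'
ORDER BY score DESC;

-- An expression index makes the phonetic lookup indexable
CREATE INDEX ON artists (dmetaphone(name));
```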

Audit Logging in YugabyteDB | YugabyteDB Friday Tech Talks

Yugabyte's session-level and object-level audit logging improves security and compliance

Valerie Parham-Thompson

Audit logging is essential for tracking the “who, what, when, and where” of database access and changes, supporting both security and compliance requirements. It helps organizations know who accessed or modified data, schemas, roles, or grants.

YugabyteDB offers both session-level and object-level audit logging. The system builds upon standard PostgreSQL logging (including the pgaudit extension) but adds distributed system-specific details, such as cloud, region, availability zone, and host information, to distinguish actions across nodes in a distributed environment.
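In pgaudit terms, session-level logging is driven by `pgaudit.log`, while object-level logging uses `pgaudit.role` plus grants. The role and table names below are illustrative:

```sql
CREATE EXTENSION IF NOT EXISTS pgaudit;

-- Session-level: log all DDL and role changes in this session
SET pgaudit.log = 'ddl, role';

-- Object-level: log only statements touching objects granted to the audit role
CREATE ROLE auditor NOLOGIN;
SET pgaudit.role = 'auditor';
GRANT SELECT ON payroll TO auditor;  -- SELECTs on payroll are now audited
```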

Optimizing YugabyteDB Memory Tuning for YSQL

Learn how to configure YugabyteDB memory for YSQL workloads by adjusting process ratios and key performance parameters

Valerie Parham-Thompson

Memory configuration in YugabyteDB for YSQL workloads involves partitioning among the tserver, master, and postgres processes, each with default ratios. Adjusting these ratios based on workload characteristics helps avoid out-of-memory events. Monitoring memory usage is crucial, and tuning parameters like max_connections, work_mem, and temp_file_limit can optimize both performance and resource utilization.
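For example, the YSQL-side parameters mentioned above can be adjusted per session for testing before being applied cluster-wide. The values here are illustrative starting points, not recommendations:

```sql
-- Per-session overrides for experimentation
SET work_mem = '16MB';        -- memory per sort/hash operation
SET temp_file_limit = '2GB';  -- cap on temp-file spill per session

-- Inspect current settings
SHOW max_connections;
SHOW work_mem;
```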

Best Practices for Monitoring YugabyteDB

Operational visibility is key to understanding your database across dimensions of uptime, performance, and capacity planning

Valerie Parham-Thompson

At DSS 2021, I provided a comprehensive orientation to monitoring YugabyteDB, focusing on how to interpret and leverage built-in metrics for operational visibility.

Three key dimensions of database monitoring:

  • Uptime (Alerting): Ensuring the system is running and healthy through critical alerts.
  • Performance (Trending): Tracking historical performance to detect changes and optimize queries.
  • Capacity Planning (Forecasting): Using metrics to project future infrastructure needs based on current utilization and expected growth.

Types of monitoring: