Data Engineering

Query Optimization with HypoPG

Query Optimization with HypoPG

Using HypoPG to test hypothetical indexes for query optimization in YugabyteDB

Valerie Parham-Thompson
Query optimization is a critical aspect of database performance tuning. While YugabyteDB’s YSQL API provides powerful tools for analyzing query performance through EXPLAIN plans, sometimes we need to experiment with different indexing strategies without the overhead of actually creating the indexes. This is where HypoPG comes in handy.
Random Data Generation: Then and Now

Random Data Generation: Then and Now

Modern approaches to generating test data with Python Faker

Valerie Parham-Thompson
In 2018, I wrote about using SQL functions to generate random test data in MySQL. While that approach served its purpose, the landscape of test data generation has evolved significantly. Today, I want to share my experience with using the Faker library, which has become my go-to tool for creating realistic test datasets.
College Scorecard Application

College Scorecard Application

Building an application to serve College Scorecard open data

Valerie Parham-Thompson
Still working on the College Scorecard dataset. Previously I explored the dataset in a real-world application, talked about how to clean the data, and worked with the data API.
Handling Reserved Keywords in DSBulk for Seamless Data Migration

Handling Reserved Keywords in DSBulk for Seamless Data Migration

How to handle reserved keywords using Datastax DSBulk in YugabyteDB migration

Valerie Parham-Thompson
Migrating to YugabyteDB offers significant advantages in terms of high availability, global distribution, and horizontal scalability—features essential for managing modern database workloads. However, data migration can be a complex process, particularly when transforming your schema definition. Differences in datatype support, query syntax, and core features across systems can complicate the transformation.
College Scorecard API

College Scorecard API

Mapping College Scorecard data using the API

Valerie Parham-Thompson
After I finished the YugabyteDB universe network mapping example, I started thinking about other things to map. Anything with latitude and longitude will work. College locations from my previous work on the College Scorecard data set were an obvious choice.
Code as Instructional Technology

Code as Instructional Technology

Writing an interactive command-line tool as a learning tool for YugabyteDB REST APIs

Valerie Parham-Thompson
I’ve had the chance to share my database expertise in a variety of venues: speaking at meetups and conferences, leading hands-on workshops, mentoring new technologists, and of course writing.
Count Large Partitions in YCQL

Count Large Partitions in YCQL

Counting large partitions in the YugabyteDB Cassandra API

Valerie Parham-Thompson
One thing that can really wreck your performance in Cassandra and the similar YugabyteDB YCQL is large partitions due to an imbalanced key. Without the robust nodetool commands of Cassandra, it can be challenging to find these large partitions in YugabyteDB.
Database transformation from SQL Server to YugabyteDB

Database transformation from SQL Server to YugabyteDB

Migrating data from SQL Server to YugabyteDB

Valerie Parham-Thompson
A database transformation and migration project takes solid planning and testing. I’ve found that three common changes required when transforming a SQL Server database to YugabyteDB YSQL are related to syntax, performance, and stored procedures. These will get you started on your transformation project.
Correct Partition Endpoints

Correct Partition Endpoints

Using the correct endpoints in YugabyteDB database partitioning

Valerie Parham-Thompson
I was recently reviewing a database partitioning definition in YugabyteDB (the postgres “ysql” API), and realized the partition distribution might not be what the developer intended.