Automation

Random Data Generation: Then and Now

Modern approaches to generating test data with Python Faker

Valerie Parham-Thompson

In 2018, I wrote about using SQL functions to generate random test data in MySQL. While that approach served its purpose, the landscape of test data generation has evolved significantly. Today, I want to share my experience using the Faker library, which has become my go-to tool for creating realistic test datasets.

The Traditional SQL Approach

The traditional approach to generating test data relied heavily on SQL functions like RAND() and string manipulation. This method worked but had limitations:

Finding the Right Yugabyte API Endpoint

A tour of the YugabyteDB Anywhere (YBA) API endpoints, with a real-world example

Valerie Parham-Thompson

As YugabyteDB continues to evolve, its extensive API ecosystem offers powerful capabilities for database management and automation. However, with hundreds of API endpoints across overlapping categories, locating exactly the right API endpoint can be challenging. In this guide, I’ll walk you through several proven strategies for efficiently finding the API endpoints you need, along with real-world examples and pro tips I’ve learned from working with YugabyteDB’s API ecosystem.
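Another way to narrow hundreds of endpoints is to search the API definition directly. The sketch below assumes you have a local copy of an OpenAPI (Swagger) JSON document for the API. It searches paths, summaries and operation IDs for a keyword and groups the hits by tag, which roughly corresponds to the documentation's categories. The sample spec is a made-up stand-in, and its paths are not taken from the real YBA definition.

```python
import json
from collections import defaultdict

# Tiny stand-in for an OpenAPI/Swagger JSON document. The paths, tags, and
# summaries are hypothetical; load a real copy of the spec instead.
SAMPLE_SPEC = json.loads("""
{
  "paths": {
    "/api/v1/example/universes": {
      "get": {"tags": ["Universe management"], "summary": "List universes"}
    },
    "/api/v1/example/backups": {
      "get": {"tags": ["Backups"], "summary": "List backups"},
      "post": {"tags": ["Backups"], "summary": "Create a backup"}
    }
  }
}
""")

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def find_endpoints(spec: dict, keyword: str) -> dict[str, list[str]]:
    """Return 'METHOD path: summary' strings that mention `keyword`, grouped by tag."""
    keyword = keyword.lower()
    hits = defaultdict(list)
    for path, operations in spec.get("paths", {}).items():
        for method, op in operations.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            text = " ".join([path, op.get("summary", ""), op.get("operationId", "")])
            if keyword in text.lower():
                for tag in op.get("tags", ["untagged"]):
                    hits[tag].append(f"{method.upper()} {path}: {op.get('summary', '')}")
    return dict(hits)


for tag, endpoints in find_endpoints(SAMPLE_SPEC, "backup").items():
    print(tag)
    for line in endpoints:
        print("   ", line)
```

With a real spec, replace `SAMPLE_SPEC` with the result of `json.load` on the downloaded file. Grouping by tag keeps related operations together, mirroring the categorical view in the documentation.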

Method 1: Navigating Categories in the API Documentation

The API documentation (api-docs.yugabyte.com) provides a well-organized categorical view of available endpoints. Understanding how to navigate these categories effectively will significantly speed up your API discovery process:

How do you keep up with technology?

Using Obsidian for close reading and annotation of technical articles

Valerie Parham-Thompson

One of my favorite interview questions is, “How do you keep up with technology?” The answer shows a lot about a candidate. Do they use downtime at work to read recent blog posts? Are they asking for new assignments to stretch their skill sets? Are they connected to the thought leaders in the space?

But we have to face the fact that there are more new technologies, and more news about technology, than anyone could possibly read about in a day. In my own small slice of technology, open source databases, a new major database seems to appear every six months. That’s not even counting feature updates, security bugs, and the broad ecosystem around databases. My inbox is overflowing with invitations to review, to attend conferences, to notice the hot new feature in some top technology, and so on.

Processing Data with Pandas

Processing weather data from NOAA in YugabyteDB with the Pandas library

Valerie Parham-Thompson

This week I’ve been experimenting with processing data in Pandas, specifically historical NOAA weather data, and storing it in a local YugabyteDB cluster. This open dataset contains maximum/minimum temperature and precipitation going back as far as 1750 (not all data points are available for every year or location). It’s available here: https://www.ncei.noaa.gov/metadata/geoportal/rest/metadata/item/gov.noaa.ncdc:C00861/html

I leveraged my existing demo framework to provision a local YugabyteDB cluster, and then used Pandas to import data from txt and csv files. The txt lookup files were countries, states, stations, and inventory. The csv files were available in different formats. The code I’ve linked below imports all weather data for a single year.
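The sketch below shows how a single year of observations might be reshaped with Pandas before loading. It is not the linked code. It assumes the "by year" CSV layout with no header row and eight columns: station ID, date (YYYYMMDD), element, value, three flag columns and observation time. It also assumes that TMAX/TMIN/PRCP values are stored in tenths of a unit. The station ID and readings are made up.

```python
import io

import pandas as pd

# Made-up rows for a hypothetical station, in the assumed by-year CSV layout:
# ID, date (YYYYMMDD), element, value, M/Q/S flags, observation time.
SAMPLE_CSV = """\
XXX00000001,20230101,TMAX,139,,,W,
XXX00000001,20230101,TMIN,83,,,W,
XXX00000001,20230101,PRCP,5,,,W,2400
XXX00000001,20230102,TMAX,128,,,W,
XXX00000001,20230102,TMIN,78,,,W,
XXX00000001,20230102,SNOW,0,,,W,
"""

COLUMNS = ["station_id", "date", "element", "value",
           "m_flag", "q_flag", "s_flag", "obs_time"]
ELEMENTS = ["TMAX", "TMIN", "PRCP"]


def load_daily(source) -> pd.DataFrame:
    """Read raw observations and pivot TMAX/TMIN/PRCP into one row per station-day."""
    raw = pd.read_csv(source, header=None, names=COLUMNS,
                      dtype={"station_id": str, "date": str, "obs_time": str})
    # Keep only the max/min temperature and precipitation readings.
    raw = raw[raw["element"].isin(ELEMENTS)].copy()
    raw["date"] = pd.to_datetime(raw["date"], format="%Y%m%d")
    # One row per (station, date), one column per element; missing -> NaN.
    wide = raw.pivot_table(index=["station_id", "date"], columns="element",
                           values="value", aggfunc="first").reset_index()
    wide.columns.name = None
    # Values are assumed to be in tenths (of deg C or mm); rescale them.
    for col in ELEMENTS:
        if col in wide.columns:
            wide[col] = wide[col] / 10.0
    return wide


daily = load_daily(io.StringIO(SAMPLE_CSV))
print(daily)
```

To land the frame in YugabyteDB, one option is `DataFrame.to_sql` with a SQLAlchemy engine pointed at the YSQL port (5433 by default). The post's own import code may take a different route.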

Foreign Data Wrappers

Using foreign data wrappers to combine metrics across a distributed database cluster

Valerie Parham-Thompson

I was recently setting up a demo to show off query logging features. Two common extensions, pg_stat_statements and pg_stat_monitor, store their data locally on each node. In a distributed database, it is helpful to combine the query runtimes from all nodes.

YugabyteDB supports foreign data wrappers, so I decided to use this feature to combine query statistics from each of my three test nodes.
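One way to wire this up is to generate the DDL for every node. The sketch below assumes postgres_fdw as the wrapper and YSQL defaults (port 5433, database and user `yugabyte`). It also assumes that `pg_stat_monitor` is a view in each node's `public` schema with `queryid`, `query` and `calls` columns; column names vary across pg_stat_monitor versions. The node names, loopback addresses and the view name `all_nodes_stat_monitor` are hypothetical. The script only builds SQL text to run on one node.

```python
# Hypothetical node list for a three-node local test cluster.
NODES = {
    "node1": "127.0.0.1",
    "node2": "127.0.0.2",
    "node3": "127.0.0.3",
}


def fdw_statements(nodes: dict[str, str], port: int = 5433,
                   dbname: str = "yugabyte", user: str = "yugabyte") -> list[str]:
    """Generate postgres_fdw DDL that exposes each node's pg_stat_monitor view
    locally and unions them into one view tagged with the node name."""
    stmts = ["CREATE EXTENSION IF NOT EXISTS postgres_fdw;"]
    selects = []
    for name, host in nodes.items():
        schema = f"{name}_stats"
        stmts += [
            f"CREATE SERVER IF NOT EXISTS {name} FOREIGN DATA WRAPPER postgres_fdw "
            f"OPTIONS (host '{host}', port '{port}', dbname '{dbname}');",
            # No password here; add one to OPTIONS if your setup requires it.
            f"CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER SERVER {name} "
            f"OPTIONS (user '{user}');",
            f"CREATE SCHEMA IF NOT EXISTS {schema};",
            # Creates a local foreign table mirroring the remote view.
            f"IMPORT FOREIGN SCHEMA public LIMIT TO (pg_stat_monitor) "
            f"FROM SERVER {name} INTO {schema};",
        ]
        selects.append(
            f"SELECT '{name}' AS node, queryid, query, calls "
            f"FROM {schema}.pg_stat_monitor"
        )
    # One view that stacks every node's statistics, labeled by node.
    stmts.append("CREATE OR REPLACE VIEW all_nodes_stat_monitor AS\n"
                 + "\nUNION ALL\n".join(selects) + ";")
    return stmts


print("\n".join(fdw_statements(NODES)))
```

The values are interpolated with f-strings, which is acceptable only because they come from a trusted, hand-written config. After running the output, a query such as `SELECT * FROM all_nodes_stat_monitor ORDER BY calls DESC` would compare query activity across nodes.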

The libraries for the pg_stat_monitor extension are already installed, so the extension just needs to be created:

Provision Ansible Postgres on Mac

Using Ansible to create a local testing environment for Postgres on Mac

Valerie Parham-Thompson

I added a new database to my demo platform: Postgres. This code uses Ansible to provision Postgres on a Mac for demos or simple functional testing, and it extends previous work I shared: https://valerieparhamthompson.com/posts/string-search/.

The script installs Postgres via Homebrew on an M1 Mac and starts it, then creates the database, user, and other objects needed for the demo. Finally, it populates the database using my “million table” SQL.
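The steps above map onto a short list of Ansible tasks. To stay in one language, this sketch builds the play as Python data and prints it as JSON. It is not the post's playbook. The Homebrew formula version, the database and user names, the `demo_password` variable and the SQL file path are all hypothetical. The sketch also assumes that the Homebrew superuser role is your macOS login user and that `psql` is on `PATH`, since the restore step relies on it.

```python
import json

# Hypothetical values: the post does not name the Postgres version,
# database/user names, or the path of the "million table" SQL file.
PG_FORMULA = "postgresql@16"
DEMO_DB = "demo"
DEMO_USER = "demo_user"
SEED_SQL = "files/million_table.sql"


def build_playbook() -> list[dict]:
    """Return a single-play playbook: install, start, create DB/user, load SQL."""
    # Homebrew installs usually make the macOS user the superuser role.
    login = {"login_user": "{{ lookup('env', 'USER') }}"}
    tasks = [
        {"name": "Install Postgres with Homebrew",
         "community.general.homebrew": {"name": PG_FORMULA, "state": "present"}},
        {"name": "Start Postgres as a Homebrew service",
         "ansible.builtin.command": f"brew services start {PG_FORMULA}"},
        {"name": "Create the demo database",
         "community.postgresql.postgresql_db": {"name": DEMO_DB, **login}},
        {"name": "Create the demo user",
         "community.postgresql.postgresql_user": {
             "name": DEMO_USER, "password": "{{ demo_password }}", **login}},
        {"name": "Load the seed SQL into the demo database",
         "community.postgresql.postgresql_db": {
             "name": DEMO_DB, "state": "restore", "target": SEED_SQL, **login}},
    ]
    return [{"name": "Provision a local Postgres demo on macOS",
             "hosts": "localhost", "connection": "local",
             "gather_facts": False, "tasks": tasks}]


print(json.dumps(build_playbook(), indent=2))
```

In practice you would write this directly as YAML. The Python form just makes the task order explicit. The community.postgresql modules also need a Postgres driver such as psycopg2 available to Ansible.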

Most of this uses the community.postgresql Ansible collection, documented here: https://docs.ansible.com/ansible/latest/collections/community/postgresql/index.html