Distributed-Systems

Plotly Network Map

Using the Plotly library to work with geographic data

Valerie Parham-Thompson

I’ve added a new feature to the day 2 ops tool.

With the diagram command, you can create a map of your Yugabyte cluster overlaid on a map of the world. Here’s an example:

[Image: Yugabyte Network Map]

The Plotly library is very powerful, with a lot of options. I used the network map option, which allows you to define nodes and the edges between the nodes. In this case, the nodes are an abstraction of the database instances in a YugabyteDB cluster, and the edges represent the network connections between them.
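As a rough illustration of the node-and-edge idea, here is a minimal sketch that builds a Plotly figure specification as a plain dictionary, using `scattergeo` traces for the edges and the nodes. The node names, coordinates, and the `build_network_map` helper are hypothetical and are not taken from the day 2 ops tool.

```python
# Hypothetical nodes: names and coordinates are illustrative only.
NODES = {
    "node-us-east": {"lat": 39.0, "lon": -77.5},
    "node-eu-west": {"lat": 53.3, "lon": -6.3},
    "node-ap-south": {"lat": 19.1, "lon": 72.9},
}

# Each edge is a pair of node names representing a network connection.
EDGES = [
    ("node-us-east", "node-eu-west"),
    ("node-eu-west", "node-ap-south"),
    ("node-ap-south", "node-us-east"),
]


def build_network_map(nodes, edges, title="Cluster network map"):
    """Return a Plotly figure spec (a plain dict) with an edge trace and a node trace."""
    edge_lon, edge_lat = [], []
    for start, end in edges:
        # A None entry breaks the line, so each edge is drawn as its own segment.
        edge_lon += [nodes[start]["lon"], nodes[end]["lon"], None]
        edge_lat += [nodes[start]["lat"], nodes[end]["lat"], None]

    edge_trace = {
        "type": "scattergeo",
        "mode": "lines",
        "lon": edge_lon,
        "lat": edge_lat,
        "line": {"width": 1},
        "hoverinfo": "skip",
    }
    node_trace = {
        "type": "scattergeo",
        "mode": "markers+text",
        "lon": [n["lon"] for n in nodes.values()],
        "lat": [n["lat"] for n in nodes.values()],
        "text": list(nodes),
        "textposition": "top center",
        "marker": {"size": 10},
    }
    layout = {
        "title": {"text": title},
        "showlegend": False,
        "geo": {"projection": {"type": "natural earth"}, "showcountries": True},
    }
    return {"data": [edge_trace, node_trace], "layout": layout}


figure_spec = build_network_map(NODES, EDGES)
```

If Plotly is installed, the dictionary can be rendered with `plotly.graph_objects.Figure(figure_spec).show()`.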

Optimizing Read and Write Latency

Reducing latency in reads and writes in YugabyteDB

Valerie Parham-Thompson

Today’s global and distributed applications often need to serve user requests from a single data source across different regions. Beyond scaling data and protecting against network outages, these applications need low-latency access to data to provide a seamless user experience. YugabyteDB, a distributed SQL database, is designed to handle global data workloads efficiently. In this blog post, I’ll share some techniques to optimize read and write latency in a multi-region YugabyteDB cluster.

Tablet Sizing Strategies

Valerie Parham-Thompson

Modern distributed databases split large tables into tablets to enable parallel processing and efficient data distribution. Finding the right tablet size impacts everything from query performance to operational overhead. Let’s explore how to approach tablet sizing systematically to achieve optimal performance.

Understanding Tablet Impact

Each tablet in your distributed database represents an independent unit of data distribution. When you create tablets, you influence system behavior at multiple levels. The database uses tablets to parallelize operations, manage resources, and handle data growth. Your tablet strategy directly affects query response times, write throughput, and overall system health.
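To make the trade-off concrete, here is a back-of-the-envelope sketch of how a target tablet size translates into tablet counts and per-node load. The `estimate_tablets` helper, the replication factor, and every number in the example are hypothetical and chosen only for illustration; they are not sizing recommendations.

```python
import math


def estimate_tablets(table_size_gb, target_tablet_size_gb, nodes, replication_factor=3):
    """Estimate tablet and replica counts for one table (illustrative arithmetic only)."""
    # At least one tablet, otherwise enough tablets to stay at or under the target size.
    tablets = max(1, math.ceil(table_size_gb / target_tablet_size_gb))
    # Every tablet is stored replication_factor times across the cluster.
    replicas = tablets * replication_factor
    # Replicas spread evenly over the nodes, rounded up.
    replicas_per_node = math.ceil(replicas / nodes)
    return {
        "tablets": tablets,
        "replicas": replicas,
        "replicas_per_node": replicas_per_node,
    }


# Hypothetical inputs: a 500 GB table, 10 GB target tablets, 6 nodes.
estimate = estimate_tablets(table_size_gb=500, target_tablet_size_gb=10, nodes=6)
print(estimate)
```

Halving the target tablet size doubles the number of tablets each node has to manage, which is the kind of operational overhead that has to be weighed against the extra parallelism.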

Leveraging time to live (TTL)

Using TTL to expire records in YugabyteDB and Cassandra

Valerie Parham-Thompson

In both MySQL and Postgres, expiring records after a set period of time takes a couple of timestamps and a little creativity. With Cassandra, or in this case the YugabyteDB YCQL API, TTL (time to live) handles this for you, simplifying both the table definition and the amount of work your code has to do.

Here’s a short test to demonstrate. Reminder that you can set up a quick 3-node cluster using the code here: https://github.com/dataindataout/xtest_ansible.

Processing Data with Pandas

Processing weather data from NOAA in YugabyteDB with the Pandas library

Valerie Parham-Thompson

I’ve been experimenting with processing data with Pandas this week, specifically historical NOAA weather data, and storing it in a local YugabyteDB cluster. This open data set contains max/min/precipitation for years back to 1750 (not all data points are available for all years or locations). It’s available here: https://www.ncei.noaa.gov/metadata/geoportal/rest/metadata/item/gov.noaa.ncdc:C00861/html

I leveraged my existing demo framework to provision a local YugabyteDB cluster, and then used Pandas to import data from txt and csv files. The txt lookup files were countries, states, stations, and inventory. The csv files were available in different formats. The code I’ve linked below imports all weather data for a single year.
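As a minimal sketch of the Pandas side of this, the snippet below reads a headerless CSV of daily observations and pivots it so that each station and date gets one row. The column layout (station, date as YYYYMMDD, element, value) is an assumption for illustration, the sample rows are made up, and this is not the linked import code.

```python
import io

import pandas as pd

# Made-up sample rows; the four-column layout is an assumption, not the official spec.
SAMPLE_CSV = """\
STATION_A,20200101,TMAX,105
STATION_A,20200101,TMIN,12
STATION_A,20200101,PRCP,0
STATION_B,20200101,TMAX,221
"""

COLUMNS = ["station_id", "obs_date", "element", "value"]


def load_observations(source):
    """Read headerless observation rows into a long-format DataFrame."""
    df = pd.read_csv(
        source,
        header=None,
        names=COLUMNS,
        dtype={"station_id": str, "obs_date": str},
    )
    df["obs_date"] = pd.to_datetime(df["obs_date"], format="%Y%m%d")
    return df


def to_wide(df):
    """Pivot to one row per station and date, with one column per element."""
    wide = df.pivot_table(
        index=["station_id", "obs_date"],
        columns="element",
        values="value",
        aggfunc="first",
    )
    return wide.reset_index()


observations = load_observations(io.StringIO(SAMPLE_CSV))
daily = to_wide(observations)
print(daily)
```

Elements that a station did not report show up as missing values. From here, `DataFrame.to_sql` with a PostgreSQL-compatible SQLAlchemy connection is one way to write the result into a YSQL table.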

Foreign Data Wrappers

Using foreign data wrappers to combine metrics across a distributed database cluster

Valerie Parham-Thompson

I was recently setting up a demo to show off query logging features. Two common extensions, pg_stat_statements and pg_stat_monitor, store data locally. In the case of a distributed database, it is helpful to combine the query runtimes on all nodes.

YugabyteDB supports foreign data wrappers, so I decided to use this feature to combine query statistics from each of my three test nodes.
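To sketch the general shape of this approach, the helper below generates `postgres_fdw` statements that define one foreign server and one foreign table per node, then a view that combines them with `UNION ALL`. The server, table, and view names, the schema, the connection options, and the choice of the `query` and `calls` columns are all assumptions for illustration; they are not the exact setup from my demo.

```python
def fdw_statements(hosts, source_view="pg_stat_monitor", schema="public",
                   dbname="yugabyte", port=5433, user="yugabyte"):
    """Generate SQL that exposes each node's statistics view through postgres_fdw."""
    statements = ["CREATE EXTENSION IF NOT EXISTS postgres_fdw;"]
    selects = []
    for index, host in enumerate(hosts, start=1):
        server = f"node{index}"            # hypothetical server name
        table = f"{source_view}_{server}"  # hypothetical foreign table name
        statements.append(
            f"CREATE SERVER {server} FOREIGN DATA WRAPPER postgres_fdw "
            f"OPTIONS (host '{host}', port '{port}', dbname '{dbname}');"
        )
        # A password option may also be required, depending on authentication.
        statements.append(
            f"CREATE USER MAPPING FOR CURRENT_USER SERVER {server} "
            f"OPTIONS (user '{user}');"
        )
        # Only two columns are mapped here; add more as needed.
        statements.append(
            f"CREATE FOREIGN TABLE {table} (query text, calls bigint) "
            f"SERVER {server} "
            f"OPTIONS (schema_name '{schema}', table_name '{source_view}');"
        )
        selects.append(f"SELECT '{host}' AS node, query, calls FROM {table}")
    # One view that stacks the per-node rows, tagged with the node they came from.
    statements.append(
        "CREATE VIEW cluster_query_stats AS\n" + "\nUNION ALL\n".join(selects) + ";"
    )
    return statements


# Three local test nodes on loopback addresses.
NODES = ["127.0.0.1", "127.0.0.2", "127.0.0.3"]
statements = fdw_statements(NODES)
for statement in statements:
    print(statement)
```

Querying the resulting view, for example summing `calls` grouped by `query`, gives a cluster-wide picture instead of a per-node one.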

The libraries for the pg_stat_monitor extension are already installed, so the extension just needs to be created:

YugabyteDB Snapshots

Taking snapshots and backups in YugabyteDB

Valerie Parham-Thompson

A distributed database is designed to withstand outages to a good degree. However, you should also maintain backups in case of “oops” scenarios like a dropped table.

The yb-admin tool can be used to manage snapshots. Here’s a brief walkthrough.

Some caveats about using snapshots: they are stored on the same server, so this method doesn’t protect against file system corruption. Also, a snapshot captures only the data, not the schema.

If you don’t already have a test environment, check out a quick test setup here: https://github.com/dataindataout/xtest_ansible.

String Search

Link to YFTT fuzzy matching for string searches

Valerie Parham-Thompson

Quick post to share my presentation last week at the YugabyteDB Friday Tech Talk. It was on fuzzy matching, and more generally string searches. Got to nerd out on two of my favorite topics: words (broadly, linguistics and specifically, names) and databases. Check it out!

(Code for scenarios in my repo, here: https://github.com/dataindataout/xtest_ansible/tree/main/scenarios/fuzzy)

https://www.youtube.com/watch?v=vmHRnR1nFdQ

Replication scenarios

Using Ansible to model async replication scenarios for YugabyteDB xCluster

Valerie Parham-Thompson

I recently put together a platform to demo a handful of scenarios related to YugabyteDB cross-cluster replication.

The code is here: https://github.com/dataindataout/xtest_ansible

This works on a Mac (Apple M1) and should work on later Macs and on Linux. I’m unsure whether it will work on Windows.

You will need a copy of YugabyteDB (2.16 or 2.17, depending on which branch of the demo code you use). Note that xCluster functionality improves greatly in 2.17, so test at that version or beyond if you can.

Development Environment for YugabyteDB on Mac M1

Setting up a development environment for YugabyteDB on a Mac M1

Valerie Parham-Thompson

Here’s a very quick way to set up YugabyteDB on your Mac for functional testing. It assumes you already have Homebrew installed.

brew tap yugabyte/yugabytedb
brew install yugabytedb

In the future, you can upgrade the version by running this:

brew upgrade yugabytedb

Verify the installation and check the version:

yugabyted version

Set up local networking:

sudo ifconfig lo0 alias 127.0.0.2
sudo ifconfig lo0 alias 127.0.0.3

Then you can set up a three-node YugabyteDB cluster. Change the data directory if you’d like.