Count Large Partitions in YCQL

Counting large partitions in the YugabyteDB Cassandra API


One thing that can really wreck performance in Cassandra, and in YugabyteDB's similar YCQL API, is a large partition caused by an imbalanced partition key. YugabyteDB doesn't have Cassandra's robust nodetool commands, so finding these large partitions can be challenging.

dsbulk (DataStax Bulk Loader) is a data migration tool, and YugabyteDB maintains a fork that accounts for YCQL's slight differences from Cassandra. Its count command can be used to list the largest partitions.

In the following example, I’ve created a small table with a “large” partition where a=3.

CREATE TABLE IF NOT EXISTS table1 (
    a int,
    b int,
    c blob,
    PRIMARY KEY (a, b)
);

INSERT INTO table1 (a, b, c) VALUES (1, 1, 0x0b5db8b91bfdeb0a111b372dd8dda123b3fd1ab6);
INSERT INTO table1 (a, b, c) VALUES (2, 2, 0x0b5db8b91bfdeb0a112b372dd8dda123b3fd1ab6);
INSERT INTO table1 (a, b, c) VALUES (3, 3, 0x0b5db8b91bfdeb0a113b372dd8dda123b3fd1ab6);
INSERT INTO table1 (a, b, c) VALUES (3, 4, 0x0b5db8b91bfdeb0a113b372dd8dda123b3fd1ab7);
INSERT INTO table1 (a, b, c) VALUES (3, 5, 0x0b5db8b91bfdeb0a113b372dd8dda123b3fd1ab8);
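Because a is the partition key, these five rows land in three partitions, and the partition where a=3 holds three of them. The following minimal Python sketch models a per-partition row count. It groups the rows by a, counts them, and ranks the partitions, reporting each one's share of the total row count. The `rows` list and the `top_partitions` helper are hypothetical illustrations, not part of dsbulk.

```python
from collections import Counter

# The (a, b) primary-key pairs of the five rows inserted above.
# The blob column c does not affect row counts, so it is omitted here.
rows = [(1, 1), (2, 2), (3, 3), (3, 4), (3, 5)]


def top_partitions(rows, n):
    """Return (partition_key, row_count, percent_of_total_rows) for the
    n partitions with the most rows. Ties keep first-seen order."""
    counts = Counter(a for a, _b in rows)  # group rows by partition key a
    total = sum(counts.values())
    return [(key, count, 100.0 * count / total)
            for key, count in counts.most_common(n)]


for key, count, pct in top_partitions(rows, 3):
    print(f"{key} {count} {pct:.2f}")
```

`Counter.most_common` keeps first-seen order for partitions with equal counts, so partition 1 is listed before partition 2.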

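Using dsbulk count with partition statistics enabled, I can list the partitions with the most rows. The -partitions option sets how many partitions are reported (three here).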

dsbulk count -k keyspace1 -t table1 -stats partitions -partitions 3
Operation directory: /private/tmp/data1/yb-data/tserver/data/rocksdb/table-1741e9048b414a2a93bcef866e6df115/logs/COUNT_20240520-192501-426003
total | failed | rows/s | p50ms | p99ms | p999ms
    5 |      0 |     19 |  3.35 |  8.85 |   8.85
Operation COUNT_20240520-192501-426003 completed successfully in .

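The output lists the partition key, the number of rows in that partition, and that partition's share of all rows in the table as a percentage. Here, the partition where a=3 holds 3 of the 5 rows, or 60%.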

3 3 60.00
1 1 20.00
2 1 20.00
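For tables with many partitions, it can help to post-process this output. The following Python sketch parses the partition lines and flags any partition that holds more than a chosen share of the table's rows. It assumes each line has the form `<key> <row count> <percent>`. It treats the third column as a percentage of total rows, which matches this example (3 of 5 rows is 60.00). `parse_partitions`, `flag_large`, and the 50% threshold are hypothetical choices for illustration.

```python
# Partition lines copied from the dsbulk output above.
# Assumed format per line: "<partition key> <row count> <percent of rows>".
DSBULK_PARTITIONS = """\
3 3 60.00
1 1 20.00
2 1 20.00
"""


def parse_partitions(text):
    """Parse partition lines into (key, row_count, percent) tuples."""
    result = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split from the right: the last two fields are numeric and
        # everything before them is treated as the key.
        key, count, pct = line.rsplit(maxsplit=2)
        result.append((key, int(count), float(pct)))
    return result


def flag_large(partitions, max_percent):
    """Return the keys whose share of total rows exceeds max_percent."""
    return [key for key, _count, pct in partitions if pct > max_percent]


parts = parse_partitions(DSBULK_PARTITIONS)
print(flag_large(parts, 50.0))  # hypothetical 50% threshold
```

A row-count threshold would work the same way, comparing against the second column instead.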