Data volume reduction

Evaluating Data Volume in ClickHouse

Download and execute the script to evaluate the volume of data in ClickHouse:

wget https://docs.mitigator.ru/collector/v24.10/maintenance/reduce_db/utilization.sql && \
docker-compose exec -T clickhouse clickhouse-client -mn < utilization.sql
Info

Running this script may take a long time.

This script calculates the following metrics:

  • min_date - the earliest timestamp in the database;
  • flow_quantity - the number of flow records stored in the database;
  • single_flow_size - the average size of one flow record, in bytes;
  • max_flow_size - the maximum volume available for storing flow records, in bytes;
  • max_flow_quantity - the maximum number of flow records that can be stored in the database;
  • max_flow_rate - the observed maximum number of flow records written per second;
  • avg_flow_rate - the observed average number of flow records written per second;
  • ttl_for_max_flow_rate - the recommended time-to-live (TTL) for records at the observed maximum write rate;
  • ttl_for_avg_flow_rate - the recommended TTL for records at the observed average write rate.

Based on these calculated metrics, choose new data compression periods and record TTLs.

Parameter                 | Calculation Method                                                                        | Default Value
First Compression Period  | Desired period of working with operational data; less than the recommended TTL at the maximum write rate | 1 month
Second Compression Period | 0.5 × recommended TTL at the average write rate                                           | 6 months
Record TTL                | Recommended TTL at the average write rate                                                 | 1 year
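As a worked example (the input values below are assumed for illustration, not taken from any real deployment), suppose utilization.sql reports a recommended TTL of 365 days at the average write rate and 45 days at the maximum write rate. The periods then follow directly from the calculation methods above:

```shell
# Assumed utilization.sql output (illustrative values only):
ttl_avg_days=365   # ttl_for_avg_flow_rate
ttl_max_days=45    # ttl_for_max_flow_rate

# Second compression period = 0.5 * recommended TTL at the average write rate.
second_compression_days=$(( ttl_avg_days / 2 ))

# Record TTL = recommended TTL at the average write rate.
record_ttl_days=$ttl_avg_days

echo "first compression:  any operational window shorter than ${ttl_max_days} days"
echo "second compression: ${second_compression_days} days"
echo "record TTL:         ${record_ttl_days} days"
```

With these inputs the second compression period comes out to 182 days (about 6 months) and the record TTL to 365 days, matching the default values in the table.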

Procedure for setting recompression and Time-to-Live periods

  1. Evaluate the data volume in ClickHouse and choose new recompression and TTL periods.

  2. Check that the number of flow records stored in the database (flow_quantity) does not exceed the maximum number that can be stored (max_flow_quantity). If it does, manually remove old records.
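One way to re-check the current count at this point (assuming the flow records live in collector.flows, as in the manual deletion commands later in this document) is to query it directly and compare the result against the max_flow_quantity value reported by utilization.sql:

```shell
# Count the flow records currently stored; compare the result with
# max_flow_quantity from the utilization.sql output.
docker-compose exec clickhouse clickhouse-client \
  --query="SELECT count() FROM collector.flows"
```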

  3. Download the script for setting recompression and TTL periods:

    wget https://docs.mitigator.ru/collector/v24.10/maintenance/reduce_db/volume_reduction.sql
  4. Edit the script to specify new recompression and TTL periods for all tables.

  5. Run the script:

    docker-compose exec -T clickhouse clickhouse-client -mn < volume_reduction.sql
Info

The new recompression and TTL periods will be applied within 4 hours. During application, there may be increased CPU load and disk usage. This will not affect processing of new data.
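For orientation, a hypothetical sketch of the kind of statement volume_reduction.sql applies per table: the actual script's contents may differ, and the codec choices below are assumptions. ClickHouse expresses recompression periods and record expiry as a single multi-clause table TTL, so for the flows table (names taken from the deletion commands below) it could look like:

```shell
# Sketch only - the real statements live in volume_reduction.sql.
# Two RECOMPRESS clauses set the first and second compression periods;
# the DELETE clause sets the record TTL.
docker-compose exec clickhouse clickhouse-client --query="
  ALTER TABLE collector.flows MODIFY TTL
    toDateTime(timestamp_sec) + INTERVAL 1 MONTH RECOMPRESS CODEC(ZSTD(1)),
    toDateTime(timestamp_sec) + INTERVAL 6 MONTH RECOMPRESS CODEC(ZSTD(6)),
    toDateTime(timestamp_sec) + INTERVAL 1 YEAR DELETE;"
```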

Manual data deletion in ClickHouse

To reduce the size of the database, you can delete old records by running the following commands:

docker-compose exec clickhouse clickhouse-client --query="ALTER TABLE collector.flows DELETE WHERE timestamp_sec < toUnixTimestamp('1970-01-01');" && \
docker-compose exec clickhouse clickhouse-client --query="ALTER TABLE collector.rate_metrics DELETE WHERE date < toDateTime('1970-01-01');" && \
docker-compose exec clickhouse clickhouse-client --query="ALTER TABLE collector.snmp_counters DELETE WHERE date < toDateTime('1970-01-01');"

where 1970-01-01 is a placeholder for the desired cutoff date: records older than this date are deleted.
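Instead of writing the cutoff date by hand, it can be computed from the desired retention period. A sketch using GNU date (the 365-day retention is an assumed example; the table and column names follow the commands above):

```shell
# Compute a cutoff 365 days in the past (GNU date) and print the resulting
# deletion statement for the flows table; run it via clickhouse-client as
# shown in the commands above.
cutoff=$(date -d '365 days ago' +%F)
echo "ALTER TABLE collector.flows DELETE WHERE timestamp_sec < toUnixTimestamp('${cutoff}');"
```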