ClickHouse: Drop Old Data

Last updated: November 20, 2025

Use this job to drop data older than X days.

Warnings

1. Please run during off-peak hours (nights, weekends). These can be expensive operations.

2. These queries perform DELETE operations. It is recommended to take a backup of your persistent volume before performing these operations.

3. Run these queries at your own risk. LangChain is not responsible for any lost data.

Step 1: Find parts to target

SELECT
    r._part,
    min(r.inserted_at) AS min_inserted_at,
    max(r.inserted_at) AS max_inserted_at,
    countIf(r.inserted_at < now() - INTERVAL 100 DAYS) AS rows_before_cutoff
FROM runs AS r
JOIN system.parts AS p
    ON p.database = currentDatabase()
   AND p.table = 'runs'
   AND p.name = r._part
   AND p.active
WHERE r.trace_first_received_at IS NOT NULL
  AND r.ttl_seconds IS NOT NULL
GROUP BY r._part
ORDER BY rows_before_cutoff DESC;

This returns output like:

    ┌─_part───────────────────────────┬────────────min_inserted_at─┬────────────max_inserted_at─┬─rows_before_cutoff─┐
 1. │ all_0_5533517_534_9235849       │ 2025-07-13 19:43:30.100585 │ 2025-09-21 14:53:27.728860 │            1562791 │
 2. │ all_8190823_8937443_419_9235849 │ 2025-10-02 15:12:43.225511 │ 2025-10-18 02:38:53.053977 │                  0 │
 3. │ all_9001802_9082188_152_9235849 │ 2025-10-20 14:09:13.299046 │ 2025-10-23 17:12:03.223946 │                  0 │
 4. │ all_8937444_9001801_104_9235849 │ 2025-10-18 02:38:59.031352 │ 2025-10-20 14:09:12.324759 │                  0 │
 5. │ all_9230975_9235606_324_9235849 │ 2025-10-30 13:37:21.609983 │ 2025-10-30 18:48:03.960415 │                  0 │
 6. │ all_9212826_9226335_174_9235849 │ 2025-10-29 21:54:33.545948 │ 2025-10-30 09:04:11.649532 │                  0 │
 7. │ all_9226336_9229183_27_9235849  │ 2025-10-30 09:04:13.245519 │ 2025-10-30 11:38:03.496015 │                  0 │
 8. │ all_5533518_8190822_519_9235849 │ 2025-09-21 14:53:29.635032 │ 2025-10-02 15:12:43.026718 │                  0 │
 9. │ all_9185532_9197540_164_9235849 │ 2025-10-28 20:02:34.214949 │ 2025-10-29 09:01:58.561627 │                  0 │
10. │ all_9230038_9230974_23_9235849  │ 2025-10-30 12:41:07.511320 │ 2025-10-30 13:37:18.976109 │                  0 │
11. │ all_9167893_9185531_112_9235849 │ 2025-10-27 23:57:18.997924 │ 2025-10-28 20:02:07.192944 │                  0 │
12. │ all_9236339_9236339_0           │ 2025-10-30 19:17:13.213276 │ 2025-10-30 19:17:13.213276 │                  0 │
13. │ all_9236338_9236338_0           │ 2025-10-30 19:17:13.141552 │ 2025-10-30 19:17:13.141552 │                  0 │
14. │ all_9235994_9236282_158         │ 2025-10-30 19:07:11.710410 │ 2025-10-30 19:16:08.836892 │                  0 │
15. │ all_9082189_9167892_245_9235849 │ 2025-10-23 17:12:04.777948 │ 2025-10-27 23:57:18.634470 │                  0 │
16. │ all_9229184_9230037_22_9235849  │ 2025-10-30 11:38:13.896638 │ 2025-10-30 12:41:05.571393 │                  0 │
17. │ all_9197541_9212825_226_9235849 │ 2025-10-29 09:01:58.915913 │ 2025-10-29 21:54:29.526225 │                  0 │
18. │ all_9236283_9236337_14          │ 2025-10-30 19:16:09.678017 │ 2025-10-30 19:17:11.889849 │                  0 │
19. │ all_9235607_9235993_35_9235849  │ 2025-10-30 18:48:15.001440 │ 2025-10-30 19:06:43.890718 │                  0 │
    └─────────────────────────────────┴────────────────────────────┴────────────────────────────┴────────────────────┘

Target parts with the largest number of rows to delete first.

Step 2: Delete

Pick a part name and run the following query to trigger a DELETE mutation:

ALTER TABLE default.runs
DELETE WHERE _part = '<your_part_name_here>'
  AND inserted_at < now() - interval X DAYS;

Mutations run asynchronously, so the client will immediately return.

Step 3: Track the mutation

Track the status of the mutation with:

SELECT * FROM system.mutations WHERE is_done = 0 AND is_killed = 0;

You should see a row (maybe 2 at some point). Once the mutation is complete, the is_done field changes to 1 and this query will not return anything.

WARNING: Please wait for each mutation to complete before proceeding, otherwise your query times may be significantly affected.

If you are seeing some significant CPU/memory spikes to the point where it is impacting queries, you can run the following to stop the active mutation:

KILL MUTATION WHERE mutation_id = <mutation_id_from_select_query>

Step 4: Repeat

After a mutation, parts may be renamed. Re-run Step 1 to fetch updated part names, identify a part to delete, delete it, and track the mutation.

Repeat until you have your desired disk space.

You can repeat with the following tables. Use trace_first_received_at instead of inserted_at:

- `runs_metadata_kv`
- `runs_inputs_kv`
- `runs_outputs_kv` 
- `runs_reference_example_id` 
- `runs_run_id_v2` 
- `runs_tags` 
- `runs_token_counts` 
- `runs_trace_id` 
- `runs_lite` 
- `root_runs_lite` 
- `runs_feedbacks_rmt_wide` 
- `runs_token_counts_sources_wide` 
- `runs_token_counts_rolled_up_wide`