Sample Rate vs Aggregation Uncertainty

🎯

Purpose

Your app generates a lot of telemetry. You want to drop redundant data to save money, but you still want live aggregations over that data to be accurate. The answer is sampling: keep enough events for accurate aggregations and drop the rest. But how aggressively can you sample before you can no longer trust your graphs? It depends on the situation: on your data volume and on the probability distribution of the values you're aggregating. This tool can give you a feel for it.
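The core tradeoff can be sketched in a few lines. This is an illustrative simulation (not the demo's actual code): generate a stream of values, keep each event with probability 1/rate, then reconstruct the stream's SUM and AVERAGE from the sample. The distribution and parameter names here are assumptions for the sketch.

```python
import random
import statistics

def simulate(num_events=10_000, rate=10, seed=0):
    """Sample a value stream at 1:rate and compare true vs estimated
    aggregates. Illustrative sketch only; the demo may differ."""
    rng = random.Random(seed)
    # A skewed distribution, common for latency-like telemetry values.
    events = [rng.lognormvariate(0, 1) for _ in range(num_events)]
    # Keep each event independently with probability 1/rate.
    sampled = [v for v in events if rng.random() < 1 / rate]

    true_sum, est_sum = sum(events), sum(sampled) * rate  # scale SUM back up
    true_avg = statistics.fmean(events)
    est_avg = statistics.fmean(sampled)  # AVERAGE needs no scaling
    return true_sum, est_sum, true_avg, est_avg
```

Run it a few times with different seeds and you can see the estimates jitter around the true values; that jitter is the "aggregation uncertainty" the demo visualizes.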

🔬

Interactive Demo

Adjust the volume, sample rate, and distribution parameters below to see real-time simulations of how sampling affects aggregation accuracy.

⚠️

Disclaimer

This tool was built with a LOT of AI assistance and we haven't formally verified the math, but the simulations and aggregations appear correct.

10,000
1:10

Choose a distribution for the values we're measuring 🌊

{number of events} events go in

to {number of simulations} simulations

Sampled at a rate of 1:{sample rate}

Saving you {percentage reduction}

About {number of events / sample rate} events come out

Then we aggregate the sampled events.
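The aggregation step above differs by aggregate type. Here is a hedged sketch of one common convention (the demo's own code may differ): COUNT and SUM are extrapolated by multiplying by the sample rate, while AVERAGE and percentiles like P99 are estimated directly from the sampled values.

```python
import statistics

def aggregate(sampled_values, rate):
    """Reconstruct stream-level aggregates from a 1:rate sample.
    Illustrative only; names and method are assumptions."""
    return {
        # Each sampled event stands in for `rate` original events.
        "count": len(sampled_values) * rate,
        "sum": sum(sampled_values) * rate,
        # Ratios of scaled quantities: the rate cancels out.
        "average": statistics.fmean(sampled_values),
        # Percentiles come straight from the sample (noisier in the tails).
        "p99": statistics.quantiles(sampled_values, n=100)[98],
    }
```

Note the asymmetry: COUNT and SUM are unbiased under this scaling but inherit the randomness of how many events survived sampling, while P99 depends on the few largest sampled values and is typically the noisiest of the four.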

P99 For Each Simulation, Before and After Sampling

COUNT For Each Simulation, Before and After Sampling

SUM For Each Simulation, Before and After Sampling

AVERAGE For Each Simulation, Before and After Sampling
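The before-and-after comparisons above can be summarized numerically. This sketch (again illustrative, with an assumed exponential distribution) runs many simulations and records the relative error of the P99 estimate, the aggregate most sensitive to sampling:

```python
import random
import statistics

def p99(values):
    return statistics.quantiles(values, n=100)[98]

def p99_relative_errors(num_sims=50, num_events=10_000, rate=10, seed=1):
    """For each simulation, compute P99 before and after 1:rate sampling
    and return the relative errors. Illustrative sketch only."""
    rng = random.Random(seed)
    errors = []
    for _ in range(num_sims):
        events = [rng.expovariate(1.0) for _ in range(num_events)]
        sampled = [v for v in events if rng.random() < 1 / rate]
        errors.append(abs(p99(sampled) - p99(events)) / p99(events))
    return errors
```

Raising the sample rate or lowering the event volume widens this error distribution, which is exactly what the charts let you see interactively.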

Running simulations...