You have more data than you think

Writer: Fernando Cuenca

"I don't have enough data to make any meaningful analysis... I need to wait until I collect more..."


Actually, you need less data than you think:


💡 5 data points are enough to know the order of magnitude of the distribution's SCALE (are we talking about days? months? weeks? years?) 


💡 12 data points: take the central 6 data points; they determine the "range of the median" (the "typical case", "this is how long things usually take")


💡 30 data points: things get more interesting:

  • take the lowest 6 data points: range of the "best case" (10th percentile, "this is how fast we can be")

  • take the highest 6 data points: range of the "worst case" (90th percentile, "this is how bad it can get")

  • take the central 10 data points: range of the "typical case" (median, or 50th percentile)
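The three rules above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the helper names (`lowest_range`, `central_range`, `highest_range`) and the `lead_times` values are hypothetical, made up only to show the mechanics.

```python
def lowest_range(values, k):
    """Return (low, high) spanning the k smallest values."""
    ordered = sorted(values)
    return ordered[0], ordered[k - 1]


def highest_range(values, k):
    """Return (low, high) spanning the k largest values."""
    ordered = sorted(values)
    return ordered[-k], ordered[-1]


def central_range(values, k):
    """Return (low, high) spanning the k central values."""
    ordered = sorted(values)
    start = (len(ordered) - k) // 2
    return ordered[start], ordered[start + k - 1]


# Hypothetical lead times in days, in the order the work was finished
# (made-up numbers for illustration, not real data).
lead_times = [9, 3, 14, 7, 22, 5, 10, 2, 33, 8, 12, 6, 18, 4, 11,
              60, 9, 15, 3, 25, 13, 7, 41, 10, 5, 16, 8, 28, 12, 20]

# 5 data points: the spread hints at the scale (days? weeks? months?).
scale = (min(lead_times[:5]), max(lead_times[:5]))

# 12 data points: the central 6 bracket the median.
median_12 = central_range(lead_times[:12], 6)

# 30 data points: best case, typical case, worst case.
best_case = lowest_range(lead_times, 6)
typical_case = central_range(lead_times, 10)
worst_case = highest_range(lead_times, 6)

print("scale (5 points):", scale)
print("median range (12 points):", median_12)
print("best / typical / worst (30 points):", best_case, typical_case, worst_case)
```

With these made-up numbers, the first 5 items already suggest that the conversation is about days and weeks, and the 30-point ranges give three intervals to set against what the customer expects.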



In all these cases you can compare the ranges you get from the data to the expectations of your customers and stakeholders, and use that as a guide to stimulate improvement.


To end with an "alexeism": 

"An improved service is better than a more precise model of an unsatisfactory service" 😉 -- Alexei Zheglov

 

Some additional clarifications

A clarification on the meaning of the ranges you find with this technique: each one is a high-confidence range (roughly 90%) for the location of the given percentile. So, for example, the 6 central data points in a dataset of 12 samples give us a range where, with roughly 90% confidence, we can expect to find the median.


In the example diagram above, for 12 data points the range would be 3 to 9, meaning: I can say with 90% confidence that the median for that distribution will be located there. Of course, I can't say how close to 3 or how close to 9, and there's a small chance it will fall outside the range.
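For readers who want to see where a confidence figure like this comes from, here is a rough sketch. It is not part of the original technique description; it uses the standard order-statistics argument and assumes independent samples from a continuous distribution. Under those assumptions, the p-th quantile falls between the r-th and s-th smallest of n samples exactly when at least r and at most s-1 samples land below it, which is a binomial count. The function name `coverage` and its parameters are illustrative.

```python
from math import comb


def coverage(n, p, r, s):
    """Chance that the p-quantile lies between the r-th and s-th smallest
    of n samples (1-indexed), assuming independent samples from a
    continuous distribution."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, s))


# 5 points, lowest to highest, median
print(round(coverage(5, 0.5, 1, 5), 3))     # ≈ 0.938
# 12 points, central 6 (4th to 9th), median
print(round(coverage(12, 0.5, 4, 9), 3))    # ≈ 0.854
# 30 points, central 10 (11th to 20th), median
print(round(coverage(30, 0.5, 11, 20), 3))  # ≈ 0.901
# 30 points, lowest 6 (1st to 6th), 10th percentile
print(round(coverage(30, 0.1, 1, 6), 3))    # ≈ 0.884
# 30 points, highest 6 (25th to 30th), 90th percentile
print(round(coverage(30, 0.9, 25, 30), 3))  # ≈ 0.884
```

Under these assumptions the rules of thumb land between about 85% and 94%, so the 90% is best read as a round figure rather than an exact one.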


The point here is not a high degree of accuracy, but to show that a few data points are enough to provide an informed starting point for a conversation. For example, if someone claims that the "work here takes months", the 5 data points in the example above are enough for me to respond that this is likely not the case, and that we should be discussing "weeks" and not "months".

 
 
 
