"My Lead Time distribution shows some clear outliers. I can ignore them, right? they skew my data...."
💡 A data point is not an "outlier" (i.e. something that can be ignored) simply because it's an extreme value. You need to know first if it was produced by "common cause" variation (natural fluctuations that are the direct result of how a system works), or "special cause" variation (fluctuations produced by events or conditions external to the system).
👉 If you determine that you're in the presence of "special cause" variation, it might be appropriate to ignore the data point. That said, the very fact that you're seeing that extreme data point means that your system is able to produce such extremes, and discarding it means discarding potentially valuable information.
Additionally, distinguishing between "common cause" and "special cause" is often not straightforward. Understanding the reasons behind an extreme data point can give you insights into the risks you're exposed to, helping you introduce mitigation measures and other process improvements.
So, no, I wouldn't ignore the "outliers" - at least not automatically 😉
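P.S. If you want a starting point for telling "common cause" from "special cause", one widely used option is a process behaviour chart (XmR chart): points outside the natural process limits are candidates for "special cause" and worth investigating. Below is a minimal sketch, assuming Lead Times in days listed in chronological order. The data is made up and the function names are hypothetical; the 2.66 factor is the standard XmR scaling constant. Lead Time data is usually skewed, so treat a flagged point as a prompt for a conversation, not a verdict.

```python
from statistics import mean


def xmr_limits(values):
    """Return (lower, centre, upper) natural process limits for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    centre = mean(values)
    # Moving range: absolute difference between consecutive data points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 2.66 * mean(moving_ranges)
    # A Lead Time cannot be negative, so the lower limit is floored at zero.
    return max(0.0, centre - spread), centre, centre + spread


def special_cause_candidates(values):
    """Return (index, value) pairs that fall outside the natural process limits."""
    lower, _, upper = xmr_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical Lead Times in days, in order of completion.
lead_times = [4, 6, 5, 7, 5, 6, 4, 30, 5, 6]
print(xmr_limits(lead_times))
print(special_cause_candidates(lead_times))
```

The flagged items are where you go and ask "what happened here?". Everything inside the limits is just the system being the system.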