Normalizing Data

When analyzing data, it is important to do some housekeeping on the raw data before you perform your analysis. Normalizing data is the process of removing “outliers” and other oddities in the data. These rogue pieces of data can corrupt your analysis, leading to inaccurate metrics and ultimately poor decisions.

 Normalizing data is essential to proper metric analysis.

So what is involved in normalizing data? Let’s assume your business is a locally owned candy and ice cream store, and you want to calculate your average gross margin, average dollar purchase per customer, and top-selling items. Follow these steps and your analysis will be far more accurate and useful.

  1. Segment your data. If the candy store serves both individual customers and other businesses such as restaurants, realtors, etc., it is best to separate the data for each segment. That allows you to analyze each type of customer on its own rather than as a blend. Blended data is generally misleading because it is not typical of either group. In general, the smaller the segments, the better.
  2. Identify outliers. Once you have properly identified your segments, it is much easier to identify the outliers. Let’s assume in the candy store example that the typical individual purchase runs from $6 to $40. A non-business customer places a one-time order for $250 for a birthday party. This one-time order will skew the average individual order higher than normal. Moving this order out of the individual data will make the average more accurate.
  3. Adjust for seasonality. If certain seasons have a strong influence on your metrics, they must be accounted for. In the candy business, you probably have some high seasons like pre-Christmas, Valentine’s Day, etc. Business probably drops significantly in January due to New Year’s resolutions. Over time you can identify the impact of this seasonality and adjust for it. For example, you may determine that every January, sales are down 20% on average versus the 4th quarter. When January rolls around and your sales are flat versus the 4th quarter, in reality you had a great month.
  4. Look for other oddities. Sometimes you are not aware of what qualifies as an oddity until you see it. In other words, spend some time just perusing your data with no specific intent. Just explore! You may be surprised by what you notice. The side benefit of this exercise is that you will become more familiar with the data. This means you will be more likely to notice unusual pieces of data in the future.
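
The first three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the order records, the function names, and the $10,000 sales figure are all made up for the example. The $6 to $40 range, the $250 party order, and the 20% January dip come from the candy store scenario. Dividing by a seasonal factor is one simple way to adjust for seasonality, not the only one.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical order records: (segment, month, amount in dollars).
orders = [
    ("individual", "2023-11", 12.00),
    ("individual", "2023-11", 25.00),
    ("individual", "2023-12", 38.00),
    ("individual", "2023-12", 250.00),  # the one-time birthday party order
    ("individual", "2024-01", 8.00),
    ("business", "2023-11", 400.00),
    ("business", "2023-12", 520.00),
]


def segment_orders(records):
    """Step 1: group order amounts by customer segment."""
    segments = defaultdict(list)
    for segment, _month, amount in records:
        segments[segment].append(amount)
    return dict(segments)


def split_outliers(amounts, low, high):
    """Step 2: separate amounts that fall outside the typical [low, high] range."""
    typical = [a for a in amounts if low <= a <= high]
    outliers = [a for a in amounts if not (low <= a <= high)]
    return typical, outliers


def seasonally_adjust(actual, seasonal_factor):
    """Step 3: divide by a seasonal factor so months can be compared fairly.

    A factor of 0.80 means the month normally runs 20% below the baseline.
    """
    return actual / seasonal_factor


segments = segment_orders(orders)
typical, outliers = split_outliers(segments["individual"], low=6, high=40)

raw_average = mean(segments["individual"])  # skewed by the party order
normalized_average = mean(typical)          # outlier set aside

# Assumed figures: January sales of $10,000, equal to the 4th-quarter
# monthly average, with January normally running 20% lower (factor 0.80).
adjusted_january = seasonally_adjust(10_000, 0.80)

print(f"Raw individual average:        ${raw_average:.2f}")
print(f"Normalized individual average: ${normalized_average:.2f}")
print(f"Outliers set aside:            {outliers}")
print(f"Seasonally adjusted January:   ${adjusted_january:,.2f}")
```

With this sample data, the single $250 order pulls the individual average up from $20.75 to $66.60. A January that merely matches the 4th quarter works out to $12,500 once the usual 20% dip is taken into account, which is why a flat January is really a great month.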