Data Normalization

Data normalization is the process of converting data from various forms into a standard format so that they are easier to use. Because data are gathered from a variety of sources, they often arrive in different forms and must be adjusted before they can be used for comparative analysis or as a basis for projecting future costs.

Definition: Data normalization is a method of organizing data into structured formats in order to reduce redundancy, create simplicity, and improve integrity.

Purpose of Data Normalization

The purpose of Data Normalization (or cleansing) is to make a given data set consistent with and comparable to other data used in the estimate.

Data Normalization Objective

The objective of data normalization is to improve data consistency so that comparisons and projections are more valid and other data can be used to increase the number of data points. Data are normalized in several ways. [1]

Examples of Data Normalization

Below are examples of certain data types that can be normalized.

Cost Units

Cost-unit normalization primarily adjusts for inflation. Because the cost of an item has a time value, it is important to know the year in which funds were spent. For example, an item that cost $100 in 1990 is more expensive in real terms than an item that cost $100 in 2005: after 15 years of inflation, the 1990 cost converts to more than $100 in 2005 dollars. Costs may also need to be adjusted for currency conversions. Beyond inflation, the cost estimator needs to understand what the cost represents. For example, does it cover only direct labor, or does it include overhead and the contractor’s profit? Finally, cost data have to be converted to equivalent units before being used in a data set. That is, costs expressed in thousands, millions, or billions of dollars must be converted to one format, for example, all costs expressed in millions of dollars. [1]
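The two mechanical steps above, inflation adjustment and conversion to a single cost unit, can be sketched in a few lines of Python. This is only an illustration: the index values in PRICE_INDEX are invented placeholders rather than published inflation indices, and the function name to_base_year_millions is hypothetical.

```python
# Hypothetical price index values (base year 2005 = 1.00); NOT real data.
PRICE_INDEX = {1990: 0.70, 2005: 1.00}

# Multipliers for converting a reported unit to plain dollars.
UNIT_SCALE = {"dollars": 1.0, "thousands": 1e3, "millions": 1e6, "billions": 1e9}


def to_base_year_millions(amount, unit, year, base_year=2005):
    """Convert a reported cost to base-year dollars, expressed in millions."""
    dollars = amount * UNIT_SCALE[unit]
    # Scale by the ratio of the base-year index to the spend-year index.
    adjusted = dollars * PRICE_INDEX[base_year] / PRICE_INDEX[year]
    return adjusted / 1e6


# Two records reported in different units and different years.
records = [
    (700.0, "thousands", 1990),
    (1.0, "millions", 2005),
]
normalized = [to_base_year_millions(a, u, y) for a, u, y in records]
```

With these placeholder index values, both records come out at roughly the same figure in millions of 2005 dollars, which is the kind of comparability the normalization is meant to provide. In practice the estimator would substitute the inflation indices appropriate to the program.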

Sizing Units

Sizing units normalize data to common units, for example, cost per foot, cost per pound, or dollars per software line of code. When normalizing data for unit size, it is very important to define exactly what the unit represents: What constitutes a software line of code? Does it include carriage returns or comments? The main point is to clearly define what the sizing metric is so that the data can be converted to a common standard before being used in the estimate.

Key Groupings

Key groupings normalize data by similar missions, characteristics, or operating environments, as well as by cost type or work content. Products with similar mission applications have similar characteristics and traits, as do products with similar operating environments. For example, space systems exhibit characteristics different from those of submarines, and even within space systems the space shuttle has characteristics distinct from those of a satellite, although they may share common features. Costs should also be grouped by type. For example, costs should be broken out between recurring and nonrecurring or fixed and variable costs. [1]
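As a rough sketch of how sizing units and key groupings work together, the Python below converts each record to cost per pound and then averages within each grouping, so that satellites are compared only with satellites. The system names, costs, and weights are made-up illustration values, and cost_per_pound is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical records: (system name, key grouping, cost in $M, weight in lb).
systems = [
    ("Sat-A", "satellite", 120.0, 4_000.0),
    ("Sat-B", "satellite", 90.0, 2_500.0),
    ("Sub-A", "submarine", 2_400.0, 16_000_000.0),
]


def cost_per_pound(cost_millions, weight_lb):
    """Normalize a total cost in $M to dollars per pound."""
    return cost_millions * 1e6 / weight_lb


# Collect the normalized values separately for each key grouping.
by_group = defaultdict(list)
for name, group, cost_m, weight_lb in systems:
    by_group[group].append(cost_per_pound(cost_m, weight_lb))

# Average cost per pound within each grouping.
group_average = {g: sum(v) / len(v) for g, v in by_group.items()}
```

Keeping the groupings separate matters here: pooling the submarine with the satellites would produce an average that describes neither kind of system.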

Technology Maturity

Technology maturity normalizes data for where a program is in its life cycle; it also considers learning and rate effects. The first unit of something would be expected to cost more than the 1,000th unit, just as a system procured at one unit per year would be expected to cost more per unit than the same system procured at 1,000 units per year. Technology normalization is the process of adjusting cost data for productivity improvements resulting from technological advancements that occur over time. [1]

In effect, technology normalization is the recognition that technology continually improves, so a cost estimator must make a subjective attempt to measure the effect of this improvement on historical program costs. For instance, an item developed 10 years ago may have been considered state of the art and the costs would be higher than normal. Today, that item may be available off the shelf and therefore the costs would be considerably less.

Technology normalization therefore depends on the ability to forecast technology, that is, to predict the timing and degree of change in the technological parameters associated with the design, production, and use of devices. Adjusting cost data to reflect where an item is in its life cycle is, however, very subjective, because it requires identifying the relative state of technology at different points in time.
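The learning effect mentioned at the start of this section (the first unit costing more than the 1,000th) is often modeled with a unit learning curve, in which unit cost falls by a fixed percentage each time cumulative quantity doubles. The source does not prescribe a model, so the sketch below is an assumption: the 90 percent slope is an arbitrary illustration value, and the function names are hypothetical. It does not attempt to model rate effects or the more subjective technology adjustment.

```python
import math


def unit_cost(first_unit_cost, unit_number, slope=0.90):
    """Cost of the nth unit under an assumed unit learning curve.

    slope is the fraction of unit cost remaining each time the
    cumulative quantity doubles (0.90 means a 10 percent reduction).
    """
    exponent = math.log(slope) / math.log(2)
    return first_unit_cost * unit_number ** exponent


def first_unit_equivalent(observed_cost, unit_number, slope=0.90):
    """Normalize an observed unit cost back to a first-unit equivalent."""
    exponent = math.log(slope) / math.log(2)
    return observed_cost / unit_number ** exponent
```

Normalizing every historical data point to a first-unit equivalent puts costs observed at different production quantities on the same footing before they are compared.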

Homogeneous Groups

Using homogeneous groups normalizes for differences between the work breakdown structure (WBS) elements of historical and new programs in order to achieve content consistency. To do this type of normalization, a cost estimator needs to gather cost data that can be formatted to match the desired WBS element definition. This may require adding and deleting certain items to get an apples-to-apples comparison. A properly defined WBS dictionary is necessary to avoid inconsistencies. [1]
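One way to picture this remapping is as a lookup table from the historical program's cost elements to the new program's WBS elements, with excluded content marked explicitly. The element names, WBS numbers, and cost values below are hypothetical and serve only to illustrate the bookkeeping.

```python
# Hypothetical mapping from a historical program's cost elements to the
# WBS elements of the new estimate; None marks content to be excluded.
WBS_MAP = {
    "Airframe": "1.1 Air Vehicle",
    "Avionics": "1.1 Air Vehicle",
    "Training devices": "1.3 Training",
    "Facilities": None,  # outside the new program's WBS definition
}

# Hypothetical historical costs in $M, already in common cost units.
historical_costs = {
    "Airframe": 40.0,
    "Avionics": 25.0,
    "Training devices": 5.0,
    "Facilities": 12.0,
}


def remap_to_wbs(costs, wbs_map):
    """Sum historical costs into the target WBS elements."""
    totals = {}
    for element, cost in costs.items():
        if element not in wbs_map:
            # Refuse to guess: every element needs an explicit decision.
            raise KeyError(f"no WBS mapping defined for {element!r}")
        target = wbs_map[element]
        if target is None:
            continue  # deliberately excluded from the comparison
        totals[target] = totals.get(target, 0.0) + cost
    return totals


normalized_costs = remap_to_wbs(historical_costs, WBS_MAP)
```

Raising an error for unmapped elements, rather than silently dropping them, reflects the point about the WBS dictionary: each historical item should be consciously included or excluded.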

AcqLinks and References:

Updated: 7/26/2021
