Examining and Cleaning Data | by Chijioke Godwin | Dec, 2020

Understanding levels of measurement

The four levels of measurement in statistics are Nominal, Ordinal, Interval and Ratio.

Most categorical data fall into the nominal level. The categories are just names or labels with no order or progression between them; examples are names of cities or colours. The categories are mutually exclusive, so every observation falls into exactly one of them. Nominal data can be assigned numeric values and encoded as numbers for classification algorithms, but these numbers are arbitrary: arithmetic operations and comparisons, including ordering, are meaningless on the codes. Only frequency counts (and the mode, the most frequent category) make sense. You will understand this when we use .describe(), .unique() and .value_counts() while inspecting our data.
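A minimal pandas sketch of inspecting and encoding nominal data. The Series of city names is hypothetical sample data; the point is that counts and the mode are meaningful, while the integer codes pandas assigns are only labels.

```python
import pandas as pd

# Hypothetical sample of nominal data: city names with no inherent order
cities = pd.Series(["Lagos", "Abuja", "Lagos", "Enugu", "Abuja", "Lagos"])

# Distinct categories, in order of first appearance
print(list(cities.unique()))            # ['Lagos', 'Abuja', 'Enugu']

# Frequency counts are the main summary that is valid for nominal data
counts = cities.value_counts()
print(counts)

# On a text column, .describe() reports count, unique, top (the mode) and freq
summary = cities.describe()
print(summary["top"], summary["freq"])  # Lagos 3

# Numeric encoding: categories are sorted alphabetically and numbered 0, 1, 2.
# These codes are arbitrary labels; averaging or comparing them means nothing.
codes = cities.astype("category").cat.codes
print(codes.tolist())                   # [2, 0, 2, 1, 0, 2]
```

Pandas will happily compute `codes.mean()`, but the number it returns says nothing about the cities; that is exactly why only counts are valid at this level.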

Ordinal data are also categorical, but their categories have a meaningful order. When encoding them as numbers, the codes must preserve that order, which is why consecutive integers are usually used. A common example is user feedback: very bad, bad, indifferent, good, very good, usually encoded as 1 = very bad, 2 = bad, 3 = indifferent, 4 = good, 5 = very good. Another good example is educational level: less than high school, high school graduate, bachelor's degree, master's degree, doctorate degree. These categories are progressive. However, the scale tells us nothing about the size of the difference between adjacent categories (the gap between "bad" and "indifferent" need not equal the gap between "good" and "very good"), so arithmetic operations on the codes cannot be meaningfully interpreted.
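The feedback example can be sketched with a pandas ordered categorical. The responses below are hypothetical; the sketch shows the 1 to 5 encoding and the order-based operations (sorting, comparisons, min and max) that ordinal data support.

```python
import pandas as pd

# The feedback scale from the text, listed from lowest to highest
levels = ["very bad", "bad", "indifferent", "good", "very good"]

# Hypothetical survey responses stored as an ordered categorical
feedback = pd.Series(
    pd.Categorical(
        ["good", "very bad", "indifferent", "very good", "good"],
        categories=levels,
        ordered=True,
    )
)

# Encode as consecutive integers 1..5 that preserve the order
encoded = feedback.cat.codes + 1
print(encoded.tolist())                # [4, 1, 3, 5, 4]

# Order-based operations are valid: sorting, comparisons, min and max
print(feedback.sort_values().tolist())
print((feedback >= "good").tolist())   # [True, False, False, True, True]
print(feedback.min(), feedback.max())  # very bad very good

# Arithmetic on the codes runs, but the result is hard to interpret because
# the gaps between adjacent categories are not known to be equal.
```

Declaring `ordered=True` is what lets pandas accept comparisons such as `>= "good"`; on an unordered categorical the same comparison raises an error.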

Numerical data can also be converted to categorical data at the ordinal level of measurement, a process known as binning. For example, weight could be converted to very heavy, heavy, medium, light and very light.

Converting numerical data to categorical data | Image by Author
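One way to do this conversion in pandas is `pd.cut`. A minimal sketch, assuming hypothetical weights in kilograms and hypothetical cut-off points (choose thresholds that suit your own data):

```python
import pandas as pd

# Hypothetical weights in kilograms
weights = pd.Series([48.0, 62.5, 75.0, 91.2, 110.4])

# Hypothetical cut-offs; with the default right=True the bins are
# (0, 50], (50, 65], (65, 80], (80, 95] and (95, inf]
bins = [0, 50, 65, 80, 95, float("inf")]
labels = ["very light", "light", "medium", "heavy", "very heavy"]

# pd.cut maps each value to a labelled bin and returns an ordered categorical
weight_class = pd.cut(weights, bins=bins, labels=labels)
print(weight_class.tolist())
# ['very light', 'light', 'medium', 'heavy', 'very heavy']
print(weight_class.cat.ordered)        # True
```

Binning discards information: after the conversion, 20 kg and 48 kg are both just "very light", so the result supports ordering but no longer supports arithmetic.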

The interval level is for numerical data only. It has regular, constant intervals, like a scale or number line. The scale has a zero point, but that zero does not represent the absence of the quantity; it is an arbitrarily chosen reference. Values are therefore relative, measured with respect to that reference point. Examples are temperature measured in Celsius, height measured relative to sea level, and years counted as BC and AD. At this level, multiplying or dividing one value by another is meaningless (20 °C is not twice as hot as 10 °C); only the differences between values are meaningful. The mean, standard deviation, range and several other statistics can be computed, and values can also be sorted.
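A small sketch of why differences work on an interval scale but ratios do not, assuming a hypothetical Series of Celsius temperatures. Converting to Fahrenheit, another interval scale with a different zero, changes the ratio but not the meaning of a difference.

```python
import pandas as pd

# Hypothetical daily temperatures in degrees Celsius (an interval scale)
celsius = pd.Series([10.0, 20.0, 25.0, 15.0])

# Differences are meaningful: the second day was 10 degrees warmer
print(celsius.iloc[1] - celsius.iloc[0])        # 10.0

# Ratios are not: 20 / 10 = 2 in Celsius, but the same two temperatures in
# Fahrenheit (also an interval scale, with a different zero) give 68 / 50.
fahrenheit = celsius * 9 / 5 + 32
print(celsius.iloc[1] / celsius.iloc[0])        # 2.0
print(fahrenheit.iloc[1] / fahrenheit.iloc[0])  # 1.36

# Mean, standard deviation, range and sorting are all valid at this level
print(celsius.mean())                           # 17.5
print(celsius.std())                            # sample standard deviation
print(celsius.max() - celsius.min())            # 15.0
print(celsius.sort_values().tolist())           # [10.0, 15.0, 20.0, 25.0]
```

Because the "ratio" depends on which unit you picked, it carries no information about the temperatures themselves.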

Numerical values with an absolute (true) zero, where a value of zero represents the total absence of the quantity, are at the ratio level of measurement. These values behave like ordinary real numbers: all arithmetic operations, including ratios, are meaningful, so a salary of 60,000 is exactly twice a salary of 30,000. Examples include price, age, salary, and the height and weight of an object.
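For contrast with the temperature example, here is a sketch with ratio-level data, assuming hypothetical salaries and object weights. Because kilograms and pounds share the same true zero, a ratio survives the change of unit.

```python
import pandas as pd

# Hypothetical salaries (ratio scale: a salary of 0 means no salary at all)
salaries = pd.Series([30_000.0, 60_000.0])
print(salaries.iloc[1] / salaries.iloc[0])   # 2.0, twice as much

# Hypothetical object weights in kilograms; 0 kg is a true zero
kg = pd.Series([30.0, 60.0])
lb = kg * 2.20462  # approximate kg -> lb conversion factor

# The ratio is the same in both units because both scales start at true zero
print(kg.iloc[1] / kg.iloc[0])               # 2.0
print(lb.iloc[1] / lb.iloc[0])               # 2.0

# Statistics built on ratios, such as the coefficient of variation
# (standard deviation divided by mean), are only meaningful at this level
cv = kg.std() / kg.mean()
print(cv)
```

In general, compare floating-point ratios with a small tolerance rather than with `==`.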
