Why is mean affected by outliers




















Every score therefore affects the mean. Note: In the distribution above, there are 26 homework scores for this student. If the teacher made fewer homework assignments, a zero would have a greater impact on the mean. We can see this in the distribution below. This distribution has only 10 scores. The one grade of 0 moves the mean into the C grade range. In this example, we look at how skewness in a data set affects the mean and median.

The following histogram shows the personal income of a large sample of individuals drawn from U. Notice that it is strongly skewed to the right. This type of skewness is often present in data sets of variables such as income.

Here again we see that the mean income does not represent the typical income for this sample very well. The small number of people with higher incomes increase the mean. In particular, the smaller the dataset, the more that an outlier could affect the mean. Ten men are sitting in a bar. Suddenly one man walks out and Bill Gates walks in. This example shows how one outlier Bill Gates could drastically affect the mean. An outlier can affect the mean by being unusually small or unusually large.

In the previous example, Bill Gates had an unusually large income, which caused the mean to be misleading. However, an unusually small value can also affect the mean. To illustrate this, consider the following example:. The mean score is The one unusually low score of one student drags the mean down for the entire dataset. The smaller the sample size of the dataset, the more an outlier has the potential to affect the mean.

For example, suppose we have a dataset of exam scores where all of the students scored at least a 90 or higher except for one student who scored a zero:. In other words, each element of the data is closely related to the majority of the other data. If not, the data set may have information that is too scattered to be useful in any analysis. In some data sets there may be a point or two that can be out of context with the bulk of the data. These are referred to as outliers, which are out of line with the normal data set.

The outlier can push the mean of the data out of its usual position. For example, the data set 3, 4, 5, 6, 7 has a mean of 5 , found by dividing the sum of the data by the number of data elements:. If the 4 was mistakenly recorded as a 14 , the 14 would be unusual for the data set and it would be an outlier.

And we can see the outlier has moved the mean of the data set. To solve this problem the unusual data element can either be re-investigated and corrected, or removed from the data set with an explanation.

The former solution may bring back our original 4 after error checking is completed. The latter will return our mean closer to a representative evaluation of the data. How does an outlier affect the mean of a data set? May 17,



0コメント

  • 1000 / 1000