AstroWerewolf

Can there be a negative outlier?

Zachary Litvinenko

Yes, absolutely.

For example, consider this data set (Q1, the median, and Q3 are in parentheses):

-19, -1, (0), 5, 7, (9), 12, 12, (12), 13, 13

Low threshold = Q1 - 1.5*(Q3 - Q1) = 0 - 1.5*12 = -18. Our minimum value, -19, is less than -18, so it is an outlier.
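The fence check above can be sketched in Python. This assumes the quartile convention used in this thread (Q1 and Q3 are the medians of the lower and upper halves, with the overall median left out when the count is odd); the function names are made up for illustration.

```python
from statistics import median

def quartiles_excluding_median(values):
    """Return (Q1, Q2, Q3), leaving the overall median out of both halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + n % 2:]      # skip the middle value when n is odd
    return median(lower), median(data), median(upper)

def iqr_fences(values, k=1.5):
    """Return the (low, high) outlier fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quartiles_excluding_median(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [-19, -1, 0, 5, 7, 9, 12, 12, 12, 13, 13]
low, high = iqr_fences(data)
outliers = [x for x in data if x < low or x > high]
print(quartiles_excluding_median(data))  # (0, 9, 12)
print(low, high)                         # -18.0 30.0
print(outliers)                          # [-19]
```

The high fence (30) isn't needed for this question, but computing both fences is how the same rule catches outliers on either side.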

Now, let's shift the numbers by adding 19 to each one, so that there are no negative numbers left:

0, 18, (19), 24, 26, (28), 31, 31, (31), 32, 32 - the same sequence, shifted so that none of the numbers are negative.

Low threshold = Q1 - 1.5*(Q3-Q1) = 19 - 1.5*(31-19) = 19-1.5*12 = 19-18 = 1.

The gap between the minimum and the low threshold is the same in both cases: -19 - (-18) = 0 - 1 = -1. Therefore, the outlier rule works for negative numbers in a data set just as it does for positive ones. If you think about it, negative and positive numbers are no different here, just as there is no difference between coordinates anywhere on the (x, y) plane: you can find the distance between 2 points no matter where those 2 points lie. This is no exception.
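One way to check this is to compute the low threshold for both lists. This Python sketch uses the same exclude-the-median quartile convention as the thread; the helper names are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower/upper halves, excluding the overall median."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[half + len(data) % 2:])

def low_fence(values):
    """Low outlier threshold Q1 - 1.5*(Q3 - Q1)."""
    q1, q3 = quartiles(values)
    return q1 - 1.5 * (q3 - q1)

original = [-19, -1, 0, 5, 7, 9, 12, 12, 12, 13, 13]
shifted = [x + 19 for x in original]   # add 19 so the minimum becomes 0

for data in (original, shifted):
    fence = low_fence(data)
    # the gap between the minimum and the low threshold is -1 in both cases
    print(min(data), fence, min(data) - fence)
```

Shifting every value by 19 shifts Q1, Q3, and the threshold by 19 as well, while the IQR stays at 12, so the same point is flagged either way.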

gotwake.jr

In this example, and in others, KhanAcademy calculates Q3 as the median of all numbers above Q2. Q2, the median of the whole dataset, is excluded from the calculation. The same is true for Q1: it is calculated as the median of all numbers below Q2.

Using Excel, I notice that Q1 and Q3 are calculated inclusive of Q2, so Q3 equals the median of the dataset from Q2 to Max, inclusive, and Q1 equals the median of the dataset from Min to Q2, inclusive. This changes the IQR from 5 (per KhanAcademy) to 3.5.

Which is correct? Does it depend on whether or not the number of points in the data set is odd or even?
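The two conventions can be compared directly. This Python sketch uses a hypothetical quartiles helper and does not try to reproduce Excel's QUARTILE functions; it only implements the two "median of each half" rules described above, applied to the 11-value data set from the first answer and to an 8-value list. It suggests the rules disagree only when the number of values is odd, because only then is the median itself one of the data points.

```python
from statistics import median

def quartiles(values, include_median):
    """Q1/Q3 as medians of the two halves of the sorted data.

    include_median=False: leave the middle value out when n is odd.
    include_median=True:  put the middle value into both halves when n is odd.
    For even n the two options produce identical halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = data[:half + 1], data[half:]
    else:
        lower, upper = data[:half], data[n - half:]
    return median(lower), median(upper)

odd = [-19, -1, 0, 5, 7, 9, 12, 12, 12, 13, 13]   # 11 values
even = [1, 2, 3, 4, 5, 6, 7, 8]                    # 8 values

for data in (odd, even):
    for inc in (False, True):
        q1, q3 = quartiles(data, inc)
        print(len(data), inc, q1, q3, q3 - q1)
```

For the odd list, excluding the median gives an IQR of 12 and including it gives 9.5; for the even list both give 4.0.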

Gav1777

Great question. 5 is the correct answer for this question. As you said in your comment, the quartile values are calculated without including the median.

ravi.02512

What if most of the data points lie outside the IQR?

Charles Breiling

Although you can have "many" outliers (in a large data set), it is impossible for "most" of the data points to be outside of the IQR. The IQR, or more specifically, the zone between Q1 and Q3, by definition contains the middle 50% of the data. Extending that zone by 1.5*IQR below Q1 and above Q3 gives a very generous range that encompasses most of the data.
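To see this on a small example, count how many values land inside the box and inside the fences. This is a Python sketch with a hypothetical coverage helper, using the thread's exclude-the-median quartiles and the 11-value data set from the first answer. With ties or small samples the box can hold somewhat more than half the values, as it does here.

```python
from statistics import median

def quartiles(values):
    """Q1/Q3 as medians of the lower/upper halves, excluding the overall median."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[half + len(data) % 2:])

def coverage(values, k=1.5):
    """Fraction of values inside [Q1, Q3] and inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    n = len(values)
    in_box = sum(q1 <= x <= q3 for x in values) / n
    in_fences = sum(q1 - k * iqr <= x <= q3 + k * iqr for x in values) / n
    return in_box, in_fences

data = [-19, -1, 0, 5, 7, 9, 12, 12, 12, 13, 13]
print(coverage(data))  # 7 of 11 inside the box, 10 of 11 inside the fences
```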

Amri

thanks. now I’m a step farther from the “stressed about not knowing this” zone lol

Saxon Knight

Why wouldn't we recompute the 5-number summary without the outliers?

cossine

If you want to remove the outliers, then you could use a trimmed mean, which would be fairer, as it removes the same number of values from both ends.
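A trimmed mean can be sketched in a few lines of Python. The trimmed_mean name and the 10% default are assumptions for illustration; the proportion should stay below 0.5 so that some values are left. If SciPy is installed, scipy.stats.trim_mean(data, 0.1) should behave similarly.

```python
import math
from statistics import mean

def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the same number of values from each end of the sorted data."""
    data = sorted(values)
    cut = math.floor(len(data) * proportion)  # values removed from EACH end
    kept = data[cut:len(data) - cut]
    return mean(kept)

data = [-19, -1, 0, 5, 7, 9, 12, 12, 12, 13, 13]
print(round(mean(data), 3))           # 5.727
print(round(trimmed_mean(data), 3))   # 7.667
```

With 11 values and a 10% trim, one value is dropped from each end (-19 and one of the 13s), so the low outlier no longer drags the mean down.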

zeynep cemre sandallı

I have a point which seems to be an outlier in my scatter plot, since it is nowhere near the other points. My maths teacher said I had to prove the point is an outlier with this IQR method. The y-coordinate of the point is definitely an outlier (which is why the point is at the very bottom of the graph), but the x-coordinate is not. Can I still identify the point as an outlier?

gul.ozgur

Hi Zeynep, I think you're looking for ways of finding outliers in 2D, e.g. directional quantile envelopes. Check out https://mathematica.stackexchange.com/questions/114012/finding-outliers-in-2d-and-3d-numerical-data and/or https://mathematicaforprediction.wordpress.com/2014/11/03/directional-quantile-envelopes/

Jessica Lynn Balser

How did you get the value 5 for IQR?

Robert

IQR, or interquartile range, is the difference between Q3 and Q1. Here Q1 was found to be 19, and Q3 was found to be 24. Subtracting gives 24 - 19 = **5**. Hope that helps!

23_dgroehrs

In the bonus learning, how do the extra dots represent outliers? Wouldn't 5 be the lowest point, not an outlier?

Chuck B

For a box-and-whisker plot that shows outliers, the whiskers are modified so they reach only the most extreme data values that still lie within Q1 - 1.5*IQR and Q3 + 1.5*IQR. In other words, the whiskers represent the non-outliers. Any values outside that range are outliers and are drawn as separate dots.
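Here is a Python sketch of how the pieces of such a box plot could be computed, using the exclude-the-median quartiles from this thread, a hypothetical box_plot_parts helper, and the 11-value data set from the first answer. matplotlib's boxplot applies a 1.5*IQR whisker rule by default (its whis parameter), though its quartile calculation may differ from this one.

```python
from statistics import median

def box_plot_parts(values, k=1.5):
    """Box edges, median, whisker ends, and outlier dots for a 1.5*IQR box plot."""
    data = sorted(values)
    half = len(data) // 2
    q1, q2, q3 = median(data[:half]), median(data), median(data[half + len(data) % 2:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "box": (q1, q3),
        "median": q2,
        # whiskers stop at the most extreme data values that are NOT outliers
        "whiskers": (min(inside), max(inside)),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

parts = box_plot_parts([-19, -1, 0, 5, 7, 9, 12, 12, 12, 13, 13])
print(parts)
```

The fences here are -18 and 30, so the left whisker ends at -1 (not at -18), and -19 is drawn as a dot.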

Rachel.D.Reese

How do I draw the box and whiskers? Do I start from Q1 with all the calculations and end at Q3?

taylor.forthofer

On question 3, how are you using Q1 - 1.5*IQR? What does that have to do with the chart?

