Statistics for IT Managers
Author: Kishore R • July 29, 2016 • Study Guide
STUDY GUIDE
This study guide is intended to give you an idea of the content I have in mind when writing the comprehensive final exam. It is not an exhaustive list of the topics or ideas that can appear, but it is designed to serve as a general guideline. That said, any question on the final that pertains to content not covered in this guide will be there only by accident.
- Descriptive Statistics
Measures of central tendency pertain to the data’s tendency to cluster or center about certain values
- The mean is the average of values: We sum the observations and divide by the number of observations
- We label the n observations from a sample as $x_1, x_2, \ldots, x_n$, where $x_1$ is the first, $x_2$ is the second, and $x_n$ is the last
- Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
- Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
- Median: The middle observation when data are placed in ascending or descending order
- Mode: This is the observation that occurs with greatest frequency
- A dataset is skewed if one tail has more extreme observations than the other tail
- Right-skewed data: The right tail or the high end of the distribution has more extreme observations
- Left-skewed data: The left tail or the low end of the distribution has more extreme observations.
- In left-skewed data, the extreme values tend to pull the mean leftward, below the median; in right-skewed data, they tend to pull it rightward, above the median.
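The three measures of central tendency can be computed in a few lines. The sketch below uses Python's standard `statistics` module on a made-up sample (the values are hypothetical, chosen so that one extreme observation sits in the right tail) to show how skew separates the mean from the median.

```python
import statistics

# Hypothetical sample: nine observations, one extreme value in the right tail.
data = [2, 3, 3, 4, 5, 5, 5, 6, 30]

mean = statistics.mean(data)      # sum of the observations divided by n
median = statistics.median(data)  # middle observation of the sorted data
mode = statistics.mode(data)      # observation with the greatest frequency

print(f"mean={mean:.2f}, median={median}, mode={mode}")

# The extreme value 30 pulls the mean to the right of the median,
# which is the pattern expected for right-skewed data.
right_skew_hint = mean > median
print("mean > median:", right_skew_hint)
```

Comparing the mean with the median is only a quick heuristic for the direction of skew, not a formal test.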
Measures of variability pertain to the spread of the data.
- The two most important measures of variability are variance and standard deviation
- The sample variance for a sample of n measurements is equal to the sum of squared deviations from the mean divided by (n − 1).
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
- The population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
- The standard deviation is the square root of the variance
$s = \sqrt{s^2}$
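As a worked illustration of the variance formulas, the following sketch computes the sample variance, the population variance, and the sample standard deviation by hand and compares them with Python's `statistics` module. The data values are hypothetical.

```python
import math
import statistics

# Hypothetical sample of n = 7 measurements.
data = [4, 8, 6, 5, 3, 7, 9]

n = len(data)
x_bar = sum(data) / n                              # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)   # sum of squared deviations

s2 = sum_sq_dev / (n - 1)   # sample variance: divide by n - 1
sigma2 = sum_sq_dev / n     # population variance, if data were the whole population
s = math.sqrt(s2)           # sample standard deviation

print(f"s^2={s2:.4f}, sigma^2={sigma2:.4f}, s={s:.4f}")

# The library functions implement the same two divisors.
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(sigma2, statistics.pvariance(data))
```

The only difference between the two variances is the divisor, so the sample variance is always the larger of the two for the same data.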
- Empirical Rule:
- (1) Approximately 68% of all observations fall within 1 s.d. of the mean
- (2) Approximately 95% of all observations fall within 2 s.d. of the mean
- (3) Approximately 99.7% of all observations fall within 3 s.d. of the mean
- If the data are not bell-shaped, the empirical rule does not apply
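The three percentages in the empirical rule are the areas under a normal curve within 1, 2, and 3 standard deviations of the mean. A minimal sketch, assuming exactly normal data and using `statistics.NormalDist` from the Python standard library (3.8 or later), recovers them:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area under the curve between -k and +k standard deviations.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} s.d. of the mean: {share:.4f}")
```

The printed shares round to about 68%, 95%, and 99.7%; for data that are only roughly bell-shaped these are approximations.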
- Regardless of the shape of the distribution, we can use Chebysheff’s Theorem: the proportion of observations in any sample or population that lies within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$ for k > 1
- Ex. If k = 2, then at least 75% of observations are within 2 s.d. of the mean
- Ex. If k = 3, then at least 88.9% of observations are within 3 s.d. of the mean
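The bound in Chebysheff's Theorem is a one-line formula, and it can be compared with what a skewed dataset actually does. In the sketch below, the function name `chebyshev_lower_bound` and the data values are hypothetical, introduced only for illustration.

```python
import statistics

def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of observations within k s.d. of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

# Hypothetical, strongly right-skewed data treated as a whole population.
data = [1, 1, 1, 2, 2, 3, 4, 5, 8, 40]
mu = statistics.mean(data)
sigma = statistics.pstdev(data)

k = 2
observed = sum(abs(x - mu) <= k * sigma for x in data) / len(data)
bound = chebyshev_lower_bound(k)

print(f"bound for k={k}: {bound:.3f}, observed proportion: {observed:.3f}")
```

The observed proportion can be well above the bound, since the theorem only guarantees a minimum that holds for every distribution.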
- Box plot:
- A box plot uses five statistics to represent the data:
- The minimum and maximum observations help us construct vertical (horizontal) lines called “whiskers”
- The 1st, 2nd, and 3rd quartiles are represented by 3 horizontal (vertical) lines.
- Any point outside the whiskers is an outlier
- Each whisker extends to the most extreme observation that is not an outlier, so it reaches at most 1.5 times the IQR beyond the box.
- To build a box plot, draw the ends (hinges) at the lower and upper quartiles, Q1 and Q3 (or Ql and Qu), respectively
- The points at distances of 1.5 × IQR from each hinge define the inner fences:
- Lower inner fence: Q1 − 1.5(IQR)
- Upper inner fence: Q3 + 1.5(IQR)
- Lines (whiskers) are drawn from each hinge to the most extreme measurement inside the inner fence.
- Outer fences lie at a distance of 3 × IQR from the hinges.
- Lower outer fence: Q1 − 3(IQR)
- Upper outer fence: Q3 + 3(IQR)
- No lines are drawn for the fences. A symbol like ∗ might indicate an observation between the inner and outer fences.
- A symbol like 0 might indicate observations beyond outer fences.
- We can identify suspected outliers using the boxplot:
- Observations between the inner and outer fences (more than 1.5(IQR) but at most 3(IQR) beyond a hinge) are suspicious
- Observations beyond the outer fences (more than 3(IQR) beyond a hinge) are very suspicious
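The fence rules above translate directly into code. The sketch below is one possible implementation, not a definitive one: quartile conventions differ between textbooks and software, and here the `"inclusive"` method of `statistics.quantiles` is an assumption. The function name `box_plot_fences` and the data are hypothetical.

```python
import statistics

def box_plot_fences(data):
    """Return inner fences, outer fences, and the two classes of suspected outliers."""
    # Assumed quartile convention: the "inclusive" method of statistics.quantiles.
    q1, _q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    # Between inner and outer fences: suspicious.
    suspicious = [x for x in data
                  if (outer[0] <= x < inner[0]) or (inner[1] < x <= outer[1])]
    # Beyond the outer fences: very suspicious.
    very_suspicious = [x for x in data if x < outer[0] or x > outer[1]]
    return inner, outer, suspicious, very_suspicious

# Hypothetical data with two unusually large values.
data = [10, 12, 13, 14, 15, 16, 17, 18, 19, 30, 60]
inner, outer, suspicious, very_suspicious = box_plot_fences(data)

print("inner fences:", inner)
print("outer fences:", outer)
print("suspicious:", suspicious)
print("very suspicious:", very_suspicious)
```

With a different quartile convention the fences shift slightly, so borderline observations may be classified differently.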
- We can also use the z-score to find outliers:
- Sample z-score: $z = \frac{x - \bar{x}}{s}$
- Population z-score: $z = \frac{x - \mu}{\sigma}$
- |z| > 2 indicates possible outlier
- |z| > 3 indicates an outlier
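The z-score thresholds can be applied mechanically. The following sketch computes sample z-scores for a hypothetical dataset and sorts the observations into "possible outlier" (|z| between 2 and 3) and "outlier" (|z| above 3); the variable names and data are illustrative assumptions.

```python
import statistics

# Hypothetical sample: 20 values near 50, plus two larger values.
data = [48, 49, 50, 51, 52] * 4 + [62, 70]

x_bar = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)

# Sample z-score of each observation: (x - x_bar) / s
z_scores = {x: (x - x_bar) / s for x in set(data)}

possible_outliers = sorted(x for x, z in z_scores.items() if 2 < abs(z) <= 3)
outliers = sorted(x for x, z in z_scores.items() if abs(z) > 3)

print("possible outliers:", possible_outliers)
print("outliers:", outliers)
```

Note that extreme values inflate both the mean and the standard deviation, so in small samples a z-score can understate how unusual an observation is.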
- Normal Distribution
- The normal distribution is symmetric about its mean, µ
- Its spread is determined by the standard deviation, σ
- To graph a normal distribution, we need both µ and σ.
- To find the probability a normal random variable falls into an interval we must compute the area in the interval under the curve
- We reduce the number of tables by standardizing: subtract the mean and divide by the standard deviation
- When the variable is normal, the transformed variable is called a standard normal random variable and is denoted Z, i.e. $Z = \frac{X - \mu}{\sigma}$
- The probability statement about X is transformed into one about Z.
- The standard normal distribution is the normal distribution with µ = 0 and σ = 1.
- A random variable with the standard normal distribution is denoted z and called a standard normal random variable.
∗ Ex. P (x ≤ 12) = P (z ≤ 1.25) = 0.8944
∗ Ex. P (x > 12) = P (z > 1.25) = 0.1056
∗ Ex. P (x ≤ −8) = P (z ≤ −1.25) = 0.1056
∗ Ex. P (−1.04 ≤ x ≤ 12) = P (−0.38 ≤ z ≤ 1.25) = 0.8944 − 0.3520 = 0.5424
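Probabilities like these can be checked without a table. The sketch below uses `statistics.NormalDist` from the Python standard library. The guide does not state the parameters of X, so µ = 2 and σ = 8 are assumptions, chosen because they make x = 12 standardize to z = 1.25 and x = −1.04 standardize to z = −0.38.

```python
from statistics import NormalDist

# Assumed parameters (not given in the guide): mu = 2, sigma = 8.
mu, sigma = 2.0, 8.0
X = NormalDist(mu, sigma)
Z = NormalDist()  # standard normal

def standardize(x: float) -> float:
    """Subtract the mean and divide by the standard deviation."""
    return (x - mu) / sigma

p_le_12 = X.cdf(12)                    # P(X <= 12)
p_gt_12 = 1 - X.cdf(12)                # P(X > 12), the complement
p_between = X.cdf(12) - X.cdf(-1.04)   # P(-1.04 <= X <= 12)

# Standardizing first and using Z gives the same probability.
assert abs(p_le_12 - Z.cdf(standardize(12))) < 1e-12

print(f"z for x=12: {standardize(12):.2f}")
print(f"P(X <= 12) = {p_le_12:.4f}")
print(f"P(X > 12)  = {p_gt_12:.4f}")
print(f"P(-1.04 <= X <= 12) = {p_between:.4f}")
```

An interval probability is the difference of two cumulative probabilities, which is why it must be smaller than P(X ≤ 12) alone.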
...