Quant Notes MBA
Author: Abhishek Chhabria • August 24, 2016 • Course Note • 2,439 Words (10 Pages)
Quant
Data Description
Graph & Outliers
Arrange data graphically to reveal patterns in how the data behave and to help identify outliers. Investigate outliers before deciding whether to leave them alone, remove them, or change them.
Central Values
Mean is the most common measure used to describe the "center" of a data set. However, it isn't always the best value to represent data. Outliers can exercise undue influence and pull the mean value towards one extreme. In addition, if the distribution has a tail that extends out to one side — a skewed distribution — the values on that side will pull the mean towards them.
Median is the middle value of the sorted data. It is far less sensitive to outliers than the mean and is often a better value to represent skewed data.
Mode is the value that occurs most frequently. Data may cluster around two or more points that occur especially frequently, giving the histogram more than one peak. A distribution that has two peaks is called a bimodal distribution.
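To make the three central values concrete, here is a minimal sketch using Python's standard `statistics` module. The salary and rating figures are made up for illustration and are not from the course.

```python
import statistics

# Hypothetical salaries (in thousands); the last value is an outlier.
salaries = [42, 45, 47, 50, 52, 55, 400]

mean_value = statistics.mean(salaries)      # pulled upward by the 400
median_value = statistics.median(salaries)  # middle value of the sorted data

# Hypothetical survey ratings with two equally frequent values (bimodal).
ratings = [1, 2, 2, 2, 3, 4, 5, 5, 5, 6]
modes = statistics.multimode(ratings)       # every value tied for most frequent

print("mean:", mean_value)
print("median:", median_value)
print("modes:", modes)
```

The single outlier drags the mean to nearly 99 while the median stays at 50, which is much closer to what a typical salary in the list looks like.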
Variability
Standard Deviation measures how much a data set varies from its mean. A large standard deviation indicates that the data are widely dispersed. A smaller standard deviation tells us that the data points are more tightly clustered together, and that the mean is more representative of the data.
[pic 1]
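As a rough illustration, the sketch below compares two made-up data sets that share the same mean but differ in spread. It uses `statistics.stdev`, which divides by n - 1 (the sample standard deviation); whether the formula pictured in the notes uses n or n - 1 is an assumption here.

```python
import statistics

# Two hypothetical data sets with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

mean_tight = statistics.mean(tight)
mean_wide = statistics.mean(wide)

# Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1)).
sd_tight = statistics.stdev(tight)
sd_wide = statistics.stdev(wide)

print("means:", mean_tight, mean_wide)
print("standard deviations:", round(sd_tight, 2), round(sd_wide, 2))
```

Both sets have a mean of 50, but the mean describes the tightly clustered set far better than the widely dispersed one. Use `statistics.pstdev` instead if the data are the whole population rather than a sample.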
Coefficient of Variation compares the variability in different data sets. To show the relative magnitude of the variation in a data set, CV divides the standard deviation of the data by the data's mean.
[pic 2]
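A small sketch of the comparison, assuming CV is computed as the sample standard deviation divided by the mean. The function name `coefficient_of_variation` and the price series are hypothetical.

```python
import statistics

def coefficient_of_variation(data):
    """Return the sample standard deviation divided by the mean."""
    return statistics.stdev(data) / statistics.mean(data)

# Hypothetical prices for a low-priced and a high-priced stock.
stock_a = [10, 11, 9, 10, 10]
stock_b = [100, 104, 96, 100, 100]

sd_a = statistics.stdev(stock_a)
sd_b = statistics.stdev(stock_b)
cv_a = coefficient_of_variation(stock_a)
cv_b = coefficient_of_variation(stock_b)

print("standard deviations:", round(sd_a, 3), round(sd_b, 3))
print("coefficients of variation:", round(cv_a, 3), round(cv_b, 3))
```

Stock B has the larger standard deviation in absolute terms, but relative to its mean it varies less than stock A. That is the comparison CV is meant to support.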
Relationship Between Variables
Relationship between two variables: plotting a scatter diagram can help illustrate the relationship between two variables. However, a relationship is not proof of causality. We should be alert to the possibility of hidden variables.
Time Series helps us recognize seasonal patterns and yearly trends. But we shouldn't rely only on visual analysis when looking for relationships and patterns.
False Relationships
The relationship we see on a graph might just be a coincidence, or it might be driven by other hidden variables.
Correlation coefficient quantifies the extent to which there is a linear relationship between two variables. The correlation coefficient takes on values between -1 and +1. A correlation coefficient near 0 indicates a weak or nonexistent linear relationship. [Excel: CORREL function]
- Outliers: because correlation gives more weight to points distant from the center of the data, outliers can strongly influence the correlation coefficient of the entire set. As a result, our intuition and the measure we use to quantify it can differ considerably. We should always attempt to reconcile those differences by returning to the data.
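The effect of a single outlier on the correlation coefficient can be sketched as follows. The helper `pearson_r` is a hand-written stand-in for Excel's CORREL, and the data points are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical points with no clear linear relationship.
x = [1, 2, 3, 4, 5, 6]
y = [4, 2, 5, 3, 5, 2]
r_without = pearson_r(x, y)

# The same points plus one outlier far from the center of the data.
r_with = pearson_r(x + [30], y + [30])

print("without outlier:", round(r_without, 2))
print("with outlier:", round(r_with, 2))
```

Without the outlier the coefficient is close to 0. Adding one distant point pushes it above 0.95, even though the other six points show no relationship at all, so it pays to look at the scatter diagram before trusting the number.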
Confidence Interval
Random Sample
Avoid Biased Result
- Phrase questions neutrally to avoid bias
- Pursue high response rates: it is better to have a smaller sample with a high response rate than a larger sample with a low one
- Understand incentives and motivations of respondents and pollsters
It is not true that the larger the population, the larger the sample size needed to achieve a given level of accuracy.
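One way to see this is a sketch of the margin of error for a sample proportion, assuming the standard textbook formula with a finite population correction. The function name, the 95% z-value of 1.96, and the worst-case proportion of 0.5 are illustrative choices and are not taken from these notes.

```python
import math

def margin_of_error(sample_size, population_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a sample proportion."""
    # Standard error of a proportion for a simple random sample.
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    # Finite population correction: close to 1 when the sample is a
    # small fraction of the population.
    correction = math.sqrt(
        (population_size - sample_size) / (population_size - 1)
    )
    return z * standard_error * correction

# Same sample size, very different (hypothetical) population sizes.
for population in (100_000, 10_000_000, 300_000_000):
    print(population, round(margin_of_error(1000, population), 4))
```

With 1,000 respondents the margin of error is about 3 percentage points in all three cases. Accuracy is driven mainly by the sample size, not by how large the population is.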
...