% filename: inference_example_4.m % This script plots the likelihood function for Example 4 in the Introduction to Inference presentation. In the example, five randomly sampled* MLB baseball players' salaries in $K were 750, 5250, 12333, 359, and 340. Assuming them to have come from a normal distribution with mean mu and standard deviation sigma, the goal is to give plausible values for mu and sigma. But note the moral of this cautionary tale: The assumption that MLB players' salaries are normally distributed is not reasonable; we know this from anecdotal evidence as well as from the given data set itself. % A more reasonable model might be that the salaries have a lognormal distribution, which just means that their logarithms are normally distributed. % * Actually, only four of these were truly randomly selected. Chipper Jones was planeted into the data set deliberately so as to exaggerate the effect of the outlier, for demonstration purposes. % ********** % create the data vector and vectors of candidates for mu and sigma. This time the mu and sigma vectors are defined in terms of the sample mean and sample standard deviation of the data; automating this step saves time later 'tweaking' the graphs. data = [750 5250 12333 359 340]; mu0 = mean(data); sd0 = std(data); mu = [mu0-2*sd0:(6*sd0)/50:mu0+2*sd0]; sigma = [0:(5*sd0)/50:3*sd0]; % create a matrix of likelihoods, one for each combination of mu and sigma in their vectors. We accomplish that here by first creating an appropriately-sized matrix of zeros, then filling in the likelihoods one by one using embedded for loops. L = zeros(length(mu), length(sigma)); for i = 1:length(mu) for j = 1:length(sigma) L(i,j) = prod(normpdf(data, mu(i), sigma(j))); end end % make a surface plot of the likelihood function. Note that we're plotting L' instead of L. That's a side effect of two inconsistencies with matrices: when indexing them, you always name the row first and then the column; but when you want to plot them, you (might often) want to plot the rows as "y's" and the columns as "x's". figure surf(mu, sigma, L') xlabel('\mu') ylabel('\sigma') % make a contour plot of the likelihood function with level curves drawn at 10%, 5%, and 1% of the maximum likelihood. Also plot the MLE as a red circle ('ro'). figure maxL = max(max(L)); levels = maxL*[.1 .05 .01]; contour(mu, sigma, L', levels) xlabel('\mu ($K)') ylabel('\sigma') title('MLB player salaries, \mu and \sigma') [index_mu, index_sigma] = find(L == maxL); hold on plot(mu(index_mu), sigma(index_sigma),'ro') hold off