maximum likelihood estimation

2023-11-23

suppose that the likelihood function depends on $k$ parameters $\theta_1, \theta_2, \ldots, \theta_k$. choose as estimates those values of the parameters that maximize the likelihood $L(y_1, y_2, \ldots, y_n \mid \theta_1, \theta_2, \ldots, \theta_k)$.

to emphasize the fact that the likelihood function is a function of the parameters $\theta_1, \theta_2, \ldots, \theta_k$, we sometimes write the likelihood function as $L(\theta_1, \theta_2, \ldots, \theta_k)$. it is common to refer to maximum-likelihood estimators as MLEs.

a binomial experiment consisting of $n$ trials resulted in observations $y_1, y_2, \ldots, y_n$, where $y_i = 1$ if the $i$th trial was a success and $y_i = 0$ otherwise. find the MLE of $p$, the probability of a success.

the likelihood of the observed sample is the probability of observing $y_1, y_2, \ldots, y_n$. hence,

$$L(p) = L(y_1, y_2, \ldots, y_n \mid p) = p^y (1-p)^{n-y}, \quad \text{where } y = \sum_{i=1}^{n} y_i.$$

we now wish to find the value of $p$ that maximizes $L(p)$. if $y = 0$, $L(p) = (1-p)^n$, and $L(p)$ is maximized when $p = 0$. analogously, if $y = n$, $L(p) = p^n$ and $L(p)$ is maximized when $p = 1$. if $y = 1, 2, \ldots, n-1$, then $L(p) = p^y (1-p)^{n-y}$ is zero when $p = 0$ and $p = 1$ and is continuous for values of $p$ between 0 and 1. thus, for $y = 1, 2, \ldots, n-1$, we can find the value of $p$ that maximizes $L(p)$ by setting the derivative $dL(p)/dp$ equal to 0 and solving for $p$.

note that $\ln[L(p)]$ is a monotonically increasing function of $L(p)$. hence, both $\ln[L(p)]$ and $L(p)$ are maximized for the same value of $p$. because $L(p)$ is a product of functions of $p$ and finding the derivative of products is tedious, it is easier to find the value of $p$ that maximizes $\ln[L(p)]$. we have

$$\ln[L(p)] = \ln\left[p^y (1-p)^{n-y}\right] = y \ln p + (n-y) \ln(1-p).$$

if $y \neq 0$ or $n$, the derivative of $\ln[L(p)]$ with respect to $p$ is

$$\frac{d \ln[L(p)]}{dp} = \frac{y}{p} - \frac{n-y}{1-p}.$$

for $y = 1, 2, \ldots, n-1$, the value of $p$ that maximizes (or minimizes) $\ln[L(p)]$ is the solution of the equation

$$\frac{y}{p} - \frac{n-y}{1-p} = 0.$$

solving, we obtain the estimate $\hat{p} = y/n$. you can easily verify that this solution occurs when $\ln[L(p)]$ (and hence $L(p)$) achieves a maximum.
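as a sanity check on this derivation, a computer algebra system can solve the first-order condition symbolically. this is a minimal sketch using sympy; the symbols mirror the notation above and are purely illustrative.

```python
import sympy as sp

# symbols mirroring the derivation: y successes out of n trials, success probability p
y, n, p = sp.symbols("y n p", positive=True)

# log-likelihood: ln L(p) = y ln p + (n - y) ln(1 - p)
log_lik = y * sp.log(p) + (n - y) * sp.log(1 - p)

# first-order condition: d ln L(p) / dp = 0
foc = sp.Eq(sp.diff(log_lik, p), 0)

# sympy recovers the estimate p-hat = y/n
print(sp.solve(foc, p))  # [y/n]
```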

because $L(p)$ is maximized at $\hat{p} = 0$ when $y = 0$, at $\hat{p} = 1$ when $y = n$, and at $\hat{p} = y/n$ when $y = 1, 2, \ldots, n-1$, whatever the observed value of $y$, $L(p)$ is maximized when $\hat{p} = y/n$.

the MLE, $\hat{p} = y/n$, is the fraction of successes in the total number of trials $n$. hence, the MLE of $p$ is actually the intuitive estimator for $p$.
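the closed form can also be checked numerically. below is a minimal sketch assuming a made-up sample with $n = 10$ and $y = 7$; the grid search is only illustrative.

```python
import numpy as np

# hypothetical data: y = 7 successes in n = 10 trials
n, y = 10, 7

# evaluate the log-likelihood y ln p + (n - y) ln(1 - p) on a fine grid over (0, 1)
grid = np.linspace(0.001, 0.999, 9999)
log_lik = y * np.log(grid) + (n - y) * np.log(1 - grid)

# the grid maximizer agrees with the closed form p-hat = y/n = 0.7
print(grid[np.argmax(log_lik)])  # approximately 0.7
```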

the method of maximum likelihood is, by far, the most popular technique for deriving estimators. recall that if $X_1, \ldots, X_n$ are an iid sample from a population with pdf or pmf $f(x \mid \theta_1, \ldots, \theta_k)$, the likelihood function is defined by

$$L(\theta \mid \mathbf{x}) = L(\theta_1, \ldots, \theta_k \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta_1, \ldots, \theta_k).$$

for each sample point $\mathbf{x}$, let $\hat{\theta}(\mathbf{x})$ be a parameter value at which $L(\theta \mid \mathbf{x})$ attains its maximum as a function of $\theta$, with $\mathbf{x}$ held fixed. a maximum likelihood estimator (MLE) of the parameter $\theta$ based on a sample $\mathbf{X}$ is $\hat{\theta}(\mathbf{X})$.
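in applications the maximization in this definition is usually carried out numerically. the sketch below assumes an exponential model $f(x \mid \theta) = \theta e^{-\theta x}$ purely for illustration: it builds the log-likelihood as a sum of log-densities and hands its negative to a generic optimizer. for this model the MLE has the closed form $\hat{\theta} = 1/\bar{x}$, which the optimizer should reproduce.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import expon

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)  # simulated sample; true theta = 1/2

# log L(theta | x) = sum_i log f(x_i | theta) for the exponential pdf theta * exp(-theta * x)
def neg_log_lik(theta):
    return -np.sum(expon.logpdf(x, scale=1.0 / theta))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method="bounded")
print(res.x, 1.0 / x.mean())  # numerical MLE vs. closed form 1/x-bar
```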

notice that, by its construction, the range of the MLE coincides with the range of the parameter. we also use the abbreviation MLE to stand for maximum likelihood estimate when we are talking of the realized value of the estimator.

intuitively, the MLE is a reasonable choice for an estimator. the MLE is the parameter point for which the observed sample is most likely.

there are two inherent drawbacks associated with the general problem of finding the maximum of a function, and hence of maximum likelihood estimation. the first problem is that of actually finding the global maximum and verifying that, indeed, a global maximum has been found. in many cases this problem reduces to a simple differential calculus exercise, but sometimes, even for common densities, difficulties do arise. the second problem is that of numerical sensitivity. that is, how sensitive is the estimate to small changes in the data? unfortunately, it is sometimes the case that a slightly different sample will produce a vastly different MLE, making its use suspect. we consider first the problem of finding MLEs.

if the likelihood function is differentiable (in $\theta_i$), possible candidates for the MLE are the values of $(\theta_1, \ldots, \theta_k)$ that solve

$$\frac{\partial}{\partial \theta_i} L(\theta \mid \mathbf{x}) = 0, \qquad i = 1, \ldots, k. \tag{eq-mle-1}$$

note that the solutions to eq-mle-1 are only possible candidates for the MLE, since the first derivative being 0 is only a necessary condition for a maximum, not a sufficient condition. furthermore, the zeros of the first derivative locate only extreme points in the interior of the domain of a function. if the extrema occur on the boundary, the first derivative may not be 0. thus, the boundary must be checked separately for extrema.

points at which the first derivatives are 0 may be local or global minima, local or global maxima, or inflection points. our job is to find a global maximum.
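continuing the binomial example from above, this classification can be carried out symbolically: the second derivative at the candidate point is negative, and the log-likelihood diverges to $-\infty$ at both endpoints, so the interior critical point is the global maximum. a sketch using sympy, with the same notation as before:

```python
import sympy as sp

y, n, p = sp.symbols("y n p", positive=True)
log_lik = y * sp.log(p) + (n - y) * sp.log(1 - p)

# second derivative at the candidate p = y/n: simplifies to -n**3/(y*(n - y)),
# which is negative for 0 < y < n, so the candidate is a local maximum
d2 = sp.diff(log_lik, p, 2)
print(sp.simplify(d2.subs(p, y / n)))

# boundary behaviour with concrete values (n = 10, y = 7): the log-likelihood
# tends to -oo at both endpoints, so the interior maximum is global
ll = log_lik.subs({n: 10, y: 7})
print(sp.limit(ll, p, 0, "+"), sp.limit(ll, p, 1, "-"))  # -oo -oo
```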

let $X_1, \ldots, X_n$ be iid $N(\theta, 1)$, and let $L(\theta \mid \mathbf{x})$ denote the likelihood function. then the equation

$$\frac{d}{d\theta} L(\theta \mid \mathbf{x}) = 0$$

reduces to

$$\sum_{i=1}^{n} (x_i - \theta) = 0,$$

which has the solution $\hat{\theta} = \bar{x}$. hence, $\bar{x}$ is a candidate for the MLE. to verify that $\bar{x}$ is, in fact, a global maximum of the likelihood function, we can use the following argument. first, note that $\bar{x}$ is the only solution to $\sum_{i=1}^{n} (x_i - \theta) = 0$; hence $\bar{x}$ is the only zero of the first derivative. second, verify that

$$\left.\frac{d^2}{d\theta^2} L(\theta \mid \mathbf{x})\right|_{\theta = \bar{x}} < 0.$$

thus, $\bar{x}$ is the only extreme point in the interior and it is a maximum. to finally verify that $\bar{x}$ is a global maximum, we must check the boundaries, $\pm\infty$. by taking limits it is easy to establish that the likelihood is 0 at $\pm\infty$. so $\bar{x}$ is a global maximum and hence is the MLE. (actually, we can be a bit more clever and avoid checking $\pm\infty$. since we established that $\bar{x}$ is a unique interior extremum and is a maximum, there can be no maximum at $\pm\infty$. if there were, then there would have to be an interior minimum, which contradicts uniqueness.)

another way to find an MLE is to abandon differentiation and proceed with a direct maximization. this method is usually simpler algebraically, especially if the derivatives tend to get messy, but is sometimes harder to implement because there are no set rules to follow. one general technique is to find a global upper bound on the likelihood function and then establish that there is a unique point for which the upper bound is attained.
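for the normal likelihood above, the upper-bound technique can be carried out explicitly; the following worked steps are an illustration of the technique, not part of the original text. the key is the standard sum-of-squares decomposition

$$\sum_{i=1}^{n} (x_i - \theta)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \theta)^2 \ge \sum_{i=1}^{n} (x_i - \bar{x})^2,$$

with equality if and only if $\theta = \bar{x}$. since the likelihood is a decreasing function of $\sum_{i=1}^{n} (x_i - \theta)^2$, this gives the global upper bound

$$L(\theta \mid \mathbf{x}) = \frac{1}{(2\pi)^{n/2}} e^{-\frac{1}{2}\sum_{i=1}^{n} (x_i - \theta)^2} \le \frac{1}{(2\pi)^{n/2}} e^{-\frac{1}{2}\sum_{i=1}^{n} (x_i - \bar{x})^2} = L(\bar{x} \mid \mathbf{x}),$$

which is attained at the unique point $\hat{\theta} = \bar{x}$.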