Definition
Consider a random sample $X_1, X_2, \dots, X_n$ where $X_i$ are iid. rvs. from a distribution with a probability density function $f(x; \theta)$, $\theta \in \Theta$. The joint pdf of $X_1, \dots, X_n$ is $f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta)$.
Definition
The likelihood function is defined by
$$L(\theta) = L(\theta; x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i; \theta)$$
and can be interpreted as the probability that the observed data occur.
Our aim here is to maximize this likelihood function over $\theta$.
Definition
For given observations $x_1, \dots, x_n$, a value $\hat{\theta}$ at which $L(\theta)$ is a maximum is called a maximum likelihood estimate (MLE) for $\theta$.
That is, $\hat{\theta}$ is the value of $\theta$ that satisfies
$$L(\hat{\theta}; x_1, \dots, x_n) = \max_{\theta \in \Theta} L(\theta; x_1, \dots, x_n).$$
To find such a $\hat{\theta}$ one should:
- First solve $\dfrac{d}{d\theta} L(\theta) = 0$ for $\theta$.
- Then check that it is a maximum by $\dfrac{d^2}{d\theta^2} L(\theta) \Big|_{\theta = \hat{\theta}} < 0$.
In most cases differentiating $L(\theta)$ directly is hard to do. Therefore the log-likelihood $\ell(\theta) = \ln L(\theta)$ is used instead. Since $\ln x$ is strictly increasing when $x > 0$, maximizing $\ell(\theta)$ will also maximize $L(\theta)$.
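As a small sketch of the two-step recipe with a log-likelihood, assume an Exponential$(\lambda)$ model (my choice of example, not from the notes): $f(x; \lambda) = \lambda e^{-\lambda x}$ gives $\ell(\lambda) = n \ln \lambda - \lambda \sum x_i$, and solving $\ell'(\lambda) = 0$ yields the closed form $\hat{\lambda} = n / \sum x_i = 1/\bar{x}$.

```python
import math
import random

def exp_log_likelihood(lam, xs):
    """Log-likelihood of an Exponential(rate=lam) sample:
    l(lam) = n*ln(lam) - lam * sum(x_i)."""
    return len(xs) * math.log(lam) - lam * sum(xs)

def exp_mle(xs):
    """Solving dl/dlam = n/lam - sum(x_i) = 0 gives lam_hat = n / sum(x_i);
    the second derivative -n/lam**2 < 0 confirms it is a maximum."""
    return len(xs) / sum(xs)

random.seed(0)
xs = [random.expovariate(2.0) for _ in range(10_000)]  # true rate = 2
lam_hat = exp_mle(xs)

# The closed-form solution beats nearby candidate values of lambda:
for lam in (lam_hat * 0.9, lam_hat * 1.1):
    assert exp_log_likelihood(lam_hat, xs) > exp_log_likelihood(lam, xs)
print(lam_hat)  # close to the true rate 2.0
```

Working with $\ell$ turns the product over observations into a sum, which is both easier to differentiate and numerically stabler than multiplying many small densities.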
Multiple Parameters
If $\boldsymbol{\theta} = (\theta_1, \dots, \theta_k)$ is a vector of parameters to be estimated, then solve
$$\frac{\partial}{\partial \theta_j} \ell(\theta_1, \dots, \theta_k) = 0, \qquad j = 1, \dots, k,$$
a system of $k$ equations, for the estimates $\hat{\theta}_1, \dots, \hat{\theta}_k$.
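A minimal sketch for the two-parameter case, assuming a Normal$(\mu, \sigma^2)$ model (my choice of example): setting $\partial \ell / \partial \mu = 0$ and $\partial \ell / \partial \sigma^2 = 0$ simultaneously gives $\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = \frac{1}{n} \sum (x_i - \bar{x})^2$.

```python
import random

def normal_mle(xs):
    """Jointly solving the two score equations for Normal(mu, sigma^2) gives
    mu_hat = sample mean, sigma2_hat = (1/n) * sum((x - mu_hat)^2).
    Note the 1/n factor: the MLE of sigma^2 is not the unbiased 1/(n-1) version."""
    n = len(xs)
    mu = sum(xs) / n
    sigma2 = sum((x - mu) ** 2 for x in xs) / n
    return mu, sigma2

random.seed(1)
xs = [random.gauss(5.0, 2.0) for _ in range(50_000)]  # mu = 5, sigma = 2
mu_hat, sigma2_hat = normal_mle(xs)
print(mu_hat, sigma2_hat)  # close to 5 and 4
```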
Invariance property
If $\hat{\theta}$ is the MLE for $\theta$ and $g(\theta)$ is a function of $\theta$, then $g(\hat{\theta})$ is the MLE for $g(\theta)$.
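A tiny illustration of invariance, again assuming an Exponential$(\lambda)$ model (my example): the mean of the distribution is $g(\lambda) = 1/\lambda$, so the MLE of the mean is $g(\hat{\lambda}) = 1/\hat{\lambda}$, which is the sample mean.

```python
# Invariance: if lam_hat = n / sum(x) is the MLE of the rate lam,
# then the MLE of the mean g(lam) = 1/lam is g(lam_hat) = 1/lam_hat.
xs = [0.5, 1.2, 0.3, 2.0, 0.8]
lam_hat = len(xs) / sum(xs)
mean_mle = 1.0 / lam_hat                       # g(lam_hat) with g(t) = 1/t
assert abs(mean_mle - sum(xs) / len(xs)) < 1e-12  # equals the sample mean
```

No separate maximization over the transformed parameter is needed; plugging the MLE into $g$ is enough.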
MLE at the boundary of
In such cases the MLE exists but cannot be obtained as a solution to $\frac{d}{d\theta} L(\theta) = 0$.
Example
Take $X_1, \dots, X_n$ iid. $\sim \text{Uniform}(0, \theta)$. What is the MLE of $\theta$?
$$L(\theta) = \prod_{i=1}^{n} \frac{1}{\theta} = \theta^{-n}, \qquad 0 \le x_i \le \theta, \quad i = 1, \dots, n,$$
$$\frac{d}{d\theta} L(\theta) = -n \, \theta^{-n-1} \ne 0,$$
so there is no finite solution for $\theta$.
But observe that $\frac{d}{d\theta} L(\theta) < 0$, implying that $L(\theta)$ is decreasing in $\theta$, so minimizing $\theta$ would maximize the likelihood. But one should also consider that $\theta \ge x_i$ for all $i$. So choosing the minimum value of $\theta$ that covers all the values in the sample would ensure the maximum likelihood. Then,
$$\hat{\theta} = \max(X_1, \dots, X_n) = X_{(n)}.$$
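The boundary argument above can be sketched numerically (sample size and true $\theta$ are my choices): on the feasible set $\theta \ge \max_i x_i$ the likelihood $\theta^{-n}$ is strictly decreasing, so the maximum sits exactly at the boundary point $\max_i x_i$.

```python
import math
import random

def log_likelihood(theta, xs):
    """log L(theta) = -n * ln(theta) if all x_i <= theta, else -infinity
    (theta values below the sample maximum are infeasible)."""
    if max(xs) > theta:
        return -math.inf
    return -len(xs) * math.log(theta)

random.seed(2)
theta_true = 3.0
xs = [random.uniform(0, theta_true) for _ in range(1_000)]

theta_hat = max(xs)  # MLE at the boundary of the feasible set

# Any theta above theta_hat scores strictly worse; anything below is infeasible:
assert log_likelihood(theta_hat, xs) > log_likelihood(theta_hat + 0.01, xs)
assert log_likelihood(theta_hat - 0.01, xs) == -math.inf
print(theta_hat)  # close to (and never above) theta_true = 3.0
```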
Advantages and disadvantages
Advantages
- It is intuitive: it picks the parameter value under which the observed data are most probable.
- It is widely used.
- It can also be used where the observed values are not independent or identically distributed.
- It gives good estimates for large sample sizes.
Disadvantages
- The distribution family $f(x; \theta)$ must be known.
- The MLE might not exist or may not be unique.
- Numerical methods might be needed when no closed-form solution exists.