Ok… so the density function, the multi-dimensional integral of which is the likelihood is ludicrously small, and I still haven’t thought up a method of dealing with it.

 However, and get this: why then was I able to evaluate likelihood using the quadrature approximation? For example, with some rigged numbers, the density gives a log value of -1168, where as the log likelihood is -325. The difference is a factor of e^{843} in the actual values (as opposed to the log values). The region over which the integral is no “as close to 0 as you can imagine” is not that big…

 This suggests that my earlier likelihood function is doing something wrong. Let’s see what.

P.S. Maybe I should try some smaller cases where the numbers are more manageable. Maybe.

Update: I tried with smaller sample sizes. This didn’t really make the numbers any bigger. However, I realised that it doesn’t matter some of the numbers are coming out at -Inf, because R treats exp(-Inf) as 0, so while I do loose a bit of accuracy, the Monte Carlo method is largely unaffected.

Further more, I fixed a bug in my earlier code, so the two log-likelihood functions agree now on small sample sizes now. However, it became apparent that the MC method became hopelessly slow for higher dimensional cases (even 5th dimensional!), so is not really useful for large samples.

In some sense, I’m back to where I started a few days back with the misbehaving function. I’m going to have a look at the older Fridman paper to see if that sheds any light on my problem.

I’m still trying to fix the last problem I encountered. I got the idea that I should try to code the likelihood function in an entirely different method as to allow me to check my main code. (BTW, I don’t think what I suspect last time is the problem).

 Since the likelihood function is essentially a multi-dimensional definite integral, I thought I’d try naive Monte Carlo simulation. The actually coding went by quickly, but it didn’t work. It soon became apparent that everything is badly under-flowing. The smallest value R can keep track of is in the region of -10^{-309}, which one can quickly reach when you are multiplying thousands of numbers between 0 and 1 together. The standard tactic to get around it is to log everything. Products become sums and ludicrously small numbers become merely small.

So you end up with code like this:

logdensitySV <- function(y,h,theta){

gamma0 = theta[1];
gamma1 = theta[2];
sigma = theta[3];

h0 = gamma0/(1-gamma1);
T = length(h);

transition = vector("numeric",T);
transition[1] = dnorm((h[1]-gamma0-gamma1*h0)/sigma);
transition[2:T] = dnorm((h[2:T]-gamma0-gamma1*h[1:T-1])/sigma);

condprob = dnorm(y/exp(.5*h));

sum(log(transition))+sum(log(condprob));
}

Which is fine, except when these guys are underflowing too! To consistently get these functions to return a usable value, I have to add about 100 000 to the log value! I need to think carefully about whether doing this will cause an overflow down the line…

I still haven’t gotten the likelihood function to behave properly. The problem is the first parameter. It’s essentially the one representing average or “base” volatility. The problem is that the likelihood function as I have it is an increasing function of the first parameter.

This means that the higher I make the first parameter (unbounded), the better the model fits, which doesn’t seem right.

It took a while to realise what’s the underlying problem, tha the distriubtion function of the standard normal has its only maxima at 0. So in one sense, the smaller closer the value, the less likely. We have y_t = e^{\frac{h_t}{2}}\mu_t. If h_t is horrifically big, and y_t is not too big, we need \mu_t to be quite close to 0. So the bigger h_t is, the closer \mu_t has to be to 0, the higher the likelihood.

Something is wrong here.

P.S. Found the page describing how to incorporate \TeX into WordPress!

Once again, there was over a month of no post. Thankfully I have been working though, just not as productive as I would liked to have been. 

I’m currently making some progress as to implementing what Batolucci and De Luca wrote about in their 2001 paper. Which is essentially fitting of a no leverage SV model through maximisation of the approximate likelihood. They called it the quadrature approximation to the latent state model. I say discrete approximation of a Hidden Markov model. Same thing really.

The way they approximated the likelihood is pretty much the same as how my supervisor and his research buddy has, though it took some time to confirm this (i.e. consolidate difference in notation). However, instead of using a black box maximiser, they implemented the straight forward Newton Raphson’s method in multiple variables. Now, R’s nlm is also Newton based, but this approach does seem a bit more pleasing for these reasons:

  • No longer black box;
  • Analytical evaluation of the derivatives of the approximated function instead of numerical approximation of the approximated function. The professional communications people would have killed for for that sentence. The conceptual idea is that you are not approximating twice;
  • Not as concerned about boundary values. Since the optimisation algorithm is visible, one doesn’t worry as much about constraints. It seems preferred to transforming parameters back and forth. I’ll have to see how robust this is though.

By the same token, it’s bad because:

  • More work;
  • One more thing to go wrong;
  • Computationally expensive to calculate the derivative matrices. I’ll have to check up on this, as it may still be faster than what nlm does. In fact, if it allows for faster convergence, then it might be worth it.

So far, I’ve coded up the mess that it requires to evaluate the likelihood. I don’t think it’s quite working as the function is misbehaving. It isn’t “topping” out near the proper parameters.

What does this all mean?

This is what I’ve been getting out. llh is the log likelihood function, y observed values and the second argument the three parameters for the non-leverage case of SV.

Problem is I don’t know what value llh is supposed to give. I’m not sure what the intermediate values are in the evaluation. I just know roughly where it should be maxing out. I’ll write more once I’ve fixed this.

P.S. Blogging seems easier than actually working on the thesis, even when I’m writing about roughly the same thing.

Update: After some fiddling around, browsing the web, shopping and dinner, I realised I made a key mistake in my code. Now the likelihood function behaves reasonably well with regards to two of the parameters. Put mathematically, it would appear that two of the partial derivatives are close to 0 near where I want a maxima to be.

It should be common knowledge that Wolfram makes the Mathematica functionality to do indefinate integrals in one variable available via the Wolfram Integrator.

 I needed to evaluate c(10000,5000)/2^10000, a calculation enough to generate overflows in any naive solution. The only tool I had available was R, and I can’t even find the binomial function in it, even were it to work for such large parameters. (That’s another rant all together about trying to Google for help on R).

So, I know Mathematica is good at this sort of thing, but I didn’t have it installed. Solution? To pop

Binomial[10000,5000]/2^10000

into the integrator:

The Integrator put to work

Impressive, but wasn’t that helpful. Using the Round function did the trick though: putting

Round[100000*Binomial[10000,5000]/2^10000]

in yielded 798x, which translates to 0.00798 for what I needed. Neat.

I’ve been away (vacation work etc.), but have been back for two weeks now.

Started looking at Clark (1973) as an earlier paper on SV, but got sidetracked into the two volumes by Feller (1950, 1966) on Probability Theory. Chapter III has some fairly interesting Combinatorial results to do with random walks. I came across this identity, which baffled me for a while:

 2*ArcSin[x^0.5] = Pi/2 + ArcSin[2x-1] for 0 <= x <= 1

 Quite an unexpected result. Thought it was a arithmetic of mine for a while.

 In other news, I’ve got to get back onto the main trunk of readings. One can easily spend years following streams of references.

Heteroskedasticity. I’ve been trying to pronounce this word for the last few days. I can say it in parts, but it just has too many syllables. It is the H in ARCH, and means non-constant variance.

ARCH in turn, is the vastly more popular way to model heteroskedasticity, with a vast amount of literature on it. Robert Engle even won a Nobel prize for his work on it.

The supervisor has suggested that I do not go into ARCH at the moment, and should concentrate of stochastic volatility instead. I agree.

It’s been a while since the last post, and to be honest, not a whole lot has happened. Following my supervisor’s suggestion, I have started to gather what I have learnt in TeX. This helps a bit in terms of giving me a better sense of the direction I’d like to proceed in, and allows me to consolidate what I think I know.

The one paper that I’ve been trying to read for a while is “Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models“ by Kim, Shephard and Chib. Shephard seems to be an Oxford guy who’s worked quite a lot with Stochastic Volatility. What this paper seems to present is a method of estimating the SV model using rather complex Monte Carlo simulation. The problem that my supervisor has is that his method of using hidden Markov models seems a lot more straightforward, and seems to be giving identical results on the sample data. Now, Shephard is not stupid, so we must be missing the point of his method. I’ll need to work through most of this paper at some stage.

 In addition, an often cited paper on SV is Hull and White’s 1986 paper. However, as I discovered, that is a continuous time treatment, which is not especially relevant to my project. Hull is hardcore though, being the author of the biblical Options, futures and other derivatives.

 But I ramble on. It should be fun re-familiarising myself with LaTeX again.

It’s been a week and I haven’t progressed much on the theory side of things. I’ve been doing consolidation of stuff  I read earlier (looking at Prof. Walter Zucchini’s notes on hidden Markov models and actually doing some problems).

 Here’s one bit of annoyance in R that I would like to share: I’m working with a function called dpois(x, lambda), which is simply the pdf of a Poisson distribution. Now, just as Matlab assumes everything is a matrix, R assumes everything is a vector. Naturally then, both x and lambda can be entered as vectors. The strange part is what you get out if both are vectors. The function returns a vector, not a matrix, so it won’t return the densities of m values in n distributions. Some short experimentation suggests that if x is strictly shorter than lambda, then the function disregards all but the first element of x, i.e. a vector of values from different distributions. Otherwise, the function disregards all but the first element of lambda, and returns a vector of values from the same distribution.

How’s that with consistency? Of course, the standard documentation makes no mention of this.

 Update: I was wrong with what dpois outputs. After closer observation, it outputs a vector of size equal to that of the longer of x and lambda. As R seems to do quite often, it repeats the shorter of the two vectors to match the longer in length, and evaluates the density pairwise. Guess that is more consistent and can be cohered into doing useful stuff with some thought (like generating the m x n matrix mentioned earlier).

First post on Random Walk. Guess this is as good a place as any to outline my intentions for the blog.

I would like to use the blog to document my progress through my masters thesis. The working title is “Estimation of Stochastic Volatility”. It has taken me 2 month of (not so hard) work so far to find out what that means. As such, I don’t really expect anyone other than me to read it, which is fine. It just seems easier to keep it on-line. In addition, the idea of it being publicly available means I will go to at least some efforts to keep it to a reasonable standard.

With some luck, I should be focused enough to keep my non-academic life out of this.

I spend the productive part of today trying to get started in R (read about it at www.r-project.org),  which is the preferred coding language of my supervisor. As I’m more used to Java and C++, I’m still getting a feel for interpreted languages. My limited experience in Matlab doesn’t really help here.

Piece of R code I’m trying to figure out.

I’m supposed to be using it to evaluate likelihood functions in order to estimate parameters, but I’m currently still stuck in the generic tutorial.

Follow

Get every new post delivered to your Inbox.