IDL Tutorial: Models and Fitting (2024)

IDL Tutorial: Models and Fitting (1)
Departmentof Physics and Astronomy

Scientificresearch will very often involve trying to understand a set ofobservations by comparison to a numerical model. For example, inthe Parallaxlaboratory,we're going to model the apparent motion of a star (actually, themotion of the earth) as a sinusoidal function of time. You'll havea set of observations, the position of the star on the sky as afunction of time, that you'll want to fit a sine curve to. Theamplitude of the sine curve yields a very important measurement:the distance to that star.

Ofcourse, data are noisy, and so there will be no one unique sinecurve that will fit your data; there will be a range of sinecurves that will equally well describe the data. More precisely,there will be a range of values for the parameters,the amplitude and the phase, that define the shape of the sinecurve. How do we find the appropriate range of acceptableparameters? For parallax, what is the best measure of thedistance, and what is its uncertainty?

You should readNorton chapters 13-16 for an overview of the general problem, buthere's a simple example. Suppose we have a simple set ofmeasurements of the (1-D) speed of a falling object vs. time(measured using the LabPro devices you saw in PHYS 211/212). Weknow the slope of this speed function gives the acceleration. Hereare the data:

Time
(sec)

Speed
(m/s)

Uncertainty
(m/s)

12.1

1.1

1

20

1.1

2

31.3

2.1

3

36.9

3

4

51.4

4

5

53.4

5

6

67.6

6

7

85.3

7

8

83.7

7.9

9

93.1

8.9

Wow, that falling object was bookingby the last measurement. I'd like to see the crater after theimpact.

Anyway, load these data into the IDLvariables t, s, and ds(here ds refersto the uncertainty in the speed). Plot s vs.t using thefollowing command,

ploterror, t, s, ds, psym=4,xtitle='Time (s)', ytitle='Speed (m/s)', xrange=[-1,10]

Now, let's fita straight line to the data using robust_linefit (fromIDLastro):

coeff = robust_linefit(t, s,sfit, ds, dcoeff)

This functionfits the model y = m x + b, a straight line with slope m= coeff[1] and intercept b = coeff[0]. Theuncertainty of the fitted parameters is dcoeff, so, forexample, the best-fit slope is coeff[1] +/- dcoeff[1].The resulting model is placed in the variable sfit. To viewthe model atop the data, enter the following command,

oplot, t, sfit

And, with anyluck, the resulting model should make sense. To display thecoefficients, try,

print, 'Slope = ',coeff[1], ' +/-', dcoeff[1], 'm/s^2'

Is the slopeconsistent with what you would expect near the earth's surface?Did the falling object start from rest?

Whatis going on?

There's not much to it. Our goal is to tweak some model, y,to match up as closely as possible some set of data, d.Howclose is close? Well, you'd like the model to be close to a givendata point relative to the uncertainty of a given data point. If agiven data point has a large uncertainty, we cut the model alittle slack. If the data point has a very small uncertainty (themeasurement is very accurate), then we want the model to match upwell.

In the end, wewant to evaluate the quantity chi-squared, which is expressed as,

IDL Tutorial: Models and Fitting (2)

wherethe sum is taken over all of the individual data points (subscripti).Here's away to interpret chi-squared in words:

chi-squared = the sum of thedistances between data and model, squared, but weighted down bythe uncertainties

The smaller thesum, or equivalently the smaller the value of chi-squared, thebetter the model matches the data. Often the numerical method offinding the best model is iterative:

(1) Make aguess at the correct parameters (for example, the slope andintercept of a line, or the amplitude, period, and phase of a sinewave).

(2)Generate the model y.

(3) Calculatechi-squared

(4)Look at small changes of the parameters (for example, change theslope of the model line slightly, recalculate the model y,),and see if that improves chi-squared.

(5) Repeatuntil small changes of the parameters no longer reduce the valueof chi-squared.

(6) From thisprocedure, determine which parameters gave the smallest value ofchi-squared.

(7)Determine how much you can change the parameters before you get asignificant change in chi-squared. (Note that, if you have trulyfound the minimum chi-squared, any change of the model parameterswill increasethevalue of chi-squared). The result is the uncertainty of theparameters.

Inthe case of linear models (straight-line models), or models thatcan be made linear, you can find the parameters analytically -there's no need to iterate. I recommend having a look at chapter15 in Press et al., NumericalRecipes 3rd Edition,or Bevington, DataReduction and Error Analysis tolearn more about linear fitting. At this stage, we're moreinterested in operational fitting techniques.

I'mnecessarily leaving out a lot of important details. In step (7),what is a significant change of chi-squared? Suppose you have Mparametersin your model; in the case of a straight line, M=2 (slope and intercept). As a reasonable approximation, asignificant change of chi-squared for a model is roughly = M.

For example,suppose your best fit chi-squared = 5.2 for a straight-line. Todetermine the uncertainties, change the slope and intercept untilthe new value of chi-squared = 7.2 (roughly); the differencebetween the original slope and the new slope gives a measure ofthe uncertainty. Of course, this can be done more precisely, andfitting codes will usually do a good job of determining theuncertainties of the model parameters, just as we saw in theexample above.

Non-Linear Models

It'soften the case that you cannot linearizeamodel; you cannot modify the data to make the fitting analytical,and you have to resort to an iterative solution as describedabove. We will be using Craig Markwardt's MPFIT fitting tools(http://www.physics.wisc.edu/~craigm/idl/),which you may have to download if they are not already installed.Try compiling the primary procedure mpfit:

.run mpfit

If it compiles,you are in good shape. If not, you should download a copy to yourworking directory, or to your personal IDL library.

To give you afeel for how to use mpfit, we're going to fit a simple, non-linearmodel to noisy data, in this case, a gaussian curve. Recall, agaussian takes the form,

IDL Tutorial: Models and Fitting (3),

where A is theamplitude of the gaussian, x_c isthe location of the center of the gaussian, and sigma measures thewidth of the gaussian. To be clear: the three parametersof the model are amplitude,centroid, andwidth.

We'll use a variant form ofmpfit called mpfitfun.To use mpfitfun, wehave to write a function that will take an array of parameters,values x, andreturn f(x).Here's one way to write the function, and it will give you a niceshell that you can use to make other functions for mpfitfun.

function jacksgaussian, x, p
;x = abscissae
; p = parameters = [amplitude, centroid,width]
argument = (x - p[1]) / p[2]
gaussian = p[0] *exp(-argument^2)
return, gaussian
end

Now,load the data in noisygaussian.txtintovariables, x,y,and dy (dycontains uncertainties). Here's a plot for comparison.

IDL Tutorial: Models and Fitting (4)

Now, to fit these data, makethe file jacksgaussian.proasdefined above. Since mpfitfun is iterative, we need to give it aninitial guess as to what the parameters should be. From inspectionof the plot, above, it looks like the centroid is at about x=10, the width is ~ 10 units in x,and the amplitude looks to be about 1. So, our starting parameterscan be defined,

startparms = [1.0, 10.0, 10.0]

Then, runmpfitfun, and watch as it iteratively improves our initial guessto minimize chi-squared.

parms = mpfitfun('jacksgaussian',x, y, dy, startparms, perror = dparms, yfit=yfit)

Thebest fit parameters will be stored in parms,and their uncertainties will be stored in dparms.The model prediction for each value of xisstored in yfit.You can compare the model with the data by overplotting:

plot, x, y, psym=3

oplot, x, yfit

Whilempfitfun isrunning, it will give you some interesting statistics about eachiteration, namely, the current parameter values, thecurrent value of chi-squared,and the DOF, or degreesof freedom, where DOF= # of data points - number of model parameters. In general, for amodel to be a valid representation of the data, you would expectchi-squared ~ DOF. Isthat the case for this particular modelfit?

Use of DoublePrecision

Sometimes iterative fittingtechniques will hang on a bad solution because the numbers arestored in the computer with insufficient precision; standardfloating-point precision is 7 significant figures (in decimalnumbers). In reality, there may be a better solution, but thecomputer cannot find it because rounding to 7 significant figuresprecludes finding a better solution.

To get around that problem, it's worth working in doubleprecision, where you get roughly 16 significant figures (these arerough conversions because, of course, the computer uses binarynumbers).

To use mpfitfun in double precision, just send theinitial guesses in double precision.

startparms = [1.0d, 10.0d, 10.0d]

Notice that all I did was to add a "d" at the end ofeach decimal number.

Advanced Fitting:Controlling and Limiting the Parameters

The mpfit routines allow for someadditional control over the fitting. For example, what if you hadsome information that the amplitude of the gaussian had to begreater than zero? What if you knew that the centroid wasprecisely 10.2? You want to provide the fitting routine with thisinformation so that it doesn't go off on the wrong path.

Control information is passedthrough the structurevariable parinfo,whichis easier to learn by example rather than by explanation (there isdetailed documentation in the mpfitfun.profile).Here's an example of parinfoforthe situation I just described.

parinfo = replicate({value:0.0,fixed:0, limited:[0,0], limits:[0.0,0.0]}, 3)

This statementinitializes the control array to its defaults; i.e., allparameters vary freely. The last number is the number ofparameters in your model; for the gaussian, 3.

Now, we want tofix the amplitude to be greater than 0.

parinfo[0].limited[0] = 1;parameter 0, the amplitude, has a lower bound

parinfo[0].limits[0] = 0.0; donot accept values less than zero

Now lets forcethe centroid to be 10.2.

parinfo[1].fixed = 1

parinfo[1].value = 10.2

Bythe way, it's a good idea not to use startparms if youare going to use parinfo.Instead, set your initial guesses in parinfo.

parinfo[0].value = 1.0

parinfo[2].value = 10.0

And, of course,we've already set parinfo[1].value when we fixed the centroid.

Now, retry thefit and see what happens.

startparms[1] = 10.2

parms = mpfitfun('jacksgaussian',x, y, dy, perror = dparms, yfit=yfit, parinfo=parinfo)

Exercise

Here'sa new data file: 'datatomodel.txt'.Write your own model function to fit the data using mpfitfun.

IDL Tutorial: Models and Fitting (2024)

References

Top Articles
Latest Posts
Article information

Author: Nathanael Baumbach

Last Updated:

Views: 5932

Rating: 4.4 / 5 (75 voted)

Reviews: 90% of readers found this page helpful

Author information

Name: Nathanael Baumbach

Birthday: 1998-12-02

Address: Apt. 829 751 Glover View, West Orlando, IN 22436

Phone: +901025288581

Job: Internal IT Coordinator

Hobby: Gunsmithing, Motor sports, Flying, Skiing, Hooping, Lego building, Ice skating

Introduction: My name is Nathanael Baumbach, I am a fantastic, nice, victorious, brave, healthy, cute, glorious person who loves writing and wants to share my knowledge and understanding with you.