[Solved] 2D linear regression problem

Started by August 29, 2012 09:44 AM
2 comments, last by Vaclav 12 years, 5 months ago
Edit: The problem has been solved; if anyone has the same problem, see post #4.

I've been looking for a simple linear regression method. I've implemented one, but it doesn't work the way I would like. Can someone either recommend a free, easy-to-use C or C++ library that does least squares linear regression and tell me how to use it, or point out the mistake in my code?
At the moment my code doesn't use least squares, just a simple linear regression, but I don't think that should be a problem, should it?
This is the algorithm I've implemented: http://easycalculati...-regression.php

My problem with the current linear regression implementation:
The purple line fitted to the white points should be vertical, not horizontal:
linreg3.jpg
However, if I swap x and y data, it works:
linreg2.jpg
However, in the game where I use this function, I can't swap x and y, because the function must work in every direction. The same problem shows up in-game:
The purple line is fitted by my function to the white points inside the green box, but the correctly fitted line would be like the orange one which I have drawn by hand:
linreg0.jpg
linreg1.jpg

Here is my code:
(It's the algorithm from the link above, I've only added two checks not to divide by zero.)
[source lang="cpp"]//return: equation of line: y = *retA + (*retB)*x
//p is a vector containing the points, CVector2 is my class containing two double values, x and y
void gLinReg( std::vector< CVector2 >* p, double* retA, double* retB) {

unsigned int i;
std::vector<double> xy, x2;
double sx=0.0, sy=0.0, sxy=0.0, sx2=0.0;

//the number of values
unsigned int n = p->size();
if( n == 0 ) { return; }

//find X*Y and X^2 for all values
for(i=0; i<n; i++) {
xy.push_back( p->at(i).x*p->at(i).y );
x2.push_back( p->at(i).x*p->at(i).x );
}

//find sumx, sumy, sumxy, sumx2
for(i=0; i<n; i++) {
sx += p->at(i).x;
sy += p->at(i).y;
sxy += xy.at(i);
sx2 += x2.at(i);
}

//get slope
//B = (N*SXY - SX*SY) / (N*SX2 - SX^2)
double denom = (n*sx2 - sx*sx); //the denominator
//prevent dividing by 0 in the next step
if( denom == 0.0 ) {
*retB = signF((n*sxy - sx*sy))*100.0; //*100 instead of / (N*SX2 - SX^2) to get a big slope
//signF is just a sign function returning 1 or -1
}
else
*retB = (n*sxy - sx*sy) / (n*sx2 - sx*sx);

//get the interception point
//Intercept(a) = (SY - b(SX)) / N
*retA = (sy - (*retB)*sx) / n;

}[/source]

And here is how I use it:
[source lang="cpp"]//...
std::vector< CVector2 > p;
p.push_back(CVector2(60,10));
p.push_back(CVector2(60,20));
p.push_back(CVector2(60,80));
p.push_back(CVector2(60,90));
p.push_back(CVector2(60,100));
p.push_back(CVector2(55,30));
p.push_back(CVector2(45,40));
p.push_back(CVector2(40,50));
p.push_back(CVector2(40,60));
p.push_back(CVector2(50,70));

int c;
double rA, rB;
gLinReg( &p, &rA, &rB);
drawLine( game.screen, rA, rB, 50);
for(c=0; c < p.size(); c++)
pixelRGBA( game.screen, p.at(c).x, p.at(c).y, 0xFF, 0xFF, 0xFF, 0xFF );

//...

//And here is how I draw the line for testing:

//px is the center of the line
void drawLine( SDL_Surface* dest, double A, double B, int px) {

CVector2 lineA, lineB;
lineA.set( px-50, A+B*(px-50));
lineB.set( px+50, A+B*(px+50));
lineRGBA( dest, lineA.x, lineA.y,
lineB.x, lineB.y, 0xFF, 0, 0xFF, 0xFF );
}[/source]

Thank you for your help.
Vaclav
I recommend LAPACK, or its C interface, LAPACKE. The function you are looking for is "LAPACK_dgels", http://www.netlib.org/lapack/lapacke.html
The most common form of linear regression minimizes the sum of the squares of the distances between the line and the data points measured vertically. What you want is called total least squares, which minimizes the sum of the squares of the distances using both coordinates.
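To make the distinction concrete, here is a sketch of the two quantities being minimized, assuming the line is written as y = a + b x as in the first post. The names E_vertical and E_total are labels chosen here for illustration, not standard notation. The first sum uses vertical distances only; the second uses perpendicular distances, since the distance from a point to the line is |y_i - a - b x_i| / sqrt(1 + b^2).

```latex
\begin{align}
E_{\text{vertical}}(a, b) &= \sum_{i=1}^{n} \left( y_i - a - b x_i \right)^2 \\
E_{\text{total}}(a, b) &= \sum_{i=1}^{n} \frac{\left( y_i - a - b x_i \right)^2}{1 + b^2}
\end{align}
```

Note that in this form an exactly vertical line corresponds to b going to infinity, so an implementation needs to treat that case separately.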
@ alvaro Thank you, you're right. It's total least squares linear regression.

@ Spline Thanks, but that library and the others I found don't mention total least squares. Also, learning how to install, include and use one of them seems like it would take me more time than writing the function myself. I also forgot to mention that my project is cross-platform (Windows and Linux), so I would need a cross-platform library.

If anyone will need it later as well, after watching these videos I understand least squares regression much better:


(parts 1-4)


I've also found that total least squares regression is also called Deming regression: http://en.wikipedia....ming_regression

On this wiki page, in the Solution section, the formulas are very similar to my previous ones, but they are extended.
So in the end I'll get the equation of the line: y = β0 + β1x
I only have to calculate β1 and β0 from the given formulas, substituting the means and the sums. The only problem might be with delta in the formula.
About delta, the Specification section of the wiki says: "In practice the variance of the x and y parameters is often unknown which complicates the estimate of delta but where the measurement method for x and y is the same they are likely to be equal so that delta = 1 for this case." Though in my case the x and y values aren't measured ones.

EDIT: I've implemented the Deming regression and it works exactly how I wanted it to (first post). Delta = 1 proved to be good.
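For anyone who lands here later, below is a minimal sketch of what a Deming fit with delta = 1 could look like. It is not the code used in the thread: `Point2`, `LineFit` and `demingFit` are hypothetical names (with `Point2` standing in for `CVector2`), and the explicit `vertical` flag is an added assumption to cover the case where the slope would be infinite.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the CVector2 class from the first post.
struct Point2 {
    double x;
    double y;
};

// Hypothetical result type: the line is y = a + b*x, unless `vertical`
// is true, in which case the line is x = a and b is unused.
struct LineFit {
    double a;
    double b;
    bool vertical;
    bool ok;  // false if no line could be fitted
};

// Deming regression with delta = 1 (orthogonal / total least squares).
LineFit demingFit(const std::vector<Point2>& pts) {
    LineFit fit = {0.0, 0.0, false, false};
    const std::size_t n = pts.size();
    if (n < 2) return fit;  // need at least two points

    // Means of x and y.
    double mx = 0.0, my = 0.0;
    for (const Point2& p : pts) {
        mx += p.x;
        my += p.y;
    }
    mx /= static_cast<double>(n);
    my /= static_cast<double>(n);

    // Centered sums of squares and cross products. The usual 1/(n-1)
    // factor cancels in the slope formula, so it is omitted.
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (const Point2& p : pts) {
        const double dx = p.x - mx;
        const double dy = p.y - my;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }

    if (sxx == 0.0 && syy == 0.0) return fit;  // all points coincide

    if (sxy == 0.0) {
        // No correlation: the best line is axis-aligned through the mean.
        fit.ok = true;
        if (syy > sxx) {
            fit.vertical = true;  // points spread mostly along y: x = mx
            fit.a = mx;
        } else {
            fit.a = my;           // points spread mostly along x: y = my
        }
        return fit;
    }

    // Slope formula for delta = 1.
    const double d = syy - sxx;
    fit.b = (d + std::sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy);
    fit.a = my - fit.b * mx;
    fit.ok = true;
    return fit;
}
```

For the ten test points from the first post, the ordinary least squares slope works out to roughly 0.3, while this sketch gives a slope of roughly 38, i.e. a nearly vertical line. A caller that draws the result would need to check `vertical` before evaluating a + b*x.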
