design cannot be n x 1

Bug #434407 reported by Skipper Seabold
This bug affects 1 person
Affects      Status        Importance  Assigned to  Milestone
statsmodels  Fix Released  Undecided   Unassigned

Bug Description

Just so I remember. GLS does not currently work for an n x 1 design array.

import scikits.statsmodels as sm
data = sm.datasets.longley.Load()
data.exog = sm.add_constant(data.exog)
ols_res = sm.OLS(data.endog, data.exog).fit()
res = ols_res.resid
res_regression = sm.OLS(res[1:],res[:1]).fit()

<snip>

ValueError: matrices are not aligned

The one time I tried to work on this, it took a little more attention than I expected or had time for.

summary: - design cannot be 1d
+ design cannot be n x 1
joep (josef-pktd) wrote :

In a related issue, I am a bit puzzled why class Model defines self.exog as a row vector if the original exog is 1d:

class Model(object):
    def __init__(self, endog, exog=None):
        self.endog = np.asarray(endog)
        self.exog = np.atleast_2d(np.asarray(exog))

I think in GLSAR.__init__, I used a (n,1) exog when I only have the constant:
if exog is None:
    super(GLSAR, self).__init__(endog, np.ones((endog.shape[0],1)))

I guess we need tests to make sure every class works consistently with 1d row or column vectors as exog. Similarly, I'm not sure what the dimension requirements for endog are, 1d or also 2d column vector.
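To make the row-vector point concrete, here is a minimal NumPy-only sketch of what np.atleast_2d does to a 1d array compared with adding a trailing axis. The array x is made up for illustration and is not taken from the Longley data.

```python
import numpy as np

x = np.arange(5.0)                 # 1d exog with 5 observations, shape (5,)

row = np.atleast_2d(x)             # prepends an axis: one row, five columns
col = x[:, None]                   # appends an axis: five rows, one column

print(row.shape)                   # (1, 5)
print(col.shape)                   # (5, 1)

# atleast_2d leaves an array that is already 2d unchanged,
# so an (n, 1) column passed in by the caller stays a column.
print(np.atleast_2d(col).shape)    # (5, 1)
```

So a 1d exog of length n ends up as a single observation with n regressors, which is probably not what the caller meant.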

joep (josef-pktd) wrote :

res_regression = sm.OLS(res[1:],res[:1]).fit()

Here exog has only a single element. It looks like we don't have a check for a consistent number of observations in endog and exog.

>>> sm.OLS(res[1:,None], res[:-1,None]).fit().params
array([[-0.36676742]])
>>> sm.OLS(res[1:], res[:-1,None]).fit().params
array([-0.36676742])
>>> sm.OLS(res[1:], res[:-1]).fit().params # 1d exog fails
Traceback (most recent call last):
...
ValueError: matrices are not aligned

So it looks like an (n,1) exog works, but a 1d exog fails.
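A rough sketch of why the 1d case can fail, using NumPy only. It assumes the fit boils down to pinv(exog) dot endog, which this thread hints at but does not show, and that exog has gone through np.atleast_2d as in the Model constructor quoted earlier. The data are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
endog = rng.normal(size=n)
exog_1d = rng.normal(size=n)

# A 1d exog passed through atleast_2d becomes a (1, n) row.
exog_row = np.atleast_2d(exog_1d)
pinv_row = np.linalg.pinv(exog_row)          # shape (n, 1)

try:
    np.dot(pinv_row, endog)                  # (n, 1) dot (n,) cannot be aligned
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)

# With an explicit column the same product is well defined.
exog_col = exog_1d[:, None]                  # shape (n, 1)
params = np.dot(np.linalg.pinv(exog_col), endog)
print(params.shape)                          # (1,)
```

The exact wording of the ValueError depends on the NumPy version.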

Skipper Seabold (jsseabold) wrote :

Good catch. That was a typo. Should be res[:-1] I think (not on a machine where I can test).

Part of fixing this "bug" should be adding the checks for shapes. I would like the user to never have to worry about whether the shape is 1d or 2d, i.e., (N,) vs (N,1). Part of the inconsistency now comes from pinv and dot not being able to distinguish between the two (though with good reason in this case). For that reason, a shape test should be the first test that's written for new models. It's not clear right now how much can be moved up to the parent class wrt your comments on GLSAR.
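One possible shape of such a check, as a sketch only. The names check_shapes and ols_params are hypothetical and are not statsmodels functions; the pinv-and-dot fit is an assumption standing in for whatever the model classes actually do.

```python
import numpy as np

def check_shapes(endog, exog):
    """Hypothetical helper: return endog as (n,) and exog as (n, k)."""
    endog = np.asarray(endog, dtype=float)
    exog = np.asarray(exog, dtype=float)
    if endog.ndim == 2 and endog.shape[1] == 1:
        endog = endog[:, 0]                  # (n, 1) -> (n,)
    if exog.ndim == 1:
        exog = exog[:, None]                 # (n,) -> (n, 1)
    if endog.ndim != 1 or exog.ndim != 2:
        raise ValueError("endog must be (n,) or (n, 1); exog must be (n,) or (n, k)")
    if endog.shape[0] != exog.shape[0]:
        raise ValueError("endog and exog have different numbers of observations")
    return endog, exog

def ols_params(endog, exog):
    """Hypothetical least-squares fit via the pseudoinverse."""
    endog, exog = check_shapes(endog, exog)
    return np.dot(np.linalg.pinv(exog), endog)

rng = np.random.default_rng(0)
y = rng.normal(size=8)
x = rng.normal(size=8)

# All four combinations of (n,) and (n, 1) inputs give the same estimate.
results = [ols_params(a, b) for a in (y, y[:, None]) for b in (x, x[:, None])]
print(all(np.allclose(results[0], r) for r in results))
```

With this in place, a call like ols_params(y[1:], x[:1]), which mirrors the typo in the original report, raises a ValueError about the number of observations instead of an alignment error from dot.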

Skipper Seabold (jsseabold) wrote :

I have committed a fix for this and added some tests for shapes in regression to test_regression. They currently test that the outputs are the same for endog of shape (n,1) or (n,) and for exog of shape (n,1) or (n,). There is also a test that endog.shape[0] == exog.shape[0]. Changes are in my branch only at the moment.

Changed in statsmodels:
status: New → Fix Committed
Changed in statsmodels:
status: Fix Committed → Fix Released