Unhelpful error from bad exog matrix in model.py

Bug #688775 reported by Wes McKinney
This bug affects 1 person
Affects      Status  Importance  Assigned to  Milestone
statsmodels  New     Undecided   Unassigned

Bug Description

These lines in model.py could be improved:

            if np.any(exog.var(0) == 0):
                # assumes one constant in first or last position
                const_idx = np.where(exog.var(0) == 0)[0].item()

example traceback

  File "/home/wesm/code/pandas/pandas/stats/tests/test_ols.py", line 67, in checkOLS
    reference = sm.OLS(endog, sm.add_constant(exog)).fit()
  File "/usr/lib/epd-6.2/lib/python2.6/site-packages/scikits.statsmodels-0.3.0dev-py2.6.egg/scikits/statsmodels/regression.py", line 511, in __init__
    super(OLS, self).__init__(endog, exog)
  File "/usr/lib/epd-6.2/lib/python2.6/site-packages/scikits.statsmodels-0.3.0dev-py2.6.egg/scikits/statsmodels/regression.py", line 404, in __init__
    super(WLS, self).__init__(endog, exog)
  File "/usr/lib/epd-6.2/lib/python2.6/site-packages/scikits.statsmodels-0.3.0dev-py2.6.egg/scikits/statsmodels/regression.py", line 163, in __init__
    super(GLS, self).__init__(endog, exog)
  File "/usr/lib/epd-6.2/lib/python2.6/site-packages/scikits.statsmodels-0.3.0dev-py2.6.egg/scikits/statsmodels/model.py", line 94, in __init__
    super(LikelihoodModel, self).__init__(endog, exog)
  File "/usr/lib/epd-6.2/lib/python2.6/site-packages/scikits.statsmodels-0.3.0dev-py2.6.egg/scikits/statsmodels/model.py", line 57, in __init__
    const_idx = np.where(exog.var(0) == 0)[0].item()
ValueError: can only convert an array of size 1 to a Python scalar

example exog matrix that causes blowup:

array([[ 24. , 2. , 4. , 0. , 1. ],
       [ 21. , 1.7 , 2.89 , 0. , 1. ],
       [ 24. , 2.8 , 7.84 , 0. , 1. ],
       [ 26. , 2.4 , 5.76 , 0. , 1. ],
       [ 33. , 3. , 9. , 0. , 1. ],
       [ 34. , 4.8 , 23.04 , 0. , 1. ],
       [ 33. , 3.18 , 10.1124, 0. , 1. ],
       [ 21. , 1.5 , 2.25 , 0. , 1. ],
       [ 25. , 3. , 9. , 0. , 1. ],
       [ 27. , 2.28 , 5.1984, 0. , 1. ]])
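
The failure can be reproduced outside statsmodels with NumPy alone. The following is a minimal sketch using the matrix above; the variable names `zero_var_idx` and `failed` are introduced here only for illustration. Two columns have zero variance (the column of zeros and the constant), so `.item()` has no single value to return.

```python
import numpy as np

# The exog matrix from this report: column 3 is all zeros,
# column 4 is the constant added by sm.add_constant.
exog = np.array([
    [24., 2.,   4.,      0., 1.],
    [21., 1.7,  2.89,    0., 1.],
    [24., 2.8,  7.84,    0., 1.],
    [26., 2.4,  5.76,    0., 1.],
    [33., 3.,   9.,      0., 1.],
    [34., 4.8,  23.04,   0., 1.],
    [33., 3.18, 10.1124, 0., 1.],
    [21., 1.5,  2.25,    0., 1.],
    [25., 3.,   9.,      0., 1.],
    [27., 2.28, 5.1984,  0., 1.],
])

# Indices of all zero-variance columns; here there are two of them.
zero_var_idx = np.where(exog.var(0) == 0)[0]
print(zero_var_idx)

# .item() only works on a size-1 array, so this mirrors the line in model.py.
failed = False
try:
    const_idx = zero_var_idx.item()
except ValueError as err:
    failed = True
    print("ValueError:", err)
```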

joep (josef-pktd) wrote :

I fixed this in my branch earlier this week. I think I changed the line to:

const_idx = np.where(exog.var(0) == 0)[0][0].item()

I haven't merged my branch into devel yet this week.

Wes McKinney (wesmckinn) wrote :

But isn't that still kind of an issue with the X matrix I listed in the bug report?
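
To make the concern concrete, here is a small sketch that applies the changed line to the first four rows of the reported matrix (a reduced copy, used only to keep the example short). Taking the first zero-variance index avoids the exception, but in this layout the index it picks is the column of zeros rather than the column of ones.

```python
import numpy as np

# First four rows of the reported exog matrix (reduced for brevity).
exog = np.array([
    [24., 2.,  4.,   0., 1.],
    [21., 1.7, 2.89, 0., 1.],
    [24., 2.8, 7.84, 0., 1.],
    [26., 2.4, 5.76, 0., 1.],
])

# The changed line: take the first zero-variance column instead of requiring exactly one.
const_idx = np.where(exog.var(0) == 0)[0][0].item()

# No exception any more, but the selected column is the all-zero one.
picked_mean = exog[:, const_idx].mean()
print(const_idx, picked_mean)
```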

joep (josef-pktd) wrote :

Yes and no. Which result would you expect in this case?

For me, it got rid of the exception, and it worked in the few cases I tried.

const_idx is currently used only in the summary print, so it doesn't mess up any calculations. In my sandbox code I'm starting to make more use of a boolean flag for whether we have a constant, for the degrees-of-freedom correction, and I think that should be properly included everywhere it matters for the degrees of freedom.

We currently don't warn about singular design matrices, including those with multiple constants. My thinking goes in the direction of a "sm.warning_level" setting: if it is set to true or to raise, we do additional checks that the model makes sense. I prefer not to raise any exceptions until we have a way to turn them off, both because the checks can be costly and because singular design matrices are sometimes useful.
I want to discuss this on the mailing list before we release 0.3.
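
One way such an opt-in check might look is sketched below. Nothing here is an existing statsmodels API: the function `check_design`, its `warning_level` argument, and the values "ignore", "warn" and "raise" are all assumptions made for illustration, and the rank test via `np.linalg.matrix_rank` is just one possible check.

```python
import warnings
import numpy as np

def check_design(exog, warning_level="ignore"):
    """Hypothetical opt-in check for a singular design matrix.

    warning_level is assumed to be "ignore", "warn", or "raise".
    Returns None when the check is skipped, otherwise True/False for full rank.
    """
    if warning_level == "ignore":
        return None  # skip the potentially costly rank computation
    exog = np.asarray(exog, dtype=float)
    full_rank = np.linalg.matrix_rank(exog) == exog.shape[1]
    if not full_rank:
        msg = "design matrix is singular (rank deficient)"
        if warning_level == "raise":
            raise ValueError(msg)
        warnings.warn(msg)
    return full_rank

# A regressor, a column of zeros, and a constant: rank 2 with 3 columns.
exog = np.column_stack([np.array([24., 21., 24., 26.]), np.zeros(4), np.ones(4)])
print(check_design(exog))  # check skipped by default
```

With the default level the function does no work, which keeps the cost at zero for users who rely on singular design matrices on purpose.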

One possibility for const_idx is to also check whether one of the zero-variance columns has mean == 1; that should be easy to add. But the reported summary is still not really useful, because pinv splits up the contribution of the constant. I haven't thought much about a column of zeros and how it would affect the results of pinv, but it looks like too much of a special case to treat separately.
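
The mean == 1 check could be sketched roughly as follows. The helper name `find_const_idx` is hypothetical and not part of statsmodels, and the exact equality tests assume a constant column of exact ones, as produced by `sm.add_constant`.

```python
import numpy as np

def find_const_idx(exog):
    """Hypothetical helper: index of the first zero-variance column whose
    mean is 1, or None if there is no such column."""
    exog = np.asarray(exog, dtype=float)
    # A column of exact ones has variance 0 and mean 1; a column of zeros fails the mean test.
    is_const = (exog.var(0) == 0) & (exog.mean(0) == 1)
    idx = np.where(is_const)[0]
    return int(idx[0]) if idx.size else None

# Same column layout as the reported matrix: zeros in column 3, ones in column 4.
exog = np.array([
    [24., 2.,  4.,   0., 1.],
    [21., 1.7, 2.89, 0., 1.],
    [24., 2.8, 7.84, 0., 1.],
    [26., 2.4, 5.76, 0., 1.],
])
print(find_const_idx(exog))
```

This only changes which index is reported; it does nothing about the singular design itself.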
