Convergence Failure with Newton, Bug in discrete GLM Poisson model

Bug #673197 reported by Hanno Starling
This bug affects 1 person
Affects      Status  Importance  Assigned to  Milestone
statsmodels  New     Undecided   Unassigned

Bug Description

updated diagnosis:
Newton, the default optimizer, does not converge to the correct solution. I didn't look at the details but I guess the stepsize selection is not robust. The gradient in the example is very large. The example converges with Nelder-Mead.
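
If step-size selection is the suspect, one common safeguard is step halving: take the full Newton step only if it improves the log-likelihood, otherwise halve it and try again. The following is a minimal sketch of that idea in plain Python, not the statsmodels implementation. The data are synthetic (the attachment is not reproduced here), the model is the two-parameter trend-plus-constant Poisson regression from the example, and the names loglike, newton_direction and fit_damped_newton are made up for illustration.

```python
import math

def loglike(params, x, y):
    """Poisson log-likelihood with mean exp(slope * x + const)."""
    slope, const = params
    try:
        return sum(yi * (slope * xi + const) - math.exp(slope * xi + const)
                   - math.lgamma(yi + 1) for xi, yi in zip(x, y))
    except OverflowError:
        return -math.inf  # treat an overflowing trial point as infinitely bad

def newton_direction(params, x, y):
    """Full Newton step -H^{-1} g for the two-parameter model."""
    slope, const = params
    g_s = g_c = a_ss = a_sc = a_cc = 0.0
    for xi, yi in zip(x, y):
        mu = math.exp(slope * xi + const)
        g_s += (yi - mu) * xi      # score (gradient) entries
        g_c += yi - mu
        a_ss += mu * xi * xi       # entries of the negative Hessian
        a_sc += mu * xi
        a_cc += mu
    det = a_ss * a_cc - a_sc * a_sc
    return ((a_cc * g_s - a_sc * g_c) / det,
            (a_ss * g_c - a_sc * g_s) / det)

def fit_damped_newton(x, y, start, tol=1e-10, maxiter=500):
    params, ll = tuple(start), loglike(start, x, y)
    for iteration in range(1, maxiter + 1):
        d_s, d_c = newton_direction(params, x, y)
        if max(abs(d_s), abs(d_c)) < tol:   # stop only when every step is small
            break
        scale = 1.0
        for _ in range(60):                 # halve until the step improves ll
            trial = (params[0] + scale * d_s, params[1] + scale * d_c)
            trial_ll = loglike(trial, x, y)
            if trial_ll > ll:
                break
            scale *= 0.5
        else:
            break                           # no improving step found
        params, ll = trial, trial_ll
    return params, ll, iteration

# Synthetic counts, symmetric in time, so the fitted slope should be zero
# and the fitted constant should be log(mean(y)).
x = list(range(59))
y = [3 + min(i, 58 - i) % 3 for i in x]
params, ll, n_iter = fit_damped_newton(x, y, start=(0.5, 1.0))
print(params, ll, n_iter)
```

The start values (0.5, 1.0) are the ones that fail in the example. Halving only kicks in when a full step does not improve the log-likelihood, so well-behaved problems are unaffected.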

original description:
The attachment has three columns with discrete data. Try scikits.statsmodels.discretemod.Poisson on the data.
The residuals of the middle column will all be negative, which is wrong.
This is because the slope is very close to zero which causes some bug in the algorithm.
The first and last column give good results.

Hanno Starling (hanno-spreeuw) wrote :
joep (josef-pktd) wrote :

Hi Hanno

Thanks for the report. Can you add the example or the code that shows the bug? For example, it's not clear to me what is the endogenous variable and which are the regressors, exog.

Trying some combinations of columns, I'm not able to get an error. We fixed a 0*log(0) error in trunk a while ago; maybe you are seeing that problem. Are you using 0.2.0?

Hanno Starling (hanno-spreeuw) wrote : Re: [Bug 673197] Re: Bug in discrete GLM Poisson model

Hi Joep,

Below is the exogenous variable, time_array_plus:
import scikits.statsmodels as sm
time_array_plus=sm.add_constant(time_array)
tolerance=1e-6
all_output=(sm.discretemod.Poisson(y,time_array_plus)).fit(tol=tolerance)
Replace y with the middle column and you will get a strange result.
The first and last columns should give good results.
For the middle column I get
b=all_output.params[0]<1e-14.
This is correct, but it seems to cause an incorrect a=all_output.params[1].

Hanno.

+1951.00000  +1.00000
+1952.00000  +1.00000
+1953.00000  +1.00000
+1954.00000  +1.00000
+1955.00000  +1.00000
+1956.00000  +1.00000
+1957.00000  +1.00000
+1958.00000  +1.00000
+1959.00000  +1.00000
+1960.00000  +1.00000
+1961.00000  +1.00000
+1962.00000  +1.00000
+1963.00000  +1.00000
+1964.00000  +1.00000
+1965.00000  +1.00000
+1966.00000  +1.00000
+1967.00000  +1.00000
+1968.00000  +1.00000
+1969.00000  +1.00000
+1970.00000  +1.00000
+1971.00000  +1.00000
+1972.00000  +1.00000
+1973.00000  +1.00000
+1974.00000  +1.00000
+1975.00000  +1.00000
+1976.00000  +1.00000
+1977.00000  +1.00000
+1978.00000  +1.00000
+1979.00000  +1.00000
+1980.00000  +1.00000
+1981.00000  +1.00000
+1982.00000  +1.00000
+1983.00000  +1.00000
+1984.00000  +1.00000
+1985.00000  +1.00000
+1986.00000  +1.00000
+1987.00000  +1.00000
+1988.00000  +1.00000
+1989.00000  +1.00000
+1990.00000  +1.00000
+1991.00000  +1.00000
+1992.00000  +1.00000
+1993.00000  +1.00000
+1994.00000  +1.00000
+1995.00000  +1.00000
+1996.00000  +1.00000
+1997.00000  +1.00000
+1998.00000  +1.00000
+1999.00000  +1.00000
+2000.00000  +1.00000
+2001.00000  +1.00000
+2002.00000  +1.00000
+2003.00000  +1.00000
+2004.00000  +1.00000
+2005.00000  +1.00000
+2006.00000  +1.00000
+2007.00000  +1.00000
+2008.00000  +1.00000
+2009.00000  +1.00000

Hanno Starling (hanno-spreeuw) wrote :

PS. This error occurs in 0.2.0 and 0.3.0dev

joep (josef-pktd) wrote : Re: Bug in discrete GLM Poisson model

There are some convergence problems; I don't know yet why.
If I leave out the trend, the example with the second column works.
If I use a starting value close to the solution, it also works.
If the starting value for the trend is too large, it doesn't iterate. My guess is that the Newton optimization gives up too early if it doesn't find an improvement.

Switching to method='nm' (Nelder-Mead) with a larger maxiter, it also converges. When in doubt about convergence, Nelder-Mead is the most robust in my experience.

We need to see if we can improve Newton for cases like this.

Josef

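import numpy as np
import scikits.statsmodels as sm

# y has three columns of count data
y = np.loadtxt('three_stations_data.txt')

# linear time trend 0, 1, ..., n-1 plus a constant column
x = np.arange(y.shape[0])
x = sm.add_constant(x)

# first column: the default Newton optimizer gives a good result
res = sm.discretemod.Poisson(y[:,0],x).fit()
print(res.params)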

'''
>>> sm.discretemod.Poisson(y[:,1],x).fit(start_params=[0.05,1.255]).params
Optimization terminated successfully.
         Current function value: 123.985170
         Iterations 7
array([ -6.52157518e-19, 1.25518135e+00])
>>> sm.discretemod.Poisson(y[:,1],x).fit(start_params=[0.05,1.]).params
Optimization terminated successfully.
         Current function value: 123.985170
         Iterations 7
array([ 9.29652797e-20, 1.25518135e+00])
>>> sm.discretemod.Poisson(y[:,1],x).fit(start_params=[0.5,1.]).params
Optimization terminated successfully.
         Current function value: 9991462849916.316400
         Iterations 1
array([ 5.00000000e-01, 3.02330960e-09])

>>> sm.discretemod.Poisson(y[:,1],x).fit(start_params=[0.5,1.], method='nm', maxiter=1000).params
Optimization terminated successfully.
         Current function value: 123.985170
         Iterations: 49
         Function evaluations: 93
array([ 7.16034471e-07, 1.25517021e+00])
'''
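
A note on the failing run above: the result [0.5, 3.02e-09] after one iteration is what a single full Newton step from [0.5, 1.] should give. With a slope of 0.5 the fitted means are enormous compared to the counts, so the gradient is approximately -sum(mu_i * z_i) with z_i = (x_i, 1). Because the design contains a constant column, the negative Hessian A = sum(mu_i * z_i * z_i') satisfies A * (0, 1)' = sum(mu_i * z_i), and the Newton step is therefore approximately (0, -1): the constant drops by one and the slope barely moves. The sketch below checks this numerically in plain Python on synthetic counts (the attachment is not reproduced here). The two stopping rules at the end are hypothetical, written only to show how a per-parameter tolerance test could end the iteration here; whether the optimizer actually uses such a test is an assumption, not something confirmed in this thread.

```python
import math

def newton_step(params, x, y):
    """Full Newton step -H^{-1} g for Poisson mean exp(slope * x + const)."""
    slope, const = params
    g_s = g_c = a_ss = a_sc = a_cc = 0.0
    for xi, yi in zip(x, y):
        mu = math.exp(slope * xi + const)
        g_s += (yi - mu) * xi      # score (gradient) entries
        g_c += yi - mu
        a_ss += mu * xi * xi       # entries of the negative Hessian
        a_sc += mu * xi
        a_cc += mu
    det = a_ss * a_cc - a_sc * a_sc
    return ((a_cc * g_s - a_sc * g_c) / det,
            (a_ss * g_c - a_sc * g_s) / det)

# Synthetic small counts over 59 periods (not the data from the report).
x = list(range(59))
y = [3 + min(i, 58 - i) % 3 for i in x]

start = (0.5, 1.0)
step = newton_step(start, x, y)
new_params = (start[0] + step[0], start[1] + step[1])

tol = 1e-8  # illustrative tolerance
# Hypothetical rule A: keep iterating only while ALL parameters still move.
stops_under_rule_a = not all(abs(s) > tol for s in step)
# Hypothetical rule B: stop only once NO parameter moves any more.
stops_under_rule_b = all(abs(s) <= tol for s in step)

print(step, new_params, stops_under_rule_a, stops_under_rule_b)
```

Under rule A the nearly unchanged slope is enough to end the loop after one step, even though the constant is still moving by about one unit per iteration. Rule B would keep iterating.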

joep (josef-pktd)
summary: - Bug in discrete GLM Poisson model
+ Convergence Failure with Newton, Bug in discrete GLM Poisson model
description: updated