Repeat calls to community_leading_eigenvector cause "Unknown ARPACK error" for some graphs

Bug #404256 reported by Michael Bommarito
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
igraph
Confirmed
Undecided
Unassigned

Bug Description

Code worked fine for the past few weeks running on Launchpad trunk from a month or two ago.
Updated today and the following example now crashes.

>> g=igraph.Graph([(2, 0), (3, 0), (4, 1), (4, 1), (5, 2), (6, 4), (7, 5), (7, 4), (8, 4), (8, 4), (9, 4), (9, 0), (10, 2), (11, 7), (12, 0), (13, 8), (13, 9), (13, 8), (15, 6), (16, 9), (16, 2), (16, 3), (17, 0), (17, 8), (17, 2), (17, 14), (18, 15)

In [24]: g.community_leading_eigenvector()
Out[24]: <igraph.clustering.VertexClustering object at 0x21f2610>

In [25]: g.community_leading_eigenvector()
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)

/home/mjbommar/Source/igraph/<ipython console> in <module>()

/usr/lib/python2.6/dist-packages/igraph/__init__.pyc in community_leading_eigenvector(self, clusters, return_merges)
    538 eigenvectors of matrices, arXiv:physics/0605087"""
    539 if clusters is None: clusters=-1
--> 540 cl, merges = GraphBase.community_leading_eigenvector(self, clusters, return_merges)
    541 if merges is None:
    542 return VertexClustering(self, cl)

InternalError: Error at community.c:1184: , Unknown ARPACK error

Revision history for this message
Michael Bommarito (mjbommar) wrote :

Code didn't copy right...I'm attaching a script.
Note that even simplify()'ing the graph still produces the error.

Revision history for this message
Michael Bommarito (mjbommar) wrote :

Reduced the size of the previous edge list while retaining the exception.
The second graph is the first graph less the last edge.
Strange that the second graph with the worse condition number takes an extra call to eigenvector before ARPACK goes...

g=igraph.Graph([(2, 0), (3, 0), (4, 1), (4, 1), (5, 2), (6, 4), (7, 5), (7, 4), (8, 4), (8, 4), (9, 4), (9, 0), (10, 2), (11, 7), (12, 0), (13, 8), (13, 9), (13, 8), (15, 6), (16, 9), (16, 2), (16, 3), (17, 0), (17, 8), (17, 2), (17, 14), (18, 15), (18, 0), (18, 3), (18, 3), (19, 6), (20, 19), (20, 8), (20, 10), (20, 16), (20, 2), (22, 9), (23, 10), (24, 4), (24, 20), (24, 13), (24, 13), (25, 9), (26, 15), (26, 20), (26, 13), (26, 4), (27, 5), (27, 25), (27, 12), (27, 18), (27, 10), (28, 21), (28, 27), (29, 5), (29, 3), (29, 23), (29, 18), (32, 3), (33, 22), (33, 25), (34, 3), (34, 27), (34, 1), (35, 8), (35, 34), (36, 33), (36, 31), (36, 15), (36, 34), (36, 24), (37, 4), (38, 9), (38, 10), (38, 29), (39, 26), (39, 26), (40, 5), (40, 31), (40, 16), (41, 13), (41, 7), (41, 32), (42, 22), (42, 5), (42, 14), (43, 10), (45, 30), (48, 5), (49, 34), (50, 5), (50, 24), (51, 30), (51, 2), (51, 26), (51, 19), (51, 10), (53, 17), (53, 46), (54, 38), (54, 38), (55, 54), (55, 27), (55, 25), (55, 27), (55, 50), (56, 39), (56, 3), (57, 52)])

cond(A) ~ 4e17
max eig ~ 5.59208777925

ARPACK exception is thrown after 2 calls to community_leading_eigenvector().

g=igraph.Graph([(2, 0), (3, 0), (4, 1), (4, 1), (5, 2), (6, 4), (7, 5), (7, 4), (8, 4), (8, 4), (9, 4), (9, 0), (10, 2), (11, 7), (12, 0), (13, 8), (13, 9), (13, 8), (15, 6), (16, 9), (16, 2), (16, 3), (17, 0), (17, 8), (17, 2), (17, 14), (18, 15), (18, 0), (18, 3), (18, 3), (19, 6), (20, 19), (20, 8), (20, 10), (20, 16), (20, 2), (22, 9), (23, 10), (24, 4), (24, 20), (24, 13), (24, 13), (25, 9), (26, 15), (26, 20), (26, 13), (26, 4), (27, 5), (27, 25), (27, 12), (27, 18), (27, 10), (28, 21), (28, 27), (29, 5), (29, 3), (29, 23), (29, 18), (32, 3), (33, 22), (33, 25), (34, 3), (34, 27), (34, 1), (35, 8), (35, 34), (36, 33), (36, 31), (36, 15), (36, 34), (36, 24), (37, 4), (38, 9), (38, 10), (38, 29), (39, 26), (39, 26), (40, 5), (40, 31), (40, 16), (41, 13), (41, 7), (41, 32), (42, 22), (42, 5), (42, 14), (43, 10), (45, 30), (48, 5), (49, 34), (50, 5), (50, 24), (51, 30), (51, 2), (51, 26), (51, 19), (51, 10), (53, 17), (53, 46), (54, 38), (54, 38), (55, 54), (55, 27), (55, 25), (55, 27), (55, 50), (56, 39), (56, 3)])

cond(A) ~ 3e32
max eig ~ 5.59208777925

ARPACK exception is thrown after 3 calls to community_leading_eigenvector().

Revision history for this message
Michael Bommarito (mjbommar) wrote :

Another finding: leading_eigenvector_naive does not throw the exception.

Revision history for this message
Michael Bommarito (mjbommar) wrote :

The previous reports were all from Linux (ubuntu 9.04) 64-bit, Python 2.6 and igraph-0.6 from Launchpad.
Had a chance to reboot and test the script I uploaded on Windows XP, 32-bit, Python 2.6 and igraph 0.5.2.

This throws the mxiter RuntimeWarning, but still runs.
>>python break.py
c:\python\lib\site-packages\igraph\__init__.py:519: RuntimeWarning: Maximum number of ARPACK iterations reached at .\src\community.c:1176
  cl, merges = GraphBase.community_leading_eigenvector(self, clusters, return_merges)

Revision history for this message
Michael Bommarito (mjbommar) wrote :

Testing again on the Linux side, but this time bypassing the python module. I'm uploading some C code that just loads the edge list above and calls eigenvector a few times, outputting the return value of eigenvector.

$ gcc test.c -o test -ligraph && ./test
|V| = 58, |E| = 109
evresult #0: 0
evresult #1: 0
evresult #2: 0
evresult #3: 0
evresult #4: 0
evresult #5: 0
evresult #6: 0
evresult #7: 0
evresult #8: 0
evresult #9: 0
evresult #10: 0

Revision history for this message
Michael Bommarito (mjbommar) wrote :

Based on the output from 0.5.2 on Windows, I decided to test the maxiter parameter in 0.6.

Could the error handling be what broke? Did the maxiter warning code maybe get switched and now it's unrecognized, getting caught by the "Unknown error" handler?

igraph.arpack_options.maxiter = 10 => Unknown ARPACK error
igraph.arpack_options.maxiter = 100 => Unknown ARPACK error
igraph.arpack_options.maxiter = 1000 => Unknown ARPACK error
igraph.arpack_options.maxiter = 10000 => OK

I guess I'm not sure how many iterations are appropriate here. I have a good understanding of the eigenvalue methods from a theoretical standpoint, but I never learned the details of the implicitly restarted algorithms in class.

Revision history for this message
Michael Bommarito (mjbommar) wrote :

More bad things with the python interface...trying to simplify the problem.

The 3-star graph causes problems with Python:

'''
import igraph
g = igraph.Graph([(0, 1), (0, 2)])

for i in range(10):
 print i, g.community_leading_eigenvector().membership
'''

This returns different results for each call:
0 [0, 1, 0]
1 [0, 0, 0]
2 [0, 1, 0]
3 [0, 0, 1]
4 [0, 0, 0]
5 [0, 0, 0]
6 [0, 0, 0]
7 [0, 1, 0]
8 [0, 0, 1]
9 [0, 1, 0]

The 4-star graph causes the ARPACK exception on the second call.
g = igraph.Graph([(0, 1), (0, 2), 0, 3)])
...
0 [0, 1, 0, 0]
1
Traceback (most recent call last):
  File "break2.py", line 9, in <module>
    print i, g.community_leading_eigenvector().membership
  File "/usr/lib/python2.6/dist-packages/igraph/__init__.py", line 540, in community_leading_eigenvector
    cl, merges = GraphBase.community_leading_eigenvector(self, clusters, return_merges)
igraph.core.InternalError: Error at community.c:1183: , Unknown ARPACK error

The eigenvalues and eigenfunctions of the k-star can be written explicitly, but the cond(A) gets pretty bad quite quickly I guess. Why though does this work directly from the C library but not from python?

Revision history for this message
Michael Bommarito (mjbommar) wrote :

Another platform with no problems: RHEL 5.4, 32-bit with python 2.6 and igraph 0.5.2. So both 32bit operating systems I've tried work fine with 0.5.2.

I also compiled the 0.6 C library with gcc 4.3 and 4.2 on 64-bit ubuntu, and both cause problems.

Revision history for this message
Tamás Nepusz (ntamas) wrote :

Hi Michael,
Thanks for the detailed bug report, I will take a closer look at this issue tomorrow and try to come up with a fix.

Changed in igraph:
status: New → Confirmed
Revision history for this message
Michael Bommarito (mjbommar) wrote :

Hello Tamas,
  Thanks! I have the jobs running on 0.5.2, so no rush (for me).

  Do let me know if you'd like me to test a patch against one of the architecture I mentioned above.

  I should also mention that I have a random model that seems to be sampling from a space that is dense with these problem matrices, so I can make sure we didn't just fix a special case.

Revision history for this message
Tamás Nepusz (ntamas) wrote :

Okay, so here's what I figured out so far:

1. The original bug report (unknown ARPACK error) is caused by incorrect error handling in igraph_i_arpack_err_dsaupd. Practically, error codes 1 and 3 were not handled properly. The second call to Graph.leading_eigenvector_community() resulted in a "maximum number of iterations reached" error message which was then turned to an unknown ARPACK error.

2. Having fixed the error handler, I tried the test case in #7 with the 4-star graph. With the fixed error handler, I got an error message complaining about NCV (the number of Lanczos vectors to be generated) is too small. The ARPACK documentation says the following: "At present there is no a-priori analysis to guide the selection of NCV relative to NEV. The only formal requirement is that NCV > NEV. However, it is recommended that NCV .ge. 2*NEV.". Since NEV is 1 (we need the leading eigenvector only), NCV was set to 3 per default. I increased NCV to 4 and it seems to have fixed the problem. I also checked other C code snippets that use ARPACK in other projects, and the general rule of thumb seems to set NCV to 4 * NEV and threshold it with the number of rows in the input matrix from above.

3. The inconsistency wrt the obtained results in repeated calls of community_leading_eigenvector() stems from the fact that the top eigenvalue of the modularity matrix is zero for these graphs if I calculated it right. Due to rounding errors and such, the actual eigenvalue calculated by ARPACK can be slightly larger than zero depending on the random initial starting vector used. igraph uses a hard threshold at zero, i.e. if the largest eigenvalue is less than or equal to zero, the community splitting process is stopped. If it is slightly larger than zero, the process will continue. If I increase the threshold to 1e-5 in the code, I always get the same results for star graphs, but I'm not sure that there isn't a better solution.

I'm attaching a patch that should fix the above mentioned issues. Can you please check it on your platform(s)? The patch is against the head revision of the igraph trunk, but it should work against the 0.5 branch as well.

Revision history for this message
Michael Bommarito (mjbommar) wrote :

Hi Tomas,
  Ok, I applied the patch against linux 64-bit and it now passes the break.py attachment I uploaded earlier (as well as the 4-star as you tested).

  I decided to test this against the random model that first produced the bug report. I get through about 100-200 random graphs before an error now, which is much better than the 5-10 I was seeing prior to the patch.

Traceback (most recent call last):
  File "model1.py", line 91, in <module>
    m.run(50)
  File "model1.py", line 65, in run
    communityEV[self.time] = gg.community_leading_eigenvector().membership
  File "/usr/lib/python2.6/dist-packages/igraph/__init__.py", line 540, in community_leading_eigenvector
    cl, merges = GraphBase.community_leading_eigenvector(self, clusters, return_merges)
igraph.core.InternalError: Error at community.c:1185: , Maximum number of iterations reached

Undirected graph (|V| = 10, |E| = 15)
[(0, 1), (0, 2), (0, 3), (0, 6), (1, 6), (1, 7), (2, 4), (2, 5), (2, 7), (3, 7), (3, 9), (4, 6), (4, 7), (5, 6), (5, 8)]

So it looks much better. I might just catch these particularly nasty graphs and accept some sampling bias in my model.

In case you're curious, I'll dump a few of these. They almost all appear for early time steps for slowly growing graphs:
[(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (4, 0), (4, 1), (5, 1), (5, 0)]
[(1, 0), (2, 0), (2, 1), (3, 1), (3, 0), (4, 1), (4, 0)]
[(1, 0), (2, 0), (3, 0), (4, 2), (4, 0), (4, 3), (4, 1)]
[(1, 0), (2, 0), (3, 1), (3, 0), (3, 2), (4, 2), (4, 1), (4, 0)]
[(1, 0), (2, 0), (3, 1), (3, 2), (4, 1), (4, 2), (4, 3), (4, 0)]
[(1, 0), (2, 1), (2, 0), (3, 0), (3, 1), (4, 0), (4, 1)]
[(1, 0), (2, 1), (2, 0), (3, 0), (3, 1), (4, 0), (4, 2), (4, 3)]
[(1, 0), (2, 1), (2, 0), (3, 0), (3, 1), (4, 1), (4, 0)]
[(1, 0), (2, 1), (2, 0), (3, 0), (3, 2), (4, 0), (4, 2)]

Revision history for this message
Michael Bommarito (mjbommar) wrote :

Alright, tried this on linux 32-bit now too. The patch has the same effect as on 64-bit.

I did realize that there are actually two exceptions being thrown depending on the graph:

igraph.core.InternalError: Error at community.c:1185: , No shifts could be applied during a cycle of the Implicitly restarted Arnoldi iteration. One possibility is to increase the size of NCV relative to NEV

igraph.core.InternalError: Error at community.c:1185: , Maximum number of iterations reached

I followed the advice of the first exception and bumped NCV to 5. The code has now run for 500 graphs without either exception being thrown. It is somewhat slower, however.

Revision history for this message
Michael Bommarito (mjbommar) wrote :

Hm, turns out I can still find certain parameters that cause lots of problems.

I've written a very very simplified version of my model and attached it. It does require numpy and igraph, but I'm assuming you have numpy too...

I get a failure about 1/2 or 1/3 times I run the script.

Revision history for this message
Tamás Nepusz (ntamas) wrote :

I have a feeling that you get failures when the graph contains isolated vertices (or maybe small isolated components, though I'm not sure about the latter). An isolated vertex results in a zero eigenvalue, and its corresponding eigenvector will contain zeroes only except for the single component that corresponds to the isolated vertex itself. I believe that eigenvalues close to zero take a long time to converge using ARPACK, so when the splitting process reaches the stage when the zero eigenvalue becomes the largest, ARPACK will sometimes exceed the number of iterations allowed. You can easily set a larger limit, though:

>>> igraph.arpack_options.mxiter = 100000

This usually works for me. The default is only 3000, and it was chosen quite arbitrarily, so maybe we should simply increase it in the next release. (Note that the leading eigenvector method uses multiple ARPACK invocations, so checking igraph.arpack_options.iter after a successful run of Graph.community_leading_eigenvector() will tell you the number of iterations for the last invocation only, which can easily be lower than the peak).

I think an even more sophisticated solution that we should consider is to add a pre-processing step before invoking ARPACK that decomposes the original graph to connected components, calls the community detection method on each of the components and merges the results in the end.

On the other error message related to NEV and NCV: yes, a single Arnoldi iteration takes O(n*NCV*NCV) time, that's why the algorithm is slower when NCV is larger, and that's why we try to keep NCV low. I'm not sure what we should do with this issue; I'm reluctant to raise NCV any further as a lower value is satisfactory for most problem instances. It might also happen that removing isolated vertices as a pre-processing step also resolves the NCV issue.

Revision history for this message
Tamás Nepusz (ntamas) wrote :

Revision #1624 contains some stability improvements to igraph_community_leading_eigenvector{_naive} as follows:

- The stopping threshold for the leading eigenvalue is 0.000001 instead of strictly zero -- this eliminates the issues with non-deterministic results in case of practically indivisible graphs (e.g., small star graphs)

- The connected components of the graph are determined as a preprocessing step in igraph_community_leading_eigenvector{_naive} using igraph_clusters(). Each cluster is then subdivided further using the eigenvector method. This eliminates the issues with graphs containing isolated vertices or many small components. (The attached simpleModel.py now runs without error messages using the default ARPACK iteration theshold, which is only 3000).

However, the graphs in comment #12 still don't work really well. They can also be considered practically indivisible, and a common property of them is that the top eigenvalue is zero. (In some cases, the multiplicity of the top eigenvalue is greater than 1). I guess ARPACK struggles to find the corresponding eigenvector within machine precision.

Revision history for this message
Gábor Csárdi (gabor.csardi) wrote : Continue on github

The development of igraph has moved to github, so please do not comment on this bug here. You are of course welcome to comment on github, here:
https://github.com/igraph/igraph/issues/399

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.