Forest Fire game has crucial error

Bug #715536 reported by Mayank Lahiri
2
Affects Status Importance Assigned to Milestone
igraph
Fix Released
Medium
Gábor Csárdi

Bug Description

The current implementation of the Forest Fire network generation game has a subtle bug that results in graphs that are *significantly* denser than what the original research paper proposed. I found it impossible to reproduce the results in the research paper [1], so I compared the igraph implementation with the one in the SNAP network library (which contains a version written presumably by the author of [1]).

The bug is as follows: in the main game loop, although the original paper says that the two generated numbers x and y are geometrically distributed, they are actually geometrically distributed MINUS ONE. In other words, the SNAP implementation allows 0 to be chosen for the number of links to burn at *each iteration* with the highest probability, rather than 1. This means that fires burn *much* longer than they are intended to, and the resultant graphs are much denser than intended.

I have verified that the simple "-1" factor generates graphs with the same quantitative trends as in the paper [1].

The patch is as follows to the trunk of igraph-0.6: lines 199 and 200 of src/forestfire.c should be changed from:

long int neis_out=RNG_GEOM(param_geom_out);
long int neis_in=RNG_GEOM(param_geom_in);

to:

long int neis_out=RNG_GEOM(param_geom_out) - 1;
long int neis_in=RNG_GEOM(param_geom_in) - 1;

For verification, the corresponding lines in the SNAP graph library source are lines 110 and 128 in snap/ff.cpp

References:
[1] "Graphs over time: densification laws, shrinking diameters and possible explanations", Leskovec et al, TKDD 2007.

Also:
The igraph implementation appears to be using a Poisson approximation to a binomial distribution instead of a geometric variate. Any reason for doing this instead of generating a random geometric variate directly?

Mayank Lahiri (mlahiri)
description: updated
Revision history for this message
Gábor Csárdi (gabor.csardi) wrote :

Thanks for pointing this out. At the time the paper came out, SNAP did not exist (or at least I didn't know about it) and I remember that I had could not manage to replicate the exact results, but since the algorithm was correctly implemented, as in the paper, and the results were qualitatively more or less OK (for slightly different parameters), there wasn't much I could do.

Actually the algorithm is not even the same in the different versions of the paper, see these:
http://cs.stanford.edu/people/jure/pubs/powergrowth-kdd05.pdf
http://www.cs.cornell.edu/home/kleinber/kdd05-time.pdf
and I remember a third version that I cannot find now. The version I implemented gave more or less OK results.

I'll correct this, and add an argument that chooses between the two versions.

As for the Poisson approximation, the code is coming from the R internals and based on this book:
Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag, New York. Page 480.

Changed in igraph:
importance: Undecided → Medium
status: New → Confirmed
assignee: nobody → Gábor Csárdi (gabor.csardi)
Revision history for this message
Mayank Lahiri (mlahiri) wrote : Re: [Bug 715536] Re: Forest Fire game has crucial error
Download full text (4.7 KiB)

Thank you for the quick response. I've had some trouble implementing the
algorithm from the paper's description as well.

I just noticed one more thing about the random variates: the KDD05 paper
says that the two variates are drawn from a binomial distribution (which
explains the Poisson approximation), but the TKDD07 journal paper says that
the two variates are drawn from a geometric distribution. The SNAP code uses
the geometric distribution, and in fact a method from the same Devroye book
you mentioned, which is essentially this:

geom(p):
    if(p == 1.0)
        return 1;
    return ceil(log(1.0 - runif()) / log(1.0-p));

where runif() is uniform random in [0,1) and p is in (0,1].

If you're putting in a separate option for forest fire, perhaps this could
be included as a change as well. I was only able to reproduce results from
the paper with the "-1" factor and the geometric variates as above.

The net result of the geometric distribution and the lack of "-1" in the
paper's description is that whenever a new node can link to at least one new
vertex during each stage of a forest fire, it will.

Thank you again for all the effort put into developing igraph. We have been
using it at our lab for the last 4 years and it has been wonderful,
especially the choice of C/Python/R interfaces.

-Mayank

2011/2/8 Gábor Csárdi <email address hidden>

> Thanks for pointing this out. At the time the paper came out, SNAP did
> not exist (or at least I didn't know about it) and I remember that I had
> could not manage to replicate the exact results, but since the algorithm
> was correctly implemented, as in the paper, and the results were
> qualitatively more or less OK (for slightly different parameters), there
> wasn't much I could do.
>
> Actually the algorithm is not even the same in the different versions of
> the paper, see these:
> http://cs.stanford.edu/people/jure/pubs/powergrowth-kdd05.pdf
> http://www.cs.cornell.edu/home/kleinber/kdd05-time.pdf
> and I remember a third version that I cannot find now. The version I
> implemented gave more or less OK results.
>
> I'll correct this, and add an argument that chooses between the two
> versions.
>
> As for the Poisson approximation, the code is coming from the R internals
> and based on this book:
> Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag,
> New York. Page 480.
>
> ** Changed in: igraph
> Importance: Undecided => Medium
>
> ** Changed in: igraph
> Status: New => Confirmed
>
> ** Changed in: igraph
> Assignee: (unassigned) => Gábor Csárdi (gabor.csardi)
>
> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/715536
>
> Title:
> Forest Fire game has crucial error
>
> Status in The igraph library:
> Confirmed
>
> Bug description:
> The current implementation of the Forest Fire network generation game
> has a subtle bug that results in graphs that are *significantly*
> denser than what the original research paper proposed. I found it
> impossible to reproduce the results in the research paper [1], so I
> compared the igraph implementation with the one in the SNAP network
> l...

Read more...

Revision history for this message
Gábor Csárdi (gabor.csardi) wrote :

It turns out, something else was the problem. See http://en.wikipedia.org/wiki/Geometric_distribution and the two kinds of geometric distribution. Your RNG samples from the k=1,2,3,... kind, the one in igraph samples from the k=0,1,2,3,... kind. So there is no need to subtract one.

There was, however a bug in the code, that I have fixed in revision #2335 (0.6-main).

Thanks again for the report.

Changed in igraph:
status: Confirmed → Fix Released
Revision history for this message
Mayank Lahiri (mlahiri) wrote :

Thanks, that seems to have done the trick!
-Mayank

2011/2/13 Gábor Csárdi <email address hidden>

> It turns out, something else was the problem. See
> http://en.wikipedia.org/wiki/Geometric_distribution and the two kinds of
> geometric distribution. Your RNG samples from the k=1,2,3,... kind, the
> one in igraph samples from the k=0,1,2,3,... kind. So there is no need
> to subtract one.
>
> There was, however a bug in the code, that I have fixed in revision
> #2335 (0.6-main).
>
> Thanks again for the report.
>
> ** Changed in: igraph
> Status: Confirmed => Fix Released
>
> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/715536
>
> Title:
> Forest Fire game has crucial error
>
> Status in The igraph library:
> Fix Released
>
> Bug description:
> The current implementation of the Forest Fire network generation game
> has a subtle bug that results in graphs that are *significantly*
> denser than what the original research paper proposed. I found it
> impossible to reproduce the results in the research paper [1], so I
> compared the igraph implementation with the one in the SNAP network
> library (which contains a version written presumably by the author of
> [1]).
>
> The bug is as follows: in the main game loop, although the original
> paper says that the two generated numbers x and y are geometrically
> distributed, they are actually geometrically distributed MINUS ONE. In
> other words, the SNAP implementation allows 0 to be chosen for the
> number of links to burn at *each iteration* with the highest
> probability, rather than 1. This means that fires burn *much* longer
> than they are intended to, and the resultant graphs are much denser
> than intended.
>
> I have verified that the simple "-1" factor generates graphs with the
> same quantitative trends as in the paper [1].
>
> The patch is as follows to the trunk of igraph-0.6: lines 199 and 200
> of src/forestfire.c should be changed from:
>
> long int neis_out=RNG_GEOM(param_geom_out);
> long int neis_in=RNG_GEOM(param_geom_in);
>
> to:
>
> long int neis_out=RNG_GEOM(param_geom_out) - 1;
> long int neis_in=RNG_GEOM(param_geom_in) - 1;
>
> For verification, the corresponding lines in the SNAP graph library
> source are lines 110 and 128 in snap/ff.cpp
>
> References:
> [1] "Graphs over time: densification laws, shrinking diameters and
> possible explanations", Leskovec et al, TKDD 2007.
>
> Also:
> The igraph implementation appears to be using a Poisson approximation to
> a binomial distribution instead of a geometric variate. Any reason for doing
> this instead of generating a random geometric variate directly?
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/igraph/+bug/715536/+subscribe
>

Revision history for this message
Gábor Csárdi (gabor.csardi) wrote : Continue on github

The development of igraph has moved to github, so please do not comment on this bug here. You are of course welcome to comment on github, here:
https://github.com/igraph/igraph/issues/212

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.