Bug #715536 “Forest Fire game has crucial error” : Bugs : igraph

Mayank Lahiri (mlahiri) on 2011-02-09

description:

updated

Revision history for this message

Gábor Csárdi (gabor.csardi) wrote on 2011-02-09:

#1

Thanks for pointing this out. At the time the paper came out, SNAP did not exist (or at least I didn't know about it) and I remember that I had could not manage to replicate the exact results, but since the algorithm was correctly implemented, as in the paper, and the results were qualitatively more or less OK (for slightly different parameters), there wasn't much I could do.

Actually the algorithm is not even the same in the different versions of the paper, see these:
http://cs.stanford.edu/people/jure/pubs/powergrowth-kdd05.pdf
http://www.cs.cornell.edu/home/kleinber/kdd05-time.pdf
and I remember a third version that I cannot find now. The version I implemented gave more or less OK results.

I'll correct this, and add an argument that chooses between the two versions.

As for the Poisson approximation, the code is coming from the R internals and based on this book:
Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag, New York. Page 480.

Changed in igraph:
importance:	Undecided → Medium
status:	New → Confirmed
assignee:	nobody → Gábor Csárdi (gabor.csardi)

Revision history for this message

Mayank Lahiri (mlahiri) wrote on 2011-02-09: Re: [Bug 715536] Re: Forest Fire game has crucial error

#2

Download full text (4.7 KiB)

Thank you for the quick response. I've had some trouble implementing the
algorithm from the paper's description as well.

I just noticed one more thing about the random variates: the KDD05 paper
says that the two variates are drawn from a binomial distribution (which
explains the Poisson approximation), but the TKDD07 journal paper says that
the two variates are drawn from a geometric distribution. The SNAP code uses
the geometric distribution, and in fact a method from the same Devroye book
you mentioned, which is essentially this:

geom(p):
    if(p == 1.0)
        return 1;
    return ceil(log(1.0 - runif()) / log(1.0-p));

where runif() is uniform random in [0,1) and p is in (0,1].

If you're putting in a separate option for forest fire, perhaps this could
be included as a change as well. I was only able to reproduce results from
the paper with the "-1" factor and the geometric variates as above.

The net result of the geometric distribution and the lack of "-1" in the
paper's description is that whenever a new node can link to at least one new
vertex during each stage of a forest fire, it will.

Thank you again for all the effort put into developing igraph. We have been
using it at our lab for the last 4 years and it has been wonderful,
especially the choice of C/Python/R interfaces.

-Mayank

2011/2/8 Gábor Csárdi <email address hidden>

> Thanks for pointing this out. At the time the paper came out, SNAP did
> not exist (or at least I didn't know about it) and I remember that I had
> could not manage to replicate the exact results, but since the algorithm
> was correctly implemented, as in the paper, and the results were
> qualitatively more or less OK (for slightly different parameters), there
> wasn't much I could do.
>
> Actually the algorithm is not even the same in the different versions of
> the paper, see these:
> http://cs.stanford.edu/people/jure/pubs/powergrowth-kdd05.pdf
> http://www.cs.cornell.edu/home/kleinber/kdd05-time.pdf
> and I remember a third version that I cannot find now. The version I
> implemented gave more or less OK results.
>
> I'll correct this, and add an argument that chooses between the two
> versions.
>
> As for the Poisson approximation, the code is coming from the R internals
> and based on this book:
> Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag,
> New York. Page 480.
>
> ** Changed in: igraph
> Importance: Undecided => Medium
>
> ** Changed in: igraph
> Status: New => Confirmed
>
> ** Changed in: igraph
> Assignee: (unassigned) => Gábor Csárdi (gabor.csardi)
>
> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/715536
>
> Title:
> Forest Fire game has crucial error
>
> Status in The igraph library:
> Confirmed
>
> Bug description:
> The current implementation of the Forest Fire network generation game
> has a subtle bug that results in graphs that are *significantly*
> denser than what the original research paper proposed. I found it
> impossible to reproduce the results in the research paper [1], so I
> compared the igraph implementation with the one in the SNAP network
> l...

Thank you for the quick response. I've had some trouble implementing the
algorithm from the paper's description as well.

I just noticed one more thing about the random variates: the KDD05 paper
says that the two variates are drawn from a binomial distribution (which
explains the Poisson approximation), but the TKDD07 journal paper says that
the two variates are drawn from a geometric distribution. The SNAP code uses
the geometric distribution, and in fact a method from the same Devroye book
you mentioned, which is essentially this:

geom(p):
    if(p == 1.0)
        return 1;
    return ceil(log(1.0 - runif()) / log(1.0-p));

where runif() is uniform random in [0,1) and p is in (0,1].

If you're putting in a separate option for forest fire, perhaps this could
be included as a change as well. I was only able to reproduce results from
the paper with the "-1" factor and the geometric variates as above.

The net result of the geometric distribution and the lack of "-1" in the
paper's description is that whenever a new node can link to at least one new
vertex during each stage of a forest fire, it will.

Thank you again for all the effort put into developing igraph. We have been
using it at our lab for the last 4 years and it has been wonderful,
especially the choice of C/Python/R interfaces.

-Mayank

2011/2/8 Gábor Csárdi <csardi.gabor@gmail.com>

> Thanks for pointing this out. At the time the paper came out, SNAP did
> not exist (or at least I didn't know about it) and I remember that I had
> could not manage to replicate the exact results, but since the algorithm
> was correctly implemented, as in the paper, and the results were
> qualitatively more or less OK (for slightly different parameters), there
> wasn't much I could do.
>
> Actually the algorithm is not even the same in the different versions of
> the paper, see these:
> http://cs.stanford.edu/people/jure/pubs/powergrowth-kdd05.pdf
> http://www.cs.cornell.edu/home/kleinber/kdd05-time.pdf
> and I remember a third version that I cannot find now. The version I
> implemented gave more or less OK results.
>
> I'll correct this, and add an argument that chooses between the two
> versions.
>
> As for the Poisson approximation, the code is coming from the R internals
> and based on this book:
> Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag,
> New York. Page 480.
>
> ** Changed in: igraph
>   Importance: Undecided => Medium
>
> ** Changed in: igraph
>       Status: New => Confirmed
>
> ** Changed in: igraph
>     Assignee: (unassigned) => Gábor Csárdi (gabor.csardi)
>
> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/715536
>
> Title:
>  Forest Fire game has crucial error
>
> Status in The igraph library:
>   Confirmed
>
> Bug description:
>  The current implementation of the Forest Fire network generation game
>  has a subtle bug that results in graphs that are *significantly*
>  denser than what the original research paper proposed. I found it
>  impossible to reproduce the results in the research paper [1], so I
>  compared the igraph implementation with the one in the SNAP network
>  library (which contains a version written presumably by the author of
>  [1]).
>
>  The bug is as follows: in the main game loop, although the original
>  paper says that the two generated numbers x and y are geometrically
>  distributed, they are actually geometrically distributed MINUS ONE. In
>  other words, the SNAP implementation allows 0 to be chosen for the
>  number of links to burn at *each iteration* with the highest
>  probability, rather than 1. This means that fires burn *much* longer
>  than they are intended to, and the resultant graphs are much denser
>  than intended.
>
>  I have verified that the simple "-1" factor generates graphs with the
>  same quantitative trends as in the paper [1].
>
>  The patch is as follows to the trunk of igraph-0.6: lines 199 and 200
>  of src/forestfire.c should be changed from:
>
>  long int neis_out=RNG_GEOM(param_geom_out);
>  long int neis_in=RNG_GEOM(param_geom_in);
>
>  to:
>
>  long int neis_out=RNG_GEOM(param_geom_out) - 1;
>  long int neis_in=RNG_GEOM(param_geom_in) - 1;
>
>  For verification, the corresponding lines in the SNAP graph library
>  source are lines 110 and 128 in snap/ff.cpp
>
>  References:
>  [1] "Graphs over time: densification laws, shrinking diameters and
> possible explanations", Leskovec et al, TKDD 2007.
>
>  Also:
>   The igraph implementation appears to be using a Poisson approximation to
> a binomial distribution instead of a geometric variate. Any reason for doing
> this instead of generating a random geometric variate directly?
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/igraph/+bug/715536/+subscribe
>

Revision history for this message

Gábor Csárdi (gabor.csardi) wrote on 2011-02-13:

#3

It turns out, something else was the problem. See http://en.wikipedia.org/wiki/Geometric_distribution and the two kinds of geometric distribution. Your RNG samples from the k=1,2,3,... kind, the one in igraph samples from the k=0,1,2,3,... kind. So there is no need to subtract one.

There was, however a bug in the code, that I have fixed in revision #2335 (0.6-main).

Thanks again for the report.

Changed in igraph:
status:	Confirmed → Fix Released

Revision history for this message

Mayank Lahiri (mlahiri) wrote on 2011-02-13:

#4

Thanks, that seems to have done the trick!
-Mayank

2011/2/13 Gábor Csárdi <email address hidden>

> It turns out, something else was the problem. See
> http://en.wikipedia.org/wiki/Geometric_distribution and the two kinds of
> geometric distribution. Your RNG samples from the k=1,2,3,... kind, the
> one in igraph samples from the k=0,1,2,3,... kind. So there is no need
> to subtract one.
>
> There was, however a bug in the code, that I have fixed in revision
> #2335 (0.6-main).
>
> Thanks again for the report.
>
> ** Changed in: igraph
> Status: Confirmed => Fix Released
>
> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/715536
>
> Title:
> Forest Fire game has crucial error
>
> Status in The igraph library:
> Fix Released
>
> Bug description:
> The current implementation of the Forest Fire network generation game
> has a subtle bug that results in graphs that are *significantly*
> denser than what the original research paper proposed. I found it
> impossible to reproduce the results in the research paper [1], so I
> compared the igraph implementation with the one in the SNAP network
> library (which contains a version written presumably by the author of
> [1]).
>
> The bug is as follows: in the main game loop, although the original
> paper says that the two generated numbers x and y are geometrically
> distributed, they are actually geometrically distributed MINUS ONE. In
> other words, the SNAP implementation allows 0 to be chosen for the
> number of links to burn at *each iteration* with the highest
> probability, rather than 1. This means that fires burn *much* longer
> than they are intended to, and the resultant graphs are much denser
> than intended.
>
> I have verified that the simple "-1" factor generates graphs with the
> same quantitative trends as in the paper [1].
>
> The patch is as follows to the trunk of igraph-0.6: lines 199 and 200
> of src/forestfire.c should be changed from:
>
> long int neis_out=RNG_GEOM(param_geom_out);
> long int neis_in=RNG_GEOM(param_geom_in);
>
> to:
>
> long int neis_out=RNG_GEOM(param_geom_out) - 1;
> long int neis_in=RNG_GEOM(param_geom_in) - 1;
>
> For verification, the corresponding lines in the SNAP graph library
> source are lines 110 and 128 in snap/ff.cpp
>
> References:
> [1] "Graphs over time: densification laws, shrinking diameters and
> possible explanations", Leskovec et al, TKDD 2007.
>
> Also:
> The igraph implementation appears to be using a Poisson approximation to
> a binomial distribution instead of a geometric variate. Any reason for doing
> this instead of generating a random geometric variate directly?
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/igraph/+bug/715536/+subscribe
>

Thanks, that seems to have done the trick!
-Mayank

2011/2/13 Gábor Csárdi <csardi.gabor@gmail.com>

> It turns out, something else was the problem. See
> http://en.wikipedia.org/wiki/Geometric_distribution and the two kinds of
> geometric distribution. Your RNG samples from the k=1,2,3,... kind, the
> one in igraph samples from the k=0,1,2,3,... kind. So there is no need
> to subtract one.
>
> There was, however a bug in the code, that I have fixed in revision
> #2335 (0.6-main).
>
> Thanks again for the report.
>
> ** Changed in: igraph
>       Status: Confirmed => Fix Released
>
> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/715536
>
> Title:
>  Forest Fire game has crucial error
>
> Status in The igraph library:
>   Fix Released
>
> Bug description:
>  The current implementation of the Forest Fire network generation game
>  has a subtle bug that results in graphs that are *significantly*
>  denser than what the original research paper proposed. I found it
>  impossible to reproduce the results in the research paper [1], so I
>  compared the igraph implementation with the one in the SNAP network
>  library (which contains a version written presumably by the author of
>  [1]).
>
>  The bug is as follows: in the main game loop, although the original
>  paper says that the two generated numbers x and y are geometrically
>  distributed, they are actually geometrically distributed MINUS ONE. In
>  other words, the SNAP implementation allows 0 to be chosen for the
>  number of links to burn at *each iteration* with the highest
>  probability, rather than 1. This means that fires burn *much* longer
>  than they are intended to, and the resultant graphs are much denser
>  than intended.
>
>  I have verified that the simple "-1" factor generates graphs with the
>  same quantitative trends as in the paper [1].
>
>  The patch is as follows to the trunk of igraph-0.6: lines 199 and 200
>  of src/forestfire.c should be changed from:
>
>  long int neis_out=RNG_GEOM(param_geom_out);
>  long int neis_in=RNG_GEOM(param_geom_in);
>
>  to:
>
>  long int neis_out=RNG_GEOM(param_geom_out) - 1;
>  long int neis_in=RNG_GEOM(param_geom_in) - 1;
>
>  For verification, the corresponding lines in the SNAP graph library
>  source are lines 110 and 128 in snap/ff.cpp
>
>  References:
>  [1] "Graphs over time: densification laws, shrinking diameters and
> possible explanations", Leskovec et al, TKDD 2007.
>
>  Also:
>   The igraph implementation appears to be using a Poisson approximation to
> a binomial distribution instead of a geometric variate. Any reason for doing
> this instead of generating a random geometric variate directly?
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/igraph/+bug/715536/+subscribe
>

Revision history for this message

Gábor Csárdi (gabor.csardi) wrote on 2013-05-06: Continue on github

#5

The development of igraph has moved to github, so please do not comment on this bug here. You are of course welcome to comment on github, here:
https://github.com/igraph/igraph/issues/212

igraph

Forest Fire game has crucial error

Bug Description

Related branches

Other bug subscribers

Remote bug watches