Missing diagonals after subdomain assembly

Bug #927539 reported by Joachim Haga
This bug affects 2 people.

Affects: DOLFIN | Status: Fix Released | Importance: Medium | Assigned to: Unassigned

Bug Description

The matrix produced by subdomain assembly seems not to (always) include diagonal entries. PETSc says:

[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Object is in wrong state!
[0]PETSC ERROR: Matrix is missing diagonal entry in row 124!
[0]PETSC ERROR: ------------------------------------------------------------------------

I don't use this functionality, so maybe this is fine ("user error"). But since PETSc apparently considers a matrix with missing diagonals an error, DOLFIN should probably not produce them.

Reproduction: put this at the end of the method test_subdomain_assembly_meshdomains() in test/unit/fem/python/SystemAssembler.py. Notice that no boundary is active, so the problem isn't just that we're missing a diagonal that we actually need for the DirichletBC.

        A = assemble(a0)
        def boundary(x): return False
        bc = DirichletBC(V, Constant(0.0), boundary)
        bc.apply(A)


Revision history for this message
Anders Logg (logg) wrote : Re: [Bug 927539] [NEW] Missing diagonals after subdomain assembly

I'm not sure this is a bug. There's no way we can prevent users from
assembling matrices with missing diagonal entries. Consider the
following example:

c = Expression("0.0")
A = assemble(Constant(c*dot(grad(u), grad(v))*dx)

That will give A = 0, which has all diagonal entries zero.

--
Anders


Anders Logg (logg)
Changed in dolfin:
status: New → Invalid
Revision history for this message
Joachim Haga (jobh) wrote :

Your example doesn't make (syntactic) sense, but to be clear: there is no problem with 0.0 on the diagonal. The problem is missing diagonal entries, i.e. diagonals that are not present in the sparsity pattern. And it should be possible to ensure that this doesn't happen, no?
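The distinction is easy to model. In the toy sketch below (plain Python, with a dict standing in for the sparse matrix; none of these names are DOLFIN or PETSc API), a stored 0.0 on the diagonal is harmless, while a diagonal absent from the stored pattern is the state PETSc rejects:

```python
def missing_diagonals(entries, n):
    """Rows of an n x n matrix whose diagonal is absent from the pattern."""
    return [i for i in range(n) if (i, i) not in entries]

# Row 1 stores an explicit 0.0 on its diagonal: present in the pattern.
# Row 2 has an off-diagonal entry only: its diagonal is missing.
A = {(0, 0): 4.0, (1, 1): 0.0, (1, 0): -1.0, (2, 1): -1.0}

print(missing_diagonals(A, 3))  # [2]: only row 2 is problematic
```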

Revision history for this message
Anders Logg (logg) wrote : Re: [Bug 927539] Re: Missing diagonals after subdomain assembly

On Mon, Feb 06, 2012 at 12:46:05PM -0000, Joachim Haga wrote:
> Your example doesn't make (syntactical) sense, but to be clear:

It should be

c = Expression("0.0")
A = assemble(c*dot(grad(u), grad(v))*dx)

> There is no problem with 0.0 on the diagonal. The problem is missing
> diagonal entries, ie. diagonals that are not present in the sparsity
> pattern. And it should be possible to ensure that this doesn't
> happen, no?

OK, now I see your point. Yes, it seems reasonable to ensure that we
always have an entry on the diagonal in the sparsity pattern, possibly
in combination with a global option to turn it off.

--
Anders

Changed in dolfin:
status: Invalid → Opinion
Revision history for this message
Cian Wilson (cwilson) wrote :

I end up with no entry on my diagonal all the time when using assembly over subdomains. I've been using ident_zeros to get around this, but find that the first call is painfully slow, which I suspect (though I haven't looked into it) is due to expanding the size of the sparsity after its initial allocation. If it were possible to have an option to leave the diagonal entries in the sparsity on the first assembly, that would be great!
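What ident_zeros does can be sketched in miniature (a dict-based toy, not the real DOLFIN call; all names here are illustrative): rows with no stored entries get a 1.0 on the diagonal, and with a real PETSc matrix that insertion falls outside the preallocated sparsity, which is the plausible source of the slow first call:

```python
def ident_zeros(entries, n):
    """Put 1.0 on the diagonal of rows with no stored entries (toy model)."""
    stored_rows = {i for (i, _) in entries}
    for i in range(n):
        if i not in stored_rows:
            # In real PETSc this is a brand-new nonzero, allocated late.
            entries[(i, i)] = 1.0

A = {(0, 0): 2.0, (0, 1): -1.0}  # rows 1 and 2 were never touched
ident_zeros(A, 3)
print(A[(1, 1)], A[(2, 2)])  # 1.0 1.0
```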

Anders Logg (logg)
Changed in dolfin:
importance: Undecided → Medium
milestone: none → 1.1.0
status: Opinion → Confirmed
Revision history for this message
Cian Wilson (cwilson) wrote :

This appears to have become a fatal (or at least very verbose) error in petsc-dev, so I've locally implemented a very simple and naive workaround (see linked branch). I would appreciate some feedback on it to make it mergeable.

In particular:
 - this now happens automatically when using the standard assembler; should it be optional as suggested above, and if so, where would that option go (subroutine parameter or global)?
 - I've only modified the standard assembler and not the OpenMP assembler; should it go there too?
 - as this bug only affects the PETSc backend, should it only happen when using PETSc?
 - any other comments/suggestions?

Run through of dolfin tests for linked branch available at:
http://diana.apam.columbia.edu:8010/builders/dolfin-x86-64-diag/builds/0/steps/shell_4/logs/stdio

Revision history for this message
Joachim Haga (jobh) wrote :

I haven't looked at the changes, just a comment. I don't think "dont_fail"
needs to be a parameter. ;-)

Revision history for this message
Cian Wilson (cwilson) wrote :

** Patch added: "diag.diff"
   https://bugs.launchpad.net/dolfin/+bug/927539/+attachment/3137879/+files/diag.diff
Revision history for this message
Cian Wilson (cwilson) wrote :

"dont_fail" would be a great parameter for every subroutine ;-) but no, that's not exactly what I was asking/suggesting.

The quick patch (now attached) that I implemented to save myself from this locally just automatically puts 0.0 in the diagonal entries if you're assembling a square matrix, to prevent these entries from being collapsed out of the sparsity at the end of assembly. I was asking whether this behaviour should be controlled by a parameter/option, as perhaps there are situations where this isn't an appropriate thing to do.
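The idea behind the patch can be illustrated with a minimal sketch (plain Python; assemble_square and the dict representation are invented for illustration, not the patch's actual code): writing an explicit 0.0 into each diagonal slot of a square matrix means finalisation has nothing to collapse there:

```python
def assemble_square(n, writes, keep_diagonal=False):
    """Toy assembly: only written positions survive in the final pattern."""
    entries = dict(writes)
    if keep_diagonal:
        for i in range(n):
            entries.setdefault((i, i), 0.0)  # explicit zero keeps the slot
    return entries

writes = {(0, 1): -1.0}  # subdomain assembly touched one entry only
A = assemble_square(3, writes)
B = assemble_square(3, writes, keep_diagonal=True)
print((1, 1) in A, (1, 1) in B)  # False True
```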

Revision history for this message
Garth Wells (garth-wells) wrote :

I don't like the look of this. It's mixing a linear algebra issue
(PETSc bug?) with the assembler. Moreover, it will be horrendously
slow if the diagonal has not been allocated during the initialisation
of the matrix.

There is a suggestion in the thread to make the sparsity pattern
contain the diagonal entry, which seems a reasonable option, but there
is no answer on whether this solves the problem. Would it solve the
problem?

Garth

On 9 May 2012 06:05, Cian Wilson <email address hidden> wrote:
> ** Patch added: "diag.diff"
>   https://bugs.launchpad.net/dolfin/+bug/927539/+attachment/3137879/+files/diag.diff

Revision history for this message
Cian Wilson (cwilson) wrote :

Hi Garth,

> I don't like the look of this. It's mixing a linear algebra issue
> (PETSc bug?) with the assembler. Moreover, it will be horrendously
> slow if the diagonal has not been allocated during the initialisation
> of the matrix.
I agree. I was suggesting it more as a way of prompting better
suggestions as this has become a major issue for me with petsc-dev.

> There is a suggestion in the thread to make the sparsity pattern
> contain the diagonal entry, which seems a reasonable option, but there
> is not answer on whether this solves the problem. Would it solve the
> problem?
In all the examples I've looked at, the sparsity returned by
AssemblerTools::init_global_tensor *does* contain the diagonal entry.
The problem arises when the tensor is finalized and entries in the
sparsity which haven't been used are collapsed out. Hence the PETSc
error about the matrix being in the wrong state and missing the diagonal,
and also why ident_zeros is painfully slow (and in petsc-dev produces
lots of very verbose errors).

Looking at the code again, a possible alternative way to get around this
might be to set finalize_tensor to false and finalize the assembly
separately (after calling ident_zeros if you want 1s on the diagonal, or
some other routine to set the zeros on the diagonal if not). This would
at least separate the linear algebra issue from the assembly and
require minimal code modification.

Let me know what I'm missing.

Cheers,
Cian


Revision history for this message
Cian Wilson (cwilson) wrote :

Of course, what I was missing was that you can't call ident_zeros on an unfinalized matrix, as it depends on MatGetRow with the PETSc backend. So a routine that adds zeros to the diagonal has to be called before finalizing, before ident_zeros can be called.

Revision history for this message
Cian Wilson (cwilson) wrote :

I'm afraid this problem has arisen for me again. My previous solution depended on not finalizing the tensor, then adding the zeros in after the first assembly but before the sparsity is collapsed. The addition of:
MatSetOption(*A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE);
in PetscMatrix::init in r6784-5 has meant that this is no longer possible.
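The effect of that option can be mimicked in a few lines (a hypothetical toy class, not petsc4py): once the pattern is fixed, a write outside it raises instead of silently reallocating, which is what breaks the add-zeros-before-finalize strategy:

```python
class PreallocatedMatrix:
    """Toy matrix that, like MAT_NEW_NONZERO_ALLOCATION_ERR, rejects
    writes outside the preallocated sparsity pattern."""
    def __init__(self, pattern):
        self.pattern = set(pattern)
        self.values = {}

    def set(self, i, j, v):
        if (i, j) not in self.pattern:
            raise RuntimeError(f"new nonzero at ({i}, {j}) was not allocated")
        self.values[(i, j)] = v

M = PreallocatedMatrix({(0, 0), (0, 1)})
M.set(0, 0, 2.0)      # fine: inside the pattern
try:
    M.set(1, 1, 0.0)  # the diagonal of row 1 was never allocated
except RuntimeError as e:
    print("rejected:", e)
```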

I've had another go at implementing a fix (see lp:~ldeo-magma/dolfin/diagonalentries) that includes the diagonal in the sparsity and sets zeros in the matrix to ensure they're not collapsed out. It's controlled by the parameter keep_diagonal, which can be passed into any assembly routine (that calls init_global_tensor) but defaults to the current behaviour (false). This is all based on the changes to incorporate periodic bcs into the sparsity.

Cheers,
Cian
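At the sparsity-pattern level the fix amounts to something like the following sketch (plain Python sets; the class and method names are illustrative, not DOLFIN's): force every (i, i) into the pattern before the matrix is allocated, so later explicit zeros on the diagonal never count as new nonzeros:

```python
class ToySparsityPattern:
    def __init__(self, n, keep_diagonal=False):
        self.entries = set()
        if keep_diagonal:
            # Reserve every diagonal slot up front, before allocation.
            self.entries |= {(i, i) for i in range(n)}

    def insert(self, rows, cols):
        """Add the dense block coupling the given rows and columns."""
        self.entries |= {(i, j) for i in rows for j in cols}

p = ToySparsityPattern(4, keep_diagonal=True)
p.insert([0, 1], [0, 1])    # one "cell" coupling dofs 0 and 1
print((3, 3) in p.entries)  # True, even though no cell touched dof 3
```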

Revision history for this message
Garth Wells (garth-wells) wrote :

On 22 July 2012 01:14, Cian Wilson <email address hidden> wrote:
> Afraid this problem has arisen for me again. My previous solution depended on being able to not finalize the tensor then add the zeros in after the first assembly but before collapsing the sparsity.

Did you try setting the assembler argument

   bool finalize_tensor

to false? This would only work for PETSc anyway at the moment.

> The addition of:
> MatSetOption(*A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE);
> in PetscMatrix::init in r6784-5 has meant that this is no longer possible.
>

I set this since it has become the default in PETSc 3.3. It also helps
catch issues that can have a major impact on efficiency.

> I've had another go at implementing a fix (see lp:~ldeo-
> magma/dolfin/diagonalentries ) that includes the diagonal in the
> sparsity and sets zeros in the matrix to ensure they're not collapsed
> out. It's controlled by the parameter keep_diagonal, which can be
> passed into any assembly routine (that calls init_global_tensor) but
> defaults to the current behaviour (false). This is all based on the
> changes to incorporate periodic bcs into the sparsity.
>

Thanks. I'll take a look shortly.

Garth

> ** Branch unlinked: lp:~ldeo-magma/dolfin/diag
>
> ** Branch linked: lp:~ldeo-magma/dolfin/diagonalentries

Revision history for this message
Cian Wilson (cwilson) wrote :

On 07/23/2012 07:00 AM, Garth Wells wrote:
> On 22 July 2012 01:14, Cian Wilson <email address hidden> wrote:
>> Afraid this problem has arisen for me again. My previous solution depended on being able to not finalize the tensor then add the zeros in after the first assembly but before collapsing the sparsity.
> Did you try setting the assembler argument
>
> bool finalize_tensor
>
> to false? This would only work for PETSc anyway at the moment.
Yes, this was essentially my previous strategy: assemble with
finalize_tensor set to false, then add zeros on the diagonal, then
finalize the tensor myself. This now throws an error when adding the
zeros because they're not in the preallocated sparsity.

>> The addition of:
>> MatSetOption(*A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE);
>> in PetscMatrix::init in r6784-5 has meant that this is no longer possible.
>>
> I set this since it has become the default in PETSc 3.3. It also helps
> catch issues that can have a major impact on efficiency.
I agree with setting it. Also, the changes to include periodic bcs in
the sparsity are great, as I'd previously written off periodic bcs with
PETSc > 3.3.

>
>> I've had another go at implementing a fix (see lp:~ldeo-
>> magma/dolfin/diagonalentries ) that includes the diagonal in the
>> sparsity and sets zeros in the matrix to ensure they're not collapsed
>> out. It's controlled by the parameter keep_diagonal, which can be
>> passed into any assembly routine (that calls init_global_tensor) but
>> defaults to the current behaviour (false). This is all based on the
>> changes to incorporate periodic bcs into the sparsity.
>>
> Thanks. I'll take a look shortly.
Many thanks,
Cian


Changed in dolfin:
status: Confirmed → Fix Committed
Changed in dolfin:
status: Fix Committed → Fix Released