LinearOperator does not work in parallel
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
DOLFIN | Fix Released | Medium | Anders Logg | |
Bug Description
LinearOperator is not written to work in parallel. It does not respect the parallel layout of matrices and vectors. This leads to crashes.
The unit test has been disabled in parallel.
Changed in dolfin: | |
assignee: | nobody → Anders Logg (logg) |
Anders Logg (logg) wrote : Re: [Bug 1088175] [NEW] LinearOperator does not work in parallel | #1 |
Garth Wells (garth-wells) wrote : | #2 |
On Sun, Dec 9, 2012 at 1:20 PM, Anders Logg <email address hidden> wrote:
> I can take a look but object to the description. It is indeed written
> to work in parallel. It has also been tested to work in parallel
> before.
>
Tested where?
It's clear from the code that it will only work in parallel for very
special cases.
Garth
> --
> Anders
>
>
> On Sun, Dec 09, 2012 at 01:00:59PM -0000, Garth Wells wrote:
>> Public bug reported:
>>
>> LinearOperator is not written to work in parallel. It does not respect
>> the parallel layout of matrices and vectors. This leads to crashes.
>>
>> The unit test has been disabled in parallel.
>>
>> ** Affects: dolfin
>> Importance: Undecided
>> Assignee: Anders Logg (logg)
>> Status: New
>>
>> ** Changed in: dolfin
>> Assignee: (unassigned) => Anders Logg (logg)
>>
Anders Logg (logg) wrote : | #3 |
On Sun, Dec 09, 2012 at 01:27:50PM -0000, Garth Wells wrote:
> On Sun, Dec 9, 2012 at 1:20 PM, Anders Logg <email address hidden> wrote:
> > I can take a look but object to the description. It is indeed written
> > to work in parallel. It has also been tested to work in parallel
> > before.
> >
>
> Tested where?
I thought the unit test was running fine on the buildbots but I see
now it was never added to the list of tests (in the main test.py).
--
Anders
Garth Wells (garth-wells) wrote : | #4 |
On Sun, Dec 9, 2012 at 2:01 PM, Anders Logg <email address hidden> wrote:
> I thought the unit test was running fine on the buildbots but I see
> now it was never added to the list of tests (in the main test.py).
>
The unit test is a special case for which it may run on 3 processes
because the number of dofs is divisible by 3. In general, the parallel
layouts will not match.
Garth
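To make the divisibility remark concrete, here is a minimal sketch of how a default contiguous split assigns ownership ranges. The helper `block_range` and the mesh-driven ranges in `mesh_10` are hypothetical illustrations, not DOLFIN or PETSc code, and the "remainder goes to the lowest ranks" rule is one common convention that is assumed here.

```python
def block_range(n, size, rank):
    """Half-open ownership range [first, second) of `rank` when n entries
    are split into contiguous blocks (remainder assigned to the lowest ranks)."""
    base, extra = divmod(n, size)
    first = rank * base + min(rank, extra)
    second = first + base + (1 if rank < extra else 0)
    return first, second


def layouts_match(ranges_a, ranges_b):
    """Two layouts agree only if every process owns the same index range."""
    return list(ranges_a) == list(ranges_b)


# 9 entries on 3 processes: every process owns exactly 3 entries.
default_9 = [block_range(9, 3, r) for r in range(3)]

# 10 entries on 3 processes: the default split is uneven.
default_10 = [block_range(10, 3, r) for r in range(3)]

# Hypothetical ownership produced by a mesh partitioner for the same 10 dofs.
mesh_10 = [(0, 3), (3, 7), (7, 10)]

print(default_9)
print(default_10)
print(layouts_match(default_10, mesh_10))
```

Under these assumptions, a vector created with the default split only lines up with a mesh-driven layout by coincidence, which is the special case described above.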
Anders Logg (logg) wrote : | #5 |
On Sun, Dec 09, 2012 at 02:32:19PM -0000, Garth Wells wrote:
> The unit test is a special case for which it may run on 3 processes
> because the number of dofs is divisible by 3. In general, the parallel
> layouts will not match.
It doesn't even work for that case. So what happened is that I thought
the test was run on the buildbot and therefore worked in parallel. I
see now why it doesn't work but it should be relatively easy to fix.
--
Anders
Anders Logg (logg) wrote : | #6 |
On Sun, Dec 09, 2012 at 04:02:03PM +0100, Anders Logg wrote:
> It doesn't even work for that case. So what happened is that I thought
> the test was run on the buildbot and therefore worked in parallel. I
> see now why it doesn't work but it should be relatively easy to fix.
I've made an attempt to fix this but it still crashes. I'm
initializing the local range for the mat shell matrix using the same
local range as for the solution vector:
  // Get local range
  std::size_t m_local = M;
  std::size_t n_local = N;
  if (MPI::num_
  {
    std::pair<
    m_local = local_range.first;
    n_local = local_range.second;
  }

  // Initialize PETSc matrix
  A.reset(new Mat, PETScMatrixDele
  MatCreateShell(
  MatShellSetOper
What am I missing? Does the solution vector not have the same parallel
layout as the right-hand side b (which gets multiplied by the matrix
in the Krylov solver)?
--
Anders
Garth Wells (garth-wells) wrote : | #7 |
On Tue, Dec 11, 2012 at 11:21 PM, Anders Logg <email address hidden> wrote:
> I've made an attempt to fix this but it still crashes.
If you're using test/unit/
b = Vector(V.dim())
will certainly cause a problem. We have the function
GenericMatrix:
vector consistently with a matrix operator.
Garth
Anders Logg (logg) wrote : | #8 |
On Wed, Dec 12, 2012 at 08:58:26AM -0000, Garth Wells wrote:
> If you're using test/unit/
>
> b = Vector(V.dim())
>
> will certainly cause a problem. We have the function
> GenericMatrix:
> vector consistently with a matrix operator,
Yes, that line was obviously wrong. I've replaced it with a proper
right-hand side now in the unit test but it still breaks.
It looks like I need to initialize the local rows and columns separately:
http://
The local rows are determined by the result vector y and the local
columns by the vector x in y = Ax. But in a Krylov solve, the result
vector y is itself multiplied by the matrix in the next iteration, so
it looks to me as though the row and column layouts need to be the same?
--
Anders
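The question above can be explored with a small serial emulation. This is a sketch under stated assumptions: the helpers `split`, `layout` and `matvec` are hypothetical, the flattening step merely stands in for parallel communication, and nothing here is DOLFIN or PETSc API. It suggests that a repeated product y = Ax, z = Ay only goes through when the row layout and the column layout coincide.

```python
def split(values, ranges):
    """Cut a global list into per-process chunks following `ranges`."""
    return [values[a:b] for a, b in ranges]


def layout(chunks):
    """Recover the ownership ranges implied by a list of local chunks."""
    ranges, start = [], 0
    for chunk in chunks:
        ranges.append((start, start + len(chunk)))
        start += len(chunk)
    return ranges


def matvec(A, x_chunks, row_ranges, col_ranges):
    """y = A x: x must arrive in the column layout, y leaves in the row layout."""
    if layout(x_chunks) != list(col_ranges):
        raise ValueError("input vector does not match the column layout")
    x = [v for chunk in x_chunks for v in chunk]  # stands in for communication
    return [[sum(a * v for a, v in zip(A[i], x)) for i in range(lo, hi)]
            for lo, hi in row_ranges]


A = [[2.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
rows = [(0, 2), (2, 4), (4, 6)]
other = [(0, 3), (3, 5), (5, 6)]

# Same layout for rows and columns: the result can be fed back in.
y = matvec(A, split([1.0] * 6, rows), rows, rows)
z = matvec(A, y, rows, rows)


def second_product_fails():
    """With differing layouts the first product works, the second does not."""
    y = matvec(A, split([1.0] * 6, other), rows, other)
    try:
        matvec(A, y, rows, other)
    except ValueError:
        return True
    return False


print(z)
print(second_product_fails())
```

For a square operator used inside an iterative solver, choosing one layout for both x and y sidesteps the mismatch in this emulation.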
Anders Logg (logg) wrote : | #9 |
On Wed, Dec 12, 2012 at 10:39:37AM +0100, Anders Logg wrote:
> Yes, that line was obviously wrong. I've replaced it with a proper
> right-hand side now in the unit test but it still breaks.
>
> It looks like I need to initialize the local rows and columns separately:
>
> http://
>
> The local rows are determined by the result vector y and the local
> columns by the vector x in y = Ax. But in a Krylov solve, the result
> vector y will again be used to multiply the matrix, so it looks to me
> that it needs to be the same?
Works now. The problem was this...
  m_local = local_range.first;
  n_local = local_range.second;
:-)
Now replaced by this:
  m_local = local_range_
  n_local = local_range_
A LinearOperator must now be initialized with vectors x and y matching
the product y = Ax. In most cases, it will work fine to use the same
vector (vector b or u.vector()) since they will have the same parallel
layout.
--
Anders
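The replacement lines are truncated in this thread, so the exact fix is not visible here. The sketch below assumes that `local_range` is a half-open pair (first, second) of global indices and that the local size is their difference; all function names are hypothetical. It shows why passing `first` and `second` directly as sizes cannot be right.

```python
def local_sizes_buggy(local_range):
    """Mimics the faulty lines: range offsets are mistaken for sizes."""
    first, second = local_range
    return first, second


def local_size(local_range):
    """Number of entries owned locally, assuming a half-open range."""
    first, second = local_range
    return second - first


def shell_local_shape(range_y, range_x):
    """Local rows follow y and local columns follow x in y = Ax."""
    return local_size(range_y), local_size(range_x)


# Hypothetical ownership ranges for 10 entries on 3 processes.
ranges = [(0, 4), (4, 7), (7, 10)]

sizes = [local_size(r) for r in ranges]
buggy = [local_sizes_buggy(r) for r in ranges]

print(sizes, sum(sizes))
print(buggy)
print(shell_local_shape(ranges[1], ranges[1]))
```

With the buggy variant, process 0 would report zero local rows and the later processes would report offsets rather than counts, so the local sizes no longer add up to the global dimension.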
Changed in dolfin: | |
status: | New → Fix Committed |
importance: | Undecided → Medium |
milestone: | none → 1.1.0 |
Changed in dolfin: | |
status: | Fix Committed → Fix Released |
I can take a look but object to the description. It is indeed written
to work in parallel. It has also been tested to work in parallel
before.
--
Anders