Comment 15 for bug 966553

On 28 March 2012 20:39, Johan Hake <email address hidden> wrote:
> On 03/28/2012 02:02 PM, Garth Wells wrote:
>> On 28 March 2012 19:31, Johan Hake<email address hidden>  wrote:
>>> On 03/28/2012 12:31 PM, Garth Wells wrote:
>>>> On 28 March 2012 16:42, Johan Hake<email address hidden>    wrote:
>>>>> On 03/28/2012 10:26 AM, Anders Logg wrote:
>>>>>> On Wed, Mar 28, 2012 at 08:06:02AM -0000, Johan Hake wrote:
>>>>>>> On 03/28/2012 09:41 AM, Garth Wells wrote:
>>>>>>>> On 28 March 2012 14:03, Johan Hake<email address hidden>       wrote:
>>>>>>>>> This is because the mesh has gone out of scope and destroyed the
>>>>>>>>> coordinatedata which was wrapped to a NumPy array. The same happens if
>>>>>>>>> you do:
>>>>>>>>>
>>>>>>>>>      coord = UnitCube(1, 1, 1).coordinates()
>>>>>>>>>
>>>>>>>>> We can solve this by making adding a deepcopy argument to the
>>>>>>>>> coordinates (and the cells) methods, which defaults to True. This might
>>>>>>>>> break code as some (hmpfr me!) uses coordinates to move a mesh. Changing
>>>>>>>>> the coordinates when a deepcopy is done wont change the  coordinates on
>>>>>>>>> the mesh.
>>>>>>>>>
>>>>>>>>> Another way, is to add a reference of the mesh object to the NumPy
>>>>>>>>> array. Then the Mesh object wont go out of scope until the coordinate
>>>>>>>>> array does. The latter is more tricky to implement, but would be the
>>>>>>>>> best way to do it.
>>>>>>>>>
>>>>>>>>
>>>>>>>> We've seen before that keeping references to large objects will at
>>>>>>>> some point cause a memory leak.
>>>>>>>
>>>>>>> The problem with the previous reference storage, was that it was a
>>>>>>> global storage. My suggestions is a storage connected to an object. If
>>>>>>> an object returns data it owes it should make sure it does not destroy
>>>>>>> it for the lifetime of that data. We already do this for for example
>>>>>>> down_casting of objects.
>>>>
>>>> The underlying problem is the same, and I think a poor design. I would
>>>> not expect that a simple array object would keep a possibly very large
>>>> object in scope.
>>>
>>> The whole slicing interface of NumPy is built on this design.
>>>
>>>    a = np.array([some big list])
>>>    b = a[::2]
>>>
>>> b is here a view into a. Changing values in b will change values in a.
>>> If a is destroyed b is still working, as it knows it came from a and
>>> therefor keeps a reference to it. NumPy provides the interface, we need
>>> the functionality, why not use it?
>>>
>>
>> The above argument doesn't make sense. How is it relevant? A DOLFIN
>> Vector is essentially an array, with some small and constant in size
>> extra data.  A Mesh comes with numerous arrays, which would all be
>> kept in scope for the sake of one array.
>
> This has nothing to do with dolfin Vectors, but on cashing large objects
> to potentially small NumPy arrays. In the above example that large
> object is another NumPy array.
>
>>> The coordinates of a mesh is not a small array. Nor are other examples
>>> which are natural to talk about in this regard.
>>>
>>>    MeshFunction.array, GenericFoo.data, Mesh.cells, Mesh.coordinates
>>>
>>
>> Whether coordinates is small or not is irrelevant. The problem is that
>> the other objects associated with a Mesh are of the same order in
>> size.
>
> Not sure i see any problems here. Remember that the NumPy array we are
> talking about here is only a view into something. So nothing is really
> stored in that array and no duplication of data is done.
>

All the Mesh data is kept in scope, which is the problem. The point is
that we should keep the coordinate array, but not all the associated
Mesh data (connectivity, . . . . ).

>>>>>>>> I suggest that we
>>>>>>>>
>>>>>>>> 1. Have  one function that returns a copy of the coordinates and one
>>>>>>>> that provides view
>>>>>>>
>>>>>>> I tend to agree, but the question is what should be default. The most
>>>>>>> conservative would be to let a copy be default, but that would most
>>>>>>> probably break user code, as most people (hmpfr me) uses coordinates to
>>>>>>> move meshes.
>>>>>>
>>>>>> I think we should try hard to avoid that. Accessing coordinates() by
>>>>>> reference (not copy) has been the default behavior and we would break
>>>>>> user code, possibly also some code examples from the book.
>>>>>>
>>>>>>>> or
>>>>>>>>
>>>>>>>> 2. Make the coordinate data a boost::shared_array<double>       or a
>>>>>>>> boost::shared_ptr<std::vector<double>       >       object.
>>>>>>>
>>>>>>> This would introduce a somewhat middle solution, which is pretty
>>>>>>> cumbersome to implement and which need to be changed anyhow, when we
>>>>>>> revamp the mesh interface.
>>>>>>
>>>>
>>>> It should be easy to implement - I have already changed some plain
>>>> arrays to std::vector. Relying on the contiguous storage that a
>>>> std::vector provides, it's easy to interface with functions looking to
>>>> accept a plain pointer. The question is how it should work in the SWIG
>>>> layer.
>>>
>>> Exactly, and here NumPy provides an excellent functionality. For example
>>> if we a provide a shared_ptr version of std::vector we might be able to
>>> do things in the way you suggest, but then we are out of NumPy land.
>>>
>>
>> How are we out of NumPy land? We now wrap a plain array. If the user,
>> for example, changes the size we are in trouble. With a std::vector we
>> can wrap it just like a plain pointer.
>
> The point here is to wrap any contiguous data as a view in NumPy, so
> whenever you change the values in the NumPy array we also change the
> values in the underlying C++ land. This can be accomplished if we store
> the data in std::vector or as a plain C-array, by just wrapping the data
> to a NumPy array.
>
> However the problem of the original object going out of scope is still
> present. I interpreted your shared_ptr<std::vector> that such a solution
> might help us prevent such a destruction. However then we need to
> associate that object to the NumPy array so the counter on the
> shared_ptr can be decreased when the NumPy array goes out of scope.
>

A reference to the shared_ptr<std::vector> rather than the whole Mesh
object could be stored.

>>> A middle way might be to create a dummy Python object which holds a
>>> reference to a shared_ptr vector, which we then created a NumPy array
>>> around. This dummy object could then be attached to the NumPy array
>>> preventing the data to go out of scope, but I do not see how that would
>>> help us.
>>>
>>
>> It helps because all of the other arrays associated with a Mesh are
>> not unnecessarily kept in scope.
>
> Sure, I see that point, but what would a coordinate or a cell array be
> with out a mesh?
>

That's up to the user and not us. It is just an array, and that's how
it should be viewed.

Garth

> Johan
>
>> Garth
>>
>>> Johan
>>>
>>>>>> Is it up for revamping???
>>>>>
>>>>> :)
>>>>>
>>>>> You keep on talking about this. How on earth are we going to support
>>>>> coarsening, refinement, and parallel such if we do not make the storage
>>>>> of coordinates, and connectivities more flexible. And if that will be
>>>>> the case, providing a numpy array view to these entities need to be
>>>>> rethinked.
>>>>>
>>>>
>>>> Yes, but it does require some thinking.
>>>>
>>>> Garth
>>>>
>>>>> Johan
>>>>>
>>>>>> --
>>>>>> Anders
>>>>>>
>>>>>>
>>>>>>> I vote for adding a reference to the returned object, which happens to
>>>>>>> be the solution I just implemented and are now testing ;)
>>>>>>
>>>>>>
>>>>>>> Johan
>>>>>>>
>>>>>>>>
>>>>>>>> Garth
>>>>>>>>
>>>>>>>>> ** Changed in: dolfin
>>>>>>>>>           Status: Confirmed =>       In Progress
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Title:
>>>>>>>>>      BoundaryMesh.coordinates () gives random results
>>>>>>>>>
>>>>>>>>> To manage notifications about this bug go to:
>>>>>>>>> https://bugs.launchpad.net/dolfin/+bug/966553/+subscriptions
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> You received this bug notification because you are a member of DOLFIN
>>>>> Core Team, which is subscribed to DOLFIN.
>>>>> https://bugs.launchpad.net/bugs/966553
>>>>>
>>>>> Title:
>>>>>    BoundaryMesh.coordinates () gives random results
>>>>>
>>>>> To manage notifications about this bug go to:
>>>>> https://bugs.launchpad.net/dolfin/+bug/966553/+subscriptions
>>>>
>>>
>>> --
>>> You received this bug notification because you are a member of DOLFIN
>>> Core Team, which is subscribed to DOLFIN.
>>> https://bugs.launchpad.net/bugs/966553
>>>
>>> Title:
>>>   BoundaryMesh.coordinates () gives random results
>>>
>>> To manage notifications about this bug go to:
>>> https://bugs.launchpad.net/dolfin/+bug/966553/+subscriptions
>>
>
> --
> You received this bug notification because you are a member of DOLFIN
> Core Team, which is subscribed to DOLFIN.
> https://bugs.launchpad.net/bugs/966553
>
> Title:
>  BoundaryMesh.coordinates () gives random results
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/dolfin/+bug/966553/+subscriptions