Fluidity fails to compile with Intel 14

Bug #1046337 reported by Rhodri Davies on 2012-09-05
This bug affects 1 person
Affects: Fluidity
Importance: Medium
Assigned to: Fluidity Core Team

Bug Description

As discussed in today's dev meeting, I am opening a ticket relating to building fluidity with Intel 12. This ticket will be used to track the status of the build and also to keep everyone updated on various bug reports. I'd appreciate any help I can get on resolving this issue, as soon as possible. Sadly my skills with spotting (and understanding) compiler bugs are limited, to say the least.

Anyway,

Tim has been kind enough to set up a build test on cx1:

http://buildbot-ocean.ese.ic.ac.uk:8080/builders/compile_121

You will see from the latest build, that we currently fail in Fields_Allocates, with the following error:

Fields_Allocates.F90(2397): error #6780: A dummy argument with the INTENT(IN) attribute shall not be defined nor become undefined. [MESH]
      allocate(mesh%adj_lists%nnlist)

This is a compiler bug: %adj_lists is a pointer component of the 'mesh' object, and we are only changing what it points to (the allocation status of adj_lists%nnlist), not the pointer association of adj_lists itself. The Fortran standard allows this for an INTENT(IN) dummy argument.
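For readers who want to try the pattern in isolation, here is a minimal sketch of it. All names (intent_in_sketch, mesh_sketch, cache_sketch, sparsity_sketch, add_nnlist) are hypothetical stand-ins, not Fluidity's real definitions, and this is not the reproducer that was sent to Intel. It only shows an INTENT(IN) dummy whose pointer component's target is modified, which is the construct error #6780 rejected.

```fortran
module intent_in_sketch
  implicit none

  ! Hypothetical, simplified stand-ins for the real Fluidity types.
  type sparsity_sketch
     integer, dimension(:), pointer :: colm => null()
  end type sparsity_sketch

  type cache_sketch
     type(sparsity_sketch), pointer :: nnlist => null()
  end type cache_sketch

  type mesh_sketch
     type(cache_sketch), pointer :: adj_lists => null()
  end type mesh_sketch

contains

  subroutine add_nnlist(mesh)
    ! mesh itself is never defined here: adj_lists keeps pointing at the
    ! same cache object. Only a component of that target is allocated.
    type(mesh_sketch), intent(in) :: mesh

    if (.not. associated(mesh%adj_lists%nnlist)) then
       allocate(mesh%adj_lists%nnlist)
    end if
  end subroutine add_nnlist

end module intent_in_sketch

program intent_in_demo
  use intent_in_sketch
  implicit none
  type(mesh_sketch) :: mesh

  allocate(mesh%adj_lists)
  call add_nnlist(mesh)
  print *, 'nnlist associated: ', associated(mesh%adj_lists%nnlist)

  deallocate(mesh%adj_lists%nnlist)
  deallocate(mesh%adj_lists)
end program intent_in_demo
```

Whether a given ifort release accepts or rejects this exact sketch has not been checked here; it is meant as a starting point for a reproducer, not as a confirmed test case.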

I will next report this bug to Matt Harvey (who will hopefully file a bug with Intel). I'll also ask Matt to feed back anything from Intel to this ticket. If anyone has anything they'd like to add, please do so.

Rhodri Davies (rhodri-davies) wrote :

Further update on this. Matt Harvey has installed the latest ifort compiler on CX1 (13.0). Sadly the same bug remains. Will get Matt to prod Intel ASAP.

Changed in fluidity:
status: New → Confirmed
importance: Undecided → High
Rhodri Davies (rhodri-davies) wrote :

Stephan has written a short test to reproduce the bug. It's now with intel.

David Ham (david-ham) wrote :

We've hit this bug in Intel before and worked around it by changing the mesh argument to intent(inout). See allocate_scalar_field, for example.
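A sketch of that workaround, under the same caveats: the names below (inout_workaround_sketch, mesh_sketch, cache_sketch, add_nnlist) are hypothetical and the types are reduced to the minimum. It is not a copy of allocate_scalar_field. The only change from the failing pattern is the intent of the mesh argument.

```fortran
module inout_workaround_sketch
  implicit none

  ! Hypothetical, simplified stand-ins; not Fluidity's real definitions.
  type cache_sketch
     integer, dimension(:), pointer :: nnlist => null()
  end type cache_sketch

  type mesh_sketch
     type(cache_sketch), pointer :: adj_lists => null()
  end type mesh_sketch

contains

  subroutine add_nnlist(mesh, n)
    ! Workaround: declare mesh intent(inout) rather than intent(in), so the
    ! compiler has no INTENT(IN) constraint to (mis)apply to the allocate.
    type(mesh_sketch), intent(inout) :: mesh
    integer, intent(in) :: n

    if (.not. associated(mesh%adj_lists%nnlist)) then
       allocate(mesh%adj_lists%nnlist(n))
    end if
  end subroutine add_nnlist

end module inout_workaround_sketch

program inout_workaround_demo
  use inout_workaround_sketch
  implicit none
  type(mesh_sketch) :: mesh

  allocate(mesh%adj_lists)
  call add_nnlist(mesh, 4)
  print *, 'nnlist size: ', size(mesh%adj_lists%nnlist)

  deallocate(mesh%adj_lists%nnlist)
  deallocate(mesh%adj_lists)
end program inout_workaround_demo
```

The cost of the workaround is that it propagates: any caller that itself holds the mesh as intent(in) can no longer pass it to the routine and has to be changed as well.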

Jon Hill (jon-hill) on 2013-02-28
Changed in fluidity:
importance: High → Medium
assignee: nobody → Fluidity Core Team (fluidity-core)
Rhodri Davies (rhodri-davies) wrote :

Dear AMCG (particularly Gerard and Michael),

Is there any chance you could add a detailed update on the status of building fluidity with intel to this ticket? The reason I ask, is that there is a chance I can get some support here from the NCI and I'm not keen to see duplication of efforts. How close are you guys to getting the issue resolved? What are the remaining issues? Is there anywhere you feel additional support would be beneficial? etc. etc...

Best wishes,

Rhod

summary: - Fluidity fails to compile with Intel 12
+ Fluidity fails to compile with Intel 13

Gerard (g-gorman) wrote :

Hi Rhodri

We have created a minimum set of sources to recreate the compiler failure and posted it on the intel forum:
http://software.intel.com/en-us/comment/1743703#comment-1743703
The good Steve Lionel et al. are looking into it, but it would probably do no harm to express your interest in this bug there. I think initially we would be just as happy with a workaround as with a patched compiler, which we'd have to wait for.

Cheers
Gerard

Rhodri Davies (rhodri-davies) wrote :

Brilliant. Thanks Gerard. I'll pass on the info to the NCI team here (http://nci.org.au/) and get them to jump on the bandwagon. If we put pressure from many angles, there's a chance things will get fixed sooner (at least that would be the hope). Please keep me updated on any future developments and I'll do the same from this end.

Rhod

Rhodri Davies (rhodri-davies) wrote :

Ok - I now have access to the latest Intel compilers (14.0.080). Sadly, I think we're now failing earlier than we did with the previous version of Intel (in Adjacency_Lists.F90). Configured with --enable-debugging and --enable-2d-adaptivity. Make log attached, but the key lines are as follows:

Adjacency_Lists.F90(875): error #6457: This derived type name has not been declared. [CSR_SPARSITY]
      type(csr_sparsity), intent(in) :: nelist
-----------^
Adjacency_Lists.F90(939): error #6457: This derived type name has not been declared. [CSR_SPARSITY]
    type(csr_sparsity), intent(in):: NEList
---------^
Adjacency_Lists.F90(865): error #6404: This name does not have a type, and must have an explicit type. [NELIST]
    subroutine find_adjacent_element(ele, adj_ele, nelist, nodes)
---------------------------------------------------^
Adjacency_Lists.F90(929): error #6404: This name does not have a type, and must have an explicit type. [NELIST]
  subroutine FindCommonElements(elements, n, NEList, nodes)
---------------------------------------------^
Adjacency_Lists.F90(892): error #6284: There is no matching specific function for this generic function reference. [ROW_M_PTR]
      elements1 => row_m_ptr(nelist, nodes(1))
-------------------^
Adjacency_Lists.F90(892): error #6678: When the target is an expression it must deliver a pointer result. [ROW_M_PTR]
      elements1 => row_m_ptr(nelist, nodes(1))
-------------------^
Adjacency_Lists.F90(897): error #6284: There is no matching specific function for this generic function reference. [ROW_M_PTR]
        row_idx(i - 1)%ptr => row_m_ptr(nelist, nodes(i))
------------------------------^
Adjacency_Lists.F90(897): error #6678: When the target is an expression it must deliver a pointer result. [ROW_M_PTR]
        row_idx(i - 1)%ptr => row_m_ptr(nelist, nodes(i))
------------------------------^
Adjacency_Lists.F90(949): error #6284: There is no matching specific function for this generic function reference. [ROW_M_PTR]
    elements1 => row_m_ptr( NEList, nodes(1) )
-----------------^
Adjacency_Lists.F90(949): error #6678: When the target is an expression it must deliver a pointer result. [ROW_M_PTR]
    elements1 => row_m_ptr( NEList, nodes(1) )
-----------------^
Adjacency_Lists.F90(952): error #6284: There is no matching specific function for this generic function reference. [ROW_M_PTR]
      row_idx(j-1)%ptr => row_m_ptr(NEList, nodes(j))
--------------------------^
Adjacency_Lists.F90(952): error #6678: When the target is an expression it must deliver a pointer result. [ROW_M_PTR]
      row_idx(j-1)%ptr => row_m_ptr(NEList, nodes(j))
--------------------------^
compilation aborted for Adjacency_Lists.F90 (code 1)
gmake[1]: *** [Adjacency_Lists.o] Error 1
make: *** [lib/libfluidity.a] Error 2

Any ideas? Is this a bug with us or them?!

R

summary: - Fluidity fails to compile with Intel 13
+ Fluidity fails to compile with Intel 14
Dale Roberts (ds-roberts) wrote :

Hi All

I've been working with Rhodri trying to get Fluidity compiled on the NCI's peak machine, with the intent to run scaling tests and perform some profiling once everything is up and running.

What appears to be happening with the above issue is that the type csr_sparsity is being lost from scope from the nested subroutine, and any subsequent declarations, regardless of whether the subroutine is nested or not. Moreover, there are other types from the same module that are picked up in the same nested subroutine and further on in the code.

I currently have a small reproducer of this bug that compiles with gfortran and ifort version 13.1.3.192, but not ifort 14.0.0.080, and will be passing it on to Intel.
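The shape being described can be sketched as follows. This is not Dale's reproducer (which is not attached to this ticket), and every name ending in _sketch, along with outer and inner, is hypothetical. It only illustrates the construct involved: a derived type from a used module that must remain visible, through host association, inside a contained subroutine.

```fortran
module sparse_sketch
  implicit none

  ! Hypothetical, reduced stand-in for a CSR sparsity type.
  type csr_sparsity_sketch
     integer, dimension(:), pointer :: findrm => null()  ! row start offsets
     integer, dimension(:), pointer :: colm => null()    ! column indices
  end type csr_sparsity_sketch

contains

  function row_m_ptr_sketch(sparsity, row) result(ptr)
    ! Returns a pointer to the column indices stored for one row.
    type(csr_sparsity_sketch), intent(in) :: sparsity
    integer, intent(in) :: row
    integer, dimension(:), pointer :: ptr

    ptr => sparsity%colm(sparsity%findrm(row):sparsity%findrm(row + 1) - 1)
  end function row_m_ptr_sketch

end module sparse_sketch

module adjacency_sketch
  use sparse_sketch
  implicit none

contains

  subroutine outer(nelist, node, count)
    type(csr_sparsity_sketch), intent(in) :: nelist
    integer, intent(in) :: node
    integer, intent(out) :: count

    call inner(nelist, node, count)

  contains

    subroutine inner(nelist, node, count)
      ! The type name is not declared here; it is inherited from the
      ! module scope through the host subroutine.
      type(csr_sparsity_sketch), intent(in) :: nelist
      integer, intent(in) :: node
      integer, intent(out) :: count
      integer, dimension(:), pointer :: elements

      elements => row_m_ptr_sketch(nelist, node)
      count = size(elements)
    end subroutine inner

  end subroutine outer

end module adjacency_sketch

program adjacency_demo
  use adjacency_sketch
  implicit none
  type(csr_sparsity_sketch) :: nelist
  integer :: count

  allocate(nelist%findrm(3), nelist%colm(3))
  nelist%findrm = (/ 1, 3, 4 /)
  nelist%colm = (/ 10, 20, 30 /)

  call outer(nelist, 1, count)
  print *, 'entries in row 1: ', count

  deallocate(nelist%findrm, nelist%colm)
end program adjacency_demo
```

This is standard Fortran and should build with any conforming compiler. It is not known whether this reduced form is enough to trigger the 14.0.0.080 failure.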

I look forward to working with you all.

Dale

Gerard (g-gorman) wrote :

Rhodri, Dale - do you now have a version compiled with intel? If so, which version of the intel compiler?

Rhodri Davies (rhodri-davies) wrote :

Hi Gerard,

Dale has made good progress but we remain unable to compile with Intel. The global numbering (intent) issue is fixed in the latest version of the compilers (14.1.106). They have also fixed the issue that Dale outlined above regarding losing type info in a nested subroutine.

However, we are now failing with an issue that we have seen before (module ordering at start of .F90 files). Dale is looking into this at present. I'm sure he'll update the ticket when he has some concrete info.

Rhod

Gerard (g-gorman) wrote :

Many thanks for the update. It's a pity Intel didn't give any feedback on the forum when they promised they would.

Rhodri Davies (rhodri-davies) wrote :

Perhaps give him another prod in the forum? That way, we can get confirmation that he has fixed the other issue you mentioned with Divergence_Matrix_CV.F90. I am not 100% sure of whether this specific issue is fixed yet.

Gerard (g-gorman) wrote :

Rhodri, Dale - can you give me the latest on compiling using Intel? I've exchanged emails with Steve, who thinks the problems have been fixed. It would be an opportune moment to get in a new bug report if needs be.

Dale Roberts (ds-roberts) wrote :

Hi Gerard

Compilation with Intel v14.1.106 now proceeds further than with 14.0.080. We've seen problems with this version in other applications, and now recommend that NCI users avoid it. The latest error comes from Fields_Manipulation.F90:

Fields_Manipulation.F90(29): warning #6738: The type/rank/keyword signature for this specific procedure matches another specific procedure that shares the same generic-name. [ELEMENTS^ALLOCATE_ELEMENT]
use elements
----^
Fields_Manipulation.F90(29): warning #6738: The type/rank/keyword signature for this specific procedure matches another specific procedure that shares the same generic-name. [ELEMENTS^ALLOCATE_ELEMENT_WITH_SURFACE]
use elements
----^
Fields_Manipulation.F90(29): warning #6738: The type/rank/keyword signature for this specific procedure matches another specific procedure that shares the same generic-name. [ELEMENTS^ALLOCATE_CONSTRAINTS_TYPE]
use elements
----^
Fields_Manipulation.F90(29): warning #6738: The type/rank/keyword signature for this specific procedure matches another specific procedure that shares the same generic-name. [ELEMENTS^DEALLOCATE_ELEMENT]
use elements
----^
Fields_Manipulation.F90(29): warning #6738: The type/rank/keyword signature for this specific procedure matches another specific procedure that shares the same generic-name. [ELEMENTS^DEALLOCATE_CONSTRAINTS]
use elements
----^
Fields_Manipulation.F90(29): warning #6738: The type/rank/keyword signature for this specific procedure matches another specific procedure that shares the same generic-name. [ELEMENTS^INCREF_ELEMENT_TYPE]
use elements
----^
Fields_Manipulation.F90(29): warning #6738: The type/rank/keyword signature for this specific procedure matches another specific procedure that shares the same generic-name. [ELEMENTS^HAS_REFERENCES_ELEMENT_TYPE]
use elements
----^
Fields_Manipulation.F90(2330): error #8032: Generic procedure reference has two or more specific procedure with the same type/rank/keyword signature. [DEALLOCATE]
    call deallocate(shape)
---------^
Fields_Manipulation.F90(2351): error #8032: Generic procedure reference has two or more specific procedure with the same type/rank/keyword signature. [DEALLOCATE]
    call deallocate(shape)
---------^
Fields_Manipulation.F90(3591): error #8032: Generic procedure reference has two or more specific procedure with the same type/rank/keyword signature. [INCREF]
      call incref(output_mesh%faces%shape)
-----------^

I haven't had a chance to look any closer at this, so I'm afraid I can't be any more helpful. I don't believe the problem is with module ordering any more, as Rhodri experimented briefly with re-ordering without any success. I'd be more inclined to believe that the order of 'use' statements affecting the outcome of compilation was itself a compiler bug, one that no longer appears to be present in this compiler version.
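One possible reading of the #6738 and #8032 messages, offered as an assumption rather than a diagnosis, is that the compiler is treating a single specific procedure as several distinct ones when the same generic arrives by more than one 'use' path. The sketch below shows that situation in legal Fortran. All module, type and procedure names are hypothetical, and it has not been checked against 14.1.106.

```fortran
module elements_sketch
  implicit none

  ! Hypothetical, reduced stand-in for a reference-counted element type.
  type element_sketch
     integer :: refcount = 0
  end type element_sketch

  interface incref
     module procedure incref_element_sketch
  end interface

contains

  subroutine incref_element_sketch(ele)
    type(element_sketch), intent(inout) :: ele
    ele%refcount = ele%refcount + 1
  end subroutine incref_element_sketch

end module elements_sketch

! Two intermediate modules that both re-export everything from
! elements_sketch (default accessibility is public).
module path_a_sketch
  use elements_sketch
  implicit none
end module path_a_sketch

module path_b_sketch
  use elements_sketch
  implicit none
end module path_b_sketch

program generic_paths_demo
  ! The generic incref and its one specific are reachable three ways.
  ! They are the same entity each time, so the call is unambiguous.
  use elements_sketch
  use path_a_sketch
  use path_b_sketch
  implicit none
  type(element_sketch) :: ele

  call incref(ele)
  print *, 'refcount: ', ele%refcount
end program generic_paths_demo
```

If a compiler rejects a build like this, a common thing to try is narrowing the imports with "use ..., only:" so that each generic arrives by a single path, though whether that helps in Fluidity's case is untested here.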

Gerard (g-gorman) wrote :

Thanks Dale. The headache here is to create a minimum compile set that we can bundle up and send to Intel because of the dependency chain. The last time I did this I configured to disable PETSc modules and any other package I could, and then wrote a custom Makefile to build all source files which the target file (in this case Fields_Manipulation.F90) depends on. This gave me something I could tar up for Intel to test. If someone has time to have a go at doing this then let me know because it will be a few weeks before I'll have time to put effort into this again.

Gerard (g-gorman) wrote :

Dale - can you please check for me if the tar ball we gave to intel still reproduces this error?

http://software.intel.com/sites/default/files/comment/1743641/sourcesmakefile.tar.gz

Rhodri Davies (rhodri-davies) wrote :

Hi Gerard,

I tried this today. Sadly, the build fails in Fields_Manipulation.F90 with the error that Dale has outlined above. Since Fields_Manipulation.F90 is compiled before Divergence_Matrix_CV.F90, I can't confirm whether or not the Divergence_Matrix_CV bug is fixed.

Saying that, this can be seen as good news (I'm an optimist!) - we can simply re-use the existing minimal reproducer with intel again, so that they can fix the issue with fields_manipulation... What worries me is that intel have had this reproducer for a while, so should really have fixed this by now.

I will update the ticket you have with the phenomenally unhelpful Steve Lionel...

Rhod

Rhodri Davies (rhodri-davies) wrote :

Gerard - I've updated the ticket on the Intel site. Perhaps you could also add a comment (as you understand the issues better than I do). I will ask Dale to submit an official bug report to Intel, via the NCI, in the new year - my suspicion is that we'll get a faster fix from them that way than via the forum...

Best wishes and a merry Christmas,

Rhod

Gerard (g-gorman) wrote :

Rhodri/Dale - from the intel forum it looks like Steve has come through for us. Is there any chance that you can apply these workarounds and see if this allows Fluidity to compile?

Tim Greaves (tim-greaves) wrote :

Gerard - if we could get hold of a latest-version compiler on CX1 I have a buildbot queue set up for testing and we could run it through that to keep track of progress.

Rhodri Davies (rhodri-davies) wrote :

Hi Gerard,

The NCI system is down for the next 2 weeks so we will be unable to install the latest compilers and test until then (full service resumes sometime around Jan 13th). In the meantime, is there any chance that somebody at AMCG is able to undertake the changes highlighted by Steve in a branch? As the reproducer you sent him is not the full code, it may be that additional changes are needed elsewhere and you guys will be better qualified to do that than myself or Dale.

Happy New Year!!

Rhod

Rhodri Davies (rhodri-davies) wrote :

Gerard - a quick note - Steve's message ends with the following sentence:

'then the segfault can be reproduced and we can get a handle on what is going on there.' - this makes me think that even after the workarounds, there will be issues - i.e. the seg fault. With that in mind, I'm not sure it's worth installing the workarounds at present (at least until we get a further update from Steve).

Rhod

Gerard (g-gorman) wrote :

I had to reread his message several times to understand it. I believe what he means is that these two workarounds represent a simple reproduction case - i.e. with these changes it is ok, without them it segfaults... thus reproduction. Previously he had a few thousand lines of Fortran with a needle lost in the middle, making this difficult to debug.

Tim - As it stands Steve has promised to add me to the next beta testing round this January. So it's not really worth my while harassing the cx1 folks for an updated intel version right now.

Tim Greaves (tim-greaves) wrote :

Great - thanks Gerard. If we can get our own version we can stick it on our own system which would make life much easier for debugging purposes.

Rhodri Davies (rhodri-davies) wrote :

Just to update this ticket - intel claim to have fixed the issues we were having building Fluidity (or at least a reproducer that Gerard kindly supplied them with) - see this link:

http://software.intel.com/en-us/comment/1743703#comment-1743703

This fix will be released sometime in April. As soon as this update is available, Dale (at the NCI here) will test the fluidity build and report back. Fingers crossed.

I have asked intel to add Gerard's reproducer to their test suite - let's see what happens there...

R

Rhodri Davies (rhodri-davies) wrote :

Ok - Gerard's reproducer is now part of intel's regression test suite - so that's good news. They also claim to have fixed all issues with the reproducer, such that we should no longer need the two workarounds they previously provided for issues with Fields_Manipulation.F90 and Global_Numbering.F90. Let's see how things look in April when the update is released.

R
