nvprof does not complete without sudo

Bug #1767205 reported by Martin D. Weinberg
This bug affects 2 people
Affects: nvidia-cuda-toolkit (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description: Ubuntu 18.04 LTS
Release: 18.04

Expected behavior: profile output

Actual behavior: error messages

Reproduce as follows:

cd NVIDIA_CUDA-9.1_Samples/0_Simple/matrixMul
nvcc -I ../../common/inc matrixMul.cu -o matrixMul

# check the exe works

./matrixMul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "GeForce GTX 1080" with compute capability 6.1

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 1137.23 GFlop/s, Time= 0.115 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

# now try nvprof
nvprof ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
==4775== NVPROF is profiling process 4775, command: ./matrixMul
GPU Device 0: "GeForce GTX 1080" with compute capability 6.1

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
==4775== Error: Internal profiling error 4168:999.
Performance= 1130.40 GFlop/s, Time= 0.116 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
======== Error: CUDA profiling error.

# run with sudo
sudo nvprof ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
==4797== NVPROF is profiling process 4797, command: ./matrixMul
GPU Device 0: "GeForce GTX 1080" with compute capability 6.1

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 1132.95 GFlop/s, Time= 0.116 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==4797== Profiling application: ./matrixMul
==4797== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   99.54%  34.644ms       301  115.10us  114.15us  116.07us  void matrixMulCUDA<int=32>(float*, float*, float*, int, int)
                    0.28%  98.465us         2  49.232us  32.960us  65.505us  [CUDA memcpy HtoD]
                    0.18%  62.944us         1  62.944us  62.944us  62.944us  [CUDA memcpy DtoH]
      API calls:   74.77%  110.27ms         3  36.757ms  3.4300us  110.26ms  cudaMalloc
                   22.45%  33.105ms         1  33.105ms  33.105ms  33.105ms  cudaEventSynchronize
                    0.93%  1.3780ms         3  459.33us  427.70us  478.26us  cudaGetDeviceProperties
                    0.81%  1.1874ms       301  3.9440us  3.7260us  18.511us  cudaLaunch
                    0.36%  536.51us         3  178.84us  56.346us  363.23us  cudaMemcpy
                    0.31%  451.50us        94  4.8030us     301ns  228.31us  cuDeviceGetAttribute
                    0.11%  156.37us         1  156.37us  156.37us  156.37us  cudaDeviceSynchronize
                    0.09%  132.82us      1505      88ns      79ns     289ns  cudaSetupArgument
                    0.07%  100.43us         3  33.475us  4.3440us  83.746us  cudaFree
                    0.06%  82.848us         1  82.848us  82.848us  82.848us  cuDeviceTotalMem
                    0.02%  35.673us       301     118ns     110ns     801ns  cudaConfigureCall
                    0.02%  33.788us         1  33.788us  33.788us  33.788us  cuDeviceGetName
                    0.00%  5.3080us         2  2.6540us  2.2050us  3.1030us  cudaEventRecord
                    0.00%  3.2350us         2  1.6170us  1.0960us  2.1390us  cudaEventCreate
                    0.00%  2.8120us         1  2.8120us  2.8120us  2.8120us  cudaSetDevice
                    0.00%  2.0920us         1  2.0920us  2.0920us  2.0920us  cudaEventElapsedTime
                    0.00%  1.7410us         3     580ns     292ns  1.0710us  cuDeviceGetCount
                    0.00%  1.0230us         2     511ns     353ns     670ns  cuDeviceGet
                    0.00%     658ns         1     658ns     658ns     658ns  cudaGetDeviceCount
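To compare the failing run and the sudo run above without reading the whole transcript, the combined output of each run can be piped through a small classifier. This is a minimal POSIX sh sketch; the function name 'nvprof_status' is hypothetical, and it only looks for the strings that appear in the two runs quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper: classify the combined stdout+stderr of one nvprof run,
# read from stdin. It only matches strings seen in the logs in this report.
nvprof_status() {
    log=$(cat)  # capture the whole run so it can be searched several times
    if printf '%s\n' "$log" | grep -q 'Internal profiling error'; then
        echo "internal-error"   # e.g. "Error: Internal profiling error 4168:999."
    elif printf '%s\n' "$log" | grep -q 'CUDA profiling error'; then
        echo "profiling-error"  # e.g. "======== Error: CUDA profiling error."
    elif printf '%s\n' "$log" | grep -q 'Profiling result:'; then
        echo "ok"               # nvprof printed its summary table
    else
        echo "unknown"
    fi
}
```

Usage: 'nvprof ./matrixMul 2>&1 | nvprof_status' and then 'sudo nvprof ./matrixMul 2>&1 | nvprof_status'. With output like the two runs above, the first prints internal-error and the second prints ok.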

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: nvidia-profiler 9.1.85-3
ProcVersionSignature: Ubuntu 4.15.0-20.21-generic 4.15.17
Uname: Linux 4.15.0-20-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.9-0ubuntu7
Architecture: amd64
Date: Thu Apr 26 17:28:48 2018
Dependencies:
 gcc-8-base 8-20180414-1ubuntu2
 libc6 2.27-3ubuntu1
 libcuinj64-9.1 9.1.85-3
 libgcc1 1:8-20180414-1ubuntu2
InstallationDate: Installed on 2018-04-21 (5 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Alpha amd64 (20180421)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: nvidia-cuda-toolkit
UpgradeStatus: No upgrade log present (probably fresh install)

Martin D. Weinberg (martin-weinberg-5) wrote :
Graham Inggs (ginggs) wrote :

Martin, thanks for the detailed bug report.
I have tried on two different machines and on both, nvprof works without requiring sudo access.
However, neither of them is a fresh installation of 18.04.

After running nvprof with sudo, please try running it again without.

Also, would you be able to test on a different machine?

Graham Inggs (ginggs)
Changed in nvidia-cuda-toolkit (Ubuntu):
status: New → Incomplete
Martin D. Weinberg (martin-weinberg-5) wrote :

There is no change when running 'nvprof' immediately after 'sudo nvprof', or vice versa: the former fails and the latter works. The same is true for 'nvvp', because of course 'nvvp' calls 'nvprof'. I ran 'nvprof' under strace to look for some sort of write access to a /proc or /sys resource, but I didn't see anything obvious.

I built this machine expressly for CUDA development, and 18.04 (then in beta) was installed one week ago, so there have been next to no customizations. I do not have another Ubuntu box with an Nvidia card, sorry about that.
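The strace check mentioned in this comment can be narrowed to path-based system calls that failed with a permission error. A minimal sketch, assuming strace is installed: '-f' follows the profiled child process, '-e trace=file' keeps only calls that take a file name, and '-o' writes the trace to a file. The helper 'denied_system_paths' is hypothetical; it only filters the log, so it cannot show failures on calls such as ioctl that operate on an already-open device file.

```shell
#!/bin/sh
# Hypothetical helper: read an strace log on stdin and print only the calls
# on /proc, /sys or /dev paths that failed with EACCES or EPERM.
denied_system_paths() {
    grep -E '"/(proc|sys|dev)/' | grep -E '= -1 (EACCES|EPERM) '
}

# Example (run as the normal user, where nvprof fails):
#   strace -f -e trace=file -o nvprof.strace nvprof ./matrixMul
#   denied_system_paths < nvprof.strace
```

An empty result means no path-based access to those trees was refused, which would agree with the observation above.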

Graham Inggs (ginggs) wrote :

Please try:

sudo apt install nvidia-modprobe

and reboot.

Martin D. Weinberg (martin-weinberg-5) wrote :

Thanks. Did that, but no change in the 'nvprof' behavior.

Graham Inggs (ginggs) wrote :

I will try to reproduce this in a clean install next week.

Graham Inggs (ginggs) wrote :

Martin, I noticed your GPU has compute capability 6.1, while the GPUs I tested have 5.0 and 3.0.
Would you please try the newer nvidia-modprobe 390.25 from my PPA?
https://launchpad.net/~ginggs/+archive/ubuntu/testing

Martin D. Weinberg (martin-weinberg-5) wrote :

I tried it, and the behavior is the same.

However, the thread http://devtalk.nvidia.com/default/topic/1025155/-resolved-profiling-error-4168-999 suggests that this issue may be an upstream bug, although the developer did not give details.

So maybe the Ubuntu package is all good, but we'll need to wait for an upstream release to fix this issue.

Graham Inggs (ginggs) wrote :

Martin, thanks for testing and for the link.

From that link, it sounds like there is a known issue with cudaMemcpy2DToArray.
Would you please try some of the samples from 1_Utilities, e.g. bandwidthTest, or even deviceQuery, and see if you get the same error?

At this stage, please keep nvidia-modprobe installed.

While trying to reproduce this issue, I was also unable to run nvprof as a normal user after removing nvidia-modprobe, but I never saw 'Error: Internal profiling error 4168:999'. I've reported this as bug LP: #1767777.
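The sample comparison suggested in this comment (bandwidthTest, deviceQuery and the original matrixMul) can be run in one go. A minimal sketch, assuming the samples are already built and passed as paths; 'profile_each' and the 'NVPROF' variable are hypothetical names, and the filter only keeps lines mentioning "error" or "no kernels" (case-insensitively), since those are the symptoms reported in this thread.

```shell
#!/bin/sh
# Hypothetical helper: profile each given executable and show only the
# lines that look like nvprof errors or "no kernels" warnings.
# NVPROF (hypothetical) selects the profiler command; set NVPROF="sudo nvprof"
# to repeat the same runs as root.
profile_each() {
    for exe in "$@"; do
        echo "== $exe"
        # merge stderr into stdout so nvprof's own messages are filtered too
        ${NVPROF:-nvprof} "$exe" 2>&1 | grep -iE 'error|no kernels' ||
            echo "   (no errors reported)"
    done
}

# Example, from the directory containing the built samples:
#   profile_each ./bandwidthTest ./deviceQuery ./matrixMul
#   NVPROF="sudo nvprof" profile_each ./bandwidthTest ./deviceQuery ./matrixMul
```

Keeping the profiler command in a variable makes the user-versus-root comparison a one-word change rather than a second script.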

Martin D. Weinberg (martin-weinberg-5) wrote :

Same problem with bandwidthTest. nvprof works with deviceQuery and reports the time spent in the various CUDA query calls, but it also reports "no kernels profiled".

I did check the /dev/nvidia* permissions and they look fine to me, e.g. crw-rw-rw. As I also mentioned, setting setuid/setgid on nvprof did not help either; it has to be sudo or an actual root login (e.g. sudo -i).
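The device-node check described above can be scripted with GNU stat, whose '%A' format prints the same symbolic mode that 'ls -l' shows (e.g. crw-rw-rw-). A minimal sketch; both function names are hypothetical, and it only reports modes: as this thread shows, world-writable /dev/nvidia* nodes were not enough for nvprof to work here.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a symbolic mode string, as printed by
# GNU `stat -c %A`, is a character device with read+write for user, group
# and other (the execute position may hold anything).
is_world_rw_chardev() {
    case "$1" in
        crw?rw?rw?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: list each /dev/nvidia* entry with its mode and flag
# anything that is not a world read/write character device.
check_nvidia_nodes() {
    for dev in /dev/nvidia*; do
        [ -e "$dev" ] || continue      # glob matched nothing: no nodes exist
        mode=$(stat -c %A "$dev")
        if is_world_rw_chardev "$mode"; then
            echo "ok     $mode $dev"
        else
            echo "check  $mode $dev"
        fi
    done
}
```

Run 'check_nvidia_nodes' as the normal user; lines starting with "check" point at nodes worth a closer look.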

Launchpad Janitor (janitor) wrote :

[Expired for nvidia-cuda-toolkit (Ubuntu) because there has been no activity for 60 days.]

Changed in nvidia-cuda-toolkit (Ubuntu):
status: Incomplete → Expired