Ubuntu
nvidia-cuda-toolkit package

nvprof does not complete without sudo

Bug #1767205 reported by Martin D. Weinberg on 2018-04-26

This bug affects 2 people

Affects: nvidia-cuda-toolkit (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Description: Ubuntu 18.04 LTS
Release: 18.04

Expected behavior: profile output

Actual behavior: error messages

Reproduce as follows:

cd NVIDIA_CUDA-9.1_Samples/0_Simple/matrixMul
nvcc -I ../../common/inc matrixMul.cu -o matrixMul

# check the exe works

./matrixMul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "GeForce GTX 1080" with compute capability 6.1

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 1137.23 GFlop/s, Time= 0.115 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

# now try nvprof
nvprof ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
==4775== NVPROF is profiling process 4775, command: ./matrixMul
GPU Device 0: "GeForce GTX 1080" with compute capability 6.1

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
==4775== Error: Internal profiling error 4168:999.
Performance= 1130.40 GFlop/s, Time= 0.116 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
======== Error: CUDA profiling error.

# run with sudo
sudo nvprof ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
==4797== NVPROF is profiling process 4797, command: ./matrixMul
GPU Device 0: "GeForce GTX 1080" with compute capability 6.1

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 1132.95 GFlop/s, Time= 0.116 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==4797== Profiling application: ./matrixMul
==4797== Profiling result:
           Type  Time(%)      Time     Calls       Avg       Min       Max  Name
GPU activities:   99.54%  34.644ms       301  115.10us  114.15us  116.07us  void matrixMulCUDA<int=32>(float*, float*, float*, int, int)
                   0.28%  98.465us         2  49.232us  32.960us  65.505us  [CUDA memcpy HtoD]
                   0.18%  62.944us         1  62.944us  62.944us  62.944us  [CUDA memcpy DtoH]
     API calls:   74.77%  110.27ms         3  36.757ms  3.4300us  110.26ms  cudaMalloc
                  22.45%  33.105ms         1  33.105ms  33.105ms  33.105ms  cudaEventSynchronize
                   0.93%  1.3780ms         3  459.33us  427.70us  478.26us  cudaGetDeviceProperties
                   0.81%  1.1874ms       301  3.9440us  3.7260us  18.511us  cudaLaunch
                   0.36%  536.51us         3  178.84us  56.346us  363.23us  cudaMemcpy
                   0.31%  451.50us        94  4.8030us     301ns  228.31us  cuDeviceGetAttribute
                   0.11%  156.37us         1  156.37us  156.37us  156.37us  cudaDeviceSynchronize
                   0.09%  132.82us      1505      88ns      79ns     289ns  cudaSetupArgument
                   0.07%  100.43us         3  33.475us  4.3440us  83.746us  cudaFree
                   0.06%  82.848us         1  82.848us  82.848us  82.848us  cuDeviceTotalMem
                   0.02%  35.673us       301     118ns     110ns     801ns  cudaConfigureCall
                   0.02%  33.788us         1  33.788us  33.788us  33.788us  cuDeviceGetName
                   0.00%  5.3080us         2  2.6540us  2.2050us  3.1030us  cudaEventRecord
                   0.00%  3.2350us         2  1.6170us  1.0960us  2.1390us  cudaEventCreate
                   0.00%  2.8120us         1  2.8120us  2.8120us  2.8120us  cudaSetDevice
                   0.00%  2.0920us         1  2.0920us  2.0920us  2.0920us  cudaEventElapsedTime
                   0.00%  1.7410us         3     580ns     292ns  1.0710us  cuDeviceGetCount
                   0.00%  1.0230us         2     511ns     353ns     670ns  cuDeviceGet
                   0.00%     658ns         1     658ns     658ns     658ns  cudaGetDeviceCount

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: nvidia-profiler 9.1.85-3
ProcVersionSignature: Ubuntu 4.15.0-20.21-generic 4.15.17
Uname: Linux 4.15.0-20-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.9-0ubuntu7
Architecture: amd64
Date: Thu Apr 26 17:28:48 2018
Dependencies:
gcc-8-base 8-20180414-1ubuntu2
libc6 2.27-3ubuntu1
libcuinj64-9.1 9.1.85-3
libgcc1 1:8-20180414-1ubuntu2
InstallationDate: Installed on 2018-04-21 (5 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Alpha amd64 (20180421)
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: nvidia-cuda-toolkit
UpgradeStatus: No upgrade log present (probably fresh install)

Tags:

Martin D. Weinberg (martin-weinberg-5) wrote on 2018-04-26:

ProcCpuinfoMinimal.txt (1.2 KiB, text/plain; charset="utf-8")

Graham Inggs (ginggs) wrote on 2018-04-27:

Martin, thanks for the detailed bug report.
I have tried on two different machines and on both, nvprof works without requiring sudo access.
However, neither of them is a fresh installation of 18.04.

After running nvprof with sudo, please try running it again without.

Also, would you be able to test on a different machine?

Graham Inggs (ginggs) on 2018-04-27

Changed in nvidia-cuda-toolkit (Ubuntu):
status:	New → Incomplete

Martin D. Weinberg (martin-weinberg-5) wrote on 2018-04-27:

There is no change when running 'nvprof' immediately after 'sudo nvprof', or vice versa: the former fails and the latter works. The same is true for 'nvvp', because of course 'nvvp' calls 'nvprof'. I ran 'nvprof' under strace to look for some sort of write access to a /proc or /sys resource, but I didn't see anything obvious.
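
The strace check described above can be sketched roughly as follows. This is a reconstruction, not the command actually used in the report: the trace file name nvprof.trace and the helper name find_proc_sys_writes are hypothetical, and the filter only looks for open/openat calls on /proc or /sys paths that request write access. The -f flag makes strace follow the child process that nvprof launches.

```shell
#!/bin/sh
# Hypothetical helper: print open/openat lines from an strace log that
# touch a /proc or /sys path with write access (O_WRONLY or O_RDWR).
find_proc_sys_writes() {
    grep -E 'open(at)?\(.*"/(proc|sys)/' "$1" | grep -E 'O_WRONLY|O_RDWR'
}

# Trace nvprof and its children (-f), recording only file opens to a log.
# Skipped when strace, nvprof or the matrixMul sample is not available.
if command -v strace >/dev/null && command -v nvprof >/dev/null && [ -x ./matrixMul ]; then
    strace -f -e trace=open,openat -o nvprof.trace nvprof ./matrixMul
    find_proc_sys_writes nvprof.trace
fi
```

Comparing the output of this filter for a normal-user run and a sudo run is one way to spot a privileged resource that only the root run can open.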

I built this machine expressly for CUDA development, and 18.04 (then in beta) was installed one week ago, so there have been next to no customizations. I do not have another Ubuntu box with an Nvidia GPU, sorry about that.

Graham Inggs (ginggs) wrote on 2018-04-27:

Please try:

sudo apt install nvidia-modprobe

and reboot.

Martin D. Weinberg (martin-weinberg-5) wrote on 2018-04-27:

Thanks. Did that, but no change in the 'nvprof' behavior.

Graham Inggs (ginggs) wrote on 2018-04-27:

I will try to reproduce this in a clean install next week.

Graham Inggs (ginggs) wrote on 2018-04-28:

Martin, I noticed your GPU has compute capability 6.1 and my tested GPUs were 5.0 and 3.0.
Would you please try the newer nvidia-modprobe 390.25 from my PPA?
https://launchpad.net/~ginggs/+archive/ubuntu/testing

Martin D. Weinberg (martin-weinberg-5) wrote on 2018-04-28:

I tried it, and the behavior is the same.

However, the thread http://devtalk.nvidia.com/default/topic/1025155/-resolved-profiling-error-4168-999 suggests that this issue may be an upstream bug, although the developer did not give details.

So maybe the Ubuntu package is all good, but we'll need to wait for an upstream release to fix this issue.

Graham Inggs (ginggs) wrote on 2018-04-29:

Martin, thanks for testing and for the link.

From that link, it sounds like there is a known issue with cudaMemcpy2DToArray.
Would you try some of the samples from 1_Utilities, e.g. bandwidthTest or even deviceQuery, and see whether you get the same error?
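
A minimal sketch of that test follows. It assumes the samples in NVIDIA_CUDA-9.1_Samples/1_Utilities were built in place, so each binary sits in its own directory (for example bandwidthTest/bandwidthTest). The helpers has_profiling_error and profile_sample are hypothetical names: they run nvprof on one binary and search its combined output for the 'Internal profiling error' message from the original report.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an nvprof log contains the internal
# profiling error seen in this bug (e.g. "Internal profiling error 4168:999").
has_profiling_error() {
    grep -q 'Internal profiling error' "$1"
}

# Hypothetical helper: profile one sample binary and report the outcome.
# nvprof prints its ==PID== messages on stderr, so both streams are captured.
profile_sample() {
    log=$(mktemp)
    nvprof "$1" >"$log" 2>&1
    if has_profiling_error "$log"; then
        echo "$1: internal profiling error"
    else
        echo "$1: no internal profiling error"
    fi
    rm -f "$log"
}
```

For example, from NVIDIA_CUDA-9.1_Samples/1_Utilities you could run profile_sample ./bandwidthTest/bandwidthTest and then profile_sample ./deviceQuery/deviceQuery, once as a normal user and once with sudo.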

At this stage, please keep nvidia-modprobe installed.

While trying to reproduce this issue, I was also unable to run nvprof as a normal user after removing nvidia-modprobe, but I never saw 'Error: Internal profiling error 4168:999'. I've reported this as bug LP: #1767777

Martin D. Weinberg (martin-weinberg-5) wrote on 2018-04-29:

#10

Same problem with bandwidthTest. nvprof works with deviceQuery and reports the time spent in the various CUDA query calls, but it also reports "no kernels profiled".

I did check the /dev/nvidia* permissions and they look fine to me, e.g. crw-rw-rw. As I also mentioned, setting the setuid/setgid bit on nvprof did not help either; it has to be sudo or an actual root login (e.g. sudo -i).
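
The /dev/nvidia* permission check mentioned above can be scripted roughly as follows. This sketch assumes GNU coreutils stat (for the -c %A format). is_world_rw is a hypothetical helper that tests whether the "other" read and write bits are set, as in a mode string such as crw-rw-rw-.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file is readable and writable by
# "other" users, i.e. characters 8-9 of its mode string are "rw".
is_world_rw() {
    perms=$(stat -c %A "$1") || return 1
    case "$perms" in
        ???????rw?) return 0 ;;
        *) return 1 ;;
    esac
}

# Report each NVIDIA device node; if none exist the glob stays literal
# and the -e test skips it.
for dev in /dev/nvidia*; do
    [ -e "$dev" ] || continue
    if is_world_rw "$dev"; then
        echo "$dev: world read/write"
    else
        echo "$dev: NOT world read/write"
    fi
done
```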

Launchpad Janitor (janitor) wrote on 2018-06-29:

#11

[Expired for nvidia-cuda-toolkit (Ubuntu) because there has been no activity for 60 days.]

Changed in nvidia-cuda-toolkit (Ubuntu):
status:	Incomplete → Expired
