Comment 3 for bug 1900627

Lishai Eitan (lishai-eitan) wrote :

Comment Summary:
----------------
Suggested fix: make libnvidia-ml-dev depend on:
libnvidia-compute-450 (>= 450) | libnvidia-compute-450-server (>= 450) | libnvidia-ml.so.1 (>= 450) | libnvidia-ml1 (>= 450)
instead of:
libnvidia-compute-450 (>= 450) | libnvidia-compute-450-server (>= 450) | libnvidia-ml.so.1 (>= 450)

Longer version :)
-----------------
I have now noticed that a package providing libnvidia-ml.so.1 (>= 450) would satisfy libnvidia-ml-dev's dependencies (and in turn, would allow the installation of nvidia-cuda-toolkit).

Both libnvidia-compute-450 and libnvidia-compute-455 provide libnvidia-ml1 and contain the file libnvidia-ml.so.1, but neither "provides" libnvidia-ml.so.1. This suggests the problem is only in the dependency *declarations*, and that nothing "material" prevents these packages from being installed together (i.e. if we forced their installation, they would work).
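
To illustrate the mismatch, here is a rough Python sketch of how an alternative ("|") dependency is checked against installed packages. It is a simplified model, not apt's actual resolver: the version comparison only handles dotted numeric versions, the package version "455.28" is shortened, and the versioned "Provides: libnvidia-ml1" entry is an assumption for the sake of the example.

```python
import re

def parse_version(v):
    # Simplified: dotted numeric versions only, not full Debian version ordering.
    return tuple(int(p) for p in v.split("."))

def parse_depends(field):
    """Split 'a (>= 1) | b (= 2)' into [(name, op, version), ...]."""
    alts = []
    for alt in field.split("|"):
        m = re.fullmatch(r"\s*(\S+)\s*(?:\(\s*(>=|=)\s*([^)\s]+)\s*\))?\s*", alt)
        alts.append((m.group(1), m.group(2), m.group(3)))
    return alts

def satisfies(have, op, wanted):
    if op is None:
        return True
    if have is None:
        # An unversioned Provides cannot satisfy a versioned dependency.
        return False
    if op == ">=":
        return parse_version(have) >= parse_version(wanted)
    return parse_version(have) == parse_version(wanted)

def is_satisfied(depends, installed):
    """True if any alternative is met by a real package name or a Provides entry."""
    for name, op, wanted in parse_depends(depends):
        for pkg in installed:
            if pkg["name"] == name and satisfies(pkg["version"], op, wanted):
                return True
            provided = pkg.get("provides", {})
            if name in provided and satisfies(provided[name], op, wanted):
                return True
    return False

OLD = ("libnvidia-compute-450 (>= 450) | libnvidia-compute-450-server (>= 450)"
       " | libnvidia-ml.so.1 (>= 450)")
NEW = OLD + " | libnvidia-ml1 (>= 450)"

# Hypothetical installed state: the 455 driver's compute library only.
installed = [{
    "name": "libnvidia-compute-455",
    "version": "455.28",                      # shortened, illustrative
    "provides": {"libnvidia-ml1": "455.28"},  # assumed to be a versioned Provides
}]

old_ok = is_satisfied(OLD, installed)
new_ok = is_satisfied(NEW, installed)
```

In this model the current dependency line fails with only libnvidia-compute-455 installed, while the line with the extra libnvidia-ml1 alternative succeeds.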

If this is correct, I believe that changing libnvidia-ml-dev to accept libnvidia-ml1 (>= 450) as an alternative to the dependency on libnvidia-ml.so.1 would fix the problem.

To test this theory, I created a dummy package (using equivs) that contains no files, depends on libnvidia-ml1 (>= 455.28) and "provides" libnvidia-ml.so.1 (= 455.28). After installing this package, I successfully installed nvidia-cuda-toolkit along with nvidia-driver-455.
This setup works for me: I was able to install pytorch and train some models on the GPU (verified using nvidia-smi).
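
For anyone who wants to reproduce the workaround, the sketch below renders an equivs control file along the lines of the dummy package described above. The package name "libnvidia-ml-so1-dummy", the helper function and the header values are my own illustrative choices, not necessarily what was used in the original test; only the Depends and Provides values come from the description above.

```python
def render_equivs_control(package, version, depends, provides):
    """Return the text of a minimal equivs control file for a dummy package."""
    lines = [
        "Section: misc",
        "Priority: optional",
        "Standards-Version: 3.9.2",
        "",
        f"Package: {package}",
        f"Version: {version}",
        f"Depends: {depends}",
        f"Provides: {provides}",
        "Description: dummy package providing libnvidia-ml.so.1",
        " Contains no files; it only bridges the dependency declarations.",
    ]
    return "\n".join(lines) + "\n"

control = render_equivs_control(
    package="libnvidia-ml-so1-dummy",        # hypothetical name
    version="455.28",
    depends="libnvidia-ml1 (>= 455.28)",
    provides="libnvidia-ml.so.1 (= 455.28)",
)

if __name__ == "__main__":
    print(control, end="")
```

Saving the output to a file and passing it to equivs-build should produce a .deb that can be installed with dpkg -i. This remains a local workaround; the proper fix is still the change to libnvidia-ml-dev's dependencies.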