m.roth at 5-cent.us
2017-Sep-26 17:40 UTC
[CentOS] Semi-OT: hardware: NVidia proprietary driver, C7.4
This is really frustrating. I've got a server with two K20c Tesla cards. I need to use the proprietary drivers to use the CUDA toolkit. Btw, I had no trouble at all with building for CentOS 7.3.

I have what NVidia claims is the correct driver package, a 340 series. It appears to build, but then fails to load. The only error I see is "no such device", which makes no sense to me, esp. since it says nothing whatever else.

I've gone through the install log, and there are a bunch of Note: lines and warnings, but the latter I think are all about comparing signed and unsigned integers.

And lsmod shows no nvidia drivers registered, but the logs claim that
Error: Driver 'nvidia' is already registered, aborting...

Anyone got any ideas?

mark
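The two observations above (nothing in lsmod, yet "already registered" in the log) can be checked side by side. What follows is a minimal sketch, not a diagnosis: module_loaded and nvidia_log_lines are hypothetical helper names, the optional file argument exists only so the check can be run against a saved copy, and the NVRM prefix is assumed to be what the proprietary module uses in kernel messages.

```shell
#!/bin/sh
# Sketch: report whether a kernel module is currently loaded.
# Reads the table that lsmod formats (/proc/modules); the optional second
# argument is an assumption of this sketch, used to check a saved copy.
module_loaded() {
    name=$1
    table=${2:-/proc/modules}
    # The first field of each line is the module name; match it exactly
    # so that, for example, nvidia_uvm does not count as nvidia.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$table"
}

# Sketch: pull driver-related lines out of a kernel log saved to a file,
# for example one written with: dmesg > dmesg.txt
nvidia_log_lines() {
    grep -i -E 'nvidia|NVRM' "$1"
}
```

Running `module_loaded nvidia || echo "nvidia not loaded"` and then `nvidia_log_lines dmesg.txt` shows the module state next to whatever the kernel logged during the failed load.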
Scott Robbins
2017-Sep-26 17:59 UTC
[CentOS] Semi-OT: hardware: NVidia proprietary driver, C7.4
On Tue, Sep 26, 2017 at 01:40:54PM -0400, m.roth at 5-cent.us wrote:

> This is really frustrating. I've got a server with two K20c Tesla cards. I
> need to use the proprietary drivers to use the CUDA toolkit. Btw, I had no
> trouble at all with building for CentOS 7.3
>
> I have what NVidia claims is the correct driver package, a 340 series. It
> appears to build, but then fails to load. The only error I see is "no such
> device", which makes no sense to me, esp. since it says nothing whatever
> else.

Why not use the elrepo repo? They've worked flawlessly for me, both with legacy and new cards.

--
Scott Robbins
PGP keyID EB3467D6
( 1B48 077D 66F6 9DB0 FDC2 A409 FA54 EB34 67D6 )
gpg --keyserver pgp.mit.edu --recv-keys EB3467D6
Phelps, Matthew
2017-Sep-26 18:18 UTC
[CentOS] Semi-OT: hardware: NVidia proprietary driver, C7.4
On Tue, Sep 26, 2017 at 1:59 PM, Scott Robbins <scottro11 at gmail.com> wrote:

> On Tue, Sep 26, 2017 at 01:40:54PM -0400, m.roth at 5-cent.us wrote:
> > This is really frustrating. I've got a server with two K20c Tesla cards. I
> > need to use the proprietary drivers to use the CUDA toolkit. Btw, I had no
> > trouble at all with building for CentOS 7.3
> >
> > I have what NVidia claims is the correct driver package, a 340 series. It
> > appears to build, but then fails to load. The only error I see is "no such
> > device", which makes no sense to me, esp. since it says nothing whatever
> > else.
>
> Why not use the elrepo repo? They've worked flawlessly for me, both with
> legacy and new cards.
>
> --
> Scott Robbins
> PGP keyID EB3467D6
> ( 1B48 077D 66F6 9DB0 FDC2 A409 FA54 EB34 67D6 )
> gpg --keyserver pgp.mit.edu --recv-keys EB3467D6

Seconded. We use the elrepo repository for hundreds of workstations and have had no issues. Takes care of everything automatically.

--
Matt Phelps
System Administrator, Computation Facility
Harvard - Smithsonian Center for Astrophysics
mphelps at cfa.harvard.edu, http://www.cfa.harvard.edu
Phil Perry
2017-Sep-26 19:45 UTC
[CentOS] Semi-OT: hardware: NVidia proprietary driver, C7.4
On 26/09/17 18:40, m.roth at 5-cent.us wrote:

> This is really frustrating. I've got a server with two K20c Tesla cards. I
> need to use the proprietary drivers to use the CUDA toolkit. Btw, I had no
> trouble at all with building for CentOS 7.3
>
> I have what NVidia claims is the correct driver package, a 340 series. It
> appears to build, but then fails to load. The only error I see is "no such
> device", which makes no sense to me, esp. since it says nothing whatever
> else.
>
> I've gone through the install log, and there are a bunch of Note:, and
> warnings, but the later I think are all about comparing signed and
> unsigned integers.
>
> And lsmod shows no nvidia drivers registered, but the logs claims that
> Error: Driver 'nvidia' is already registered, aborting...
>
> Anyone got any ideas?
>
> mark

You don't say which version of the 340 series driver you have tried.

There was a bug with recent legacy releases that affected el7.4 kernels. We (elrepo) patched the driver to fix that on rhel7.4 releases. I'm not sure, but it _may_ have been fixed in the 340.104 driver released last week. I've not bothered building it, as the changelog only mentions "Improved compatibility with recent Linux kernels", which we patched/fixed in our previous release, and other issues which don't affect kmods on RHEL.

So it sounds like a known issue which has already been fixed. If you don't want to use our packages, maybe take a look at the patch and try applying it to your build.
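Since the answer turns on which 340-series release is in use, comparing version strings is a natural first step. A minimal sketch, assuming GNU sort (its -V option does the version ordering); version_ge is a hypothetical helper name, and 340.104 appears only because it is the release named above, which the message says _may_ carry the fix.

```shell
#!/bin/sh
# Sketch: succeed if version $1 is at least version $2.
# Uses GNU sort -V so that 340.76 orders before 340.104; a plain
# numeric comparison of the two strings would get that pair wrong.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: compare a version string against the release named in the thread.
if version_ge "340.102" "340.104"; then
    echo "at or past 340.104"
else
    echo "older than 340.104"
fi
```

On a machine where the module is installed, `modinfo -F version nvidia` prints the version string to pass as the first argument.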
Pete Biggs
2017-Sep-26 20:31 UTC
[CentOS] Semi-OT: hardware: NVidia proprietary driver, C7.4
On Tue, 2017-09-26 at 13:40 -0400, m.roth at 5-cent.us wrote:

> This is really frustrating. I've got a server with two K20c Tesla cards. I
> need to use the proprietary drivers to use the CUDA toolkit. Btw, I had no
> trouble at all with building for CentOS 7.3
>
> I have what NVidia claims is the correct driver package, a 340 series. It
> appears to build, but then fails to load. The only error I see is "no such
> device", which makes no sense to me, esp. since it says nothing whatever
> else.
>
> I've gone through the install log, and there are a bunch of Note:, and
> warnings, but the later I think are all about comparing signed and
> unsigned integers.
>
> And lsmod shows no nvidia drivers registered, but the logs claims that
> Error: Driver 'nvidia' is already registered, aborting...

Have you tried installing the toolkit from nVidia's own repository:

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=CentOS&target_version=7&target_type=rpmnetwork

That includes the kernel drivers as far as I can remember.

P.
vychytraly .
2017-Sep-26 21:46 UTC
[CentOS] Semi-OT: hardware: NVidia proprietary driver, C7.4
From my experience, elrepo nvidia drivers work fine with CUDA packages from the nvidia repository.

On Tue, Sep 26, 2017 at 10:31 PM, Pete Biggs <pete at biggs.org.uk> wrote:

> On Tue, 2017-09-26 at 13:40 -0400, m.roth at 5-cent.us wrote:
> > This is really frustrating. I've got a server with two K20c Tesla cards. I
> > need to use the proprietary drivers to use the CUDA toolkit. Btw, I had no
> > trouble at all with building for CentOS 7.3
> >
> > I have what NVidia claims is the correct driver package, a 340 series. It
> > appears to build, but then fails to load. The only error I see is "no such
> > device", which makes no sense to me, esp. since it says nothing whatever
> > else.
> >
> > I've gone through the install log, and there are a bunch of Note:, and
> > warnings, but the later I think are all about comparing signed and
> > unsigned integers.
> >
> > And lsmod shows no nvidia drivers registered, but the logs claims that
> > Error: Driver 'nvidia' is already registered, aborting...
>
> Have you tried installing the toolkit from nVidia's own repository:
>
> https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=CentOS&target_version=7&target_type=rpmnetwork
>
> That includes the kernel drivers as far as I can remember.
>
> P.
Nicolas Kovacs
2017-Sep-26 21:47 UTC
[CentOS] Semi-OT: hardware: NVidia proprietary driver, C7.4
On 26/09/2017 at 19:59, Scott Robbins wrote:

> Why not use the elrepo repo? They've worked flawlessly for me, both with
> legacy and new cards.

I know this is weird, but I've had cases where the downloaded NVidia driver worked and the ELRepo driver didn't, and the other way around.

Details here:

https://blog.microlinux.fr/nvidia-centos/

Niki

--
Microlinux - Solutions informatiques durables
7, place de l'église - 30730 Montpezat
Web : http://www.microlinux.fr
Mail : info at microlinux.fr
Tél. : 04 66 63 10 32
Sorin Srbu
2017-Sep-27 06:56 UTC
[CentOS] Semi-OT: hardware: NVidia proprietary driver, C7.4
> -----Original Message-----
> From: CentOS [mailto:centos-bounces at centos.org] On Behalf Of Phil Perry
> Sent: 26 September 2017 21:46
> To: centos at centos.org
> Subject: Re: [CentOS] Semi-OT: hardware: NVidia proprietary driver, C7.4
>
> On 26/09/17 18:40, m.roth at 5-cent.us wrote:
> > This is really frustrating. I've got a server with two K20c Tesla cards. I
> > need to use the proprietary drivers to use the CUDA toolkit. Btw, I had no
> > trouble at all with building for CentOS 7.3
> >
> > I have what NVidia claims is the correct driver package, a 340 series. It
> > appears to build, but then fails to load. The only error I see is "no such
> > device", which makes no sense to me, esp. since it says nothing whatever
> > else.
> >
> > I've gone through the install log, and there are a bunch of Note:, and
> > warnings, but the later I think are all about comparing signed and
> > unsigned integers.
> >
> > And lsmod shows no nvidia drivers registered, but the logs claims that
> > Error: Driver 'nvidia' is already registered, aborting...
> >
> > Anyone got any ideas?
> >
> > mark
>
> You don't say which version of the 340 series driver you have tried.
>
> There was a bug with recent legacy releases that affected el7.4 kernels.
> We (elrepo) patched the driver to fix that on rhel7.4 releases. I'm not
> sure but it _may_ have been fixed in the 340.104 driver released last
> week - I've not bothered building it as the changelog only mentions
> "Improved compatibility with recent Linux kernels" which we
> patched/fixed in our the previous release and other issues which don't
> affect kmods on RHEL.
>
> So it sounds like a known issue which has already been fixed. If you
> don't want to use our packages, maybe take a look at the patch and try
> applying it to your build.

Tested 340.76, 340.102, 340.104 (elrepo and proprietary). No luck over here with a GTX260 and the 64-bit drivers. Will test some more; if still no luck, I'll just reinstall from scratch.

--
//Sorin