Comments (8)

evelikov commented on July 16, 2024

Hello fellow Arch user. Can you share some idiot-proof, step-by-step reproduction steps?

Yes, I don't think we fixed anything like that with 3.0.12.

from dkms.

C0rn3j commented on July 16, 2024

I can't reproduce it with a fake module (same error, but return code 4), so I presume a precondition is that a module is already installed, or something else weird is going on.

I can reproduce it by breaking an existing nvidia module, pointing its source symlink at /dev/null:

[0] % cd /var/lib/dkms/nvidia/535.113.01

[0] % sudo rm -f source; sudo ln -sf /usr/src/nvidia-535.113.01 source

[0] % dkms status     
nvidia/535.113.01, 6.1.58-1-lts, x86_64: installed
nvidia/535.113.01, 6.5.7-arch1-1, x86_64: installed

[0] % sudo rm -f source; sudo ln -sf /dev/null source                 

[0] % dkms status                                    
Error! Could not locate dkms.conf file.
File: /var/lib/dkms/nvidia/535.113.01/source/dkms.conf does not exist.

[0] % 

evelikov commented on July 16, 2024

The reproducer works for me. The error seems to be coming from the read_conf in module_status_built_extra().

All the other instances across the codebase are read_conf_or_die and a lot of that code is over 10 years old. Off the top of my head, I cannot see a reason why we couldn't flip the final user to "_or_die" variant.

@anbe42 IIRC you recently silenced dkms status so it doesn't show deprecation warnings - aka the read_conf that I'm thinking of, plus you did quite a lot of work around autoinstall (thanks again).

Do you foresee any issues if we promote the error to being fatal?
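To illustrate the difference being discussed, here is a minimal sketch of a non-fatal versus a fatal config check. The functions below are simplified stand-ins written for this example (`read_conf_sketch`, `read_conf_or_die_sketch`, `status_nonfatal`, `status_fatal` are all hypothetical names); they are not the real dkms implementations, and the exit code 1 is arbitrary.

```shell
#!/usr/bin/env bash
# Simplified stand-ins for illustration only, NOT the real dkms functions.

# Hypothetical: report a missing dkms.conf and return non-zero.
read_conf_sketch() {
    local conf="$1"
    if [[ ! -r "$conf" ]]; then
        echo "Error! Could not locate dkms.conf file." >&2
        return 1
    fi
    return 0
}

# Hypothetical: same check, but terminate the current (sub)shell on failure.
read_conf_or_die_sketch() {
    read_conf_sketch "$1" || exit 1
}

# Non-fatal caller: the failure is printed, then ignored, so the status is 0.
status_nonfatal() {
    read_conf_sketch "$1"
    return 0
}

# Fatal caller: the subshell exits on failure and that status is passed on.
status_fatal() {
    ( read_conf_or_die_sketch "$1" )
}

tmp=$(mktemp -d)
missing="$tmp/dkms.conf"   # this file is never created

rc_nonfatal=0
status_nonfatal "$missing" 2>/dev/null || rc_nonfatal=$?
rc_fatal=0
status_fatal "$missing" 2>/dev/null || rc_fatal=$?

echo "non-fatal: $rc_nonfatal, fatal: $rc_fatal"
rm -rf "$tmp"
```

With the non-fatal variant the message is printed but the caller still sees status 0, which matches the `[0] %` prompt after the error in the reproducer above.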

evelikov commented on July 16, 2024

@scaronni if you have any input, that would be highly appreciated as well. Thanks o/

evelikov commented on July 16, 2024

Thinking about this a little more: autoinstall explicitly aims to soldier on, even when building/installing a specific module fails. So promoting the error to fatal goes in the opposite direction.

On the other hand if dkms.conf is missing then the module is catastrophically broken.
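One possible middle ground between the two positions, sketched below under assumptions: keep processing the remaining modules, but remember that something failed and report it through the exit status at the end. `build_one` and `process_all` are hypothetical helpers invented for this example, and the module names are made up; this is not how dkms autoinstall is actually implemented.

```shell
#!/usr/bin/env bash
# Sketch only: build_one is a hypothetical stand-in for a per-module
# build/install step. Here any name starting with "broken" fails.
build_one() {
    [[ "$1" != broken* ]]
}

# Try every module, remember failures, and report them via the return status.
process_all() {
    local failed=0 module
    for module in "$@"; do
        if build_one "$module"; then
            echo "ok: $module"
        else
            echo "failed: $module" >&2
            failed=1          # keep going, but do not forget the failure
        fi
    done
    return "$failed"
}

rc=0
out=$(process_all good-a broken-b good-c 2>/dev/null) || rc=$?
echo "$out"
echo "exit status: $rc"
```

Both good modules are still processed after the broken one, yet the caller sees a non-zero status.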

@C0rn3j what did you do/what triggered the error on your end - was it manually tinkering around or something OS/packaging that caused it?

C0rn3j commented on July 16, 2024

I am not sure yet what triggered it. I just had a bunch of broken dkms builds on two machines for non-existent kernel and driver versions. I suspect some weird race condition prodded on by the kernel-modules-hook package.

anbe42 commented on July 16, 2024

Looks like we have two things to fix here:

  • recovery from an (externally) broken /var/lib/dkms, aka dkms fsck
  • error propagation in such a case (the bug reported here)

One way this broken state could have happened: some packaging removed /usr/src/$driver-$oldversion during an upgrade without calling the corresponding dkms remove hook first ... This should not happen with Debian-packaged *-dkms modules, but I don't know what else is out there in the wild ...
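As a rough idea of what the detection half of a "dkms fsck" could look like, the sketch below lists module/version entries whose source/dkms.conf is missing, for example because the source symlink dangles. `find_broken_modules` is a hypothetical helper, not an existing dkms command. It assumes the `<tree>/<module>/<version>/source/dkms.conf` layout seen in the error message above and skips symlinked second-level entries on the assumption that they are not version directories; a real tree may contain other entries this does not account for.

```shell
#!/usr/bin/env bash
# Sketch: print "<module>/<version>" for every version directory under a
# dkms-style tree that has no source/dkms.conf file.
# find_broken_modules is a hypothetical helper, not part of dkms.
find_broken_modules() {
    local tree="$1" dir rel
    for dir in "$tree"/*/*/; do
        [[ -d "$dir" ]] || continue            # glob matched nothing
        [[ -L "${dir%/}" ]] && continue        # assumption: symlinks are not versions
        if [[ ! -f "${dir}source/dkms.conf" ]]; then
            rel="${dir#"$tree"/}"
            echo "${rel%/}"
        fi
    done
}

# Demo against a throwaway tree instead of the real /var/lib/dkms.
tree=$(mktemp -d)
src=$(mktemp -d)
mkdir -p "$src/good-1.0" "$tree/good/1.0" "$tree/stale/2.0"
touch "$src/good-1.0/dkms.conf"
ln -s "$src/good-1.0" "$tree/good/1.0/source"
ln -s "$src/stale-2.0" "$tree/stale/2.0/source"   # dangling: target was never created

broken=$(find_broken_modules "$tree")
echo "broken: $broken"
rm -rf "$tree" "$src"
```

This only detects the broken state. What to do about it (remove the entry, warn, or refuse to continue) is the open question in this thread.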

evelikov commented on July 16, 2024

Indeed splitting this in two makes sense. Recovery would be great, although since the base information is missing aka dkms.conf I don't know what we can do here.

Looking from the latter point, we already exit in all the other instances of missing dkms.conf. So it's a case of making those non-fatal and then fixing the almost impossible to test error paths or flipping the final one.

Browsing across the Arch packages:

  • kernel-modules-hook - touches only /usr/lib/modules making and restoring backups
  • nvidia-dkms - the one that was likely removed
  • dkms itself has a separate hook/script, which does manual parsing/handling (akin to autoinstall), ensuring depmod is called only once per kernel, even if XXXs dkms modules are added/removed.

AFAICT autoinstall does not exist as far as Arch is concerned, although the extra script does call dkms status.

The pacman hook triggering the script is post transaction for install, and pre transaction for update/remove, so it cannot be the one causing the issue.

Considering there is no obvious way this can happen (in Arch and Debian) outside of user error (it's fine, I'm not trying to blame anyone here), I'm inclined to make it a fatal error. If it turns out there's some valid use-case we can quickly revert it.

That said, let's leave this issue open for a while and see how things go.
