Giter Site home page Giter Site logo

Restart Lua cleanly about way-cooler HOT 6 CLOSED

Timidger avatar Timidger commented on July 21, 2024
Restart Lua cleanly

from way-cooler.

Comments (6)

psychon avatar psychon commented on July 21, 2024

Fun fact: Does not crash here. Oh wait, yes it does. Let's try again under valgrind... holy shit, 4490 errors from 549 contexts, most of them "something in GL-land". Anyway, this time the crash was:

==11972== Process terminating with default action of signal 11 (SIGSEGV)
==11972==  Access not within mapped region at address 0x4E
==11972==    at 0x9AF2C3A: ffi_closure_unix64_inner (in /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4)
==11972==    by 0x9AF3135: ffi_closure_unix64 (in /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4)
==11972==    by 0x4E85B72: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.1)
==11972==    by 0x4E850F4: g_main_context_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.1)
==11972==    by 0x4E854BF: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.1)
==11972==    by 0x4E857D1: g_main_loop_run (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.1)
==11972==    by 0x5C7CB2: glib::auto::main_loop::MainLoop::run (main_loop.rs:52)
==11972==    by 0x139691: way_cooler::awesome::lua::enter_glib_loop::{{closure}} (mod.rs:57)
==11972==    by 0x158A37: <std::thread::local::LocalKey<T>>::try_with (local.rs:290)
==11972==    by 0x158784: <std::thread::local::LocalKey<T>>::with (local.rs:244)
==11972==    by 0x28A3F2: way_cooler::awesome::lua::enter_glib_loop (mod.rs:57)
==11972==    by 0x23838F: way_cooler::main::{{closure}} (main.rs:84)

The above seems to be "glib tries to call a FFI callback that was already freed". If you want me to guess: The pre-restart textclock started a timer for updating itself and that timer now fired. Thus, GLib calls the pointer that it got back then, but this was already freed ad now points into "nana-land". Thus, Bad Things (tm) happen.

I'm not really sure how to test this theory, but dropping the old lua state (lua_closeing it) definitely does not stop timers. I guess this could be hacked in for timers, but I also guess that there are many other possibilities to register a callback with something that would need to be fixed.

from way-cooler.

psychon avatar psychon commented on July 21, 2024

I'm not really sure how to test this theory,

I removed the textclock from the config and let the result. It survived a bit longer, but still crashed. This time valgrind even finds a use-after-free:

==12388== Invalid read of size 4
==12388==    at 0x9AF2C22: ffi_closure_unix64_inner (in /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4)
==12388==    by 0x9AF3135: ffi_closure_unix64 (in /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4)
==12388==    by 0x4E85B72: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.1)
==12388==    by 0x4E850F4: g_main_context_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.1)
==12388==    by 0x4E854BF: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.1)
==12388==    by 0x4E857D1: g_main_loop_run (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.1)
==12388==    by 0x5C7CB2: glib::auto::main_loop::MainLoop::run (main_loop.rs:52)
==12388==    by 0x139691: way_cooler::awesome::lua::enter_glib_loop::{{closure}} (mod.rs:57)
==12388==    by 0x158A37: <std::thread::local::LocalKey<T>>::try_with (local.rs:290)
==12388==    by 0x158784: <std::thread::local::LocalKey<T>>::with (local.rs:244)
==12388==    by 0x28A3F2: way_cooler::awesome::lua::enter_glib_loop (mod.rs:57)
==12388==    by 0x23838F: way_cooler::main::{{closure}} (main.rs:84)
==12388==  Address 0x124cb8cc is 4 bytes before a block of size 6 free'd
==12388==    at 0x4C2DDBB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12388==    by 0x4698DD: <alloc::heap::Heap as core::heap::Alloc>::dealloc (heap.rs:104)
==12388==    by 0x463181: <alloc::raw_vec::RawVec<T, A>>::dealloc_buffer (raw_vec.rs:706)
==12388==    by 0x464024: <alloc::raw_vec::RawVec<T, A> as core::ops::drop::Drop>::drop (raw_vec.rs:715)
==12388==    by 0x45AE94: core::ptr::drop_in_place (ptr.rs:59)
==12388==    by 0x45B4B1: core::ptr::drop_in_place (ptr.rs:59)
==12388==    by 0x45AAC4: core::ptr::drop_in_place (ptr.rs:59)
==12388==    by 0x4590DF: rlua::conversion::<impl rlua::value::ToLua<'lua> for alloc::string::String>::to_lua (conversion.rs:191)
==12388==    by 0x18FED4: rlua::multi::<impl rlua::value::ToLuaMulti<'lua> for T>::to_lua_multi (multi.rs:30)
==12388==    by 0x318DBE: rlua::lua::Lua::create_function::{{closure}} (lua.rs:277)
==12388==    by 0x454E1E: rlua::lua::Lua::create_callback_function::callback_call_impl::{{closure}} (lua.rs:1018)
==12388==    by 0x456285: std::panicking::try::do_call (panicking.rs:306)
==12388==  Block was alloc'd at
==12388==    at 0x4C2CB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12388==    by 0x5D6025: __rdl_alloc (in /home/psychon/projects/way-cooler/target/debug/way-cooler)
==12388==    by 0x5EC23B: alloc::slice::<impl alloc::borrow::ToOwned for [T]>::to_owned (in /home/psychon/projects/way-cooler/target/debug/way-cooler)
==12388==    by 0x5EBD84: <alloc::string::String as core::convert::From<&'a str>>::from (in /home/psychon/projects/way-cooler/target/debug/way-cooler)
==12388==    by 0x238473: <T as core::convert::Into<U>>::into (convert.rs:396)
==12388==    by 0x2395EB: way_cooler::awesome::lua::rust_interop::type_override (rust_interop.rs:31)
==12388==    by 0x17F2B8: core::ops::function::Fn::call (function.rs:73)
==12388==    by 0x318C2C: rlua::lua::Lua::create_function::{{closure}} (lua.rs:277)
==12388==    by 0x454E1E: rlua::lua::Lua::create_callback_function::callback_call_impl::{{closure}} (lua.rs:1018)
==12388==    by 0x456285: std::panicking::try::do_call (panicking.rs:306)
==12388==    by 0x5EAF3E: __rust_maybe_catch_panic (in /home/psychon/projects/way-cooler/target/debug/way-cooler)
==12388==    by 0x455F60: std::panicking::try (panicking.rs:285)

However, the address is 4 bytes before a block suggests that the backtrace is useless.

from way-cooler.

psychon avatar psychon commented on July 21, 2024

Here is a self-contained example that (I think) reproduces the crash (this was tested with rlua = { version = "0.12.0", default-features = false } and glib = "0.5.0" in Cargo.toml):

extern crate rlua;
extern crate glib;

use rlua::{Lua, Result};
use glib::MainLoop;

fn add_timer_and_drop_lua() -> Result<()> {
    let lua = Lua::new();
    lua.exec::<()>(
        r#"
        local glib = require("lgi").GLib
        glib.timeout_add(glib.PRIORITY_DEFAULT, 1000, function()
            print("timer")
            return true
        end)
        "#, None)?;
    Ok(())
}

fn main() {
    add_timer_and_drop_lua().unwrap();
    MainLoop::new(None, false).run();
}

Valgrind produces a megabyte of output since "things limp along" for a while before the crash, but the first error is:

==15178== Invalid read of size 4
==15178==    at 0x61DDC22: ffi_closure_unix64_inner (in /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4)
==15178==    by 0x61DE135: ffi_closure_unix64 (in /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4)
==15178==    by 0x50D9B72: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.1)
==15178==    by 0x50D90F4: g_main_context_dispatch (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.1)
==15178==    by 0x50D94BF: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.1)
==15178==    by 0x50D97D1: g_main_loop_run (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.5600.1)
==15178==    by 0x116782: glib::auto::main_loop::MainLoop::run (main_loop.rs:52)
==15178==    by 0x11040C: foo::main (main.rs:22)
==15178==    by 0x112971: std::rt::lang_start::{{closure}} (rt.rs:74)
==15178==    by 0x155EE7: _ZN3std9panicking3try7do_call17h28ad4f5d10c3b646E.llvm.6634232217126391703 (in /tmp/foo/target/debug/foo)
==15178==    by 0x165DAE: __rust_maybe_catch_panic (in /tmp/foo/target/debug/foo)
==15178==    by 0x154215: std::rt::lang_start_internal (in /tmp/foo/target/debug/foo)
==15178==  Address 0x6af110c is 76 bytes inside a block of size 360 free'd
==15178==    at 0x4C2DDBB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15178==    by 0x11C30A: rlua::lua::Lua::create_lua::allocator (lua.rs:905)
==15178==    by 0x53B50A0: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==15178==    by 0x53B2E6A: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==15178==    by 0x53B3790: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==15178==    by 0x53B966D: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==15178==    by 0x11ABA1: <rlua::lua::Lua as core::ops::drop::Drop>::drop (lua.rs:66)
==15178==    by 0x110844: core::ptr::drop_in_place (ptr.rs:59)
==15178==    by 0x11036A: foo::add_timer_and_drop_lua (main.rs:18)
==15178==    by 0x1103D0: foo::main (main.rs:21)
==15178==    by 0x112971: std::rt::lang_start::{{closure}} (rt.rs:74)
==15178==    by 0x155EE7: _ZN3std9panicking3try7do_call17h28ad4f5d10c3b646E.llvm.6634232217126391703 (in /tmp/foo/target/debug/foo)
==15178==  Block was alloc'd at
==15178==    at 0x4C2CABF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15178==    by 0x4C2EE04: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15178==    by 0x11C319: rlua::lua::Lua::create_lua::allocator (lua.rs:908)
==15178==    by 0x53B50A0: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==15178==    by 0x53B35BD: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==15178==    by 0x53BA100: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==15178==    by 0x53ACA7D: lua_newuserdata (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==15178==    by 0x6DEE314: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3-lgi.so.0.0.0)
==15178==    by 0x6DEFD95: lgi_callable_create (in /usr/lib/x86_64-linux-gnu/liblua5.3-lgi.so.0.0.0)
==15178==    by 0x6DF5A19: lgi_marshal_2c (in /usr/lib/x86_64-linux-gnu/liblua5.3-lgi.so.0.0.0)
==15178==    by 0x6DEEA45: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3-lgi.so.0.0.0)
==15178==    by 0x6DF079D: ??? (in /usr/lib/x86_64-linux-gnu/liblua5.3-lgi.so.0.0.0)

Basically, Thou Shalt Not Drop The Lua (at least not without getting rid of the whole process).

How about implementing restart similarly to what awesome does: Calling execve with the arguments that were given to C main (this might be a bit harder to do from Rust...?).

from way-cooler.

Timidger avatar Timidger commented on July 21, 2024

The issue I have with exec is that it will restart the compositor which means all the clients will close.

Thanks for confirming that glib essentially needs to live as long as Lua.

The "correct" solution I feel like is to make Lua its own process / thread and use IPC to communicate. That will make things harder, but at the same time make it much easier to do things like this.

There have been talks in irc about performance concerns after it was shown gnome is acting slow because it runs Javascript in the same thread as rendering.

I know we just consolidated it into one thread, but that might be something that should be changed sooner rather than later.

wlroots is starting to stabalize, so I think I'll be focusing on wlroots-rs and compositor level stuff for Way Cooler first since that's more clear what to do.

I have a feeling we'll need to set it so that Way Cooler is two parts: a compositor (like the x server for awesome, does rendering and things) and Lua that executes user code. They could be separate processes maybe... Hmm.

from way-cooler.

psychon avatar psychon commented on July 21, 2024

The "correct" solution I feel like is to make Lua its own process / thread and use IPC to communicate.

A thread would still have this issue. One could work around it by setting up a separate GMainContext for the Lua thread and just leaking (or freeing? Does that work cleanly? I don't know) the old one on a "restart", but I am not really sure if that's a sensible thing to do.

Multiple processes would be quite non-trivial as well. If you want to go in that direction, I would suggest to do something "generic" so that there is a stable-ish protocol between the two processes and one could replace the Lua-running process with something else completely.
Some (privileged?) DBus interface for controlling the compositor and making all important decisions?
(Note that I do not know any "good/sane" Lua-DBus-bindings, so picking DBus would make the awesome-part harder, but at the same time would make the protocol much more generic than some hand-rolled one.)

from way-cooler.

Timidger avatar Timidger commented on July 21, 2024

This has been fixed with #556 and is outdated.

from way-cooler.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.