Comments (9)

tangliisu commented on April 16, 2024

I ran some local tests and cachelib works well. It only crashed when I moved to the test cluster.

from cachelib.

sathyaphoenix commented on April 16, 2024

That's strange. The default in the config is 1.25 here (https://github.com/facebookincubator/CacheLib/blob/main/cachelib/allocator/CacheAllocatorConfig.h#L569).

Can you share the stack trace of the exception, and also log config.allocationClassSizeFactor before creating the cache through make_unique?

Does it always crash, and does the error happen when you manually set the factor through setDefaultAllocSizes()?
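
A minimal sketch of the logging step requested above. It does not use CacheLib itself: `ConfigSketch` is a hypothetical stand-in that carries only the `allocationClassSizeFactor` field, and the rule "finite and greater than 1.0" is an assumption about what a sane factor looks like, not a statement of the library's exact check.

```cpp
#include <cassert>
#include <cmath>
#include <iostream>

// Hypothetical stand-in for the single config field discussed in this thread.
// The real type is CacheAllocatorConfig; only the field name is taken from it.
struct ConfigSketch {
  double allocationClassSizeFactor{1.25};
};

// Assumption: a usable growth factor is finite and strictly greater than 1.0.
inline bool isPlausibleSizeFactor(double factor) {
  return std::isfinite(factor) && factor > 1.0;
}

// Logs the factor, then reports whether it looks sane, so a corrupted value
// is visible in the log before the cache constructor gets a chance to throw.
inline bool logAndCheckFactor(const ConfigSketch& config,
                              std::ostream& out = std::cerr) {
  out << "allocationClassSizeFactor in config is: "
      << config.allocationClassSizeFactor << '\n';
  return isPlausibleSizeFactor(config.allocationClassSizeFactor);
}
```

Calling `logAndCheckFactor(config)` right before the `make_unique` call would show whether the value is already wrong at that point or only goes wrong later.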

tangliisu commented on April 16, 2024

Hi, I will do a private build tomorrow to check config.allocationClassSizeFactor and to see whether it still crashes when I set the factor manually. This is the stack trace of the exception.

E0818 21:25:01.105298    14 cachelib_cache_handler.cpp:54] invalid factor 6.93298464824273e-310
E0818 21:25:01.105343    14 ExceptionTracer.cpp:210] terminate() called, exception stack follows
E0818 21:25:01.105351    14 ExceptionTracer.cpp:212] Exception type: std::invalid_argument (14 frames)
    @ 00007fa0a90be092 __cxa_throw
                       /opt/folly/folly/experimental/exception_tracer/ExceptionTracerLib.cpp:58
    @ 0000564ab86e010c facebook::cachelib::MemoryAllocator::generateAllocSizes(double, unsigned int, unsigned int, bool) [clone .cold.442]
                       /opt/cachelib/cachelib/allocator/memory/MemoryAllocator.cpp:187
    @ 0000564abf74dede facebook::cachelib::CacheAllocator<facebook::cachelib::LruCacheTrait>::getAllocatorConfig(facebook::cachelib::CacheAllocatorConfig<facebook::cachelib::CacheAllocator<facebook::cachelib::LruCacheTrait> > const&)
                       /opt/cachelib/cachelib/../cachelib/allocator/Util.h:150
                       -> /opt/cachelib/cachelib/allocator/CacheAllocator.cpp
    @ 0000564abf79c75f facebook::cachelib::CacheAllocator<facebook::cachelib::LruCacheTrait>::CacheAllocator(facebook::cachelib::CacheAllocatorConfig<facebook::cachelib::CacheAllocator<facebook::cachelib::LruCacheTrait> >)
                       /opt/cachelib/cachelib/../cachelib/allocator/CacheAllocator-inl.h:34
                       -> /opt/cachelib/cachelib/allocator/CacheAllocator.cpp
    @ 0000564ab8de0021 cache_util::CreateCachelib(unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
                       /usr/include/c++/8/bits/unique_ptr.h:835
                       -> /proc/self/cwd/common/cache_util/cachelib_cache_handler.cpp
    @ 0000564ab8de4fbc cache_util::CachelibCacheHandler::CachelibCacheHandler(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, cache_util::SegmentInfo, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, cache_util::SegmentInfo> > > const&)
                       /proc/self/cwd/common/cache_util/cachelib_cache_handler.cpp:64
    @ 0000564ab8ba8197 scorpion::CreateCacheHandler(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<cache_util::CacheHandler>*)
                       /usr/include/c++/8/ext/new_allocator.h:136
                       -> /proc/self/cwd/scorpion_v2/utils.cpp
    @ 0000564ab88ef10b CreateScorpionHandlerV2(std::unique_ptr<scorpion::ScorpionHandlerV2, std::default_delete<scorpion::ScorpionHandlerV2> >*, std::shared_ptr<scorpion::model_server::ModelServer>*)
                       /proc/self/cwd/scorpion_v2/scorpion_v2.cpp:153
    @ 0000564ab86f18b7 main
                       /proc/self/cwd/scorpion_v2/scorpion_v2.cpp:238
    @ 00007fa09b3cbbf6 __libc_start_main
    @ 0000564ab88e7849 _start

E0818 21:25:01.129964    14 ExceptionTracer.cpp:214] exception stack complete
terminate called after throwing an instance of 'std::invalid_argument'
  what():  invalid factor 6.93298464824273e-310

tangliisu commented on April 16, 2024

Hi @sathyaphoenix, I tried a private build again. Surprisingly, cachelib did not crash and allocationClassSizeFactor is as expected.
E0819 20:13:54.022516 14 cachelib_cache_handler.cpp:52] Cachelib allocationClassFSizeFactor in config is: 1.25
I think everything is good right now. Feel free to close the ticket (although I still don't know why allocationClassSizeFactor sometimes becomes ~0).

sathyaphoenix commented on April 16, 2024

Thanks for confirming. If you can, please run with ASAN enabled and see if it can provide more information. For now, I am closing this issue. Please reopen if this re-appears and needs investigation.

tangliisu commented on April 16, 2024

Hi @sathyaphoenix, we finally found the root cause: we set -DFOLLY_SSE=0 to support AVX512 compiler optimizations. But cachelib uses folly::dynamic and F14Map in NvmConfig, and F14Map requires at least FOLLY_SSE=2. I think cachelib does not check for this case and just throws an error with a confusing message.

The error does not appear in the private build because we don't use any compiler optimizations in the private build pipeline. After setting FOLLY_SSE=2 in our master build pipeline, the error went away. Do you think we could add an additional check, or a comment in NvmConfig, to avoid this issue?
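
A rough sketch of what such a build-time check could look like, taking the "at least FOLLY_SSE=2" requirement above at face value for x86 builds. `sseLevelOkForF14` is a hypothetical helper name, not something that exists in folly or cachelib, and the guard is only active when the `FOLLY_SSE` macro is visible in the translation unit.

```cpp
// Hypothetical helper: encodes the requirement stated in this thread,
// namely that F14Map needs FOLLY_SSE to be at least 2.
constexpr bool sseLevelOkForF14(int sseLevel) {
  return sseLevel >= 2;
}

// If the build defines FOLLY_SSE (for example through -DFOLLY_SSE=0),
// fail at compile time with a readable message instead of at runtime.
#if defined(FOLLY_SSE)
static_assert(sseLevelOkForF14(FOLLY_SSE),
              "FOLLY_SSE must be >= 2 when F14Map is used");
#endif
```

With a guard like this in a commonly included header, a build configured with -DFOLLY_SSE=0 would stop at the static_assert rather than producing a binary that crashes at startup.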

sathyaphoenix commented on April 16, 2024

@tangliisu Can you share the confusing error message that you see, and also more details on how this causes the double value to be ~0? Also, please note that NvmConfig has moved away from using folly::dynamic in the main branch and now has a simple declarative API to configure it (https://cachelib.org/docs/Cache_Library_User_Guides/Configure_HybridCache). We do rely on F14Map though. Once you share the error message, we can look into an appropriate workaround.

tangliisu commented on April 16, 2024

Thanks for the info. We pin cachelib to an old version, so NvmConfig still uses folly::dynamic there.

I could not reproduce the allocationClassSizeFactor ~0 error in recent builds. The stack trace from a recent bad build is:

F0902 20:10:39.052548    14 dynamic.cpp:137] Check failed: 0 
*** Check failure stack trace: ***
    @     0x7f8d2b6739bd  google::LogMessage::Fail()
    @     0x7f8d2b6758a8  google::LogMessage::SendToLog()
    @     0x7f8d2b673563  google::LogMessage::Flush()
    @     0x7f8d2b6762f9  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f8d2b0ce716  folly::dynamic::operator=()
    @     0x562fb47e392e  facebook::cachelib::NvmCache<>::Config::Config()
    @     0x562fb47ed0b3  facebook::cachelib::CacheAllocatorConfig<>::CacheAllocatorConfig()
    @     0x562fb483b122  facebook::cachelib::CacheAllocator<>::CacheAllocator()
    @     0x562fade7cb2a  cache_util::CreateCachelib()
    @     0x562fade80c32  cache_util::CachelibCacheHandler::CachelibCacheHandler()
    @     0x562fadc40b48  scorpion::CreateCacheHandler()
    @     0x562fad98330c  CreateScorpionHandlerV2()
    @     0x562fad7853b8  main
    @     0x7f8d1aa53bf7  (unknown)
    @     0x562fad97b61a  _start

That makes sense. But I happened to get the confusing ~0 error before we figured out the FOLLY_SSE=0 issue:

E0818 21:25:01.129964    14 ExceptionTracer.cpp:214] exception stack complete
terminate called after throwing an instance of 'std::invalid_argument'
  what():  invalid factor 6.93298464824273e-310

If cachelib still relies on F14Map, I guess we need FOLLY_SSE=2.

BTW, we integrated cachelib into our system and the perf is very impressive. We are still tuning cachelib to see if we can further reduce CPU usage.

sathyaphoenix commented on April 16, 2024

Great to hear it is working out as expected. Let us know if you need any information for tuning.

It is strange though that not setting FOLLY_SSE=2 would cause an unrelated double to be broken. cc @agordon if he has any insights to share.
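
One observation that may help with the question above, offered as a guess rather than a diagnosis. The reported value 6.93298464824273e-310 is a subnormal double. Its raw bits fit in the 47-bit range that x86-64 Linux uses for user-space addresses, and the upper bits come out as 0x7fa0, the same prefix as the shared-library frames in the trace (for example 0x00007fa0a90be092). That would be consistent with the double being read from memory that actually holds something else, for instance if translation units disagree on the layout of the config object. The helper names below are hypothetical, and the 47-bit limit is an assumption about the platform.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Returns the raw IEEE-754 bit pattern of a double.
inline std::uint64_t bitsOf(double value) {
  static_assert(sizeof(std::uint64_t) == sizeof(double),
                "this sketch expects a 64-bit double");
  std::uint64_t bits = 0;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Heuristic only: true when the value is subnormal and its bits fall inside
// the 47-bit user-space address range assumed for x86-64 Linux.
inline bool looksLikeReinterpretedAddress(double value) {
  constexpr std::uint64_t kUserSpaceLimit = 1ULL << 47;
  return std::fpclassify(value) == FP_SUBNORMAL &&
         bitsOf(value) < kUserSpaceLimit;
}
```

Printing `bitsOf(config.allocationClassSizeFactor)` in hex next to the decimal value would make this kind of corruption easier to recognize in a log.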
