brush's People

Contributors

galdeia, jdromano2, lacava, msnliu

Forkers

galdeia

brush's Issues

Improved time series operators

Computable Phenotypes can represent quite complex relations, such as this (portion of the) definition for resistant hypertension:

Has ≥ 4 simultaneous matching med classes on ≥ 2 occasions, ≥ 1 month apart.

The model must

  • aggregate time series
  • filter time series to minimum intervals
  • select time series matching a class of drugs
  • threshold the values of the time series
  • threshold the counts of the time series

For this example, we would need the following updates to operators to get this to work:

  • Sum operator that takes nary TimeSeries operators and returns a time series object
    • It would essentially be a “grouped” sum, where values are grouped by unique times and then the operator is applied
  • Some sort of filter/mask operator that filters a TimeSeries’s samples based on a condition
    • Interval: returns the time between events
    • Values: returns the time series values
  • FilterValues, FilterTimes, FilterIntervals
    • Signature: TimeSeries<T> → TimeSeries<T>
    • operators with a threshold (stored in w)
    • heuristic way to determine threshold? (number of samples filtered in cases versus controls?)
    • probably the thresholds would have to be determined stochastically
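
As a rough illustration of how the proposed operators could compose for the resistant hypertension example, here is a minimal Python sketch. It is not Brush code: the `TimeSeries` class and the functions `grouped_sum`, `filter_values`, and `filter_intervals` are hypothetical stand-ins, "1 month" is assumed to mean 30 days, and selecting the series that match a drug class is assumed to have happened upstream.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TimeSeries:
    """Hypothetical stand-in for one patient's series: parallel lists of times and values."""
    times: List[float]   # e.g. days since the first record
    values: List[float]


def grouped_sum(*series: TimeSeries) -> TimeSeries:
    """N-ary sum: group samples by unique time, then add the values in each group."""
    totals = defaultdict(float)
    for ts in series:
        for t, v in zip(ts.times, ts.values):
            totals[t] += v
    times = sorted(totals)
    return TimeSeries(times, [totals[t] for t in times])


def filter_values(ts: TimeSeries, keep: Callable[[float], bool]) -> TimeSeries:
    """Keep only the samples whose value satisfies the condition."""
    kept = [(t, v) for t, v in zip(ts.times, ts.values) if keep(v)]
    return TimeSeries([t for t, _ in kept], [v for _, v in kept])


def filter_intervals(ts: TimeSeries, min_gap: float) -> TimeSeries:
    """Keep the first sample, then each sample at least min_gap after the last kept one."""
    times, values = [], []
    for t, v in zip(ts.times, ts.values):
        if not times or t - times[-1] >= min_gap:
            times.append(t)
            values.append(v)
    return TimeSeries(times, values)


# One indicator series per medication class (1 = class prescribed on that day).
med_classes = [
    TimeSeries([0, 45], [1, 1]),
    TimeSeries([0, 45], [1, 1]),
    TimeSeries([0, 10, 45], [1, 1, 1]),
    TimeSeries([0, 45], [1, 1]),
]

n_classes = grouped_sum(*med_classes)                # number of classes per day
enough = filter_values(n_classes, lambda v: v >= 4)  # >= 4 simultaneous classes
spaced = filter_intervals(enough, min_gap=30)        # occasions >= 1 month apart
is_case = len(spaced.times) >= 2                     # on >= 2 occasions
```

The thresholds (4, 30, 2) are hard-coded here; in the proposal they would be the learned values stored in w.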

Program class

The main class for an individual program that can learn, predict, mutate, and mate with other programs.

TODO:

  • make it easy to initialize a program from an initializer list of nodes or json
  • tree visualization using dot https://www.graphviz.org/pdf/dotguide.pdf
  • specializations for classification, regression, and unsupervised dimensionality reduction (aka multi-output)
  • allow default construction (follow sklearn API)
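
The first two TODO items could look roughly like the following Python sketch, which builds a tree from JSON and emits Graphviz dot text. The `Node` class, the JSON layout (`name` / `children`), and both helper functions are assumptions made for illustration; they do not reflect Brush's actual Program or node representation.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node: an operator or terminal name plus child nodes."""
    name: str
    children: List["Node"] = field(default_factory=list)


def node_from_dict(d: dict) -> Node:
    """Build a tree recursively from a {"name": ..., "children": [...]} dict."""
    return Node(d["name"], [node_from_dict(c) for c in d.get("children", [])])


def to_dot(root: Node) -> str:
    """Emit a Graphviz dot digraph; node ids are assigned in pre-order."""
    lines = ["digraph program {"]
    counter = 0

    def visit(node: Node) -> int:
        nonlocal counter
        node_id = counter
        counter += 1
        lines.append(f'  n{node_id} [label="{node.name}"];')
        for child in node.children:
            child_id = visit(child)
            lines.append(f"  n{node_id} -> n{child_id};")
        return node_id

    visit(root)
    lines.append("}")
    return "\n".join(lines)


# A small made-up program, Sub(1.00, x_4), written as JSON.
spec = json.loads(
    '{"name": "Sub", "children": ['
    '{"name": "1.00"},'
    '{"name": "x_4"}]}'
)
dot_text = to_dot(node_from_dict(spec))
```

The resulting `dot_text` can be written to a file and rendered with the `dot` command-line tool described in the guide linked above.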

fit generates invalid sig_hash for ArrayXb terminal nodes

Both Brush's C++ module and its Python wrapper work fine with datasets that do not contain any binary columns.

However, during some experiments with PMLB's adult dataset (which has 2 bool columns), my Jupyter notebook Python kernel eventually died, and it seems that fitting expressions with one or more binary columns was the cause.

It seems that the sig_hash created for Terminal nodes whose type is inferred inside data.h is different from what is stored in the dispatch_table. I tried to fix that, but with no success, so I decided to open an issue here (still trying to fix it though).

Below is some evidence that I gathered while trying to fix that.

Python wrapper

I am using gdb, after changing setup.py to compile the C++ module in debug mode, to get a backtrace of the core dump. I converted src/brush/D_TS_experiments.ipynb into a Python script (.py) so that I could run it with gdb. For this specific backtrace, I used the docs/examples/datasets/d_example_patients.csv dataset, which contains one binary column (sex), for simplicity.

When I run Brush's NSGAII evolutionary algorithm, I get:

FATAL ERROR brush/src/program/dispatch_table.h:172: sig_hash=577185359398356073 not in map_.at(Terminal)
options:
14884157073895229501
509529941281334733
13777882714371223207
17717457037689164349

terminate called without an active exception

Thread 1 "python" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352685376) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.

The error backtrace is:

(gdb) backtrace
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352685376) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737352685376) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737352685376, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7c42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7c287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fffb5e13026 in __gnu_cxx::__verbose_terminate_handler ()
    at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#6  0x00007fffb5e11514 in __cxxabiv1::__terminate (handler=<optimized out>)
    at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#7  0x00007fffb5e11566 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:58
#8  0x00007fffb5e117a4 in __cxxabiv1::__cxa_rethrow () at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:136
#9  0x00007fffb15882ed in Brush::Util::HandleErrorThrow (err=..., 
    file=0x7fffb169a830 "brush/src/program/dispatch_table.h", line=172)
    at brush/src/util/error.cpp:16
#10 0x00007fffb13c5da8 in Brush::DispatchTable<true>::Get<Eigen::Array<bool, -1, 1, 0, -1, 1> > (
    this=0x7fffb19e5fa0 <Brush::dtable_fit>, n=Brush::NodeType::Terminal, sig_hash=577185359398356073)
    at brush/src/program/dispatch_table.h:172
#11 0x00007fffb13748d7 in tree_node_<Brush::Node>::fit<Eigen::Array<bool, -1, 1, 0, -1, 1> > (this=0x555556546430, 
    d=...) at brush/src/program/tree_node.h:63
#12 0x00007fffb1374976 in Brush::Operator<(Brush::NodeType)68719476736, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<bool, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true, void>::fit(Brush::Data::Dataset const&, tree_node_<Brush::Node>&) const (this=0x7fffffffc2ef, d=..., tn=...)
    at brush/src/program/split.h:231
#13 0x00007fffb131dfa6 in Brush::Operator<(Brush::NodeType)68719476736, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<bool, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true, void>::eval(Brush::Data::Dataset const&, tree_node_<Brush::Node>&, float const**) const (this=0x7fffffffc2ef, 
    d=..., tn=..., weights=0x0) at brush/src/program/split.h:278
#14 0x00007fffb12fc25f in Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)68719476736, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<bool, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&) (d=..., tn=...)
    at brush/src/program/operator.h:305
#15 0x00007fffb13aa8d9 in std::__invoke_impl<Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1> (*&)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Brush::Data::Dataset const&, tree_node_<Brush::Node>&> (
    __f=@0x7fffffffc420: 0x7fffb12fc22d <Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)68719476736, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<bool, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/invoke.h:61
#16 0x00007fffb133be16 in std::__invoke_r<Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1> (*&)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Brush::Data::Dataset const&, tree_node_<Brush::Node>&> (
    __fn=@0x7fffffffc420: 0x7fffb12fc22d <Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)68719476736, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<bool, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>) at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/invoke.h:116
#17 0x00007fffb130268d in std::_Function_handler<Eigen::Array<float, -1, 1, 0, -1, 1> (Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Eigen::Array<float, -1, 1, 0, -1, 1> (*)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>::_M_invoke(std::_Any_data const&, Brush::Data::Dataset const&, tree_node_<Brush::Node>&) (__functor=..., __args#0=..., 
    __args#1=...)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/std_function.h:291
#18 0x00007fffb13aa4a8 in std::function<Eigen::Array<float, -1, 1, 0, -1, 1> (Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>::operator()(Brush::Data::Dataset const&, tree_node_<Brush::Node>&) const (this=0x7fffffffc420, 
    __args#0=..., __args#1=...)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/std_function.h:591
#19 0x00007fffb133b925 in tree_node_<Brush::Node>::fit<Eigen::Array<float, -1, 1, 0, -1, 1> > (this=0x555556546370, 
    d=...) at brush/src/program/tree_node.h:64
#20 0x00007fffb134431c in Brush::Operator<(Brush::NodeType)16, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>)>, true, void>::get_kids<std::array<Eigen::Array<float, -1, 1, 0, -1, 1>, 1ul> >(Brush::Data::Dataset const&, tree_node_<Brush::Node>&, float const**) const (this=0x7fffffffc5cf, d=..., tn=..., 
    weights=0x0) at brush/src/program/operator.h:113
#21 0x00007fffb1307965 in Brush::Operator<(Brush::NodeType)16, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>)>, true, void>::eval<std::array<Eigen::Array<float, -1, 1, 0, -1, 1>, 1ul>, float>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&, float const**) const (this=0x7fffffffc5cf, d=..., tn=..., 
    weights=0x0) at brush/src/program/operator.h:199
#22 0x00007fffb12f7bfc in Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)16, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&) (d=..., tn=...) at brush/src/program/operator.h:305
#23 0x00007fffb13aa8d9 in std::__invoke_impl<Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1> (*&)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Brush::Data::Dataset const&, tree_node_<Brush::Node>&> (
    __f=@0x7fffffffc700: 0x7fffb12f7bca <Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)16, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/invoke.h:61
#24 0x00007fffb133be16 in std::__invoke_r<Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1> (*&)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Brush::Data::Dataset const&, tree_node_<Brush::Node>&> (
    __fn=@0x7fffffffc700: 0x7fffb12f7bca <Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)16, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/invoke.h:116
#25 0x00007fffb130268d in std::_Function_handler<Eigen::Array<float, -1, 1, 0, -1, 1> (Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Eigen::Array<float, -1, 1, 0, -1, 1> (*)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>::_M_invoke(std::_Any_data const&, Brush::Data::Dataset const&, tree_node_<Brush::Node>&) (__functor=..., __args#0=..., 
    __args#1=...)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/std_function.h:291
#26 0x00007fffb13aa4a8 in std::function<Eigen::Array<float, -1, 1, 0, -1, 1> (Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>::operator()(Brush::Data::Dataset const&, tree_node_<Brush::Node>&) const (this=0x7fffffffc700, 
    __args#0=..., __args#1=...)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/std_function.h:591
#27 0x00007fffb133b925 in tree_node_<Brush::Node>::fit<Eigen::Array<float, -1, 1, 0, -1, 1> > (this=0x5555565482b0, 
    d=...) at brush/src/program/tree_node.h:64
#28 0x00007fffb136380c in Brush::Operator<(Brush::NodeType)8388608, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true, void>::get_kids<Eigen::Array<float, -1, 4, 0, -1, 4> >(Brush::Data::Dataset const&, tree_node_<Brush::Node>&, float const**) const (this=0x7fffffffc8ef, d=..., tn=..., weights=0x0)
    at brush/src/program/operator.h:115
#29 0x00007fffb1316c45 in Brush::Operator<(Brush::NodeType)8388608, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true, void>::eval<Eigen::Array<float, -1, 4, 0, -1, 4>, float>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&, float const**) const (this=0x7fffffffc8ef, d=..., tn=..., weights=0x0)
    at brush/src/program/operator.h:199
#30 0x00007fffb12fa9ca in Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)8388608, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&) (d=..., tn=...) at brush/src/program/operator.h:305
#31 0x00007fffb13aa8d9 in std::__invoke_impl<Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1> (*&)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Brush::Data::Dataset const&, tree_node_<Brush::Node>&> (
    __f=@0x7fffffffca20: 0x7fffb12fa998 <Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)8388608, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/invoke.h:61
#32 0x00007fffb133be16 in std::__invoke_r<Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1> (*&)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Brush::Data::Dataset const&, tree_node_<Brush::Node>&> (
    __fn=@0x7fffffffca20: 0x7fffb12fa998 <Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)8388608, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/invoke.h:116
#33 0x00007fffb130268d in std::_Function_handler<Eigen::Array<float, -1, 1, 0, -1, 1> (Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Eigen::Array<float, -1, 1, 0, -1, 1> (*)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>::_M_invoke(std::_Any_data const&, Brush::Data::Dataset const&, tree_node_<Brush::Node>&) (__functor=..., __args#0=..., 
    __args#1=...)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/std_function.h:291
#34 0x00007fffb1b364b8 in std::function<Eigen::Array<float, -1, 1, 0, -1, 1> (Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>::operator()(Brush::Data::Dataset const&, tree_node_<Brush::Node>&) const (this=0x7fffffffca20, 
    __args#0=..., __args#1=...)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/std_function.h:591
#35 0x00007fffb1b269c9 in tree_node_<Brush::Node>::fit<Eigen::Array<float, -1, 1, 0, -1, 1> > (this=0x555556548370, 
    d=...) at brush/src/bindings/../program/tree_node.h:64
#36 0x00007fffb1b26a3c in Brush::Program<(Brush::ProgramType)0>::fit (this=0x5555575a33b0, d=...)
    at brush/src/bindings/../program/program.h:100
#37 0x00007fffb1b3659a in pybind11::cpp_function::cpp_function<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}::operator()(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&) const (__closure=0x555556700228, c=0x5555575a33b0, args#0=...)
    at miniconda3/envs/brush/include/pybind11/pybind11.h:110
#38 0x00007fffb1b6eaf5 in pybind11::detail::argument_loader<Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&>::call_impl<Brush::Program<(Brush::ProgramType)0>&, pybind11::cpp_function::cpp_function<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}&, 0ul, 1ul, pybind11::detail::void_type>(pybind11::cpp_function::cpp_function<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}&, std::integer_sequence<unsigned long, 0ul, 1ul>, pybind11::detail::void_type&&) && (
    this=0x7fffffffcb80, f=...) at miniconda3/envs/brush/include/pybind11/cast.h:1443
#39 0x00007fffb1b5dc51 in pybind11::detail::argument_loader<Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&>::call<Brush::Program<(Brush::ProgramType)0>&, pybind11::detail::void_type, pybind11::cpp_function::cpp_function<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}&>(pybind11::cpp_function::cpp_function<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}&) && (this=0x7fffffffcb80, f=...) at miniconda3/envs/brush/include/pybind11/cast.h:1411
#40 0x00007fffb1b46dea in pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}, Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(pybind11::cpp_function::initialize<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}&&, Brush::Program<(Brush::ProgramType)0>& (*)(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const (
    __closure=0x0, call=...) at miniconda3/envs/brush/include/pybind11/pybind11.h:248
#41 0x00007fffb1b46efb in pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}, Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(pybind11::cpp_function::initialize<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}&&, Brush::Program<(Brush::ProgramType)0>& (*)(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) ()
    at miniconda3/envs/brush/include/pybind11/pybind11.h:223
#42 0x00007fffb1a93a02 in pybind11::cpp_function::dispatcher (self=0x7fffb1e63a80, args_in=0x7fffaff04dc0, 
    kwargs_in=0x0) at miniconda3/envs/brush/include/pybind11/pybind11.h:939
#43 0x0000555555755497 in cfunction_call (func=0x7fffb1e77920, args=<optimized out>, kwargs=<optimized out>)
    at /usr/local/src/conda/python-3.11.2/Objects/methodobject.c:542
#44 0x00005555557314d4 in _PyObject_MakeTpCall (tstate=0x555555ad58b8 <_PyRuntime+166328>, callable=0x7fffb1e77920, 
    args=<optimized out>, nargs=<optimized out>, keywords=0x0) at /usr/local/src/conda/python-3.11.2/Objects/call.c:214
#45 0x000055555573df85 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, 
    throwflag=<optimized out>) at /usr/local/src/conda/python-3.11.2/Python/ceval.c:4772
#46 0x0000555555784e5d in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7fb02a0, 
    tstate=0x555555ad58b8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.2/Include/internal/pycore_ceval.h:73
#47 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, func=0x7fffb1eb11c0, tstate=0x555555ad58b8 <_PyRuntime+166328>)
    at /usr/local/src/conda/python-3.11.2/Python/ceval.c:6435
#48 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=<optimized out>, func=0x7fffb1eb11c0) at /usr/local/src/conda/python-3.11.2/Objects/call.c:393
#49 _PyObject_VectorcallTstate (tstate=0x555555ad58b8 <_PyRuntime+166328>, callable=0x7fffb1eb11c0, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.11.2/Include/internal/pycore_call.h:92
#50 0x0000555555784bf3 in method_vectorcall (method=method@entry=0x7fffaff06200, args=args@entry=0x7fffb1dce318, nargsf=<optimized out>, kwnames=0x7fffb1e972e0)
    at /usr/local/src/conda/python-3.11.2/Objects/classobject.c:59
#51 0x000055555576f57d in _PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=0x7fffaff06200, func=0x555555784b10 <method_vectorcall>, tstate=0x555555ad58b8 <_PyRuntime+166328>)
    at /usr/local/src/conda/python-3.11.2/Objects/call.c:257
#52 _PyObject_Call (kwargs=<optimized out>, args=<optimized out>, callable=0x7fffaff06200, tstate=0x555555ad58b8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.2/Objects/call.c:328
#53 PyObject_Call (callable=0x7fffaff06200, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.2/Objects/call.c:355
#54 0x0000555555841bd8 in partial_call (pto=0x7fffaff0cae0, args=0x7fffb1e97a60, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.2/Modules/_functoolsmodule.c:324
#55 0x00005555557314d4 in _PyObject_MakeTpCall (tstate=0x555555ad58b8 <_PyRuntime+166328>, callable=0x7fffaff0cae0, args=<optimized out>, nargs=<optimized out>, keywords=0x0)
    at /usr/local/src/conda/python-3.11.2/Objects/call.c:214
#56 0x00005555557aaf52 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fffffffd5e0, callable=0x7fffaff0cae0, tstate=0x555555ad58b8 <_PyRuntime+166328>)
    at /usr/local/src/conda/python-3.11.2/Include/internal/pycore_call.h:92
#57 map_next (lz=<optimized out>) at /usr/local/src/conda/python-3.11.2/Python/bltinmodule.c:1369
#58 0x00005555557caf12 in zip_next (lz=0x7fffb1e8eb40) at /usr/local/src/conda/python-3.11.2/Python/bltinmodule.c:2788
#59 0x000055555573dd6e in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at /usr/local/src/conda/python-3.11.2/Include/object.h:133
#60 0x00005555557fbb9e in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7fb0020, tstate=0x555555ad58b8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.2/Include/internal/pycore_ceval.h:73
#61 _PyEval_Vector (tstate=0x555555ad58b8 <_PyRuntime+166328>, func=0x7ffff6dcdf80, locals=0x7ffff6df2280, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.11.2/Python/ceval.c:6435
#62 0x00005555557fb12f in PyEval_EvalCode (co=<optimized out>, globals=0x7ffff6df2280, locals=<optimized out>) at /usr/local/src/conda/python-3.11.2/Python/ceval.c:1154
#63 0x000055555581d49c in run_eval_code_obj (tstate=0x555555ad58b8 <_PyRuntime+166328>, co=0x555555c049d0, globals=0x7ffff6df2280, locals=0x7ffff6df2280)
    at /usr/local/src/conda/python-3.11.2/Python/pythonrun.c:1714
#64 0x0000555555819994 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7ffff6df2280, locals=0x7ffff6df2280, flags=<optimized out>, arena=<optimized out>)
    at /usr/local/src/conda/python-3.11.2/Python/pythonrun.c:1735
#65 0x000055555582e912 in pyrun_file (fp=fp@entry=0x555555b3f520, filename=filename@entry=0x7ffff6d965b0, start=start@entry=257, globals=globals@entry=0x7ffff6df2280, 
    locals=locals@entry=0x7ffff6df2280, closeit=closeit@entry=1, flags=0x7fffffffda58) at /usr/local/src/conda/python-3.11.2/Python/pythonrun.c:1630
#66 0x000055555582e235 in _PyRun_SimpleFileObject (fp=0x555555b3f520, filename=0x7ffff6d965b0, closeit=1, flags=0x7fffffffda58) at /usr/local/src/conda/python-3.11.2/Python/pythonrun.c:440
#67 0x000055555582e003 in _PyRun_AnyFileObject (fp=0x555555b3f520, filename=0x7ffff6d965b0, closeit=1, flags=0x7fffffffda58) at /usr/local/src/conda/python-3.11.2/Python/pythonrun.c:79
#68 0x00005555558280d6 in pymain_run_file_obj (skip_source_first_line=0, filename=0x7ffff6d965b0, program_name=0x7ffff6d2ec60) at /usr/local/src/conda/python-3.11.2/Modules/main.c:360
#69 pymain_run_file (config=0x555555abb900 <_PyRuntime+59904>) at /usr/local/src/conda/python-3.11.2/Modules/main.c:379
#70 pymain_run_python (exitcode=0x7fffffffda50) at /usr/local/src/conda/python-3.11.2/Modules/main.c:601
#71 Py_RunMain () at /usr/local/src/conda/python-3.11.2/Modules/main.c:680
#72 0x00005555557e9819 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/local/src/conda/python-3.11.2/Modules/main.c:734
#73 0x00007ffff7c29d90 in __libc_start_call_main (main=main@entry=0x5555557e9770 <main>, argc=argc@entry=2, argv=argv@entry=0x7fffffffdca8) at ../sysdeps/nptl/libc_start_call_main.h:58
#74 0x00007ffff7c29e40 in __libc_start_main_impl (main=0x5555557e9770 <main>, argc=2, argv=0x7fffffffdca8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffdc98) at ../csu/libc-start.c:392
#75 0x00005555557e96b1 in _start ()

Steps to get the backtrace:

  1. Set cfg = "debug" in setup.py
  2. From the repository root, run: pip install .
  3. Start gdb with: gdb python
  4. Create the following script:
from brush import BrushRegressor
import pandas as pd

if __name__ == '__main__':
    
    data = pd.read_csv('docs/examples/datasets/d_example_patients.csv')
    X = data.drop(columns='target')
    y = data['target']

    est = BrushRegressor().fit(X,y)
  5. Run the script with (gdb) run <path_to_script_above.py>
  6. Wait for the error, then call backtrace.

Failing test case

To check whether this also happens in the C++ module, I've implemented a simple test case that uses the following data:

x_0 <ArrayXb>: [false, true, false]
x_1 <ArrayXb>: [false, true, true]
x_2 <ArrayXi>: [2, 1, -3]
x_3 <ArrayXi>: [2, 1, 3]
x_4 <ArrayXf>: [2.1, 3.7, -5.2]

The test fails during the fit of the following expression:

Tree model for depth = 2, size= 4: Sub(1.00,If(x_0>1.00,x_4,1.00))
Name Sub, node Sub, feature , sig_hash 10001460114883919497
Name Constant, node Constant, feature C, sig_hash 17717457037689164349
Name SplitOn, node SplitOn, feature , sig_hash 13925856710854127623
Name Terminal, node Terminal, feature x_0, sig_hash 577185359398356073
Name Terminal, node Terminal, feature x_4, sig_hash 17717457037689164349
Name Constant, node Constant, feature C, sig_hash 17717457037689164349

PRG fit
FATAL ERROR brush/src/program/dispatch_table.h:172: sig_hash=577185359398356073 not in map_.at(Terminal)
options:
14884157073895229501
509529941281334733
13777882714371223207
17717457037689164349

terminate called without an active exception
Aborted (core dumped)
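
To make the mismatch concrete, the check below replays the failing lookup in plain Python using only the values printed above. It is an illustration rather than Brush code, and the helper name `unregistered_terminals` is hypothetical. Only x_4's hash can be tied to a type from the printout (ArrayXf); which types the other registered hashes correspond to is not shown in the output.

```python
# Hashes listed under "options:" in the error message (keys registered for Terminal).
registered_terminal_hashes = {
    14884157073895229501,
    509529941281334733,
    13777882714371223207,
    17717457037689164349,  # also the hash printed for x_4, an ArrayXf feature
}

# (node type, feature, sig_hash) rows copied from the tree printout above.
program_nodes = [
    ("Sub", "", 10001460114883919497),
    ("Constant", "C", 17717457037689164349),
    ("SplitOn", "", 13925856710854127623),
    ("Terminal", "x_0", 577185359398356073),
    ("Terminal", "x_4", 17717457037689164349),
    ("Constant", "C", 17717457037689164349),
]


def unregistered_terminals(nodes, registered):
    """Return (feature, sig_hash) for every Terminal whose hash has no table entry."""
    return [
        (feature, sig_hash)
        for node_type, feature, sig_hash in nodes
        if node_type == "Terminal" and sig_hash not in registered
    ]


missing = unregistered_terminals(program_nodes, registered_terminal_hashes)
```

Here `missing` contains only the x_0 terminal, the ArrayXb feature, which matches the hash reported in the FATAL ERROR line.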

abide by pandas Dataframe types

Right now, the Brush Dataset tries to infer feature types. The inferred types might not match those inferred for another dataset for the same problem that has, e.g., different numbers of integer values.

Solution: make Brush Dataset abide by pandas dataframe types as much as possible. Give the dataset initializer an optional argument of types for each feature.
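
One way the proposed behaviour could work is sketched below in plain Python. Everything here is an assumption for illustration: the function `resolve_feature_types`, its `feature_types` argument, the dtype-to-type mapping, and the column names other than sex are hypothetical and not part of Brush's current API.

```python
from typing import Dict, Optional

# Assumed mapping from pandas dtype names to the array types mentioned in these issues.
DTYPE_TO_BRUSH = {
    "bool": "ArrayXb",
    "int64": "ArrayXi",
    "float64": "ArrayXf",
}


def resolve_feature_types(
    dtypes: Dict[str, str],
    feature_types: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Pick a type per column: an explicit override wins, else the declared dtype."""
    feature_types = feature_types or {}
    resolved = {}
    for column, dtype in dtypes.items():
        if column in feature_types:
            resolved[column] = feature_types[column]
        elif dtype in DTYPE_TO_BRUSH:
            resolved[column] = DTYPE_TO_BRUSH[dtype]
        else:
            raise TypeError(f"no type for column {column!r} with dtype {dtype!r}")
    return resolved


# dtype names as pandas reports them, e.g. {c: str(t) for c, t in df.dtypes.items()}
dtypes = {"sex": "bool", "visits": "int64", "bmi": "float64"}
types = resolve_feature_types(dtypes, feature_types={"visits": "ArrayXf"})
```

Because the types come from the declared dtypes rather than from the values, two datasets with the same schema would resolve to the same types regardless of how many distinct values each column happens to contain.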

Brush is incompatible with MacOS

Brush requires a modern version of GCC to compile, but modern versions of GCC are hard to come by on MacOS (e.g., the newest supported version on conda-forge is v4.8.5).

It would be ideal if we could add support for clang++ (LLVM). The version that comes installed on most Macs has support for almost all of the C++20 feature proposals (see https://clang.llvm.org/cxx_status.html#cxx20).

This is probably not highest priority at this point, but I can see it being an important feature down the road.

docs build commands

Hi @JDRomano2

What are the build commands for the docs? If I run

cd docs
sphinx-build . ../_site

I am able to generate a site, but it is mostly blank and has this breathe message in several places:

Warning
doxygennamespace: breathe_default_project value ‘brush’ does not seem to be a valid key for the breathe_projects dictionary
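
For context, this warning means that `breathe_default_project` names a project that is not a key of `breathe_projects` in the Sphinx `conf.py`. Breathe reads Doxygen's XML output, so Doxygen has to be run (with XML generation enabled) before `sphinx-build`. A minimal `conf.py` fragment that would satisfy the check might look like the following. The XML path is an assumption and has to match wherever Doxygen actually writes its output in this repository.

```python
# docs/conf.py (fragment)
extensions = ["breathe"]

# Assumed location of Doxygen's XML output, relative to docs/;
# adjust this to the real output directory.
breathe_projects = {"brush": "./doxygen/xml"}

# Must be one of the keys of breathe_projects above.
breathe_default_project = "brush"
```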

specify dependencies

  • specify major version dependencies in environment.yml
    • stick to fuzzy versions or minimum versions where possible
  • @lacava to provide conda info
  • update README with main requirements, if not obvious from environment.yml

Compilation Error

Error overview:

I am getting a compilation error in the linking step. Specifically, the error messages are "Undefined symbols for architecture x86_64" and "ld: symbol(s) not found for architecture x86_64". It appears to be related to string formatting, and the package involved is fmt.

Error detail:

Originally I tried to build on my M1 Mac (Arm architecture), but that did not work, so I switched to a different Mac with an x86 architecture, where I encountered the following error:

`Undefined symbols for architecture x86_64:
"__ZN3fmt2v97vformatB5cxx11ENS0_17basic_string_viewIcEENS0_17basic_format_argsINS0_20basic_format_contextINS0_8appenderEcEEEE", referenced from:
__ZNK5Brush13DispatchTableILb1EE3GetIN5Eigen5ArrayIfLin1ELi1ELi0ELin1ELi1EEEEERKSt8functionIFT_RKNS_4data4DataERNS_10tree_node_INS_4NodeEEEEENS_8NodeTypeEm in brushgp.cpp.o
__ZNK5Brush13DispatchTableILb0EE3GetIN5Eigen5ArrayIfLin1ELi1ELi0ELin1ELi1EEEEERKSt8functionIFT_RKNS_4data4DataERNS_10tree_node_INS_4NodeEEEEENS_8NodeTypeEm in brushgp.cpp.o
__ZNK5Brush13DispatchTableILb1EE3GetIN5Eigen5ArrayIfLin1ELi1ELi0ELin1ELi1EEEEERKSt8functionIFT_RKNS_4data4DataERNS_10tree_node_INS_4NodeEEEEENS_8NodeTypeEm in dispatch_table.cpp.o
__ZNK5Brush13DispatchTableILb1EE3GetIN5Eigen5ArrayIfLin1ELin1ELi0ELin1ELin1EEEEERKSt8functionIFT_RKNS_4data4DataERNS_10tree_node_INS_4NodeEEEEENS_8NodeTypeEm in dispatch_table.cpp.o
__ZNK5Brush13DispatchTableILb1EE3GetIN5Eigen5ArrayIbLin1ELi1ELi0ELin1ELi1EEEEERKSt8functionIFT_RKNS_4data4DataERNS_10tree_node_INS_4NodeEEEEENS_8NodeTypeEm in dispatch_table.cpp.o
__ZNK5Brush13DispatchTableILb1EE3GetIN5Eigen5ArrayIbLin1ELin1ELi0ELin1ELin1EEEEERKSt8functionIFT_RKNS_4data4DataERNS_10tree_node_INS_4NodeEEEEENS_8NodeTypeEm in dispatch_table.cpp.o
__ZNK5Brush13DispatchTableILb1EE3GetINS_4data10TimeSeriesIfEEEERKSt8functionIFT_RKNS3_4DataERNS_10tree_node_INS_4NodeEEEEENS_8NodeTypeEm in dispatch_table.cpp.o
...

ld: symbol(s) not found for architecture x86_64`

Things I have tried so far:

initial attempts:

  1. change compiler from Apple Clang 13 to gcc-11
  2. include fmt header in brushgp and dispatch_table

suggested attempts:

  1. include fmt header in third party under src
  2. brew install fmt and link to it

further attempts:

  1. followed the Stack Overflow post "https://stackoverflow.com/questions/56608684/how-to-use-the-fmt-library-without-getting-undefined-symbols-for-architecture-x" and tried fmt/format.h instead of fmt/core.h

future thoughts:

  1. maybe I should use the optional header-only mode or link to the fmt library?

It seems that all of the above attempts failed.

step through Operon's implementation of NNLS with Ceres

Since Operon accomplishes this in a similar fashion, a good bet would be to:

  • install their code
  • put breakpoints in the autodiff/ceres weight optimization portions
  • step through the code and write a description of how they achieve this

see this file: https://github.com/heal-research/operon/blob/79d9f6cdd725c4724aa3f911de2bb08d65e3b20c/include/operon/nnls/nnls.hpp

  • find an entry point from one of the tests that uses Ceres. (They also have Eigen's NLS solver; avoid that.)

Mutation and Crossover

Mutation and crossover operators, part of the Program class.
These variation operators should choose the point of variation according to the distribution of prob_change among the nodes in the tree, and should insert new material from the search space according to the prob_keep associated with each node.
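
The selection rule above can be sketched as weighted sampling over the nodes. This is a minimal Python illustration under stated assumptions: the function name `select_variation_point` is hypothetical, nodes are represented only by their `prob_change` weights, and the analogous draw from the search space using `prob_keep` is not shown.

```python
import random


def select_variation_point(prob_change, rng=random):
    """Pick a node index with probability proportional to its prob_change."""
    if not any(p > 0 for p in prob_change):
        return None  # no node is eligible for variation
    indices = range(len(prob_change))
    return rng.choices(indices, weights=prob_change, k=1)[0]


rng = random.Random(0)
# Nodes 0 and 2 have zero weight, so they should never be selected.
weights = [0.0, 0.5, 0.0, 1.5]
picks = [select_variation_point(weights, rng) for _ in range(1000)]
```

Nodes with `prob_change` of zero are never chosen, which gives a simple way to protect parts of a program from variation.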

Mutation

  • insert
  • point
  • delete

Crossover

  • subtree swap

  • (semantic) best subtree replacement

  • tests!
