browsermt / mts Goto Github PK
View Code? Open in Web Editor NEWThis project forked from ugermann/mts
Marian Translation Service
License: Apache License 2.0
This project forked from ugermann/mts
Marian Translation Service
License: Apache License 2.0
The workers and the queue are properly owned by Service.
Currently they are shared pointers.
Line 27 in 41978b4
Line 28 in 41978b4
They should be member variables of Service. Remove the shared pointer. If Service launches threads it's also responsible for cleaning them up in its destruction. And order the member variables such that the threads are stopped before the queue is deallocated.
If you've got an object that entails a memory allocation or other expensive construction/destruction, prefer to reuse it by pulling the object outside a loop. I see Timer
, Histories
and PCItem
mts/src/bergamot/batch_translator.cpp
Lines 117 to 136 in 338d5f7
This should have been passed by reference.
Line 11 in 41978b4
Also, it's not even clear what this is used for.
Think about this code more:
Lines 38 to 40 in 41978b4
Two threads could decrease the counter atomically then advance to see the counter is zero. Both threads would execute the code.
If I understand correctly, there are currently there are currently 4 interconnected boolean command line options relating to intgemm
--optimize
--optmize8
--intgemm-shift
--intgemm-shift-all
This does not make sense to me as options are either mutually exclusive (things can't be 8-bit and 16-bit at the same time), or imply one another, e.g. intgemm-shift implies intgemm-shift and intgemm-shift implies optimize8.
These four options should be combined into a single string-valued option --optimize with the values
which are then interpreted in cpu::Backed::configureDevice() in tensor/cpu/backend.h
This should happen sooner rather than later so that we don't have command line interface legacy issues. Easy to fix now, a nightmare to change later once people have started using these options.
cc: @kpu
@jerinphilip Based on our discussion on slack and upon some further thought, I'd propose this.
std::vector<std::string> input
and std::string translation
with std::vector<SentencePieceText>
and SentencePieceText, respectively.(Background comment: I'm hesitant to touch the Batch class, as it's fairly baked into Marian, so we need some mechanism of mapping sentence-level translations to the respective requests. Currently the TranslationService has a map that maps from unique sentence ids to the respective Jobs and promises.)
Wrt 3., here's what I'd propose for starters:
In the process of getting mts to run on macOS I had to make modifications to
in particular I had to fork and then create a branches of these repos and then modify the branches such that all the repos work together. My forks + branches are
Should I just submit PR's to the masters of all the above repos to make mts macOS compatible?
Hey,
Trying to compile this with gcc-9/10 results in the following error:
[ 59%] Building CXX object 3rd_party/marian/src/CMakeFiles/marian.dir/common/cli_wrapper.cpp.o
In file included from /mnt/Storage/mts/src/./service/api/rapidjson_utils.h:10,
from /mnt/Storage/mts/src/service/api/bergamot/node_translation.cpp:2:
/mnt/Storage/mts/src/./service/common/translation_service.h:92:52: error: non-local lambda expression cannot have a capture-default
92 | std::function<void (Ptr<Job> j)> callback =[=](Ptr<Job> j){return;});
| ^
In file included from /mnt/Storage/mts/src/service/common/plaintext_translation.cpp:2:
/mnt/Storage/mts/src/service/common/translation_service.h:92:52: error: non-local lambda expression cannot have a capture-default
92 | std::function<void (Ptr<Job> j)> callback =[=](Ptr<Job> j){return;});
| ^
In file included from /mnt/Storage/mts/src/service/api/rapidjson_utils.h:10,
from /mnt/Storage/mts/src/service/api/rapidjson_utils.cpp:5:
/mnt/Storage/mts/src/./service/common/translation_service.h:92:52: error: non-local lambda expression cannot have a capture-default
92 | std::function<void (Ptr<Job> j)> callback =[=](Ptr<Job> j){return;});
| ^
In file included from /mnt/Storage/mts/src/service/api/rapidjson_utils.h:10,
from /mnt/Storage/mts/src/service/api/output_options.cpp:3:
/mnt/Storage/mts/src/./service/common/translation_service.h:92:52: error: non-local lambda expression cannot have a capture-default
92 | std::function<void (Ptr<Job> j)> callback =[=](Ptr<Job> j){return;});
This change seems to have happened in gcc around 2018: https://patchwork.ozlabs.org/project/gcc/patch/[email protected]/
Removing the =
fixes the issue, but I'm not sure if this is the right thing to do.
Hey,
Compiling with gcc-7 results in the following:
[ 97%] Building CXX object src/service/CMakeFiles/rest-server.dir/rest/rest_server.cpp.o
In file included from /mts/src/service/rest/rest_server.cpp:7:
In file included from /mts/src/./service/api/json_request_handler.h:34:
In file included from /mts/src/./service/api/bergamot/json_request_handler.h:3:
/mts/src/./service/api/bergamot/node_translation.h:13:1: error: class 'OutputOptions' was previously declared as a struct; this is valid, but may result in linker errors under the Microsoft C++ ABI [-Werror,-Wmismatched-tags]
class OutputOptions;
^
/mts/src/./service/api/output_options.h:5:8: note: previous use is here
struct OutputOptions {
^
/mts/src/./service/api/bergamot/node_translation.h:13:1: note: did you mean struct here?
class OutputOptions;
^~~~~
struct
In file included from /mts/src/service/rest/rest_server.cpp:6:
In file included from /mts/3rd_party/marian/src/marian.h:5:
In file included from /mts/3rd_party/marian/src/common/config.h:3:
In file included from /mts/3rd_party/marian/src/3rd_party/yaml-cpp/yaml.h:10:
In file included from /mts/3rd_party/marian/src/3rd_party/yaml-cpp/parser.h:11:
In file included from /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/memory:81:
In file included from /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/shared_ptr.h:52:
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/shared_ptr_base.h:588:8: error: delete called on non-final 'marian::data::QueuedInput' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor]
delete __p;
^
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/shared_ptr_base.h:595:4: note: in instantiation of function template specialization 'std::__shared_count<__gnu_cxx::_S_atomic>::__shared_count<marian::data::QueuedInput *>' requested here
: __shared_count(__p)
^
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/shared_ptr_base.h:1079:17: note: in instantiation of function template specialization 'std::__shared_count<__gnu_cxx::_S_atomic>::__shared_count<marian::data::QueuedInput *>' requested here
: _M_ptr(__p), _M_refcount(__p, typename is_array<_Tp>::type())
^
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/shared_ptr_base.h:1243:4: note: in instantiation of function template specialization 'std::__shared_ptr<marian::data::QueuedInput, __gnu_cxx::_S_atomic>::__shared_ptr<marian::data::QueuedInput, void>' requested here
__shared_ptr(__p).swap(*this);
^
/mts/src/./service/common/translation_service.h:76:9: note: in instantiation of function template specialization 'std::__shared_ptr<marian::data::QueuedInput, __gnu_cxx::_S_atomic>::reset<marian::data::QueuedInput>' requested here
jq_.reset(new data::QueuedInput(vocabs_, options_));
^
/mts/src/service/rest/rest_server.cpp:180:21: note: in instantiation of function template specialization 'marian::server::TranslationService::start<marian::BeamSearch>' requested here
service->template start<BeamSearch>();
^
In file included from /mts/src/service/rest/rest_server.cpp:6:
In file included from /mts/3rd_party/marian/src/marian.h:5:
In file included from /mts/3rd_party/marian/src/common/config.h:3:
In file included from /mts/3rd_party/marian/src/3rd_party/yaml-cpp/yaml.h:10:
In file included from /mts/3rd_party/marian/src/3rd_party/yaml-cpp/parser.h:11:
In file included from /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/memory:81:
In file included from /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/shared_ptr.h:52:
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/shared_ptr_base.h:376:9: error: delete called on non-final 'marian::data::QueuedInput' that has virtual functions but non-virtual destructor [-Werror,-Wdelete-non-abstract-non-virtual-dtor]
{ delete _M_ptr; }
^
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/shared_ptr_base.h:371:7: note: in instantiation of member function 'std::_Sp_counted_ptr<marian::data::QueuedInput *, __gnu_cxx::_S_atomic>::_M_dispose' requested here
_Sp_counted_ptr(_Ptr __p) noexcept
^
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/shared_ptr_base.h:584:20: note: in instantiation of member function 'std::_Sp_counted_ptr<marian::data::QueuedInput *, __gnu_cxx::_S_atomic>::_Sp_counted_ptr' requested here
_M_pi = new _Sp_counted_ptr<_Ptr, _Lp>(__p);
^
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/shared_ptr_base.h:595:4: note: in instantiation of function template specialization 'std::__shared_count<__gnu_cxx::_S_atomic>::__shared_count<marian::data::QueuedInput *>' requested here
: __shared_count(__p)
^
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/shared_ptr_base.h:1079:17: note: in instantiation of function template specialization 'std::__shared_count<__gnu_cxx::_S_atomic>::__shared_count<marian::data::QueuedInput *>' requested here
: _M_ptr(__p), _M_refcount(__p, typename is_array<_Tp>::type())
^
/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/shared_ptr_base.h:1243:4: note: in instantiation of function template specialization 'std::__shared_ptr<marian::data::QueuedInput, __gnu_cxx::_S_atomic>::__shared_ptr<marian::data::QueuedInput, void>' requested here
__shared_ptr(__p).swap(*this);
^
/mts/src/./service/common/translation_service.h:76:9: note: in instantiation of function template specialization 'std::__shared_ptr<marian::data::QueuedInput, __gnu_cxx::_S_atomic>::reset<marian::data::QueuedInput>' requested here
jq_.reset(new data::QueuedInput(vocabs_, options_));
^
/mts/src/service/rest/rest_server.cpp:180:21: note: in instantiation of function template specialization 'marian::server::TranslationService::start<marian::BeamSearch>' requested here
service->template start<BeamSearch>();
^
3 errors generated.
src/service/CMakeFiles/rest-server.dir/build.make:62: recipe for target 'src/service/CMakeFiles/rest-server.dir/rest/rest_server.cpp.o' failed
make[2]: *** [src/service/CMakeFiles/rest-server.dir/rest/rest_server.cpp.o] Error 1
CMakeFiles/Makefile2:1379: recipe for target 'src/service/CMakeFiles/rest-server.dir/all' failed
make[1]: *** [src/service/CMakeFiles/rest-server.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
mts/src/bergamot/batch_translator.h
Lines 29 to 50 in b30809f
It looks like translate
and mainloop
are not meant to be called from outside the class. So they should be private.
Line 23 in 41978b4
NOTES
The time returned by gettimeofday() is affected by discontinuous jumps in the system time (e.g., if the system
administrator manually changes the system time). If you need a monotonically increasing clock, see clock_get‐
time(2).
Also do you even need a notion of clock time or is a sequence number sufficient to compare ages?
If it can be a direct member variable it should be.
Also, there's no need for the if statements because it's the constructor.
mts/src/bergamot/batch_translator.cpp
Lines 8 to 13 in d9aa0c5
Headers need to explain what member variables are for and what methods do.
Just use a return
to exit the loop. Nobody should call this function multiple times.
mts/src/bergamot/batch_translator.cpp
Line 113 in d9aa0c5
Also, the use of atomic suggests it is meant to be manipulated from multiple threads. When this is really only a local variable.
https://github.com/browsermt/mts/blob/nuke/src/bergamot/request.h#L18-L38
This is inconsistent
Lines 19 to 20 in 41978b4
Just a confusing API.
Hey,
Compilation is broken with boost version>= 1.7 (released in April 2019) due to change in the interface:
/mnt/Storage/mts/3rd_party/crow/include/crow/socket_adaptors.h: In member function ‘boost::asio::io_service& crow::SocketAdaptor::get_io_service()’:
/mnt/Storage/mts/3rd_party/crow/include/crow/socket_adaptors.h:22:28: error: ‘boost::asio::ip::tcp::socket’ {aka ‘class boost::asio::basic_stream_socket<boost::asio::ip::tcp>’} has no member named ‘get_io_service’
22 | return socket_.get_io_service();
| ^~~~~~~~~~~~~~
More info here:
rstudio/rstudio#4636
We should at least have that in the requirements.
Cheers,
Nick
The queue doesn't actually need to be that long:
https://github.com/browsermt/mts/blob/nuke/src/bergamot/service.cpp#L18
Keep in mind work is removed from the queue when a thread starts doing it.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.