m4t1ss / softalignments
Neural machine translation soft alignment visualisations for web and command line
Home Page: http://attention.lielakeda.lv/
License: MIT License
https://github.com/M4t1ss/SoftAlignments/blob/master/web/functions.php#L56
I think the closing tag "?>" is missing.
input=a.nematus
input2=b.nematus
output_type=web
python process_alignments.py \
-i $input \
-o $output_type \
-f Nematus \
-v Nematus \
-w $input2 \
process_alignments.py -i <input_file> [-o <output_type>] [-f <from_system>] [-s <source_sentence_file>] [-t <target_sentence_file>]
input_file is the file with alignment weights (required)
source_sentence_file and target_sentence_file are required only for NeuralMonkey
output_type can be web (default), block, block2 or color
from_system can be Nematus, Marian, Sockeye, OpenNMT or NeuralMonkey (default)
Clicking on "Translation", "Confidence", "CDP", "APout" or "APin" shows the navigation bars. However, clicking on an item in a navigation bar shows the corresponding sentence but disables all navigation bars (they should stay visible).
Probably add this to process_alignments.py as an optional parameter. If set, only save full words and combined attention weights in web output files, or show a combined attention matrix in the command line output.
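A minimal sketch of how subword units could be combined into full words, assuming BPE pieces are marked with a trailing "@@" and that each row of the matrix is one token's attention distribution over the column tokens. The function names merge_subwords and combine_attention are hypothetical and not part of process_alignments.py. Summing over merged columns and averaging over merged rows is only one possible choice; it keeps each row's total unchanged.

```python
def merge_subwords(tokens, marker="@@"):
    """Join subword pieces into words.

    Returns (words, groups), where groups[i] lists the indices of the
    original tokens that make up words[i].
    """
    words, groups = [], []
    current, indices = "", []
    for i, token in enumerate(tokens):
        indices.append(i)
        if token.endswith(marker):
            # Piece continues into the next token: strip the marker and keep going.
            current += token[:-len(marker)]
        else:
            words.append(current + token)
            groups.append(indices)
            current, indices = "", []
    if indices:
        # Trailing piece with no closing token: keep what we have.
        words.append(current)
        groups.append(indices)
    return words, groups


def combine_attention(matrix, row_tokens, col_tokens, marker="@@"):
    """Collapse a token-level attention matrix to word level.

    Assumption: matrix[r][c] is the weight between row token r and
    column token c. Weights are summed over merged columns and
    averaged over merged rows.
    """
    row_words, row_groups = merge_subwords(row_tokens, marker)
    col_words, col_groups = merge_subwords(col_tokens, marker)
    combined = []
    for row_group in row_groups:
        new_row = []
        for col_group in col_groups:
            total = sum(matrix[r][c] for r in row_group for c in col_group)
            new_row.append(total / len(row_group))
        combined.append(new_row)
    return row_words, col_words, combined
```

For example, combine_attention(weights, ["he@@", "llo"], ["wor@@", "ld", "</s>"]) would return the words ["hello"] and ["world", "</s>"] together with a 1 x 2 matrix.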
For instance, only show sentences whose character length falls within a given range.
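A small sketch of such a filter, assuming the sentences are already available as a list of strings. The name filter_by_length and its parameters are hypothetical, not existing options of the tool. The original index is kept so the matching attention matrix can still be looked up.

```python
def filter_by_length(sentences, min_chars=0, max_chars=None):
    """Return (index, sentence) pairs whose character length is in range.

    max_chars=None means no upper limit.
    """
    kept = []
    for index, sentence in enumerate(sentences):
        length = len(sentence)
        if length < min_chars:
            continue
        if max_chars is not None and length > max_chars:
            continue
        kept.append((index, sentence))
    return kept
```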
Highlight words and attention alignments on mouse hover in the web version. D3.js probably has ways to get this done.
If the tool accepts one single attention matrix format (which should be specified in the docs), it will be easy to integrate it with different NMT frameworks.
NMT framework contributors will only have to save attention weights in the right format to use your tool.
This could be visualized in another table, where e.g. the confidence of the system across different documents could be compared and contrasted.
Which one is on top and which is the bottom one? Well, probably the systems are in the order that they were given to process_alignments.py, but in the web view it's unclear...
Hi Matiss
For a very large alignments file (150 MB) I get the following error:
Fatal error: Maximum execution time of 30 seconds exceeded in /Users/mathiasmuller/Desktop/alignments/SoftAlignments/web/functions.php on line 54
Is there anything I can do? It would be nice if the tool could load bigger files on demand, e.g. for the web version.
Thanks and regards!
Mathias
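One way to approach loading on demand is to read the alignment file lazily, one sentence at a time, instead of parsing all of it up front. The sketch below shows the idea in Python, assuming that sentence blocks are separated by blank lines (this is an assumption about the input file, not something confirmed here). The web version is written in PHP, so this only illustrates the approach; iter_blocks and load_sentence are hypothetical names.

```python
def iter_blocks(lines):
    """Yield one block of non-empty lines at a time.

    Blocks are assumed to be separated by one or more blank lines.
    Only the current block is held in memory.
    """
    block = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            block.append(line)
        elif block:
            yield block
            block = []
    if block:
        yield block


def load_sentence(path, wanted):
    """Return the block for sentence number `wanted` (0-based), or None."""
    with open(path, encoding="utf-8") as handle:
        for number, block in enumerate(iter_blocks(handle)):
            if number == wanted:
                return block
    return None
```

Reading stops as soon as the requested sentence has been found, so early sentences in a large file are returned quickly; a precomputed index of byte offsets would be needed to make later sentences equally cheap.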
processAlignments currently fails if the source is empty and the only attention weights are 1.0s on the </s> (=<EOS>) token.
$ cat test.a
0 ||| a test ||| 8.20928 ||| ||| 0 2
1.0
1.0
1.0
$ python ~/SoftAlignments/process_alignments.py -i test.a -f Nematus -o color
Traceback (most recent call last):
  File "/home/user/mmueller/SoftAlignments/process_alignments.py", line 270, in <module>
    main(sys.argv[1:])
  File "/home/user/mmueller/SoftAlignments/process_alignments.py", line 252, in main
    functions.processAlignments(data, folder, inputfile, outputType, num, refs)
  File "/home/user/mmueller/SoftAlignments/functions.py", line 227, in processAlignments
    ali = [l[:len(list(filter(None, tgt)))] for l in rawAli[:len(src)]]
IndexError: invalid index to scalar variable.
Is this a bug / unaccounted-for edge case, or am I doing something wrong?
Having an empty source seems like a strange requirement, but I am working with multiple sources, so that's a real use case.
Thanks!
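One plausible reading of the traceback is that, with a single column of weights, the matrix ends up one-dimensional, so each element of rawAli is a bare number and cannot be sliced. This is a guess from the error message, not a confirmed diagnosis. The sketch below shows a defensive version of the slicing step under that assumption; slice_alignments is a hypothetical helper, not a function in functions.py.

```python
def slice_alignments(raw_ali, src, tgt):
    """Trim an attention matrix to the source and target lengths.

    Rows that arrive as bare numbers (a matrix collapsed to one
    dimension) are wrapped in a list so that slicing cannot fail.
    """
    n_tgt = len([token for token in tgt if token])
    rows = []
    for row in raw_ali[:len(src)]:
        if isinstance(row, (int, float)):
            row = [row]
        rows.append(list(row)[:n_tgt])
    return rows
```

With the three 1.0 rows from test.a passed in as a flat list, each row would come back as a single-element list instead of raising IndexError.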
Just displaying the reference translation below the hypotheses would be a small but useful feature.
Because this code supports comparing two systems, it could be adapted to compare three or four. Researchers usually have multiple systems to compare on the same sentence.
What's wrong with these alignments? Do I need to pad the shorter translation with empty tokens to match the longer one in token count?
http://attention.lielakeda.lv/?directory=Compare%20Nematus%20-%20Neural%20Monkey%20200s%20En-%3ELv&s=32
Penalize tokens ending with @@ less for having attention aligned to multiple other tokens.
...or maybe concatenate the attention matrix to word-level?
Thank you for your work. If this option is convenient for you, please consider my suggestion.