Comments (1)
I used the weekend to dive a bit into the topic and it turns out this is a bit more complicated than ASCII support -- surprise :D
A few things I've noticed:
- Unicode is not an encoding itself. There are different encodings for Unicode, such as UTF-8 and UTF-16.
- Both are variable-length encodings, which means one character may occupy more than one byte (in UTF-16 it always does: at least two).
- This makes mapping byte x to character y non-trivial, unless you decode character by character.
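
To make the byte/character mismatch concrete, here is a small sketch using only the Rust standard library. The helper names `utf8_layout` and `utf16_byte_len` are made up for illustration and are not part of heh.

```rust
/// Hypothetical helper: for every character in `text`, returns its byte
/// offset, the character itself, and its encoded length in UTF-8.
fn utf8_layout(text: &str) -> Vec<(usize, char, usize)> {
    text.char_indices()
        .map(|(offset, c)| (offset, c, c.len_utf8()))
        .collect()
}

/// Hypothetical helper: number of bytes a character occupies in UTF-16.
/// `len_utf16` counts 16-bit code units, so the result is always 2 or 4.
fn utf16_byte_len(c: char) -> usize {
    c.len_utf16() * 2
}
```

For the string "a€💩b", the byte offsets are 0, 1, 4 and 8, so after the first multi-byte character the byte index and the character index no longer line up.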
I came up with two basic approaches to tackle the issue.
- Use something like String::from_utf8_lossy, put the whole line/paragraph in it and accept the ragged margin you get as a result, as well as decoding errors at the endpoints of your byte slice. I haven't really followed this idea, but I don't think it's promising.
- Decode character by character. Here's how:
  - Given a slice of bytes, try parsing it from the start using str::from_utf8
    - on success: cool, you've just decoded the whole slice
    - on failure: the error contains information about how many bytes were parsed successfully before the error happened. We yield the successfully parsed substring along with its offset in the slice.
    - increment the offset past the offending byte and repeat the procedure
  - Now we've got a stream of successfully parsed substrings and their byte offsets. This can be turned into a stream of characters, each corresponding to a byte in the original slice.
    - a character in a substring simply becomes a character in the stream
    - if a character occupies more than one byte, it will be followed by size-1 dummy characters (need good ideas for which one to use here, currently it is '•')
    - bytes outside the successfully parsed substrings get represented by a dummy character (just like now)
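
The second approach could be sketched roughly as follows. This is a minimal illustration under my own assumptions, not the code from the actual branch: the names `valid_chunks`, `byte_aligned_chars` and `DUMMY` are hypothetical, and skipping exactly one byte after an error is just the simplest reading of "increment the offset".

```rust
/// Placeholder for continuation bytes and undecodable bytes
/// ('•' is the character mentioned above; the choice is still open).
const DUMMY: char = '•';

/// Splits `bytes` into valid UTF-8 substrings, each paired with its byte
/// offset in the original slice. Invalid bytes are skipped one at a time.
fn valid_chunks(bytes: &[u8]) -> Vec<(usize, &str)> {
    let mut chunks = Vec::new();
    let mut offset = 0;
    while offset < bytes.len() {
        match std::str::from_utf8(&bytes[offset..]) {
            Ok(s) => {
                // The whole remainder decoded successfully.
                chunks.push((offset, s));
                break;
            }
            Err(e) => {
                // `valid_up_to` is the length of the valid prefix.
                let valid = e.valid_up_to();
                if valid > 0 {
                    // This prefix was just validated, so decoding cannot fail.
                    let s = std::str::from_utf8(&bytes[offset..offset + valid]).unwrap();
                    chunks.push((offset, s));
                }
                // Step over the valid prefix plus one offending byte and retry.
                offset += valid + 1;
            }
        }
    }
    chunks
}

/// Produces exactly one display character per input byte.
fn byte_aligned_chars(bytes: &[u8]) -> Vec<char> {
    // Start with dummies everywhere, then place each decoded character at
    // the position of its first byte. Continuation bytes and undecodable
    // bytes simply keep the dummy.
    let mut out = vec![DUMMY; bytes.len()];
    for (offset, chunk) in valid_chunks(bytes) {
        for (i, c) in chunk.char_indices() {
            out[offset + i] = c;
        }
    }
    out
}
```

Skipping `Utf8Error::error_len()` bytes instead of one byte should yield the same chunks with fewer retries, since the bytes of an invalid sequence cannot start a valid character on their own.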
The latter sort of works but before seriously considering it, a few problems have to be sorted out:
- Monospaced fonts work great for ASCII but 💩 looks wider in my terminal (and others may appear narrower)
- There is a lot of weird stuff in Unicode, especially additional control codes, that needs to be tested and handled
- How to deal with valid Unicode that cannot be displayed by your font?
- How to reliably detect whether the terminal in use supports Unicode?
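
For two of these points, a rough starting position might look like the sketch below. Both parts are assumptions on my side and not something heh does today. Checking the POSIX locale variables (LC_ALL, LC_CTYPE, LANG) for a UTF-8 codeset is only a common heuristic and not a reliable detection. `char::is_control` only covers the Cc category, so format characters such as zero-width joiners would still slip through. All function names are hypothetical.

```rust
/// Heuristic: does the locale configuration claim a UTF-8 codeset?
/// The first non-empty value wins, in the order LC_ALL, LC_CTYPE, LANG.
fn locale_claims_utf8(lc_all: Option<&str>, lc_ctype: Option<&str>, lang: Option<&str>) -> bool {
    [lc_all, lc_ctype, lang]
        .iter()
        .flatten()
        .find(|v| !v.is_empty())
        .map(|v| {
            let v = v.to_ascii_lowercase();
            v.contains("utf-8") || v.contains("utf8")
        })
        .unwrap_or(false)
}

/// Reads the environment and applies the heuristic above.
/// This says nothing about whether the font can render a given character.
fn terminal_probably_supports_unicode() -> bool {
    let var = |name: &str| std::env::var(name).ok();
    let (lc_all, lc_ctype, lang) = (var("LC_ALL"), var("LC_CTYPE"), var("LANG"));
    locale_claims_utf8(lc_all.as_deref(), lc_ctype.as_deref(), lang.as_deref())
}

/// Replaces control characters (Unicode category Cc) with the dummy
/// character so they cannot mess up the terminal output.
fn printable_or_dummy(c: char) -> char {
    if c.is_control() { '•' } else { c }
}
```

The width problem (💩 taking two cells) is not covered here. It would need a table of display widths, which the standard library does not provide.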
Here's a screenshot of my progress:
(On the left: my modified version of heh, note the overflow in line one! On the right: original heh.)
If I find the time I'll polish the current state a bit and push it :) See #36