unicode-display_width's Introduction

Unicode::DisplayWidth

Determines the monospace display width of a string in Ruby. Useful for all kinds of terminal-based applications. Implementation based on EastAsianWidth.txt and other data, 100% in Ruby. It does not rely on the OS vendor (like wcwidth()) to provide an up-to-date method for measuring string width.

Unicode version: 15.1.0 (September 2023)

Supported Rubies: 3.3, 3.2, 3.1, 3.0

Old Rubies which might still work: 2.7, 2.6, 2.5, 2.4, 2.3

For even older Rubies, use version 2.3.0 of this gem: 2.3, 2.2, 2.1, 2.0, 1.9

Version 2.4.2 — Performance Updates

If you use this gem, you should really upgrade to 2.4.2 or newer. It's often 100x faster, sometimes even 1000x and more!

This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the character width lookup code has been optimized, so even when full-width characters are involved, the gem is much faster now.
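The fast-path idea can be sketched roughly as follows. This is not the gem's actual code: `slow_width` and `width_with_fast_path` are hypothetical names, and the slow path is a toy stand-in that only knows one full-width range.

```ruby
# Hypothetical stand-in for a table-driven per-character lookup.
# It only knows the CJK Unified Ideographs range and treats everything else as 1.
def slow_width(string)
  string.each_char.sum { |char| char.ord.between?(0x4E00, 0x9FFF) ? 2 : 1 }
end

# Hypothetical wrapper: skip the lookup entirely for printable ASCII.
def width_with_fast_path(string)
  # Every printable ASCII character occupies exactly one column,
  # so the width equals the character count.
  return string.length if string.match?(/\A[\x20-\x7E]*\z/)

  slow_width(string)
end

puts width_with_fast_path("hello")
puts width_with_fast_path("一a")
```

A single regular expression check is usually far cheaper than a per-character table lookup, which is presumably where most of the speedup for plain text comes from.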

Version 2.0 — Breaking Changes

Some features of this library were marked deprecated for a long time and have been removed with Version 2.0:

  • Aliases of display_width (…_size, …_length) have been removed
  • Auto-loading of string core extension has been removed:

If you are relying on the String#display_width string extension to be automatically loaded (old behavior), please load it explicitly now:

require "unicode/display_width/string_ext"

You could also change your Gemfile line to achieve this:

gem "unicode-display_width", require: "unicode/display_width/string_ext"

Introduction to Character Widths

Guessing the correct space a character will consume on terminals is not easy. There is no single standard. Most implementations combine data from East Asian Width, some General Categories, and hand-picked adjustments.

How this Library Handles Widths

Further at the top means higher precedence. Please expect changes to this algorithm with every MINOR version update (the X in 1.X.0)!

Width | Characters | Comment
------|------------|--------
X | (user defined) | Overwrites any other values
-1 | "\b" | Backspace (total width never below 0)
0 | "\0", "\x05", "\a", "\n", "\v", "\f", "\r", "\x0E", "\x0F" | C0 control codes which do not change horizontal width
1 | "\u{00AD}" | SOFT HYPHEN
2 | "\u{2E3A}" | TWO-EM DASH
3 | "\u{2E3B}" | THREE-EM DASH
0 | General Categories: Mn, Me, Cf (non-arabic) | Excludes ARABIC format characters
0 | "\u{1160}".."\u{11FF}", "\u{D7B0}".."\u{D7FF}" | HANGUL JUNGSEONG
0 | "\u{2060}".."\u{206F}", "\u{FFF0}".."\u{FFF8}", "\u{E0000}".."\u{E0FFF}" | Ignorable ranges
2 | East Asian Width: F, W | Full-width characters
2 | "\u{3400}".."\u{4DBF}", "\u{4E00}".."\u{9FFF}", "\u{F900}".."\u{FAFF}", "\u{20000}".."\u{2FFFD}", "\u{30000}".."\u{3FFFD}" | Full-width ranges
1 or 2 | East Asian Width: A | Ambiguous characters, user defined, default: 1
1 | All other codepoints | -
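To illustrate how such a precedence list can be evaluated, here is a minimal sketch that checks a handful of the rows from top to bottom and returns on the first match. It is an illustration only: `sketch_char_width` and `sketch_width` are made-up names, the East Asian Width data, the Cf category, the Hangul and ignorable ranges and the ambiguous setting are all left out, and the gem's real lookup works differently.

```ruby
# Simplified sketch of a "first matching rule wins" width lookup.
# Only some rows of the table are covered.
ZERO_WIDTH_CONTROLS = ["\0", "\x05", "\a", "\n", "\v", "\f", "\r", "\x0E", "\x0F"].freeze
FULL_WIDTH_RANGES = [
  0x3400..0x4DBF, 0x4E00..0x9FFF, 0xF900..0xFAFF,
  0x20000..0x2FFFD, 0x30000..0x3FFFD
].freeze

def sketch_char_width(char, overwrite = {})
  codepoint = char.ord
  return overwrite[codepoint] if overwrite.key?(codepoint) # user-defined value wins
  return -1 if char == "\b"                                # backspace
  return 0 if ZERO_WIDTH_CONTROLS.include?(char)           # C0 controls
  return 1 if codepoint == 0x00AD                          # SOFT HYPHEN
  return 2 if codepoint == 0x2E3A                          # TWO-EM DASH
  return 3 if codepoint == 0x2E3B                          # THREE-EM DASH
  return 0 if char.match?(/\p{Mn}|\p{Me}/)                 # combining marks
  return 2 if FULL_WIDTH_RANGES.any? { |range| range.cover?(codepoint) }

  1 # all other codepoints
end

def sketch_width(string, overwrite = {})
  total = string.each_char.sum { |char| sketch_char_width(char, overwrite) }
  [total, 0].max # the total width never drops below 0
end

puts sketch_width("一")
puts sketch_width("a\tb", "\t".ord => 10)
```

Because the checks run in table order, moving a row up or down changes the result for any character that matches more than one rule, which is why the ordering matters.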

Install

Install the gem with:

$ gem install unicode-display_width

Or add to your Gemfile:

gem 'unicode-display_width'

Usage

Classic API

require 'unicode/display_width'

Unicode::DisplayWidth.of("⚀") # => 1
Unicode::DisplayWidth.of("一") # => 2

Ambiguous Characters

The second parameter sets the width used for characters classified as ambiguous:

Unicode::DisplayWidth.of("·", 1) # => 1
Unicode::DisplayWidth.of("·", 2) # => 2

Custom Overwrites

You can overwrite how to handle specific code points by passing a hash (or even a proc) as third parameter:

Unicode::DisplayWidth.of("a\tb", 1, "\t".ord => 10) # => tab counted as 10, so result is 12

Please note that using overwrites disables some performance optimizations of this gem.

Emoji Support

Emoji width support is included, but it must be activated manually. It will adjust the string's size for modifier and zero-width joiner sequences. You also need to add the unicode-emoji gem to your Gemfile:

gem 'unicode-display_width'
gem 'unicode-emoji'

Enable the emoji string width adjustments by passing emoji: true as fourth parameter:

Unicode::DisplayWidth.of "🤾🏽‍♀️" # => 5
Unicode::DisplayWidth.of "🤾🏽‍♀️", 1, {}, emoji: true # => 2
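Why the two results differ can be seen by looking at how the string is segmented. The sketch below does not use this gem or unicode-emoji. As a stand-in, it relies on Ruby's built-in `String#each_grapheme_cluster` (Ruby 2.5+), and `sketch_emoji_aware_width` is a hypothetical helper that counts any cluster containing a skin tone modifier or a zero-width joiner as 2 columns and everything else as 1.

```ruby
# The emoji from the example above, written with escapes:
# PERSON PLAYING HANDBALL, skin tone modifier, ZERO WIDTH JOINER,
# FEMALE SIGN, VARIATION SELECTOR-16
HANDBALL = "\u{1F93E}\u{1F3FD}\u{200D}\u{2640}\u{FE0F}"

# Hypothetical helper: treat a modifier or ZWJ sequence as one 2-column unit.
# Every other cluster is counted as 1, which is wrong for full-width text;
# the point here is only the grouping.
def sketch_emoji_aware_width(string)
  string.each_grapheme_cluster.sum do |cluster|
    cluster.match?(/[\u{1F3FB}-\u{1F3FF}\u{200D}]/) ? 2 : 1
  end
end

puts HANDBALL.each_char.count             # number of code points
puts HANDBALL.each_grapheme_cluster.count # number of user-perceived characters
puts sketch_emoji_aware_width(HANDBALL)
```

Measured code point by code point, the five parts add up to more than what a terminal with emoji support draws. Measured per cluster, the whole sequence is a single glyph.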

Usage with String Extension

require 'unicode/display_width/string_ext'

"⚀".display_width # => 1
'一'.display_width # => 2

Modern API: Keyword-arguments Based Config Object

Version 2.0 introduces a keyword-argument based API, which allows you to save your configuration for later reuse. This requires an extra line of code, but has the advantage that you only need to define your string-width options once:

require 'unicode/display_width'

display_width = Unicode::DisplayWidth.new(
  # ambiguous: 1,
  overwrite: { "A".ord => 100 },
  emoji: true,
)

display_width.of "⚀" # => 1
display_width.of "🤾🏽‍♀️" # => 2
display_width.of "A" # => 100

Usage From the CLI

Use this one-liner to print out display widths for strings from the command-line:

$ gem install unicode-display_width
$ ruby -r unicode/display_width -e 'puts Unicode::DisplayWidth.of $*[0]' -- "一"

Replace "一" with the actual string to measure.

Other Implementations & Discussion

See unicode-x for more Unicode-related micro libraries.

Copyright & Info

unicode-display_width's People

Contributors

amatsuda, fatkodima, janlelis, jspanjers, mishina2228, pocke, rivo, rrosenblum, schwad, tas50, viraptor, windwiny

unicode-display_width's Issues

Remove rubygems dependency.

Hi, I use this gem and rubocop together with a portable version of Ruby (Traveling Ruby).

This portable Ruby is very simple and does not even include rubygems. Everything seems to work well, but when I update rubocop and its dependencies, I have to change:

require 'rubygems/util'
require_relative 'constants'

module Unicode
  module DisplayWidth
    INDEX = Marshal.load(Gem::Util.gunzip(File.binread(INDEX_FILENAME)))
  end
end

To:

require_relative 'constants'

module Unicode
  module DisplayWidth
    INDEX = Marshal.load(File.binread(INDEX_FILENAME))
  end
end

And unpack display_width.marshal.gz to display_width.marshal, because my portable Ruby has no rubygems/util.

Among my gems (about 20+), this is the only one that needs rubygems.

I know rubygems is awesome, but it is of no use for a portable Ruby. I just need a $LOAD_PATH and everything works.

Thanks
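One way to keep the compressed index without rubygems would be Ruby's own zlib library. The following is only a sketch of that idea, not what the gem does: `gzip_bytes` and `load_gzipped_marshal` are hypothetical helpers, and the data is round-tripped in memory instead of being read from INDEX_FILENAME.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper: gzip a byte string in memory.
def gzip_bytes(bytes)
  buffer = StringIO.new(String.new) # String.new yields a binary (ASCII-8BIT) string
  writer = Zlib::GzipWriter.new(buffer)
  writer.write(bytes)
  writer.finish # finish the gzip stream without closing the StringIO
  buffer.string
end

# Hypothetical helper: gunzip and unmarshal without touching Gem::Util.
# Marshal.load should only ever be used on trusted data.
def load_gzipped_marshal(gzipped_bytes)
  reader = Zlib::GzipReader.new(StringIO.new(gzipped_bytes))
  Marshal.load(reader.read)
ensure
  reader.close if reader
end

index = { 0x4E00 => 2, 0x0301 => 0 }
restored = load_gzipped_marshal(gzip_bytes(Marshal.dump(index)))
puts restored == index
```

For a real file, the same reader could be fed with `File.binread` output, which would remove the need to unpack the .gz file by hand.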

Variation Selector-16 Support

Hello,

There is a set of Emoji characters that are displayed Narrow, such as U+23F1 (Stopwatch) which unicode-display_width correctly measures as 1.

But, when joined in sequence with U+FE0F (Variation Selector-16), they become wide. This is a bit rare, as currently it is true for only 7 of ~24 popular terminals. It took me years to fully understand what the heck was going on...

You might be interested in the Specification that I have written for the python wcwidth library, the ucs-detect tool used to assess terminal compliance, and the test results of more than 20 popular terminals.

I have written about all of those things in this article https://www.jeffquast.com/post/ucs-detect-test-results/

This gem breaks the babosa gem

I use rubocop together with babosa in a project. Rubocop added this gem as a dependency in 0.37.0.

When requiring babosa, the library checks whether Unicode is defined. If this is the case, it then goes on to require 'unicode', which fails my Travis CI builds and also the build on my local machine.

$ gem install babosa
$ gem install unicode-display_width
$ irb
>> require 'unicode/display_width'
true
>> defined? Unicode
"constant"
>> require 'babosa'
LoadError: cannot load such file -- unicode
        from C:/Ruby/lib/ruby/2.2.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from C:/Ruby/lib/ruby/2.2.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from C:/Ruby/lib/ruby/gems/2.2.0/gems/babosa-1.0.2/lib/babosa/utf8/unicode_proxy.rb:1:in `<top (required)>'
        from C:/Ruby/lib/ruby/gems/2.2.0/gems/babosa-1.0.2/lib/babosa/identifier.rb:41:in `<class:Identifier>'
        ...

Bold/Italic Unicode characters incorrect width

Example: 𝗕𝗼𝗹𝗱

JavaScript counts this as 8 characters (just like emojis, each bold character has the length 2).

Ruby counts this word as 4 characters, causing an inconsistency with the frontend.

I just tried it with this Gem, but Unicode::DisplayWidth.of("𝗕𝗼𝗹𝗱") still returns 4.

Is this a bug or is there something I need to do in order to make it work for my use case?

Thank you
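The two numbers measure different things: Ruby's `String#length` counts code points, while JavaScript's `length` counts UTF-16 code units. A small sketch reproduces both counts in plain Ruby; the helper names are made up for this example.

```ruby
# Hypothetical helpers to compare the two ways of counting.
def code_point_count(text)
  text.length
end

def utf16_code_unit_count(text)
  # Each UTF-16 code unit is 2 bytes; characters outside the
  # Basic Multilingual Plane need two units (a surrogate pair).
  text.encode("UTF-16LE").bytesize / 2
end

bold = "𝗕𝗼𝗹𝗱"
puts code_point_count(bold)      # what Ruby reports
puts utf16_code_unit_count(bold) # what JavaScript's length reports
```

Neither count is a display width. The result of 4 quoted in the report means each of these letters is measured as one terminal column.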

Kanji characters get error

Hi janlelis,
Great work!

Japanese uses 3 types of characters: Hiragana, Katakana and Kanji (Chinese characters). It works on Hiragana and Katakana.
But on Kanji, I get errors as follows:
irb(main):024:0> '一'.display_size
ArgumentError: ArgumentError
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/unicode-display_width-0.1.0/lib/unicode/display_width.rb:36:in `line'
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/unicode-display_width-0.1.0/lib/unicode/display_width.rb:42:in `codepoint'
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/unicode-display_width-0.1.0/lib/unicode/display_width.rb:65:in `block in display_width'
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/unicode-display_width-0.1.0/lib/unicode/display_width.rb:64:in `each'
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/unicode-display_width-0.1.0/lib/unicode/display_width.rb:64:in `inject'
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/unicode-display_width-0.1.0/lib/unicode/display_width.rb:64:in `display_width'
from (irb):24
from /opt/local/bin/irb:12:in `<main>'

'一' represents 'One' in Kanji.
irb(main):025:0> '一'.unpack('U*')
=> [19968]
irb(main):026:0> Unicode::DisplayWidth.offsets[19968]
=> nil

Other Kanji characters might give the same result.

Anyway, you do great work.

melborne

Ruby < 2.1.0 Compatibility

Your gemspec specifies compatibility with any Ruby >= 1.9.3, but this library is not, in fact, compatible with Ruby 1.9.3, because of this line in lib/unicode/display_width/index.rb:

require 'rubygems/util'
[...]

The file rubygems/util.rb is not included in the rubygems bundled with the 1.9.3 stdlib, but was only added somewhere between 2.0.0 and 2.1.0.

As noted in #17, this require is not actually required and the simplest solution would be to remove the rubygems dependency altogether. Otherwise, you should increase the version constraint in your gemspec.

Thanks!

Invalid gemspec

Good day,

I have just set up my dev PC again with Windows 10, and upon installing gems I received an error related to unicode-display complaining about an invalid gemspec.

error:
Invalid gemspec in [C:/Ruby193/lib/ruby/gems/1.9.1/specifications/unicode-display_width-1.0.3.gemspec]: Illformed requirement ["< 3.0.0, >= 1.9.3"]

The gem I am trying to install is rubocop, which depends on this one.

Please advise.
Thank you
Nico van Niekerk

Cannot be used in a trap

This gem cannot be used in a Signal trap. For example, run this:

require 'unicode/display_width'

Signal.trap('USR1') do
  puts Unicode::DisplayWidth.of("⚀")
end

puts Process.pid
sleep(60)

and then execute:

kill -USR1 <process id>

This crashes with the error:

unicode-display_width/lib/unicode/display_width.rb:6:in `require_relative': can't be called from trap context (ThreadError)

I would suggest moving the require_relative to the top so that the file is loaded at the start rather than dynamically.

1.3.1 Introduces Bug

unicode-display_width-1.3.1/lib/unicode/display_width/index.rb:5:in `module:DisplayWidth': uninitialized constant Gem::Util (NameError)

circular require considered harmful

A warning occurred with the -w option.

irb -w
>> require 'unicode/display_width'
/Users/trsw/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/unicode-display_width-1.0.2/lib/unicode/display_width/string_ext.rb:1: warning: loading in progress, circular require considered harmful - /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/unicode-display_width-1.0.2/lib/unicode/display_width.rb
    from /Users/trsw/.rbenv/versions/2.3.0/bin/irb:11:in  `<main>'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/irb.rb:394:in  `start'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/irb.rb:394:in  `catch'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/irb.rb:395:in  `block in start'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/irb.rb:485:in  `eval_input'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/irb/ruby-lex.rb:231:in  `each_top_level_statement'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/irb/ruby-lex.rb:231:in  `catch'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/irb/ruby-lex.rb:232:in  `block in each_top_level_statement'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/irb/ruby-lex.rb:232:in  `loop'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/irb/ruby-lex.rb:246:in  `block (2 levels) in each_top_level_statement'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/irb.rb:486:in  `block in eval_input'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/irb.rb:623:in  `signal_status'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/irb.rb:489:in  `block (2 levels) in eval_input'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/irb/context.rb:380:in  `evaluate'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/irb/workspace.rb:87:in  `evaluate'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/irb/workspace.rb:87:in  `eval'
    from (irb):1:in  `irb_binding'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:40:in  `require'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:127:in  `rescue in require'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:127:in  `require'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/unicode-display_width-1.0.2/lib/unicode/display_width.rb:31:in  `<top (required)>'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/unicode-display_width-1.0.2/lib/unicode/display_width.rb:31:in  `require_relative'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/unicode-display_width-1.0.2/lib/unicode/display_width/string_ext.rb:1:in  `<top (required)>'
    from /Users/trsw/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/unicode-display_width-1.0.2/lib/unicode/display_width/string_ext.rb:1:in  `require_relative'

I think this line in lib/unicode/display_width/string_ext.rb is unnecessary:

require_relative '../display_width'

Hangul Jamo Extended-B should be 0-width

The Hangul Jamo Extended-B block at U+D7B0..U+D7FF contains jungseong and jongseong for Old Korean, and should be treated the same as U+1160..U+11F0.
glibc's wcwidth() treats that block as 0 width since:

commit 6e540caa21616d5ec5511fafb22819204525138e
Author: Mike FABIAN <[email protected]>
Date:   Tue Jun 16 08:29:40 2020 +0200

Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]
Reviewed-by: Carlos O'Donell <[email protected]>

diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8
index 14c5d4fa33..8cce47cd97 100644
--- a/localedata/charmaps/UTF-8
+++ b/localedata/charmaps/UTF-8
@@ -48920,6 +48920,8 @@ WIDTH
 <UABE8>        0
 <UABED>        0
 <UAC00>...<UD7A3>      2
+<UD7B0>...<UD7C6>      0
+<UD7CB>...<UD7FB>      0
 <UF900>...<UFA6D>      2
 <UFA70>...<UFAD9>      2
 <UFB1E>        0

Print width of "\u{2E3A}" and "\u{2E3B}"

As far as I can see*, the graphic character widths of "\u{2E3A}" and "\u{2E3B}" are 2 and 3, but the print width in the terminal is 1.

#!/usr/bin/ruby

puts "|" + "-"        * 6 + "|||";
puts "|" + "\u{2E3B}" * 6 + "|||";
puts "|" + "\u{0bf8}" * 6 + "|||";

puts "\n";
puts "|" + "\u{2E3B}"  + " " * 3 + "|||";
puts "|" + "\u{0bf8}"  + " " * 3 + "|||";

* Deepin Terminal, GNOME-Terminal, Guake Terminal, Kitty, Konsole, LXTerminal, Sakura, Terminal, Terminator, Tilda, Xfce-Terminal, XTerm, Yakuake

Parse error in v1.0.1 RubyGems gemspec

Using Bundler and v1.0.1 from RubyGems.org I get the error

Invalid gemspec in [XXX/.vendor/bundle/ruby/1.9.1/specifications/unicode-display_width-1.0.1.gemspec]: Illformed requirement ["< 3.0.0, >= 1.9.3"]

which stems from the GemSpec line

  s.required_ruby_version = Gem::Requirement.new("< 3.0.0,>= 1.9.3")

but it works if I change it to

  # Inserted extra quotation marks
  s.required_ruby_version = Gem::Requirement.new("< 3.0.0",">= 1.9.3")
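The multi-argument form can be checked directly with RubyGems' own classes. This is a small sketch using only `Gem::Requirement` and `Gem::Version`, with version numbers picked purely for illustration.

```ruby
# Each constraint is passed as its own string, as in the corrected gemspec line.
requirement = Gem::Requirement.new("< 3.0.0", ">= 1.9.3")

# satisfied_by? is true only when every constraint holds.
puts requirement.satisfied_by?(Gem::Version.new("2.2.0"))
puts requirement.satisfied_by?(Gem::Version.new("3.0.0"))
puts requirement.satisfied_by?(Gem::Version.new("1.9.2"))
```

Passing both constraints in one comma-separated string leaves the parsing to the RubyGems version installed on the user's machine, which is what the error message above complains about.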
