It would be nice if the ANSI and screen packages supported Unicode strings, or there w

That's roughly what I was thinking, but: It shouldn't try to d

ANSI and screen don't support Unicode about pexpect HOT 22 CLOSED

pexpect commented on June 25, 2024

ANSI and screen don't support Unicode

from pexpect.

Comments (22)

takluyver commented on June 25, 2024

Hmmm. I suspect the simple changes would fail with non-ascii bytes. On some level, it would make sense to make it unicode only, because a screen does display text, not bytes, but that would break backwards compatibility. On Python 3, meanwhile, it probably already only accepts unicode, but quite possibly no-one has used it on Python 3 yet.

What about adding an encoding parameter & attribute to screen? It would then handle unicode internally, decoding any input before storing it. encoding would probably default to latin-1, so that arbitrary byte sequences can be handled. @jquast , thoughts?

from pexpect.

dcoshea commented on June 25, 2024

Hmmm. I suspect the simple changes would fail with non-ascii bytes.

With the changes suggested, I am in fact reading in bytes, passing them to a CP437 decoder, passing the resulting unicode strings to ANSI, then later calling get_region() to extract the unicode string, converting it to UTF-8, and outputting it, and stuff is coming back out the way it went in, e.g.:

Welcome to CentOS for x86_64
         ┌────────────────────────┤ Scanning ├────────────────────────┐
         │                                                            │
         │ Looking for installation images on CD device /dev/sr0      │
         │                                                            │
         └────────────────────────────────────────────────────────────┘

so I think that is sort of proof it is working, at least for the methods I updated. I realize there are other methods that probably need to be changed too.

With the encoding parameter, I assume you mean that any data passed in to ANSI would automatically be decoded, and any data returned would automatically be encoded? That seems reasonable, but what would I do if I actually wanted to pass in unicode strings I'd already decoded myself and/or get them back as unicode? It just so happens I want to do the decoding myself, because the stream's encoding changes. I'm not sure if there's an easy way to obtain some kind of null/dummy encoder/decoder?

from pexpect.

takluyver commented on June 25, 2024

I was thinking that it would decode if it got bytes, and use unicode directly if it got that, and the principle output would be unicode. str(s) on Python 2 would have to convert back to bytes, though.

from pexpect.

dcoshea commented on June 25, 2024

And we could provide __unicode__() so that you can use unicode(s) on Python 2.

This all sounds usable to me, thanks!

Should I go ahead and try to implement this myself?

from pexpect.

takluyver commented on June 25, 2024

Sure, go for it. I'm at a conference all next week, so I might not get to review it immediately, but I'll look at it soon.

from pexpect.

dcoshea commented on June 25, 2024

I was thinking that it would decode if it got bytes, and use unicode directly if it got that, and the principle output would be unicode.

On further thought, that doesn't seem symmetrical, nor backwards-compatible. Perhaps it would be more appropriate for the principal output to be bytes, and if you want unicode, you can use unicode(screen)?

I also figured that since, in my case, I don't want any encoding/decoding to occur, I would allow the codec to be specified as "None", meaning that all functions only accept unicode and only return unicode. My code to handle this looks like this:

    def __init__ (self, r=24,c=80,codec='latin-1',codec_errors='replace'):
[...]
        if codec is not None:
            self.decoder = codecs.getdecoder(codec)
            self.encoder = codecs.getencoder(codec)
            self.codec_errors = codec_errors
        else:
            self.decoder = None
            self.encoder = None
            self.codec_errors = None
[...]
    def _decode (self, s):
        '''This converts from the external coding system (as passed to
        the constructor) to the internal one (unicode). '''
        if self.decoder is not None:
            return self.decoder (s,self.codec_errors)[0]
        else:
            return unicode(s)

    def _encode (self, s):
        '''This converts from the internal coding system (unicode) to
        the external one (as passed to the constructor). '''
        if self.encoder is not None:
            return self.encoder (s,self.codec_errors)[0]
        else:
            return unicode(s)

put_abs() can call _decode(); __str__(), dump(), pretty() etc. can call _encode().

I realize this is of no use in Python 3 where str == unicode; I'll leave it for someone else to add support for bytes instances being passed in if they need it.

Does this sound reasonable?

from pexpect.

takluyver commented on June 25, 2024

That's roughly what I was thinking, but:

It shouldn't try to decode if it's already unicode.
I would probably return unicode from methods like dump() and pretty(). It's not breaking backwards compatibility much, and we're planning to do a 4.0 release anyway, so some minor breaks in backwards compatibility are OK.
It shouldn't refer to unicode, because that won't work on Python 3. I would do an isinstance(s, bytes) check to decide what to do (bytes is defined as an alias for str on Python 2).

from pexpect.

dcoshea commented on June 25, 2024

It shouldn't try to decode if it's already unicode.

I'm already making that check in the caller, although perhaps it would be better to put it into the _decode() function, I'll have a look.

I would probably return unicode from methods like dump() and pretty().

So really everything other than __str__() then, like you said originally?

When will this 4.0 release be happening?

It shouldn't refer to unicode [...]

Thanks, wasn't aware of that!

from pexpect.

takluyver commented on June 25, 2024

I would probably return unicode from methods like dump() and pretty().

So really everything other than str() then, like you said originally?

I think that makes most sense, yes. In Python 2, unicode mostly works like str anyway.

When will this 4.0 release be happening?

We don't have an exact timeframe, but the two main aims are asyncio integration, which I already have PR #69 open for, and merging Windows support (issue #17). I don't think that should take too long (famous last words).

from pexpect.

dcoshea commented on June 25, 2024

Thanks for the info.

I realize now that, unlike a solution where screen just starts accepting unicode only, if we're going to do encoding and decoding for the user, then ANSI has to be changed too, because its write() method splits the byte sequence up into individual bytes, passing them one at a time to write_ch(). Obviously for proper support of multi-byte encodings, the decoding has to be done before write() splits it up into characters.

It seems like the scope of the work is getting a bigger than I expected. If non-backward-compatible changes are okay for a major release, would it be okay for these packages to just start treating their input as unicode and not perform decoding? I would assume it would be no more difficult for users to have to decode input before passing it to these packages than it would be for them to have to encode it after calling methods like get_region().

from pexpect.

takluyver commented on June 25, 2024

I'll try to look into it. A major release means backwards incompatible changes are acceptable, but I still prefer to avoid them if it's practical.

from pexpect.

jquast commented on June 25, 2024

catching up.. I'm very familiar with these kinds of things, the tests that we have for this module are a bit weak, I'll plan to provide a cp437-encoded interface screen and work from there. New keyword parameters like encoding="latin-1" sounds good to me so far.

from pexpect.

dcoshea commented on June 25, 2024

Thanks, are you planning to do this soon, and do you plan to allow (in Python 2) unicode instances to be passed in/out?

from pexpect.

jquast commented on June 25, 2024

unicode-everywhere is definitely the intent, yes. Soon... trying my best :) I have a few things to wrap up in pexpect first, patches welcome :)

from pexpect.

dcoshea commented on June 25, 2024

patches welcome :)

I'd like to have the fairly trivial (at least as far as I thought) pull request #89 accepted, and file some other pull requests I have piled up here, before I continue with this bigger task.

from pexpect.

dcoshea commented on June 25, 2024

Thanks for taking care of pull request #89!

I assume that, for input passed to screen and ANSI, an incremental decoder should be used?

from pexpect.

jquast commented on June 25, 2024

yes, incremental decoder must be used, thanks.

from pexpect.

dcoshea commented on June 25, 2024

Thanks.

I'm making some progress and I think I've implemented things as desired in a way that works in Python 2. For Python 3, I gather that prior to 3.3 the u'' syntax for string literals was not available. I assume I need to implement a workaround for this, i.e. supporting only 2.6, 2.7 and 3.3+ would not be acceptable?

from pexpect.

takluyver commented on June 25, 2024

The next version of pexpect will only support python 2.6, 3.3 and above, so
you shouldn't need to work around the syntax for Unicode literals.

from pexpect.

dcoshea commented on June 25, 2024

Great news, thanks!

from pexpect.

dcoshea commented on June 25, 2024

Filed pull request #96 with a fix for this.

from pexpect.

jquast commented on June 25, 2024

Closing, pexpect's terminal emulation code remains next release but no longer improved, marked deprecated by #240 Suggest any terminal emulation / screen scraping code efforts moved to more concerted project efforts such as https://github.com/selectel/pyte

from pexpect.

ANSI and screen don't support Unicode about pexpect HOT 22 CLOSED

Comments (22)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent