Comments (22)
Hmmm. I suspect the simple changes would fail with non-ascii bytes. On some level, it would make sense to make it unicode only, because a screen does display text, not bytes, but that would break backwards compatibility. On Python 3, meanwhile, it probably already only accepts unicode, but quite possibly no-one has used it on Python 3 yet.
What about adding an encoding parameter & attribute to screen? It would then handle unicode internally, decoding any input before storing it. encoding would probably default to latin-1, so that arbitrary byte sequences can be handled. @jquast, thoughts?
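To illustrate why latin-1 is a safe default for arbitrary input, here is a minimal sketch using only the standard library. The variable names are made up for the example. latin-1 maps each of the 256 byte values to the code point with the same number, so decoding never fails and encoding restores the original bytes, whereas a stricter codec such as UTF-8 rejects some sequences.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(bytearray(range(256)))

# latin-1 maps each byte to the code point with the same number,
# so decoding cannot fail and no information is lost.
text = all_bytes.decode('latin-1')
round_tripped = text.encode('latin-1')

# A stricter codec such as UTF-8 rejects some byte sequences.
try:
    b'\xff\xfe'.decode('utf-8')
    utf8_accepts_everything = True
except UnicodeDecodeError:
    utf8_accepts_everything = False
```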
from pexpect.
Hmmm. I suspect the simple changes would fail with non-ascii bytes.
With the changes suggested, I am in fact reading in bytes, passing them to a CP437 decoder, passing the resulting unicode strings to ANSI, then later calling get_region() to extract the unicode string, converting it to UTF-8, and outputting it, and stuff is coming back out the way it went in, e.g.:
Welcome to CentOS for x86_64
┌────────────────────────┤ Scanning ├────────────────────────┐
│ │
│ Looking for installation images on CD device /dev/sr0 │
│ │
└────────────────────────────────────────────────────────────┘
so I think that is sort of proof it is working, at least for the methods I updated. I realize there are other methods that probably need to be changed too.
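The pipeline described above can be sketched roughly as follows. This is only an illustration of the decode and encode steps with the standard codecs; the screen object itself is left out, and the sample bytes are an assumption chosen to match the box-drawing output shown.

```python
# Raw bytes as a CP437 terminal might send them: the top border of a small box.
raw = b'\xda\xc4\xc4\xbf'

# Step 1: decode from the stream's encoding into a unicode string.
text = raw.decode('cp437')

# Step 2 (not shown): hand `text` to the screen object and read it back,
# for instance with get_region(), still as unicode.

# Step 3: encode for a UTF-8 terminal or log file.
utf8_out = text.encode('utf-8')
```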
With the encoding parameter, I assume you mean that any data passed in to ANSI would automatically be decoded, and any data returned would automatically be encoded? That seems reasonable, but what would I do if I actually wanted to pass in unicode strings I'd already decoded myself and/or get them back as unicode? It just so happens I want to do the decoding myself, because the stream's encoding changes. I'm not sure if there's an easy way to obtain some kind of null/dummy encoder/decoder?
from pexpect.
I was thinking that it would decode if it got bytes, and use unicode directly if it got that, and the principal output would be unicode. str(s) on Python 2 would have to convert back to bytes, though.
from pexpect.
And we could provide __unicode__() so that you can use unicode(s) on Python 2.
This all sounds usable to me, thanks!
Should I go ahead and try to implement this myself?
from pexpect.
Sure, go for it. I'm at a conference all next week, so I might not get to review it immediately, but I'll look at it soon.
from pexpect.
I was thinking that it would decode if it got bytes, and use unicode directly if it got that, and the principal output would be unicode.
On further thought, that doesn't seem symmetrical, nor backwards-compatible. Perhaps it would be more appropriate for the principal output to be bytes, and if you want unicode, you can use unicode(screen)?
I also figured that since, in my case, I don't want any encoding/decoding to occur, I would allow the codec to be specified as "None", meaning that all functions only accept unicode and only return unicode. My code to handle this looks like this:
    def __init__(self, r=24, c=80, codec='latin-1', codec_errors='replace'):
        [...]
        if codec is not None:
            self.decoder = codecs.getdecoder(codec)
            self.encoder = codecs.getencoder(codec)
            self.codec_errors = codec_errors
        else:
            self.decoder = None
            self.encoder = None
            self.codec_errors = None
        [...]

    def _decode(self, s):
        '''This converts from the external coding system (as passed to
        the constructor) to the internal one (unicode). '''
        if self.decoder is not None:
            return self.decoder(s, self.codec_errors)[0]
        else:
            return unicode(s)

    def _encode(self, s):
        '''This converts from the internal coding system (unicode) to
        the external one (as passed to the constructor). '''
        if self.encoder is not None:
            return self.encoder(s, self.codec_errors)[0]
        else:
            return unicode(s)

put_abs() can call _decode(); __str__(), dump(), pretty() etc. can call _encode().
I realize this is of no use in Python 3 where str == unicode; I'll leave it for someone else to add support for bytes instances being passed in if they need it.
Does this sound reasonable?
from pexpect.
That's roughly what I was thinking, but:
- It shouldn't try to decode if it's already unicode.
- I would probably return unicode from methods like dump() and pretty(). It's not breaking backwards compatibility much, and we're planning to do a 4.0 release anyway, so some minor breaks in backwards compatibility are OK.
- It shouldn't refer to unicode, because that won't work on Python 3. I would do an isinstance(s, bytes) check to decide what to do (bytes is defined as an alias for str on Python 2).
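A minimal sketch of the isinstance(s, bytes) check suggested in the points above, which avoids naming unicode and so runs on both Python 2 and Python 3. The helper name to_text and its default arguments are hypothetical and are not part of pexpect.

```python
import codecs


def to_text(s, codec='latin-1', errors='replace'):
    """Hypothetical helper: return text, decoding only when given bytes."""
    if isinstance(s, bytes):
        # codecs.getdecoder returns a function yielding (text, bytes_consumed).
        return codecs.getdecoder(codec)(s, errors)[0]
    # Already text (unicode on Python 2, str on Python 3): pass through.
    return s
```

With this shape, callers that have already decoded their input can pass text straight through, while byte input is decoded with the configured codec.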
from pexpect.
It shouldn't try to decode if it's already unicode.
I'm already making that check in the caller, although perhaps it would be better to put it into the _decode() function, I'll have a look.
I would probably return unicode from methods like dump() and pretty().
So really everything other than __str__() then, like you said originally?
When will this 4.0 release be happening?
It shouldn't refer to unicode [...]
Thanks, wasn't aware of that!
from pexpect.
I would probably return unicode from methods like dump() and pretty().
So really everything other than __str__() then, like you said originally?
I think that makes most sense, yes. In Python 2, unicode mostly works like str anyway.
When will this 4.0 release be happening?
We don't have an exact timeframe, but the two main aims are asyncio integration, which I already have PR #69 open for, and merging Windows support (issue #17). I don't think that should take too long (famous last words).
from pexpect.
Thanks for the info.
I realize now that, unlike a solution where screen just starts accepting unicode only, if we're going to do encoding and decoding for the user, then ANSI has to be changed too, because its write() method splits the byte sequence up into individual bytes, passing them one at a time to write_ch(). Obviously for proper support of multi-byte encodings, the decoding has to be done before write() splits it up into characters.
It seems like the scope of the work is getting bigger than I expected. If non-backward-compatible changes are okay for a major release, would it be okay for these packages to just start treating their input as unicode and not perform decoding? I would assume it would be no more difficult for users to have to decode input before passing it to these packages than it would be for them to have to encode it after calling methods like get_region().
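The ordering problem can be seen in a small standalone sketch. It does not use ANSI or write_ch() themselves; it only shows, with UTF-8 as an assumed example of a multi-byte encoding, that splitting into single bytes before decoding breaks characters apart.

```python
encoded = u'\u250c'.encode('utf-8')   # one character, three bytes

# Decoding the whole sequence first, then splitting into characters, works.
chars = list(encoded.decode('utf-8'))

# Splitting into single bytes first leaves fragments that are not valid
# UTF-8 on their own, so decoding them one at a time fails.
fragments = [encoded[i:i + 1] for i in range(len(encoded))]
try:
    [fragment.decode('utf-8') for fragment in fragments]
    per_byte_decoding_worked = True
except UnicodeDecodeError:
    per_byte_decoding_worked = False
```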
from pexpect.
I'll try to look into it. A major release means backwards incompatible changes are acceptable, but I still prefer to avoid them if it's practical.
from pexpect.
Catching up. I'm very familiar with these kinds of things. The tests that we have for this module are a bit weak, so I'll plan to provide a cp437-encoded interface screen and work from there. New keyword parameters like encoding="latin-1" sound good to me so far.
from pexpect.
Thanks, are you planning to do this soon, and do you plan to allow (in Python 2) unicode instances to be passed in/out?
from pexpect.
unicode-everywhere is definitely the intent, yes. Soon... trying my best :) I have a few things to wrap up in pexpect first, patches welcome :)
from pexpect.
patches welcome :)
I'd like to have the fairly trivial (at least as far as I thought) pull request #89 accepted, and file some other pull requests I have piled up here, before I continue with this bigger task.
from pexpect.
Thanks for taking care of pull request #89!
I assume that, for input passed to screen and ANSI, an incremental decoder should be used?
from pexpect.
yes, incremental decoder must be used, thanks.
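For reference, a minimal sketch of what an incremental decoder buys here, using codecs.getincrementaldecoder from the standard library. The chunk boundaries are an assumption chosen to split a character in the middle, as can happen when reading from a pty.

```python
import codecs

# Two box-drawing characters encoded as UTF-8 (three bytes each), delivered
# in chunks that split the first character in the middle.
data = u'\u250c\u2500'.encode('utf-8')
chunks = [data[:2], data[2:]]

decoder = codecs.getincrementaldecoder('utf-8')(errors='replace')

# The decoder buffers the incomplete trailing bytes of each chunk and
# emits the character once the rest arrives.
pieces = [decoder.decode(chunk) for chunk in chunks]
text = u''.join(pieces)
```

A plain bytes.decode() call on each chunk would instead raise or insert replacement characters at the chunk boundary.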
from pexpect.
Thanks.
I'm making some progress and I think I've implemented things as desired in a way that works in Python 2. For Python 3, I gather that prior to 3.3 the u'' syntax for string literals was not available. I assume I need to implement a workaround for this, i.e. supporting only 2.6, 2.7 and 3.3+ would not be acceptable?
from pexpect.
The next version of pexpect will only support Python 2.6, 3.3 and above, so you shouldn't need to work around the syntax for Unicode literals.
from pexpect.
Great news, thanks!
from pexpect.
Filed pull request #96 with a fix for this.
from pexpect.
Closing. pexpect's terminal emulation code remains in the next release but will no longer be improved; it is marked deprecated by #240. I suggest that any terminal emulation / screen scraping efforts move to more concerted projects such as https://github.com/selectel/pyte
from pexpect.