Comments (6)
Couldn't tell if this is a new bug, or the same bug:
Can't decode ill-formed UTF-8 octet sequence <E9> in position 5623 at /home/kent/perl5/perlbrew/perls/perl-5.19.5/lib/site_perl/5.19.5/Path/Tiny.pm line 679.
Can't decode ill-formed UTF-8 octet sequence <E9> in position 10096 at /home/kent/perl5/perlbrew/perls/perl-5.19.5/lib/site_perl/5.19.5/Path/Tiny.pm line 679.
This is annoying because it's cropping up in the midst of processing several hundred files, and I have no idea which file the warning pertains to.
Reading the source suggests it's not any underlying mechanism's job to report this, because the underlying mechanisms aren't dealing with files at all; they only deal with bytes.
So it makes sense that it's Path::Tiny's job to add filename context to this warning, though how to do that is anyone's guess.
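Until the warning carries a file name, one caller-side workaround is to tag warnings while each file is being processed. The following is a minimal sketch, not part of Path::Tiny: `with_file_context` is a hypothetical helper name, and it assumes the decode warning is raised through Perl's normal `warn` machinery so that a local `$SIG{__WARN__}` handler sees it.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Path::Tiny API): run $code with every warning
# it raises suffixed with the name of the file currently being processed.
sub with_file_context {
    my ( $file, $code ) = @_;

    local $SIG{__WARN__} = sub {
        my ($msg) = @_;
        $msg =~ s/\s+\z//;    # drop the trailing newline before appending
        # Calling warn inside a __WARN__ handler does not re-enter the
        # handler, so this prints straight to STDERR.
        warn "$msg [while processing '$file']\n";
    };

    return $code->();
}

# Usage sketch, assuming Path::Tiny is loaded by the caller:
#   my $text = with_file_context( $file, sub { path($file)->slurp_utf8 } );
```

The handler is installed with `local`, so it is only active for the duration of the callback and does not affect warnings raised elsewhere.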
from path-tiny.
Could you please tell me more about what you're processing: i.e. how and why? I'd like to better understand some use cases before deciding on what Path::Tiny should do.
from path-tiny.
I'm just iterating over the contents of Dist/Zilla/Plugin/*. Oddly enough, the problem seems to be Test::Compile: specifically, Test::Compile has a =pod section in ISO-8859-1.
from path-tiny.
Unicode::UTF8 supports fallbacks for encode_utf8() and decode_utf8(), where you can report any warnings or throw exceptions.
Example:
diff --git a/lib/Path/Tiny.pm b/lib/Path/Tiny.pm
index c914332..32a343a 100644
--- a/lib/Path/Tiny.pm
+++ b/lib/Path/Tiny.pm
@@ -1137,7 +1137,24 @@ sub slurp_raw { $_[1] = { binmode => ":unix" }; goto &slurp }
 sub slurp_utf8 {
     if ( defined($HAS_UU) ? $HAS_UU : $HAS_UU = _check_UU() ) {
-        return Unicode::UTF8::decode_utf8( slurp( $_[0], { binmode => ":unix" } ) );
+        my $path = $_[0]->[PATH];
+        my $fallback = sub {
+            my ($octets, $usv, $position) = @_;
+
+            my $msg;
+            if ($usv) {
+                $msg = sprintf "Can't interchange noncharacter code point U+%X in file '%s' at position %d",
+                  $usv, $path, $position;
+            }
+            else {
+                $msg = sprintf "Can't decode ill-formed UTF-8 octet sequence <%s> in file '%s' at position %d",
+                  join(' ', map { sprintf '%.2X', ord } split //, $octets), $path, $position;
+            }
+            Carp::carp($msg);
+            return "\x{FFFD}";
+        };
+        no warnings 'utf8';
+        return Unicode::UTF8::decode_utf8( slurp( $_[0], { binmode => ":unix" } ), $fallback);
     }
     else {
         $_[1] = { binmode => ":raw:encoding(UTF-8)" };
Would output:
$ perl -Mlib=lib -MPath::Tiny -wle 'path("~/dev/bad")->slurp_utf8;'
Can't decode ill-formed UTF-8 octet sequence <EF BF> in file '/Users/chansen/dev/bad' at position 4 at -e line 1
from path-tiny.
PerlIO::encoding takes its fallback behaviour from the value of $PerlIO::encoding::fallback when the layer is applied, so you can set that to one of the Encode::FB_* values.
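As a rough illustration of that approach, the sketch below sets the fallback to `Encode::FB_CROAK` before opening the file and rethrows any decode error with the file name attached. `slurp_utf8_strict` is a hypothetical helper name, not something Path::Tiny provides, and the sketch assumes that a croak from the decoder propagates out of the read so that `eval` can catch it.

```perl
use strict;
use warnings;
use Encode ();
# Load the layer module up front: loading it later would reset
# $PerlIO::encoding::fallback to its default and undo the local below.
use PerlIO::encoding ();

# Hypothetical helper: slurp $file as strict UTF-8, dying with the file
# name in the message if the contents are ill-formed.
sub slurp_utf8_strict {
    my ($file) = @_;

    # The layer reads this value when it is applied, i.e. at open() time.
    local $PerlIO::encoding::fallback = Encode::FB_CROAK();

    open my $fh, '<:raw:encoding(UTF-8)', $file
        or die "Can't open '$file': $!";

    my $content;
    my $ok = eval { local $/; $content = <$fh>; 1 };
    die "Can't decode '$file': $@" unless $ok;

    close $fh;
    return $content;
}
```

This trades the warning for an exception, which may or may not suit a batch job over several hundred files; wrapping each call in `eval` lets the caller log the offending file and continue.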
from path-tiny.
On reflection, I'm going to close this as "won't fix". Users can disable warnings in various ways if they need to. As this is a Tiny module, I don't think it's the right move to add callback overhead for handling malformed characters.
from path-tiny.