ppi's Introduction

NAME

Perl::Critic - Critique Perl source code for best-practices.

SYNOPSIS

use Perl::Critic;
my $file = shift;
my $critic = Perl::Critic->new();
my @violations = $critic->critique($file);
print @violations;

DESCRIPTION

Perl::Critic is an extensible framework for creating and applying coding standards to Perl source code. Essentially, it is a static source code analysis engine. Perl::Critic is distributed with a number of Perl::Critic::Policy modules that attempt to enforce various coding guidelines. Most Policy modules are based on Damian Conway's book Perl Best Practices. However, Perl::Critic is not limited to PBP and will even support Policies that contradict Conway. You can enable, disable, and customize those Policies through the Perl::Critic interface. You can also create new Policy modules that suit your own tastes.

For a command-line interface to Perl::Critic, see the documentation for perlcritic. If you want to integrate Perl::Critic with your build process, Test::Perl::Critic provides an interface that is suitable for test programs. Also, Test::Perl::Critic::Progressive is useful for gradually applying coding standards to legacy code. For the ultimate convenience (at the expense of some flexibility) see the criticism pragma.

If you'd like to try Perl::Critic without installing anything, there is a web-service available at http://perlcritic.com. The web-service does not yet support all the configuration features that are available in the native Perl::Critic API, but it should give you a good idea of what it does.

Also, ActivePerl includes a very slick graphical interface to Perl-Critic called perlcritic-gui. You can get a free community edition of ActivePerl from http://www.activestate.com.

PREREQUISITES

Perl::Critic runs on Perl back to Perl 5.10.1. It relies on the PPI module to do the heavy work of parsing Perl.

INTERFACE SUPPORT

The Perl::Critic module is considered to be a public class. Any changes to its interface will go through a deprecation cycle.

CONSTRUCTOR

new( [ -profile => $FILE, -severity => $N, -theme => $string, -include => \@PATTERNS, -exclude => \@PATTERNS, -top => $N, -only => $B, -profile-strictness => $PROFILE_STRICTNESS_{WARN|FATAL|QUIET}, -force => $B, -verbose => $N, -color => $B, -pager => $string, -allow-unsafe => $B, -criticism-fatal => $B ] )
new()

Returns a reference to a new Perl::Critic object. Most arguments are just passed directly into Perl::Critic::Config, but I have described them here as well. The default value for all arguments can be defined in your .perlcriticrc file. See the "CONFIGURATION" section for more information about that. All arguments are optional key-value pairs as follows:

-profile is a path to a configuration file. If $FILE is not defined, Perl::Critic::Config attempts to find a .perlcriticrc configuration file in the current directory, and then in your home directory. Alternatively, you can set the PERLCRITIC environment variable to point to a file in another location. If a configuration file can't be found, or if $FILE is an empty string, then all Policies will be loaded with their default configuration. See "CONFIGURATION" for more information.

-severity is the minimum severity level. Only Policy modules that have a severity greater than or equal to $N will be applied. Severity values are integers ranging from 1 (least severe violations) to 5 (most severe violations). The default is 5. For a given -profile, decreasing the -severity will usually reveal more Policy violations. You can set the default value for this option in your .perlcriticrc file. Users can redefine the severity level for any Policy in their .perlcriticrc file. See "CONFIGURATION" for more information.

If it is difficult for you to remember whether severity "5" is the most or least restrictive level, then you can use one of these named values:
```
  SEVERITY NAME   ...is equivalent to...   SEVERITY NUMBER
  --------------------------------------------------------
  -severity => 'gentle'                     -severity => 5
  -severity => 'stern'                      -severity => 4
  -severity => 'harsh'                      -severity => 3
  -severity => 'cruel'                      -severity => 2
  -severity => 'brutal'                     -severity => 1
```
The names reflect how severely the code is criticized: a gentle criticism reports only the most severe violations, and so on down to a brutal criticism which reports even the most minor violations.

-theme is a special expression that determines which Policies to apply based on their respective themes. For example, the following would load only Policies that have a 'bugs' AND 'pbp' theme:
```
  my $critic = Perl::Critic->new( -theme => 'bugs && pbp' );
```
Unless the -severity option is explicitly given, setting -theme silently causes the -severity to be set to 1. You can set the default value for this option in your .perlcriticrc file. See the "POLICY THEMES" section for more information about themes.

-include is a reference to a list of string @PATTERNS. Policy modules that match at least one m/$PATTERN/ixms will always be loaded, irrespective of all other settings. For example:
```
  my $critic = Perl::Critic->new(-include => ['layout'], -severity => 4);
```
This would cause Perl::Critic to apply all the CodeLayout::* Policy modules even though they have a severity level that is less than 4. You can set the default value for this option in your .perlcriticrc file. You can also use -include in conjunction with the -exclude option. Note that -exclude takes precedence over -include when a Policy matches both patterns.

-exclude is a reference to a list of string @PATTERNS. Policy modules that match at least one m/$PATTERN/ixms will not be loaded, irrespective of all other settings. For example:
```
  my $critic = Perl::Critic->new(-exclude => ['strict'], -severity => 1);
```
This would cause Perl::Critic to not apply the RequireUseStrict and ProhibitNoStrict Policy modules even though they have a severity level that is greater than 1. You can set the default value for this option in your .perlcriticrc file. You can also use -exclude in conjunction with the -include option. Note that -exclude takes precedence over -include when a Policy matches both patterns.

-single-policy is a string PATTERN. Only one policy that matches m/$PATTERN/ixms will be used. Policies that do not match will be excluded. This option has precedence over the -severity, -theme, -include, -exclude, and -only options. You can set the default value for this option in your .perlcriticrc file.

-top is the maximum number of Violations to return when ranked by their severity levels. This must be a positive integer. Violations are still returned in the order that they occur within the file. Unless the -severity option is explicitly given, setting -top silently causes the -severity to be set to 1. You can set the default value for this option in your .perlcriticrc file.

-only is a boolean value. If set to a true value, Perl::Critic will only choose from Policies that are mentioned in the user's profile. If set to a false value (which is the default), then Perl::Critic chooses from all the Policies that it finds at your site. You can set the default value for this option in your .perlcriticrc file.

-profile-strictness is an enumerated value, one of "$PROFILE_STRICTNESS_WARN" in Perl::Critic::Utils::Constants (the default), "$PROFILE_STRICTNESS_FATAL" in Perl::Critic::Utils::Constants, and "$PROFILE_STRICTNESS_QUIET" in Perl::Critic::Utils::Constants. If set to "$PROFILE_STRICTNESS_FATAL" in Perl::Critic::Utils::Constants, Perl::Critic will make certain warnings about problems found in a .perlcriticrc or file specified via the -profile option fatal. For example, Perl::Critic normally only warns about profiles referring to non-existent Policies, but this value makes this situation fatal. Correspondingly, "$PROFILE_STRICTNESS_QUIET" in Perl::Critic::Utils::Constants makes Perl::Critic shut up about these things.

-force is a boolean value that controls whether Perl::Critic observes the magical "## no critic" annotations in your code. If set to a true value, Perl::Critic will analyze all code. If set to a false value (which is the default) Perl::Critic will ignore code that is tagged with these annotations. See "BENDING THE RULES" for more information. You can set the default value for this option in your .perlcriticrc file.

-verbose can be a positive integer (from 1 to 11), or a literal format specification. See Perl::Critic::Violation for an explanation of format specifications. You can set the default value for this option in your .perlcriticrc file.
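As a rough illustration of a literal format specification, the sketch below sets a format through Perl::Critic::Violation::set_format and then stringifies each violation. The sample source string and the chosen format are assumptions made for the example; -profile => '' is used only to keep the run independent of any local .perlcriticrc.

```perl
use strict;
use warnings;
use Perl::Critic;
use Perl::Critic::Violation;

# An empty -profile skips any .perlcriticrc, so the run is reproducible.
my $critic = Perl::Critic->new( -profile => '', -severity => 5 );

# Literal format: %m = message, %l = line, %c = column, %p = policy name.
Perl::Critic::Violation::set_format("%m at line %l, column %c (%p)\n");

# Hypothetical sample code: it lacks 'use strict', which is a severity 5 issue.
my $source     = "print 'hello';\n";
my @violations = $critic->critique( \$source );

# Violations stringify according to the format set above.
my @report = map { "$_" } @violations;
print @report;
```

An integer verbosity level selects one of the predefined formats instead; see Perl::Critic::Violation for the full list of escapes.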

-allow-unsafe directs Perl::Critic to allow the use of Policies that are marked as "unsafe" by the author. Such policies may compile untrusted code or do other nefarious things.

-color and -pager are not used by Perl::Critic but are provided for the benefit of perlcritic.

-criticism-fatal is not used by Perl::Critic but is provided for the benefit of criticism.

-color-severity-highest, -color-severity-high, -color-severity-medium, -color-severity-low, and -color-severity-lowest are not used by Perl::Critic, but are provided for the benefit of perlcritic. Each is set to the Term::ANSIColor color specification to be used to display violations of the corresponding severity.

-files-with-violations and -files-without-violations are not used by Perl::Critic, but are provided for the benefit of perlcritic, to cause only the relevant filenames to be displayed.

METHODS

critique( $source_code )

Runs the $source_code through the Perl::Critic engine using all the Policies that have been loaded into this engine. If $source_code is a scalar reference, then it is treated as a string of actual Perl code. If $source_code is a reference to an instance of PPI::Document, then that instance is used directly. Otherwise, it is treated as a path to a local file containing Perl code. This method returns a list of Perl::Critic::Violation objects for each violation of the loaded Policies. The list is sorted in the order that the Violations appear in the code. If there are no violations, this method returns an empty list.
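The two in-memory input forms described above can be sketched as follows. The sample source string is an assumption for illustration, and -profile => '' is used only so that a local .perlcriticrc does not change the outcome.

```perl
use strict;
use warnings;
use Perl::Critic;
use PPI;

my $critic = Perl::Critic->new( -profile => '', -severity => 5 );

# Hypothetical sample code without 'use strict'.
my $source = "print 'hello';\n";

# Form 1: a scalar reference is treated as a string of Perl code.
my @from_string = $critic->critique( \$source );

# Form 2: an existing PPI::Document instance is used directly.
my $document = PPI::Document->new( \$source )
    or die 'PPI could not parse the sample code';
my @from_document = $critic->critique($document);

# Each element is a Perl::Critic::Violation object.
for my $violation (@from_string) {
    printf "%s (severity %d) at line %d: %s\n",
        $violation->policy(), $violation->severity(),
        $violation->line_number(), $violation->description();
}
```

Passing a plain string instead would be interpreted as a path to a file on disk.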

add_policy( -policy => $policy_name, -params => \%param_hash )

Creates a Policy object and loads it into this Critic. If the object cannot be instantiated, it will throw a fatal exception. Otherwise, it returns a reference to this Critic.

-policy is the name of a Perl::Critic::Policy subclass module. The 'Perl::Critic::Policy' portion of the name can be omitted for brevity. This argument is required.

-params is an optional reference to a hash of Policy parameters. The contents of this hash reference will be passed into the constructor of the Policy module. See the documentation in the relevant Policy module for a description of the arguments it supports.

policies()

Returns a list containing references to all the Policy objects that have been loaded into this engine. Objects will be in the order that they were loaded.

config()

Returns the Perl::Critic::Config object that was created for or given to this Critic.

statistics()

Returns the Perl::Critic::Statistics object that was created for this Critic. The Statistics object accumulates data for all files that are analyzed by this Critic.
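A minimal sketch of how these methods fit together, assuming the stock Policies are installed. The sample code is invented for the example, and the "allow" parameter is the one shown for ControlStructures::ProhibitPostfixControls in the "CONFIGURATION" section.

```perl
use strict;
use warnings;
use Perl::Critic;

# Start from the defaults at severity 5, ignoring any .perlcriticrc.
my $critic = Perl::Critic->new( -profile => '', -severity => 5 );

# Explicitly load one more Policy, with a parameter for its constructor.
$critic->add_policy(
    -policy => 'ControlStructures::ProhibitPostfixControls',
    -params => { allow => 'if unless' },
);

# policies() returns the loaded Policy objects in load order.
my @loaded = map { ref } $critic->policies();

# Hypothetical sample code that uses a postfix "for".
my $source = <<'END_CODE';
use strict;
use warnings;
print "x\n" for 1 .. 3;
END_CODE

my @violations = $critic->critique( \$source );

# config() exposes the settings; statistics() accumulates across calls.
my $min_severity = $critic->config()->severity();
my $total        = $critic->statistics()->total_violations();

printf "%d policies loaded, minimum severity %d, %d violation(s) so far\n",
    scalar @loaded, $min_severity, $total;
```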

FUNCTIONAL INTERFACE

For those folks who prefer to have a functional interface, the critique method can be exported on request and called as a static function. If the first argument is a hashref, its contents are used to construct a new Perl::Critic object internally. The keys of that hash should be the same as those supported by the Perl::Critic::new() method. Here are some examples:

use Perl::Critic qw(critique);

# Use default parameters...
@violations = critique( $some_file );

# Use custom parameters...
@violations = critique( {-severity => 2}, $some_file );

# As a one-liner
%> perl -MPerl::Critic=critique -e 'print critique(shift)' some_file.pm

None of the other object-methods are currently supported as static functions. Sorry.

CONFIGURATION

Most of the settings for Perl::Critic and each of the Policy modules can be controlled by a configuration file. The default configuration file is called .perlcriticrc. Perl::Critic will look for this file in the current directory first, and then in your home directory. Alternatively, you can set the PERLCRITIC environment variable to explicitly point to a different file in another location. If none of these files exist, and the -profile option is not given to the constructor, then all the modules that are found in the Perl::Critic::Policy namespace will be loaded with their default configuration.

The format of the configuration file is a series of INI-style blocks that contain key-value pairs separated by '='. Comments should start with '#' and can be placed on a separate line or after the name-value pairs if you desire.

Default settings for Perl::Critic itself can be set before the first named block. For example, putting any or all of these at the top of your configuration file will set the default value for the corresponding constructor argument.

severity  = 3                                     #Integer or named level
only      = 1                                     #Zero or One
force     = 0                                     #Zero or One
verbose   = 4                                     #Integer or format spec
top       = 50                                    #A positive integer
theme     = (pbp || security) && bugs             #A theme expression
include   = NamingConventions ClassHierarchies    #Space-delimited list
exclude   = Variables  Modules::RequirePackage    #Space-delimited list
criticism-fatal = 1                               #Zero or One
color     = 1                                     #Zero or One
allow-unsafe = 1                                  #Zero or One
pager     = less                                  #pager to pipe output to

The remainder of the configuration file is a series of blocks like this:

[Perl::Critic::Policy::Category::PolicyName]
severity = 1
set_themes = foo bar
add_themes = baz
maximum_violations_per_document = 57
arg1 = value1
arg2 = value2

Perl::Critic::Policy::Category::PolicyName is the full name of a module that implements the policy. The Policy modules distributed with Perl::Critic have been grouped into categories according to the table of contents in Damian Conway's book Perl Best Practices. For brevity, you can omit the 'Perl::Critic::Policy' part of the module name.

severity is the level of importance you wish to assign to the Policy. All Policy modules are defined with a default severity value ranging from 1 (least severe) to 5 (most severe). However, you may disagree with the default severity and choose to give it a higher or lower severity, based on your own coding philosophy. You can set the severity to an integer from 1 to 5, or use one of the equivalent names:

SEVERITY NAME ...is equivalent to... SEVERITY NUMBER
----------------------------------------------------
gentle                                             5
stern                                              4
harsh                                              3
cruel                                              2
brutal                                             1

The names reflect how severely the code is criticized: a gentle criticism reports only the most severe violations, and so on down to a brutal criticism which reports even the most minor violations.

set_themes sets the theme for the Policy and overrides its default theme. The argument is a string of one or more whitespace-delimited alphanumeric words. Themes are case-insensitive. See "POLICY THEMES" for more information.

add_themes appends to the default themes for this Policy. The argument is a string of one or more whitespace-delimited words. Themes are case-insensitive. See "POLICY THEMES" for more information.

maximum_violations_per_document limits the number of Violations the Policy will return for a given document. Some Policies have a default limit; see the documentation for the individual Policies to see whether there is one. To force a Policy to not have a limit, specify "no_limit" or the empty string for the value of this parameter.

The remaining key-value pairs are configuration parameters that will be passed into the constructor for that Policy. The constructors for most Policy objects do not support arguments, and those that do should have reasonable defaults. See the documentation on the appropriate Policy module for more details.

Instead of redefining the severity for a given Policy, you can completely disable a Policy by prepending a '-' to the name of the module in your configuration file. In this manner, the Policy will never be loaded, regardless of the -severity given to the Perl::Critic constructor.

A simple configuration might look like this:

#--------------------------------------------------------------
# I think these are really important, so always load them

[TestingAndDebugging::RequireUseStrict]
severity = 5

[TestingAndDebugging::RequireUseWarnings]
severity = 5

#--------------------------------------------------------------
# I think these are less important, so only load when asked

[Variables::ProhibitPackageVars]
severity = 2

[ControlStructures::ProhibitPostfixControls]
allow = if unless  # My custom configuration
severity = cruel   # Same as "severity = 2"

#--------------------------------------------------------------
# Give these policies a custom theme.  I can activate just
# these policies by saying `perlcritic -theme larry`

[Modules::RequireFilenameMatchesPackage]
add_themes = larry

[TestingAndDebugging::RequireTestLabels]
add_themes = larry curly moe

#--------------------------------------------------------------
# I do not agree with these at all, so never load them

[-NamingConventions::Capitalization]
[-ValuesAndExpressions::ProhibitMagicNumbers]

#--------------------------------------------------------------
# For all other Policies, I accept the default severity,
# so no additional configuration is required for them.

For additional configuration examples, see the perlcriticrc file that is included in the examples directory of this distribution.

Damian Conway's own Perl::Critic configuration is also included in this distribution as examples/perlcriticrc-conway.
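To see a profile take effect from the API, the following sketch writes a small configuration to a temporary file and passes its path via -profile. The profile contents and the sample source are assumptions chosen for illustration; the "-" prefix disables the named Policy as described above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Perl::Critic;

# Write a throwaway profile; it is removed when the program exits.
my ( $fh, $profile_path ) = tempfile( SUFFIX => '.perlcriticrc', UNLINK => 1 );
print {$fh} <<'END_PROFILE';
severity = 5

[-TestingAndDebugging::RequireUseStrict]
END_PROFILE
close $fh or die "Cannot close $profile_path: $!";

# Hypothetical sample code that would normally trip RequireUseStrict.
my $source = "print 'hello';\n";

# With the profile, the Policy is never loaded.
my $configured   = Perl::Critic->new( -profile => $profile_path );
my @with_profile = grep { $_->policy() =~ /RequireUseStrict\z/ }
    $configured->critique( \$source );

# With default settings (empty profile), the Policy applies.
my $default         = Perl::Critic->new( -profile => '', -severity => 5 );
my @without_profile = grep { $_->policy() =~ /RequireUseStrict\z/ }
    $default->critique( \$source );

printf "RequireUseStrict hits: %d with profile, %d without\n",
    scalar @with_profile, scalar @without_profile;
```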

THE POLICIES

A large number of Policy modules are distributed with Perl::Critic. They are described briefly in the companion document Perl::Critic::PolicySummary and in more detail in the individual modules themselves. Say "perlcritic -doc PATTERN" to see the perldoc for all Policy modules that match the regex m/PATTERN/ixms.

There are a number of distributions of additional policies on CPAN. If Perl::Critic doesn't contain a policy that you want, someone may have already written it. See the "SEE ALSO" section below for a list of some of these distributions.

POLICY THEMES

Each Policy is defined with one or more "themes". Themes can be used to create arbitrary groups of Policies. They are intended to provide an alternative mechanism for selecting your preferred set of Policies. For example, you may wish to disable a certain subset of Policies when analyzing test programs. Conversely, you may wish to enable only a specific subset of Policies when analyzing modules.

The Policies that ship with Perl::Critic have been broken into the following themes. This is just our attempt to provide some basic logical groupings. You are free to invent new themes that suit your needs.

THEME             DESCRIPTION
--------------------------------------------------------------------------
core              All policies that ship with Perl::Critic
pbp               Policies that come directly from "Perl Best Practices"
bugs              Policies that prevent or reveal bugs
certrec           Policies that CERT recommends
certrule          Policies that CERT considers rules
maintenance       Policies that affect the long-term health of the code
cosmetic          Policies that only have a superficial effect
complexity        Policies that specifically relate to code complexity
security          Policies that relate to security issues
tests             Policies that are specific to test programs

Any Policy may fit into multiple themes. Say "perlcritic -list" to get a listing of all available Policies and the themes that are associated with each one. You can also change the theme for any Policy in your .perlcriticrc file. See the "CONFIGURATION" section for more information about that.

Using the -theme option, you can create an arbitrarily complex rule that determines which Policies will be loaded. Precedence is the same as regular Perl code, and you can use parentheses to enforce precedence as well. Supported operators are:

Operator    Alternative    Example
-----------------------------------------------------------------
&&          and            'pbp && core'
||          or             'pbp || (bugs && security)'
!           not            'pbp && ! (portability || complexity)'

Theme names are case-insensitive. If the -theme is set to an empty string, then it evaluates as true for all Policies.
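One way to check what a theme expression selects is to build a critic with it and inspect the themes of each loaded Policy. This is a sketch under the assumption that the stock Policies are installed; the expression 'bugs && pbp' is only an example.

```perl
use strict;
use warnings;
use Perl::Critic;

# Setting -theme without -severity lets Policies of every severity qualify.
my $critic = Perl::Critic->new( -profile => '', -theme => 'bugs && pbp' );

my @selected;
my @mismatched;
for my $policy ( $critic->policies() ) {
    my %themes = map { $_ => 1 } $policy->get_themes();
    push @selected, ref $policy;

    # Every selected Policy should carry both themes of the expression.
    push @mismatched, ref $policy
        unless $themes{bugs} && $themes{pbp};
}

printf "%d policies match 'bugs && pbp'\n", scalar @selected;
print "  $_\n" for @selected;
```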

BENDING THE RULES

Perl::Critic takes a hard-line approach to your code: either you comply or you don't. In the real world, it is not always practical (nor even possible) to fully comply with coding standards. In such cases, it is wise to show that you are knowingly violating the standards and that you have a Damn Good Reason (DGR) for doing so.

To help with those situations, you can direct Perl::Critic to ignore certain lines or blocks of code by using annotations:

require 'LegacyLibrary1.pl';  ## no critic
require 'LegacyLibrary2.pl';  ## no critic

for my $element (@list) {

    ## no critic

    $foo = "";               #Violates 'ProhibitEmptyQuotes'
    $barf = bar() if $foo;   #Violates 'ProhibitPostfixControls'
    #Some more evil code...

    ## use critic

    #Some good code...
    do_something($element);
}

The "## no critic" annotations direct Perl::Critic to ignore the remaining lines of code until a "## use critic" annotation is found. If the "## no critic" annotation is on the same line as a code statement, then only that line of code is overlooked. To direct perlcritic to ignore the "## no critic" annotations, use the --force option.

A bare "## no critic" annotation disables all the active Policies. If you wish to disable only specific Policies, add a list of Policy names as arguments, just as you would for the "no strict" or "no warnings" pragmas. For example, this would disable the ProhibitEmptyQuotes and ProhibitPostfixControls policies until the end of the block or until the next "## use critic" annotation (whichever comes first):

## no critic (EmptyQuotes, PostfixControls)

# Now exempt from ValuesAndExpressions::ProhibitEmptyQuotes
$foo = "";

# Now exempt from ControlStructures::ProhibitPostfixControls
$barf = bar() if $foo;

# Still subjected to ValuesAndExpressions::RequireNumberSeparators
$long_int = 10000000000;

Since the Policy names are matched against the "## no critic" arguments as regular expressions, you can abbreviate the Policy names or disable an entire family of Policies in one shot like this:

## no critic (NamingConventions)

# Now exempt from NamingConventions::Capitalization
my $camelHumpVar = 'foo';

# Now exempt from NamingConventions::Capitalization
sub camelHumpSub {}

The argument list must be enclosed in parentheses or brackets and must contain one or more comma-separated barewords (i.e. don't use quotes). The "## no critic" annotations can be nested, and Policies named by an inner annotation will be disabled along with those already disabled by an outer annotation.

Some Policies like Subroutines::ProhibitExcessComplexity apply to an entire block of code. In those cases, the "## no critic" annotation must appear on the line where the violation is reported. For example:

sub complicated_function {  ## no critic (ProhibitExcessComplexity)
    # Your code here...
}

Policies such as Documentation::RequirePodSections apply to the entire document, in which case violations are reported at line 1.

Use this feature wisely. "## no critic" annotations should be used in the smallest possible scope, or only on individual lines of code. And you should always be as specific as possible about which Policies you want to disable (i.e. never use a bare "## no critic"). If Perl::Critic complains about your code, try to find a compliant solution before resorting to this feature.
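The interaction between annotations and the -force option can be sketched from the API as follows. The sample source is invented for the example, and -single-policy is used only to isolate one Policy so the counts are easy to read; note that hyphenated option names need quoting in Perl.

```perl
use strict;
use warnings;
use Perl::Critic;

# Hypothetical sample code with one annotated line.
my $source = <<'END_CODE';
use strict;
use warnings;
my $foo = "";  ## no critic (ProhibitEmptyQuotes)
END_CODE

# Options shared by both critics; the hyphenated key must be quoted.
my %common = (
    -profile         => '',
    '-single-policy' => 'ProhibitEmptyQuotes',
);

# Default behaviour: the annotation is honoured.
my @respected = Perl::Critic->new(%common)->critique( \$source );

# With -force, annotations are ignored and all code is analyzed.
my @forced = Perl::Critic->new( %common, -force => 1 )->critique( \$source );

printf "violations: %d when annotations are honoured, %d with -force\n",
    scalar @respected, scalar @forced;
```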

THE Perl::Critic PHILOSOPHY

Coding standards are deeply personal and highly subjective. The goal of Perl::Critic is to help you write code that conforms with a set of best practices. Our primary goal is not to dictate what those practices are, but rather, to implement the practices discovered by others. Ultimately, you make the rules -- Perl::Critic is merely a tool for encouraging consistency. If there is a policy that you think is important or that we have overlooked, we would be very grateful for contributions, or you can simply load your own private set of policies into Perl::Critic.

EXTENDING THE CRITIC

The modular design of Perl::Critic is intended to facilitate the addition of new Policies. You'll need to have some understanding of PPI, but most Policy modules are pretty straightforward and only require about 20 lines of code. Please see the Perl::Critic::DEVELOPER file included in this distribution for a step-by-step demonstration of how to create new Policy modules.

If you develop any new Policy modules, feel free to send them to <[email protected]> and I'll be happy to consider putting them into the Perl::Critic distribution. Or if you would like to work on the Perl::Critic project directly, you can fork our repository at https://github.com/Perl-Critic/Perl-Critic.git.

The Perl::Critic team is also available for hire. If your organization has its own coding standards, we can create custom Policies to enforce your local guidelines. Or if your code base is prone to a particular defect pattern, we can design Policies that will help you catch those costly defects before they go into production. To discuss your needs with the Perl::Critic team, just contact <[email protected]>.

PREREQUISITES

Perl::Critic requires the following modules:

CONTACTING THE DEVELOPMENT TEAM

You are encouraged to subscribe to the public mailing list at https://groups.google.com/d/forum/perl-critic. At least one member of the development team is usually hanging around in irc://irc.perl.org/#perlcritic and you can follow Perl::Critic on Twitter, at https://twitter.com/perlcritic.

BUGS

Scrutinizing Perl code is hard for humans, let alone machines. If you find any bugs, particularly false-positives or false-negatives from a Perl::Critic::Policy, please submit them at https://github.com/Perl-Critic/Perl-Critic/issues. Thanks.

CREDITS

Adam Kennedy - For creating PPI, the heart and soul of Perl::Critic.

Damian Conway - For writing Perl Best Practices, finally :)

Chris Dolan - For contributing the best features and Policy modules.

Andy Lester - Wise sage and master of all-things-testing.

Elliot Shank - The self-proclaimed quality freak.

Giuseppe Maxia - For all the great ideas and positive encouragement.

and Sharon, my wife - For putting up with my all-night code sessions.

Thanks also to the Perl Foundation for providing a grant to support Chris Dolan's project to implement twenty PBP policies. http://www.perlfoundation.org/april_1_2007_new_grant_awards

Thanks also to this incomplete laundry list of folks who have contributed to Perl::Critic in some way: Gregory Oschwald, Mike O'Regan, Tom Hukins, Omer Gazit, Evan Zacks, Paul Howarth, Sawyer X, Christian Walde, Dave Rolsky, Jakub Wilk, Roy Ivy III, Oliver Trosien, Glenn Fowler, Matt Creenan, Alex Balhatchet, Sebastian Paaske Tørholm, Stuart A Johnston, Dan Book, Steven Humphrey, James Raspass, Nick Tonkin, Harrison Katz, Douglas Sims, Mark Fowler, Alan Berndt, Neil Bowers, Sergey Romanov, Gabor Szabo, Graham Knop, Mike Eldridge, David Steinbrunner, Kirk Kimmel, Guillaume Aubert, Dave Cross, Anirvan Chatterjee, Todd Rinaldo, Graham Ollis, Karen Etheridge, Jonas Brømsø, Olaf Alders, Jim Keenan, Slaven Rezić, Szymon Nieznański.

AUTHOR

Jeffrey Ryan Thalhammer [email protected]

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. The full text of this license can be found in the LICENSE file included with this module.

ppi's Issues

statement of word + block doesn't recognize implicit statement end

When a statement consists of a word plus a block and is supposed to end implicitly after the block, the statement instead keeps picking up tokens until it encounters an explicit statement end. E.g.:

ppidump 'DESTROY {} sub foo {} 1; sub bar{}'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'DESTROY'
                        PPI::Structure::Block   { ... }
[    1,  12,  12 ]     PPI::Token::Word         'sub'
[    1,  16,  16 ]     PPI::Token::Word         'foo'
                        PPI::Structure::Block   { ... }
[    1,  23,  23 ]     PPI::Token::Number       '1'
[    1,  24,  24 ]     PPI::Token::Structure    ';'
                      PPI::Statement::Sub
[    1,  26,  26 ]     PPI::Token::Word         'sub'
[    1,  30,  30 ]     PPI::Token::Word         'bar'
                        PPI::Structure::Block   { ... }

The DESTROY+block statement is supposed to end when the block ends, but it doesn't actually end until it sees the ';' after sub foo. If we change the initial statement to be something other than word+block, the DESTROY statement ends properly and foo is recognized as a sub:

ppidump 'sub DESTROY {} sub foo {} 1; sub bar{}'
                    PPI::Document
                      PPI::Statement::Sub
[    1,   1,   1 ]     PPI::Token::Word         'sub'
[    1,   5,   5 ]     PPI::Token::Word         'DESTROY'
                        PPI::Structure::Block   { ... }
                      PPI::Statement::Sub
[    1,  16,  16 ]     PPI::Token::Word         'sub'
[    1,  20,  20 ]     PPI::Token::Word         'foo'
                        PPI::Structure::Block   { ... }
                      PPI::Statement
[    1,  27,  27 ]     PPI::Token::Number       '1'
[    1,  28,  28 ]     PPI::Token::Structure    ';'
                      PPI::Statement::Sub
[    1,  30,  30 ]     PPI::Token::Word         'sub'
[    1,  34,  34 ]     PPI::Token::Word         'bar'
                        PPI::Structure::Block   { ... }

This word+block pattern occurs for the special subs AUTOLOAD and DESTROY if you omit their optional 'sub'. Those two cases will no longer be an issue when #39 is applied for #31.

do+block doesn't have the problem because it does not end implicitly. It's followed by 'until' or ';' or an expression.

'sub {} sub foo{}' fits the word+block+implicit end pattern, but it doesn't compile.

Are there any other naturally-occurring instances of word+block+implicit statement end?
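For anyone who wants to reproduce the dumps above without the ppidump script, a rough equivalent using PPI::Dumper is sketched below. The dumper options are a guess at a similar layout, and the resulting tree depends on the installed PPI version, so no particular output is claimed here.

```perl
use strict;
use warnings;
use PPI;
use PPI::Dumper;

# The snippet from this report.
my $code = 'DESTROY {} sub foo {} 1; sub bar{}';

my $document = PPI::Document->new( \$code )
    or die 'PPI could not parse the snippet';

# Print the parse tree without whitespace tokens, with locations.
PPI::Dumper->new( $document, whitespace => 0, locations => 1 )->print();

# find() returns an array reference, or a false value if nothing matches.
my $subs = $document->find('PPI::Statement::Sub') || [];
printf "PPI::Statement::Sub nodes found: %d\n", scalar @{$subs};
```

Comparing the reported count for the two snippets in this issue shows whether 'sub foo' was swallowed by the preceding statement.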

Merging method

Can you please rebase branches before merging them, or cherry-pick their commits and then close issues, instead of using the auto-merge? There is a massive multitude of reasons for this, which can be summarized as "non-ff merge results in utterly crazy and debug-hostile history": https://github.com/adamkennedy/PPI/network

Alternately just tag an issue as "good to merge" and i'll deal with it.

how is version bumping done for this dist?

@adamkennedy I scanned through Makefile and couldn't find anything that bumps the versions in the dist. Do you have some script for that?

contributor access to the repo for @moregan?

Adam, @moregan is doing excellent work in hunting down bugs and writing both tests and fixes. Especially the ability to manage the issues on this repo would be a great boon for him. Could you grant him access to the repo, on the condition that we proceed as before and only merge branches after they've been reviewed?

Many package names that are also keywords misparsed

PPI 1.215 and 1.216_01

Some package names that are also keywords don't parse as Word:

ppidump 'package x;'
                    PPI::Document
                      PPI::Statement::Package
[    1,   1,   1 ]     PPI::Token::Word         'package'
[    1,   9,   9 ]     PPI::Token::Operator     'x'
[    1,  10,  10 ]     PPI::Token::Structure    ';'

bless, return, and scalar as package names parse as Word, but they force the following curly braces to be a hash constructor instead of a block:

ppidump 'package scalar {}'
                    PPI::Document
                      PPI::Statement::Package
[    1,   1,   1 ]     PPI::Token::Word         'package'
[    1,   9,   9 ]     PPI::Token::Word         'scalar'
                        PPI::Structure::Constructor     { ... }

RT 75921: match on implicit $_ after map/grep not recognized

https://rt.cpan.org/Public/Bug/Display.html?id=75921

PPI 1.215 does not recognize a match against implicit $_ that follows map/grep if the match does not include 'm':

ppidump 'map { 0 } /z/'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'map'
                        PPI::Structure::Block   { ... }
                          PPI::Statement
[    1,   7,   7 ]         PPI::Token::Number   '0'
[    1,  11,  11 ]     PPI::Token::Operator     '/'
[    1,  12,  12 ]     PPI::Token::Word         'z'
[    1,  13,  13 ]     PPI::Token::Operator     '/'

But with more information like 'm', it's fine:

ppidump 'map { 0 } m/z/'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'map'
                        PPI::Structure::Block   { ... }
                          PPI::Statement
[    1,   7,   7 ]         PPI::Token::Number   '0'
[    1,  11,  11 ]     PPI::Token::Regexp::Match        'm/z/'

PPI::Statement::Variable too greedy

ppidump 'open( my $fh, ">", $filename );'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'open'
                        PPI::Structure::List    ( ... )
                          PPI::Statement::Variable
[    1,   7,   7 ]         PPI::Token::Word     'my'
[    1,  10,  10 ]         PPI::Token::Symbol   '$fh'
[    1,  13,  13 ]         PPI::Token::Operator         ','
[    1,  15,  15 ]         PPI::Token::Quote::Double    '">"'
[    1,  18,  18 ]         PPI::Token::Operator         ','
[    1,  20,  20 ]         PPI::Token::Symbol   '$filename'
[    1,  31,  31 ]     PPI::Token::Structure    ';'

variables() on the PPI::Statement::Variable returns just '$fh', which seems right to me, but it doesn't seem right that anything following '$fh' is part of the statement.

It also doesn't seem right that initializers for declared variables become part of the PPI::Statement::Variable:

ppidump 'my $x = 1;';
                    PPI::Document
                      PPI::Statement::Variable
[    1,   1,   1 ]     PPI::Token::Word         'my'
[    1,   4,   4 ]     PPI::Token::Symbol       '$x'
[    1,   7,   7 ]     PPI::Token::Operator     '='
[    1,   9,   9 ]     PPI::Token::Number       '1'
[    1,  10,  10 ]     PPI::Token::Structure    ';'

The absence of a facility like initializers() in PPI::Statement::Variable implies (to me) that including the initializer is not by design.

In playing around with Lexer.pm I found it pretty easy to have a variable declaration without parens stop after it sees the variable:

+               if ( $Statement->isa('PPI::Statement::Variable') ) {
+                       my @schildren = $Statement->schildren();
+                       if ( @schildren > 1 and !$schildren[1]->isa('PPI::Structure::List') ) {
+                               return $self->_rollback( $Token );
+                       }
+               }

but from the results it looks like that change is too naive:

ppidump 'open( my $fh, ">", $filename );'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'open'
                        PPI::Structure::List    ( ... )
                          PPI::Statement::Variable
[    1,   7,   7 ]         PPI::Token::Word     'my'
[    1,  10,  10 ]         PPI::Token::Symbol   '$fh'
                          PPI::Statement::Expression
[    1,  13,  13 ]         PPI::Token::Operator         ','
[    1,  15,  15 ]         PPI::Token::Quote::Double    '">"'
[    1,  18,  18 ]         PPI::Token::Operator         ','
[    1,  20,  20 ]         PPI::Token::Symbol   '$filename'
[    1,  31,  31 ]     PPI::Token::Structure    ';'

I don't know enough about the lexing/parsing to know whether it's just a case of needing a little more logic at statement end/statement begin, whether it's a fundamental problem of a variable declaration being an expression, or what.

x operator not recognized in '$a x3'

ppidump '$a x3'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Symbol       '$a'
[    1,   4,   4 ]     PPI::Token::Word         'x3'

The 'x' should be recognized as the x operator, since Perl does:

perl -WE '(1)x3'
Useless use of repeat (x) in void context at -e line 1.

request for comaint :)

Mithaldu here.

As mentioned in my Email, i'd like to help PPI index better on metacpan, as well as give it love in regards to the many bug tickets it has. As such, i would like to have comaint. :)

Can't load file written in perl string normally

Hi.

If the file to be loaded contains a Perl string, like:

use utf8;
my $hash = { 東京 => 'tokyo' };

then the result of PPI::Document::File->new($file) is undef, which means it cannot handle Perl strings correctly (if I remember correctly, perl allows a bareword written in a Perl string as a hash key).

So I wrote a patch that adds a Perl string option to the constructor.
moznion@757f382
However, I think a better way probably exists.

What do you think?

Comments still direct people to rt.cpan.org

ack rt.cpan.org
lib/PPI.pm
762:L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=PPI>

lib/PPI/Structure/List.pm
34:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/Subscript.pm
37:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/Given.pm
32:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/For.pm
32:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/When.pm
32:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/Unknown.pm
38:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/Constructor.pm
31:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/Block.pm
41:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Structure/Condition.pm
36:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/QuoteLike/Command.pm
27:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/QuoteLike/Readline.pm
36:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/QuoteLike/Backtick.pm
27:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Cast.pm
30:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Regexp/Transliterate.pm
35:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Regexp/Match.pm
41:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Regexp/Substitute.pm
31:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Pod.pm
25:Got any ideas for more methods? Submit a report to rt.cpan.org!

lib/PPI/Token/ArrayIndex.pm
25:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Attribute.pm
31:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Quote/Interpolate.pm
27:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Quote/Literal.pm
27:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Quote/Double.pm
30:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Quote/Single.pm
33:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Label.pm
27:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Token/Operator.pm
40:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Statement/Given.pm
32:Got any ideas for methods? Submit a report to rt.cpan.org!

lib/PPI/Statement/When.pm
40:Got any ideas for methods? Submit a report to rt.cpan.org!

t/08_regression.t
45:# Regression Test for rt.cpan.org #11522
132:# rt.cpan.org: Ticket #16671 $_ is not localized

inc/Module/Install/Metadata.pm
560:     https?\Q://rt.cpan.org/\E[^>]+|

php-style error handling to perl-style error handling?

A generic question:

Errors in PPI are handled by capturing them and stuffing them in ->errstr, effectively hiding them unless the user knows to look for them. Over the past years the perl community has converged on treating this as an anti-pattern. Are there any stringent reasons against migrating PPI to "die on failure" behavior?
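
To make the two styles concrete, here is a minimal sketch. Toy::Parser and its methods are hypothetical and only illustrate the contrast; they are not PPI's API.

```perl
use strict;
use warnings;

# Hypothetical toy parser, not PPI's API: it only illustrates the two
# error-reporting styles discussed above.
package Toy::Parser;

our $errstr = '';

# "errstr" style: return undef and stash the message in a package variable.
sub parse_quiet {
    my ( $class, $source ) = @_;
    if ( $source =~ /\0/ ) {
        $errstr = 'unexpected NUL character';
        return undef;
    }
    $errstr = '';
    return { source => $source };
}

# "die on failure" style: the error cannot be ignored by accident.
sub parse_strict {
    my ( $class, $source ) = @_;
    die "unexpected NUL character\n" if $source =~ /\0/;
    return { source => $source };
}

package main;

# The caller must remember to check the return value and then errstr.
my $doc = Toy::Parser->parse_quiet("my \$a; \0");
print "quiet failure: $Toy::Parser::errstr\n" if !defined $doc;

# A caller that wants to recover wraps the call in eval.
my $ok = eval { Toy::Parser->parse_strict("my \$a; \0"); 1 };
print "strict failure: $@" if !$ok;
```

With the second style, a caller who forgets the check gets a visible exception instead of an undef that surfaces later as an unrelated error.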

POD below DATA not recognized.

Pod in the __DATA__ section is valid according to the documentation. Maybe this is on purpose (how good is good enough?), but Pod below __END__ is parsed correctly, so maybe it is not.

use strict;
use warnings;
use feature 'say';
use PPI::Document;

=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.
=cut

my $content = '';
{
    local $/;
    open my $fh, '<', $0 or die "Cannot open $0: $!";
    $content = <$fh>;
    close $fh;
}

my $doc = PPI::Document->new(\$content);
my $pod = PPI::Token::Pod->merge(@{$doc->find('PPI::Token::Pod')});
say $pod;

__DATA__

=head1 More
This should also be a piece of POD. Should it?
=cut

This little test yields:

=pod

=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.

=cut

When it should yield:

=pod

=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.

=head1 More
This should also be a piece of POD. Should it?

=cut

How to deal with ambiguous parses?

Right now PPI seems to be decidedly undecided on how to deal with code that cannot be decided confidently as to its meaning. An example as follows:

sub d { 1 };
my @c = 3 .. 6;
say 1 if d ~~ @c;

use v5.10.1;             # !
sub d () { 1 };          # !
my @c = 3 .. 6;
say 1 if d ~~ @c;

The ~~ in the last line is interpreted as a single operator in both cases. However, in the first case it should be two operators, and only in the second case should it be parsed as the smart-match operator.

In my opinion the current handling of that code is unacceptable.

I am however unsure on how it should be handled instead and am as such fishing for general opinions on how PPI should handle ambiguous parses.

rt.cpan.org tickets to close

From a pass over the PPI rt.cpan.org queue:

RT tickets to close with 1.216 due to the merging of moregan's branches:

68176 and 71705 -- support all augmented assignment operators
3353672
RT 75039: don't allow '=CUT' to terminate POD
1bdce9b
RT 36540 -- support upper case in hex and binary numbers
6279fbb
RT 45014: parse '12.34..56.78' parsed as version string + '..' + float
b4d5644
RT 51693: fix pod markup containing '>'
1cafad1
RT 30863: spelling fix
c2d6b37
RT 67264: fix spelling of Tom Christiansen
dda1721

Other RT tickets to close due to changes in 1.216 and before:

RT 85049 -- Merge pull request #6 from dsteinbrunner/patch-1
7b07326
RT 69026 Patched in #3
45968ef
RT 90792 Patched in: #2
4158513
RT 45471 -- appears to not be a bug, according to the submitter's remarks
RT 36556 -- apparently fixed between Sat Jun 07 18:37:32 2008 and PPI 1.215
RT 35829 -- in 1.215 there is no code at all in the perldoc output.
Submitter was seeing either something that is now gone (submitted
May 2008), or the inline tests.

RT 27364: DESTROY and AUTOLOAD don't parse as subs without 'sub'

Jeff Thalhammer points out that Perl allows you to omit 'sub' from DESTROY and AUTOLOAD:

moregan@moregan[~]$ perl -WE 'AUTOLOAD {1;}'
moregan@moregan[~]$ perl -WE 'package x; DESTROY {1;}'
moregan@moregan[~]$

but PPI doesn't recognize them as subs unless 'sub' is included:

ppidump 'AUTOLOAD {;}'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word     'AUTOLOAD'
                        PPI::Structure::Block   { ... }
                          PPI::Statement::Null
[    1,  11,  11 ]         PPI::Token::Structure    ';'

ppidump 'sub AUTOLOAD {;}'
                    PPI::Document
                      PPI::Statement::Sub
[    1,   1,   1 ]     PPI::Token::Word     'sub'
[    1,   5,   5 ]     PPI::Token::Word     'AUTOLOAD'
                        PPI::Structure::Block   { ... }
                          PPI::Statement::Null
[    1,  15,  15 ]         PPI::Token::Structure    ';'

https://rt.cpan.org/Public/Bug/Display.html?id=27364

Capture variables above $9 misparsed

Perl-Critic/Perl-Critic#455
https://rt.cpan.org/Public/Bug/Display.html?id=72980

PPI does not recognize that the numbered capture variables can go higher than $9. E.g. for:

$_ = 'xxxxxxxxxxxxxxxxxxxx';
/(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)/;
my $x = $13;
print $x;

ppidump 'my $x = $13;'
                    PPI::Document
                      PPI::Statement::Variable
[    1,   1,   1 ]     PPI::Token::Word         'my'
[    1,   4,   4 ]     PPI::Token::Symbol       '$x'
[    1,   7,   7 ]     PPI::Token::Operator     '='
[    1,   9,   9 ]     PPI::Token::Magic        '$1'
[    1,  11,  11 ]     PPI::Token::Number       '3'
[    1,  12,  12 ]     PPI::Token::Structure    ';'
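
For illustration, a minimal sketch of the expected matching rule, assuming a numbered capture variable is '$' followed by a nonzero digit and any further digits. leading_capture_variable is a hypothetical helper, not PPI's tokenizer code.

```perl
use strict;
use warnings;

# Hypothetical sketch, not PPI code: recognize a numbered capture variable
# at the start of a string by consuming every digit after the '$'.
# '$0' is excluded because it is the program name, not a capture.
sub leading_capture_variable {
    my ($text) = @_;
    return $text =~ /\A(\$[1-9][0-9]*)/ ? $1 : undef;
}

print leading_capture_variable('$13;'), "\n";   # prints $13
```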

dead code?

I'm currently writing tests to improve the coverage of the test suite, and am finding what looks like dead code. After about 8 hours altogether of trying to find code that will trigger this if condition I haven't found any, and it looks strongly like the handling of these circumstances has been moved to the commit function. Most notably I see this because the only way to satisfy the if condition is to have a single : recognized as a sub attribute, followed by triggering of Word->__TOKENIZER__on_char; however, the execution path seems to only be able to lead into Whitespace->__TOKENIZER__on_char, which can only lead to Word->__TOKENIZER__commit.

If you know of any string that can trigger this code, please let me know so i can add the test, otherwise i regard this as dead code that can be removed.

[Feature Request] Parse/Token plugins

As said on IRC to @wchristian, if a rewrite is going to happen, something that would be "nice" to think about is having a provision for non-standard syntax extensions.

So that perhaps, code that knows it is about to parse Devel::Declare based code, can load a plugin that knows how to spice the syntax, and pass the plugin to PPI, and PPI can emit structures, and re-serialize back, sanely and safely.

Then maybe down the road, we could work out how to write a plugin that dynamically loads other plugins on demand based on hints in the source being parsed, and cover more of the edge cases encountered by metasyntax.

Though I don't exactly have any idea of how such a plugin would look, or how such a plugin would be passed over, just a general sense of "this would be nice and useful"

better api for encoded input

Ether summarized this in IRC just now and it seems perfectly fine and i'll need to do it asap.

14-01-22@02:41:35 (@ether) Mithaldu: basically, PPI needs separate interfaces for new_from_file, new_from_handle, and new_from_content -- for the first, you need a separate encoding argument; for the second, you should force the caller to apply the right layers to the $fh in advance; for new_from_content, presume characters (decoded)

-1 parsing as number rather than operator and 1

Perl-Critic/Perl-Critic#500 is not happy with this "-1" parsing as a number rather than the operator '-' followed by the number 1:

ppidump '(1)-1'
                    PPI::Document
                      PPI::Statement
                        PPI::Structure::List    ( ... )
                          PPI::Statement::Expression
[    1,   2,   2 ]         PPI::Token::Number   '1'
[    1,   4,   4 ]     PPI::Token::Number       '-1'

However it's a different story with '+':

ppidump '(1)+1'
                    PPI::Document
                      PPI::Statement
                        PPI::Structure::List    ( ... )
                          PPI::Statement::Expression
[    1,   2,   2 ]         PPI::Token::Number   '1'
[    1,   4,   4 ]     PPI::Token::Operator     '+'
[    1,   5,   5 ]     PPI::Token::Number       '1'

Pod below END not parsed if DATA section is present

Pod below __END__ is not recognized if a __DATA__ section is present.
See #15 for additional info.

use strict;
use warnings;
use feature 'say';
use PPI::Document;

=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.
=cut

my $content = '';
{
    local $/;
    open my $fh, '<', $0 or die "Cannot open $0: $!";
    $content = <$fh>;
    close $fh;
}

my $doc = PPI::Document->new(\$content);
my $pod = PPI::Token::Pod->merge(@{$doc->find('PPI::Token::Pod')});
say $pod;

__DATA__
# some data here
__END__

=head1 More
This should also be a piece of POD. Should it?
=cut

This test script yields:

=pod

=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.

=cut

Expected output:

=pod

=head1 Test
This is a piece of POD.
=head2 Sub
This is also a piece of POD.

=head1 More
This should also be a piece of POD. Should it?

=cut

content is read as octets, not characters, with no concept of decoding

new($filename) reads the file as bytes, with no encoding layers, so any content that isn't Latin1 will cause issues. new(\$content) is read exactly the same way.

Files need to be decoded, so an encoding parameter is needed. Content strings either need to be decoded similarly, or be passed as already-decoded characters.

I would suggest new APIs that don't conflict with the existing ones, to try to preserve backcompat as much as possible.
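
As a rough sketch of the distinction between octets and characters, using only core modules: slurp_decoded and decode_content are hypothetical helper names, not existing or proposed PPI methods.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helpers, not PPI API: they only show the difference between
# octets and decoded characters that new interfaces would rely on.

# Read a file through an encoding layer so the result is characters.
sub slurp_decoded {
    my ( $path, $encoding ) = @_;
    open my $fh, "<:encoding($encoding)", $path
        or die "Cannot open $path: $!";
    local $/;
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Decode an in-memory octet string explicitly.
sub decode_content {
    my ( $octets, $encoding ) = @_;
    return decode( $encoding, $octets );
}

# U+6771 U+4EAC, encoded as UTF-8 bytes and then decoded again.
my $octets = encode( 'UTF-8', "\x{6771}\x{4EAC}" );
my $chars  = decode_content( $octets, 'UTF-8' );

printf "octets: %d, characters: %d\n", length($octets), length($chars);   # octets: 6, characters: 2
```

A parser that receives the six octets sees six unrelated bytes; one that receives the decoded string sees two word characters.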

RT 37352: $$$a parsed $$ magic plus $a

https://rt.cpan.org/Ticket/Display.html?id=37352
https://rt.cpan.org/Ticket/Display.html?id=72679

ppidump '$$$a = 3;'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Magic        '$$'
[    1,   3,   3 ]     PPI::Token::Symbol       '$a'
[    1,   6,   6 ]     PPI::Token::Operator     '='
[    1,   8,   8 ]     PPI::Token::Number       '3'
[    1,   9,   9 ]     PPI::Token::Structure    ';'

whereas:

perl -WE 'my $a=\\"foo"; print $$$a;'
foo

PPI::Token::Prototype->prototype does not strip parens/whitespace as documented

=head2 prototype

The C<prototype> accessor returns the actual prototype pattern, stripped
of braces and any whitespace inside the pattern.

=cut

sub prototype {
        my $self  = shift;
        my $proto = $self->content;
        $proto =~ s/\(\)\s//g; # Strip brackets and whitespace
        $proto;
}

The documentation says the return value of prototype() has its parentheses and internal whitespace stripped, but the stripping never happens because of the malformed regex, which was probably intended to have the parens and \s in a character class. As it stands, the regex only matches the literal sequence "()" followed by a whitespace character, so prototype() will practically always return the same value as content().
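
A minimal sketch of the presumably intended behavior, assuming the goal is to delete every parenthesis and whitespace character. Both functions are hypothetical standalone helpers for comparison, not the PPI method itself.

```perl
use strict;
use warnings;

# Hypothetical standalone helper (not part of PPI): strips the parentheses
# and all whitespace from a prototype string such as '( $ $ ;@ )'.
sub strip_prototype {
    my ($content) = @_;
    ( my $proto = $content ) =~ s/[()\s]//g;    # character class
    return $proto;
}

# Same regex as in the quoted code: it only matches the literal sequence
# "()" followed by one whitespace character.
sub strip_prototype_original {
    my ($content) = @_;
    ( my $proto = $content ) =~ s/\(\)\s//g;
    return $proto;
}

print strip_prototype('( $ $ ;@ )'), "\n";            # prints $$;@
print strip_prototype_original('( $ $ ;@ )'), "\n";   # prints ( $ $ ;@ )
```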

x sometimes parsing as operator not word

PPI 1.215 is parsing some instances of 'x' as an operator rather than a word:

ppidump '1=>x'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Number       '1'
[    1,   2,   2 ]     PPI::Token::Operator     '=>'
[    1,   4,   4 ]     PPI::Token::Operator     'x'

ppidump '%hash=(1=>x)'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Symbol       '%hash'
[    1,   6,   6 ]     PPI::Token::Operator     '='
                        PPI::Structure::List    ( ... )
                          PPI::Statement::Expression
[    1,   8,   8 ]         PPI::Token::Number   '1'
[    1,   9,   9 ]         PPI::Token::Operator         '=>'
[    1,  11,  11 ]         PPI::Token::Operator         'x'

perl (5.8.8 and 5.18.1) parses the x as a word:

perl -We "my %hash=(1=>x);"
Unquoted string "x" may clash with future reserved word at -e line 1.

Curiously, xor gets different treatment from perl:

perl -WE "my %hash=(1=>xor);"
syntax error at -e line 1, near "xor)"
Execution of -e aborted due to compilation errors.

cpan testing script

This should be written as an author test script:

A script that requires a minicpan to be available, unpacks all distributions in it, and runs all perl files in it through PPI, throwing a fail when errors are encountered, both to find general bugs and to find cases that trigger code thought to be dead (#9). It could additionally also collect statistics of file size and file parsing time in order to find cases where PPI performs badly (#5).
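
A minimal sketch of the core loop, assuming the distributions are already unpacked under one directory. scan_tree and its result fields are hypothetical names; the parser is passed in as a callback, so the sketch itself does not depend on PPI.

```perl
use strict;
use warnings;
use File::Find ();
use Time::HiRes ();

# Hypothetical harness, not an existing PPI script: walk a directory of
# unpacked distributions, hand every Perl file to a caller-supplied parser
# callback, and record failures plus size and timing statistics.
sub scan_tree {
    my ( $root, $parse ) = @_;
    my @results;
    File::Find::find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path;
                return unless $path =~ /\.(?:pm|pl|t)\z/;
                my $start = Time::HiRes::time();
                # A true return value means success; an exception or a
                # false return value counts as a failure.
                my $ok = eval { $parse->($path) } ? 1 : 0;
                my $error = $ok ? '' : ( $@ || 'parser returned false' );
                push @results, {
                    path    => $path,
                    ok      => $ok,
                    error   => $error,
                    bytes   => ( -s $path ),
                    seconds => Time::HiRes::time() - $start,
                };
            },
        },
        $root,
    );
    return @results;
}
```

With PPI installed, the callback could be something like sub { PPI::Document->new( $_[0] ) }, and the slowest or failing entries could then be sorted out of the returned list.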

Should AUTOLOAD, DESTROY, et al. tokenize as PPI::Statement::Scheduled ?

In https://rt.cpan.org/Public/Bug/Display.html?id=27364 Jeffrey Thalhammer suggests that AUTOLOAD should yield a PPI::Statement::Scheduled, not a mere PPI::Statement::Sub.

DESTROY is similar to AUTOLOAD. They are both special methods called by Perl. They are subs even when, as Perl allows, "sub" is omitted. They are not quite as special as BEGIN, END, etc., which are code blocks and can be repeated. threads.pm calls the CLONE method (from 5.7.3) and the CLONE_SKIP method (from 5.8.7).

Is PPI::Statement::Scheduled reserved only for the five special blocks (BEGIN, UNITCHECK, CHECK, INIT, END) "intended to be run at a specific time during the loading process." as the documentation says, or should it apply to special functions Perl will call in general? If the latter, would the tie methods count? Anything else?

nonsensical code in Whitespace->__TOKENIZER__on_char

There is a bit there that seems to try to determine whether a character outside of the ASCII range is word or whitespace, however instead of actually looking at the current character it looks at the stringified tokenizer, which is just a perl address. I'm unclear on whether the tokenizer is supposed to stringify, or whether this was just a piece where the meaning of $t changed without the code adapting. Anything but the obvious change to chr($char) you'd like done here?

Many operators/builtins not separated from following single quote

PPI::Token::Word has code to make sure that some operators/builtins (eq ne ge le gt lt q qq qx qw qr m s tr y pack unpack) are separated from an immediately-following single quote because that's what perl does. E.g.: "$foo eq'bar'" is parsed by PPI and by perl as a symbol, the eq operator, and a single-quoted string. However, there are many words that perl separates that PPI does not, e.g. 'cmp':

ppidump "\$foo cmp'bar'"
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Symbol       '$foo'
[    1,   6,   6 ]     PPI::Token::Word         'cmp'bar'
[    1,  13,  13 ]     PPI::Token::Quote::Single        '''

There are dozens of words PPI::Token::Word doesn't handle in regen/keywords.pl of the perl sources. Presumably most (all?) of them should be handled.

See Perl-Critic/Perl-Critic#451 for a real-world example.

what is a structure without braces?

Currently in the hospital, so keeping this short.

Adding more tests I found a few places where code assumes that structure objects without braces (strictly: without ->start()) can exist. How would such an object come to be? I can't think of an initial parse that would result in such, nor does deleting the opening brace token do it.

Package names beginning with 'v' plus a digit parsed as version strings

PPI 1.215 and 1.216_01

ppidump 'package v10;'
                    PPI::Document
                      PPI::Statement::Package
[    1,   1,   1 ]     PPI::Token::Word         'package'
[    1,   9,   9 ]     PPI::Token::Number::Version      'v10'
[    1,  12,  12 ]     PPI::Token::Structure    ';'

perl -WE 'package v10; print __PACKAGE__'
v10

and

ppidump 'package v10g;'
                    PPI::Document
                      PPI::Statement::Package
[    1,   1,   1 ]     PPI::Token::Word         'package'
[    1,   9,   9 ]     PPI::Token::Number::Version      'v10'
[    1,  12,  12 ]     PPI::Token::Word         'g'
[    1,  13,  13 ]     PPI::Token::Structure    ';'

anon hashref after operator treated as code block

From Perl-Critic/Perl-Critic#192: a hash constructor parses as a code block.

ppidump '0 || {b => 1, a => 1};'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Number       '0'
[    1,   3,   3 ]     PPI::Token::Operator     '||'
                        PPI::Structure::Block   { ... }
                          PPI::Statement
[    1,   7,   7 ]         PPI::Token::Word     'b'
[    1,   9,   9 ]         PPI::Token::Operator         '=>'
[    1,  12,  12 ]         PPI::Token::Number   '1'
[    1,  13,  13 ]         PPI::Token::Operator         ','
[    1,  15,  15 ]         PPI::Token::Word     'a'
[    1,  17,  17 ]         PPI::Token::Operator         '=>'
[    1,  20,  20 ]         PPI::Token::Number   '1'
[    1,  22,  22 ]     PPI::Token::Structure    ';'

At least some other operators do not parse it as a block:

ppidump '0, {b => 1, a => 1};'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Number       '0'
[    1,   2,   2 ]     PPI::Token::Operator     ','
                        PPI::Structure::Constructor     { ... }
                          PPI::Statement::Expression
[    1,   5,   5 ]         PPI::Token::Word     'b'
[    1,   7,   7 ]         PPI::Token::Operator         '=>'
[    1,  10,  10 ]         PPI::Token::Number   '1'
[    1,  11,  11 ]         PPI::Token::Operator         ','
[    1,  13,  13 ]         PPI::Token::Word     'a'
[    1,  15,  15 ]         PPI::Token::Operator         '=>'
[    1,  18,  18 ]         PPI::Token::Number   '1'
[    1,  20,  20 ]     PPI::Token::Structure    ';'

most of the tests check whether parsing completed, but don't diag errors that resulted if parsing failed

extract generated test scripts?

I just realized that a whole bunch of test scripts that i need to change are generated from POD at runtime. Since having code in comments is a terrible idea, i'd like to extract them and put them into scripts permanently. Any particular opposition to this?

PPI::Statement::Sub methods need testing.

There is no testing of the PPI::Statement::Sub methods name, prototype, block, forward, and reserved. Verified by inserting a 'die;' as the first line of all these methods and getting a clean test run.

There is some incidental coverage of reserved, name, and block (merely because they're used) in https://github.com/moregan/PPI/tree/AUTOLOAD-DESTROY-without-sub

RT 36384: PPI won't parse source containing NUL

https://rt.cpan.org/Public/Bug/Display.html?id=36384

PPI 1.215/1.216_01:

perl -WE 'open( my $fh, ">", "contains_nul.pl"); print $fh "my \$a; \0 my \$b; print 1;";'

xxd -g1 contains_nul.pl
0000000: 6d 79 20 24 61 3b 20 00 20 6d 79 20 24 62 3b 20  my $a; . my $b;
0000010: 70 72 69 6e 74 20 31 3b                          print 1;

perl contains_nul.pl
1

ppidump contains_nul.pl
Could not parse code: Encountered unexpected character '0'

RT 30037: minus operator turns function name into two words

https://rt.cpan.org/Ticket/Display.html?id=30037
With PPI 1.215:

ppidump '$a=-xx::cc()'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Symbol       '$a'
[    1,   3,   3 ]     PPI::Token::Operator     '='
[    1,   4,   4 ]     PPI::Token::Word         '-xx'
[    1,   7,   7 ]     PPI::Token::Word         '::cc'
                        PPI::Structure::List    ( ... )

Without the minus you get what you'd expect:

ppidump '$a=xx::cc()'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Symbol       '$a'
[    1,   3,   3 ]     PPI::Token::Operator     '='
[    1,   4,   4 ]     PPI::Token::Word         'xx::cc'
                        PPI::Structure::List    ( ... )

See also https://rt.cpan.org/Public/Bug/Display.html?id=55749, which has a lot of analysis and sample Perl code

Misparse of &&= and ||=

Hi there-

I was smoking Perl::Critic with the latest PPI, and I came across this...

As of 1.216_01, PPI incorrectly parses the expression $foo ||= 0; as follows:

                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Symbol   '$foo'
[    1,   6,   6 ]     PPI::Token::Operator     '||'
[    1,   8,   8 ]     PPI::Token::Operator     '='
[    1,  10,  10 ]     PPI::Token::Number   '0'
[    1,  11,  11 ]     PPI::Token::Structure    ';'

Notice that || and = are parsed as two operators. Same goes for &&= but not //= or other types of assignment operators.
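
One common way to keep compound assignment operators whole is to try the longest candidate first. A minimal sketch under that assumption follows; the operator list and operator_tokens are illustrative only and are not PPI's tokenizer.

```perl
use strict;
use warnings;

# Hypothetical sketch, not PPI's tokenizer: pull operator tokens out of a
# string by always trying the longest known operator first.
my @OPERATORS = qw( ||= &&= //= || && // = );

# Sort longest first so '||=' wins over '||' followed by '='.
my $OPERATOR_RE = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @OPERATORS;

sub operator_tokens {
    my ($text) = @_;
    my @tokens = $text =~ /($OPERATOR_RE)/g;
    return @tokens;
}

print join( ' ', operator_tokens('$foo ||= 0; $bar &&= 1;') ), "\n";   # prints ||= &&=
```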

Caching ->isa()

Has anyone looked into caching ->isa() at the PPI::Element level?

The number of calls to ->isa() in a "make nytprof" run of Perl::Critic is staggering, and I have to think that at least some of those are redundant. Maybe it would be a win if each call to ->isa( 'whatever' ) would cache the result of that lookup.

I can go poking at this, but didn't want to waste my time if this was already considered and rejected.
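
For discussion, a minimal sketch of what such a cache could look like, assuming class hierarchies do not change after the first lookup. Toy::Element, Toy::Token and cached_isa are hypothetical names, and whether this beats the built-in isa() would need profiling.

```perl
use strict;
use warnings;

# Hypothetical sketch, not PPI code: memoize class-level isa() answers in a
# hash keyed by (object class, queried class).
package Toy::Element;

my %ISA_CACHE;

sub new { my $class = shift; return bless {}, $class }

sub cached_isa {
    my ( $self, $target ) = @_;
    my $class = ref($self) || $self;
    # Store 1 or 0 so that negative answers are cached as well.
    $ISA_CACHE{$class}{$target} //= ( $self->isa($target) ? 1 : 0 );
    return $ISA_CACHE{$class}{$target};
}

package Toy::Token;
our @ISA = ('Toy::Element');

package main;

my $token = Toy::Token->new;
print $token->cached_isa('Toy::Element') ? "yes\n" : "no\n";   # prints yes
print $token->cached_isa('Toy::Other')   ? "yes\n" : "no\n";   # prints no
```

The cache goes stale if @ISA is modified at runtime, so it would have to be invalidated or limited to classes that are known to be fixed.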

logic for detection of labeled statements in Lexer::_add_element needs to be tested

Right now it checks that $Parent->schild(1) is false, but then it goes on to request and use $second->content, which seems broken. Flipping the comparison doesn't cause any tests to fail so i assume this is untested.

Tried to dig through commit history, but it ends at "cvs import" before any code changes to that segment are made.

RT 67831: implicit statement end not recognized for perl 5.12-style package

https://rt.cpan.org/Public/Bug/Display.html?id=67831

PPI 1.215/1.216_01 do not recognize the implicit end of statement that follows the block in a Perl 5.12 package statement:

ppidump 'package Foo {} sub bar { 1; }'
                    PPI::Document
                      PPI::Statement::Package
[    1,   1,   1 ]     PPI::Token::Word         'package'
[    1,   9,   9 ]     PPI::Token::Word         'Foo'
                        PPI::Structure::Block   { ... }
[    1,  16,  16 ]     PPI::Token::Word         'sub'
[    1,  20,  20 ]     PPI::Token::Word         'bar'
                        PPI::Structure::Block   { ... }
                          PPI::Statement
[    1,  26,  26 ]         PPI::Token::Number   '1'
[    1,  27,  27 ]         PPI::Token::Structure        ';'

With an explicit statement terminator, it's fine:

ppidump 'package Foo {} ; sub bar { 1; }'
                    PPI::Document
                      PPI::Statement::Package
[    1,   1,   1 ]     PPI::Token::Word         'package'
[    1,   9,   9 ]     PPI::Token::Word         'Foo'
                        PPI::Structure::Block   { ... }
[    1,  16,  16 ]     PPI::Token::Structure    ';'
                      PPI::Statement::Sub
[    1,  18,  18 ]     PPI::Token::Word         'sub'
[    1,  22,  22 ]     PPI::Token::Word         'bar'
                        PPI::Structure::Block   { ... }
                          PPI::Statement
[    1,  28,  28 ]         PPI::Token::Number   '1'
[    1,  29,  29 ]         PPI::Token::Structure        ';'

The RT ticket includes a patch.

PPI::Token::Prototype::__TOKENIZER__on_char uses capture var in undefined state

sub __TOKENIZER__on_char {
        my $class = shift;
        my $t     = shift;

        # Suck in until we find the closing bracket (or the end of line)
        my $line = substr( $t->{line}, $t->{line_cursor} );
        if ( $line =~ /^(.*?(?:\)|$))/ ) {
                $t->{token}->{content} .= $1;
                $t->{line_cursor} += length $1;
        }

        # Shortcut if end of line
        return 0 unless $1 =~ /\)$/;

        # Found the closing bracket
        $t->_finalize_token->__TOKENIZER__on_char( $t );
}

If $line does not match the regex, there will nevertheless be a regex match against whatever contents $1 had when this function was called.

I haven't come up with a failing test yet.
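
The stale-capture behaviour can be reproduced outside PPI. The sketch below uses the same regex in two hypothetical stand-in functions (ends_with_paren_unsafe and ends_with_paren_safe are illustrative names, not PPI code). It assumes an input with an embedded newline, which is one way to make the pattern fail; whether the tokenizer can ever be handed such a line is not established here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the tokenizer method: same regex, same use of $1.
sub ends_with_paren_unsafe {
    my ($line) = @_;
    if ( $line =~ /^(.*?(?:\)|$))/ ) {
        # The match succeeded, so $1 is fresh inside this branch.
    }

    # If the match failed, $1 still holds whatever the caller left in it.
    return $1 =~ /\)$/ ? 1 : 0;
}

# Guarded variant: only consult the capture when the match succeeded.
sub ends_with_paren_safe {
    my ($line) = @_;
    return 0 unless $line =~ /^(.*?(?:\)|$))/;
    return $1 =~ /\)$/ ? 1 : 0;
}

# An embedded newline makes the pattern fail: "." cannot cross "\n",
# and "$" without /m only matches at the very end of the string.
my $input = "no paren\nsecond line\n";

'stale)' =~ /(.+)/;    # leaves $1 set to 'stale)' in the calling scope
my $unsafe = ends_with_paren_unsafe($input);
my $safe   = ends_with_paren_safe($input);
```

Here the unguarded version reports a closing bracket that is not in its input at all, because it is really testing the caller's leftover capture.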

RT 41170: When token before ":" in a ternary expression is a bareword, it's misparsed as a label

Still an issue in 1.215:

ppidump  '$foo = $condition ? undef : 1;'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Symbol       '$foo'
[    1,   6,   6 ]     PPI::Token::Operator     '='
[    1,   8,   8 ]     PPI::Token::Symbol       '$condition'
[    1,  19,  19 ]     PPI::Token::Operator     '?'
[    1,  21,  21 ]     PPI::Token::Label        'undef :'
[    1,  29,  29 ]     PPI::Token::Number       '1'
[    1,  30,  30 ]     PPI::Token::Structure    ';'

RT 75038: PPI::Token::Number::Version tokens cut off at first underscore

https://rt.cpan.org/Public/Bug/Display.html?id=75038

An example from the PPI::Token::Number::Version documentation (PPI 1.215):

ppidump '10_000.10_000.10_000'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Number::Float        '10_000.10_000'
[    1,  14,  14 ]     PPI::Token::Number::Float        '.10_000'

alternately:

ppidump 'v10_000.10_000.10_000'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Number::Version      'v10'
[    1,   4,   4 ]     PPI::Token::Word         '_000'
[    1,   8,   8 ]     PPI::Token::Number::Float        '.10_000'
[    1,  15,  15 ]     PPI::Token::Number::Float        '.10_000'

whereas

ppidump '10000.10000.10000'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Number::Version      '10000.10000.10000'

ppidump 'v10000.10000.10000'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Number::Version      'v10000.10000.10000'

size limit questions

Currently PPI has a hard-coded size limit on the files it is willing to parse. There are two questions here:

1. The comment on that code says that big files "blow up the Tokenizer/Lexer". What does this mean? Crashes, subtle errors, or too much resource use?
2. I'm thinking of making the tokenizer take an option for maximum size (in addition to an env var), but the tokenizer constructor does not yet have any code for options it can take. Is there a recommended example or method, or should I just go with what seems proper?
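
One possible shape for the second question, as a sketch only. It assumes trailing key/value options on the constructor and a lookup order of explicit option, then environment variable, then built-in default. The class name My::Tokenizer, the max_size option, the MY_TOKENIZER_MAX_SIZE variable, and the default value are all hypothetical and do not reflect PPI's actual interface or limit.

```perl
use strict;
use warnings;

package My::Tokenizer;    # hypothetical class, not PPI::Tokenizer

my $DEFAULT_MAX_SIZE = 1_000_000;    # placeholder default, in characters

sub new {
    my ( $class, $source, %opts ) = @_;

    # Precedence: explicit option, then environment variable, then default.
    my $max_size = $opts{max_size}
        // $ENV{MY_TOKENIZER_MAX_SIZE}
        // $DEFAULT_MAX_SIZE;

    die "Source is larger than the limit of $max_size characters\n"
        if length($source) > $max_size;

    return bless { source => $source, max_size => $max_size }, $class;
}

package main;

# eval traps the exception so both outcomes can be inspected.
my $ok  = eval { My::Tokenizer->new( 'print 1;', max_size => 100 ) };
my $big = eval { My::Tokenizer->new( 'x' x 200,  max_size => 100 ) };
my $big_error = $@;
```

Taking options as a trailing hash keeps existing one-argument calls working unchanged, which matters if the constructor is already used widely.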

RT 86553: hashref in function call parses as block not constructor

https://rt.cpan.org/Public/Bug/Display.html?id=86553

ppidump 'do_something({ %options });'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'do_something'
                        PPI::Structure::List    ( ... )
                          PPI::Statement::Compound
                            PPI::Structure::Block       { ... }
                              PPI::Statement
[    1,  16,  16 ]             PPI::Token::Symbol       '%options'
[    1,  27,  27 ]     PPI::Token::Structure    ';'

Happily, the normal Perl workaround makes it parse as expected:

ppidump 'do_something(+{ %options });'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'do_something'
                        PPI::Structure::List    ( ... )
                          PPI::Statement::Expression
[    1,  14,  14 ]         PPI::Token::Operator         '+'
                            PPI::Structure::Constructor         { ... }
                              PPI::Statement
[    1,  17,  17 ]             PPI::Token::Symbol       '%options'
[    1,  28,  28 ]     PPI::Token::Structure    ';'
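
For reference, this is the same disambiguation in plain Perl, independent of PPI. It is a small sketch with made-up data (the @records name and the hash keys are illustrative): a leading unary plus makes perl treat the brace as the start of an expression, so it compiles as an anonymous hash constructor and not as a block.

```perl
use strict;
use warnings;

# "map +{ ... }, LIST" is the EXPR form of map, so it needs the comma.
# Without the plus, perl has to guess whether "{" opens a block or a hash.
my @records = map +{ name => $_, size => length($_) }, qw(foo quux);

my $kind = ref $records[0];    # type of the first element
print "$kind\n";
```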

Please release the current master as 1.216_01

@adamkennedy There's more to do, but we've got a sizable number of changes that we'd like to see chewed through by the CPAN smokers. Can you please release the current master as dev version 1.216_01?

Alternatively, if you feel like handing out COMAINT, I'd happily do it myself too. :)

RT 74527: sub v2 {} parsed as a version string

ppidump 'sub v2 {1;}'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'sub'
[    1,   5,   5 ]     PPI::Token::Number::Version      'v2'
                        PPI::Structure::Block   { ... }
                          PPI::Statement
[    1,   9,   9 ]         PPI::Token::Number   '1'
[    1,  10,  10 ]         PPI::Token::Structure        ';'

even if the sub name only starts off looking like a version string:

ppidump 'sub v2go {1;}'
                    PPI::Document
                      PPI::Statement
[    1,   1,   1 ]     PPI::Token::Word         'sub'
[    1,   5,   5 ]     PPI::Token::Number::Version      'v2'
[    1,   7,   7 ]     PPI::Token::Word         'go'
                        PPI::Structure::Block   { ... }
                          PPI::Statement
[    1,  11,  11 ]         PPI::Token::Number   '1'
[    1,  12,  12 ]         PPI::Token::Structure        ';'

After this, PPI sticks everything up to the next explicit statement separator into the sub's statement, a la #31.

The RT ticket includes an idea of where to fix the problem: https://rt.cpan.org/Public/Bug/Display.html?id=74527

RFC: ellipsis "..." statement parses as operator. What types would be better?

Perl 5.12 introduced the ellipsis statement, "...". Currently "..." always parses as a PPI::Token::Operator. perl5120delta.pod refers to it as an operator, but perlsyn is pretty clear that it's really a statement, making the use of Operator wrong.

It's not too bad that the ellipsis becomes a child of a simple PPI::Statement, but, given the fact that it throws, would it be more appropriate to have it be a child of PPI::Statement::Break? A new statement type altogether?
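
To illustrate the "it throws" point in plain Perl (a sketch that does not involve PPI; the sub name not_done_yet is made up): the ellipsis compiles without complaint and only raises an exception when it is actually executed.

```perl
use 5.012;    # the ellipsis statement needs Perl 5.12 or later
use strict;
use warnings;

sub not_done_yet { ... }    # compiles fine; dies only when executed

# Trap the exception so the script can inspect it.
my $survived = eval { not_done_yet(); 1 };
my $error    = $@;

print "died with: $error" unless $survived;
```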

The existing token types don't seem to fit the ellipsis. Should there be a PPI::Token::Ellipsis (subclass of PPI::Token)?

comments?
