astutesource / chasten

💫 Chasten Uses XML and XPATH to Check a Python Program's AST for Specified Patterns!

Home Page: https://pypi.org/project/chasten/

License: GNU General Public License v2.0

Python 99.41% Dockerfile 0.59%
python python-ast static-analysis xml

chasten's Introduction

💫 chasten

🎉 Introduction

  • Chasten is a Python program that uses XPath expressions to find patterns in the abstract syntax tree (AST) of a Python program. You can use Chasten to quickly implement your own configurable linting rules, without having to use a complex AST analysis framework or resorting to imprecise regular expressions.

  • Do you want to ensure that a Python program does not have any triple-nested for loops inside of async functions? Or, do you want to confirm that every function inside your Python program has type annotations and a docstring comment? Chasten can help! It allows you to express these checks, and many other types of analyses as well, in simple YAML files that contain XPath expressions.
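
To make the idea of running XPath over an AST concrete, here is a minimal sketch that uses only Python's standard library. It is not how chasten is implemented, since chasten relies on Pyastgrep and full XPath expressions. The to_xml helper and its simplified XML layout are assumptions made for illustration, and ElementTree supports only a small subset of XPath.

```python
import ast
import xml.etree.ElementTree as ET


def to_xml(node: ast.AST) -> ET.Element:
    """Convert an AST node into a simplified XML element (illustrative layout)."""
    element = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            # a single child node is wrapped in an element named after the field
            ET.SubElement(element, field).append(to_xml(value))
        elif isinstance(value, list):
            # a list of child nodes (e.g., a body) shares one wrapper element
            wrapper = ET.SubElement(element, field)
            for item in value:
                if isinstance(item, ast.AST):
                    wrapper.append(to_xml(item))
        elif isinstance(value, str):
            # plain strings such as a function's name become attributes
            element.set(field, value)
    return element


def has_triple_nested_for_in_async(source: str) -> bool:
    """Report whether any async function contains three nested for loops."""
    root = to_xml(ast.parse(source))
    # '//' selects descendants at any depth in ElementTree's XPath subset
    return bool(root.findall(".//AsyncFunctionDef//For//For//For"))


SOURCE = '''
async def crunch(grid):
    for plane in grid:
        for row in plane:
            for cell in row:
                print(cell)
'''

print(has_triple_nested_for_in_async(SOURCE))
```

The same question in chasten would be written as an XPath expression in a checks file rather than as Python code, which is the point of the tool.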

😂 Definitions

  • chasten (transitive verb) "to make someone aware of failure or of having done something wrong", Cambridge Dictionary.

    • Example Sentence: "Her remarks are a gift to me even as they chasten and redirect my efforts to expand the arguments of this book into a larger one.", Cambridge English Corpus
  • chasten (uncountable or singular noun) "a tool that analyzes the abstract syntax tree of a Python program to detect potential sources of programmer mistakes so as to prevent program failure", AstuteSource Developers.

    • Student Sentence: "I'm glad that chasten reminded me to add docstrings and type annotations to all of the functions in main.py. It was easy to see what to fix!"
    • Instructor Sentence: "chasten makes it easy for me to reliably confirm that student programs have the required coding constructs. It's much better than using regular expressions!"
    • Developer Sentence: "Since I was already familiar with XPath expressions, chasten made it fun and easy for me to do an automated analysis of a Python codebase that I maintain."
    • Researcher Sentence: "In addition to helping me quickly scan the source code of Python projects, chasten's analysis dashboard lets me effectively explore the data I collect."

🔋 Features

  • ✨ Easy-to-configure, automated analysis of a Python program's abstract syntax tree
  • 📃 Flexible and easy-to-use YAML-based configuration file for describing analyses and checks
  • 🪂 Automated generation and verification of the YAML configuration files for an analysis
  • 🚀 Configurable saving of analysis results in the JSON, CSV, or SQLite formats
  • 🚧 Automated integration of result files that arise from multiple runs of the tool
  • 🌄 Interactive results analysis through the use of a locally running datasette server
  • 🌎 Automated deployment of a datasette server on platforms like Fly or Vercel
  • 🦚 Detailed console and syslog logging to furnish insights into the tool's behavior
  • 💠 Rich command-line interface with robust verification of arguments and options
  • 🤯 Interactive command-line generation through an easy-to-use terminal user interface

⚡️ Requirements

  • Python 3.11
  • Chasten leverages numerous Python packages, including notable ones such as:
    • Datasette: Interactive data analysis dashboards
    • Pyastgrep: XPath-based analysis of a Python program's AST
    • Pydantic: Automated generation and validation of configuration files
    • Rich: Full-featured formatting and display of text in the terminal
    • Trogon: Automated generation of terminal user interfaces for a command-line tool
    • Typer: Easy-to-implement and fun-to-use command-line interfaces
  • The developers of Chasten use Poetry for packaging and dependency management

🔽 Installation

Follow these steps to install the chasten program:

  • Install Python 3.11 for your operating system
  • Install pipx to support program installation in isolated environments
  • Type pipx install chasten to install Chasten
  • Type pipx list and confirm that Chasten is installed
  • Type chasten --help to learn how to use the tool

🪂 Configuration

You can configure chasten with two YAML files, normally called config.yml and checks.yml. Although chasten can generate a starting configuration, you can check out the 📦 AstuteSource/chasten-configuration repository for examples of configuration files that set up the tool. Although the config.yml file can reference multiple check configuration files, this example shows how to specify a single checks.yml file:

# chasten configuration
chasten:
  # point to a single checks file
  checks-file:
    - checks.yml

The checks.yml file must contain one or more checks. What follows is an example of a check configuration file with two checks that respectively find the first executable line of non-test and test-case functions in a Python project. Note that the pattern attribute specifies the XPath version 2.0 expression that chasten will use to detect the specified type of Python function. You can type chasten configure validate --config <path to chasten-configuration/ directory | config url> after filling in <path to chasten-configuration/ directory | config url> with the fully-qualified name of your configuration directory and the tool will confirm that your configuration meets the tool's specification. You can also use the chasten configure create command to automatically generate a starting configuration! Typing chasten configure --help will explain how to configure the tool.

checks:
  - name: "all-non-test-function-definition"
    code: "FUNC"
    id: "FUNC001"
    description: "First executable line of a non-test function, skipping over docstrings and/or comments"
    pattern: '//FunctionDef[not(contains(@name, "test_"))]/body/Expr[value/Constant]/following-sibling::*[1] | //FunctionDef[not(contains(@name, "test_"))]/body[not(Expr/value/Constant)]/*[1]'
  - name: "all-test-function-definition"
    code: "FUNC"
    id: "FUNC002"
    description: "First executable line of a test function, skipping over docstrings and/or comments"
    pattern: '//FunctionDef[starts-with(@name, "test_")]/body/Expr[value/Constant]/following-sibling::*[1] | //AsyncFunctionDef[starts-with(@name, "test_")]/body/Expr[value/Constant]/following-sibling::*[1] | //FunctionDef[starts-with(@name, "test_")]/body[not(Expr/value/Constant)]/*[1] | //AsyncFunctionDef[starts-with(@name, "test_")]/body[not(Expr/value/Constant)]/*[1]'
    count:
      min: 1
      max: 10

✨ Analysis

Since chasten needs a project with Python source code as the input to its analysis sub-command, you can clone the 📦 AstuteSource/lazytracker and 📦 AstuteSource/multicounter repositories that are forks of existing Python projects created for convenient analysis. To incrementally analyze these two projects with chasten, you can type the following commands to produce a results JSON file for each project:

  • After creating a subject-data/ directory that contains a lazytracker/ directory, you can run the chasten analyze command for the lazytracker program:
chasten analyze lazytracker \
        --config <path to the chasten-configuration/ directory | config url> \
        --search-path <path to the lazytracker/ directory> \
        --save-directory <path to the subject-data/lazytracker/ directory> \
        --save
  • Now you can scan the output to confirm that, for instance, chasten finds 6 test functions in the lazytracker project. If you look in the subject-data/lazytracker directory you will find a JSON file with a name like chasten-results-lazytracker-20230823162341-4c23fc443a6b4c4aa09886f1ecb96e9f.json. Running chasten on this program more than once will produce a new results file with a different timestamp (i.e., 20230823162341) and unique identifier (i.e., 4c23fc443a6b4c4aa09886f1ecb96e9f) in its name, thus ensuring that you do not accidentally write over your prior results when using --save.

  • After creating a multicounter/ directory in the existing subject-data/ directory, you can run the chasten analyze command for the multicounter program:

chasten analyze multicounter \
        --config <path to the chasten-configuration/ directory | config url> \
        --search-path <path to the multicounter/ directory> \
        --save-directory <path to the subject-data/multicounter/ directory> \
        --save
  • Now you can scan the output to confirm that, as an example, chasten finds 10 test functions in the multicounter project. If you look in the subject-data/multicounter directory you will find a JSON file with a name like chasten-results-multicounter-20230821171712-5c52f2f1b61b4cce97624cc34cb39d4f.json and name components that are similar to the JSON file created for the lazytracker program.

  • Since the all-test-function-definition check specifies that the program must have between 1 and 10 tests you will notice that this check passes for both lazytracker and multicounter. This means that chasten returns a 0 error code to communicate to your operating system that the check passed.

  • You can learn more about how to use the analyze sub-command by typing chasten analyze --help. For instance, chasten supports the --check-include and --check-exclude options that allow you to respectively include and exclude specific checks according to fuzzy matching rules that you can specify for any of a check's attributes specified in the checks.yml file.
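
The pass/fail logic of a count constraint, as described in the bullets above, can be sketched as follows. The function names are hypothetical and are not chasten's API. The source only states that chasten returns 0 when a check passes, so the use of 1 for a failure is an assumption.

```python
from typing import List, Optional


def check_passes(
    match_count: int,
    minimum: Optional[int] = None,
    maximum: Optional[int] = None,
) -> bool:
    """Decide whether a number of matches satisfies optional min and max bounds."""
    if minimum is not None and match_count < minimum:
        return False
    if maximum is not None and match_count > maximum:
        return False
    return True


def exit_code(outcomes: List[bool]) -> int:
    """Return 0 when every check passed and 1 otherwise (1 is an assumption)."""
    return 0 if all(outcomes) else 1


# counts reported in this document: 6 tests for lazytracker, 10 for multicounter
outcomes = [check_passes(6, minimum=1, maximum=10), check_passes(10, minimum=1, maximum=10)]
print(outcomes, exit_code(outcomes))
```

Both bounds are inclusive in this sketch, which is consistent with multicounter's 10 matches passing a check whose max is 10.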

🚧 Integration

After running chasten on the lazytracker and multicounter programs you can integrate their individual JSON files into a single JSON file, related CSV files, and a SQLite database. Once you have made an integrated-data/ directory, you can type this command to perform the integration:

chasten integrate all-programs \
        <path to subject-data>/**/*.json \
        --save-directory <path to the integrated-data/ directory>

This command will produce a directory like chasten-flattened-csvs-sqlite-db-all-programs-20230823171016-2061b524276b4299b04359ba30452923/ that contains a SQLite database called chasten.db and a csv/ directory with CSV files that correspond to each of the tables inside of the database.

You can learn more about the integrate sub-command by typing chasten integrate --help.
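
If you want to inspect the integrated chasten.db without starting a server, a short sketch with Python's built-in sqlite3 module can list its tables. The helper name list_tables is hypothetical, and the sketch deliberately makes no assumption about which tables chasten creates; it only reads SQLite's own sqlite_master catalog.

```python
import sqlite3
from pathlib import Path


def list_tables(database: Path) -> list[str]:
    """Return the names of the tables stored in a SQLite database file."""
    connection = sqlite3.connect(database)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        # always release the file handle, even if the query fails
        connection.close()
    return [name for (name,) in rows]
```

Call list_tables with the path to the chasten.db file inside the directory that chasten integrate produced. Note that sqlite3.connect creates an empty database when the file does not exist, so an empty list can also mean that the path was wrong.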

💠 Verbose Output

When using the chasten command, appending the --verbose flag can make troubleshooting easier and provide a detailed understanding of the tool's behavior. Here is an example with chasten analyze lazytracker:

chasten analyze lazytracker \
        --config <path to the chasten-configuration/ directory> \
        --search-path <path to the lazytracker/ directory> \
        --save-directory <path to the subject-data/lazytracker/ directory> \
        --save \
        --verbose

Upon executing this command, you can expect the output to contain informative messages such as ✨ Matching source code: indicating that the tool is actively comparing the source code against the specified patterns. Additionally, you will receive detailed match results, providing insights into the identified checks.

🌄 Results

If you want to create an interactive analysis dashboard that uses 📦 simonw/datasette you can run chasten datasette-serve <path containing integrated results>/chasten.db --port 8001. Now you can use the dashboard in your web browser to analyze the results while you study the source code for these projects with your editor! Examining the results will reveal that chasten, through its use of 📦 spookylukey/pyastgrep, correctly uses the XPath expression for all-test-function-definition to find the first line of executable source code inside of each test, skipping over a function's docstring and leading comments.

For the lazytracker program you will notice that chasten reports that there are 6 test cases even though pytest only finds and runs 5 tests. This is because the tests/test_tracked.py test suite in lazytracker contains a function starting with test_ inside of another function starting with test_. This example illustrates the limitations of static analysis with chasten! Even though the tool correctly detected all of the "test functions", the nesting of the functions in the test suite means that pytest will run the outer test_ function and use the inner test_ function for testing purposes.

With that said, chasten correctly finds each of the tests for the multicounter project. You can follow each of the previous steps in this document to apply chasten to your own Python program!

🌎 Deployment

If you want to make your chasten.db publicly available for everyone to study, you can use the chasten datasette-publish sub-command. As long as you have followed the installation instructions for 📦 simonw/datasette-publish-fly and 📦 simonw/datasette-publish-vercel, you can use the plugins to deploy a public datasette server that hosts your chasten.db. For instance, running the command chasten datasette-publish <path containing integrated results>/chasten.db --platform vercel will publish the results from running chasten on lazytracker and multicounter to the Vercel platform.

Importantly, the use of the chasten datasette-publish command with the --platform vercel option requires you to have previously followed the instructions for the datasette-publish-vercel plugin to install the vercel command-line tool. This is necessary because, although datasette-publish-vercel is one of chasten's dependencies, neither chasten nor datasette-publish-vercel provides the vercel tool even though they use it. You must take similar steps before publishing your database to Fly!

🤯 Interaction

Even though chasten is a command-line application, you can interactively create the tool's command-line arguments and options through a terminal user interface (TUI). To use this TUI-based way of creating a complete command line for chasten you can type the command chasten interact.

📊 Log

Chasten has a built-in system log that lets you see the events and messages that chasten produces, which helps when finding bugs and the events that led up to them. To use it, open a separate terminal and run the command chasten log. Then add the --debug-level <choice of level> and --debug-dest SYSLOG options to each chasten command whose messages you want to send to the system log.

For example, chasten datasette-serve --debug-level DEBUG --debug-dest SYSLOG <database path to file> will produce the following output in the system log.

💫 chasten: Analyze the AST of Python Source Code
🔗 GitHub: https://github.com/gkapfham/chasten
✨ Syslog server for receiving debugging information

Display verbose output? False
Debug level? DEBUG
Debug destination? SYSLOG

Every chasten command accepts a --debug-level option with five possible values: DEBUG, INFO, WARNING, ERROR, and CRITICAL. Each level shows different messages in the system log, with DEBUG being the lowest severity and CRITICAL the highest. For more details you can refer to the debug.py file:

from enum import Enum


class DebugLevel(str, Enum):
    """The predefined levels for debugging."""

    DEBUG = "DEBUG"
    INFO = "INFO"
    WARNING = "WARNING"
    ERROR = "ERROR"
    CRITICAL = "CRITICAL"
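
Because the five names match the level constants in Python's standard logging module, their ordering can be illustrated with a short sketch. Whether chasten maps its levels onto logging in exactly this way is an assumption, and the helper names below are hypothetical.

```python
import logging

# the same five names that the DebugLevel enumeration defines
LEVEL_NAMES = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def to_logging_level(name: str) -> int:
    """Translate a debug level name into the numeric level used by logging."""
    if name not in LEVEL_NAMES:
        raise ValueError(f"unknown debug level: {name}")
    return getattr(logging, name)


def is_shown(message_level: str, threshold: str) -> bool:
    """A message is shown when its level is at or above the chosen threshold."""
    return to_logging_level(message_level) >= to_logging_level(threshold)


for name in LEVEL_NAMES:
    print(name, to_logging_level(name), is_shown(name, "ERROR"))
```

With a threshold of ERROR, only ERROR and CRITICAL messages pass the filter, while a threshold of DEBUG lets every message through.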

✨ chasten --help

 Usage: chasten [OPTIONS] COMMAND [ARGS]...

╭─ Options ───────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.                     │
│ --show-completion             Show completion for the current shell, to copy it or          │
│                               customize the installation.                                   │
│ --help                        Show this message and exit.                                   │
╰─────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ──────────────────────────────────────────────────────────────────────────────────╮
│ analyze                      💫 Analyze the AST of Python source code.                      │
│ configure                    🪂 Manage chasten's configuration.                             │
│ datasette-publish            🌎 Publish a datasette to Fly or Vercel.                       │
│ datasette-serve              🏃 Start a local datasette server.                             │
│ integrate                    🚧 Integrate files and make a database.                        │
│ interact                     🚀 Interactively configure and run.                            │
│ log                          🦚 Start the logging server.                                   │
╰─────────────────────────────────────────────────────────────────────────────────────────────╯

🧑‍💻 Development Environment

🏠 Local

Follow these steps to install the chasten tool for future development:

  • The development and use of Chasten requires Python 3.11, specifically version 3.11.5 or later.
  • The developers of Chasten use Poetry for packaging and dependency management.

Once Python and Poetry are installed, go to the Chasten repository on GitHub and clone it with the git clone command in your terminal. Then navigate to the chasten directory and run the command poetry install to install all of the dependencies.

🐋 Docker

You can also use Docker to run chasten.

Follow these steps to utilize Docker:

  • Install Docker Desktop for your operating system
  • Ensure Docker Desktop is running
  • cd into the chasten directory where the Dockerfile is located
  • Type docker build -t chasten . to build the container
  • Type one of the following commands to run the container:
    • Windows (Command Prompt) -> docker run --rm -v "%cd%":/root/src -it chasten
    • Windows (Powershell) -> docker run --rm -v ${pwd}:/root/src -it chasten
    • Mac/Ubuntu -> docker run --rm -v $(pwd):/root/src -it chasten
  • Inside the container type poetry install
  • Outside of the container type docker ps to view running container information
  • Outside of the container type docker commit <your-container-id> <your-image-name> to save the dependency installation
  • Now you can use Docker for all of your chasten needs!

📋 Development Tasks

  • Linting and Formatting
    • We use the linting tools Black and Ruff on Chasten to ensure code consistency, readability, and adherence to predefined formatting standards across the entire project, ultimately enhancing maintainability and collaboration among developers.
    • Please ensure that all content in the project follows the appropriate format by running the following commands: poetry run task fiximports and/or poetry run task fixformat before shipping new features. If features are shipped with linting issues, the build will break on GitHub due to the failure of the test suite.
  • Testing and Coverage
    • Chasten uses the testing tools Pytest and Hypothesis, which enable us to fortify code consistency, readability, and alignment with established formatting standards throughout the project. When writing test cases for features, create a new file in the tests directory with the naming convention test_(name of file).
    • Please ensure that all content in the project passes the tests by running the following commands: poetry run task test for most cases or, if you would like to test the OpenAI API based features, poetry run task test-api before shipping. If features are shipped without a test suite, the coverage will be lowered on GitHub due to the addition of untested code, which may potentially lead to larger issues in the future.

🤗 Learning

  • Curious about the nodes that are available in a Python program's AST?
  • Want to learn more about how to write XPath expressions for a Python AST?
    • Pyastgrep offers examples of XPath expressions for querying a Python program's AST
    • XPath Documentation describes how to write XPath expressions
    • XPath Axes summarizes the ways that XPath axes relate a node to other nodes
  • Interested in exploring other approaches to querying source code?
    • srcML supports XPath-based querying of programs implemented in C, C#, C++, and Java
    • Treesitter provides a general-purpose approach to modelling and querying source code
    • Python Treesitter offers Python language bindings for parsing and querying with Treesitter
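
One quick way to explore the nodes in a Python program's AST is the standard ast module itself. The sketch below prints the tree for a one-line program and lists the node types that appear in it; these node type names are the names that XPath expressions for chasten refer to. The helper name node_type_names is hypothetical.

```python
import ast


def node_type_names(source: str) -> list[str]:
    """Return the sorted, de-duplicated names of all AST node types in the source."""
    return sorted({type(node).__name__ for node in ast.walk(ast.parse(source))})


# show the full tree, then the node types it contains
print(ast.dump(ast.parse("x = 1"), indent=2))
print(node_type_names("x = 1"))
```

Trying this on a small fragment of your own code is a practical way to discover which node names, such as FunctionDef or For, belong in a pattern.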

🤓 Chasten vs. Symbex

Chasten and Symbex, which was created by Simon Willison, are both tools designed for analyzing Python source code, particularly focusing on searching for functions and classes within files. While they share a common goal, there are notable differences between the two, especially in terms of their command-line interfaces and functionality.

In terms of Command-Line Interface, Symbex employs a concise CLI, utilizing abbreviations for various options. For instance, the command to search for function signatures in a file named test_debug.py is as follows:

command: symbex -s -f symbex/test_debug.py
    def test_debug_level_values():
    def test_debug_level_isinstance():
    def test_debug_level_iteration():
    def test_debug_destination_values():
    def test_debug_destination_isinstance():
    def test_debug_destination_iteration():
    def test_level_destination_invalid():
    def test_debug_destination_invalid():

Chasten, on the other hand, leverages Python packages such as Typer and Rich to provide a user-friendly and feature-rich command-line interface. The available commands for Chasten include:

  • analyze 💫 Analyze the AST of Python source code
  • configure 🪂 Manage chasten's configuration
  • datasette-publish 🌎 Publish a datasette to Fly or Vercel
  • datasette-serve 🏃 Start a local datasette server
  • integrate 🚧 Integrate files and make a database
  • interact 🚀 Interactively configure and run
  • log 🦚 Start the logging server

In terms of functionality, Symbex is designed to search Python code for functions and classes by name or wildcard. It provides the ability to filter results based on various criteria, including function type (async or non-async), documentation presence, visibility, and type annotations.

On the other hand, Chasten's analyze command performs AST analysis on Python source code. It allows users to specify a project name, XPath version, search path, and various filtering criteria. Chasten supports checks for inclusion and exclusion based on attributes, values, and match confidence levels. The tool also provides extensive configuration options and the ability to save results in different formats, including Markdown.

In summary, while both Chasten and Symbex serve the common purpose of analyzing Python source code, Chasten offers a more versatile and user-friendly CLI with additional features of configuration and result management. Symbex, on the other hand, adopts a concise CLI with a focus on searching and filtering functionalities. The choice between the two tools depends on the user's preferences and specific requirements for Python code analysis.

📦 Similar Tools

In addition to Chasten and Symbex, several other tools offer unique capabilities for analyzing and searching through Python source code, each catering to specific use cases.

  • pyastgrep is a tool developed by Luke Plant that provides advanced capabilities for viewing and searching AST using XPath expressions. It allows users to define complex patterns and queries to navigate and extract information from Python code, making it a powerful tool for in-depth code analysis.
  • treesitter offers a generic and efficient approach to parsing source code and building AST. It supports multiple languages, providing a consistent API for interacting with parsed code across different language ecosystems.

🧗 Improvement

chasten's Issues

Create a Markdown Report (Or Some Other "Browse-able" Report) for an Analysis

While chasten analyze provides a significant amount of output, especially when it is run in verbose mode, it does not have a way to save an analysis report in any format except for JSON. It would be nice if the tool could also produce a "browse-able" report in Markdown format or, alternatively, offer a terminal user-interface (TUI) mode that supports the browsing of results. Ultimately, the tool needs some human-friendly way to browse the results! With that said, it is important to note that there is one way to browse results right now: the Datasette interface that you can run after using the chasten integrate and chasten datasette-serve or chasten datasette-publish commands. However, these seem somewhat "heavyweight" and I anticipate that it would still be great if there were a way to browse one or more result files in a terminal window, either by using rich-cli on a Markdown file or through a bespoke TUI.

Document the Use of Console Logging Options through the Tool's Command-Line

The current version of the README.md file does not explain the various ways in which it is possible to configure either console-based or syslog-based logging. Here is an example of a command-line that turns on and configures logging:

chasten analyze diagrams --config /home/gkapfham/working/source/astute-subjects/chasten-configuration --search-path /home/gkapfham/working/source/astute-subjects/subject-forks/diagrams/ --save-directory /home/gkapfham/working/source/astute-subjects/subject-data/diagrams --debug-level ERROR --debug-dest SYSLOG --save

The README.md file needs more details about logging and how it works. It should also explain the various logging levels and give examples of the types of logging information that are displayed.

Use `Path` infix operator `/` in unit tests

Describe the bug

After a simple regex search, I discovered that the following tests assume a POSIX operating system in the way their file paths are written:

test_constants.py
47:        fs.Current_Directory = "/new/path"

test_main.py
94:    configuration_directory = test_one + "/.chasten"
125:    configuration_directory = str(cwd) + "/.chasten"
147:    configuration_directory = str(cwd) + "/.chasten"
300:    configuration_directory = str(cwd) + "/.chasten"

test_filesystem.py
15:    directory_str = "./tests/"
23:    directory_str = "./testsNOT/"
31:    file_str = "./tests/test_filesystem.py"
39:    file_str = "./tests/test_filesystemNOT.py"

This should be fixed by joining paths with Pathlib. Otherwise, it could hinder a teammate's progress when developing on a Windows machine, for example.

Example fix

test_constants.py
+ from pathlib import Path
...
- 47:        fs.Current_Directory = "/new/path"
+ 47:        fs.Current_Directory = str(Path("/new") / Path("path"))
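
As a slightly fuller illustration of the same idea, the sketch below builds a configuration directory with the / operator from pathlib. The function name configuration_directory is hypothetical. PurePosixPath and PureWindowsPath are used only so that the difference between platforms is visible on any operating system.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


def configuration_directory(base: Path) -> Path:
    """Join a base directory and '.chasten' without hard-coding a separator."""
    return base / ".chasten"


# the same join renders with the separator of each platform flavor
print(PurePosixPath("/new") / "path")
print(PureWindowsPath("C:/new") / "path")

# in a test, Path picks the flavor that matches the current operating system
print(configuration_directory(Path.cwd()))
```

Comparing Path objects in assertions, rather than strings with embedded slashes, lets the same test pass on both POSIX and Windows machines.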

Create a Wiki or an External Web Site that Provides Detailed Tool Use Documentation

While the current README.md file for chasten provides a basic overview of the commands and how they work, it does not include a significant number of examples. This means that it is important to create either a wiki inside of this repository or an external web site, ideally using Material for Mkdocs or Quarto, that will provide additional documentation for the tool!

Installing Chasten for MacOS

We could write in the issues that there are several ways to install chasten. Anyone is welcome to contribute more ways.

pip failed to build package:
    pysqlite3

Some possibly relevant errors from pip install:
    error: subprocess-exited-with-error
    src/connection.c:1385:10: error: implicit declaration of function 'sqlite3_enable_load_extension' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
    src/connection.c:1409:10: error: implicit declaration of function 'sqlite3_load_extension' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
    error: command '/usr/bin/clang' failed with exit code 1

or another error looks like:

failed to building wheels for pysqlite3

Chasten May Undercount or Overcount Specific Patterns, Like the Number of Test Cases

There are certain situations in which chasten over-counts the presence of
specific source code patterns. For instance, when the tool uses the following
XPath expression to count the number of test cases:

checks:
  - name: "all-non-test-function-definition"
    code: "FUNC"
    id: "FUNC001"
    description: "First executable line of a non-test function, skipping over docstrings and/or comments"
    pattern: '//FunctionDef[not(contains(@name, "test_"))]/body/Expr[value/Constant]/following-sibling::*[1] | //FunctionDef[not(contains(@name, "test_"))]/body[not(Expr/value/Constant)]/*[1]'
  - name: "all-test-function-definition"
    code: "FUNC"
    id: "FUNC002"
    description: "First executable line of a test function, skipping over docstrings and/or comments"
    pattern: '//FunctionDef[starts-with(@name, "test_")]/body/Expr[value/Constant]/following-sibling::*[1] | //AsyncFunctionDef[starts-with(@name, "test_")]/body/Expr[value/Constant]/following-sibling::*[1] | //FunctionDef[starts-with(@name, "test_")]/body[not(Expr/value/Constant)]/*[1] | //AsyncFunctionDef[starts-with(@name, "test_")]/body[not(Expr/value/Constant)]/*[1]'
    count:
      min: 1
      max: 10

And a program has source code that looks like this:

def test_cached():
    with TemporaryDirectory() as cache_dir:
        updated = False

        @cached(
            cache_dir=cache_dir,
            input_dirs=["input_dir"],
            output_dirs=["output_dir"]
        )
        def test_function(input_dir: str, output_dir: str, parameter: int):
            nonlocal updated

            with open(f"{output_dir}/test.txt", 'w') as f:
                f.write(str(parameter))

            updated = True
            return parameter

        with TemporaryDirectory() as input_dir:
            with open(f"{input_dir}/test.txt", 'w') as f:
                f.write("test_file")

            with TemporaryDirectory() as output_dir:
                assert test_function(
                    input_dir=input_dir, 
                    output_dir=output_dir, 
                    parameter=3
                ) == 3
                assert updated == True

                # Don't change anything
                updated = False
                assert test_function(
                    input_dir=input_dir, 
                    output_dir=output_dir, 
                    parameter=3
                ) == 3
                assert updated == False

                # Change parameter
                assert test_function(
                    input_dir=input_dir, 
                    output_dir=output_dir, 
                    parameter=5
                ) == 5
                assert updated == True

                # Change input dependency
                updated = False
                with open(f"{input_dir}/test.txt", 'w') as f:
                    f.write("changed_test_file")

                assert test_function(
                    input_dir=input_dir, 
                    output_dir=output_dir, 
                    parameter=5
                ) == 5
                assert updated == True

                # Corrupt output
                with open(f"{output_dir}/test.txt", 'w') as f:
                    f.write("corrupted output")

                assert test_function(
                    input_dir=input_dir, 
                    output_dir=output_dir, 
                    parameter=5
                ) == 5
                assert updated == True

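it will return a count of 2 test cases even though, from the perspective of
pytest, there is only one executable test case! The over-count happens because
//FunctionDef matches function definitions at any depth, so the nested helper
test_function is counted along with test_cached. Are there any intelligent ways
in which we can improve the XPath expressions or detect when the tool is likely
to over-count or under-count the presence of specific types of AST nodes?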

Integrate the Tool into the Workflow for at Least 5 Complete Undergraduate Projects

Once there is evidence that the chasten tool works correctly on at least 20 small
projects (see the 12 forked repositories in the AstuteSource organization),
there should be a major push to apply chasten to at least 5 course projects
created during the Fall 2023 semester in the Department of Computer and
Information Science at Allegheny College and to scan those projects for the most
relevant of the patterns that are ultimately implemented as part of the
completion of Issue #6.

There should be convincing evidence that chasten works on the solution and
starter repositories that faculty members and students create as part of a
Computer Science, Data Science, Software Engineering, or Informatics course at
Allegheny College.

Error while running chasten after fresh install

simon@nixos /_scratch/chasten impure 127 chasten
Traceback (most recent call last):
  File "/home/simon/.cache/pypoetry/virtualenvs/chasten-LtXvhvXr-py3.11/bin/chasten", line 3, in <module>
    from chasten.main import cli
  File "/_scratch/chasten/chasten/main.py", line 13, in <module>
    from chasten import (
  File "/_scratch/chasten/chasten/database.py", line 9, in <module>
    from chasten import constants, enumerations, filesystem, output
  File "/_scratch/chasten/chasten/filesystem.py", line 10, in <module>
    import flatterer  # type: ignore
    ^^^^^^^^^^^^^^^^
  File "/home/simon/.cache/pypoetry/virtualenvs/chasten-LtXvhvXr-py3.11/lib/python3.11/site-packages/flatterer/__init__.py", line 8, in <module>
    import pandas
  File "/home/simon/.cache/pypoetry/virtualenvs/chasten-LtXvhvXr-py3.11/lib/python3.11/site-packages/pandas/__init__.py", line 16, in <module>
    raise ImportError(
ImportError: Unable to import required dependencies:
numpy:

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.11 from "/home/simon/.cache/pypoetry/virtualenvs/chasten-LtXvhvXr-py3.11/bin/python"
  * The NumPy version is: "1.25.2"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: libstdc++.so.6: cannot open shared object file: No such file or directory

I'm putting this here per @gkapfham. I will try telling ld about the location of the shared object file and get back to this issue.

Issue running the tests (pytest)

I ran into this issue while trying to remove the pysqlite3 dependency. After removing the dependency, I used the pytest -x -s -vv -n auto command to run the tests. However, it gave me the following error: 'pytest' is not recognized as an internal or external command, operable program or batch file. I have run poetry install, poetry lock, and poetry update to confirm that pytest was installed as a dependency.

Display Counts of Processed Data When Running the Integrate Sub-Command

Here is a sample of output from using the integrate sub-command provided by chasten:

• Directory: /home/gkapfham/working/source/astute-subjects/subject-data/aeval
  • File: 'chasten-results-aeval-20230821160207-03304d9d2ff540e98783fc185f4ea688.json'
• Directory: /home/gkapfham/working/source/astute-subjects/subject-data/chasten
  • File: 'chasten-results-chasten-20230818150121-3b45c34cbccd4baa8e9307d76c060da3.json'
• Directory: /home/gkapfham/working/source/astute-subjects/subject-data/dev4py-utils
  • File: 'chasten-results-dev4py-utils-20230821162716-8384a1c5cb1b41f1a67cf241b4e46ba2.json'
• Directory: /home/gkapfham/working/source/astute-subjects/subject-data/diagrams
  • File: 'chasten-results-diagrams-20230821164959-06bbe3cc64854876878fb1c2160bfe52.json'
• Directory: /home/gkapfham/working/source/astute-subjects/subject-data/lazytracker
  • File: 'chasten-results-lazytracker-20230821170554-7d41a62951404ef2b671fe0fca403a6c.json'
  • File: 'chasten-results-lazytracker-20230823161838-def36ce988884462bed9d770e2f5ef9d.json'
  • File: 'chasten-results-lazytracker-20230823162333-40f79665c10f4ebc91d538ef46604146.json'
  • File: 'chasten-results-lazytracker-20230823162341-4c23fc443a6b4c4aa09886f1ecb96e9f.json'
• Directory: /home/gkapfham/working/source/astute-subjects/subject-data/multicounter
  • File: 'chasten-results-multicounter-20230821171712-5c52f2f1b61b4cce97624cc34cb39d4f.json'
• Directory: /home/gkapfham/working/source/astute-subjects/subject-data/poethepoet
  • File: 'chasten-results-poethepoet-20230821174504-9d72b90791894194a8beadd41f7e9e8f.json'
• Directory: /home/gkapfham/working/source/astute-subjects/subject-data/poetry-version-plugin
  • File: 'chasten-results-poetry-version-plugin-20230821154554-d0160be5a09e434587df0cec7be3f87c.json'
• Directory: /home/gkapfham/working/source/astute-subjects/subject-data/rich
  • File: 'chasten-results-rich-20230818153100-64ba2a2f8453444ea5c528f2a11593e4.json'
• Directory: /home/gkapfham/working/source/astute-subjects/subject-data/sqlmodel
  • File: 'chasten-results-sqlmodel-20230821153426-8548d6b28f344b1aa7923e1e95ff3972.json'
• Directory: /home/gkapfham/working/source/astute-subjects/subject-data/textual-dev
  • File: 'chasten-results-textual-dev-20230817202006-658eb0594fb7427386e15d3bda29e110.json'
• Directory: /home/gkapfham/working/source/astute-subjects/subject-data/trogon
  • File: 'chasten-results-trogon-20230817194010-172db77a99304754bc4f5a93629b1a52.json'

The output of this command should be enhanced so that it summarizes the number
of files found for each individual directory and then across all directories.

It may also be useful if the program displays the size of all of the files that it created.
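
As a rough illustration of the requested summary, the sketch below counts the result files in each sub-directory of a root directory and totals both the file counts and their sizes. The function names and the chasten-results-*.json glob pattern are assumptions based on the sample output above, not chasten's actual implementation, and the sizes reported are those of the files found rather than of any files that integrate creates.

```python
from pathlib import Path


def summarize_results(root: Path) -> dict[str, tuple[int, int]]:
    """Map each sub-directory name to (number of result files, total bytes)."""
    summary = {}
    for directory in sorted(p for p in root.iterdir() if p.is_dir()):
        files = sorted(directory.glob("chasten-results-*.json"))
        total_bytes = sum(f.stat().st_size for f in files)
        summary[directory.name] = (len(files), total_bytes)
    return summary


def print_summary(summary: dict[str, tuple[int, int]]) -> None:
    """Display per-directory counts followed by the overall totals."""
    for name, (count, size) in summary.items():
        print(f"• Directory: {name} ({count} file(s), {size} bytes)")
    total_files = sum(count for count, _ in summary.values())
    total_bytes = sum(size for _, size in summary.values())
    print(
        f"Total: {total_files} file(s) in {len(summary)} directories, "
        f"{total_bytes} bytes"
    )
```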

Improve GitHub Actions Configuration

Right now the GitHub Actions configuration only runs the steps on Linux. There
are also other improvements to make to the GitHub Actions configuration! Here
is a list to start off the discussion:

  • Consider using Python 3.9, 3.10, and 3.11 and relaxing the requirement
    that chasten only works with Python 3.11
  • Build and testing process should work on Linux, MacOS, and Windows

Provide a Sub-Command for the Output of Version Information

Right now it is not possible for a person who uses chasten to see which
version of the program they have installed. This means that there should
be a command like chasten version that displays version information for
the tool itself and, potentially, the versions of key packages used by chasten.
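
A minimal sketch of how such version information could be gathered is shown below, using the standard library's importlib.metadata. The function names and the default list of packages are illustrative assumptions, and wiring the report into an actual chasten version sub-command is left out.

```python
from importlib.metadata import PackageNotFoundError, version


def describe_version(package: str) -> str:
    """Return 'name version' for an installed package, or a fallback note."""
    try:
        return f"{package} {version(package)}"
    except PackageNotFoundError:
        return f"{package} (not installed)"


def version_report(packages=("chasten", "pyastgrep", "pydantic")) -> str:
    """Build a multi-line report for the tool and some of its key packages."""
    return "\n".join(describe_version(name) for name in packages)


if __name__ == "__main__":
    print(version_report())
```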

Create a Video that Shows how Chasten Analyzes a Codebase

It would be useful if there was a screencast/video that illustrated how to
perform each of the steps associated with doing an analysis with Chasten.

This screencast/video would not necessarily need to have audio. However, it
should show each of the commands that you must run and the output that each
one produces. It should also show each of the files and give an overview of
the content of those files.

Support the Download of Configuration Files from a Remote Location Through HTTP

Right now chasten only supports the specification of a configuration file's
location by pointing to a local path. However, it would be ideal if the tool
could also access, read, and download configuration files that were available
through public URLs served by an HTTP server. The completion of this feature
would make it possible for a researcher to make public their configuration and
thus make it easier for another researcher to replicate their experiments.
Alternatively, this completed feature would also allow an instructor to host the
configuration for chasten in a location where students could not modify it.
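
One way this could work is to treat the configuration location as either a local path or an HTTP(S) URL and read its text accordingly. The sketch below uses only the standard library. The function names are hypothetical, and details such as caching, validation, and error reporting are intentionally omitted.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def is_remote(location: str) -> bool:
    """Return True when the location looks like an HTTP or HTTPS URL."""
    return urlparse(location).scheme in ("http", "https")


def read_config_text(location: str, timeout: float = 10.0) -> str:
    """Read configuration text from a public URL or from a local file."""
    if is_remote(location):
        # download the configuration file from the remote server
        with urlopen(location, timeout=timeout) as response:
            return response.read().decode("utf-8")
    # otherwise treat the location as a path on the local filesystem
    return Path(location).read_text(encoding="utf-8")
```

Because only the http and https schemes count as remote, a Windows path such as C:\Users\name\config.yml is still handled as a local file.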

Add Documentation to Illustrate and Explain The Tool's Verbose Output

Right now the documentation in the README.md file does not give any examples of
the verbose output that you can produce by passing the --verbose flag. It
would be useful if the README.md file and/or the Wiki gave examples of how the
verbose output helps the person who uses chasten to understand the tool's
behavior.

Use Treesitter S-Expressions to Run Queries Instead of XPath Expressions

Right now the chasten tool uses XPath expressions to query an XML-based
representation of a Python program. However, this means that chasten can only
work correctly for Python programs!

It would be awesome if chasten could parse source code from any
Treesitter-supported language (there are many!) and then use Treesitter
S-expressions to run queries for source code patterns.

The complete implementation of this feature would likely require a significant
amount of re-implementation of core functionality. However, this feature would
be a game changer for chasten because the tool would then support static
analysis of programs implemented in many different programming languages!

Implement All XPath Patterns Described in the Referenced Journal Paper

The following paper:

https://www.researchgate.net/publication/347335615_How_to_kill_them_all_An_exploratory_study_on_the_impact_of_code_observability_on_mutation_testing

describes a number of "mutation score anti-patterns" that limit the achievement
of a high mutation score for a test suite.

This project should come with XPath expressions that can scan a Python project
for all of the anti-patterns mentioned in this paper. The project should also:

  • Support the execution of one or more mutation testing tools for Python,
    thereby supporting the study of the correlation between anti-patterns and
    mutation score
  • Support the execution of test coverage monitoring and the collection of
    per-test coverage information with pytest-cov and coverage.py to support
    the study of the correlation between anti-patterns and coverage score
  • Ensure that all of these data sources are completely integrated and available
    for statistical analysis and/or machine learning prediction

Provide Deployment Integrations for Other Platforms Supported by Datasette

Right now Chasten provides integrations for the Fly and Vercel platforms.
However, it is also possible to deploy a Datasette to other platforms like
Heroku. We should also investigate whether other platforms besides Heroku are
also available and then determine whether or not Chasten should provide
deployment command-line arguments/options for these other platforms.

No such Column as filelines error when running integrate

chasten integrate all-programs C:\Users\Preston\CMPSC-203\ChastenProject\chasten\subject-data\lazytracker\chasten-results-lazytracker-20230913151249-d0046ac472434a709f6489e53f28b042.json -s subject-data/

The following error appears when I run the above command

💫 chasten: Analyze the AST of Python Source Code
🔗 GitHub: https://github.com/gkapfham/chasten

✨ Combining data file(s) in:

• Directory: C:\Users\Preston\CMPSC-203\ChastenProject\chasten\subject-data\lazytracker
  • File: 'chasten-results-lazytracker-20230913151249-d0046ac472434a709f6489e53f28b042.json'

✨ Saved the file 'chasten-integrated-results-all-programs-20230913151315-18a0214c8b664b6ea93b86a5622c375b.json'
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\Preston\CMPSC-203\ChastenProject\chasten\chasten\main.py:742 in integrate               │
│                                                                                                  │
│   739 │   │   output.console.print(f"\n:sparkles: Saved the file '{combined_json_file_name}'")   │
│   740 │   # "flatten" (i.e., "un-nest") the now-saved combined JSON file using flatterer         │
│   741 │   # create the SQLite3 database and then configure the database for use in datasett      │
│ ❱ 742 │   combined_flattened_directory = filesystem.write_flattened_csv_and_database(            │
│   743 │   │   combined_json_file_name,                                                           │
│   744 │   │   output_directory,                                                                  │
│   745 │   │   project,                                                                           │
│                                                                                                  │
│ C:\Users\Preston\CMPSC-203\ChastenProject\chasten\chasten\filesystem.py:280 in                   │
│ write_flattened_csv_and_database                                                                 │
│                                                                                                  │
│   277 │   # create a view that combines all of the data                                          │
│   278 │   database.create_chasten_view(database_file_name_str)                                   │
│   279 │   # enable full-text search in the SQLite3 database                                      │
│ ❱ 280 │   database.enable_full_text_search(database_file_name_str)                               │
│   281 │   # return the name of the directory that contains the flattened CSV files               │
│   282 │   return flattened_output_directory_str                                                  │
│   283                                                                                            │
│                                                                                                  │
│ C:\Users\Preston\CMPSC-203\ChastenProject\chasten\chasten\database.py:67 in                      │
│ enable_full_text_search                                                                          │
│                                                                                                  │
│    64 │   │   ]                                                                                  │
│    65 │   )                                                                                      │
│    66 │   # enable full-text search on the sources database table                                │
│ ❱  67 │   database["sources"].enable_fts(                                                        │
│    68 │   │   [                                                                                  │
│    69 │   │   │   "filename",                                                                    │
│    70 │   │   │   "filelines",                                                                   │
│                                                                                                  │
│ C:\Users\Preston\AppData\Local\pypoetry\Cache\virtualenvs\chasten-Ti8Qy4ta-py3.11\Lib\site-packa │
│ ges\sqlite_utils\db.py:2358 in enable_fts                                                        │
│                                                                                                  │
│   2355 │   │   │   self.disable_fts()                                                            │
│   2356 │   │                                                                                     │
│   2357 │   │   self.db.executescript(create_fts_sql)                                             │
│ ❱ 2358 │   │   self.populate_fts(columns)                                                        │
│   2359 │   │                                                                                     │
│   2360 │   │   if create_triggers:                                                               │
│   2361 │   │   │   old_cols = ", ".join("old.[{}]".format(c) for c in columns)                   │
│                                                                                                  │
│ C:\Users\Preston\AppData\Local\pypoetry\Cache\virtualenvs\chasten-Ti8Qy4ta-py3.11\Lib\site-packa │
│ ges\sqlite_utils\db.py:2408 in populate_fts                                                      │
│                                                                                                  │
│   2405 │   │   │   │   table=self.name, columns=", ".join("[{}]".format(c) for c in columns)     │
│   2406 │   │   │   )                                                                             │
│   2407 │   │   )                                                                                 │
│ ❱ 2408 │   │   self.db.executescript(sql)                                                        │
│   2409 │   │   return self                                                                       │
│   2410 │                                                                                         │
│   2411 │   def disable_fts(self) -> "Table":                                                     │
│                                                                                                  │
│ C:\Users\Preston\AppData\Local\pypoetry\Cache\virtualenvs\chasten-Ti8Qy4ta-py3.11\Lib\site-packa │
│ ges\sqlite_utils\db.py:524 in executescript                                                      │
│                                                                                                  │
│    521 │   │   """                                                                               │
│    522 │   │   if self._tracer:                                                                  │
│    523 │   │   │   self._tracer(sql, None)                                                       │
│ ❱  524 │   │   return self.conn.executescript(sql)                                               │
│    525 │                                                                                         │
│    526 │   def table(self, table_name: str, **kwargs) -> Union["Table", "View"]:                 │
│    527 │   │   """                                                                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OperationalError: no such column: filelines

The only way I've been able to stop this error from occurring so far is by removing "filelines" from the following code in
database.py

database["sources"].enable_fts(
        [
            "filename",
            "filelines",
            "check_id",
            "check_name",
            "check_description",
            "check_pattern",
        ]
    )

Which then results in integrate running with no errors

💫 chasten: Analyze the AST of Python Source Code
🔗 GitHub: https://github.com/gkapfham/chasten

✨ Combining data file(s) in:

• Directory: C:\Users\Preston\CMPSC-203\ChastenProject\chasten\subject-data\lazytracker
  • File: 'chasten-results-lazytracker-20230913151249-d0046ac472434a709f6489e53f28b042.json'

✨ Saved the file 'chasten-integrated-results-all-programs-20230913151430-f51221df497746a19191baa735dc6087.json'

✨ Created this directory structure in C:\Users\Preston\CMPSC-203\ChastenProject\chasten\subject-data:

📂 chasten-flattened-csvs-sqlite-db-all-programs-20230913151430-90f2041a3273407e963e8b13cb404f5a
├── 📄 chasten.db
├── 📂 csv
│   ├── 📄 main.csv
│   ├── 📄 sources.csv
│   └── 📄 sources_check_matches.csv
├── 📄 datapackage.json
├── 📄 fields.csv
└── 📄 tables.csv
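
Rather than deleting "filelines" by hand, one defensive option is to request full-text search only on the columns that actually exist in the sources table. The sketch below uses the standard library's sqlite3 module to find those columns, and the helper name is hypothetical. It does not explain why the filelines column is missing in the first place, so it is a workaround rather than a fix for the root cause.

```python
import sqlite3

# columns that the full-text search step asks for, as listed in database.py
WANTED_COLUMNS = [
    "filename",
    "filelines",
    "check_id",
    "check_name",
    "check_description",
    "check_pattern",
]


def existing_columns(connection: sqlite3.Connection, table: str, wanted: list) -> list:
    """Return the wanted columns that are present in the table, in order."""
    rows = connection.execute(f'PRAGMA table_info("{table}")').fetchall()
    # the second field of each table_info row is the column name
    present = {row[1] for row in rows}
    return [column for column in wanted if column in present]
```

The filtered list could then be passed to enable_fts in place of the fixed list, ideally with a warning that names the skipped columns.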

Encoding Error when saving analyzed data to JSON

On Windows, when analyze saves the results to a JSON file, it throws an encoding error because it is not using the right encoding.

To Reproduce
Steps to reproduce the behavior:

  1. Go to the chasten repository and enter the venv
  2. run the command 'chasten analyze chasten --config $PWD/.chasten/ --search-path . --save-directory chasten --save'

It then results in this

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\Preston\CMPSC-203\ChastenProject\chasten\chasten\main.py:662 in analyze                 │
│                                                                                                  │
│   659 │   # display all of the analysis results if verbose output is requested                   │
│   660 │   output.print_analysis_details(chasten_results_save, verbose=verbose)                   │
│   661 │   # save all of the results from this analysis                                           │
│ ❱ 662 │   saved_file_name = filesystem.write_chasten_results(                                    │
│   663 │   │   output_directory, project, chasten_results_save, save                              │
│   664 │   )                                                                                      │
│   665 │   # output the name of the saved file if saving successfully took place                  │
│                                                                                                  │
│ C:\Users\Preston\CMPSC-203\ChastenProject\chasten\chasten\filesystem.py:198 in                   │
│ write_chasten_results                                                                            │
│                                                                                                  │
│   195 │   │   results_path_with_file = results_path / complete_results_file_name                 │
│   196 │   │   results_json = results_content.model_dump_json(indent=2)                           │
│   197 │   │   # use the built-in method with pathlib Path to write the JSON contents             │
│ ❱ 198 │   │   results_path_with_file.write_text(results_json)                                    │
│   199 │   │                                                                                      │
│   200 │   │   # return the name of the created file for diagnostic purposes                      │
│   201 │   │   return complete_results_file_name                                                  │
│                                                                                                  │
│ C:\Users\Preston\AppData\Local\Programs\Python\Python311\Lib\pathlib.py:1079 in write_text       │
│                                                                                                  │
│   1076 │   │   │   │   │   │   │   data.__class__.__name__)                                      │
│   1077 │   │   encoding = io.text_encoding(encoding)                                             │
│   1078 │   │   with self.open(mode='w', encoding=encoding, errors=errors, newline=newline) as f  │
│ ❱ 1079 │   │   │   return f.write(data)                                                          │
│   1080 │                                                                                         │
│   1081 │   def readlink(self):                                                                   │
│   1082 │   │   """                                                                               │
│                                                                                                  │
│ C:\Users\Preston\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py:19 in encode    │
│                                                                                                  │
│    16                                                                                            │
│    17 class IncrementalEncoder(codecs.IncrementalEncoder):                                       │
│    18 │   def encode(self, input, final=False):                                                  │
│ ❱  19 │   │   return codecs.charmap_encode(input,self.errors,encoding_table)[0]                  │
│    20                                                                                            │
│    21 class IncrementalDecoder(codecs.IncrementalDecoder):                                       │
│    22 │   def decode(self, input, final=False):                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f680' in position 34312: character maps to <undefined>

This should not throw an error at all, and there should be no encoding issues when the results contain special characters.
I believe this is only an issue on Windows: the same problem came up when getting the workflow to pass on Windows, and
other platforms don't seem to be affected.
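
The traceback shows write_text falling back to the platform's default cp1252 encoding, which cannot represent the character '\U0001f680'. A likely fix, sketched here with a hypothetical helper name, is to pass an explicit UTF-8 encoding when writing the JSON text.

```python
from pathlib import Path


def write_json_text(path: Path, json_text: str) -> None:
    """Write JSON text as UTF-8 regardless of the platform's default encoding."""
    path.write_text(json_text, encoding="utf-8")


def read_json_text(path: Path) -> str:
    """Read the JSON text back using the same explicit encoding."""
    return path.read_text(encoding="utf-8")
```

Any code that later reads the saved file would need to use the same encoding, which is why the reader is shown alongside the writer.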

Add Integration with a Large Language Model (LLM) for Help with Creating XPath Expressions from XML Snippets

One of the most challenging tasks associated with using Chasten is that a person
must write their own XPath expressions in order to query the AST of a Python
program. It would be ideal if Chasten could provide some support for
automatically creating the XPath expressions for a specific query.

One option that we should consider is giving a person who uses Chasten the
option of providing an OpenAI key and then sending some XML and a natural
language description of a query to OpenAI ChatGPT-3.5-turbo. We could then
return an XPath expression that is likely to meet the query and then give the
person using the tool the option of adding it to their checks.yml
configuration file.

Allow for the Specification of Either XPath Version 1 or XPath Version 2 When Running Checks

The current implementation of chasten is hard-coded so that it only works with
XPath 2 expressions. While this is nice because it supports the
greatest range of XPath expressions, it is likely going to make the execution of
XPath expressions slower when they would work sufficiently well with the version
1 system instead of the version 2 system.

There are some more details about this issue at:

https://github.com/spookylukey/pyastgrep

One idea would be that chasten analyze could accept a command line argument
that would support the specification of whether or not the tool should use XPath
1 or XPath 2 to run all of the expressions associated with the checks.

Alternatively, it might be a good idea to allow the person who writes the
checks.yml file to use an extra attribute to specify whether a certain check
should use the XPath version 1 or version 2 parse.
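
The two ideas could be combined so that a per-check attribute overrides a tool-wide default supplied on the command line. The sketch below shows only that resolution logic. The xpath-version attribute name, the accepted values, and the function name are all hypothetical, and nothing here calls the underlying XPath engine.

```python
SUPPORTED_XPATH_VERSIONS = ("1.0", "2.0")


def resolve_xpath_version(check: dict, default: str = "2.0") -> str:
    """Pick the XPath version for one check, preferring its own attribute."""
    # a hypothetical per-check attribute takes priority over the default
    chosen = str(check.get("xpath-version", default))
    if chosen not in SUPPORTED_XPATH_VERSIONS:
        raise ValueError(f"Unsupported XPath version: {chosen}")
    return chosen
```

For example, a check that contains xpath-version: "1.0" would use version 1 even when the command-line default is version 2.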

Necessary documentation for error while running `poetry install`: Building wheel for pysqlite3 (pyproject.toml) did not run successfully.

After running poetry install in the root directory of this project, I encountered the following error:

    Installing build dependencies: started
    Installing build dependencies: finished with status 'done'
    Getting requirements to build wheel: started
    Getting requirements to build wheel: finished with status 'done'
    Preparing metadata (pyproject.toml): started
    Preparing metadata (pyproject.toml): finished with status 'done'
  Building wheels for collected packages: pysqlite3
    Building wheel for pysqlite3 (pyproject.toml): started
    Building wheel for pysqlite3 (pyproject.toml): finished with status 'error'
    error: subprocess-exited-with-error

    × Building wheel for pysqlite3 (pyproject.toml) did not run successfully.
    │ exit code: 1
    ╰─> [20 lines of output]
  • Installing flatterer (0.19.8): Installing...
  • Installing hypofuzz (23.7.1)
  • Installing hypothesis-jsonschema (0.22.1)
  • Installing isort (5.12.0)
  • Installing mypy (1.4.1)
  • Installing pyastgrep (1.2.2)
  • Installing pydantic (2.0.3)
  • Installing pysqlite3 (0.5.1): Failed

...

    × Building wheel for pysqlite3 (pyproject.toml) did not run successfully.
    │ exit code: 1
    ╰─> [20 lines of output]
        running bdist_wheel
        running build
        running build_py
        creating build
        creating build/lib.linux-x86_64-cpython-311
        creating build/lib.linux-x86_64-cpython-311/pysqlite3
        copying pysqlite3/dbapi2.py -> build/lib.linux-x86_64-cpython-311/pysqlite3
        copying pysqlite3/__init__.py -> build/lib.linux-x86_64-cpython-311/pysqlite3
        running build_ext
        Builds a C extension linking against libsqlite3 library
        building 'pysqlite3._sqlite3' extension
        creating build/temp.linux-x86_64-cpython-311
        creating build/temp.linux-x86_64-cpython-311/src
        gcc -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -I/nix/store/vshdx1ds1rmpl4by04i2g807zvkqyq8q-libxcrypt-4.4.33/include -fPIC -DMODULE_NAME=\"pysqlite3.dbapi2\" -I/usr/include -I/home/simon/.cache/pypoetry/virtualenvs/chasten-LtXvhvXr-py3.11/include -I/nix/store/l87x5cpmxpfxb93nl4madnr4mmmlvhy4-python3-3.11.1/include/python3.11 -c src/blob.c -o build/temp.linux-x86_64-cpython-311/src/blob.o
        In file included from src/blob.c:1:
        src/blob.h:4:10: fatal error: sqlite3.h: No such file or directory
            4 | #include "sqlite3.h"
              |          ^~~~~~~~~~~
        compilation terminated.
        error: command '/nix/store/1y8i61anhs9hh1g5x3zw2fvdbivwixzg-gcc-wrapper-11.3.0/bin/gcc' failed with exit code 1
        [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
    ERROR: Failed building wheel for pysqlite3
  Failed to build pysqlite3
  ERROR: Could not build wheels for pysqlite3, which is required to install pyproject.toml-based projects

The same error is repeated numerous times in the output, because it propagates to many parts of the project. I'm leaving out the other repeated error messages. The last part of the output suggests that the build fails because the compiler cannot find the development headers for sqlite3:

src/blob.h:4:10: fatal error: sqlite3.h: No such file or directory
            4 | #include "sqlite3.h"
              |          ^~~~~~~~~~~

This was solved by the following:

Solution (NixOS)

Install pkgs.sqlite.dev to add the directory holding sqlite3.h to $CMAKE_INCLUDE_DIRS.

Solution (Ubuntu)

Install libsqlite3-dev to add the directory holding sqlite3.h to $CMAKE_INCLUDE_DIRS.

Proposed solution

Either include in documentation that sqlite3 dev files are required, or automate this.

Stop Using a Hard Coded Project/App Name During Deployment to Fly or Vercel

The current implementation of chasten has a hard-coded name for the
application/project that is deployed to a hosting platform like Fly or Vercel.

This means that when a person is using chasten's datasette-publish
sub-command they will not be able to deploy a datasette.

To solve this problem, this sub-command should offer a new required argument
that is the name of the application/project that should be deployed.

Display Diagnostic Details About the CSV Files, SQLite Database, Full Text Search Indices, and Other Integration Details

When you run the chasten integrate command there are a number of steps that it
takes without providing any diagnostic displays to let you know that the steps
were completed. For instance, here are some steps for which more diagnostic
information may be helpful:

  • Creation of the SQLite database
  • Adding tables to the SQLite database
  • Adding full-text search indices to the SQLite database
  • Creating the chasten_complete virtual data for use in datasette
  • Performing other steps related to creating/saving/modifying files

The key to completing this task is to balance the display of information with
the possibility of overwhelming the person who uses the program with too much
diagnostic information. It is also important to remember that the diagnostic
output must use the existing display and/or logging infrastructure and appear in
the same aesthetically pleasing format.

Successfully Run The Tool on At Least 50 Major Python Projects Using all Patterns Finished in Issue #6

Once there is evidence that the chasten tool works correctly on at least 20 small
projects (see the 12 forked repositories in the AstuteSource organization), then
there should be a major push to apply chasten to at least 50 major projects and
to scan those projects for all of the patterns that are ultimately implemented
as part of the completion of Issue #6.

There should be convincing evidence that chasten works on large-scale Python
projects that are available for download from GitHub.

Create a Video that Shows How to Use Chasten's Datasette to Analyze Data

Chasten currently has an integration with Datasette that offers a nice way to
interact with the data from analyzing numerous programs/projects according to
the patterns described inside of a checks.yml file. It would be awesome if
there was a screencast/video that illustrated how to do a data analysis with the
Datasette interface that the Chasten tool provides.

Issue finding executable for datasette on Windows

chasten datasette-serve C:\Users\Preston\CMPSC-203\ChastenProject\chasten\subject-data\chasten-flattened-csvs-sqlite-db-all-programs-20230905151458-42dbbaac257447cfa602106ce0d5a93d\chasten.db --port 8001

💫 chasten: Analyze the AST of Python Source Code
🔗 GitHub: https://github.com/gkapfham/chasten

✨ Starting a local datasette instance:
   • Database: '...
subject-data\chasten-flattened-csvs-sqlite-db-all-programs-20230905151458-42dbbaac257447cfa602106ce0d5a93d\chasten.db'
   • Metadata: 'None'
   • Port: 8001

✨ Details for datasette startup:
   • Venv: 'C:\Users\Preston\AppData\Local\pypoetry\Cache\virtualenvs\chasten-Ti8Qy4ta-py3.11'
   • Cannot find: 'C:\Users\Preston\AppData\Local\pypoetry\Cache\virtualenvs\chasten-Ti8Qy4ta-py3.11/bin/datasette'

🤷 Was not able to find '{executable_name}'

I got this error when running the above command and found that it's searching for the executable in a bin directory, whereas on Windows the executables are stored in a directory called Scripts. I fixed this by changing the following code:

    executable_name = constants.datasette.Datasette_Executable
    # define the name of the file that contains datasette metadata;
    # note that by default the metadata could be None and thus it
    # will not be passed as a -m argument to the datasette program
    metadata = datasette_metadata
    # identify the location at which the virtual environment exists;
    # note that this is the location where executable dependencies of
    # chasten will exist in a bin directory. For instance, the "datasette"
    # executable that is a dependency of chasten can be found by starting
    # the search from this location for the virtual environment.
    virtual_env_location = sys.prefix
    full_executable_name = virtual_env_location + "/bin/" + executable_name
    (found_executable, executable_path) = filesystem.can_find_executable(
        full_executable_name
    )
    # output diagnostic

To

    executable_name = constants.datasette.Datasette_Executable + ".exe"
    # define the name of the file that contains datasette metadata;
    # note that by default the metadata could be None and thus it
    # will not be passed as a -m argument to the datasette program
    metadata = datasette_metadata
    # identify the location at which the virtual environment exists;
    # note that this is the location where executable dependencies of
    # chasten will exist in a bin directory. For instance, the "datasette"
    # executable that is a dependency of chasten can be found by starting
    # the search from this location for the virtual environment.
    virtual_env_location = sys.prefix
    # the doubled backslashes avoid invalid escape sequences such as "\S"
    script = "\\Scripts\\"
    full_executable_name = virtual_env_location + script + executable_name
    (found_executable, executable_path) = filesystem.can_find_executable(
        full_executable_name
    )

With this change, it ran on Windows:

💫 chasten: Analyze the AST of Python Source Code
🔗 GitHub: https://github.com/gkapfham/chasten

✨ Starting a local datasette instance:
   • Database: '...
subject-data\chasten-flattened-csvs-sqlite-db-all-programs-20230913151430-90f2041a3273407e963e8b13cb404f5a\chasten.db'
   • Metadata: 'None'
   • Port: 8001

✨ Details for datasette startup:
   • Venv: 'C:\Users\Preston\AppData\Local\pypoetry\Cache\virtualenvs\chasten-Ti8Qy4ta-py3.11'
   • Program: 'C:\Users\Preston\AppData\Local\pypoetry\Cache\virtualenvs\chasten-Ti8Qy4ta-py3.11\Scripts\datasette.exe'

INFO:     Started server process [31096]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit)

This is a solution for Windows; however, it means that we may need to run different code depending on the operating system,
since virtual environments store executables in different directories on different operating systems.
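
One way to avoid hard-coding either layout is to pick the directory and the file suffix based on the operating system. The following is a minimal sketch and not chasten's actual implementation: the function name `executable_in_venv` and its parameters are hypothetical, and it assumes the standard virtual environment layout (`Scripts\name.exe` on Windows and `bin/name` elsewhere).

```python
import os
import sys
from pathlib import PurePosixPath, PureWindowsPath


def executable_in_venv(prefix: str, name: str, os_name: str = os.name) -> str:
    """Build the expected path of an executable inside a virtual environment."""
    if os_name == "nt":
        # Windows virtual environments keep executables in "Scripts"
        # and give them an ".exe" suffix
        return str(PureWindowsPath(prefix) / "Scripts" / f"{name}.exe")
    # Linux and macOS virtual environments keep executables in "bin"
    return str(PurePosixPath(prefix) / "bin" / name)


# sys.prefix is the root of the active virtual environment
print(executable_in_venv(sys.prefix, "datasette"))
```

The standard library's `sysconfig.get_path("scripts")` also reports this directory for the running interpreter, which may be a simpler starting point than building the path by hand.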

Support the Integration of Remote Results Files That Are Available through HTTP

Right now the chasten integrate command only allows the specification of one
or more local results files on the command-line. This means that if there are,
for instance, results files in a remote GitHub repository, a person who wants to
perform the integration step must first clone the GitHub repository and then
type all of the directory names when running the chasten integrate command.

Ultimately, it would be nice if all of the various paths that chasten supports
for input could either be a file name, directory, or remote file name. Which one
of these it actually is should be automatically detected by the tool! This means
that, for instance, the tool should never crash when it accidentally tries to
treat a remote file like it is a local directory.
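
The automatic detection described above could be sketched as follows. This is only an illustration under stated assumptions: the function name `classify_input` and its four labels are hypothetical, and only `http` and `https` URLs are treated as remote.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_input(location: str) -> str:
    """Label an input as 'remote', 'directory', 'file', or 'missing'."""
    # only treat http(s) URLs that name a host as remote inputs
    parsed = urlparse(location)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "remote"
    path = Path(location)
    if path.is_dir():
        return "directory"
    if path.is_file():
        return "file"
    # report a missing input instead of raising an exception
    return "missing"


print(classify_input("https://example.com/results.json"))
print(classify_input("."))
```

Returning a label instead of raising an exception lets the caller decide how to report an input that cannot be used, which fits the requirement that the tool should never crash on a misclassified path.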

Add Git Pre-Commit Hooks to Check Commit Messages and Run All Checks

While this repository has many checks that are run at the point when you commit to the GitHub repository, it would be ideal if many/most/all of these checks could also be run before a commit takes place. This is possible through the use of Git pre-commit hooks that developers for the project can install after cloning the repository for the first time! As such, we should add a pre-commit configuration for this repository and run checks on each commit. To complete this task we will need to confirm that the pre-commit hooks work correctly when a commit happens from a text editor, an IDE, and the command-line.

Integrate Chasten with GatorGrader and/or GatorGrade

Right now the GatorGrade/GatorGrader tools provide features that can
intelligently look for structures inside of a Markdown file. However, the
tool(s) do not have any native facilities for scanning Python source code,
excepting the fact that they have limited features for scanning for comments or
using regular expressions to scan for source code.

As such, it would be ideal if Chasten was directly integrated into a tool like
GatorGrade/GatorGrader. Please note that GatorGrade can be considered like the
"front end" and GatorGrader can be considered like the "back end". It would
already be possible for GatorGrade to call Chasten directly and to produce
results. However, this would make Chasten like all of the other shell commands
that you can run through GatorGrade.

Coverage Badge Is Currently Updated Even for a Pull Request that Was Not Merged

The current GitHub Actions configuration for chasten updates the coverage
badge in the README.md file even when the tasks in the build.yml file
are being run for a pull request and not for a commit to the main branch.

If possible, the update to the coverage badge should be adjusted so that it does
not happen when test coverage monitoring occurs during a build for a pull
request.

Dump and/or Display the XML Representation for a Python File

Right now the current implementation of chasten uses the pyastgrep package
to search through the XML-based representation of a Python source code file.

However, even though pyastgrep supports the dumping of a Python source code
file to an XML file, the chasten tool does not also offer that feature.

It would be helpful if the tool also gave a way to dump and/or display the XML
representation of a Python source code file. This is particularly useful in the
situation when a person is trying to write an XPath expression for a specific
feature of the Python AST and it is not working correctly.

Sometimes it is helpful to be able to see the XML representation of the Python
source code when you are trying to debug the XPath expression.
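
To make the idea concrete, here is a minimal sketch that converts a Python AST into XML with only the standard library. It is a simplified approximation: the helper names `ast_to_xml` and `dump_xml` are hypothetical, and the element and attribute layout that pyastgrep produces may differ in its details.

```python
import ast
import xml.etree.ElementTree as ET


def ast_to_xml(node: ast.AST) -> ET.Element:
    """Convert an AST node into an element named after the node's class."""
    element = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            # a single child node is wrapped in an element named after the field
            ET.SubElement(element, field).append(ast_to_xml(value))
        elif isinstance(value, list):
            # a list of child nodes shares one element named after the field
            child = ET.SubElement(element, field)
            for item in value:
                if isinstance(item, ast.AST):
                    child.append(ast_to_xml(item))
        elif value is not None:
            # plain values such as names become XML attributes
            element.set(field, str(value))
    return element


def dump_xml(source: str) -> str:
    """Return an indented XML string for a Python source code string."""
    root = ast_to_xml(ast.parse(source))
    ET.indent(root)
    return ET.tostring(root, encoding="unicode")


print(dump_xml("def f(x):\n    return x\n"))
```

Reading output of this kind next to the source code makes it easier to see why an XPath expression such as `.//FunctionDef/body//If` does or does not match.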

Create a GitHub Repository that Can Contain Submodules for All Programming Projects

We need to be able to clearly demonstrate that chasten can analyze the source
code of the programming projects that are commonly used in computer and
information science classes at Allegheny College.

It would be ideal if all of the solution repositories were in submodules in a containing GitHub repository.

It would also be great if all of the starter repositories were in submodules in a containing GitHub repository.

Ultimately, we may need multiple "container" repositories in order to complete this task.

Please note that this task will require approval from the instructors who create the repos.

Dump and/or Display XML Files for a Potentially Nested Directory Structure of Python Source Code

As described in issue #31, it would be useful to dump the XML representation of
a single specified file on the command-line. With that said, it would also be
great if it was possible to create/display an XML file for all of the Python
files in a directory structure. This would make it possible to save a bunch of
different XML files that could then be browsed while writing XPath expressions
for querying the code base of a Python project.
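
One possible way to organize such a dump is to mirror the source tree, writing one XML file per Python file. The sketch below only plans the output paths and leaves the conversion itself out. The function name `plan_xml_outputs` and the mirrored layout are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def plan_xml_outputs(source_root: Path, output_root: Path) -> dict[Path, Path]:
    """Map each Python file under source_root to a mirrored .xml path."""
    plan = {}
    # rglob descends into every nested sub-directory
    for python_file in sorted(source_root.rglob("*.py")):
        relative = python_file.relative_to(source_root)
        plan[python_file] = (output_root / relative).with_suffix(".xml")
    return plan


# build a small nested project in a scratch directory to show the mapping
with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch) / "project"
    (root / "pkg" / "sub").mkdir(parents=True)
    (root / "pkg" / "sub" / "module.py").write_text("x = 1\n")
    demo_plan = plan_xml_outputs(root, Path(scratch) / "xml")
    for source, target in demo_plan.items():
        print(source.relative_to(scratch), "->", target.relative_to(scratch))
```

Keeping the directory structure intact means that the XML file for a given module is found at the same relative location as its source file.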

Attempt to resolve Chasten installation issue by removing pysqlite3

I encountered this issue while trying to install Chasten.

Issue encountered:
pip failed to build package:
pysqlite3

Some possibly relevant errors from pip install:
error: subprocess-exited-with-error
src/connection.c:1385:10: error: call to undeclared function 'sqlite3_enable_load_extension'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
src/connection.c:1409:10: error: call to undeclared function 'sqlite3_load_extension'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
error: command '/usr/bin/clang' failed with exit code 1

Error installing chasten.

Expected Result: I expected chasten to be installed successfully without any issues after removing 'pysqlite3' from the project dependencies.

Steps I've taken:

  1. Created a new branch 'Chasten_wo_pysqlite_test'
  2. Removed pysqlite3 dependency using poetry remove pysqlite3 and updated pyproject.toml and poetry.lock accordingly
  3. Ran poetry install and poetry show --tree
  4. Updated pip, setuptools, and pipx and cleaned the build cache with the following commands:
    pip install --upgrade pip setuptools
    pip install --upgrade pipx
    pip install --upgrade pysqlite3
    pipx uninstall chasten
    pip cache purge
    pipx install chasten
  5. Checked if chasten was successfully installed by running:
    poetry shell
    pip list (verified that chasten was in the list)
  6. Ran chasten --help and chasten --install-completion and received the following result:
    zsh completion installed in /Users/jaclynpham/.zfunc/_chasten
    Completion will take effect once you restart the terminal
  7. Attempted to run tests with poetry run task test and received the following result:
    Results (6.33s): 66 passed

Actual Result: The installation of 'chasten' and running of tests proceeded without any errors, indicating that 'chasten' was successfully integrated into the project.

Next Step: I will continue to validate the changes by following the instructions in the README to run all the recommended commands and confirm that removing the pysqlite3 package ensures the smooth execution of the program. I will provide further updates on the results of this validation process in this issue.

Test case that uses source code of `chasten` fails in development branch

Describe the bug
When on a development branch, after having made some changes to the source code, I was running tests and realized that chasten is being called to analyze itself:

chasten analyze testing --search-path /path/to/chasten/repo --config /path/to/chasten/repo/.chasten --verbose

The problem with this is that chasten is using a hard-coded config located in the .chasten directory, and because chasten's code changes, these checks will not always pass.

.chasten
├── checks.yml
└── config.yml

This hard-coded config was being read by the test case tests/test_main.py::test_cli_analyze_correct_arguments_analyze_chasten_codebase. Chasten is essentially shooting itself in the foot, and will not pass test cases when significant changes are made to it that break the following checks:

checks:
  - name: "class-definition"
    code: "CDF"
    id: "C001"
    pattern: './/ClassDef'
    count:
      min: 1
      max: 50
  - name: "all-function-definition"
    code: "AFD"
    id: "F001"
    pattern: './/FunctionDef'
    count:
      min: 1
      max: 200
  - name: "non-test-function-definition"
    code: "NTF"
    id: "F002"
    pattern: './/FunctionDef[not(contains(@name, "test_"))]'
    count:
      min: 40
      max: 70
  - name: "single-nested-if"
    code: "SNI"
    id: "CL001"
    pattern: './/FunctionDef/body//If'
    count:
      min: 1
      max: 100
  - name: "double-nested-if"
    code: "DNI"
    id: "CL002"
    pattern: './/FunctionDef/body//If[ancestor::If and not(parent::orelse)]'
    count:
      min: 1
      max: 15

In my case, you can read the output below to see that the non-test-function-definition and double-nested-if checks did not pass: their per-file match counts sum to 71 and 16, just above the configured maximums of 70 and 15. There is no need to enforce these arbitrary checks on our code.

✨ Analyzing Python source code in: /_scratch/chasten

🎉 Performing 5 check(s):

  ✓ id: 'C001', name: 'class-definition', pattern: './/ClassDef', min=1, max=50
    • /_scratch/chasten/chasten/constants.py - 10 matches
    • /_scratch/chasten/chasten/debug.py - 2 matches
    • /_scratch/chasten/chasten/enumerations.py - 3 matches
    • /_scratch/chasten/chasten/results.py - 6 matches
    • /_scratch/chasten/chasten/server.py - 1 matches
  ✓ id: 'F001', name: 'all-function-definition', pattern: './/FunctionDef', min=1, max=200
    • /_scratch/chasten/chasten/filesystem.py - 12 matches
    • /_scratch/chasten/chasten/validate.py - 4 matches
    • /_scratch/chasten/chasten/util.py - 5 matches
    • /_scratch/chasten/chasten/main.py - 9 matches
    • /_scratch/chasten/chasten/configuration.py - 10 matches
    • /_scratch/chasten/chasten/output.py - 12 matches
    • /_scratch/chasten/chasten/checks.py - 9 matches
    • /_scratch/chasten/chasten/database.py - 5 matches
    • /_scratch/chasten/chasten/server.py - 2 matches
    • /_scratch/chasten/chasten/process.py - 4 matches
    • /_scratch/chasten/tests/test_debug.py - 8 matches
    • /_scratch/chasten/tests/test_validate.py - 4 matches
    • /_scratch/chasten/tests/test_filesystem.py - 16 matches
    • /_scratch/chasten/tests/test_configuration.py - 3 matches
    • /_scratch/chasten/tests/test_main.py - 10 matches
    • /_scratch/chasten/tests/test_util.py - 4 matches
    • /_scratch/chasten/tests/test_constants.py - 5 matches
    • /_scratch/chasten/tests/test_process.py - 3 matches
    • /_scratch/chasten/tests/test_checks.py - 8 matches
  ✗ id: 'F002', name: 'non-test-function-definition', pattern: './/FunctionDef[not(contains(@name, "test_"))]', min=40,
max=70
    • /_scratch/chasten/chasten/filesystem.py - 12 matches
    • /_scratch/chasten/chasten/validate.py - 4 matches
    • /_scratch/chasten/chasten/util.py - 5 matches
    • /_scratch/chasten/chasten/main.py - 9 matches
    • /_scratch/chasten/chasten/configuration.py - 10 matches
    • /_scratch/chasten/chasten/output.py - 10 matches
    • /_scratch/chasten/chasten/checks.py - 9 matches
    • /_scratch/chasten/chasten/database.py - 5 matches
    • /_scratch/chasten/chasten/server.py - 2 matches
    • /_scratch/chasten/chasten/process.py - 4 matches
    • /_scratch/chasten/tests/test_main.py - 1 matches
  ✓ id: 'CL001', name: 'single-nested-if', pattern: './/FunctionDef/body//If', min=1, max=100
    • /_scratch/chasten/chasten/filesystem.py - 11 matches
    • /_scratch/chasten/chasten/validate.py - 3 matches
    • /_scratch/chasten/chasten/util.py - 2 matches
    • /_scratch/chasten/chasten/main.py - 15 matches
    • /_scratch/chasten/chasten/configuration.py - 14 matches
    • /_scratch/chasten/chasten/output.py - 7 matches
    • /_scratch/chasten/chasten/checks.py - 13 matches
    • /_scratch/chasten/chasten/database.py - 11 matches
    • /_scratch/chasten/chasten/process.py - 5 matches
    • /_scratch/chasten/tests/test_filesystem.py - 1 matches
    • /_scratch/chasten/tests/test_util.py - 1 matches
    • /_scratch/chasten/tests/test_constants.py - 1 matches
    • /_scratch/chasten/tests/test_process.py - 1 matches
    • /_scratch/chasten/tests/test_checks.py - 1 matches
  ✗ id: 'CL002', name: 'double-nested-if', pattern: './/FunctionDef/body//If[ancestor::If and not(parent::orelse)]', min=1,
max=15
    • /_scratch/chasten/chasten/filesystem.py - 3 matches
    • /_scratch/chasten/chasten/validate.py - 1 matches
    • /_scratch/chasten/chasten/main.py - 2 matches
    • /_scratch/chasten/chasten/configuration.py - 2 matches
    • /_scratch/chasten/chasten/output.py - 1 matches
    • /_scratch/chasten/chasten/checks.py - 3 matches
    • /_scratch/chasten/chasten/database.py - 4 matches
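
The two failing checks in the output above can be reproduced with a few lines of arithmetic. This sketch assumes that chasten compares the total number of matches across all files against inclusive min and max bounds, which is consistent with the output but is not confirmed by it. The function name `count_within_bounds` is hypothetical.

```python
from typing import Optional


def count_within_bounds(
    total: int, minimum: Optional[int], maximum: Optional[int]
) -> bool:
    """Return True when total respects both bounds; None means unbounded."""
    if minimum is not None and total < minimum:
        return False
    if maximum is not None and total > maximum:
        return False
    return True


# per-file match counts copied from the F002 and CL002 output above
f002_matches = [12, 4, 5, 9, 10, 10, 9, 5, 2, 4, 1]
cl002_matches = [3, 1, 2, 2, 1, 3, 4]

# F002 is configured with min=40 and max=70, CL002 with min=1 and max=15
print(sum(f002_matches), count_within_bounds(sum(f002_matches), 40, 70))
print(sum(cl002_matches), count_within_bounds(sum(cl002_matches), 1, 15))
```

Both totals are one above their maximums, which shows how a small change to chasten's own source code is enough to make the test case fail.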

To Reproduce
Steps to reproduce the behavior:

  1. In chasten repo: git checkout e264ae97c4dfa3f408695158bc7fb867ea2cd769
  2. poetry lock; poetry install
  3. poetry run task test-not-randomly (this ensures no other failed test cases clutter up the output)
  4. Previously mentioned test case will fail: tests/test_main.py:118 test_cli_analyze_correct_arguments_analyze_chasten_codebase

Expected behavior
This test case should pass.

Desktop (please complete the following information):

  • OS: NixOS 22.11.2301.cff83d5032a

Proposed Solution

We simply need to remove this hard-coded config and rely on something more static, such as a small sample Python program located in tests whose code will never change, rather than analyzing our own codebase.

This is very meta that we are learning how our tool works because we're using our tool to analyze our tool, isn't it?
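
As a rough sketch of the proposed solution, a test could analyze a fixed sample program instead of chasten's own source code. The sample program, the helper `count_nodes`, and the test name are all hypothetical, and the sketch counts nodes with Python's `ast` module rather than invoking chasten itself.

```python
import ast

# a fixed sample program; because it lives in the test itself, the expected
# counts below do not drift when chasten's own source code changes
SAMPLE_PROGRAM = '''
class Greeter:
    def greet(self, name):
        if name:
            if name.isupper():
                return "HELLO " + name
            return "Hello " + name
        return "Hello"


def test_greet():
    assert Greeter().greet("") == "Hello"
'''


def count_nodes(source: str, node_type: type) -> int:
    """Count the AST nodes of one type in a source code string."""
    return sum(isinstance(node, node_type) for node in ast.walk(ast.parse(source)))


def test_counts_are_stable():
    assert count_nodes(SAMPLE_PROGRAM, ast.ClassDef) == 1
    assert count_nodes(SAMPLE_PROGRAM, ast.FunctionDef) == 2
    assert count_nodes(SAMPLE_PROGRAM, ast.If) == 2
```

With a fixture like this, the min and max values in a test configuration can be exact, because the number of classes, functions, and if statements is known in advance.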

Issue installing pysqlite3 on Windows

When using the command poetry install
fatal error C1083: Cannot open include file: 'sqlite3.h': No such file or directory
ERROR: Failed building wheel for pysqlite3
Failed to build pysqlite3
ERROR: Could not build wheels for pysqlite3, which is required to install pyproject.toml-based projects

When using pip install pysqlite3

Collecting pysqlite3
Using cached pysqlite3-0.5.1.tar.gz (40 kB)
Preparing metadata (setup.py) ... done
Building wheels for collected packages: pysqlite3
Building wheel for pysqlite3 (setup.py) ... error
error: subprocess-exited-with-error

I've tried looking for solutions; however, most of them aren't for Windows. The ones that I've tried are:

  • Updating my build tools
  • Updating Python from 3.11.1 to 3.11.5
  • Installing sqlite3 from the official website to get the files that are reported as missing

Document Pain Points & Solutions Regarding Dev Environments

As a team, we've run into some roadblocks getting everyone up and running with Chasten on their local machines. Of course, this prevents us from progressing at all in closing the gap between where Chasten is currently and where we want it to be in order to be "production-ready".

We need to either:
A. Define and enforce best practices requiring each team member to document pain points they've encountered in getting a dev environment set up on their local machine, as well as any solutions implemented.
B. Task a team member or two with interviewing the team (making sure each operating system is represented) to understand how successful dev environments have been created, and ultimately write documentation that can be utilized by the rest of the team to achieve similar results.
C. Implement some combination of the above two strategies.

Create a Containerized Dev Environment

It may be worth considering utilizing Docker to create a containerized dev environment for working on Chasten. This will allow for added mobility long-term if dependencies get added (or removed) in such a way that local dev environments break; rather than having to worry about the entire team (across several different operating systems) patching local environments, changes could be made to a single containerized environment that can be accessed and utilized by the entire team.

While this is a task that's not strictly necessary, given the trouble we've seen getting the team started on just running Chasten, this may be a worthwhile endeavor to prevent team-wide roadblocks like this in the future.

This is a task that would require some outside investigation into Docker and Dockerfiles, and would be a good issue for those interested in acquiring a new skill/technical proficiency!

Create a GitHub Repository that Can Contain Submodules for All Experimental Subjects

Right now the AstuteSource organization that houses the GitHub repository for
the chasten tool has forks of other Python projects that all use Poetry for
their dependency management and virtual environment creation.

For example, here is a Python project to which we've already applied chasten:

https://github.com/AstuteSource/pls-cli

It would be ideal if all of the forks of these repositories were added as Git
submodules to a repository called, for instance, chasten-open-source-projects.
This would make it easier for someone who wants to apply chasten to a wide
variety of projects because they would only have to clone one repository and
then the submodules inside of that repository.

Improve the Logging Output that the Tool Produces

Right now the chasten tool does not have a sufficient amount of logging
statements inside of the source code. This means that it will be difficult for a
person who is debugging the tool to see what steps it is performing. Ultimately,
each of the key functions needs at least a few additional logging statements!

During the completion of this task, please make sure to balance the number of
logging statements and the details that the logging statements provide against
the way in which logging statements can sometimes increase the complexity of the
code base and make the program's source code more difficult to understand.

Automatically Generate a Chasten Configuration Based on Detected Source Code Patterns

The chasten tool comes pre-configured with internally stored checks.

Once the tool has a significant number of internally stored checks, it should be
possible to scan the source code of a project and detect how many regions match
the specific checks.

This would be particularly useful for an instructor who implements a solution
for a project. This person could then run chasten to detect, for instance,
that there are 19 functions in the modules of the program and that they all
have type annotations and docstring comments.

chasten could then automatically generate a config.yml and checks.yml
that express these and all of the other detected checks. When the instructor
then removes some of these functions or docstrings or type annotations during
the preparation of the "starter repository" for the project, the chasten
checks will start to fail even though they previously passed in the solution
repository.

Ultimately, this would be a super-easy way for a teacher to start using
chasten!
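
A minimal sketch of this idea follows. It assumes that the generated checks reuse the name, code, id, pattern, and count fields that chasten's checks.yml files contain, and it sets min and max to the exact counts detected in the solution. The function name `generate_checks` is hypothetical, the counting uses Python's `ast` module rather than chasten's XPath engine, and writing the result out as YAML is left out.

```python
import ast


def generate_checks(source: str) -> list[dict]:
    """Build check entries whose bounds equal the counts found in source."""
    nodes = list(ast.walk(ast.parse(source)))
    class_count = sum(isinstance(node, ast.ClassDef) for node in nodes)
    function_count = sum(isinstance(node, ast.FunctionDef) for node in nodes)
    detected = [
        ("class-definition", "CDF", "C001", ".//ClassDef", class_count),
        ("all-function-definition", "AFD", "F001", ".//FunctionDef", function_count),
    ]
    return [
        {
            "name": name,
            "code": code,
            "id": check_id,
            "pattern": pattern,
            # exact bounds make the check fail as soon as a region is removed
            "count": {"min": count, "max": count},
        }
        for name, code, check_id, pattern, count in detected
    ]


solution = "def area(w, h):\n    return w * h\n\ndef perimeter(w, h):\n    return 2 * (w + h)\n"
for check in generate_checks(solution):
    print(check)
```

Because the bounds come from the solution, a starter repository that drops one of the two functions would no longer satisfy the generated all-function-definition check.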

Improve Test Coverage for the Entire Codebase

As new features have been added to chasten the test coverage has dropped to,
at last check, 73%! We should add new test cases that focus on effectively
covering all of the code segments that are not currently covered. The ultimate
goal should be to, for instance, increase code coverage to at least 95% unless
there is a convincing argument to be made for why that is not possible.

Provide Summaries for the Counts of Checks Across All Matched Files

When the chasten tool is run on a program with multiple files, it will produce output like this:

🎉 Performing 12 check(s):

  ✓ id: 'CLS001', name: 'class-definition', pattern: './/ClassDef', min=None, max=None
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/lazytracker/lazytracker.py - 1 matches
  ✓ id: 'FUNC001', name: 'all-function-definition', pattern: '//FunctionDef/body/Expr[value/Constant]/following-sibling::*[1] |
//FunctionDef/body[not(Expr/value/Constant)]/*[1]', min=None, max=None
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/tests/test_lazytracker.py - 6 matches
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/tests/test_tracked.py - 2 matches
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/lazytracker/lazytracker.py - 6 matches
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/lazytracker/tracked.py - 3 matches
  ✓ id: 'FUNC002', name: 'all-function-definition-with-docstring', pattern: '//FunctionDef/body/Expr[value/Constant]/following-sibling::*[1]',
min=None, max=None
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/lazytracker/lazytracker.py - 6 matches
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/lazytracker/tracked.py - 1 matches
  ✓ id: 'FUNC003', name: 'all-function-definition-with-no-docstring', pattern: '//FunctionDef/body[not(Expr/value/Constant)]/*[1]', min=None,
max=None
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/tests/test_lazytracker.py - 6 matches
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/tests/test_tracked.py - 2 matches
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/lazytracker/tracked.py - 2 matches
  ✓ id: 'FUNC004', name: 'all-non-test-function-definition', pattern: '//FunctionDef[not(contains(@name,
"test_"))]/body/Expr[value/Constant]/following-sibling::*[1] | //FunctionDef[not(contains(@name, "test_"))]/body[not(Expr/value/Constant)]/*[1]',
min=None, max=None
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/tests/test_lazytracker.py - 2 matches
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/lazytracker/lazytracker.py - 6 matches
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/lazytracker/tracked.py - 3 matches
  ✗ id: 'FUNC004', name: 'all-test-function-definition', pattern: '//FunctionDef[starts-with(@name,
"test_")]/body/Expr[value/Constant]/following-sibling::*[1] | //AsyncFunctionDef[starts-with(@name,
"test_")]/body/Expr[value/Constant]/following-sibling::*[1] | //FunctionDef[starts-with(@name, "test_")]/body[not(Expr/value/Constant)]/*[1] |
//AsyncFunctionDef[starts-with(@name, "test_")]/body[not(Expr/value/Constant)]/*[1]', min=7, max=22
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/tests/test_lazytracker.py - 4 matches
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/tests/test_tracked.py - 2 matches
  ✓ id: 'CTRL001', name: 'single-nested-if-in-function', pattern: './/FunctionDef/body//If', min=None, max=None
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/lazytracker/lazytracker.py - 1 matches
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/lazytracker/tracked.py - 7 matches
  ✓ id: 'CTRL002', name: 'single-nested-if-anywhere-in-module', pattern: './/If', min=None, max=None
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/lazytracker/lazytracker.py - 1 matches
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/lazytracker/tracked.py - 7 matches
  ✓ id: 'CTRL003', name: 'double-nested-if-in-function', pattern: './/FunctionDef/body//If[ancestor::If and not(parent::orelse)]', min=None,
max=None
  ✓ id: 'CTRL004', name: 'double-nested-if-anywhere-in-module', pattern: './/If[ancestor::If and not(parent::orelse)]', min=None, max=None
  ✓ id: 'CTRL005', name: 'single-nested-for-target', pattern: './/For/target/Name', min=None, max=None
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/lazytracker/lazytracker.py - 3 matches
  ✓ id: 'CTRL006', name: 'single-nested-for-target-no-count', pattern: './/For/target/Name', min=None, max=None
    • /home/gkapfham/working/source/astute-subjects/subject-forks/lazytracker/lazytracker/lazytracker.py - 3 matches

✨ Saved the file 'chasten-results-lazytracker-20230823214805-7c28a9801c25437f981cb7e2a51926f0.json'

😓 At least one check did not pass.

This example is small enough that it is easy to tell that, for instance,
all-test-function-definition has a total of 6 matches (4 + 2). However, it would
be useful if the tool's command-line output had sub-totals for the number of matches for a specific check.
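
The requested sub-totals amount to grouping the per-file counts by check name. The sketch below is only an illustration: the record layout and the function name `subtotal_by_check` are hypothetical, and the file names are shortened from the output above.

```python
from collections import defaultdict

# (check name, file name, match count) records taken from the output above
matches = [
    ("all-test-function-definition", "tests/test_lazytracker.py", 4),
    ("all-test-function-definition", "tests/test_tracked.py", 2),
    ("single-nested-if-in-function", "lazytracker/lazytracker.py", 1),
    ("single-nested-if-in-function", "lazytracker/tracked.py", 7),
]


def subtotal_by_check(records):
    """Sum the per-file match counts for each check name."""
    totals = defaultdict(int)
    for check_name, _file_name, count in records:
        totals[check_name] += count
    return dict(totals)


for check_name, total in subtotal_by_check(matches).items():
    print(f"{check_name}: {total} matches in total")
```

Printing a total next to each check would also make it clearer why a check with min and max bounds did not pass.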

Document the Use of the `log` Sub-Command in the README.md

Right now the README.md file does not explain how to use the chasten log command that
starts the syslog server. This documentation will need to explain what a
syslog server is, why it is useful to have a syslog server, and then give an
overview of the kind of output that chasten's syslog server provides.
