gatoreducator / sheetshuttle
:truck: A plugin friendly tool to connect Google Sheets and GitHub
License: MIT License
After the data is retrieved from the API, it should be restructured in an intuitive way and stored in JSON format.
Parent task: #5
2 hrs
@tuduun
transferred to @noorbuchi
Anything else that doesn't fit in previous categories
Knowing test coverage is very important in understanding how effective our test suite is. A well-known pytest plugin for calculating test coverage is pytest-cov.
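One way pytest-cov is commonly wired in is through the project's pyproject.toml; the fragment below is a sketch under the assumption that the package directory is named sheetshuttle and that the project manages dev dependencies with Poetry (adjust the names to match the actual pyproject.toml):

```toml
# Hypothetical pyproject.toml fragment: add pytest-cov as a dev
# dependency and pass coverage flags to pytest by default.
[tool.poetry.dev-dependencies]
pytest-cov = "^3.0"

[tool.pytest.ini_options]
addopts = "--cov=sheetshuttle --cov-report=term-missing"
```

With this in place, `poetry run pytest` would report line coverage and list uncovered lines after every test run.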
Add estimate here
@
Anything else that doesn't fit in previous categories
Read the remaining parts of the configuration file and come up with ways that give the user more control over what data to get from the sheets (open ended)
pluginbase
5 hrs
PluginBase would allow the user to write their own plugins regarding what kinds of data columns to get and how to store them. Needs more investigation.
The current project banner looks a bit funny, but it's also somewhat misleading since it shows the Go language gophers, which could suggest to users that this tool is meant for Go. New ideas and designs would be great to replace the gophers and still create a cool look for our project.
Add estimate here
@
Anything else that doesn't fit in previous categories
Currently, we don't have anything discussing the difference between using either approach to store the tokens and authentication data that users get from GitHub and Google service accounts. It might be worth investigating each to understand how each one is retrieved and organized in the file system, and how they are passed to SheetShuttle as CLI arguments.
Once the differences are understood, documenting them could help users pick one. The documentation can go either in the README or in the docs directory.
We currently have a fairly big problem in SheetShuttle. When the user tries to access cells that are empty, the returned data differs depending on the structure of the table. We also run into big problems when trying to convert that data into a pandas data frame and attempting to specify data types. There is no clear fix for this issue, and we need to investigate and discuss things further after exploring the possible options.
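One possible direction, sketched below under the assumption that the Sheets API returns rows as nested lists with empty trailing cells dropped: pad every data row to the header width before handing it to pandas, so the DataFrame constructor never sees a column-count mismatch. This is an illustration of the idea, not the current SheetShuttle behavior.

```python
import pandas as pd


def to_padded_dataframe(data):
    """Build a DataFrame from ragged sheet rows.

    The Sheets API drops empty trailing cells, so rows can be shorter
    than the header row. Pad each row with empty strings up to the
    header width before constructing the DataFrame.
    """
    headers = data[0]
    width = len(headers)
    padded = [row + [""] * (width - len(row)) for row in data[1:]]
    return pd.DataFrame(padded, columns=headers)


# A ragged example: the second student row is missing its grade cell.
frame = to_padded_dataframe([["name", "grade"], ["Ada", "95"], ["Grace"]])
```

Specifying data types would still need a decision about what an empty string means per column, but padding at least makes the conversion deterministic regardless of table shape.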
The current progress made on the Sheet Collector module has very limited functionality to demonstrate the ability to make calls to the Google Sheets API. More work is needed to collect and organize the data and to allow the user more control over this process. It's important to try and automate as much as possible of this part of the project.
TODOs in the code
Add estimate here
Preferably more than one person
@antlet
@Yanqiao4396
@tuduun
The tool PluginBase can be used in this situation to allow the user to create their own plugins for this part of the tool. This is not a priority for now.
Currently the README.md file doesn't contain any relevant information about the project. It's important for an open source project to have good documentation so other people can understand how to use it and contribute to it.
Add estimate here
@
Markdown linter is not currently part of the project dependencies and it might be needed to run it locally or add it to the project workflow.
#
An error appears when I call the main file with the command poetry run gridgopher. I tried to delete my own file and copy a new one from GitHub, but the error is still there.
Add estimate here
@
Anything else that doesn't fit in previous categories
The user might want to pass custom arguments to their written plugin. There is no current support for this in GridGopher; however, adding this feature is simple and straightforward.
main.py should accept that optional argument, then open and read the JSON file
**kwargs should be included in the run function of the plugin
Add estimate here
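A minimal sketch of what the plugin side could look like, assuming the plugin's entry point is a run function taking the keys file and config directory (the parameter names here are illustrative, not the actual GridGopher signature):

```python
# Hypothetical plugin module: extra user-supplied arguments arrive
# through **kwargs, so existing plugins without custom arguments
# keep working unchanged.
def run(sheets_keys_file, sheets_config_directory, **kwargs):
    # Custom arguments passed by the user end up in kwargs.
    verbose = kwargs.get("verbose", False)
    return {
        "keys": sheets_keys_file,
        "config": sheets_config_directory,
        "verbose": verbose,
    }


# main.py would read the optional JSON file of custom arguments and
# forward them; simulated here with a direct call.
result = run("keys.json", "config/", verbose=True)
```

Because **kwargs is optional at the call site, this change is backward compatible with plugins that ignore custom arguments.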
Anything else that doesn't fit in previous categories
Since some tests make GitHub API calls and modify the state of a sample repository, failures often happen when several test sessions take place at the same time. In order to solve this issue, we need to modify how the tests are run and how assertions are phrased. Until #25 is completed, this issue is on hold.
Further investigation needed
Add estimate here
@
Anything else that doesn't fit in previous categories
To ensure that newly added changes are working and are up to standard, a GitHub Actions workflow should be used.
More information about GitHub Actions can be found here
Add estimate here
@
Anything else that doesn't fit in previous categories
Add description here
Add estimate here
@
Anything else that doesn't fit in previous categories
While testing the GitHub object in integration-style tests, we make API calls that create issues, files, and pull requests on a sample GitHub repo. On a passing run, a test case performs its action and changes some state of the repo, conducts assertions to check that the action was done correctly, and restores the state of the repository by reverting the actions taken. This last process is known as teardown, and in our case it essentially closes the issues and pull requests, or deletes the created files.
The problem occurs when a test fails, preventing the teardown from taking place and polluting the state of the repository, which causes failures in future runs. We need to be able to tear down some tests even when the test case fails.
Pytest provides many ways to mark test cases and create fixtures that execute code before and after a test case is run. It is a bit tricky, but we will need to create a fixture that guarantees that the teardown of marked tests takes place even on failure.
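The standard pytest pattern for this is a yield-style fixture: code after the yield runs as teardown, and pytest executes it even when the test body fails. The sketch below uses stand-in dictionaries instead of real GitHub API calls; the helper names are illustrative, not part of the actual test suite.

```python
import pytest


def open_sample_issue():
    # Stand-in for a GitHub API call that creates an issue.
    return {"title": "integration test issue", "open": True}


def close_sample_issue(issue):
    # Stand-in for the API call that reverts the change.
    issue["open"] = False


@pytest.fixture
def sample_issue():
    # Everything after the yield is teardown: pytest runs it whether
    # the test using this fixture passes or fails, so the repository
    # state is restored either way.
    issue = open_sample_issue()
    yield issue
    close_sample_issue(issue)
```

A test would simply take sample_issue as a parameter; even an assertion failure inside the test body no longer skips the cleanup.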
Further investigation is needed before criteria are established
Add estimate here
@
Anything else that doesn't fit in previous categories
Using autouse=False for a fixture prevents it from being applied automatically
Describe the bug
When retrieving data from Google Sheets that contain merged cells as headers, the retrieval produces an inconsistent number of headers and columns.
To Reproduce
Using the following data:
And the following configuration:
source_id: 1jMbGVHjXs-lQbh5pstplrCOo5f76C_Nj2SOyL-bsZsQ
sheets:
- name: Sheet1
regions:
- name: lab1
start: A1
end: E12
contains_headers: true
This error is produced:
Traceback (most recent call last):
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 982, in _finalize_columns_and_data
columns = _validate_or_indexify_columns(contents, columns)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 1030, in _validate_or_indexify_columns
raise AssertionError(
AssertionError: 2 columns passed, passed data had 5 columns
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/typer/main.py", line 214, in __call__
return get_command(self)(*args, **kwargs)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/typer/main.py", line 497, in wrapper
return callback(**use_params) # type: ignore
File "/home/noboshe/SheetShuttle/SheetShuttle/sheetshuttle/main.py", line 52, in sheetshuttle
my_plugin.run(sheets_keys_file, sheets_config_directory)
File "/home/noboshe/SheetShuttle/SheetShuttle/../sample_plugin.py", line 8, in run
my_collector.collect_files()
File "/home/noboshe/SheetShuttle/SheetShuttle/sheetshuttle/sheet_collector.py", line 135, in collect_files
sheet_obj.collect_regions()
File "/home/noboshe/SheetShuttle/SheetShuttle/sheetshuttle/sheet_collector.py", line 204, in collect_regions
data = Sheet.to_dataframe(region_data)
File "/home/noboshe/SheetShuttle/SheetShuttle/sheetshuttle/sheet_collector.py", line 270, in to_dataframe
return pd.DataFrame(data[1:], columns=data[0])
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 721, in __init__
arrays, columns, index = nested_data_to_arrays(
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 519, in nested_data_to_arrays
arrays, columns = to_arrays(data, columns, dtype=dtype)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 883, in to_arrays
content, columns = _finalize_columns_and_data(arr, columns, dtype)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 985, in _finalize_columns_and_data
raise ValueError(err) from err
ValueError: 2 columns passed, passed data had 5 columns
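The traceback shows the header row arriving with fewer entries than the data rows (merged cells collapse into a single header value). One possible workaround, sketched here rather than the shipped fix, is to pad a short header row with placeholder names before building the DataFrame:

```python
import pandas as pd


def normalize_headers(data):
    """Pad a short header row (e.g. produced by merged cells) to the
    widest row in the sheet region, then build the DataFrame.

    Placeholder names like "unnamed_2" are invented here for the
    illustration; any naming scheme would do.
    """
    width = max(len(row) for row in data)
    headers = data[0] + [f"unnamed_{i}" for i in range(len(data[0]), width)]
    rows = [row + [""] * (width - len(row)) for row in data[1:]]
    return pd.DataFrame(rows, columns=headers)


# Two merged header cells over five data columns, as in the bug report.
frame = normalize_headers([["lab", "score"], [1, 2, 3, 4, 5]])
```

Whether the padded columns should instead inherit the merged header's value (forward fill) is part of the open design question for this issue.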
@all-contributors please add @noorbuchi for code
There are currently no guidelines for new contributors to the project. Adding some would make the project more inviting for others to work with us. Contribution guidelines should include all the necessary information to set up a development environment and begin work on the project. Additionally, they should inform the user of various quality assurance expectations such as testing, coverage, and linting, as well as describe the techniques we currently use to perform them. Other things to add would be how to go about picking an issue to work on and how to reach the developers with any questions. This will be a living document that gets updated frequently as we establish the guidelines for the project. All members of the development team should agree on its contents.
CONTRIBUTING.md file
8-10 hours
Anything else that doesn't fit in previous categories
Some test cases are somewhat slow because they make calls to the Google Sheets API or the GitHub API. It's not necessary to run these tests every time the developer runs poetry run test or poetry run coverage. Instead, the user should have to specifically request that all tests be run for these tests to execute. Pytest provides an easy way to mark tests in the code and to run specific ones from the CLI.
pyproject.toml to only run unmarked tests
9 hours
Anything else that doesn't fit in previous categories
Describe the bug
When trying to manually create a pull request in the test repository, an error message shows that:
Pull request creation failed. Validation failed: cannot have more than 100 pull requests with the same head_sha.
This is causing two test cases that ensure pull request creation is working to fail.
To Reproduce
Steps to reproduce the behavior:
poetry run task test-verbose
AC-GopherBot/test-1
Expected behavior
The tests are expected to pass.
Currently, the user must manually copy the ID of a Google Sheet into the yaml configuration.
For example, let's say we have a Google Sheet we want to access under this URL:
https://docs.google.com/spreadsheets/d/1WEycC91Qth9SqWvkNP1-F1Fxt7lwyp9ZJVgLxQEUj3k/edit#gid=0
In order for SheetShuttle to find it, the user must provide the id of the sheet copied from the URL which is this:
1WEycC91Qth9SqWvkNP1-F1Fxt7lwyp9ZJVgLxQEUj3k
We want an easier way to do this where the user can just provide the URL and then SheetShuttle can extract the id with the help of some regular expression or other parsing techniques.
This feature should be tested against many possibilities of URLs that can be copied when using Google sheets.
For example, how different will the URL be in read-only mode? In comment-only mode? In many other cases? We should probably account for those.
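A minimal sketch of the extraction, assuming the spreadsheet ID always follows the "/spreadsheets/d/" path segment (the helper name and pattern are illustrative, not an existing SheetShuttle function):

```python
import re

# Spreadsheet IDs are the path segment after "/spreadsheets/d/";
# they contain letters, digits, underscores, and hyphens.
SHEET_ID_PATTERN = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def extract_sheet_id(url: str) -> str:
    match = SHEET_ID_PATTERN.search(url)
    if not match:
        raise ValueError(f"No spreadsheet ID found in: {url}")
    return match.group(1)


sheet_id = extract_sheet_id(
    "https://docs.google.com/spreadsheets/d/"
    "1WEycC91Qth9SqWvkNP1-F1Fxt7lwyp9ZJVgLxQEUj3k/edit#gid=0"
)
```

The test matrix for the different sharing modes would then be a table of URL variants, each asserted to yield the same ID.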
Currently, the command line interface doesn't accept the location of GitHub access token credentials or the directory location for GitHub configuration. Adding those as optional arguments is important.
3 hrs
Anything else that doesn't fit in previous categories
From this code snippet:
my_collector = sheet_collector.SheetCollector()
my_collector.collect_files()
We can now access a region named names from a sheet called students like this:
names_region = my_collector.sheets_data["name_of_config_file_used"].regions["students_names"]
However, this is problematic because it's not clear to the user that we automatically format the region name as sheetName_regionName.
What we would want to happen instead would be something along the lines of this:
names_region = my_collector.sheets_data["name_of_config_file_used"].sheets["students"].regions["names"]
This is much cleaner to read and understand, as well as a more conventional approach.
To add this change, many parts of the code base will likely change, including test cases as well as other code that imports and uses the SheetCollector class.
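The proposed shape of the lookup can be illustrated with plain dictionaries (the real objects would be SheetCollector, Sheet, and Region instances; the data here is invented):

```python
# Nested structure mirroring config file -> sheet -> region, instead
# of the current flat "sheetName_regionName" key scheme.
sheets_data = {
    "name_of_config_file_used": {
        "sheets": {
            "students": {
                "regions": {"names": ["Ada", "Grace"]},
            },
        },
    },
}

# Each level of the chain names exactly one concept, so the caller
# never needs to know about an internal key-formatting convention.
names_region = (
    sheets_data["name_of_config_file_used"]["sheets"]["students"]["regions"]["names"]
)
```

The migration cost is mostly mechanical: every flat key lookup in the code base and tests becomes a two-step sheets/regions lookup.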
Use pandas data frames to store the data.
3 hrs
Anything else that doesn't fit in previous categories
The current documentation does not describe how the user can create a plugin and use the infrastructure and API provided by GridGopher. To assist with this issue, example guides can showcase code snippets with explanations of what they do and how the user can use them in their plugin. A simple tutorial of some sort should cover sheets collection and interacting with GitHub using the GridGopher API.
README.md
Add estimate here
@
Anything else that doesn't fit in previous categories
schemas.md is currently not up to date with the incoming changes to GitHub interactions. The documentation should be updated to help the user understand how schemas will be validated and the expected keys and values.
schemas.md contains a thorough description of all aspects of configuration in GitHub interactions and GitHub objects
Add estimate here
@
Anything else that doesn't fit in previous categories
I am currently working with a team to create a plugin for SheetShuttle that leverages a Discord bot to allow students and faculty to communicate with the bot to get and display information about the course. For example, our vision is to have the bot be able to leverage multiple different Google Sheets that would represent a course schedule, an assignment list, etc. This would require multiple sheets config files to be present in the config/sheet_sources directory. But with our implementation, we would only be accessing one sheet at a time depending on what the student requests, which conflicts with the way SheetShuttle is implemented.
We want to change it from passing the entire config/sheet_sources directory to instead specifying exactly which sheets config file we would be using. Although this may complicate the code in some ways, we feel it would be beneficial overall, as it would then allow SheetShuttle to support holding multiple sheets config files in the config/sheet_sources directory without the need to process data from all of those sheets at the same time. Are we misunderstanding this implementation? If not, is this change feasible?
Now that the plugin system is working, it's time to implement the default functionality of GridGopher. This means that the plugin will utilize the API created by sheet_collector.py
to authenticate the API, collect the data, and process it in a meaningful way. Additionally, the plugin should be able to post content to GitHub.
Add description of the plugin tasks
TODO: criteria for this task are still needed
The plugin is expected to do the following:
Add estimate here
@
Anything else that doesn't fit in previous categories
Although not currently a high-priority task, it would be very nice to have a documentation website that contains all the API references, functions, and instance variables with clear descriptions. The website should generate documentation automatically and be updated automatically when merges are made to the main branch. Many tools exist to do this, but further research is needed to determine which would work best for the project.
More research is needed before they can be determined
October 13th (~2 weeks)
@aveetdesai
@brum0505
@ningerson2002
@BillOchieng
Anything else that doesn't fit in previous categories