gatoreducator / sheetshuttle
:truck: A plugin friendly tool to connect Google Sheets and GitHub
License: MIT License
After the data is retrieved from the API, it should be restructured in an intuitive way and stored in JSON format.
Parent task: #5
2 hrs
@tuduun
transferred to @noorbuchi
Anything else that doesn't fit in previous categories
Knowing test coverage is very important in understanding how effective our test suite is. A well-known pytest plugin for calculating test coverage is pytest-cov.
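One way pytest-cov is commonly wired in is through the project's pyproject.toml; the fragment below is a sketch under the assumption that the package directory is named sheetshuttle and that the project manages dev dependencies with Poetry (adjust the names to match the actual pyproject.toml):

```toml
# Hypothetical pyproject.toml fragment: add pytest-cov as a dev
# dependency and pass coverage flags to pytest by default.
[tool.poetry.dev-dependencies]
pytest-cov = "^3.0"

[tool.pytest.ini_options]
addopts = "--cov=sheetshuttle --cov-report=term-missing"
```

With this in place, `poetry run pytest` would report line coverage and list uncovered lines after every test run.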
Add estimate here
@
Anything else that doesn't fit in previous categories
Read the remaining parts of the configuration file and come up with ways that give the user more control over what data to get from the sheets (open ended)
pluginbase
5 hrs
PluginBase would allow the user to write their own plugins regarding what kinds of data columns to get and how to store them. Needs more investigation.
The current project banner looks a bit funny, but it's also somewhat misleading since it shows the Go language gophers, which could suggest to users that this tool is meant for Go. New ideas and designs would be great to replace the gophers and still create a cool look for our project.
Add estimate here
@
Anything else that doesn't fit in previous categories
Currently, we don't have anything discussing the difference between using either approach to store the tokens and authentication data that users get from GitHub and Google service accounts. It might be worth investigating each to understand how each one is retrieved and organized in the file system, and how they are passed to SheetShuttle as CLI arguments.
Once the differences are understood, documenting them could help users pick one. The documentation can go either in the README or in the docs directory.
We currently have a fairly big problem in SheetShuttle. When the user tries to access cells that are empty, the returned data differs depending on the structure of the table. We also run into big problems when trying to convert that data into a pandas data frame and attempting to specify data types. There is no clear fix for this issue, and we need to investigate and discuss things further after exploring the possible options.
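One possible direction, sketched below under the assumption that the Sheets API returns rows as nested lists with empty trailing cells dropped: pad every data row to the header width before handing it to pandas, so the DataFrame constructor never sees a column-count mismatch. This is an illustration of the idea, not the current SheetShuttle behavior.

```python
import pandas as pd


def to_padded_dataframe(data):
    """Build a DataFrame from ragged sheet rows.

    The Sheets API drops empty trailing cells, so rows can be shorter
    than the header row. Pad each row with empty strings up to the
    header width before constructing the DataFrame.
    """
    headers = data[0]
    width = len(headers)
    padded = [row + [""] * (width - len(row)) for row in data[1:]]
    return pd.DataFrame(padded, columns=headers)


# A ragged example: the second student row is missing its grade cell.
frame = to_padded_dataframe([["name", "grade"], ["Ada", "95"], ["Grace"]])
```

Specifying data types would still need a decision about what an empty string means per column, but padding at least makes the conversion deterministic regardless of table shape.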
The current progress made on the Sheet Collector module has very limited functionality to demonstrate the ability to make calls to the Google Sheets API. More work is needed to collect and organize the data and to allow the user more control over this process. It's important to try and automate as much as possible of this part of the project.
TODOs in the code
Add estimate here
Preferably more than one person
@antlet
@Yanqiao4396
@tuduun
The tool PluginBase can be used in this situation to allow the user to create their own plugins for this part of the tool. This is not a priority for now.
Currently the README.md file doesn't contain any relevant information about the project. It's important for an open source project to have good documentation so other people can understand how to use it and contribute to it.
Add estimate here
@
Markdown linter is not currently part of the project dependencies and it might be needed to run it locally or add it to the project workflow.
#
An error appears when I call the main file with the command poetry run gridgopher. I tried to delete my own file and copy a new one from GitHub, but the error is still there.
Add estimate here
@
Anything else that doesn't fit in previous categories
The user might want to pass custom arguments to their written plugin. There is no current support for this in GridGopher; however, adding this feature is simple and straightforward.
main.py should accept that optional argument, then open and read the JSON file
**kwargs should be included in the run function of the plugin
Add estimate here
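A minimal sketch of what the plugin side could look like, assuming the plugin's entry point is a run function taking the keys file and config directory (the parameter names here are illustrative, not the actual GridGopher signature):

```python
# Hypothetical plugin module: extra user-supplied arguments arrive
# through **kwargs, so existing plugins without custom arguments
# keep working unchanged.
def run(sheets_keys_file, sheets_config_directory, **kwargs):
    # Custom arguments passed by the user end up in kwargs.
    verbose = kwargs.get("verbose", False)
    return {
        "keys": sheets_keys_file,
        "config": sheets_config_directory,
        "verbose": verbose,
    }


# main.py would read the optional JSON file of custom arguments and
# forward them; simulated here with a direct call.
result = run("keys.json", "config/", verbose=True)
```

Because **kwargs is optional at the call site, this change is backward compatible with plugins that ignore custom arguments.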
Anything else that doesn't fit in previous categories
Since some tests make GitHub API calls and modify the state of a sample repository, failures often happen when several test sessions take place at the same time. In order to solve this issue, we need to modify how the tests are run and how assertions are phrased. Until #25 is completed, this issue is on hold.
Further investigation needed
Add estimate here
@
Anything else that doesn't fit in previous categories
To ensure that newly added changes are working and are up to standard, a GitHub Actions workflow should be used.
More information about GitHub Actions can be found here
Add estimate here
@
Anything else that doesn't fit in previous categories
Add description here
Add estimate here
@
Anything else that doesn't fit in previous categories
While testing the GitHub object in integration-style tests, we make API calls that create issues, files, and pull requests on a sample GitHub repo. On a passing run, a test case performs its action and changes some state of the repo, conducts assertions to check that the action was done correctly, and restores the state of the repository by reverting the actions taken. This last process is known as teardown, and in our case it essentially closes the issues and pull requests, or deletes the created files.
The problem occurs when a test fails, preventing the teardown from taking place and polluting the state of the repository, which causes failures in future runs. We need to be able to tear down some tests even when the test case fails.
Pytest provides many ways to mark test cases and create fixtures that execute code before and after a test case is run. It is a bit tricky, but we will need to create a fixture that guarantees that the teardown of marked tests takes place even on failure.
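The standard pytest pattern for this is a yield-style fixture: code after the yield runs as teardown, and pytest executes it even when the test body fails. The sketch below uses stand-in dictionaries instead of real GitHub API calls; the helper names are illustrative, not part of the actual test suite.

```python
import pytest


def open_sample_issue():
    # Stand-in for a GitHub API call that creates an issue.
    return {"title": "integration test issue", "open": True}


def close_sample_issue(issue):
    # Stand-in for the API call that reverts the change.
    issue["open"] = False


@pytest.fixture
def sample_issue():
    # Everything after the yield is teardown: pytest runs it whether
    # the test using this fixture passes or fails, so the repository
    # state is restored either way.
    issue = open_sample_issue()
    yield issue
    close_sample_issue(issue)
```

A test would simply take sample_issue as a parameter; even an assertion failure inside the test body no longer skips the cleanup.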
Further investigation is needed before criteria are established
Add estimate here
@
Anything else that doesn't fit in previous categories
Using autouse=False for a fixture prevents it from being applied automatically
Describe the bug
When retrieving data from Google Sheets that contain merged cells as headers, the retrieval produces an inconsistent number of headers and columns.
To Reproduce
Using the following data:
And the following configuration:
source_id: 1jMbGVHjXs-lQbh5pstplrCOo5f76C_Nj2SOyL-bsZsQ
sheets:
- name: Sheet1
regions:
- name: lab1
start: A1
end: E12
contains_headers: true
This error is produced:
Traceback (most recent call last):
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 982, in _finalize_columns_and_data
columns = _validate_or_indexify_columns(contents, columns)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 1030, in _validate_or_indexify_columns
raise AssertionError(
AssertionError: 2 columns passed, passed data had 5 columns
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/typer/main.py", line 214, in __call__
return get_command(self)(*args, **kwargs)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/typer/main.py", line 497, in wrapper
return callback(**use_params) # type: ignore
File "/home/noboshe/SheetShuttle/SheetShuttle/sheetshuttle/main.py", line 52, in sheetshuttle
my_plugin.run(sheets_keys_file, sheets_config_directory)
File "/home/noboshe/SheetShuttle/SheetShuttle/../sample_plugin.py", line 8, in run
my_collector.collect_files()
File "/home/noboshe/SheetShuttle/SheetShuttle/sheetshuttle/sheet_collector.py", line 135, in collect_files
sheet_obj.collect_regions()
File "/home/noboshe/SheetShuttle/SheetShuttle/sheetshuttle/sheet_collector.py", line 204, in collect_regions
data = Sheet.to_dataframe(region_data)
File "/home/noboshe/SheetShuttle/SheetShuttle/sheetshuttle/sheet_collector.py", line 270, in to_dataframe
return pd.DataFrame(data[1:], columns=data[0])
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 721, in __init__
arrays, columns, index = nested_data_to_arrays(
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 519, in nested_data_to_arrays
arrays, columns = to_arrays(data, columns, dtype=dtype)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 883, in to_arrays
content, columns = _finalize_columns_and_data(arr, columns, dtype)
File "/home/noboshe/.cache/pypoetry/virtualenvs/sheetshuttle-PeNuXww9-py3.9/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 985, in _finalize_columns_and_data
raise ValueError(err) from err
ValueError: 2 columns passed, passed data had 5 columns
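The traceback shows the header row arriving with fewer entries than the data rows (merged cells collapse into a single header value). One possible workaround, sketched here rather than the shipped fix, is to pad a short header row with placeholder names before building the DataFrame:

```python
import pandas as pd


def normalize_headers(data):
    """Pad a short header row (e.g. produced by merged cells) to the
    widest row in the sheet region, then build the DataFrame.

    Placeholder names like "unnamed_2" are invented here for the
    illustration; any naming scheme would do.
    """
    width = max(len(row) for row in data)
    headers = data[0] + [f"unnamed_{i}" for i in range(len(data[0]), width)]
    rows = [row + [""] * (width - len(row)) for row in data[1:]]
    return pd.DataFrame(rows, columns=headers)


# Two merged header cells over five data columns, as in the bug report.
frame = normalize_headers([["lab", "score"], [1, 2, 3, 4, 5]])
```

Whether the padded columns should instead inherit the merged header's value (forward fill) is part of the open design question for this issue.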
@all-contributors please add @noorbuchi for code
There are currently no guidelines for new contributors to the project. Adding some would make the project more inviting for others to work with us. Contribution guidelines should include all the necessary information to set up a development environment and begin work on the project. Additionally, they should inform the user of various quality assurance expectations such as testing, coverage, and linting, as well as describe the techniques we currently use to perform them. Other things to add would be how to go about picking an issue to work on and how to reach the developers with any questions. This will be a living document that gets updated frequently as we establish the guidelines for the project. All members of the development team should agree on its contents.
CONTRIBUTING.md file
8-10 hours
Anything else that doesn't fit in previous categories
Some test cases are somewhat slow because they make calls to the Google Sheets API or the GitHub API. It's not necessary to run these tests every time the developer runs poetry run test or poetry run coverage. Instead, the user should have to specifically request that all tests be run for these tests to execute. Pytest provides an easy way to mark tests in the code and to run specific ones from the CLI.
pyproject.toml to only run unmarked tests
9 hours
Anything else that doesn't fit in previous categories
Describe the bug
When trying to manually create a pull request in the test repository, an error message shows that:
Pull request creation failed. Validation failed: cannot have more than 100 pull requests with the same head_sha.
This is causing two test cases that ensure pull request creation is working to fail.
To Reproduce
Steps to reproduce the behavior:
poetry run task test-verbose
AC-GopherBot/test-1
Expected behavior
The tests are expected to pass.
Currently, the user must manually copy the ID of a Google Sheet into the yaml configuration.
For example, let's say we have a Google Sheet we want to access under this URL:
https://docs.google.com/spreadsheets/d/1WEycC91Qth9SqWvkNP1-F1Fxt7lwyp9ZJVgLxQEUj3k/edit#gid=0
In order for SheetShuttle to find it, the user must provide the id of the sheet copied from the URL which is this:
1WEycC91Qth9SqWvkNP1-F1Fxt7lwyp9ZJVgLxQEUj3k
We want an easier way to do this where the user can just provide the URL and then SheetShuttle can extract the id with the help of some regular expression or other parsing techniques.
This feature should be tested against many possibilities of URLs that can be copied when using Google sheets.
For example, how different will the URL be in read-only mode? In comment-only mode? In many other cases? We should probably account for those.
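A minimal sketch of the extraction, assuming the spreadsheet ID always follows the "/spreadsheets/d/" path segment (the helper name and pattern are illustrative, not an existing SheetShuttle function):

```python
import re

# Spreadsheet IDs are the path segment after "/spreadsheets/d/";
# they contain letters, digits, underscores, and hyphens.
SHEET_ID_PATTERN = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def extract_sheet_id(url: str) -> str:
    match = SHEET_ID_PATTERN.search(url)
    if not match:
        raise ValueError(f"No spreadsheet ID found in: {url}")
    return match.group(1)


sheet_id = extract_sheet_id(
    "https://docs.google.com/spreadsheets/d/"
    "1WEycC91Qth9SqWvkNP1-F1Fxt7lwyp9ZJVgLxQEUj3k/edit#gid=0"
)
```

The test matrix for the different sharing modes would then be a table of URL variants, each asserted to yield the same ID.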
Currently, the command line interface doesn't accept the location of GitHub access token credentials or the directory location for GitHub configuration. Adding those as optional arguments is important.
3 hrs
Anything else that doesn't fit in previous categories
From this code snippet:
my_collector = sheet_collector.SheetCollector()
my_collector.collect_files()
We can now access a region named names from a sheet called students like this:
names_region = my_collector.sheets_data["name_of_config_file_used"].regions["students_names"]
However, this is problematic because it's not clear to the user that we automatically format the region name as sheetName_regionName.
What we would want to happen instead would be something along the lines of this:
names_region = my_collector.sheets_data["name_of_config_file_used"].sheets["students"].regions["names"]
This is much cleaner to read and understand, as well as a more conventional approach.
To add this change, many parts of the code base will likely change, including test cases as well as other code that imports and uses the SheetCollector class.
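The proposed shape of the lookup can be illustrated with plain dictionaries (the real objects would be SheetCollector, Sheet, and Region instances; the data here is invented):

```python
# Nested structure mirroring config file -> sheet -> region, instead
# of the current flat "sheetName_regionName" key scheme.
sheets_data = {
    "name_of_config_file_used": {
        "sheets": {
            "students": {
                "regions": {"names": ["Ada", "Grace"]},
            },
        },
    },
}

# Each level of the chain names exactly one concept, so the caller
# never needs to know about an internal key-formatting convention.
names_region = (
    sheets_data["name_of_config_file_used"]["sheets"]["students"]["regions"]["names"]
)
```

The migration cost is mostly mechanical: every flat key lookup in the code base and tests becomes a two-step sheets/regions lookup.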
Use pandas data frames to store the data.
3 hrs
Anything else that doesn't fit in previous categories
The current documentation does not describe how the user can create a plugin and use the infrastructure and API provided by GridGopher. To assist with this issue, example guides can showcase code snippets with explanations of what they do and how the user can use them in their plugin. A simple tutorial of some sort should cover sheets collection and interacting with GitHub using the GridGopher API.
README.md
Add estimate here
@
Anything else that doesn't fit in previous categories
schemas.md is currently not up to date with the incoming changes to GitHub interactions. The documentation should be updated to help the user understand how schemas will be validated and the expected keys and values.
schemas.md contains a thorough description of all aspects of configuration in GitHub interactions and GitHub objects
Add estimate here
@
Anything else that doesn't fit in previous categories
I am currently working with a team to create a plugin for SheetShuttle that leverages a Discord bot to allow students and faculty to communicate with the bot to get and display information about the course. For example, our vision is to have the bot be able to leverage multiple different Google Sheets that would represent a course schedule, an assignment list, etc. This would require multiple sheets config files to be present in the config/sheet_sources directory. But with our implementation, we would only be accessing one sheet at a time depending on what the student requests, which conflicts with the way SheetShuttle is implemented.
We want to change it from passing the entire config/sheet_sources directory to instead specifying exactly which sheets config file we would be using. Although this may complicate the code in some ways, we feel it would be beneficial overall, as it would then allow SheetShuttle to support holding multiple sheets config files in the config/sheet_sources directory without the need to process data from all of those sheets at the same time. Are we misunderstanding this implementation? If not, is this change feasible?
Now that the plugin system is working, it's time to implement the default functionality of GridGopher. This means that the plugin will utilize the API created by sheet_collector.py
to authenticate the API, collect the data, and process it in a meaningful way. Additionally, the plugin should be able to post content to GitHub.
Add description of the plugin tasks
TODO: criteria for this task are still needed
The plugin is expected to do the following:
Add estimate here
@
Anything else that doesn't fit in previous categories
Although not currently a high-priority task, it would be very nice to have a documentation website that contains all the API references, functions, and instance variables with clear descriptions. The website should generate documentation automatically and be updated automatically when merges are made to the main branch. Many tools exist to do this, but further research is needed to determine which would work best for the project.
More research is needed before they can be determined
October 13th (~2 weeks)
@aveetdesai
@brum0505
@ningerson2002
@BillOchieng
Anything else that doesn't fit in previous categories