swirlypy's Introduction

Foreword (A Warning)

Quoting @WilCrofter on Swirlypy, April 09, 2019, in response to a recently-opened issue:

Swirlypy was an initial prototype created one afternoon by @alexander-bauer at the behest of @WilCrofter (me) and @reginaastri, two of the original swirl developers. We have never followed up on swirlypy and, other than proof-of-concept material, there is no real course material associated with it.

We left the prototype up on GitHub in case a future developer wanted to follow up. Swirl's lead developer, @seankross, has expressed interest in that possibility, but the likely candidates have been preoccupied with other projects.

Swirl itself is currently a very mature project with a great deal of course material. Swirl uses the R programming language and emphasizes statistics and data science.

If you are primarily interested in interactive coursework and not committed to Python, I'd suggest looking into swirl.

If you are committed to Python and primarily interested in course material, I'd suggest looking into Jupyter, beginning with the examples at Binder, which can be used in a browser.

Of course you are welcome to pick up swirlypy as a developer, but I would again suggest looking at Swirl or Jupyter first.

As of this writing in April of 2019, Swirlypy is a proof-of-concept that has been left undeveloped for a handful of years. Though it and the Swirl project that it was inspired by have been important in my life, I can no longer claim to be an active maintainer of Swirlypy.

For any developers who are interested in the prospect of continuing Swirlypy where I have left it off, I am still alive and well, and available through GitHub and email to answer questions and justify my design choices. The code, though dense in some places, is commented reasonably well, and engineered initially with extensibility in mind. (Perhaps it was over-engineered and over-designed. Only time would tell.)


For Developers

swirlypy is a Python package, meaning that its directory must be located somewhere in your Python path. For individuals with sane directory structures, this likely means temporarily adding the path to the directory above swirlypy's to your $PYTHONPATH. Alternatively, you could add a symlink from an existing Python directory. Eventually, we should be able to install swirlypy as a package and avoid this issue, but for the moment this is the workaround.
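The mechanism can be demonstrated with a stand-in package named swirlypy_demo (hypothetical, created in a temporary directory here); for swirlypy itself you would add the directory *above* your swirlypy checkout to $PYTHONPATH in the same way:

```shell
# Create a throwaway directory containing a minimal Python package.
pkgroot=$(mktemp -d)
mkdir "$pkgroot/swirlypy_demo"
echo 'VERSION = "0.1"' > "$pkgroot/swirlypy_demo/__init__.py"

# Prepend the directory *containing* the package to PYTHONPATH...
export PYTHONPATH="$pkgroot:$PYTHONPATH"

# ...after which the package is importable from anywhere.
python3 -c 'import swirlypy_demo; print(swirlypy_demo.VERSION)'
```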

Creating a Course

Swirlypy courses are distributed as tar archives (compressed or not) with a particular directory structure. They are required to have a course.yaml file, which describes the course in general. In addition, they must contain a lessons directory, with lesson files (see below).

Running a Course

For the purposes of development and testing, it is possible to run Swirlypy in a Python 3 virtual environment. The following steps, run from the repository root, set one up:

virtualenv -p python3 env
env/bin/pip install --editable .

env/bin/swirlytool run courses/intro

If you activate the virtual environment with source env/bin/activate, you won't need to prefix pip or swirlytool with env/bin/.

Note: Remember to specify the containing directory, not course.yaml, for unpackaged courses.

Course Data

The course.yaml file must be present in the root of the course, and contain the following fields: course (course title), lessonnames (list of human-readable lesson names), and author (human readable author name or names). It may also contain: description (explanatory text), organization (name of the course's sponsoring organization), version (a string, usually of numbers), and published (a timestamp in YAML format). An example is available here.
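A minimal course.yaml consistent with the field list above might look like the following (titles, names, and values are invented for illustration):

```yaml
course: Introduction to Python
lessonnames:
  - Basics in Statistics
  - Plotting
author: A. Author
description: A short example course.   # optional
organization: Example University       # optional
version: "0.1"                         # optional
published: 2014-06-01 12:00:00         # optional, YAML timestamp
```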

Lesson files

Lessons are YAML files contained in the lessons/ subdirectory. Their filenames are "sluggified," meaning that characters outside lowercase ASCII letters and digits are replaced by dashes, and letters are lowercased. For example, a lesson called "Basics in Statistics" will be in a file named basics-in-statistics.yaml.
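The naming rule can be sketched as a small helper (a hypothetical reconstruction matching the example above; swirlypy's actual slug code may differ):

```python
import re

def sluggify(title):
    # Lowercase the title, collapse runs of characters outside [a-z0-9]
    # into single dashes, and trim any dashes from the ends.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

print(sluggify("Basics in Statistics"))  # basics-in-statistics
```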

Each lesson is, itself, simply a list (what YAML calls a sequence) of questions. Fields at the root of lessons are not case sensitive, and an example lesson can be seen here.

Questions

Questions are, under the hood, all descended from a particular Python class. As such, they share certain properties, including the way they are parsed from YAML. Fields at the root are not case sensitive, and they are used as keyword arguments to construct Questions matching the listed category. For example, a Question of the "text" category will construct a TextQuestion.

The exact fields required by each question are determined by the type of question, but they at least require Category and Output. All of the questions in the standard library can be found here.
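A lesson file consistent with the description above might look like this (only Category and Output are documented requirements; the question wording is invented for illustration):

```yaml
- Category: text
  Output: Welcome! This question just prints text and waits.
- Category: text
  Output: |
    Each mapping in this sequence becomes one question, and its
    fields are passed as keyword arguments to the Question class.
```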

Furthermore, new questions can be defined within courses by placing them within a questions subdirectory, the same as with the standard library.

Packaging your Course

The swirlytool application that comes with Swirlypy is capable of packaging a course by using the create subcommand. This produces a Swirlypy course file, which is just a gzipped tar file with a particular format.

swirlypy's People

Contributors

alexander-bauer, wilcrofter, ajschumacher


swirlypy's Issues

Write a fill-in-the-blanks question type which tolerates typos.

Peter Norvig wrote a 21-line Python spelling corrector which covers 80–90% of English misspellings. The core of it is edits1:

alphabet = "abcdefghijklmnopqrstuvwxyz"  # implied by the original; needed for edits1 to run

def edits1(word):
    splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts    = [a + c + b     for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

Incorporating a simple spelling corrector such as this one could make fill-in-the-blank questions viable.
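A tolerant answer check built on edits1 could accept any submission within one edit of the expected answer. The sketch below is self-contained (it repeats Norvig's edits1); matches_with_typos is an invented helper, not part of swirlypy:

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    # Norvig's one-edit neighborhood, as quoted above.
    splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts    = [a + c + b     for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def matches_with_typos(submission, answer):
    # Accept exact matches, or anything one edit away from the answer.
    s = submission.strip().lower()
    return s == answer or s in edits1(answer)

print(matches_with_typos("statistcs", "statistics"))  # True (one deletion)
print(matches_with_typos("stats", "statistics"))      # False
```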

Recording.py handles name errors by printing a traceback, but exits the console interaction

Just one question in this lesson:

Swirlypy Test Lessons by Swirlypy Authors
1: Debugging
Selection: 1
Just type a valid command.
>>> x
Traceback (most recent call last):
  File "<console>", line 1, in <module>
NameError: name 'x' is not defined

ast:  Interactive(body=[Expr(value=Name(id='x', ctx=Load()))])

added:  {}
changed:  {}
removed:  {}

values:  []
coursedir:  ../swirlypy_test_lessons

Lesson complete!

Create common "UserShell" virtual question

This would capture a request each time the user enters a command, parse it as an AST, and capture the result if appropriate, then pass it to test_result to be evaluated.

Work towards a flexible grading system

Questions should at least be able to collect statistics and numbers, such as number of tries before correct answer. Maybe what the incorrect answers were, in some cases.

I've just pushed some new changes that add a mutable 'data' field that's passed nicely between objects, so we'll be able to make use of that to collect statistics.

Decouple methods for user input from program control.

The approach sketchily indicated in question_classes.py couples methods for getting user input with other stages of processing a question, e.g., with testing, and with control. A clean, extensible design should decouple function wherever practical.

I suggest we chuck this approach, borrowed from R swirl, in favor of simple classes solely for getting user input. These could be coordinated with other basic functions, e.g., testing, at a higher level.

E.g., perhaps a user response should be an object with a single method, get.

Consider passing around environments between questions

We already have a mechanism for this: data. We could have a data["userenv"] or something along those lines managed in a standard way across questions. That would allow questions within a lesson to share the environment.
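The convention might look like this in practice (the "userenv" key is the proposal above, not existing API):

```python
data = {}

# One question stores the user's working environment under a standard key...
data.setdefault("userenv", {})["x"] = 10

# ...and a later question in the same lesson reads the shared environment back.
print(data["userenv"]["x"])  # 10
```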

Create question type to capture shell expressions

There are two types of questions frequently discussed that involve capturing shell output. Here, I am specifically referring to expressions, which don't assign their value to anything. For example, x**2, as opposed to y = x**2. I want to be able to treat expressions like that as submissions, without requiring the user to explicitly exit the shell.

Most of the code, as of this writing, exists, including the InteractiveConsole callbacks and all of the fun stuff. However, at its core, InteractiveConsole uses exec, which is notoriously difficult to get output from. My best advance thus far has been to use the ast module to rewrite any line of code parsed as an Expr so that its value is appended to a special variable. Documentation is sparse, though.
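The core of the ast-based approach can be sketched as follows: parse a line in interactive ("single") mode and check whether it is a bare Expr node, i.e. an expression whose value is discarded rather than assigned (is_bare_expression is a hypothetical helper, not swirlypy's actual code):

```python
import ast

def is_bare_expression(source):
    # Parse as interactive input and report whether the sole statement
    # is a bare expression (no assignment target), which a shell question
    # could then capture as a submission.
    tree = ast.parse(source, mode="single")
    return len(tree.body) == 1 and isinstance(tree.body[0], ast.Expr)

print(is_bare_expression("x**2"))      # True
print(is_bare_expression("y = x**2"))  # False
```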

Make it easier to require() attributes

In the Question class, there's a require() function intended to make it simple to require particular attributes in a finished Question subclass. However, using it requires overriding __init__(). It would be preferable to just set _required_ to a list, or something along those lines.
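The declarative style proposed above might look like this (a hypothetical sketch; class and field names are illustrative, not swirlypy's actual code):

```python
class Question:
    # Subclasses declare required fields as a class attribute instead of
    # overriding __init__() to call require().
    _required_ = []

    def __init__(self, **fields):
        missing = [name for name in self._required_ if name not in fields]
        if missing:
            raise ValueError("missing required fields: " + ", ".join(missing))
        self.__dict__.update(fields)

class TextQuestion(Question):
    _required_ = ["category", "output"]

q = TextQuestion(category="text", output="Hello")
print(q.output)  # Hello
```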

Make parsing of YAML input a bit saner

As it stands, it's somewhat nonstandard across questions to make use of YAML's abilities, such as producing Lists. Instead, this is implemented in some places as splitting a string by semicolon. We should eliminate this behavior and replace it with use of YAML-standard features.
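For example, where a field is currently encoded as a delimited string, a YAML-native sequence expresses the same data directly (the field name is invented for illustration):

```yaml
# nonstandard: a list encoded as a semicolon-delimited string
choices: "red;green;blue"
---
# YAML-native sequence expressing the same list
choices:
  - red
  - green
  - blue
```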

Plugins for questions and tests

We'll be wanting a plugin system so that course authors, as well as swirlypy core contributors, can easily produce new types of questions which test user answers in new ways. Ideally, we could do this by dropping Python source in a directory and defining classes.

Possible code jam topics as issues?

I had the half-serious idea of a swirlypy-enhanced stack exchange. UMBC students would post Q&A's and developers would code swirlypy units for problems which rise to the top. According to a casual remark, prototyping such a site would make a good weekend project.

Though not meaning to push that particular topic, I like the idea of swirlypy code jams. It's not hard to think up topics, and it seems natural to post them here with code jam tags.

Requesting comment.

Problems installing in windows

Hi,

I'm trying to install swirlypy on Windows, but I've run into a problem.
pip install --editable . reports that it successfully installed swirlypy, but the swirlytool file created in the environment is only 1 KB, and when I try to execute swirlytool run courses/intro I get this error:
"swirlytool" is not recognized as an internal or external command,
operable program or batch file.

Can I get some help?
Thanks.

Show warning if question requirements not explicitly set

As it stands, _required_ is set to [] by default for all Questions. However, basic_selftest contains a section to show a warning if the question requirements are not explicitly set. Because of this default, basic_selftest will never show a warning. We should figure out a way to ensure that the warning can be shown if _required_ is not set, but for things to otherwise not fail.

Smooth out archival packaging of courses

Ideally, courses could be distributed as a single file. That would be a .tar.gz or .zip file, preferably the former, containing the entire course directory. The Course.load() method could be extended to handle these easily using the tarfile module.

A sensible directory structure within the archive might be

.
└── test
    ├── course.yaml
    └── lessons
        ├── basics.yaml
        └── advanced.yaml

The main downside to this is that in order to load the archive, you have to figure out its subdirectory's name. In this case, that is test. It would make it very simple to extract them, though, if they were to be shared. To solve the first issue, we could enforce that test.tar.gz's subdirectory's name is test. That could always be a slugified version of the course title.
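The tarfile-based approach can be sketched end to end: package a course directory, then discover the archive's top-level directory name from its members (the layout mirrors the tree above; file contents are invented):

```python
import os
import tarfile
import tempfile

# Build a throwaway course directory matching the tree shown above.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "test", "lessons"))
with open(os.path.join(root, "test", "course.yaml"), "w") as f:
    f.write("course: Test\n")

# Package it as a gzipped tar, the proposed distribution format.
archive = os.path.join(root, "test.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(os.path.join(root, "test"), arcname="test")

# Loading: the top-level directory name must be discovered from the members.
with tarfile.open(archive, "r:gz") as tar:
    top = sorted(tar.getnames())[0].split("/")[0]
print(top)  # test
```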

Does Recording.py really need to deepcopy twice?

Commit 119e4858 fixes a bug causing failure to detect changes to dictionaries, lists, etc., made in response to shell questions.

Replacing self.locals.copy with deepcopy(self.locals) on line 124 of Recording.py was a quick fix, but it results in two separate invocations of deepcopy in the same function, which is likely redundant and inefficient.
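For context, the bug the commit fixed comes from shallow copies sharing nested objects, so in-place mutations go undetected (a minimal illustration, not swirlypy's actual code):

```python
from copy import deepcopy

env = {"xs": [1, 2]}
shallow = env.copy()       # shares the inner list with env
snapshot = deepcopy(env)   # fully independent copy

env["xs"].append(3)        # mutate a nested object in place

print(shallow == env)   # True  -- shallow copy shares the list; change missed
print(snapshot == env)  # False -- deepcopy preserved the old state
```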

This issue is meant as a reminder to revisit the fix as soon as time permits.

Modernize NewVariableQuestion

As it stands, NewVariableQuestion asks the user to submit their current environment with CTRL-D, but recent advances in lexical parsing and InteractiveConsole control flow could make this submission automatic.

Improve course and lesson self-tests

We need a couple things to say that self-tests are working, although the infrastructure is now in place.

  • Courses need to self-test for metadata
  • The validate() tool needs to be more robust
  • All default questions need to include automatic self-tests

To add a self-test to a Question class, define validate(self), which returns an error message if there is a problem and None otherwise.
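The validate() convention might look like this (a sketch; the class, field, and message are illustrative, not swirlypy's actual code):

```python
class TextQuestion:
    def __init__(self, output=None):
        self.output = output

    def validate(self):
        # Return an error message if something is wrong, None otherwise.
        if not self.output:
            return "TextQuestion requires a non-empty Output field"
        return None

print(TextQuestion(output="Hi").validate())  # None
print(TextQuestion().validate())             # TextQuestion requires a non-empty Output field
```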

NewVariableQuestion reaches exception

Here we create a variable for use in subsequent questions. Set x=10
Press CTRL-D to submit.
>>> 
set()
Traceback (most recent call last):
  File "utils/swirlytool.py", line 115, in <module>
    sys.exit(main(parse(sys.argv[1:])))
  File "utils/swirlytool.py", line 86, in main
    args.func(args)
  File "utils/swirlytool.py", line 41, in run
    course.execute()
  File "/home/sasha/dev/python/swirlypy/course.py", line 120, in execute
    self.execute_lesson(identifier)
  File "/home/sasha/dev/python/swirlypy/course.py", line 139, in execute_lesson
    data = lesson.execute()
  File "/home/sasha/dev/python/swirlypy/lesson.py", line 20, in execute
    new_data = question.execute(data=data)
  File "/home/sasha/dev/python/swirlypy/question.py", line 117, in execute
    testresult = self.test_response(resp, data=data)
  File "/home/sasha/dev/python/swirlypy/questions/NewVariable.py", line 20, in test_response
    for newval in mustaddvals:
TypeError: 'int' object is not iterable

globals() rather than self.locals?

print(globals()) rather than print(self.locals), since the user works on globals?

Actually, I don't think the callback is actually firing when I enter, e.g., a = 5. Change the body to print("hi") or print(globals()) to see. We probably have to override some console method, or register a listener or something.

I failed to notice that hookcount, which doesn't print, was the callback. (Pity the poor lawn maintenance crew.) Works perfectly. Elegant. Closing issue.

Decide on standard place for userdata

There is a variable data passed between Questions within a Lesson, but it should not be used directly for storing user data. There should be a sub-variable within it to contain locals, and another for storing pristine data.

Installation and distro packaging

We should design an installation method, such as the common setup.py. We should also think about future packaging for Linux distributions.

Add some user interface snazz

I was testing a course earlier today, and it occurred to me that the questions become sort of illegible when there are a lot of them. We would do well to work on this a bit. Here are my suggestions:

  • Indent and space automatically
  • Page output automatically
  • Colorize output

About testing user responses

R swirl's default, omnitest, was a stopgap measure which became a de facto standard. It is very limited and best not repeated. For an initial prototype, we should do a few simple tests, but be careful not to circumscribe "long tail" cases as described below.

Testing user responses resembles unit testing. To a first approximation, there are two objects to test, the expression which a user entered and the result of its evaluation.

IMO, many, many cases would be covered simply by comparing the user's result with a precomputed, correct result. Since, in a valid lesson, correct answers must pass their associated tests (ht @reginaastri), a procedure to generate a dictionary of correct results could also serve as a validity check on lessons.

When expressions have side effects (e.g., plots) but no direct effects such as return values, the user's expression must be checked instead. Insisting that the user's expression essentially match the instructor's preconception is too restrictive; R swirl users complain about it all the time. I believe regular expressions would cover most cases we've seen, e.g., testing whether a particular function was used. Functions to construct common regexes could be provided. R swirl, for instance, provides the equivalent of regex | with the test any_of_exprs(...).
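A regex-based expression test of this kind is straightforward to sketch (any_of_exprs is R swirl's name; the Python version below is a hypothetical equivalent):

```python
import re

def any_of_exprs(*patterns):
    # Build a predicate that matches if any of the given regexes appears
    # in the user's expression -- the equivalent of regex alternation.
    combined = "|".join("(?:%s)" % p for p in patterns)
    return lambda expr: re.search(combined, expr) is not None

# e.g., test whether the user called a particular function:
uses_sum = any_of_exprs(r"\bsum\(", r"\bmath\.fsum\(")
print(uses_sum("total = sum(values)"))  # True
print(uses_sum("total = 0"))            # False
```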

Simple checks like matching a correct result or regular expression would handle 80-90% of demand, and would surely be fine for a first prototype. However, there is a long and interesting tail. A trivial case is when a user is asked to generate 100 random numbers. There's no correct result for that question or for any subsequent questions which depend on it. Daphne Koller (Coursera's cofounder) gave the impressive example of a simple test of color balance which suggested a whole class of image processing questions suitable for MOOCs.

Eventually, custom tests should be accommodated. At this stage, I just want to scope the issue and leave the door open.
