Welcome to the PyATL User Group Information Repo!
pyatl / jam-sessions Goto Github PK
View Code? Open in Web Editor NEWPython Atlanta Jam Sessions
Python Atlanta Jam Sessions
Welcome to the PyATL User Group Information Repo!
This challenge will be more of an educational one. The exercise will be to build our own UTF-16 decoder.
The problem will be divided into parts of increasing difficulty. In order:
chr
, str.join
.bytes
input instead of a list.EXAPUNKS is a 2018 video game by Zachtronics. It is set in a cyberpunk past
where all technology works using swarms of small programs, known as EXAs
(EXecution Agents). The principle of the game is to program your EXAs using an
assembly-like language, and accomplish various (illicit) jobs. It is a
challenging but accessible and immersive game, and I strongly recommend trying
it out.
For today's coding challenge, we will be building a simulator that can read and
execute a program written in the EXA language.
Disclaimer: EXAPUNKS and its contents, including the EXA language and the
excerpts from the TRASH WORLD NEWS manual used in this document, are a
property of Zachtronics LLC.
Following the tradition of Zachtronics' previous programming games, the manual
of EXAPUNKS is a core element of the gameplay (and you're encouraged to print
it on physical paper). Let's refer to the said manual (or, as it calls itself,
TRASH WORLD NEWS issue 1) for some explanations:
Every EXA contains code and registers.
CODE: This is a list of instructions that tell an EXA what to do. It's
written in a special computer language specifically designed for them. We'll
dig into the language in the following sections.
REGISTERS: Think of these are slots that can store numbers. Registers
can be read and written to by instructions in your code.
An EXA has three registers:
X
register is a general-purpose storage register. It can store aT
register is a general-purpose storage register like X
. It is alsoTEST
instructions (challenge #2), and is theF
. Its operation will be detailed inThrough every instruction description in this challenge, the following
abbreviations will be used to represent required operands:
R
: A registerR/N
: A register, or a number between -9999 and 9999L
: A label defined by a MARK
pseudo-instruction (see challenge #3)For the first part of the problem, we will be focusing on the very basic
features of the language, such as registers and arithmetics:
COPY R/N R
ADDI R/N R/N R
SUBI R/N R/N R
ADDI
, for substraction.MULI R/N R/N R
ADDI
, for multiplication.DIVI R/N R/N R
ADDI
, for integral division.MODI R/N R/N R
ADDI
, for modulo.Write a program that can understand a piece of code with any of the above
instructions and run it.
If you are relatively new to programming, this could be a steep challenge to
get into. Here are a few ideas to help you start out:
It is generally a good idea to start by identifying the inputs and outputs
of the program. Here, the input is the code for the EXA, which will be just
plain text. No need to worry about outputs for now; printing the register
values after each step should be enough. Don't hesitate to add more print
lines as needed.
The first piece of code you'll need is one that can read a line of EXA code
and make sense of it. For example, "ADDI 30 X T"
could be converted into
("ADDI", [30, "X", "T"])
or any other structure of your choice. This will
make it much easier to write the actual running logic. It is at this step
that you could check if instructions and their operands are valid.
Before implementing the instructions, think about how you want to handle
the registers. Here are some suggestions:
X
and T
are attributes ofIf you manage to design the code parsing and register handling carefully,
actually implementing the instructions should be fairly easy! The only
detail is that you will need to be careful about whether an argument is a
number or a register name when both are allowed, and choose the correct
behavior.
Here is an example:
COPY 70 X
ADDI X 1 X
COPY 3 T
MULI T X T
SUBI T 1 T
Decomposed, this program will:
X
to 70X
and store that value back in X
T
to 3X
by T
and store that value in T
T
and store the result in T
At the end, X
should hold 71 and T
should hold 212.
Here is another example for you to try your code on:
COPY 647 X
MODI X 7 T
DIVI X T X
MULI T T T
MULI X T X
MULI T T T
ADDI X T X
DIVI T 10 T
ADDI X 3 X
ADDI T X T
ADDI T X X
SUBI X T T
SUBI X T X
SUBI X T X
What are the values of the registers at the end? As a bonus, can you identify
what operation the last five instructions are effectively doing?
I mentionned earlier that the T
register was used for tests. This part of
the challenge will be to program the TEST
instruction:
TEST R/N = R/N
T
register to 1, otherwise set the T
register<
(less than) and >
(greater than)Here is an example of TEST
:
COPY 10 X
COPY X T
TEST X = T
SUBI X T T
TEST X > T
TEST T < 1
This will:
X
to 10T
to the value of X
X
and T
are equal. This is true, so T
is set to 1.T
from X
, store it in T
(now 9).X
is greater than T
. This is true again, T
is set to 1.T
is smaller than 1. This is false, so T
is set to 0.So far it might not look very useful, but the next challenge will fix that.
So far our program pointer only moves through the program from top to bottom,
and doen't have much in the way of actual logic. Time to introduce some useful
flow control instrustions:
MARK L
MARK
is a pseudo-instruction andJUMP L
TJMP L
T
register equals 1 (or any value otherTEST
result that was true.FJMP L
T
register equals 0. This correspondsTEST
result that was false.If the same label is marked multiple times, the program is entirely invalid
and should fail to parse or compile.
Note: Be careful to not write any infinite loops!
Let me walk you through a simple example:
COPY 1 X
COPY 7 T
MARK LOOP
MULI T X X
SUBI T 1 T
TJMP LOOP
X
to 1T
to 7.LOOP
.X
by T
and store that in X
.T
by 1.T
is not 0, jump back to LOOP
(step 4.).This effectively calculates 7! (factorial of 7), which should be 5040.
Can you implement a program that tests the Collatz conjecture for a number
of your choice? It doesn't need to output the number of steps or anything, for
now try to implement a program that goes through the sequence until it ends.
All that programming is no good if it cannot input and output. Thankfully, EXAs
can manipulate files and leave traces.
In our simplified emulator, a file is simply a list of numbers with a name.
To use a file, an EXA must first grab that file by its name. Once done, it has
to drop it before using a different file.
Here is also where the F
register comes in play. The reference guide reads:
The
F
register allows an EXA to read and write the contents of a held file.
When an EXA grabs a file, its "file cursor" will be set to the first value in
the file. Reading from theF
register will read this value; writing to the
F
register will overwrite this value. After reading or writing theF
register, the file cursor will automatically advance. Writing to the end of
the file will append a new value instead of overwriting.
Note: This is not the same file handling as the one you'd do with open
in Python. These files are virtual; lists which exist only in the emulator.
You will need to add them to your simulation.
The file manipulation instructions to implement are:
GRAB R/N
FILE R
SEEK R/N
SEEK
would move the file cursor past the biginning or end of the fileVOID F
DROP
TEST EOF
T
T
register to 0.Note that trying to access this register when the EXA isn't holding a file will
result in an "Invalid F
register access" error, causing the EXA to crash and
terminate. If an EXA tries to open a file that does not exist (or is otherwise
not available), it also crashes.
Modify your emulator to support file handling. On top of the code to run, it
will need to be initialized with files (and their contents). It will need to
return the file contents at the end of execution.
Let's look at a simple file handling program. In this scenario, the environment
starts with two existing files: one called 100 with a list of numbers, and a
second one called 200 that's empty.
GRAB 100
MARK FILE_READ
ADDI F X X
TEST EOF
FJMP FILE_READ
DROP
GRAB 200
COPY X F
DROP
This time, try to walk through the instructions yourself. This script reads the
numbers in 100, sums them, and writes the sum to 200.
Here's the final code I want you to test your emulator against. The virtual
world is expected to contain a file with the ID 400 that is initially empty.
GRAB 400
COPY 1 X
MARK A
SEEK -9999
ADDI X 1 X
TEST X < 50
FJMP D
MARK B
TEST EOF
TJMP C
MODI X F T
FJMP A
JUMP B
MARK C
COPY X F
JUMP A
MARK D
DROP
What are the contents of the file 400 at the end of the program? Can you
identify what calculation this program is doing?
If you enjoyed playing with the EXA language, then once again I recommend
purchasing and playing EXAPUNKS. In the real game, a single EXA is only
one of potentially many interconnected EXAs! This adds a whole new level
of richness to the idea, of which these challenges only scratched the surface.
Zachtronics' world of EXAs has a lot more to offer.
If you're already an EXAPUNK veteran, you will have already noticed the
simplifications made in this exercise. Here are some more advanced features
you could implement from the game:
SWIZ
, HALT
, NOTE
, NOOP
, RAND
)@
macro instructions for code repetitionLINK
and HOST
MAKE
and DROP
(create/delete files)REPL
and KILL
support for forking and terminationM
register and MODE
The idea of this is to go through 4 phases where you create increasingly more difficult CSV files to parse
First input will be Easy_CSV.csv that will require
import csv
with open('Easy_CSV.csv') as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',')
for row in csv_reader:
print(row)
then it will show names and integers from the CSV files
second Input will be Intermediate.csv that will require more effort
then a hard.csv then a hellmode.csv
Back in the days of Teletypes, the Baudot code was the standard protocol for most telecommunications. It was patented in 1870 by the French engineer Émile Baudot, and was later modified by the New-Zealander engineer Donald Murray. This later code remained in use through most of the first half of the XXth century.
For more information, see https://en.wikipedia.org/wiki/Baudot_code
The Baudot code is a 5-bit binary code, usually represented on a ribbon of punched tape.
You'll notice that such a code has only 32 possible combinations, which wouldn't be enough for the 26 letters and the ten digits… The trick is that the code has actually multiple tables, with special control characters to switch between each mode. To put it simply, to read the code you need to know at any time whether you're in "Letter mode" or "Symbol mode", and have to switch from one to another when needed.
In this exercise, you will be implementing a decoder for the ITA2 US variant. The full table is on the next page.
The code will need to be a function that:
Takes in a string that looks like tape. There will be one code per line. That code uses underscore (_) for 0 and star (*) for 1
returns a plain text string
As a bonus, you can make an encoder for the code as well!
Code Letter mode Figure mode
_____ Null (\x00) Null (\x00)
___*_ CR (\r) CR (\r)
_*___ NL (\n) NL (\n)
__*__ Space Space
***_* Q 1
**__* W 2
*____ E 3
_*_*_ R 4
____* T 5
*_*_* Y 6
***__ U 7
_**__ I 8
___** O 9
_**_* P 0
**___ A -
*_*__ S Bell (\x07)
*__*_ D $
*_**_ F !
_*_** G &
__*_* H #
**_*_ J '
****_ K (
_*__* L )
*___* Z "
*_*** X /
_***_ C :
_**** V ;
*__** B ?
__**_ N ,
__*** M .
**_** Figures Figures
***** Letters Letters
input_hw = """
*****
__*_*
*____
_*__*
_*__*
___**
__*__
**__*
___**
_*_*_
_*__*
*__*_
**_**
*_**_
"""
def test_hello_world():
assert baudot_decode(input_hw) == "HELLO WORLD!"
Write code that converts a number into its Roman Numeral notation.
The Roman numeral notation uses letters to represent multiples of ten and five, the common ones being:
Letter | Value |
---|---|
I |
1 |
V |
5 |
X |
10 |
L |
50 |
C |
100 |
D |
500 |
M |
1000 |
The multiples of 4 and 9 are generally represented by subtracting one from the next value, e.g.:
Letters | Value |
---|---|
IV |
4 |
IX |
9 |
XL |
40 |
XC |
90 |
CD |
400 |
CM |
900 |
When representing the numbers from 1 to 10, this looks like:
I, II, III, IV, V, VI, VII, VIII, IX, X
The input numbers will be between 1 and 3,999 included (the largest value the standard notation can represent).
See the Wikipedia page for details, and a bunch more examples: https://en.wikipedia.org/wiki/Roman_numerals#%22Standard%22_forms
Here are some extra features you can implement as a challenge:
Proposal for Python ATL Jam session
You probably know what a barcode looks like, and use it daily. In this exercise, we will be writing code that can convert the sequence of bars and spaces into the data it represents.
For most products sold in the USA, barcodes follow the UPC (Universal Product Code) specification (or symbology). Visually, a UPC barcode looks like a series of bars of various widths, and the spaces between those bars also vary in width. This is actually a binary code that represents a number.
A UPC barcode can be divided into sections (blocks) of equal width. We will be representing a block occupied by a bar with 1, and one occupied by a space with 0.
Consequently, the barcode below can be modeled by:
101001110100010110110011011001101110110001101010101
In the first and second parts of the exercise, we will write code for decoding sequences of 7 blocks into digits. In the third part, we will use that knowledge to read full UPC-A codes. In the fourth part, we will check their validity using checksums. In the fifth and final part, we will be expanding our UPC-A decoder to the broader EAN-13 symbology using parity decoding.
In all UPC variants, the digits of the code are encoded as sequences of seven blocks:
code | digit |
---|---|
0001101 |
0 |
0011001 |
1 |
0010011 |
2 |
0111101 |
3 |
0100011 |
4 |
0110001 |
5 |
0101111 |
6 |
0111011 |
7 |
0110111 |
8 |
0001011 |
9 |
Exercise 1: Write code that can convert a sequence of seven blocks into a digit.
Tips:
code = "0111011"
.You may notice that all the codes above start with 0 and end with 1. This is because UPC is what's called a "continuous" symbology. There always is a transition on digit boundaries. Those codes are said to be Left (or L) encoded.
But in the case of the common UPC-A barcode, only the first six of the twelve digits are encoded that way. The second half of the digit codes start with 1 and end with 0.
So we need new codes for the right-side digits. The way UPC solves this is by having the right codes be the complements of the left codes, i.e. every 0 is replaced by 1 and every 1 by 0. The new coding table becomes:
L-code | R-code | digit |
---|---|---|
0001101 |
1110010 |
0 |
0011001 |
1100110 |
1 |
0010011 |
1101100 |
2 |
0111101 |
1000010 |
3 |
0100011 |
1011100 |
4 |
0110001 |
1001110 |
5 |
0101111 |
1010000 |
6 |
0111011 |
1000100 |
7 |
0110111 |
1001000 |
8 |
0001011 |
1110100 |
9 |
Exercise 2: Make your code from exercise 1 also work when the input codes are inverted. Try to reuse your previous code intelligently instead of writing a new matching table!
The UPC specification defines several barcodes, but the most widely used by far is the twelve-digit UPC-A symbology. Almost every product you can find in an American store uses this variety of barcodes to identify your product at checkout. Except for books and tiny beverages, but more on that later.
A full UPC-A code is built with:
101
(3 blocks)01010
(5 blocks)101
(3 blocks)for a total of 95 blocks and 12 digits.
For example, this barcode below is written:
10100011010111011011000101110110110001011000101010100001010000101100110100100010011101000010101
Now, let's insert spaces at the boundaries of digits, for clarity:
101 0001101 0111011 0110001 0111011 0110001 0110001 01010 1000010 1000010 1100110 1001000 1001110 1000010 101
By isolating the left and right parts, we can decode it:
Left: 0001101 0111011 0110001 0111011 0110001 0110001
Right: 1000010 1000010 1100110 1001000 1001110 1000010
Right inverted: 0111101 0111101 0011001 0110111 0110001 0111101
Left digits: 0 7 5 7 5 5
Right digits: 3 3 1 8 5 3
So the UPC for this barcode is 075755331853, which matches the plain text code on the image.
Exercise 3: Write code that can decode a full UPC-A code!
Below are a few examples to test your code against.
My favorite brand of root beer:
10101110110101111000110101110110011001001001101010111001011101001110010111001011001101110100101
My sewing machine:
10100011010111101011101101000110111101001100101010100100010010001101100101110011100101110010101
$6.48 worth of imported cheese:
10100100110001011010111101000110110111000110101010100001011100101010000101110010010001011100101
The latest album from Atlanta artist Daily Bread
10100110010001011011110101101110111011001001101010110110011101001000010100001011001101001000101
Tips:
int
, since leading zeros matter. A list of digits or a string would be a better idea.Bonus: Can you make your program support reverted input, i.e. UPC-A codes read right to left by an upside-down scanner?
Because there is a high risk of the code being misread, the only purpose of the last digit of the code is to verify a checksum. In other words, it is not part of the information conveyed by the code but helps to detect mistakes.
Assume that we name our digits from d1
to d12
, respectively from left to right. In the example used in section 3, we'd have d1 = 0
, d1 = 7
, d2 = 5
, etc. and d12 = 3
is the checksum digit.
Then the following must hold true:
(3*d1 + d2 + 3*d3 + d4 + 3*d5 + d6 + 3*d7 + d8 + 3*d9 + d10 + 3*d11 + d12) % 10 == 0
In practice, d12
is chosen so that the checksum is correct. If not, we know that a digit was either incorrectly printed or read.
Exercise 4: Make your UPC-A reader program also check the checksum. The examples given before are valid, but here are a few incorrect ones below.
Examples of invalid checksum:
10100011010111011011000101110110110001011000101010100001010000101100110100100010011101000100101
10100100110001011010111101000110110111000110101010100001011100101110100101110010010001011100101
EAN-13, or International Article Number, is a specification that expands UPC-A by adding a thirteenth digit (as a prefix) to the existing code. Interestingly, it does so without any changes to the UPC-A format! EAN-13 barcodes look very similar to UPC-A, down to having 95 blocks and the same markers.
EAN-13 barcodes are relatively rare in North America (although UPC-A codes are technically valid EAN-13, more on that later), with the exception of books. That's because the specification allows books to be identified by their ISBN prefixed with 978. Otherwise, EAN-13 codes are universally used abroad.
So how does it work? You may have noticed that all L-codes have an odd number of 1s in them. EAN-13 makes use of that property by introducing a new set of codes, called "even parity" codes or G-codes. And those are simply R-codes but reverted, or read right to left:
L-code | G-code | R-code | digit |
---|---|---|---|
0001101 |
0100111 |
1110010 |
0 |
0011001 |
0110011 |
1100110 |
1 |
0010011 |
0011011 |
1101100 |
2 |
0111101 |
0100001 |
1000010 |
3 |
0100011 |
0011101 |
1011100 |
4 |
0110001 |
0111001 |
1001110 |
5 |
0101111 |
0000101 |
1010000 |
6 |
0111011 |
0010001 |
1000100 |
7 |
0110111 |
0001001 |
1001000 |
8 |
0001011 |
0010111 |
1110100 |
9 |
Exercise 5.1: Modify your digit-decoding code from exercises 1 and 2 to also support decoding even parity left codes / G-codes.
But this still does not explain where the extra digit comes from: it is encoded in the pattern of odd or even parities used for the first six digits! If we mark L-code digits as O
and G-code digits as E
, then the rule is:
Parities | Digit |
---|---|
OOOOOO |
0 |
OOEOEE |
1 |
OOEEOE |
2 |
OOEEEO |
3 |
OEOOEE |
4 |
OEEOOE |
5 |
OEEEOO |
6 |
OEOEOE |
7 |
OEOEEO |
8 |
OEEOEO |
9 |
For example, here are the six first digit codes from an EAN-13:
Code: 0001101 0111101 0010001 0010111 0011011 0001101
Digit: 0 3 7 9 2 0
Bit count: 3 5 2 4 4 3
Parity: Odd Odd Even Even Even Odd
In this case, the parity pattern is OOEEEO
which means the first digit is 3. Consequently, the start of the code is 3037920.
A couple of notes:
Because of how EAN-13 was designed, it is fully backward-compatible with UPC-A (technically speaking, EAN-13 is a superset of UPC-A). A UPC-A is the exact same as an EAN-13 with a prefix of 0.
Exercise 5.2: Modify your UPC-A decoding program from 3 and 4 to extract the parity information from the six first digit codes and translate it to the extra digit. If that digit is zero, then it should be ignored for UPC-A backward compatibility.
Examples:
A notebook I brought from France:
10100011010111101001000100101110011011000110101010110011011001101101100111001011100101001000101
A pack of cards made in Austria:
10100011010100111011001101101110010111000110101010110011011101001011100100100011001101001000101
Russian folk music:
10101011110100111011101100011010001001001000101010110110010010001000100100111010111001011100101
Books you should read:
10101110110001001011001101100010010111011110101010110110010001001001110111010011101001110010101
10101110110001001010011100100110100111001100101010101000011001101010000110110011011001011100101
10101110110001001010011100011010100111011011101010100001011011001000010101110010111001001000101
This is probably enough of a challenge for tonight. If you've breezed through it or feel like having more fun at home, here are some additional challenges:
Add UPC-E support. That thing is weird. It's also quite rare, I only found it on very small beverage containers from large companies. The example given in the introduction is one you can try to decode.
Add support for some of the many other barcode formats that exist: Code 128, EAN-8, EAN-5, ITF…
I previously recommended against using integers for modeling codes by manipulating their binary values. It's possible that you have already done this by habit, if not then I invite you to try it. As a test, can you identify the pop music album behind this code? 25279676323428426953040442277
Can you write a function that can create a barcode from a known UPC/EAN?
If you're feeling like going even further, try making a barcode scanner! For example, that application could take an image as an input, detect and locate barcodes in it, generate the binary code from the image and send it to the code we just made.
incode name of the person who snapped a picture or makes a texture file with Krita or photoshop and uses a least significant bit
input artist stage name output Unicode binary formate of the artist or company name in png or jpg file.
Since we are stuck at home, it has been a good time for me to play my favorite video game, Factorio. Which I play way too much.
This week's challenge is to write a calculator that can help us play the game.
In Factorio, your goal is to build a factory that can collect then transform raw materials into increasingly refined items. Your complex network of machinery eventually ingests iron, copper, stone, and coal then makes rockets or nuclear reactors come out of the other side.
Here is an example of some very simple recipes:
Question 1: How much Iron Ore and Copper Ore does it take to manufacture 10 Automation Science Packs?
Try to write code that can solve this problem. You may solve it by hand and check that your answers match.
Science Packs are used in Factorio to do research and unlock new technologies. In order to keep things interesting, science packs become increasingly difficult to manufacture the more you progress!
Here are some new recipes for a new Science Pack:
Question 2: How much Iron Ore and Copper Ore does it take to manufacture 200 Automation Science Packs and 200 Logistic Science Packs?
Doing all of this crafting by hand is possible but very time-consuming. The main gameplay element of Factorio is to build a network of machines and conveyors to do the job for you!
Machines take time to run their recipes, therefore you need to build enough machines to meet your production throughput targets. In this part, we will implement a tool that can compute how many machines we will need.
Let's look at the recipes from the first part. The time it takes to run each recipe are:
Let's say we want to make 10 Automation Science Packs per minute. Since it takes 10 seconds for one machine to make one, a single machine will be able to output 60 ÷ 10 = 6 items per minute. Therefore, we will need 2 machines to build the science packs.
To meet this quota, we will also need to make every minute:
Therefore, we will need a total 8 machines to run the full production chain for our production quota of 10 Automation Science Packs per minute.
Here are the timings for the recipes defined in part 2:
Question 3: How many machines do you need to sustain a production of 50 Automation Science Packs and 50 Logistic Science Packs per minute?
Later in the game you discover oil processing, and to your dismay its production process is quite a bit more complicated. The recipes are as follows:
In general, Petroleum Gas is the resource we need but refining oil to make it generates excess Light Oil and Heavy Oil. To make the process work without any excess produce, cracking operations need to be used to eliminate the excess oil.
Question 4: How much Crude Oil is needed to make 950 units of Petroleum Gas without any excess oil?
In practice, Heavy Oil and Light Oil are also useful in their own right and need to be produced (in smaller quantities) as well.
Question 5: How much Crude Oil is needed to make 2,000 units of Petroleum Gas, 800 units of Light Oil and 225 units of Heavy Oil?
There are two approaches to solving the oil processing problem:
From a programming perspective, the second approach is a lot more fun. And it has the added advantage to be easy to fix if the recipes change (which does happen in Factorio).
I find it useful in this situation to use "production cycles per minute" as variables to solve with, and write down how to adjust each variable in function of the other. It can take so trial and error to find a stable solution.
Sorting is arguably among the fundamental problems in Computer Science. Thankfully we rarely need to think about it too much. Like most other high-level programming languages, Python includes built-in sorting routines that use fast and efficient (and complex) algorithms. Sorted collections are always just one sorted()
call away!
While not essential to all programming tasks, understanding how sorting works is quite useful. While the concept of sorting is very simple, the techniques used to achieve it can be very complex. Sorting problems are also a very common type of interview questions, so it pays to be prepared.
The core of the exercise is to sort a list of signed integers without using the built-in sorting calls. The challengers are free to use any sorting algorithm they want. For those who have never implemented sorting before, the insertion sort will be explained in the instructions. More advanced coders are invited to implement a more complicated algorithm of their choosing.
In this challenge, we will try to build a simplified "password checker" like those commonly seen on most websites.
RlAHzL
Disclaimer: This project is for fun, and should not be used to check the complexity of passwords that will protect actual sensitive information.
(Though if this script says your password is bad, you should be concerned.)
NOTE: DO NOT WRITE YOUR ACTUAL PASSWORDS IN THE CYBER-DOJO SESSION. THEY WILL BE PUBLICLY STORED AS PLAIN TEXT FOR EVERYONE TO SEE!
This challenge will be hosted on cyber-dojo.org. This will allow you to work without needing to install Python on your computer, and you will be able to see other members' submissions.
To connect, follow these instructions:
RlAHzL
, then press Enter.Inside the session, you will see:
In this challenge, rate_password.py
is where you are expected to write your code. The tests are located in test_rate_password.py
, and you are free to play with them if you wish.
The expected form of this program is a function that takes a string as an argument, and returns an evaluation of its adequateness as a password.
This whole process being somewhat complicated, we will split it in separate steps, each built as its own function.
Research has shown that the length of a password is generally its most significant characteristic. Nowadays, even mid-range consumer-grade graphics cards pack up incredible amounts of processing power, enough to crack any password up to 8 characters in reasonable time.
Therefore, our first step will be to give a "Length score" to a password based on its length. The logic is as follows:
Examples:
password
has a score of 1password1234
has a score of 5john
has a score of 0For this exercise, edit the length_score
function in the rate_password.py
file. This function takes a password as an argument and returns its length score as a number. The TestLengthScore
class in test_rate_password.py
contains tests for it, you may edit those as well if you wish.
It is also a terrible idea to choose a password that's common. It is guaranteed that attackers will try all the most common password first when trying to crack one.
For this part, write the is_common
function, which takes a password as an argument and returns a boolean (True
or False
) indicating whether it's common or not. For this purpose, a (curated) list of approximately 1,000 common passwords is included as common_password.txt
. You code will need to open that file, read it and check if the given password is found or not.
Finally, a password should be somewhat complex so that the chances of guessing it at random are minimal. To measure this, we will give our password a "complexity score", which is described below.
Each password starts with a score of 1, then gets points by matching the following rules:
Note: "symbols" are any ASCII non-alphanumeric printable character, including spaces, such as:
.,<>/?{}[]\|()&^%$#@!*
Additionally, there are rules to penalize lazy passwords:
A few examples:
password
: 1 ptpassword123$
: 2 pts (all at the end)H3LLO_WORLD
: 3 pts (1 + 2 for numbers + 2 for symbols - 2 for only one of each)nen9aPhu
: 3 pts (1 + 1 for case + 2 for numbers - 1 for only one number)Ba$th5to
: 4 pts (1 + 1 for case + 2 for numbers + 2 for symbols - 2 for only one of each)Dre1käse
: 5 pts (1 + 1 for case + 2 for numbers + 3 for foreign characters - 2 for only one of each)Oo7,28=r+MU}
: 6 pts (1 + 1 for case + 2 for numbers + 2 for symbols)For this part, edit the complexity_score
function in rate_password.py
. As previously, you can find the tests for this in test_rate_password.py
. This will likely be the most complicated part of the problem, do not hesitate to call for help.
Now that we have all the components, we can finally make a function that judges passwords!
The total score of a password is calculated as follows:
The evaluation then returns an evaluation based of that score:
For this part, edit the rate_password
function in rate_password.py
.
If the above was relatively easy for you, one last possible improvement is to check for proximity to common passwords. This involves modifying the is_common
rule so that a password is also considered common if:
There are no tests for this, but you are welcome to write your own!
Measuring Password Strength: An Empirical Analysis. Matteo Dell'Amico, Pietro Michiardi, Yves Roudier
https://arxiv.org/pdf/0907.3402.pdf
Password Cracking. Computerphile
https://youtu.be/7U-RbOKanYs
OWASP's SecLists
https://github.com/danielmiessler/SecLists
https://www.owasp.org/index.php/OWASP_SecLists_Project
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.