A .NET Core microservice that diffs Base64 encoded binary data.
This is a coding test assignment for WAES.
The original description of the assignment can be found at doc/assignment.md.
-
Make sure you have installed .NET Core (version >= 2.0.0) for your platform.
-
Clone this repository.
git clone https://github.com/potz/base64-diff-assignment.git
-
Run the installation script.
This will restore NuGet packages, build the project and run the tests.
On Windows:
CD base64-diff-assignment install.bat
On Linux/Mac:
cd base64-diff-assignment ./install
-
Launch the server.
This starts the server and listens on
http://localhost:5000
.On Windows:
run.bat
On Linux/Mac:
./run
If you point your browser to the API root (http://localhost:5000/
), you'll get some usage examples.
This application is implemented as a REST service. In order to interact with it you need something like Postman.
This repository includes a comprehensive Postman test script. It's also documented online.
If you have Postman installed and the service is running at your local machine's port 5000 right now, you can just click the button below to run it:
- Diff IDs must be integers
- If a client accesses the
GET /v1/diff/:id
endpoint and the server has never before received the left or right part for that particular ID, the service shall respond with a404 Not found
http status. - If a client accesses the
GET /v1/diff/:id
endpoint and the server has received only one of the parts (either only the left or only the right), the service shall respond with a200 OK
http status, and indicate in the response body that the lengths are different (the server is assuming that the missing part is an empty string).
The assignment states:
Provide 2 http endpoints that accept JSON base64 encoded binary data on both endpoints
My interpretation is that the endpoints must accept a JSON document with a field (I'll use the field name data
) whose value contains binary data encoded as a Base64 string.
Example of a valid request:
PUT /v1/diff/1/left HTTP/1.1
Host: localhost
Content-type: application/json
Content-length: 78
{
"data": "VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZw=="
}
HTTP/1.1 200 OK
Connection: close
Content-Type: application/json; charset=utf-8
{"success":true}
-
Application design
This is a .NET MVC Core application, with a web-facing API layer (
src/api
) that uses a service layer (src/domain
) to retrieve and store its diffs. Dependency injection was used to provide the single controller with a diffing store. -
Database
We're not using a database. All the data is kept in memory. This is not scalable of course but works well for prototyping. Also note that since we used a service layer we can add a database later without changing the interface.
-
Tests
We have 100% code coverage for production code, which is a nice start but it's not to be taken as a guarantee that all the edge cases are being covered.
We are unit testing the controllers in isolation, so mocks were used. Although it's generally considered good practice, we might question the cost of maintaining unit tests for controllers when we are already exercising all the routes on our integration tests.
-
Diffing algorithm
I understand the focus of this exercise is application design and good documentation, so the most simple and naive diffing algorithm was chosen. We consider only differences in data with the exact same lengths.
This means for example that for the strings:
"The quick brown fox jumps over the lazy fox"
and"The lazy brown fox jumps over the quick dog"
we find only one difference, from character 4 and with length 35 (start of the wordlazy
all the way to the end of the wordquick
). A proper diffing algorithm should be able to recognize two differences in these sentences (the actual wordslazy
andquick
).In a real world situation we should probably use the widely-adopted O(ND) Difference Algorithm by Eugene Myers instead.
There is a (relatively unknown) standards-based file format named JSON-Base64 which allows encoding of tabular data like CSVs or spreadsheets. In this format the first row defines headers with column names and data types and the remaining of the file contains the actual data, one record per row. All the data is represented as JSON that is encoded via a slightly modified version of Base64. The format is stream-frendly and in public domain.
For this exercise I am assuming that this is not what the assignment's description meant. If we were supporting this format then the data would need to be posted to the endpoints as a raw string instead.