Giter Site home page Giter Site logo

urlshort's Introduction

urlshort: URL mapping service

Takes the URL GET param h (for hash)
and stores the URL under this key into a database.

The key can then be used to retrieve the URL
from the database. Or to forward/redirect to the URL.

Storage Engines

The LevelDB and BoltDB storages are workable and yield interesting benchmark results.

BoltDB is supposed to be slightly better at retrieval than at insert.
BoltDB's insertion speed seems relatively bad. That's because it's flushed/synced after each insert. If we pack batched of 1000 inserts between syncs, BoltDB reaches 300.000 inserts per second.

LevelDB is flushed/synced after an unclear number of inserts.
If we flush/sync on each write, performance breaks down too.

This is expected behavior - compare the C implementation docs:
https://github.com/google/leveldb/blob/master/doc/index.md

Also the internal implementation notes:
https://github.com/google/leveldb/blob/master/doc/impl.md

LevelDB is susceptible to corruption.
Use Recover() when corrupted (not implemented).

A parallel logfile (search code for transaction log) was coded, but discarded.
There are levelDB journal files. They can be processed by https://godoc.org/github.com/syndtr/goleveldb/leveldb/journal.

The wire format allows for limited recovery in the face of data corruption:  
on a format error (such as a checksum mismatch), the reader moves to the next block  
and looks for the next full or first chunk.

We cleanly close databases after HTTP server crash or regular application termination.

Example

Store/Save

localhost:8084/enc?url=https://www.wikipedia.org?h=wik1
localhost:8084/enc?url=https://en.wikipedia.org/wiki/Battle_of_San_Patricio?h=wik2

Check all

http://localhost:8084/dump

Check one

http://localhost:8084/dec/wik1

Call to redirect

http://localhost:8084/r/wik1
http://localhost:8084/r/wik2

Benchmark LevelDB

8-11 microseconds per saving operation -

90.000 inserts per second.

3-5 microseconds per loading operation -

200.000 loads per second.

C implementation

Source: https://github.com/google/leveldb

fillseq      :       1.765 micros/op;   62.7 MB/s
fillsync     :     268.409 micros/op;    0.4 MB/s (10000 ops)
fillrandom   :       2.460 micros/op;   45.0 MB/s
overwrite    :       2.380 micros/op;   46.5 MB/s

Benchmark BoltDB

2.500 microseconds per saving operation -
400 inserts per second.

3.300 microseconds per batch-of-1000 saving operation -
yielding 330.000 inserts per second.

2 microseconds per loading operation -
500.000 loads per second.

Database considerations

Possible database backends - LevelDB vs BoldDB

LevelDB and its derivatives (RocksDB, HyperLevelDB) underlying structure is a log-structured merge-tree (LSM tree).
An LSM tree optimizes random writes by using a write ahead log and multi-tiered, sorted files called SSTables.
Bolt uses a B+tree internally and only a single file.

https://github.com/etcd-io/bbolt

https://github.com/syndtr/goleveldb

Other

https://github.com/tidwall/buntdb based on https://github.com/tidwall/btree

https://github.com/dgraph-io/badger

urlshort's People

Contributors

pbberlin avatar

Watchers

James Cloos avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.