mrrobot's Introduction
README

Run with:

    $ sbt
    > run $url

- url: starting URI. MUST end with a '/'
- Depth is limited to 3; change it by editing Main.scala

Make a pretty picture with:

    dot -Tpdf output.dot > foo.pdf

TODO

- recognise links outside A.HREF (e.g. image maps) - very difficult in the general case because JavaScript can load content and navigate the browser. Even regexes aren't sufficient here because the new URL may never occur in one piece in the document
- properly parse HTML
- try harder to determine what "one domain" is, e.g. currently the host part of the URI is used, so a DNS name and its IPv4 and IPv6 addresses are considered different
  - is a subdomain equal to its superdomain (e.g. www.google.com == google.com)?
- take multiple base URLs on the command line, process them all (in parallel) and name each output file after its URL

TRADEOFFS

In the interests of production quality:

* Scala version pinned to a known quantity
* Dependencies kept to semver minor or patch ranges

Environment

Runs on Linux - a unikernel cluster might be better suited.

Language

Scala - modern, cool (easy to hire good people), very hard to work with if you don't know it, possibly a bit niche still. Gives us Akka, which is a great computation model for this kind of problem - Erlang/Elixir have all the problems of Scala and more.

Libraries

async-http-library - canonical, a bit Java-focused, pulls in netty
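The "one domain" item in the TODO list can be illustrated with a small sketch. This is not the project's code: `DomainCheck`, `normalisedHost` and `sameDomain` are hypothetical names, and treating a leading `www.` as equivalent to the bare host is only one possible answer to the subdomain question raised above. It still compares host strings, so a DNS name and its IP addresses remain different.

```scala
import java.net.URI
import java.util.Locale
import scala.util.Try

// Hypothetical helper, not part of the project: one possible policy
// for deciding whether two URLs belong to the same "domain".
object DomainCheck {

  // Lower-case the host and drop a leading "www." (an assumed policy).
  // Returns None when the URL cannot be parsed or has no host part.
  def normalisedHost(url: String): Option[String] =
    Try(new URI(url)).toOption
      .flatMap(u => Option(u.getHost))
      .map(_.toLowerCase(Locale.ROOT))
      .map(h => if (h.startsWith("www.")) h.drop(4) else h)

  // Two URLs match only if both have a host and the normalised hosts are equal.
  def sameDomain(a: String, b: String): Boolean =
    (normalisedHost(a), normalisedHost(b)) match {
      case (Some(x), Some(y)) => x == y
      case _                  => false
    }
}
```

With this policy, `DomainCheck.sameDomain("http://www.google.com/", "http://google.com/a")` is true, while other subdomains such as `mail.google.com` would still be treated as separate hosts.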