links

Just a bunch of useful links. BTW see rust links as well.

Scala

Serialization / Off-heap Data Structures / Unsafe

  • Simple Binary Encoding - supposedly 20-50x faster than Google Protobuf !!
  • Comparison of Cap'n Proto, SBE, FlatBuffers from the Cap'n Proto people
    • Cap'n Proto native layout uses 64-bit words, relies on separate packing/unpacking to achieve efficient wire representation. Has RPC (but not for Java). Bitset support. Java is third party support.
    • FlatBuffers is from Google. 32-bit word size, more compact native representation, native Java support.
    • Both Cap'n Proto and FlatBuffers allow random access of lists, whereas SBE is really only for streaming access
  • Using Unsafe for C-like memory access speeds - a great guide. Many Unsafe operations turn into Java intrinsics - which translate to direct machine code
  • Scala-offheap - fast, safe off heap objects
  • FastTuple - a dynamic (runtime-defined) C-style struct library, with support for off-heap storage. Only works for primitives right now :(
    • and the excellent blog covers all of the on- and off-heap access and allocation patterns on the JVM very thoroughly.
  • ObjectLayout - efficient struct-within-array data structures
  • jvm-unsafe-utils - a library for working with Unsafe, from @rxin of Spark/Shark fame.
  • Agrona and blog post - a ByteBuffer wrapper, off-heap, with atomic / thread-safe update operations. Good for building off heap data structures.
  • Sidney - an experimental columnar nested struct serializer, with Parquet-like repetition counts
  • OHC - Java off-heap cache
  • Boon ByteBuf and the JavaDoc - a very easy to use, auto-growable ByteBuffer replacement, good for efficient IO
  • Jawn - @d6's new fast JSON parser, parses to multiple ASTs including rojoma-json, spray-json, argonaut
  • Grisu-scala - much faster double to string conversion
  • Extracting case class param names using Macros
  • Fast-Serialization - a drop in replacement for Java Serialization but much faster
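
Several entries above revolve around the same idea: keeping fixed-layout records outside the Java heap and addressing them by offset. Below is a minimal sketch of that idea using only a direct `java.nio.ByteBuffer` from the JDK, not `Unsafe` and not any of the libraries listed. The class name `OffHeapRecords` and the (int id, double value) record layout are hypothetical, chosen purely for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical example: a fixed-capacity array of (int id, double value) records kept off-heap. */
final class OffHeapRecords {
    private static final int ID_OFFSET = 0;     // 4-byte int at the start of each record
    private static final int VALUE_OFFSET = 4;  // 8-byte double right after the id
    private static final int RECORD_SIZE = 12;  // bytes per record, no padding

    private final ByteBuffer buf;
    private final int capacity;

    OffHeapRecords(int capacity) {
        this.capacity = capacity;
        // allocateDirect reserves zero-filled memory outside the Java heap
        this.buf = ByteBuffer.allocateDirect(capacity * RECORD_SIZE)
                             .order(ByteOrder.nativeOrder());
    }

    void set(int index, int id, double value) {
        int base = offsetOf(index);
        buf.putInt(base + ID_OFFSET, id);          // absolute put: does not move the buffer position
        buf.putDouble(base + VALUE_OFFSET, value);
    }

    int id(int index) {
        return buf.getInt(offsetOf(index) + ID_OFFSET);
    }

    double value(int index) {
        return buf.getDouble(offsetOf(index) + VALUE_OFFSET);
    }

    /** Scans the value field of every record without creating any objects. */
    double sumValues() {
        double total = 0.0;
        for (int i = 0; i < capacity; i++) {
            total += buf.getDouble(i * RECORD_SIZE + VALUE_OFFSET);
        }
        return total;
    }

    private int offsetOf(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index * RECORD_SIZE;
    }

    public static void main(String[] args) {
        OffHeapRecords records = new OffHeapRecords(3);
        records.set(0, 7, 1.5);
        records.set(2, 9, 2.5);
        System.out.println(records.id(0) + " " + records.sumValues());
    }
}
```

The `ByteBuffer` calls are bounds-checked, so this is safer but likely slower than raw `Unsafe` access. The libraries above exist largely to close that gap or to hide the offset arithmetic.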

Concurrency, Actors

Reactive Streams

Database Libs

  • Asyncpools - Akka-based async connection pool for Slick. Akka 2.2 / Scala 2.10.

  • Postgresql-Async - Netty-based async drivers for PostgreSQL and MySQL

  • Relate - a very lightweight, fast Scala wrapper on top of JDBC

Caching

  • Cacheable - a clever memoization / caching library (with Guava, Redis, Memcached or EHCache backends) using Scala 2.10 macros to remember function parameters
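
For readers unfamiliar with memoization, here is a minimal sketch of the underlying idea in plain Java. It is not Cacheable's API and uses no macros: the `Memoizer` class and its `memoize` helper are hypothetical names, and the only backend is an in-memory `ConcurrentHashMap` keyed on the single function argument.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: wraps a one-argument function with an in-memory cache. */
final class Memoizer {
    private Memoizer() {}

    /**
     * Returns a function that computes fn(key) at most once per distinct key.
     * Assumptions: fn is not recursive through the returned function and never
     * returns null (computeIfAbsent does not cache null results).
     */
    static <K, V> Function<K, V> memoize(Function<K, V> fn) {
        Map<K, V> cache = new ConcurrentHashMap<>();
        return key -> cache.computeIfAbsent(key, fn);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Function<Integer, Integer> slowSquare = x -> {
            calls.incrementAndGet();   // count how often the real work runs
            return x * x;
        };
        Function<Integer, Integer> fastSquare = memoize(slowSquare);
        fastSquare.apply(12);
        fastSquare.apply(12);          // second call is served from the cache
        System.out.println("underlying calls: " + calls.get());
    }
}
```

A macro-based library can derive the cache key from the enclosing method's name and parameters automatically. In this sketch the caller decides what the key is.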

Big Data Processing

  • Great list of Big Data Projects

  • List of Database Papers

  • List of free big data sources - includes some Socrata datasets, climate data, data from Google, tweets, etc.

  • Debasish G's list of streaming papers and algorithms - esp stuff on CountMinSketch and HyperLogLog

  • Cubert - CUBE operator + fast "cost-based" block storage on Hadoop / Tez / Spark

  • Kylin - OLAP CUBEs from Hive tables, includes query layer

  • Aesop - a scalable pub-sub / change propagation system, esp between different datastores, with reliability. Based on LinkedIn DataBus, supports pull or push producers.

  • Making Zookeeper Resilient, an excellent blog post from Pinterest

  • ImpalaToGo - run Cloudera Impala directly on S3 files without HDFS!

  • Calcite - new Apache project, offers ANSI SQL syntax over regular files and other input sources

  • redash.io - data visualization / collaboration. TODO: integrate this with Spark SQL / Hive...

  • Fast SQL Query Parser in Scala - based on the Scala-LMS project, compiles a query down to C!

  • Probability Monad - super useful for stats or random data generation

  • stringmetric - Approximate string matching and phonetic algorithms

  • Factorie - a Scala library for Natural Language Processing based on factor graphs
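
The streaming-papers entry above singles out CountMinSketch. As a rough illustration of what that structure does, here is a minimal sketch in plain Java. The class and its seeded hash mixing are assumptions made for this example, not code from any of the linked papers or projects, and the simple hash does not carry the formal error guarantees of a proper pairwise-independent hash family.

```java
import java.util.Random;

/** Illustrative Count-Min Sketch: approximate counts in fixed memory, never underestimating. */
final class CountMinSketch {
    private final int depth;       // number of hash rows
    private final int width;       // counters per row
    private final long[][] table;
    private final int[] seeds;     // one seed per row, so rows hash differently

    CountMinSketch(int depth, int width, long seed) {
        if (depth <= 0 || width <= 0) {
            throw new IllegalArgumentException("depth and width must be positive");
        }
        this.depth = depth;
        this.width = width;
        this.table = new long[depth][width];
        this.seeds = new int[depth];
        Random rnd = new Random(seed);
        for (int i = 0; i < depth; i++) {
            seeds[i] = rnd.nextInt();
        }
    }

    /** Adds a non-negative count for the item to one counter in every row. */
    void add(String item, long count) {
        for (int row = 0; row < depth; row++) {
            table[row][bucket(item, row)] += count;
        }
    }

    /** The minimum across rows is the estimate; collisions can only inflate it. */
    long estimate(String item) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < depth; row++) {
            min = Math.min(min, table[row][bucket(item, row)]);
        }
        return min;
    }

    private int bucket(String item, int row) {
        int h = item.hashCode() ^ seeds[row];
        // Integer bit mixing to spread the seeded hash across the row
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return Math.floorMod(h, width);
    }

    public static void main(String[] args) {
        CountMinSketch sketch = new CountMinSketch(4, 1024, 42L);
        sketch.add("spark", 3);
        sketch.add("hive", 1);
        System.out.println("spark >= 3: " + (sketch.estimate("spark") >= 3));
    }
}
```

More rows lower the chance that every counter for an item suffers a collision, and wider rows lower how much each collision inflates the estimate.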

Spark

Geospatial and Graph

  • GeoTrellis - distributed raster processing on Spark. Also see GeoMesa - distributed vector database + feature filtering

  • ApertureTiles - system using Spark to generate a tile pyramid for interactive analytical geo exploration

  • Twofishes - Foursquare's Scala-based coarse forward and reverse geocoder

  • trails - parser combinators for graph traversal. Supports Tinker/Blueprints/Neo4j APIs.

  • scala-graph - in-memory graph API based on scala collections. Work in progress.

Collections, Numeric Processing, Fast Loops

  • Breeze, Spire, and Saddle - Scala numeric libraries
    • spire-ops - a set of macros for no-overhead implicit operator enrichment
  • Framian - a new data frame implementation from the authors of Spire
  • Scala DataTable - An immutable, updatable table with heterogeneous column types. Easily add columns or rows, and iterate using the standard Scala collection APIs.
  • ScalaXY - collection of macros for performant for loops, extension methods etc
  • Squants - The Scala API for Quantities, Units of Measure and Dimensional Analysis
  • An immutable priority map for Scala
  • Unboxing, Runtime Specialization - a cool post on how to do really fast aggregations using unboxed integers
  • product-collections - useful library for working with collections of tuples. Also, great strongly-typed CSV parser.
  • SuperFastHash - also see Murmur3

Big Data Storage

  • Phantom - Scala DSL for Cassandra, supports CQL3 collections, CQL generation from data models, async API based on Datastax driver

  • Athena - Asynchronous Cassandra client built on Akka-IO

  • CCM - easily build local Cassandra clusters for testing!

  • Stubbed Cassandra - super useful for testing C* apps

  • Pithos - an S3-API-compatible object store for Cassandra

  • Doradus - A Graph / OLAP store on top of Cassandra

  • Stratio-Cassandra - a fork with Lucene full-text search and CQL support (see the blog). Also see Stargate.

  • How CQL maps to Cassandra Internal Storage

  • Sirius - Akka-based in-memory fast key-value store for JVM objects, with Paxos consistency, persistence/txn logs, HA recovery

  • CurioDB - distributed persistent Redis built on Akka cluster, etc. :)

  • Ivory - An immutable, versioned, RDF-triple / fact store for feature extraction / machine learning

  • Hibari - ordered key-value store using chain replication for strong consistency

  • Storehaus - Twitter's key-value wrapper around Redis, MySQL, and other stores. Has a neat merge() functionality for aggregation of values, lists, etc.

  • ArDB - like Redis, but with spatial indexes, and pluggable storage engines

  • MapDB - Not a database, but rather a database engine with tunable consistency / ACIDness; support for off-heap memory; fast performance; indexing and other features.

  • HPaste - a nice Scala client for HBase

  • OctopusDB paper - interesting idea of using a WAL of RDF triples as the primary storage, with secondary views of row or column orientation

Distributed Systems

Web / REST / General

  • Scalaj-http - really simple REST API. Although, the latest Spray-client has been vastly simplified as well.

  • Quick Start to Twitter Finagle - though one should really look into Finatra

  • REPL as a service - would be kick ass if integrated into Spark

  • Enumeratum - a Scala Enum library, much better than built in Enumeration

  • Ammonite - Scala DSL for easy BASH-like filesystem operations

  • IScala - Scala backend for IPython. Looks promising. There is also Scala Notebook but it's more of a research project.

  • Scaposer - i18n / .po file library

  • Adding Reflection to Scala Macros - example of using reflection in an annotation macro to add automatic ByteBuffer serialization to case classes :)

  • Scaldi - A lightweight dependency injection library, with Akka integration

  • Knobs - Scala config library with reactive change detection, env var substitution, can read from Typesafe Config/HOCON, ZK, AWS

  • How to use Typesafe Config across multiple environments

  • lamma.io - the easiest date generation library

  • Pimpathon - a set of useful pimp-my-library extensions

  • Scala-rainbow - super simple terminal color output, easier than Console.XXX

Build, Tooling

SBuild seems like a promising replacement for SBT. Still Scala, but much, much simpler, more like a Scala version of Make, with Maven dependency and ScalaTest support.

JVM Other

  • Swiss Java Knife - super handy collection of JVM tools. Try java -jar sjk.jar ttop -p PID -o CPU -n 10 for regular reporting of the top 10 threads by CPU usage!
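  • -XX:+PerfDisableSharedMem - stops the JVM from writing its performance counters to a memory-mapped hsperfdata file, which can cause long pauses under heavy disk I/O. Note that tools which read that file, such as jps and jstat, will no longer see the process.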
  • Al Tobey's flags for running JDK8 apps. Note: G1GC! Also no need for MaxPermSize anymore: -Xmx8G -Xms8G -Xss256k -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=0
  • Quick dumping your JVM heap using GDB -- too bad it doesn't work on OSX.
  • Start a JMX agent in running JVM: jcmd <pid> ManagementAgent.start jmxremote.port=26010 jmxremote.ssl=false jmxremote.authenticate=false
  • HeapAudit - A Java agent for lightweight production heap profiling
  • Lion's Share - tools for memory analysis, outputs Google Charts compatible output
  • jHiccup -- "Hiccup" or GC pause analysis tool
  • Bintray - friendlier alternative to Sonatype OSS / Maven central. Also see bintray-sbt plugin.
  • Changing JVM flags live - such as enabling GC logging without restarting JVM. Cool!
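
As a rough illustration of the last entry, HotSpot exposes its "manageable" flags through `com.sun.management.HotSpotDiagnosticMXBean`, so they can be flipped from inside a running process. This is a sketch of one way to do it, not necessarily the approach the linked post takes. The class name `LiveFlagDemo` is hypothetical, and the sketch toggles `HeapDumpOnOutOfMemoryError` rather than a GC logging flag, because the set of manageable GC logging flags differs between JDK versions.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical demo: change a manageable HotSpot flag at runtime, without a restart. */
final class LiveFlagDemo {
    private LiveFlagDemo() {}

    /** Sets the flag on the current JVM and returns the value the JVM reports afterwards. */
    static String setFlag(String name, String value) {
        // Assumes a HotSpot-based JVM; other JVMs may not provide this MXBean
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.setVMOption(name, value);   // throws IllegalArgumentException if the flag is not manageable
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println(setFlag("HeapDumpOnOutOfMemoryError", "true"));
    }
}
```

The same MXBean is reachable remotely over JMX, which pairs naturally with the `jcmd <pid> ManagementAgent.start` tip above.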

Monitoring / Infrastructure

  • Keywhiz - a store for infrastructure secrets
  • HTrace - distributed tracing library, can dump data to Zipkin or HBase
  • cass_top - simple top utility for cass clusters
  • Grafana and Graphene - great replacement UIs for the clunky default Graphite UI
  • Elastic Mesos - create Mesos clusters on AWS with ZK, HDFS
  • Clustering Graphite - in depth look at how to scale out Graphite clusters

Databases

Indexing and OLAP

ML and Data Science

  • LearnDS - A set of IPython notebooks for learning data science

Distributed Systems

Sublime Text

I love Sublime and use it for everything, even Scala! Going to put my Sublime stuff in a separate page.

Best Practices and Design

Other Random Stuff

  • A list of great docs

  • Awesome public datasets - no doubt some are Socrata sites!

  • Mermaid - think of it as Markdown for diagrams. Would be awesome to integrate this into reveal.js!

  • How To Be a Great Developer - a reminder to be empathetic, humble, and make lives around us better. Awesome list.

  • JQ - JSON processor for the shell. Super useful with RESTful servers.

  • Underscore-CLI - a Node-JS based command line JSON parser

  • MacroPy - Scala-like macros, case classes, pattern matching, parser combos for Python (!!)

  • Scala 2.11 vs Swift - Apple's new iOS language is often compared to Scala.

  • Real World OCaml

  • Gherkin - a Lisp implemented in bash !!

  • Nimrod - a neat, compile-straight-to-binary, static systems language with beautiful Python-like syntax, union types, generics, macros, first-class functions. What Go should have been.

  • Bret Victor - A set of excellent essays and talks from a great visual designer

Tips on installing Ruby

because it's so darn painful.

  • On OSX: make sure setUID bit is not set on dtrace: sudo chmod -s /usr/sbin/dtrace (see this Homebrew issue)
  • Try chruby and ruby-install instead of rbenv. This setup installs rubies into /opt/rubies and is lighter weight. There is also chruby-fish for the fish shell.

Contributors

hairyfotr, mslinn, velvia
