quant_scalaml's Introduction

Scala for Machine Learning Version 0.98.1
Copyright Patrick Nicolas All rights reserved 2013-2015
=================================================================
Source code, data files and utilities related to "Scala for Machine Learning"
Overview
Documentation
Minimum requirements
Updates
Project Components
Installation
Build
Run examples
Persistent models and configurations
Appendix

Overview

The source code provides software developers with a broad overview of the different machine learning algorithms. The reader is expected to have a good grasp of the Scala programming language along with some knowledge of basic statistics. Experience in data mining and machine learning is not a prerequisite.

Source code guidelines are defined in the companion document SourceCodeGuide.html.

The examples are related to investment portfolio management and trading strategies. For readers interested in either the mathematics or the techniques implemented in this library, I strongly recommend the following readings:

  • "Machine Learning: A Probabilistic Perspective" K. Murphy - MIT Press - 2012
  • "The Elements of Statistical Learning" T. Hastie, R. Tibshirani, J. Friedman - Springer - 2001
  • "Pattern Recognition and Machine Learning" C. Bishop - Springer - 2006

The real-world examples, related to financial and market analysis, are used for the sole purpose of illustrating the machine learning techniques. They do not constitute a recommendation or endorsement of any specific investment management or trading techniques.
The Appendix contains an introduction to the basic concepts of investment and trading strategies as well as technical analysis of financial markets.

Documentation

The best approach to learning about any particular learning algorithm is to:
  • Read the appropriate chapter (e.g. Chapter 5: Naive Bayes models)
  • Review the source code guidelines used in the book, SourceCodeGuide.html
  • Review the scaladoc in scala_2.10-0.98-javadoc.jar
  • Look at the examples related to the chapter (e.g. org/scalaml/app/chap5/Chap5)
  • Browse through the implementation code (e.g. org/scalaml/supervised/bayes)

Minimum Requirements

Hardware: 2 CPU cores with 4 GB of RAM to build and run the examples on small datasets.
4 CPU cores and 8+ GB of RAM for datasets of size 75,000 or larger and/or with 50 or more features.
Operating system: no specific requirement
Software: JDK 1.7.0_45 or 1.8.0_25, Scala 2.10.3/2.10.4 or 2.11.1 and SBT 0.13+ (see the Installation section for deployment).
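
As a quick way to compare a machine against the requirements above, the following sketch reports the number of cores, the JVM version and the Scala version of the current environment. It uses the standard library only; the object name EnvCheck is illustrative and is not part of the Scala for Machine Learning library.

```scala
import scala.util.Properties

// Illustrative helper (not part of Scala for Machine Learning):
// reports the settings that the minimum requirements refer to.
object EnvCheck {
  // Number of processors available to the JVM
  def cores: Int = Runtime.getRuntime.availableProcessors

  // Version of the running JVM, e.g. a 1.7.x or 1.8.x string
  def javaVersion: String = System.getProperty("java.version")

  // Version of the Scala library on the classpath
  def scalaVersion: String = Properties.versionNumberString

  def report: String =
    s"cores=$cores, java=$javaVersion, scala=$scalaVersion"

  def main(args: Array[String]): Unit = println(report)
}
```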

Updates

Version 0.98.1
- Added function minimization as a test case for Genetic algorithms
- Added monitoring callback for reproduction cycle of the genetic algorithm and updated implementation of trading signals
- Standardized string representation of collection using mkString
- Added plots to the performance benchmark of parallel collection (Chap. 12)
- Simplified and re-implemented the Viterbi algorithm (HMM - decoding) as a tail recursion and normalized lambda probabilities matrices
- Expanded scaladocs with reference to the chapters of "Scala for Machine Learning"
- Replaced some enumerations by case classes
- Added scalastyle options
Version 0.98
- Added comments to test cases
- Added Scala source guide
- Wrapped Scalatest routines into futures
- Expanded the number of tests/evaluations from 39 to 60
Version 0.97
- Initial implementation

Project Components

Directory structure of the source code library for Scala for Machine Learning:

[Diagram: Source code]


Directory structure of the source code of the examples for Scala for Machine Learning:

[Diagram: Examples]


Installation and Build

Installation

The installation and build workflow is described in the following diagram:

[Diagram: Installation and build]

Eclipse
The Scala for Machine Learning library is compatible with Eclipse Scala IDE 3.0.
  • Specify the links to the source in Project/properties/Java Build Path/Source. The two links should be project_name/src/main/scala and project_name/src/test/scala.
  • Add the jars required to build and execute the code within Eclipse in Project/properties/Java Build Path/Add External Jars, as declared in project_name/.classpath.
  • Update the JVM heap parameters in the eclipse.ini file to -Xms512m -Xmx8192m, or the maximum allowed on your machine.

Build

The Simple Build Tool (SBT) has to be used to build the library from the source code, using the build.sbt file in the root directory.
Executing the examples/tests in Scala for Machine Learning requires sufficient JVM heap memory (~2G):
in sbt/conf/sbtconfig.text, set -Xmx to 2058m or higher and -XX:MaxPermSize to 512m or higher, e.g. -Xmx4096m -Xms512m -XX:MaxPermSize=512m
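
To confirm that the heap settings were actually picked up, a small check can be run inside the JVM started by SBT. The sketch below is a minimal example under stated assumptions: HeapCheck is a hypothetical helper, not part of the library, and the 2048 MB threshold is an assumed reading of the ~2G guideline above.

```scala
// Illustrative helper (not part of the library): compares the maximum heap
// of the running JVM with a required size expressed in megabytes.
object HeapCheck {
  private val BytesPerMb = 1024L * 1024L

  // Pure comparison, kept separate so it can be tested without a JVM lookup.
  def isSufficient(maxHeapBytes: Long, requiredMb: Long): Boolean =
    maxHeapBytes >= requiredMb * BytesPerMb

  // Maximum heap the JVM will attempt to use, as controlled by -Xmx.
  def currentMaxHeapMb: Long = Runtime.getRuntime.maxMemory / BytesPerMb

  def main(args: Array[String]): Unit = {
    val requiredMb = 2048L // assumption: ~2G, as suggested for the examples
    val ok = isSufficient(Runtime.getRuntime.maxMemory, requiredMb)
    println(s"max heap: $currentMaxHeapMb MB, sufficient for examples: $ok")
  }
}
```

On some JVMs Runtime.maxMemory reports a value somewhat below the -Xmx setting, so the result is best treated as approximate.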

Build script for Scala for Machine Learning:
To build the Scala for Machine Learning library package
$(ROOT)/sbt clean publish-local
To build the package including test and resource files
$(ROOT)/sbt clean package
To generate scala doc for the library
$(ROOT)/sbt doc
To generate scala doc for the examples
$(ROOT)/sbt test:doc
To generate report for compliance to Scala style guide:
$(ROOT)/sbt scalastyle
To compile all examples:
$(ROOT)/sbt test:compile
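
The commands above are driven by the build.sbt file in the root directory. For readers unfamiliar with SBT, the fragment below sketches how the Scala versions listed in the minimum requirements could be declared. It is a hypothetical fragment in sbt 0.13 syntax; the project name is assumed, and the actual build.sbt shipped with the source code may differ.

```scala
// Hypothetical build.sbt fragment (sbt 0.13 syntax), for illustration only.
name := "scalaml" // assumed project name

version := "0.98.1"

// Default Scala version used by commands such as "sbt clean package"
scalaVersion := "2.10.4"

// Versions used when a command is prefixed with "+", e.g. sbt "+ package"
crossScalaVersions := Seq("2.10.4", "2.11.1")
```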

Run examples

Examples in a chapter

To run the examples of a particular chapter (e.g. Chapter 4)
$sbt
>test-only org.scalaml.app.chap4.Chap4

All examples

To run all examples with output configuration:
$sbt "test:run options" where options is a list of possible outputs
  • console to output results onto standard output
  • logger to output results into a log file (log4j)
  • chart to plot results using jFreeChart
$sbt "test:run log chart" writes test results into a log and charts
$sbt test:run writes test results to the standard output and the charts.
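
The mapping from command line options to outputs can be pictured with the sketch below. It is a hypothetical illustration, not the library's implementation: OutputOptions and its members are invented names, and both "logger" and "log" are accepted only because both spellings appear above.

```scala
// Hypothetical sketch, not the library's code: maps the option names
// listed above to output targets.
object OutputOptions {
  sealed trait Target
  case object Console extends Target
  case object Logger extends Target
  case object Chart extends Target

  // With no option, results go to the standard output and the charts.
  val Default: Set[Target] = Set(Console, Chart)

  private val byName: Map[String, Target] = Map(
    "console" -> Console,
    "logger"  -> Logger,
    "log"     -> Logger,
    "chart"   -> Chart
  )

  // Returns the selected targets, or an error message for unknown options.
  def parse(args: Seq[String]): Either[String, Set[Target]] = {
    val unknown = args.filterNot(byName.contains)
    if (unknown.nonEmpty) Left("unknown option(s): " + unknown.mkString(", "))
    else if (args.isEmpty) Right(Default)
    else {
      val targets: Set[Target] = args.map(byName).toSet
      Right(targets)
    }
  }
}
```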

Persistent models and configurations

The package object org.scalaml.core.Design provides the traits (or skeleton implementations) of the persistent model, Design.Model, and configuration, Design.Config.
The persistence mechanism is implemented for a couple of supervised learning models, for illustration purposes only. The reader should be able to implement persistence of configurations and models for all relevant learning algorithms using the template operators << and >>.
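
The idea behind the << and >> operators can be sketched as follows. This is a minimal example under stated assumptions: the traits PersistentModel and ModelLoader and the toy LinearWeights model are hypothetical and do not reproduce the signatures of Design.Model or Design.Config. Here >> writes a model to a file and << reads it back.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.util.Try

// Hypothetical sketch: these traits only illustrate the << / >> idea and do
// not reproduce the signatures of Design.Model or Design.Config.
object PersistenceSketch {
  // Writer side: a model knows how to serialize itself.
  trait PersistentModel {
    def serialize: String

    // ">>" writes the serialized model to the given file.
    def >>(path: Path): Try[Unit] =
      Try { Files.write(path, serialize.getBytes(StandardCharsets.UTF_8)); () }
  }

  // Reader side: a loader rebuilds a model from the content of a file.
  trait ModelLoader[M] {
    def deserialize(content: String): M

    // "<<" reads the file and rebuilds the model; failures are captured in Try.
    def <<(path: Path): Try[M] =
      Try(new String(Files.readAllBytes(path), StandardCharsets.UTF_8))
        .map(deserialize)
  }

  // Toy model: weights stored as comma-separated values.
  case class LinearWeights(weights: Vector[Double]) extends PersistentModel {
    def serialize: String = weights.mkString(",")
  }

  object LinearWeights extends ModelLoader[LinearWeights] {
    def deserialize(content: String): LinearWeights =
      LinearWeights(content.split(",").map(_.trim.toDouble).toVector)
  }
}
```

With these definitions, a model is saved with model >> path and restored with LinearWeights << path. Returning a Try keeps I/O and parsing failures explicit instead of throwing.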

Appendix

The examples have been built and tested with the following libraries:
Java libraries
CRF-Trove_3.0.2.jar
LBFGS.jar
colt.jar
CRF.jar
commons-math3-3.3.jar
libsvm.jar
jfreechart-1.0.17/lib/jcommon-1.0.21.jar
jfreechart-1.0.17/lib/servlets.jar
junit-4.11.jar
jfreechart-1.0.17/lib/jfreechart-1.0.17.jar
Scala 2.10.x related libraries
com.typesafe/config/1.2.1/bundles/config.jar
akka-actor_2.10-2.2.3.jar
scalatest_2.11.jar
spark-assembly-1.0.2-hadoop2.4.0.jar
Scala 2.11.x related libraries
com.typesafe/config/1.2.2/bundles/config.jar
scalatest_2.11.jar
akka-actor_2.11-2.3.6.jar
spark-assembly-1.1.0-hadoop2.4.0.jar
