Giter Site home page Giter Site logo

hanshanley / smile Goto Github PK

View Code? Open in Web Editor NEW

This project forked from haifengl/smile

0.0 0.0 0.0 239.13 MB

Statistical Machine Intelligence & Learning Engine

Home Page: https://haifengl.github.io

License: Other

Java 59.72% Harbour 0.01% Scala 7.21% HTML 10.52% Shell 0.11% CSS 0.10% JavaScript 0.06% Jupyter Notebook 2.77% Scilab 12.53% Terra 2.73% Objective-C++ 1.36% Kotlin 1.79% Clojure 1.11%

smile's Introduction

Smile

Donate Join the chat at https://gitter.im/haifengl/smile Maven Central

Smile (Statistical Machine Intelligence and Learning Engine) is a fast and comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system in Java and Scala. With advanced data structures and algorithms, Smile delivers state-of-art performance. Smile is well documented and please check out the project website for programming guides and more information.

Smile covers every aspect of machine learning, including classification, regression, clustering, association rule mining, feature selection, manifold learning, multidimensional scaling, genetic algorithms, missing value imputation, efficient nearest neighbor search, etc.

Smile implements the following major machine learning algorithms:

  • Classification: Support Vector Machines, Decision Trees, AdaBoost, Gradient Boosting, Random Forest, Logistic Regression, Neural Networks, RBF Networks, Maximum Entropy Classifier, KNN, Naïve Bayesian, Fisher/Linear/Quadratic/Regularized Discriminant Analysis.

  • Regression: Support Vector Regression, Gaussian Process, Regression Trees, Gradient Boosting, Random Forest, RBF Networks, OLS, LASSO, ElasticNet, Ridge Regression.

  • Feature Selection: Genetic Algorithm based Feature Selection, Ensemble Learning based Feature Selection, Signal Noise ratio, Sum Squares ratio.

  • Clustering: BIRCH, CLARANS, DBSCAN, DENCLUE, Deterministic Annealing, K-Means, X-Means, G-Means, Neural Gas, Growing Neural Gas, Hierarchical Clustering, Sequential Information Bottleneck, Self-Organizing Maps, Spectral Clustering, Minimum Entropy Clustering.

  • Association Rule & Frequent Itemset Mining: FP-growth mining algorithm.

  • Manifold Learning: IsoMap, LLE, Laplacian Eigenmap, t-SNE, PCA, Kernel PCA, Probabilistic PCA, GHA, Random Projection, ICA.

  • Multi-Dimensional Scaling: Classical MDS, Isotonic MDS, Sammon Mapping.

  • Nearest Neighbor Search: BK-Tree, Cover Tree, KD-Tree, LSH.

  • Sequence Learning: Hidden Markov Model, Conditional Random Field.

  • Natural Language Processing: Sentence Splitter and Tokenizer, Bigram Statistical Test, Phrase Extractor, Keyword Extractor, Stemmer, POS Tagging, Relevance Ranking

You can use the libraries through Maven central repository by adding the following to your project pom.xml file.

    <dependency>
      <groupId>com.github.haifengl</groupId>
      <artifactId>smile-core</artifactId>
      <version>2.3.0</version>
    </dependency>

For NLP, use the artifactId smile-nlp.

For Scala API, please use

    libraryDependencies += "com.github.haifengl" %% "smile-scala" % "2.3.0"

For Kotlin API, add the below into the dependencies section of Gradle build script.

    implementation("com.github.haifengl:smile-kotlin:2.3.0")

For Clojure API, add the following dependency to your project or build file:

    [org.clojars.haifengl/smile "2.3.0"]

To enable machine optimized matrix computation, the users should make their machine-optimized libblas3 (CBLAS) and liblapack3 (Fortran) available as shared libraries at runtime. This module employs the highly efficient netlib-java library.

OS X

Apple OS X requires no further setup as it ships with the veclib framework.

Linux

Generically-tuned ATLAS and OpenBLAS are available with most distributions and must be enabled explicitly using the package-manager. For example,

  • sudo apt-get install libatlas3-base libopenblas-base
  • sudo update-alternatives --config libblas.so
  • sudo update-alternatives --config libblas.so.3
  • sudo update-alternatives --config liblapack.so
  • sudo update-alternatives --config liblapack.so.3

However, these are only generic pre-tuned builds. If you have Intel MKL installed, you could also create symbolic links from libblas.so.3 and liblapack.so.3 to libmkl_rt.so or use Debian's alternatives system.

Windows

The native_system builds expect to find libblas3.dll and liblapack3.dll on the %PATH% (or current working directory). Smile ships a prebuilt OpenBLAS. The users can also install vendor-supplied implementations, which may offer better performance.

Shell

Smile comes with an interactive shell. Download pre-packaged Smile from the releases page. In the home directory of Smile, type

    ./bin/smile

to enter the shell, which is based on Ammonite-REPL. You can run any valid Scala expressions in the shell. In the simplest case, you can use it as a calculator. Besides, all high-level Smile operators are predefined in the shell. By default, the shell uses up to 75% memory. If you need more memory to handle large data, use the option -J-Xmx or -XX:MaxRAMPercentage. For example,

    ./bin/smile -J-Xmx30G

You can also modify the configuration file ./conf/smile.ini for the memory and other JVM settings. For detailed help, checkout the project website.

Model Serialization

Most models support the Java Serializable interface (all classifiers do support Serializable interface) so that you can use them in Spark. For reading/writing the models in non-Java code, we suggest XStream to serialize the trained models. XStream is a simple library to serialize objects to XML and back again. XStream is easy to use and doesn't require mappings (actually requires no modifications to objects). Protostuff is a nice alternative that supports forward-backward compatibility (schema evolution) and validation. Beyond XML, Protostuff supports many other formats such as JSON, YAML, protobuf, etc. For some predictive models, we look forward to supporting PMML (Predictive Model Markup Language), an XML-based file format developed by the Data Mining Group.

Smile Scala API provides read(), read.xstream(), write(), and write.xstream() functions in package smile.io.

Visualization

Smile also has a Swing-based data visualization library SmilePlot, which provides scatter plot, line plot, staircase plot, bar plot, box plot, histogram, 3D histogram, dendrogram, heatmap, hexmap, QQ plot, contour plot, surface, and wireframe. The class PlotCanvas provides builtin functions such as zoom in/out, export, print, customization, etc.

SmilePlot requires SwingX library for JXTable. But if your environment cannot use SwingX, it is easy to remove this dependency by using JTable.

To use SmilePlot, add the following to dependencies

    <dependency>
      <groupId>com.github.haifengl</groupId>
      <artifactId>smile-plot</artifactId>
      <version>2.2.2</version>
    </dependency>

Gallery

Kernel PCA

Kernel PCA

IsoMap

IsoMap

MDS

Multi-Dimensional Scaling

SOM

SOM

Neural Network

Neural Network

SVM

SVM

Agglomerative Clustering

Agglomerative Clustering

X-Means

X-Means

DBSCAN

DBSCAN

Neural Gas

Neural Gas

Wavelet

Wavelet

Mixture

Exponential Family Mixture

smile's People

Contributors

haifengl avatar beckgael avatar rayeaster avatar kid1412z avatar serickso avatar elkfrawy-df avatar xyclade avatar digital-thinking avatar twotabbies avatar myui avatar benmccann avatar kevincooper avatar diegocatalano avatar inejc avatar rafaelsakurai avatar doghere avatar alexey-grigorev-sm avatar tdunning avatar peter-toth avatar pdkovacs avatar kno10 avatar jpe42 avatar gforman44 avatar gitter-badger avatar takanori-ugai avatar ghostdogpr avatar pguillebert avatar ptoth-rapidminer avatar pfcoperez avatar lukehutch avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.