hanssmail / quantq

The repository for the Machine Learning and Big Data with kdb+/q book by Novotny et al.

License: GNU General Public License v3.0

q 100.00%
kdb kdb-q machine-learning mathematical-functions bioinformatics algorithm deep-neural-networks quantum-computing topological-data-analysis

quantq's Introduction

quantQ

Order the book at https://www.wiley.com/en-us/Machine+Learning+and+Big+Data+with+KDB%2B+Q-p-9781119404750.

Getting Started

$ cd quantQ/lib
$ q quantQ.q -p 5000

Each .q file is named after the corresponding book chapter, and the functions it defines reside in a namespace of the same name, as outlined in Section 3.1.0.1.

Errata

Chapter 7: Joins

Section Note
7.1 The output shown for the comma join example corresponds to t1,t6, not t5,t6.
7.2.10 The text states that we aim to aggregate the data from table dataSet2 over a window starting 1 minute before the trade and ending at the time of the trade; however, the window is defined as starting at the time of the trade and ending 1 minute after it. The example should read window: (-00:01:00;0) +\: exec time from dataSet1;.
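To make the corrected window concrete, here is a minimal q sketch on two small hypothetical tables (trades, quotes and their columns are assumptions, and time is stored as a timespan here rather than the type used in the book). Each-left builds the pair (starts;ends), which is then passed to wj.

```q
/ hypothetical trades and quotes; time is a timespan column for simplicity
trades:([] sym:`A`A; time:0D09:30:00 0D09:31:00; px:100 101f);
quotes:([] sym:`A`A`A; time:0D09:29:30 0D09:30:30 0D09:31:00; bid:99 100 100.5);
/ window from 1 minute before each trade up to the trade itself: (starts;ends)
w:(neg 0D00:01:00;0D00:00:00) +\: exec time from trades;
/ maximum bid over each window; the quote table is sorted by sym then time
res:wj[w;`sym`time;trades;(`sym`time xasc quotes;(max;`bid))];
```

Swapping the two offsets to (0;1 minute) reproduces the forward-looking window that the errata entry describes as the mistake.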

Chapter 14: Time Series Econometrics

Section Note
14.1.6.1 The ordering of the phi vector inside the implementation of .quantQ.ts.simAR should be reversed to agree with Definition 14.1 and the example on page 276. This has been addressed in the repo. A variant of the function using adverbs is also provided, under .quantQ.tse.simAR.
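A minimal q sketch of AR(p) simulation, assuming the standard form y[t] = phi[0]*y[t-1] + ... + phi[p-1]*y[t-p] + eps[t], i.e. phi ordered by lag (our assumption about Definition 14.1). The name simAR and its signature are hypothetical; this is not the library's .quantQ.ts.simAR. A scan carries the last p values as state, and the reverse inside the step is exactly where the ordering of phi matters.

```q
/ hypothetical simAR: simulate an AR(p) process from a vector of shocks eps
/ phi: coefficients ordered by lag, so phi[0] multiplies y[t-1]
/ y0: the p starting values, oldest first
simAR:{[phi;eps;y0]
  step:{[phi;s;e] 1_s,e+phi wsum reverse s}[phi];   / reverse s puts y[t-1] first, matching phi
  last each step\[y0;eps]
  }
```

For example, simAR[enlist 0.5;1 0 0f;enlist 0f] returns 1 0.5 0.25, the impulse response of an AR(1) with phi=0.5.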

Chapter 15: Fourier Transform

Section Note
15.3 The example following the implementation of the Hamilton product (15.29-15.32) should read .quantQ.quat.mult[quat1;quat3]. The definition of .quantQ.quat.mult contains a typo, which is fixed in the repo.
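A minimal q sketch of the Hamilton product, assuming a quaternion a+bi+cj+dk is stored as the 4-element float list (a;b;c;d). The name hamilton is hypothetical, and this is not the repo's .quantQ.quat.mult.

```q
/ Hamilton product of quaternions p=(a1;b1;c1;d1) and r=(a2;b2;c2;d2)
hamilton:{[p;r]
  a1:p 0; b1:p 1; c1:p 2; d1:p 3;
  a2:r 0; b2:r 1; c2:r 2; d2:r 3;
  ((a1*a2)-(b1*b2)+(c1*c2)+d1*d2;    / real part
   (a1*b2)+(b1*a2)+(c1*d2)-d1*c2;    / i
   (a1*c2)+(c1*a2)+(d1*b2)-b1*d2;    / j
   (a1*d2)+(d1*a2)+(b1*c2)-c1*b2)    / k
  }
```

The product is not commutative: hamilton[0 1 0 0f;0 0 1 0f] gives k (0 0 0 1f), while swapping the arguments gives -k.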

Chapter 22: Neural Networks

Section Note
22.2 The missing functions .quantQ.nn.funcNN and .quantQ.nn.funcErrNN have been added to the repo.

Extension beyond the book

Mathematical functions

We have added the .quantQ.math namespace with various mathematical functions, constants and identities. It currently contains constants, hyperbolic functions, a number of special functions and polynomials (defined on the real domain), and the most frequently used PDFs and CDFs. To obtain the multivariate normal distribution, we have included the tetrachoric expansion.
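As a rough illustration of the kind of definitions involved, here is a q sketch of hyperbolic functions built from exp and of the standard normal PDF. The names are hypothetical and are not the .quantQ.math API.

```q
/ hyperbolic functions from exp (q has no built-in sinh/cosh/tanh)
sinh:{0.5*(exp x)-exp neg x};
cosh:{0.5*(exp x)+exp neg x};
tanh:{(sinh x)%cosh x};
/ standard normal PDF, using pi as acos -1
pi:acos -1;
normPDF:{(exp -0.5*x*x)%sqrt 2*pi};
```

The identity cosh[x]^2 - sinh[x]^2 = 1 is a quick sanity check on such definitions.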

Biostatistics

We have added functions to the .quantQ.stats namespace for working with contingency tables, namely Fisher's exact test and Barnard's test.
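A minimal q sketch of the two-sided Fisher exact test for a 2x2 table: it sums the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table. The function names are hypothetical, and the relative tolerance of 1e-7 used when comparing probabilities is our choice.

```q
/ binomial coefficient as a float product (adequate for small tables)
choose:{[n;k] prd (n-til k)%1+til k};
/ probability of the 2x2 table ((a;b);(c;d)) given its margins (hypergeometric)
tableProb:{[a;b;c;d] (choose[a+b;a]*choose[c+d;c])%choose[a+b+c+d;a+c]};
/ two-sided p-value: sum over all tables with the same margins that are no more likely
fisher2:{[a;b;c;d]
  r1:a+b; c1:a+c; n:a+b+c+d;
  lo:0|c1-n-r1; hi:r1&c1;          / feasible values of the top-left cell
  xs:lo+til 1+hi-lo;
  ps:{[r1;c1;n;x] tableProb[x;r1-x;c1-x;x+n-r1+c1]}[r1;c1;n] each xs;
  sum ps where ps<=tableProb[a;b;c;d]*1+1e-7
  }
```

For the classic "lady tasting tea" table fisher2[3;1;1;3], the two-sided p-value is 34/70 (about 0.486).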

Optimization

We have added a native implementation of the Nelder-Mead (amoeba) optimisation method. The functions can be found in the .quantQ.amoeba namespace. The illustration shows a solution to Rosenbrock's function.
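For reference, a q sketch of the n-dimensional Rosenbrock function, assuming the common parameters a=1 and b=100; its global minimum is 0 at (1,...,1). The name rosenbrock is hypothetical; a function of this shape is what one would pass to an optimiser as the objective.

```q
/ n-dimensional Rosenbrock: sum over i of 100*(x[i+1]-x[i]^2)^2 + (1-x[i])^2
rosenbrock:{[x]
  a:-1_x;                          / x[0..n-2]
  b:1_x;                           / x[1..n-1]
  t:b-a*a;
  sum (100*t*t)+(1-a)*1-a
  }
```

At the usual starting point -1.2 1f the function evaluates to 24.2 (up to floating-point rounding), and at 1 1f it is 0.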

Bioinformatics

We have implemented the Needleman-Wunsch algorithm from bioinformatics, which aligns two sequences (nucleotide sequences or general finite-length sequences) using dynamic programming. The functions are in the .quantQ.stats namespace. Local alignment using the Smith-Waterman algorithm is available as well, and we have added the Levenshtein distance, calculated with the Wagner-Fischer algorithm.
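A minimal q sketch of the Wagner-Fischer dynamic programme for the Levenshtein distance, keeping only one row of the DP table at a time. The name lev is hypothetical and this is not the .quantQ.stats API.

```q
/ Levenshtein distance between strings s and t (Wagner-Fischer), one DP row at a time.
/ A row holds the distances from the current prefix of s to every prefix of t; the first row is 0..count t.
lev:{[s;t]
  row:{[t;above;c]
    cost:not c=t;                   / 0 where the characters match, 1 otherwise
    m:((-1_above)+cost)&1+1_above;  / best of substitution/match and deletion
    n0:1+first above;               / delete the whole prefix of s
    n0,{y&x+1}\[n0;m]               / insertion depends on the left neighbour, hence a scan
    }[t];
  last row/[til 1+count t;s]
  }
```

For example, lev["kitten";"sitting"] is 3 (two substitutions and one insertion).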

Dynamic Time Warp

We have added algorithms for Dynamic Time Warping calculations in the .quantQ.dtw namespace. Details of the algorithm can be found here.
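A minimal q sketch of the classic DTW distance, assuming an absolute-difference local cost and the three standard steps (diagonal, vertical, horizontal). The name dtw and the choice of cost are assumptions, not the .quantQ.dtw interface.

```q
/ DTW distance between numeric series a and b with local cost |a[i]-b[j]|.
/ A row holds D[i;0..count b]; D[0;0]=0 and every other boundary cell is infinite.
dtw:{[a;b]
  row:{[b;above;x]
    d:abs x-b;                    / local costs against every b[j]
    m:(-1_above)&1_above;         / best of the diagonal and vertical predecessors
    0w,{z+y&x}\[0w;m;d]           / horizontal predecessor needs a left-to-right scan
    }[b];
  last row/[0f,(count b)#0w;a]
  }
```

Identical series have distance 0, and warping lets dtw[1 2 3f;1 1 2 3f] stay at 0 even though the lengths differ.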

Deep Neural Networks

We have added deep neural networks. The implemented features include dropout, batches, and different functional forms of annealing for learning and regularisation. The architecture is specified in the input dictionary; the rest is fully automated. More examples can be found here. The implementation is part of the .quantQ.nn namespace.

Stochastic Optimisation

We have implemented stochastic optimisation, which minimises a given n-dimensional function using random search. The method is iterative: at every step it explores a neighbourhood consisting of random points lying on an n-sphere whose radius shrinks over time, with occasional attempts to investigate more distant points. The implementation is part of the .quantQ.so namespace.
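A simplified q sketch of this kind of random search, assuming uniformly random directions (normalised Gaussian vectors) and a radius that halves whenever no sampled point improves. It omits the occasional long-range exploration, and all names are hypothetical rather than the .quantQ.so API.

```q
/ hypothetical names throughout; not the .quantQ.so API
pi:acos -1;
/ n standard normal draws via Box-Muller (1-u keeps the argument of log in (0,1])
rnorm:{[n] (sqrt -2*log 1-n?1f)*cos 2*pi*n?1f};
/ k random points on the sphere of radius r centred at c (normalised Gaussian directions)
spherePts:{[c;r;k] c+/:r*{x%sqrt sum x*x} each rnorm each k#count c};
/ minimise f from x0: move to the best sampled point if it improves on the current one,
/ otherwise halve the radius; the state is (current point;radius)
randSearch:{[f;x0;r0;k;iters]
  step:{[f;k;st;i]
    cur:st 0; rad:st 1;
    pts:spherePts[cur;rad;k];
    vals:f each pts;
    $[(min vals)<f cur;(pts vals?min vals;rad);(cur;0.5*rad)]
    }[f;k];
  first step/[(x0;r0);til iters]
  }
```

With a smooth objective such as randSearch[{sum x*x};3 3f;1f;20;200], the search should end close to the origin; being random, results vary between seeds.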

Support Vector Machine

We have added a Support Vector Machine for binary classification, using the soft margin and stochastic gradient descent (one observation per step). The functions come with a default setup, which is customised based on the provided dataset. The soft-margin specification includes a regularisation parameter, which can be optimised using the built-in n-fold cross-validation method.

The implementation includes a function that calculates more than 20 statistics used to evaluate binary classification, including specificity, accuracy, precision and F1. More examples can be found here. The implementation is part of the .quantQ.svm namespace.
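A minimal q sketch of a few of these statistics computed from the confusion matrix, assuming actual and predicted labels are boolean vectors. clfStats is a hypothetical name and covers only five of the statistics.

```q
/ confusion-matrix statistics from actual labels y and predictions p (boolean vectors)
clfStats:{[y;p]
  tp:sum y&p; tn:sum (not y)&not p;
  fp:sum (not y)&p; fn:sum y&not p;
  prec:tp%tp+fp; rec:tp%tp+fn;
  `accuracy`precision`recall`specificity`F1!(
    (tp+tn)%count y; prec; rec; tn%tn+fp; 2*prec*rec%prec+rec)
  }
```

For y:11001b and p:10011b there are 2 true positives, 1 true negative, 1 false positive and 1 false negative, giving an accuracy of 0.6 and an F1 of 2/3.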

Poisson Regression

We have added a library for the Poisson distribution and Poisson regression, used to estimate integer-valued Poisson processes. The functions are based on maximum likelihood, optimised using routines from the .quantQ.opt library. The library also contains an L2-regularised version with n-fold cross-validation.

Examples and basic usage of the library can be found here. The implementation is part of the .quantQ.pois namespace.
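A minimal q sketch of the Poisson regression log-likelihood with a log link, log L = sum(y*eta - exp eta - log y!) with eta = X b, assuming X has one row per observation. poisLogL and logFact are hypothetical names; an optimiser would maximise this over b.

```q
/ log y! as a sum of logs (q has no built-in log-gamma)
logFact:{sum log 1+til x};
/ Poisson log-likelihood for counts y, design matrix X (rows = observations), coefficients b
poisLogL:{[y;X;b] eta:X mmu b; sum (y*eta)-(exp eta)+logFact each y};
```

With b set to zero every fitted mean is 1, so poisLogL[2 0 1;(1 0f;1 1f;1 2f);0 0f] equals -(3+log 2).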

Quantum Computing

We have added the .quantQ.quantum namespace, which contains a set of routines to set up and perform quantum computations using qubits. The library is not connected to any actual quantum computer and is for demonstration purposes only.

Examples and basic library usage can be found here.
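A toy q sketch of a single qubit, assuming real amplitudes only (q has no built-in complex type, so phases are ignored here). Applying the Hadamard gate to |0> gives equal measurement probabilities, and applying it twice returns |0>. The names are hypothetical, not the .quantQ.quantum API.

```q
/ a qubit as a real 2-vector of amplitudes (complex phases ignored in this sketch)
ket0:1 0f;
H:(1 1f;1 -1f)%sqrt 2;           / Hadamard gate
apply:{[g;s] g mmu s};           / apply a gate (matrix) to a state (vector)
probs:{x*x};                     / Born rule for real amplitudes
```

probs apply[H;ket0] gives 0.5 0.5 up to rounding, since H puts |0> into an equal superposition.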

Topological Data Analysis

We have added the library .quantQ.tda, which performs topological data analysis (TDA) on a cloud of n-dimensional points. The routines provided include calculating the distances between points, identifying all Vietoris-Rips complexes for a given threshold, calculating all unique loops, and several analytical functions for gaining insight into data using TDA.

Examples and basic library usage can be found here.
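A minimal q sketch of the first steps, assuming Euclidean distance: the pairwise distance matrix and the edges (1-simplices) of the Vietoris-Rips complex for a threshold eps. Higher simplices and loops are not covered, and the names are hypothetical.

```q
/ Euclidean distance matrix for a cloud of points (one point per row)
distMat:{[pts] {sqrt sum (x-y)*x-y}/:\:[pts;pts]};
/ edges (i;j), i<j, of the Vietoris-Rips complex: pairs within distance eps
ripsEdges:{[pts;eps]
  D:distMat pts;
  n:count pts;
  pairs:raze {[n;i] i,/:(i+1)_til n}[n] each til n;   / all index pairs with i<j
  pairs where eps>=D ./: pairs
  }
```

For the points (0 0f;1 0f;0 3f) and eps 1.5, only the edge 0 1 survives.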

Utility Library

We have added the library .quantQ.util with various utility functions.

Hopfield Neural Networks

We have added the library .quantQ.hopf to define and work with Hopfield networks. In addition, we have added Hebbian descent to train the network for a given input/output.
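A minimal q sketch of a Hopfield network with +1/-1 states, using the classic Hebbian outer-product rule (not the Hebbian descent rule mentioned above) and synchronous sign updates run a fixed number of times, since synchronous dynamics can cycle. The names hebb and recall are hypothetical.

```q
/ Hebbian storage: W = (1/n) * sum of outer products p p', with no self-connections
hebb:{[pats]
  n:count first pats;
  W:(sum {x*/:x} each pats)%n;     / sum of outer products, scaled by the number of neurons
  W*not {x=/:x} til n              / zero the diagonal
  }
/ k synchronous updates s <- sign(W s), with ties mapped to +1
recall:{[W;s;k] step:{[W;v] -1f+2f*0<=W mmu v}[W]; step/[k;s]}
```

Storing 1 -1 1 -1f and starting from the corrupted state 1 1 1 -1f, recall restores the stored pattern.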

Technical Analysis

We have added the library .quantQ.ta, which comprises a growing list of indicators from Technical Analysis. The details of the library can be found here.
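A minimal q sketch of two basic indicators: a simple moving average via q's built-in mavg, and an exponential moving average written as a scan with smoothing factor a, seeded with the first price. The helper names are hypothetical, not the .quantQ.ta API.

```q
/ simple moving average over n periods (the first n-1 values average what is available)
sma:{[n;px] n mavg px};
/ exponential moving average: e[0]=px[0], e[t]=a*px[t]+(1-a)*e[t-1]
expMA:{[a;px] step:{[a;acc;p] (a*p)+(1-a)*acc}[a]; step\[px]};
```

For example, expMA[0.5;1 2 3f] returns 1 1.5 2.25.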

RSA Cryptanalysis

We have added the library .quantQ.rsa. It contains an implementation of the RSA algorithm for encrypting messages using a public and a private key. The purpose of the library is to illustrate the RSA concept. It also contains several additional functions for working with prime numbers and performing factorisation.
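A toy q sketch of RSA encryption and decryption via modular exponentiation by repeated squaring, using the textbook key p=61, q=53 (so n=3233, public exponent 17, private exponent 2753). modpow is a hypothetical name, and the sketch assumes m*m fits in a 64-bit long.

```q
/ b^e mod m by repeated squaring (assumes m*m fits in a long)
modpow:{[b;e;m]
  r:1; b:b mod m;
  while[e>0;
    if[1=e mod 2;r:(r*b) mod m];
    b:(b*b) mod m;
    e:e div 2];
  r
  }
/ toy key: n=61*53, phi=60*52=3120, 17*2753 mod 3120 = 1
n:61*53;
c:modpow[65;17;n];      / encrypt the message 65 with the public key
m:modpow[c;2753;n];     / decrypt with the private key
```

Encrypting 65 gives 2790, and decrypting 2790 recovers 65.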

Random Matrix Theory

We have added the library .quantQ.rmt. The library contains an implementation of the paper Excess Out-of-Sample Risk and Fleeting Modes by Bouchaud et al., available here. The algorithm compares two datasets in terms of their empirical covariance matrices using Random Matrix Theory. To implement the algorithm, we have added the Denman-Beavers and power-series algorithms for calculating the square root of a matrix, along with several utility functions. Further, .quantQ.stats has been extended with a calculator for generalised binomial coefficients.
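A minimal q sketch of the Denman-Beavers iteration (Y0=A, Z0=I; Y <- (Y + inv Z)/2, Z <- (Z + inv Y)/2), under which Y converges to the principal square root of A when A has no eigenvalues on the closed negative real axis, plus a generalised binomial coefficient. dbSqrt and genBinom are hypothetical names, and the fixed iteration count is an assumption rather than a convergence test.

```q
/ Denman-Beavers matrix square root of a float matrix A, run for n iterations
dbSqrt:{[A;n]
  I:{x=/:x} til count A;                          / boolean identity matrix
  step:{(0.5*x[0]+inv x[1];0.5*x[1]+inv x[0])};   / update Y and Z from the previous pair
  first step/[n;(A;"f"$I)]
  }
/ generalised binomial coefficient C(a,k) = prod over i<k of (a-i)/(i+1), for real a
genBinom:{[a;k] prd (a-til k)%1+til k};
```

For a diagonal matrix the result is the elementwise square root: dbSqrt[(4 0f;0 9f);20] is close to (2 0f;0 3f).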

Maximum Entropy Block Bootstrap

We have added the library .quantQ.meb. The library implements the block bootstrap using Maximum Entropy, as proposed in Bergamelli, Novotny and Urga, 2015 here. The method is intended for creating bootstrapped samples of non-stationary time series: the temporal structure is preserved within the blocks, while the overall time series is replicated using the Maximum Entropy bootstrap. For completeness, we also provide the older implementation by Vinod and Lopez for reference.

quantq's People

Contributors

benrothman93, hanssmail, skeevey, skeptiqos, sydx, team2of2

quantq's Issues

Typo in function ols.fStatistics printed in book

Hi, on page 255 the function ols.fStatistics contains a count[] call that is not necessary; it is actually corrected in the library. I mention this in case you want to add it to the list of typos in the book.
Best regards

Missing dictionaries funcNN and funcErrNN from .quantQ.nn

It seems that quantQ_nn.q does not contain the definitions of .quantQ.nn.funcNN and .quantQ.nn.funcErrNN; therefore, calling .quantQ.nn.modelNN yields an evaluation error. Adding them to the library fixes the problem: the function runs and reproduces the example on page 451 of the book.

Issues in example 14.3.1.3 page 289

I had to make a few adaptations to the code in order to reproduce the results of example 14.3.1.3 on VAR models. In general, all of them had to do with overloaded functions such as ts.Y, ts.zt and ts.Z.

  • For ts.Y there are two versions of the same function, the only difference being the evaluation of y[;0] or y[0;]. The problem seems to be that VAR functions such as ts.varOrder call the function with the wrong index order.

  • For ts.zt and ts.Z the overloaded versions have an extra parameter "flag", but functions such as ts.varOrder call the curried version, which returns a function, and therefore the call fails.

In all these cases my solution has been to rename the overloaded functions. I don't know whether there is a way to force the use of one specific version, or whether there should be a special namespace for the VAR utility functions.
Best regards
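In q a second definition under the same name simply replaces the first, so there is no overloading by argument count or behaviour; whichever file defines a name last wins. A minimal sketch, with hypothetical names, of that behaviour and of keeping the two ts.Y variants apart in separate namespaces:

```q
/ q has no overloading: the second assignment replaces the first
f:{[y] y[;0]};
f:{[y] y[0;]};                 / only this definition survives
/ hypothetical namespaced variants keep both versions available
.uniUtil.Y:{[y] y[;0]};        / first column
.varUtil.Y:{[y] y[0;]};        / first row
```

Calling f on (1 2;3 4) now returns the first row 1 2, while .uniUtil.Y returns the first column 1 3.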

.quantQ.simul.genBoxMuller does not use its parameter

The function returns 2 normal variates regardless of its parameter. In Chapter 14 the results are sampled as, e.g., 100?.quantQ.simul.genBoxMuller[] (sampling 100 values from a list of two normal variates). I suspect the intention is for .quantQ.simul.genBoxMuller[n] to return n random variates or similar.
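A minimal q sketch of a Box-Muller generator that honours its argument and returns n standard normal variates. boxMuller is a hypothetical name, not the library's .quantQ.simul.genBoxMuller.

```q
/ n standard normal variates via the Box-Muller transform
boxMuller:{[n]
  m:ceiling n%2;                   / each pair of uniforms yields two normals
  u1:1-m?1f; u2:m?1f;              / 1-u keeps the argument of log in (0,1]
  r:sqrt -2*log u1; th:2*acos[-1]*u2;
  n#(r*cos th),r*sin th
  }
```

With this shape, boxMuller 100 replaces sampling 100 values from a two-element list.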

runRule function missing from .quantQ_trees.q

The function .quantQ.trees.runRule, defined on page 487, is missing from the library file, which makes predictOnTree fail. After adding it as defined in the book, the example from the book is reproduced.

Issue in function ols.RMSE

Both in the book and in the library, the ols.RMSE function is implemented using the wavg function, but this seems incorrect to me: according to the documentation, x wavg y is (sum x*y) % sum x, so it divides by sum t instead of count t. This actually makes it possible to obtain negative values, depending on the order of model 0 and model 1. In my opinion a simple solution could be to use sqrt avg t * t: tabModel...
Best regards
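A minimal q sketch of the point raised above: x wavg y divides by sum x, so using residuals as the weights can produce a negative value, whereas sqrt avg e*e is the usual RMSE. The residual vector e is hypothetical.

```q
/ root mean squared error of a residual vector
rmse:{[e] sqrt avg e*e};
e:1 -2 -1f;
rmse e            / sqrt 2
e wavg e          / (sum e*e)%sum e = 6%-2 = -3
```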

Typo function .quantQ.ols.logL

When using this function I get an evaluation error at "log[2*acos-1]". The function seems to be missing a space between "acos" and "-1", which is how it is actually printed on page 250 of the book.
Thanks a lot for the great library and book
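A short q illustration of the fix, under our reading that in acos-1 the minus is parsed as subtraction applied to the function acos itself. With a space, acos -1 evaluates to pi.

```q
/ the space makes -1 the argument of acos
pi:acos -1;
c:log 2*acos -1;        / log(2*pi), as in the expression quoted above
```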

Typos/Errors

Hi,
there's a small error in the output of t5,t6 on p.123: The output displayed shows t1,t6 rather than t5,t6

and there's an error in either the explanation of the example or the window used in 7.2.10 for the window join (p.140).

Currently reading/working through the book and I really enjoy it.

Thanks

Error with .quantQ.ols.olsTab

Regarding .quantQ.ols.olsTab
under tabStats
"f"$count[y] should read "f"$count[y] because y is referencing a variable under ols.fit, which counts all nonmissing observations in the table.

Typo namespace function olsTab

Page 254 of the book states that olsTab is nested under the namespace "ols", but in the current version of the library it seems to be directly under quantQ (quantQ.olsTab instead of quantQ.ols.olsTab as in the book).
Best regards
