
stefan-jansen / machine-learning-for-trading

12.3K stars · 381 watchers · 4.0K forks · 667.34 MB

Code for Machine Learning for Algorithmic Trading, 2nd edition.

Home Page: https://ml4trading.io

Languages: Jupyter Notebook 99.96%, Python 0.04%, Shell 0.01%, JavaScript 0.01%
Topics: machine-learning trading investment finance data-science investment-strategies artificial-intelligence trading-strategies deep-learning synthetic-data ml4t-workflow trading-agent


machine-learning-for-trading's Issues

missing csv file

The notebook data/create_datasets.ipynb contains the following line:

df = pd.read_csv('us_equities_meta_data.csv')

However, this csv file is not present. I can't seem to find it anywhere in the distribution. Can you tell me where to locate it?
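
Until the file's source is clarified, a small guard makes the failure explicit; a minimal sketch, assuming the metadata csv is expected next to the notebook (the path is an assumption):

from pathlib import Path

import pandas as pd

meta_path = Path('us_equities_meta_data.csv')  # expected location (an assumption)
if not meta_path.exists():
    raise FileNotFoundError(f'{meta_path} is missing; obtain or regenerate the metadata first')
df = pd.read_csv(meta_path)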

do not have permission in /home/packt/ml4t

When I install via Docker on Windows 10, running zipline ingest fails: it seems permission to mkdir in /home/packt/ml4t is denied. How can I add permission to this folder?

(ml4t-zipline) packt@6463f5cb63ce:~$ zipline ingest
Traceback (most recent call last):
File "/opt/conda/envs/ml4t-zipline/bin/zipline", line 8, in
sys.exit(main())
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/click/core.py", line 829, in call
return self.main(*args, **kwargs)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/click/core.py", line 1256, in invoke
Command.invoke(self, ctx)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/zipline/main.py", line 60, in main
os.environ,
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/zipline/utils/run_algo.py", line 249, in load_extensions
pth.ensure_file(default_extension_path)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/zipline/utils/paths.py", line 58, in ensure_file
ensure_directory_containing(path)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/zipline/utils/paths.py", line 45, in ensure_directory_containing
ensure_directory(os.path.dirname(path))
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/zipline/utils/paths.py", line 30, in ensure_directory
os.makedirs(path)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/os.py", line 231, in makedirs
makedirs(head, mode, exist_ok)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/os.py", line 241, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/packt/ml4t/data'
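
One workaround worth trying (untested here, and assuming the bind-mounted folder is not writable by the container user): point Zipline at a directory the user can write to, since it honors the ZIPLINE_ROOT environment variable. A minimal sketch:

import os

# any directory the container user can write to; /tmp/zipline is an assumption
os.environ['ZIPLINE_ROOT'] = '/tmp/zipline'

from zipline.data import bundles
bundles.ingest('quandl')  # same effect as running `zipline ingest -b quandl`

Alternatively, fixing ownership of the mounted /home/packt/ml4t folder on the host should let the default location work.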

OpenTable Spyder macOS problem

Good morning,
I'm trying to run the Spyder files for the OpenTable folder. I have a Mac running Catalina and a problem with geckodriver; there seems to be a bug:
https://firefox-source-docs.mozilla.org/testing/geckodriver/Notarization.html
mozilla/geckodriver#1629

I have installed geckodriver with Homebrew, but it doesn't work anyway.
I receive this Spyder console error: WebDriverException: 'geckodriver' executable needs to be in PATH.

Please help me resolve this issue.
Ty Ale
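
If Homebrew installed the driver, Selenium still needs to locate it; passing the location explicitly sidesteps the PATH lookup. A minimal sketch, assuming the usual Homebrew install path (Selenium 3 API, as used elsewhere in this repo):

from selenium import webdriver

# /usr/local/bin/geckodriver is the typical Homebrew location (an assumption)
driver = webdriver.Firefox(executable_path='/usr/local/bin/geckodriver')

Per the notarization page linked above, Catalina may also quarantine the downloaded binary, in which case clearing the quarantine attribute on the geckodriver file is the documented workaround.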

Thank you for your kind reply.

I found the environment_windows.yml from the first_edition branch, installed it, and am using it without problems. I have one additional question.

Where can I find the zipline installation files for Windows in a Python 3.5 environment, to use for Chapters 4 through 5?

In this chapter's folder, I could only find a file for Linux. Do you provide an installation file for Windows?

Hi @silent0506,

thank you for your interest in the book. You can just delete these packages from the environment.yml file. However, the first_edition branch also contains a file tailored to Windows; you may want to use this instead.

The second edition was released a few weeks ago and contains a lot of additional material; I would highly recommend you review the notebooks in this repo as well, you may find them quite useful.

Thanks.

Originally posted by @silent0506 in #36 (comment)

earnings to csv

Good morning,
I correctly run sa_selenium.py in Spyder. The code correctly scrapes the Seeking Alpha website, but I can't save the results to csv files.
I wrote my folder path in rows 23, 24, 25;
modified row 32 to use 'html.parser';
added the geckodriver path in row 89;
and modified rows 114 and 115.

I think I'm declaring the csv paths or files incorrectly; please help me solve this problem.
Ty Ale

import re
from pathlib import Path
from random import random
from time import sleep
from urllib.parse import urljoin

import pandas as pd
from bs4 import BeautifulSoup
from furl import furl
from selenium import webdriver

transcript_path = Path('transcripts')

def store_result(meta, participants, content):
    """Save parse content to csv"""
    path = transcript_path / 'parsed' / meta['symbol']
    if not path.exists():
        path.mkdir(parents=True, exist_ok=True)
    pd.DataFrame(content, columns=['speaker', 'q&a', 'content']).to_csv(path / '/Users/alessiomontani/Documents/0_Python/CODE/Machine-Learning-for-Algorithmic-Trading-Second-Edition-master/03_alternative_data/02_earnings_calls copy/content.csv', index=False)
    pd.DataFrame(participants, columns=['type', 'name']).to_csv(path / '/Users/alessiomontani/Documents/0_Python/CODE/Machine-Learning-for-Algorithmic-Trading-Second-Edition-master/03_alternative_data/02_earnings_calls copy/participants.csv', index=False)
    pd.Series(meta).to_csv(path / '/Users/alessiomontani/Documents/0_Python/CODE/Machine-Learning-for-Algorithmic-Trading-Second-Edition-master/03_alternative_data/02_earnings_calls copy/earnings.csv')


def parse_html(html):
    """Main html parser function"""
    date_pattern = re.compile(r'(\d{2})-(\d{2})-(\d{2})')
    quarter_pattern = re.compile(r'(\bQ\d\b)')
    soup = BeautifulSoup(html, 'html.parser')

    meta, participants, content = {}, [], []
    h1 = soup.find('h1', itemprop='headline')
    if h1 is None:
        return
    h1 = h1.text
    meta['company'] = h1[:h1.find('(')].strip()
    meta['symbol'] = h1[h1.find('(') + 1:h1.find(')')]

    title = soup.find('div', class_='title')
    if title is None:
        return
    title = title.text
    print(title)
    match = date_pattern.search(title)
    if match:
        m, d, y = match.groups()
        meta['month'] = int(m)
        meta['day'] = int(d)
        meta['year'] = int(y)

    match = quarter_pattern.search(title)
    if match:
        meta['quarter'] = match.group(0)

    qa = 0
    speaker_types = ['Executives', 'Analysts']
    for header in [p.parent for p in soup.find_all('strong')]:
        text = header.text.strip()
        if text.lower().startswith('copyright'):
            continue
        elif text.lower().startswith('question-and'):
            qa = 1
            continue
        elif any([type in text for type in speaker_types]):
            for participant in header.find_next_siblings('p'):
                if participant.find('strong'):
                    break
                else:
                    participants.append([text, participant.text])
        else:
            p = []
            for participant in header.find_next_siblings('p'):
                if participant.find('strong'):
                    break
                else:
                    p.append(participant.text)
            content.append([header.text, qa, '\n'.join(p)])
    return meta, participants, content


SA_URL = 'https://seekingalpha.com/'
TRANSCRIPT = re.compile('Earnings Call Transcript')

next_page = True
page = 1
driver = webdriver.Firefox(executable_path='/Users/alessiomontani/Documents/0_Python/CODE/Machine-Learning-for-Algorithmic-Trading-Second-Edition-master/03_alternative_data/02_earnings_calls copy/geckodriver')
while next_page:
    print(f'Page: {page}')
    url = f'{SA_URL}/earnings/earnings-call-transcripts/{page}'
    driver.get(urljoin(SA_URL, url))
    sleep(8 + (random() - .5) * 2)
    response = driver.page_source
    page += 1
    soup = BeautifulSoup(response, 'html.parser')
    links = soup.find_all(name='a', string=TRANSCRIPT)
    if len(links) == 0:
        next_page = False
    else:
        for link in links:
            transcript_url = link.attrs.get('href')
            article_url = furl(urljoin(SA_URL, transcript_url)).add({'part': 'single'})
            driver.get(article_url.url)
            html = driver.page_source
            result = parse_html(html)
            if result is not None:
                meta, participants, content = result
                meta['link'] = link
                store_result(meta, participants, content)
                sleep(8 + (random() - .5) * 2)

driver.close()
earnings = pd.read_csv('earnings.csv')
#print(earnings)
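
For reference, the likely culprit in the snippet above: joining a pathlib.Path with an absolute string (path / '/Users/...') discards the left-hand side entirely, so the per-symbol folder is never used and the long string is treated as one file path. A minimal corrected store_result, assuming the original relative layout:

def store_result(meta, participants, content):
    """Save parsed content as csv files under transcripts/parsed/<symbol>/"""
    path = transcript_path / 'parsed' / meta['symbol']
    path.mkdir(parents=True, exist_ok=True)
    # keep the file names relative; an absolute right-hand operand overrides `path`
    pd.DataFrame(content, columns=['speaker', 'q&a', 'content']).to_csv(path / 'content.csv', index=False)
    pd.DataFrame(participants, columns=['type', 'name']).to_csv(path / 'participants.csv', index=False)
    pd.Series(meta).to_csv(path / 'earnings.csv')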

Installation/Docker issue

I've downloaded Docker for windows but have encountered a couple issues when attempting to follow the steps here: https://github.com/PacktPublishing/Machine-Learning-for-Algorithmic-Trading-Second-Edition/blob/44c03418255a196c74b698c7d8a1cb82d5c7fa5f/installation/README.md.

It throws the following error:
docker: invalid reference format.

any feedback is much appreciated!

Cannot reproduce 01_parse_itch_order_flow_messages.ipynb

Cannot reproduce 01_parse_itch_order_flow_messages.ipynb: numerous errors. Some I could resolve by changing variable names, but not in the cell described as
"The following code processes the binary file and produces the parsed orders stored by message type:"

I receive the error:
error Traceback (most recent call last)
in <module>
17 # read & store message
18 record = data.read(message_size - 1)
---> 19 message = message_fields[message_type]._make(unpack(fstring[message_type], record))
20 messages[message_type].append(message)
21

error: bad char in struct format
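
A way to narrow this down: validate every format string before unpacking, since struct rejects stray characters that can sneak in when the message spec is assembled. A sketch, where fstring is the message-type-to-format-string dict built earlier in the notebook:

from struct import calcsize, error

def check_formats(fstring):
    """Report any struct format strings that unpack() would reject."""
    for mtype, fmt in fstring.items():
        try:
            calcsize(fmt)
        except error as e:
            print(f'message type {mtype}: bad format {fmt!r} ({e})')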

Misclassification DataFrame is badly composed in LDA notebook

The week 7 practice notebook 04_lda_with_sklearn.ipynb contains a badly composed pandas DataFrame. It groups the test set predictions by ground-truth topic, which reorders the rows. However, it leaves the headings and articles in their original, ungrouped order as it concatenates them into the DataFrame. The rows are thus jumbled chaotically.

test_assignments = test_opt_eval.groupby(level='topic').idxmax(axis=1)
test_assignments = test_assignments.reset_index(-1, drop=True).to_frame('predicted').reset_index()
test_assignments['heading'] = test_docs.heading.values
test_assignments['article'] = test_docs.article.values
test_assignments.head(6)

# Output:
#      topic predicted  heading                           article
# 0 Business Topic 4    Kilroy launches 'Veritas' party   Ex-BBC chat show host and East Midlands MEP R...
# 1 Business Topic 4    Radcliffe eyes hard line on drugs Paula Radcliffe has called for all athletes f...
# 2 Business Topic 4    S Korean consumers spending again South Korea looks set to sustain its revival ...
# 3 Business Topic 4    Quiksilver moves for Rossignol    Shares of Skis Rossignol, the world's largest...
# 4 Business Topic 4    Britons fed up with net service   A survey conducted by PC Pro Magazine has rev...
# 5 Business Topic 4    Scissor Sisters triumph at Brits  US band Scissor Sisters led the winners at th.

According to the code's output, the ground truth classification for all 6 articles is "Business." However, the classifications are in fact "Politics," "Sport," "Business," "Business," "Tech," and "Entertainment."

Here's code that fixes the problem:

test_assignments = pd.DataFrame(test_eval.idxmax(axis=1), columns=["prediction"])
test_assignments['heading'] = test_docs.heading.values
test_assignments['article'] = test_docs.article.values
test_assignments.head()

# Output
#    topic   prediction heading                           article
# Politics   Topic 5    Kilroy launches 'Veritas' party   Ex-BBC chat show host and East Midlands MEP R...
# Sport      Topic 3    Radcliffe eyes hard line on drugs Paula Radcliffe has called for all athletes f...
# Business   Topic 2    S Korean consumers spending again South Korea looks set to sustain its revival ...
# Business   Topic 2    Quiksilver moves for Rossignol    Shares of Skis Rossignol, the world's largest...
# Tech       Topic 4    Britons fed up with net service   A survey conducted by PC Pro Magazine has rev...

You have my permission to use my code and analysis; I would only request the courtesy of being credited for any contribution I make toward your final product. (Thanks!)

Getting UnsortedIndexError when trying to read/filter assets.h5 file in 04_alpha_factor_research/00_data/feature_engineering notebook

Hi:

I created the assets.h5 file with data/create_datasets.ipynb and it looks fine:

$ h5ls -f assets.h5 
/fred                    Group
/quandl                  Group
/sp500                   Group
/us_equities             Group

However, 04_alpha_factor_research/00_data/feature_engineering.ipynb throws the following error when it tries to read and filter the prices data set:

DATA_STORE = '../../data/assets.h5'
with pd.HDFStore(DATA_STORE) as store:
    prices = store['quandl/wiki/prices'].loc[idx['2000':'2018', :], 'adj_close'].unstack('ticker')
    stocks = store['us_equities/stocks'].loc[:, ['marketcap', 'ipoyear', 'sector']]
[...]
UnsortedIndexError: 'MultiIndex slicing requires the index to be lexsorted: slicing on levels [0], lexsort depth 0'

The prices data seems fine -- it just appears to be the filtering which is breaking.

I am still fairly new to Pandas, but I got the filter to work by explicitly creating a date range:

prices = store['quandl/wiki/prices'].loc[ pd.date_range(start='1/1/2000', end='12/31/2018'), idx['adj_close'] ].unstack('ticker')

Thanks!
Jeffrey
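
For what it's worth, the canonical remedy for UnsortedIndexError is to lexsort the MultiIndex before slicing rather than avoiding the slice; a sketch against the same store (untested here):

import pandas as pd

idx = pd.IndexSlice
with pd.HDFStore(DATA_STORE) as store:  # DATA_STORE as defined above
    prices = (store['quandl/wiki/prices']
              .sort_index()  # lexsort the MultiIndex so label-based slicing works
              .loc[idx['2000':'2018', :], 'adj_close']
              .unstack('ticker'))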

Installation

When I run docker run -it -v $(pwd):/home/packt/ml4t -p 8888:8888 -e QUANDL_API_KEY=myapi*- --name ml4t appliedai/packt:latest bash

from the command line, after installing Docker Desktop and allocating 4 GB of RAM, I get this:

docker: error during connect: Post https://192.168.99.100:2376/v1.40/containers/create?name=ml4t: dial tcp 192.168.99.100:2376: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
See 'docker run --help'.

Any help to get me on the right track would be greatly appreciated; so far I've been trying to run the code in PyCharm or Jupyter notebooks with a bit of trouble.

zipline backtest with_pf_optimization error

Chapter 5 - 05_strategy_evaluation:
02_backtest_with_pf_optimization.ipynb throws the error "Cannot compare tz-naive and tz-aware timestamps".

backtest = run_algorithm(start=start,
                         end=end,
                         initialize=initialize,
                         before_trading_start=before_trading_start,
                         bundle='quandl',
                         capital_base=capital_base)

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in <module>
4 before_trading_start=before_trading_start,
5 bundle='quandl',
----> 6 capital_base=capital_base)
7
8 # backtest = run_algorithm(start=start,

C:\ProgramData\Anaconda3\envs\env_zipline\lib\site-packages\zipline\utils\run_algo.py in run_algorithm(start, end, initialize, capital_base, handle_data, before_trading_start, analyze, data_frequency, data, bundle, bundle_timestamp, trading_calendar, metrics_set, default_extension, extensions, strict_extensions, environ, blotter)
428 local_namespace=False,
429 environ=environ,
--> 430 blotter=blotter,
431 )

C:\ProgramData\Anaconda3\envs\env_zipline\lib\site-packages\zipline\utils\run_algo.py in _run(handle_data, initialize, before_trading_start, analyze, algofile, algotext, defines, data_frequency, capital_base, data, bundle, bundle_timestamp, start, end, output, trading_calendar, print_algo, metrics_set, local_namespace, environ, blotter)
212 capital_base=capital_base,
213 data_frequency=data_frequency,
--> 214 trading_calendar=trading_calendar,
215 ),
216 metrics_set=metrics_set,

C:\ProgramData\Anaconda3\envs\env_zipline\lib\site-packages\zipline\utils\factory.py in create_simulation_parameters(year, start, end, capital_base, num_days, data_frequency, emission_rate, trading_calendar)
60 data_frequency=data_frequency,
61 emission_rate=emission_rate,
---> 62 trading_calendar=trading_calendar,
63 )
64

C:\ProgramData\Anaconda3\envs\env_zipline\lib\site-packages\zipline\finance\trading.py in __init__(self, start_session, end_session, trading_calendar, capital_base, emission_rate, data_frequency, arena)
146 assert start_session <= end_session, \
147 "Period start falls after period end."
--> 148 assert start_session <= trading_calendar.last_trading_session, \
149 "Period start falls after the last known trading day."
150 assert end_session >= trading_calendar.first_trading_session, \

pandas/_libs/tslib.pyx in pandas._libs.tslib._Timestamp.__richcmp__()

pandas/_libs/tslib.pyx in pandas._libs.tslib._Timestamp._assert_tzawareness_compat()

TypeError: Cannot compare tz-naive and tz-aware timestamps
Additional details:
Env - Windows 10
Python - python=3.5.6
Pandas - pandas=0.22.0
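
This error usually means the start/end passed to run_algorithm are tz-naive while zipline's trading calendar is tz-aware (UTC). Localizing the boundaries may resolve it; a sketch with placeholder dates:

import pandas as pd

# tz-aware session boundaries; the specific dates are placeholders
start = pd.Timestamp('2015-01-01', tz='UTC')
end = pd.Timestamp('2017-12-31', tz='UTC')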

.numpy() in RL

Is there source code for this particular function?
Just giving us an environment.yml without explaining how the code works doesn't seem like a good way to learn and generalize...
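
For context, .numpy() is not defined in this repo; it is the standard TensorFlow 2 method that copies an eager tensor into a NumPy array:

import tensorflow as tf

t = tf.constant([1.0, 2.0])
arr = t.numpy()  # EagerTensor -> numpy.ndarray copy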

Docker Installation

HI there,

I'm attempting to do a fresh install of the ml4t setup within Docker Desktop on Windows 10 Pro.
I'm having trouble with the docker run -it -v line you provided, even after downloading the repository and switching into the folder with "cd", whether the unzipped machine-learning-for-trading/ folder or down to env/linux/ where the .yaml file for zipline etc. lives.

Each time I run docker run -it -v $(pwd):/home/packt/ml4t -p 8888:8888 -e QUANDL_API_KEY= --name ml4t appliedai/packt:latest bash

I get back "the system cannot find the file specified".

I have also linked the folder within Docker > Resources.

When I run the Docker "getting started" example in my cmd terminal, I can pull Docker's getting-started repo (or whatever it is called) no problem, but no matter what I try I just get "the system cannot find the file specified" when trying to set up.

I've just wiped my whole machine to start fresh for this. I was thinking of setting up with Linux Mint instead, since I saw someone else set up on Linux without problems, but Docker Desktop isn't for Linux, so I went with Windows since I had already upgraded to Pro for it; still, I'm facing this issue either way and am not sure what to do. I also tried Anaconda last night, but every time I set up and activated zipline_env I hit a giant list of missing dependencies, so I figured I clearly need to set up the image in the Docker environment to make it work; now I just can't get the file located to do so.

I'm super excited to get this set up, though.

I greatly appreciate your help with this and the insights you're sharing through the book. I was also wondering if you have an ebook or audio copy so I could listen and consume the information in multiple ways; I just don't have money to buy the ebook on Packt right now, or I would. I could see myself sliding in another 15-20 hours just listening to the book on top of what I'm already trying to consume. I just thought I'd ask so I can use my time as efficiently as possible; this is my sole task/goal right now.

Thanks again, have a good one.
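
One possible cause (an assumption about the setup): $(pwd) only expands under bash and similar shells; a Windows cmd terminal passes it through literally, which makes Docker fail to resolve the volume path. Substituting the current directory explicitly may help, e.g. in cmd:

docker run -it -v "%cd%":/home/packt/ml4t -p 8888:8888 -e QUANDL_API_KEY=<yourkey> --name ml4t appliedai/packt:latest bash

(in PowerShell, ${PWD} works in place of %cd%).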

get_data.py script missing in repo?

At location 2646 "Useful pandas and NumPy methods" in the book it states "The notebook uses data generated by the get_data.py script in the data folder in the root directory of the GitHub repo and stored in HDF5 format for faster access".

I can't find the get_data.py script in this repo; is it missing? I assume it generates the assets.h5 file used in many of the other chapters' notebooks.

05_strategy_evaluation - Mean Reversion Strategy

Hi Stefan, I am trying to test the mean reversion strategy you provide in chapter 05_strategy_evaluation, 01_backtest_with_trades.ipynb, with my own custom bundle of Indian equities. It doesn't create any output for
returns, positions, transactions = extract_rets_pos_txn_from_zipline(backtest)

Any help is appreciated. The code is below:

# imports assumed from the notebook (MONTH, YEAR, N_LONGS, N_SHORTS,
# VOL_SCREEN, start, end, and capital_base are defined in earlier cells)
import pandas as pd
from zipline import run_algorithm
from zipline.api import (attach_pipeline, pipeline_output, record,
                         schedule_function, date_rules, time_rules,
                         order_target_percent, get_open_orders, symbol,
                         set_benchmark, set_commission, set_slippage)
from zipline.finance import commission, slippage
from zipline.pipeline import Pipeline, CustomFactor
from zipline.pipeline.factors import Returns, AverageDollarVolume

class MeanReversion(CustomFactor):
    """Compute ratio of latest monthly return to 12m average,
       normalized by std dev of monthly returns"""
    inputs = [Returns(window_length=MONTH)]
    window_length = YEAR

    def compute(self, today, assets, out, monthly_returns):
        df = pd.DataFrame(monthly_returns)
        out[:] = df.iloc[-1].sub(df.mean()).div(df.std())

def compute_factors():
    """Create factor pipeline incl. mean reversion,
        filtered by 30d Dollar Volume; capture factor ranks"""
    mean_reversion = MeanReversion()
    dollar_volume = AverageDollarVolume(window_length=30)
    return Pipeline(columns={'longs'  : mean_reversion.bottom(N_LONGS),
                             'shorts' : mean_reversion.top(N_SHORTS),
                             'ranking': mean_reversion.rank(ascending=False)},
                    screen=dollar_volume.top(VOL_SCREEN))

def rebalance(context, data):
    """Compute long, short and obsolete holdings; place trade orders"""
    factor_data = context.factor_data
    assets = factor_data.index

    longs = assets[factor_data.longs]
    shorts = assets[factor_data.shorts]
    divest = context.portfolio.positions.keys() - longs.union(shorts)

    exec_trades(data, assets=divest, target_percent=0)
    exec_trades(data, assets=longs, target_percent=1 / N_LONGS if N_LONGS else 0)
    exec_trades(data, assets=shorts, target_percent=-1 / N_SHORTS if N_SHORTS else 0)

def exec_trades(data, assets, target_percent):
    """Place orders for assets using target portfolio percentage"""
    for asset in assets:
        if data.can_trade(asset) and not get_open_orders(asset):
            order_target_percent(asset, target_percent)

def before_trading_start(context, data):
    """Run factor pipeline"""
    context.factor_data = pipeline_output('factor_pipeline')
    record(factor_data=context.factor_data.ranking)
    assets = context.factor_data.index
    record(prices=data.current(assets, 'price'))

def initialize(context):
    """Setup: register pipeline, schedule rebalancing,
        and set trading params"""
    set_benchmark(symbol('INFY'))
    attach_pipeline(compute_factors(), 'factor_pipeline')
    schedule_function(rebalance,
                      date_rules.week_start(),
                      time_rules.market_open(),)

    set_commission(us_equities=commission.PerShare(cost=0.00075, min_trade_cost=.01))
    set_slippage(us_equities=slippage.VolumeShareSlippage(volume_limit=0.0025, price_impact=0.01))

backtest = run_algorithm(start=start,
                         end=end,
                         initialize=initialize,
                         before_trading_start=before_trading_start,
                         capital_base=capital_base,
                         data_frequency = 'daily', 
                         bundle= 'nse_data')

Cannot access Jupyter Notebook from Docker container.

Thank you for sharing this awesome work.

I am using Windows and following the Docker workflow; currently I am stuck at step 7, opening the Jupyter notebook.

(base) packt@1154ebfb6453:~/ml4t$ conda activate ml4t

(ml4t) packt@1154ebfb6453:~/ml4t$ jupyter notebook --ip 0.0.0.0 --no-browser --allow-root
[I 09:37:37.174 NotebookApp] [nb_conda_kernels] enabled, 4 kernels found
[I 09:37:37.184 NotebookApp] Writing notebook server cookie secret to
/home/packt/.local/share/jupyter/runtime/notebook_cookie_secret
[I 09:37:39.115 NotebookApp] JupyterLab extension loaded from /opt/conda/envs/ml4t/lib/python3.7/site-packages/jupyterlab
[I 09:37:39.115 NotebookApp] JupyterLab application directory is /opt/conda/envs/ml4t/share/jupyter/lab
[I 09:37:39.118 NotebookApp] Serving notebooks from local directory: /home/packt/ml4t
[I 09:37:39.118 NotebookApp] The Jupyter Notebook is running at:
[I 09:37:39.118 NotebookApp] http://1154ebfb6453:8888/?token=ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f
[I 09:37:39.118 NotebookApp] or http://127.0.0.1:8888/?token=ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f
[I 09:37:39.118 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 09:37:39.122 NotebookApp]

To access the notebook, open this file in a browser:
    file:///home/packt/.local/share/jupyter/runtime/nbserver-296-open.html
Or copy and paste one of these URLs:
    http://1154ebfb6453:8888/?token=ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f
 or http://127.0.0.1:8888/?token=ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f

I copy and paste the URL http://127.0.0.1:8888/?token=ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f into the browser to open Jupyter notebook, but am prompted with an authentication page as follows:

(screenshot of the Jupyter authentication page)

I have tried the following but still fail to access the Jupyter notebook:

  • entering the authentication token ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f

  • clearing authentication by running jupyter notebook --ip 0.0.0.0 --no-browser --allow-root --NotebookApp.token=''

Thank you for your help.

link might not be correct

I cannot run https://github.com/stefan-jansen/machine-learning-for-trading/blob/master/data/create_datasets.ipynb

I get a KeyError: 'code' in the section where we download the Wiki Prices metadata.

It seems the link (https://www.quandl.com/databases/WIKIP/documentation) in section 3.1 also points to the prices data set, while it should point to a metadata set. The referenced data set does not contain the required 'code' and 'name' columns.

Has the metadata been moved?

docker install in GCP

Hi,
I'm trying to set up the environment on GCP and AWS because my laptop is underpowered.
I succeeded with docker run, but when it comes to zipline ingest, it works on AWS while on GCP I get the error below:

PermissionError: [Errno 13] Permission denied: '/home/packt/ml4t/data/.zipline'

Is there a solution?

KR

zipline data ingest for backtesting

Hi, I tried running the zipline backtest in Chapter 12, and following the notebook cell by cell I came across the following error:

"ValueError: Failed to find any assets with country_code 'US' that traded between 2016-01-13 00:00:00+00:00 and 2016-01-21 00:00:00+00:00.
This probably means that your asset db is old or that it has incorrect country/exchange metadata."

This happens for any start_date I try; the first week always raises this ValueError.

I also followed the book's instructions on ingesting the Quandl data.
Any suggestions?
Thanks in advance!

Also, for Chapter 11, how is the 'stooq' bundle used for backtesting ingested?

ResolvePackageNotFound: error (window os)

Hello. I recently bought the book (first edition) and am reading it well. Thank you.

I'm trying to practice the code. Using conda env create -f environment.yml, I was trying to install the environment, but the following error occurred.

What should I do? (I am a Windows user.)

ResolvePackageNotFound:

binutils_impl_linux-64=2.28.1
gxx_impl_linux-64=7.2.0
gxx_linux-64=7.2.0
libgcc-ng=8.2.0
libstdcxx-ng=8.2.0
readline=7.0
gcc_linux-64=7.2.0
gmp=6.1.2
libuuid=1.0.3
gstreamer=1.14.0
graphviz=2.40.1
dbus=1.13.2
binutils_linux-64=7.2.0
expat=2.2.6
libgfortran-ng=7.3.0
gcc_impl_linux-64=7.2.0
ncurses=6.1
gst-plugins-base=1.14.0
libedit=3.1.20170329

feature_engineering error

Within the feature_engineering notebook I keep getting the error below when trying to slice with pd.IndexSlice.
I'm not sure what it means or how to overcome this issue.

TypeError: unhashable type: 'slice'
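
For reference, this TypeError commonly appears when a pd.IndexSlice expression is passed to plain df[...] (which hashes the key for column lookup) instead of df.loc[...] on a lexsorted MultiIndex. A minimal self-contained sketch of working usage:

import pandas as pd

idx = pd.IndexSlice
df = pd.DataFrame({'adj_close': [1., 2., 3., 4.]},
                  index=pd.MultiIndex.from_product(
                      [pd.to_datetime(['2000-01-03', '2000-01-04']), ['AAPL', 'MSFT']],
                      names=['date', 'ticker']))
# slice through .loc on a sorted index; df[idx[...]] raises "unhashable type: 'slice'"
subset = df.sort_index().loc[idx['2000':'2018', :], 'adj_close']
print(subset)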

could you please share the notebook for pgportfolio

I've been working on deconstructing PGPortfolio, which you point to in the Chapter 22 reinforcement learning README. I wandered off into the Packt library, came across a new RL book released September 2020, and listened to it fairly quickly with a Chrome extension, which gave me strong inspiration for a trader. I wrote out my first hypothesis: that reinforcement learning will let me learn the strategy the agent uses, so I could at least implement it for my own strategy and stop losing money. On October 1st I wasted the last of my October funds on poor signals, which gave me ultra motivation to commit to my fullest capability.

Anyway, I know it's not your code, but I was really hoping you might be kind enough to share the code you used with PGPortfolio to get the four-fold returns, so I could play with it further and try to deconstruct it into my own system. It would really help to have it working from the start. I've been taking notes on their system for the majority of the week to understand the architecture; I've only tried to implement the code a couple of times and haven't figured it out yet. I was really hoping for a quick consult to get me on the right track, so I could see what code you ran in what order and then pull PGPortfolio up in a Jupyter notebook and dissect it.

Thanks so much once again.

Regarding the Russian translation

Hi, Stefan.

I'm working on a Russian translation of your book for a Saint Petersburg publisher and so far have found some omissions in the text of Chapter 17 on DL (formulas) and Chapter 18 on CNNs (picture #1 for filters).
Can you indicate where to find the amended versions of the chapters, or advise on what to do?

By the way, I found your book very enlightening, an excellent add-on to Advances in Financial Machine Learning by Marcos López de Prado.

B.R.,
Andrey Logunov

Issues with datetime in 02_rebuild_nasdaq_order_book.ipynb

I've got an issue when converting timestamps to integers, so I changed:
buy_per_min.timestamp = buy_per_min.timestamp.add(utc_offset).values.astype(int)
and the same for sell_per_min and trades_per_min.
But now I receive the error
OSError: [Errno 22] Invalid argument
in the last plot, in the line
xticks = [datetime.fromtimestamp(ts / 1e9).strftime('%H:%M') for ts in ax.get_xticks()]
as all timestamps are negative.
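
A guess at the cause (untested): on Windows, NumPy maps astype(int) to 32-bit integers, which overflows nanosecond timestamps into negative values; forcing 64 bits avoids the overflow:

# force 64-bit integers so nanosecond timestamps don't overflow on Windows
buy_per_min.timestamp = buy_per_min.timestamp.add(utc_offset).values.astype('int64')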

--> 145 return registry.make(id, **kwargs)

Hi, I was wondering if you could point me in the right direction; I'm getting this error. I thought it might have something to do with loading the Quandl data, but that seemed fine when I loaded it in another notebook. I thought I had it all working, then my computer went to sleep overnight and it would not work anymore. I am in the ml4t-dl conda environment using Docker, inside JupyterLab. Last night I first had to reinstall the TensorFlow estimator because of an error it was throwing, and this is the next one; I can't get past it. My Quandl API key works well with other notebooks. I thought you may be more familiar with this code than I am. Thank you kindly.

INFO:trading_env:trading_env logger started.
INFO:trading_env:loading data for AAPL...
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-10-f07228b2136b> in <module>
----> 1 trading_environment = gym.make('trading-v0')
      2 trading_environment.env.trading_days = trading_days
      3 trading_environment.env.trading_cost_bps = 1e-3
      4 trading_environment.env.time_cost_bps = 1e-4
      5 trading_environment.env.ticker = 'AAPL'

/opt/conda/envs/ml4t-dl/lib/python3.7/site-packages/gym/envs/registration.py in make(id, **kwargs)
    143 
    144 def make(id, **kwargs):
--> 145     return registry.make(id, **kwargs)
    146 
    147 def spec(id):

/opt/conda/envs/ml4t-dl/lib/python3.7/site-packages/gym/envs/registration.py in make(self, path, **kwargs)
     88             logger.info('Making new env: %s', path)
     89         spec = self.spec(path)
---> 90         env = spec.make(**kwargs)
     91         # We used to have people override _reset/_step rather than
     92         # reset/step. Set _gym_disable_underscore_compat = True on

/opt/conda/envs/ml4t-dl/lib/python3.7/site-packages/gym/envs/registration.py in make(self, **kwargs)
     58         else:
     59             cls = load(self.entry_point)
---> 60             env = cls(**_kwargs)
     61 
     62         # Make the environment aware of which spec it came from.

~/ml4t/22_deep_reinforcement_learning/trading_env.py in __init__(self, trading_days, trading_cost_bps, time_cost_bps, ticker)
    234         self.time_cost_bps = time_cost_bps
    235         self.data_source = DataSource(trading_days=self.trading_days,
--> 236                                       ticker=ticker)
    237         self.simulator = TradingSimulator(steps=self.trading_days,
    238                                           trading_cost_bps=self.trading_cost_bps,

~/ml4t/22_deep_reinforcement_learning/trading_env.py in __init__(self, trading_days, ticker, normalize)
     62         self.trading_days = trading_days
     63         self.normalize = normalize
---> 64         self.data = self.load_data()
     65         self.preprocess_data()
     66         self.min_values = self.data.min()

~/ml4t/22_deep_reinforcement_learning/trading_env.py in load_data(self)
     73         idx = pd.IndexSlice
     74         with pd.HDFStore('../data/assets.h5') as store:
---> 75             df = (store['quandl/wiki/prices']
     76                   .loc[idx[:, self.ticker],
     77                        ['adj_close', 'adj_volume', 'adj_low', 'adj_high']]

/opt/conda/envs/ml4t-dl/lib/python3.7/site-packages/pandas/io/pytables.py in __getitem__(self, key)
    551 
    552     def __getitem__(self, key: str):
--> 553         return self.get(key)
    554 
    555     def __setitem__(self, key: str, value):

/opt/conda/envs/ml4t-dl/lib/python3.7/site-packages/pandas/io/pytables.py in get(self, key)
    744         group = self.get_node(key)
    745         if group is None:
--> 746             raise KeyError(f"No object named {key} in the file")
    747         return self._read_group(group)
    748 

KeyError: 'No object named quandl/wiki/prices in the file'
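
The last frame states the problem plainly: assets.h5 contains no quandl/wiki/prices node, so the environment's DataSource cannot load AAPL. A quick check (the store path mirrors the traceback):

import pandas as pd

with pd.HDFStore('../data/assets.h5') as store:
    print(store.keys())  # should include '/quandl/wiki/prices'; if not, rerun data/create_datasets.ipynb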

searching alpha - feature engineering

Hi Stefan,
I have gone through the algo (lagged returns, etc.), but at the end I did not understand how you identified the alpha factor. I'm basically missing the conclusion.

Thanks, Angelo

Train Agent - TypeError: 'tensorflow.python.framework.ops.EagerTensor' object does not support item assignment

Hi, I managed to get the data to re-download correctly, but I've run into the error below in the block of code under the "Train Agent" section:

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in <module>
13 0.0 if done else 1.0)
14 if ddqn.train:
---> 15 ddqn.experience_replay()
16 if done:
17 break

in experience_replay(self)
107
108 q_values = self.online_network.predict_on_batch(states)
--> 109 q_values[[self.idx, actions]] = targets
110
111 loss = self.online_network.train_on_batch(x=states, y=q_values)

TypeError: 'tensorflow.python.framework.ops.EagerTensor' object does not support item assignment

I appreciate your guidance greatly; thanks again.
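
In TF2 eager mode, predict_on_batch returns an EagerTensor, which does not support item assignment. Converting to a NumPy array first should work; a sketch of the relevant lines inside experience_replay (the tuple index replaces the older list form, which newer NumPy deprecates):

# make the predictions a mutable NumPy array before assigning the targets
q_values = self.online_network.predict_on_batch(states).numpy()
q_values[self.idx, actions] = targets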

create_message_spec.py

Hello Stefan,

I bought your book on Packt and am trying to do the exercises, but I am a bit confused: where is the file create_message_spec.py? I didn't understand what to do.

Chapter 2 - 01_build_itch_order_book.ipynb

Create_stooq_data.ipynb error storing data

I was running create_stooq_data.ipynb

load some Japanese and all US assets for 2000-2019

markets = {'jp': ['tse stocks'],
           'us': ['nasdaq etfs', 'nasdaq stocks', 'nyse etfs', 'nyse stocks', 'nysemkt stocks']}
frequency = 'daily'

idx = pd.IndexSlice
for market, asset_classes in markets.items():
    for asset_class in asset_classes:
        print(f'\n{asset_class}')
        prices, tickers = get_stooq_prices_and_tickers(frequency=frequency,
                                                       market=market,
                                                       asset_class=asset_class)

        prices = prices.sort_index().loc[idx[:, '2000': '2019'], :]
        names = prices.index.names
        prices = (prices
                  .reset_index()
                  .drop_duplicates()
                  .set_index(names)
                  .sort_index())

        print('\nNo. of observations per asset')
        print(prices.groupby('ticker').size().describe())
        key = f'stooq/{market}/{asset_class.replace(" ", "/")}/'

        print(prices.info(null_counts=True))

        prices.to_hdf(DATA_STORE, key + 'prices', format='t')

        print(tickers.info())
        tickers.to_hdf(DATA_STORE, key + 'tickers', format='t')

I got this error:

tse stocks
stooq/data/daily/jp/tse stocks

ValueError Traceback (most recent call last)
in <module>
9 for asset_class in asset_classes:
10 print(f'\n{asset_class}')
---> 11 prices, tickers = get_stooq_prices_and_tickers(frequency=frequency,market=market,asset_class=asset_class)
12
13 prices = prices.sort_index().loc[idx[:, '2000': '2019'], :]

in get_stooq_prices_and_tickers(frequency, market, asset_class)
38 file.unlink()
39
---> 40 prices = (pd.concat(prices, ignore_index=True)
41 .rename(columns=str.lower)
42 .set_index(['ticker', date_label])

/opt/conda/envs/ml4t/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
279 verify_integrity=verify_integrity,
280 copy=copy,
--> 281 sort=sort,
282 )
283

/opt/conda/envs/ml4t/lib/python3.7/site-packages/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
327
328 if len(objs) == 0:
--> 329 raise ValueError("No objects to concatenate")
330
331 if keys is None:

ValueError: No objects to concatenate

Any help is appreciated.
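
The traceback shows the download loop produced an empty list for 'tse stocks', so pd.concat has nothing to combine; this typically means the Stooq archive was not downloaded or unpacked where the notebook expects it. A defensive guard to add inside get_stooq_prices_and_tickers, just before the pd.concat call (a sketch):

if not prices:  # no files parsed for this market/asset class
    raise ValueError(f'no price files found for {market}/{asset_class}; '
                     'check that the Stooq download was unpacked as expected')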

2nd edition publishing date

Hi Stefan,
What are the 2nd edition publishing and availability dates? Is it the end of this month? Eagerly waiting for it.
Thanks

Windows .yml file

Stefan,

Is it fair to say that readers of the book using Windows should focus on the Docker installation method? I tried creating a conda environment on a Windows machine using the Linux .yml file and got the below:

ResolvePackageNotFound:

libxkbcommon=0.10.0
libgcc-ng=9.2.0
gmp=6.2.0
gxx_impl_linux-64=7.5.0
libedit=3.1.20191231
gxx_linux-64=7.5.0
libgfortran-ng=7.5.0
ld_impl_linux-64=2.34
dbus=1.13.14
graphviz=2.42.3
nspr=4.25
libuuid=2.32.1
libxgboost=1.0.2
libtool=2.4.6
readline=8.0
gcc_linux-64=7.5.0
_openmp_mutex=4.5
nss=3.47
gst-plugins-base=1.14.5
gcc_impl_linux-64=7.5.0
ta-lib-base=0.4.0
binutils_linux-64=2.34
py-xgboost=1.0.2
libgomp=9.2.0
ncurses=6.2
libstdcxx-ng=9.2.0
gstreamer=1.14.5
xgboost=1.0.2
binutils_impl_linux-64=2.34
As you mentioned in an earlier comment, I deleted these from the Linux file and tried again to create the conda environment, but it appeared to result in far too many conflicts. When do you expect to have a tested Windows .yml file (you mentioned this was something you were looking into), or should I just proceed with the Docker installation approach?

Thanks for your help - I'm looking forward to working through the book!

Issue in downloading stooq data

Hello Stefan,
I am getting the following error in create_stooq_data.ipynb while trying to download US assets from Stooq. Please note I am on Windows 10. I appreciate your help.
Thanks
Sabir

tse stocks
stooq/data/daily/jp/tse stocks
500
1000
1500
2000
2500
3000
3500

No. of observations per asset
count    3672.000000
mean     2804.119281
std      1176.615453
min         1.000000
25%      2146.000000
50%      3041.000000
75%      3621.000000
max      4905.000000
dtype: float64
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 10296726 entries, (1301.JP, 2005-03-22 00:00:00) to (9997.JP, 2019-12-30 00:00:00)
Data columns (total 5 columns):
open      10296726 non-null float64
high      10296726 non-null float64
low       10296726 non-null float64
close     10296726 non-null float64
volume    10296726 non-null int64
dtypes: float64(4), int64(1)
memory usage: 432.1+ MB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3719 entries, 0 to 3718
Data columns (total 2 columns):
ticker    3719 non-null object
name      3719 non-null object
dtypes: object(2)
memory usage: 58.2+ KB
None

nasdaq etfs
stooq/data/daily/us/nasdaq etfs
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-0e99a64c5acf> in <module>
     11         prices, tickers = get_stooq_prices_and_tickers(frequency=frequency, 
     12                                                        market=market,
---> 13                                                        asset_class=asset_class)
     14 
     15         prices = prices.sort_index().loc[idx[:, '2000': '2019'], :]

<ipython-input-21-65e60f1965c0> in get_stooq_prices_and_tickers(frequency, market, asset_class)
     39 
     40 #     print(prices)
---> 41     prices = (pd.concat(prices, ignore_index=True)
     42               .rename(columns=str.lower)
     43               .set_index(['ticker', date_label])

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    253         verify_integrity=verify_integrity,
    254         copy=copy,
--> 255         sort=sort,
    256     )
    257 

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    302 
    303         if len(objs) == 0:
--> 304             raise ValueError("No objects to concatenate")
    305 
    306         if keys is None:

ValueError: No objects to concatenate

storing predictions code

The code in the analyzing-cross-validation sections of the random forest and boosting chapters is very misleading and unclear. Why are there different labels for the stored metrics and the evaluated metrics in the HDF file? Why are different HDF files used when running CV/storing and when evaluating?

It seems like the author ran the cross-validation code snippet somewhere else and pasted it into the repo, resulting in different file and metric names?

01_latent_semantic_indexing - bug

When running this code, I keep getting the error below.

docs = pd.DataFrame(doc_list, columns=['Category', 'Heading', 'Article'])
docs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2226 entries, 0 to 2225
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Category  2226 non-null   object
 1   Heading   2226 non-null   object
 2   Article   2226 non-null   object
dtypes: object(3)
memory usage: 52.3+ KB
train_docs, test_docs = train_test_split(docs,
                                         stratify=docs.Category,
                                         test_size=50,
                                         random_state=42)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-7ce1db248c17> in <module>
----> 1 train_docs, test_docs = train_test_split(docs,
      2                                          stratify=docs.Category,
      3                                          test_size=50,
      4                                          random_state=42)

~\miniconda3\envs\torch\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays, **options)
   2150                      random_state=random_state)
   2151 
-> 2152         train, test = next(cv.split(X=arrays[0], y=stratify))
   2153 
   2154     return list(chain.from_iterable((_safe_indexing(a, train),

~\miniconda3\envs\torch\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
   1339         """
   1340         X, y, groups = indexable(X, y, groups)
-> 1341         for train, test in self._iter_indices(X, y, groups):
   1342             yield train, test
   1343 

~\miniconda3\envs\torch\lib\site-packages\sklearn\model_selection\_split.py in _iter_indices(self, X, y, groups)
   1666         class_counts = np.bincount(y_indices)
   1667         if np.min(class_counts) < 2:
-> 1668             raise ValueError("The least populated class in y has only 1"
   1669                              " member, which is too few. The minimum"
   1670                              " number of groups for any class cannot"

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
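
The stratified split needs at least two documents per class; the error suggests the corpus was only partially read, leaving one category with a single article. Checking the class counts narrows it down quickly:

# every category needs >= 2 members for a stratified split;
# a count of 1 (or an unexpected category) points to an incomplete doc_list
print(docs.Category.value_counts())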

Installation \Environments\ no Windows path

Stefan - congrats on the new book. I am trying to replicate the environments you have provided, but the directory has installations for Linux only. Zipline, on the other hand, is very fiddly. I will attempt the Docker images but would have preferred to work on my local computer. Thanks for your custom and insights.

issue when setting up new environment using yml file provided

Hello
I have tried to create an environment using the environment.yml file provided, but I get the following error:
Solving environment: failed
ResolvePackageNotFound:
gcc_linux-64=7.2.0
binutils_impl_linux-64=2.28.1
gxx_linux-64=7.2.0
gst-plugins-base=1.14.0
gstreamer=1.14.0
gmp=6.1.2
pango=1.42.4
dbus=1.13.2
gcc_impl_linux-64=7.2.0
binutils_linux-64=7.2.0
gxx_impl_linux-64=7.2.0
ncurses=6.1
libgcc-ng=8.2.0
libstdcxx-ng=8.2.0
libuuid=1.0.3
readline=7.0
expat=2.2.6
fribidi=1.0.5
libgfortran-ng=7.3.0
graphviz=2.40.1
libedit=3.1.20170329
Is there any way I can fix this? Thanks for the help.

Env setup

Hi Stefan,
Eagerly waiting for the 2nd edition of your book. Meanwhile, I am working with the 1st edition and your GitHub repo.
I am using a Windows 10 environment.
I set up env_zipline as per the environment.yml under
machine-learning-for-trading/02_market_and_fundamental_data/03_data_providers/05_zipline/
I received errors for the following packages and removed them from environment.yml. It works with no issues:

  - libgfortran-ng=7.3.0
  - libgfortran=3.0.0
  - ncurses=6.2
  - readline=7.0
  - gst-plugins-base=1.14.5
  - libstdcxx-ng=9.2.0
  - libgomp=9.2.0
  - gstreamer=1.14.5
  - dbus=1.13.12
  - requests=2.14.2
  - libgcc-ng=9.2.0
  - libuuid=2.32.1
  - glib=2.63.1
  - _openmp_mutex=4.5
  - libedit=3.1.20181209

However, there is another environment.yml under
machine-learning-for-trading/05_strategy_evaluation/02_risk_metrics_pyfolio/ to set up the backtesting env. When I ran this on my Windows system, it threw a big list of packages not found (listed below). Appreciate your help.


  - markupsafe==1.0=py35h14c3975_1
  - lz4-c==1.8.1.2=h14c3975_0
  - gcc_linux-64==7.3.0=h553295d_7
  - theano==1.0.2=py35h6bb024c_0
  - tk==8.6.8=hbc83047_0
  - nbformat==4.4.0=py35h12e6e07_0
  - h5py==2.8.0=py35h989c5e5_3
  - libgcc-ng==9.1.0=hdf63c60_0
  - traitlets==4.3.2=py35ha522a97_0
  - dbus==1.13.6=h746ee38_0
  - mkl_random==1.0.1=py35h4414c95_1
  - tornado==5.1.1=py35h7b6447c_0
  - cyordereddict==1.0.0=py35h470a237_2
  - glib==2.56.2=hd408876_0
  - ncurses==6.1=he6710b0_1
  - bzip2==1.0.6=h14c3975_5
  - hdf5==1.10.2=hba1933b_1
  - sqlite==3.28.0=h7b6447c_0
  - gst-plugins-base==1.14.0=hbbd80ab_1
  - libffi==3.2.1=hd88cf55_4
  - pcre==8.43=he6710b0_0
  - ptyprocess==0.6.0=py35_0
  - libgpuarray==0.7.6=h14c3975_0
  - numexpr==2.6.8=py35hd89afb7_0
  - zeromq==4.2.5=hf484d3e_1
  - bottleneck==1.2.1=py35h035aef0_1
  - python==3.5.6=hc3d631a_0
  - snappy==1.1.7=hbae5bb6_3
  - mistune==0.8.3=py35h14c3975_1
  - lxml==4.2.5=py35hefd8a0e_0
  - gcc_impl_linux-64==7.3.0=habb00fd_1
  - numpy==1.14.6=py35h3b04361_4
  - scikit-learn==0.20.0=py35h4989274_1
  - binutils_impl_linux-64==2.31.1=h6176602_1
  - libsodium==1.0.16=h1bed415_0
  - jpeg==9b=h024ee3a_2
  - jupyter_console==5.2.0=py35h4044a63_1
  - sqlalchemy==1.2.11=py35h7b6447c_0
  - wcwidth==0.1.7=py35hcd08066_0
  - kiwisolver==1.0.1=py35hf484d3e_0
  - zstd==1.3.7=h0b5b093_0
  - libpng==1.6.37=hbc83047_0
  - pytables==3.4.4=py35ha205bf6_0
  - libxslt==1.1.33=h7d1a2b0_0
  - cffi==1.11.5=py35he75722e_1
  - libuuid==1.0.3=h1bed415_2
  - cython==0.28.5=py35hf484d3e_0
  - mkl_fft==1.0.6=py35h7dd41cf_0
  - jsonschema==2.6.0=py35h4395190_0
  - icu==58.2=h9c2bf20_1
  - xz==5.2.4=h14c3975_4
  - sip==4.19.8=py35hf484d3e_0
  - gstreamer==1.14.0=hb453b48_1
  - readline==7.0=h7b6447c_5
  - qt==5.9.6=h8703b6f_2
  - fontconfig==2.13.0=h9420a91_0
  - cryptography==2.3.1=py35hc365091_0
  - pandas==0.22.0=py35hf484d3e_0
  - blosc==1.16.3=hd408876_0
  - statsmodels==0.9.0=py35h3010b51_0
  - freetype==2.9.1=h8a8886c_1
  - matplotlib==3.0.0=py35h5429711_0
  - scipy==1.1.0=py35hd20e5f9_0
  - expat==2.2.6=he6710b0_0
  - numpy-base==1.14.6=py35h81de0dd_4
  - pygpu==0.7.6=py35h3010b51_0
  - gxx_impl_linux-64==7.3.0=hdf63c60_1
  - gmp==6.1.2=h6c8ec71_1
  - mkl-service==1.1.2=py35h90e4bf4_5
  - binutils_linux-64==2.31.1=h6176602_7
  - libstdcxx-ng==9.1.0=hdf63c60_0
  - libgfortran-ng==7.3.0=hdf63c60_0
  - ipython_genutils==0.2.0=py35hc9e07d0_0
  - contextlib2==0.5.5=py35h6690dba_0
  - cycler==0.10.0=py35hc4d5149_0
  - libxcb==1.13=h1bed415_1
  - intel-openmp==2019.4=243
  - openssl==1.0.2s=h7b6447c_0
  - zlib==1.2.11=h7b6447c_3
  - pyzmq==17.1.2=py35h14c3975_0
  - pyqt==5.9.2=py35h05f1152_2
  - prompt_toolkit==1.0.15=py35hc09de7a_0
  - pickleshare==0.7.4=py35hd57304d_0
  - click==6.7=py35h353a69f_0
  - libxml2==2.9.9=hea5a465_1
  - lzo==2.10=h49e0be7_2
  - gxx_linux-64==7.3.0=h553295d_7
  - testpath==0.3.1=py35had42eaf_0
  - libedit==3.1.20181209=hc058e9b_0
  - wrapt==1.10.11=py35h14c3975_2

Issue with 20_autoencoders_for_conditional_risk_factors/06_conditional_autoencoder_for_asset_pricing_model.ipynb

Hi Stefan, first I want to thank you for the nice webinar aired on YouTube a few days back. However, I ran into the error below (list index out of range); see the two attached screenshots (2020-10-13). You will notice in the second screenshot that the error comes from the lines or dates, one of which is out of line with the other between training and testing.
Your input is highly appreciated.
Best regards

07_linear_models - Ridge and Lasso regression runs forever

Hi Stefan,
Ridge and Lasso regression take a lot of time to execute, with a big list of warnings.
Ridge regression: Wall time: 25min 26s
Lasso: my laptop stopped responding after almost 1 hour and I had to reboot it.

My Env:
Windows 10
System Type: x64-based PC
Processor(s): 1 Processor(s) Installed.
[01]: Intel64 Family 6 Model 78 Stepping 3 GenuineIntel ~2601 Mhz
Python: Python 3.6.10
Virtual Env: ml4trading
Warnings:

C:\ProgramData\Anaconda3\envs\ml4trading\lib\site-packages\sklearn\pipeline.py:331: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.
  Xt = transform.transform(Xt)
C:\ProgramData\Anaconda3\envs\ml4trading\lib\site-packages\sklearn\preprocessing\data.py:625: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.
  return self.partial_fit(X, y)
C:\ProgramData\Anaconda3\envs\ml4trading\lib\site-packages\sklearn\base.py:465: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.
  return self.fit(X, y, **fit_params).transform(X)
C:\ProgramData\Anaconda3\envs\ml4trading\lib\site-packages\sklearn\linear_model\coordinate_descent.py:492: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
  ConvergenceWarning)
C:\ProgramData\Anaconda3\envs\ml4trading\lib\site-packages\sklearn\pipeline.py:331: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.
  Xt = transform.transform(Xt)
......


Error running NASDAQ File

AttributeError Traceback (most recent call last)
in <module>
1 file_name = may_be_download(FTP_URL + SOURCE_FILE)
----> 2 date = file_name.split('.')[0]

AttributeError: 'PosixPath' object has no attribute 'split'

NameError Traceback (most recent call last)
in <module>
----> 1 message_labels = (df.loc[:, ['message_type', 'notes']]
2 .dropna()
3 .rename(columns={'notes': 'name'}))
4 message_labels.name = (message_labels.name
5 .str.lower()

NameError: name 'df' is not defined
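
may_be_download evidently returns a pathlib.Path now, which has no .split method; taking the file name first restores the old behavior, and the NameError that follows is just fallout from that first cell failing. A sketch:

# Path objects don't support str.split; use the file name (a str) instead
date = file_name.name.split('.')[0]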

LDA model converges poorly due to bad hyperparameters

In the notebook lda_with_sklearn, the LDA model predicts only 3 topics for documents in 5 classes.

By performing hyperparameter optimization with sklearn.model_selection.GridSearchCV, I was able to determine the following:

  • Root cause is badly chosen min_df and max_df parameters to the TfidfVectorizer instance.
  • Setting max_df = 0.11 and min_df = 0.026 produces excellent results.

My code and analysis are in my public Github repo here. You have my permission to use my code and analysis; I would only request the courtesy of being credited for any contribution I make toward your final product. (Thanks!)

Ch11_06_alphalens_signals_quality

Hi,

In Ch11, Notebook 06_alphalens_signals_quality.

test_tickers = best_predictions.index.get_level_values('ticker').unique()
trade_prices = get_trade_prices(test_tickers)
trade_prices.info()

The variable best_predictions is not defined before its usage. Please advise. Thanks!
