stefan-jansen / machine-learning-for-trading
Code for Machine Learning for Algorithmic Trading, 2nd edition.
Home Page: https://ml4trading.io
My system is Windows. When I run "conda env create -f environment_linux.yml", it fails with "ResolvePackageNotFound: - arrow-cpp 0.11.1 py36h5c3f529_1". Can this code be used on Windows? Thanks.
The file data/create_datasets.ipynb has the following line:
df = pd.read_csv('us_equities_meta_data.csv')
However, this csv file is not present. I can't seem to find it anywhere in the distribution. Can you tell me where to locate it?
When I install via Docker on Windows 10 and run zipline ingest, it seems permission to mkdir in /home/packt/ml4t is denied.
How can I add permission to this folder?
(ml4t-zipline) packt@6463f5cb63ce:~$ zipline ingest
Traceback (most recent call last):
File "/opt/conda/envs/ml4t-zipline/bin/zipline", line 8, in
sys.exit(main())
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/click/core.py", line 829, in call
return self.main(*args, **kwargs)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/click/core.py", line 1256, in invoke
Command.invoke(self, ctx)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/zipline/main.py", line 60, in main
os.environ,
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/zipline/utils/run_algo.py", line 249, in load_extensions
pth.ensure_file(default_extension_path)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/zipline/utils/paths.py", line 58, in ensure_file
ensure_directory_containing(path)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/zipline/utils/paths.py", line 45, in ensure_directory_containing
ensure_directory(os.path.dirname(path))
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/site-packages/zipline/utils/paths.py", line 30, in ensure_directory
os.makedirs(path)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/os.py", line 231, in makedirs
makedirs(head, mode, exist_ok)
File "/opt/conda/envs/ml4t-zipline/lib/python3.5/os.py", line 241, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/packt/ml4t/data'
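A possible workaround, assuming the bind-mounted directory is owned by root on the host side so the packt user cannot create /home/packt/ml4t/data (my guess at the cause, not a confirmed fix from the book):

```shell
# Open a one-off root shell in the existing container (named 'ml4t' as in the
# book's docker run command) and hand ownership of the mounted folder to packt.
docker exec -it -u root ml4t chown -R packt:packt /home/packt/ml4t
```

After that, re-running zipline ingest as the packt user should be able to create the data directory.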
Good morning,
I'm trying to run the Spyder files for the OpenTable example. I have a Mac with Catalina and I have a problem with geckodriver; it seems there is a bug:
https://firefox-source-docs.mozilla.org/testing/geckodriver/Notarization.html
mozilla/geckodriver#1629
I have installed geckodriver with Homebrew, but it doesn't work anyway.
I receive this Spyder console error: WebDriverException: 'geckodriver' executable needs to be in PATH.
Please help me resolve this issue.
Ty, Ale
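As a diagnostic sketch (not from the book): the helpers below check whether geckodriver is visible on the PATH Selenium searches, and clear the macOS quarantine attribute that Catalina puts on unnotarized binaries per the Mozilla notarization doc linked above. The driver path argument is whatever your local install location is.

```python
import shutil
import subprocess


def on_path(executable):
    """Return True if `executable` resolves via PATH (what Selenium checks)."""
    return shutil.which(executable) is not None


def clear_quarantine(driver_path):
    """Remove the macOS quarantine flag Catalina sets on downloaded binaries."""
    subprocess.run(['xattr', '-r', '-d', 'com.apple.quarantine', driver_path],
                   check=False)


print(on_path('geckodriver'))
```

If on_path returns False, either add the geckodriver directory to PATH or pass its full path to webdriver.Firefox.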
Thank you for your kind reply.
I found the environment_windows.yml in the first_edition branch, installed it successfully, and am using it without problems.
I have one additional question: where can I find the zipline installation files for Windows in a Python 3.5 environment to use Chapters 4 and 5? In the chapter folder, I could only find a file for Linux. Do you provide an installation file for Windows?
Hi @silent0506,
thank you for your interest in the book. You can just delete these packages from the environment.yml file. However, the first_edition branch also contains a file tailored to Windows; you may want to use that instead. The second edition was released a few weeks ago and contains a lot of additional material; I would highly recommend you review the notebooks in this repo as well, you may find them quite useful.
Originally posted by @silent0506 in #36 (comment)
Good morning,
I can run sa_selenium.py in Spyder, and the code correctly scrapes the Seeking Alpha website, but I can't save the results to CSV files.
I have written my folder path in rows 23, 24, and 25, modified row 32 to use 'html.parser', added the geckodriver path in row 89, and modified rows 114 and 115.
I think I'm wrong with the way the CSV paths or files are declared; please help me solve this problem.
Ty, Ale
import re
from pathlib import Path
from random import random
from time import sleep
from urllib.parse import urljoin
import pandas as pd
from bs4 import BeautifulSoup
from furl import furl
from selenium import webdriver
transcript_path = Path('transcripts')
def store_result(meta, participants, content):
    """Save parsed content to csv"""
    path = transcript_path / 'parsed' / meta['symbol']
    if not path.exists():
        path.mkdir(parents=True, exist_ok=True)
    pd.DataFrame(content, columns=['speaker', 'q&a', 'content']).to_csv(path / '/Users/alessiomontani/Documents/0_Python/CODE/Machine-Learning-for-Algorithmic-Trading-Second-Edition-master/03_alternative_data/02_earnings_calls copy/content.csv', index=False)
    pd.DataFrame(participants, columns=['type', 'name']).to_csv(path / '/Users/alessiomontani/Documents/0_Python/CODE/Machine-Learning-for-Algorithmic-Trading-Second-Edition-master/03_alternative_data/02_earnings_calls copy/participants.csv', index=False)
    pd.Series(meta).to_csv(path / '/Users/alessiomontani/Documents/0_Python/CODE/Machine-Learning-for-Algorithmic-Trading-Second-Edition-master/03_alternative_data/02_earnings_calls copy/earnings.csv')
def parse_html(html):
    """Main html parser function"""
    date_pattern = re.compile(r'(\d{2})-(\d{2})-(\d{2})')
    quarter_pattern = re.compile(r'(\bQ\d\b)')
    soup = BeautifulSoup(html, 'html.parser')
    meta, participants, content = {}, [], []
    h1 = soup.find('h1', itemprop='headline')
    if h1 is None:
        return
    h1 = h1.text
    meta['company'] = h1[:h1.find('(')].strip()
    meta['symbol'] = h1[h1.find('(') + 1:h1.find(')')]
    title = soup.find('div', class_='title')
    if title is None:
        return
    title = title.text
    print(title)
    match = date_pattern.search(title)
    if match:
        m, d, y = match.groups()
        meta['month'] = int(m)
        meta['day'] = int(d)
        meta['year'] = int(y)
    match = quarter_pattern.search(title)
    if match:
        meta['quarter'] = match.group(0)
    qa = 0
    speaker_types = ['Executives', 'Analysts']
    for header in [p.parent for p in soup.find_all('strong')]:
        text = header.text.strip()
        if text.lower().startswith('copyright'):
            continue
        elif text.lower().startswith('question-and'):
            qa = 1
            continue
        elif any([type in text for type in speaker_types]):
            for participant in header.find_next_siblings('p'):
                if participant.find('strong'):
                    break
                else:
                    participants.append([text, participant.text])
        else:
            p = []
            for participant in header.find_next_siblings('p'):
                if participant.find('strong'):
                    break
                else:
                    p.append(participant.text)
            content.append([header.text, qa, '\n'.join(p)])
    return meta, participants, content
SA_URL = 'https://seekingalpha.com/'
TRANSCRIPT = re.compile('Earnings Call Transcript')
next_page = True
page = 1
driver = webdriver.Firefox(executable_path='/Users/alessiomontani/Documents/0_Python/CODE/Machine-Learning-for-Algorithmic-Trading-Second-Edition-master/03_alternative_data/02_earnings_calls copy/geckodriver')
while next_page:
    print(f'Page: {page}')
    url = f'{SA_URL}/earnings/earnings-call-transcripts/{page}'
    driver.get(urljoin(SA_URL, url))
    sleep(8 + (random() - .5) * 2)
    response = driver.page_source
    page += 1
    soup = BeautifulSoup(response, 'html.parser')
    links = soup.find_all(name='a', string=TRANSCRIPT)
    if len(links) == 0:
        next_page = False
    else:
        for link in links:
            transcript_url = link.attrs.get('href')
            article_url = furl(urljoin(SA_URL, transcript_url)).add({'part': 'single'})
            driver.get(article_url.url)
            html = driver.page_source
            result = parse_html(html)
            if result is not None:
                meta, participants, content = result
                meta['link'] = link
                store_result(meta, participants, content)
            sleep(8 + (random() - .5) * 2)
driver.close()
earnings = pd.read_csv('earnings.csv')
#print(earnings)
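A likely culprit in the store_result code above: pathlib silently discards everything to the left when an absolute string is joined onto a Path, so expressions like path / '/Users/...' ignore the per-symbol directory entirely. A minimal sketch (the /tmp path is just an illustration):

```python
from pathlib import Path

base = Path('transcripts') / 'parsed' / 'AAPL'

# Joining an absolute string discards the left-hand directory entirely:
print(base / '/tmp/content.csv')    # -> /tmp/content.csv on POSIX
# Joining a bare filename keeps the per-symbol directory:
print(base / 'content.csv')         # -> transcripts/parsed/AAPL/content.csv
```

So in store_result, joining bare filenames such as 'content.csv' onto the mkdir'd path should write the files where intended.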
I've downloaded Docker for Windows but have encountered a couple of issues when attempting to follow the steps here: https://github.com/PacktPublishing/Machine-Learning-for-Algorithmic-Trading-Second-Edition/blob/44c03418255a196c74b698c7d8a1cb82d5c7fa5f/installation/README.md.
In step 2 it is unclear which repo we should be cloning ("git clone ...?"), as I cannot find any "starter repo" on GitHub or Docker Hub. My best guess was to clone the entire https://github.com/PacktPublishing/Machine-Learning-for-Algorithmic-Trading-Second-Edition repo.
In step 4, when I attempt to run:
docker run -it -v $(pwd):/home/packt/ml4t -p 8888:8888 -e QUANDL_API_KEY=XXXXXX --name ml4t appliedai/packt:latest bash
it throws the following error:
docker: invalid reference format.
Any feedback is much appreciated!
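"docker: invalid reference format" usually means the shell did not expand $(pwd) (or the expanded path contains spaces), so Docker parses part of the -v argument as the image name. Hedged variants for the Windows shells, reusing the image and paths from the command above:

```shell
# PowerShell: ${PWD} expands; quote it in case the path contains spaces
docker run -it -v "${PWD}:/home/packt/ml4t" -p 8888:8888 -e QUANDL_API_KEY=XXXXXX --name ml4t appliedai/packt:latest bash

# cmd.exe: use %cd% instead of $(pwd)
docker run -it -v "%cd%:/home/packt/ml4t" -p 8888:8888 -e QUANDL_API_KEY=XXXXXX --name ml4t appliedai/packt:latest bash
```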
I cannot reproduce 01_parse_itch_order_flow_messages.ipynb. There are numerous errors; some I could resolve by changing variable names, but not in the cell described as "The following code processes the binary file and produces the parsed orders stored by message type".
I receive the error:
error Traceback (most recent call last)
in
17 # read & store message
18 record = data.read(message_size - 1)
---> 19 message = message_fields[message_type]._make(unpack(fstring[message_type], record))
20 messages[message_type].append(message)
21
error: bad char in struct format
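"bad char in struct format" comes from the struct module itself: the format string handed to unpack contains a character struct does not recognize, for instance a byte-order marker anywhere but the first position. My assumption is that the notebook's per-message fstring values pick up such a character when they are assembled from the message spec. A self-contained illustration:

```python
from struct import calcsize, error, unpack

# A well-formed big-endian format: two unsigned shorts and a 6-byte string
fmt = '>HH6s'
print(calcsize(fmt))                            # 10 bytes
print(unpack(fmt, b'\x00\x01\x00\x02ABCDEF'))   # (1, 2, b'ABCDEF')

# A byte-order character after the first position -- e.g. from naively
# concatenating per-field specs that each start with '>' -- is a "bad char":
try:
    calcsize('>HH>6s')
except error as e:
    print(e)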
The week 7 practice notebook 04_lda_with_sklearn.ipynb builds a pandas DataFrame incorrectly. It groups the test-set predictions by ground-truth topic, which reorders the rows; however, it concatenates the headings and articles into the DataFrame in their original, ungrouped order. The rows are thus jumbled.
test_assignments = test_opt_eval.groupby(level='topic').idxmax(axis=1)
test_assignments = test_assignments.reset_index(-1, drop=True).to_frame('predicted').reset_index()
test_assignments['heading'] = test_docs.heading.values
test_assignments['article'] = test_docs.article.values
test_assignments.head(6)
# Output:
# topic predicted heading article
# 0 Business Topic 4 Kilroy launches 'Veritas' party Ex-BBC chat show host and East Midlands MEP R...
# 1 Business Topic 4 Radcliffe eyes hard line on drugs Paula Radcliffe has called for all athletes f...
# 2 Business Topic 4 S Korean consumers spending again South Korea looks set to sustain its revival ...
# 3 Business Topic 4 Quiksilver moves for Rossignol Shares of Skis Rossignol, the world's largest...
# 4 Business Topic 4 Britons fed up with net service A survey conducted by PC Pro Magazine has rev...
# 5 Business Topic 4 Scissor Sisters triumph at Brits US band Scissor Sisters led the winners at th.
According to the code's output, the ground truth classification for all 6 articles is "Business." However, the classifications are in fact "Politics," "Sport," "Business," "Business," "Tech," and "Entertainment."
Here's code that fixes the problem:
test_assignments = pd.DataFrame(test_eval.idxmax(axis=1), columns=["prediction"])
test_assignments['heading'] = test_docs.heading.values
test_assignments['article'] = test_docs.article.values
test_assignments.head()
# Output
# topic prediction heading article
# Politics Topic 5 Kilroy launches 'Veritas' party Ex-BBC chat show host and East Midlands MEP R...
# Sport Topic 3 Radcliffe eyes hard line on drugs Paula Radcliffe has called for all athletes f...
# Business Topic 2 S Korean consumers spending again South Korea looks set to sustain its revival ...
# Business Topic 2 Quiksilver moves for Rossignol Shares of Skis Rossignol, the world's largest...
# Tech Topic 4 Britons fed up with net service A survey conducted by PC Pro Magazine has rev...
You have my permission to use my code and analysis; I would only request the courtesy of being credited for any contribution I make toward your final product. (Thanks!)
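The reordering described in the report is easy to reproduce: pandas groupby sorts by group key, so positional assignment of `.values` from another frame no longer lines up, while applying idxmax directly preserves row order. A toy sketch (labels and probabilities invented for illustration):

```python
import pandas as pd

probs = pd.DataFrame(
    [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]],
    index=pd.Index(['Sport', 'Business', 'Tech'], name='topic'),
    columns=['Topic 1', 'Topic 2'])

# groupby sorts by key, so the result's rows no longer line up with the
# original positional order used by `.values` assignment:
print(probs.groupby(level='topic').mean().index.tolist())
# ['Business', 'Sport', 'Tech']

# idxmax applied directly preserves the original row order:
print(probs.idxmax(axis=1).tolist())
# ['Topic 2', 'Topic 1', 'Topic 2']
```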
Hi:
I created the assets.h5
file with data/create_datasets.ipynb
and it looks fine:
$ h5ls -f assets.h5
/fred Group
/quandl Group
/sp500 Group
/us_equities Group
However, 04_alpha_factor_research/00_data/feature_engineering.ipynb
throws the following error when it tries to read and filter the prices data set:
DATA_STORE = '../../data/assets.h5'
with pd.HDFStore(DATA_STORE) as store:
    prices = store['quandl/wiki/prices'].loc[idx['2000':'2018', :], 'adj_close'].unstack('ticker')
    stocks = store['us_equities/stocks'].loc[:, ['marketcap', 'ipoyear', 'sector']]
[...]
UnsortedIndexError: 'MultiIndex slicing requires the index to be lexsorted: slicing on levels [0], lexsort depth 0'
The prices data seems fine -- it just appears to be the filtering which is breaking.
I am still fairly new to Pandas, but I got the filter to work by explicitly creating a date range:
prices = store['quandl/wiki/prices'].loc[ pd.date_range(start='1/1/2000', end='12/31/2018'), idx['adj_close'] ].unstack('ticker')
Thanks!
Jeffrey
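Another workaround that keeps the original slicer is to lexsort the MultiIndex before slicing, e.g. store['quandl/wiki/prices'].sort_index(). A toy reproduction of the error and the fix (data invented):

```python
import pandas as pd

idx = pd.IndexSlice
dates = pd.to_datetime(['2001-01-02', '2000-01-03'])   # deliberately unsorted
mi = pd.MultiIndex.from_product([dates, ['AAPL', 'MSFT']],
                                names=['date', 'ticker'])
df = pd.DataFrame({'adj_close': [1., 2., 3., 4.]}, index=mi)

# Slicing level 0 of an unsorted MultiIndex raises UnsortedIndexError:
try:
    df.loc[idx['2000':'2000', :], 'adj_close']
except pd.errors.UnsortedIndexError as e:
    print(e)

# Sorting the index first lets the original lexsorted slice work:
print(df.sort_index().loc[idx['2000':'2000', :], 'adj_close'])
```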
When I run docker run -it -v $(pwd):/home/packt/ml4t -p 8888:8888 -e QUANDL_API_KEY=myapi*- --name ml4t appliedai/packt:latest bash
from the command line, after installing Docker Desktop and allocating 4 GB of RAM, I get this:
docker: error during connect: Post https://192.168.99.100:2376/v1.40/containers/create?name=ml4t: dial tcp 192.168.99.100:2376: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
See 'docker run --help'.
Any help to get me on the right track would be greatly appreciated; so far I've just been trying to run the code in PyCharm or Jupyter notebooks with a bit of trouble.
Chapter 5 - 05_strategy_evaluation
02_backtest_with_pf_optimization.ipynb throws the error "Cannot compare tz-naive and tz-aware timestamps":
backtest = run_algorithm(start=start,
                         end=end,
                         initialize=initialize,
                         before_trading_start=before_trading_start,
                         bundle='quandl',
                         capital_base=capital_base)
`---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
4 before_trading_start=before_trading_start,
5 bundle='quandl',
----> 6 capital_base=capital_base)
7
8 # backtest = run_algorithm(start=start,
C:\ProgramData\Anaconda3\envs\env_zipline\lib\site-packages\zipline\utils\run_algo.py in run_algorithm(start, end, initialize, capital_base, handle_data, before_trading_start, analyze, data_frequency, data, bundle, bundle_timestamp, trading_calendar, metrics_set, default_extension, extensions, strict_extensions, environ, blotter)
428 local_namespace=False,
429 environ=environ,
--> 430 blotter=blotter,
431 )
C:\ProgramData\Anaconda3\envs\env_zipline\lib\site-packages\zipline\utils\run_algo.py in _run(handle_data, initialize, before_trading_start, analyze, algofile, algotext, defines, data_frequency, capital_base, data, bundle, bundle_timestamp, start, end, output, trading_calendar, print_algo, metrics_set, local_namespace, environ, blotter)
212 capital_base=capital_base,
213 data_frequency=data_frequency,
--> 214 trading_calendar=trading_calendar,
215 ),
216 metrics_set=metrics_set,
C:\ProgramData\Anaconda3\envs\env_zipline\lib\site-packages\zipline\utils\factory.py in create_simulation_parameters(year, start, end, capital_base, num_days, data_frequency, emission_rate, trading_calendar)
60 data_frequency=data_frequency,
61 emission_rate=emission_rate,
---> 62 trading_calendar=trading_calendar,
63 )
64
C:\ProgramData\Anaconda3\envs\env_zipline\lib\site-packages\zipline\finance\trading.py in init(self, start_session, end_session, trading_calendar, capital_base, emission_rate, data_frequency, arena)
146 assert start_session <= end_session,
147 "Period start falls after period end."
--> 148 assert start_session <= trading_calendar.last_trading_session,
149 "Period start falls after the last known trading day."
150 assert end_session >= trading_calendar.first_trading_session, \
pandas/_libs/tslib.pyx in pandas._libs.tslib._Timestamp.richcmp()
pandas/_libs/tslib.pyx in pandas._libs.tslib._Timestamp._assert_tzawareness_compat()
TypeError: Cannot compare tz-naive and tz-aware timestamps`
Additional details:
Env - Windows 10
Python - python=3.5.6
Pandas - pandas=0.22.0
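zipline validates start/end against the trading calendar's UTC-localized sessions, so a commonly used workaround (an assumption on my part, not the notebook's code) is to pass tz-aware timestamps for start and end. A minimal reproduction of the comparison error and the fix:

```python
import pandas as pd

naive = pd.Timestamp('2015-01-02')
aware = pd.Timestamp('2016-01-04', tz='UTC')

# Mixing naive and aware timestamps raises the TypeError from the traceback:
try:
    naive < aware
except TypeError as e:
    print(e)

# Localizing the inputs makes the comparison (and the backtest window) valid:
start = pd.Timestamp('2015-01-02', tz='UTC')
end = pd.Timestamp('2017-12-29', tz='UTC')
print(start < end)   # True
```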
Is there source code for this particular function? Just giving us an environment.yml without explaining how the code works doesn't seem like a good way to learn and generalize...
Hi there,
I'm attempting to do a fresh install of the ml4t setup within Docker Desktop on Windows 10 Pro.
I'm having trouble with the docker run -it -v line you provided, even after downloading the directory and switching into the folder with "cd", whether it be the unzipped machine learning for trading/ folder or down to env/linux/ where the .yaml file for zipline etc. is. Each time I run
docker run -it -v $(pwd):/home/packt/ml4t -p 8888:8888 -e QUANDL_API_KEY= --name ml4t appliedai/packt:latest bash
I get back "the system cannot find the file specified". I have also linked the file within Docker > Resources.
When I put the "docker run" line from Docker's getting-started example into my cmd terminal, I can pull their getting-started repo with no problem, but no matter what I try, I just get "the system cannot find the file specified" when trying to set up.
I've just wiped my whole machine to start fresh for this. I was thinking of setting up with Linux Mint, because I saw someone else set up on Linux with no problem, but Docker Desktop isn't for Linux, so I went with Windows since I had already upgraded to Pro for it, and I still face this issue either way. I also tried Anaconda last night, but every time I set up and activated the zipline env, I got hit with a giant list of missing dependencies, so I figured I clearly need to set up the image in the Docker environment to make it work; now I just can't get the file located to do so.
I'm super excited to get this set up, though.
I greatly appreciate your help with this and the insights you're sharing through the book. I was also wondering if you have an ebook or audio copy so I could listen as well and consume the information in multiple ways; I just don't have any money to buy the ebook on Packt right now, or I would. I could see myself sliding in another 15-20 hours just listening to the book on top of what I'm already trying to consume. I thought I'd ask so I can use my time as efficiently as possible; this is my sole task/goal right now.
Thanks again, have a good one.
At location 2646, "Useful pandas and NumPy methods", the book states: "The notebook uses data generated by the get_data.py script in the data folder in the root directory of the GitHub repo and stored in HDF5 format for faster access".
I can't find the get_data.py script in this repo; is it missing? I assume it generates the assets.h5 file used in many of the other chapters' notebooks.
conda env create -f environment_mac_osx.yml --force
Collecting package metadata: done
Solving environment: done
Preparing transaction: done
Verifying transaction: failed
RemoveError: 'requests' is a dependency of conda and cannot be removed from
conda's operating environment.
Hi Stefan, I am trying to test the mean reversion strategy you provide in chapter 05_strategy_evaluation, 01_backtest_with_trades.ipynb, with my own custom bundle of Indian equities. It doesn't create any output for
returns, positions, transactions = extract_rets_pos_txn_from_zipline(backtest)
Any help is appreciated. The code looks as below:
class MeanReversion(CustomFactor):
    """Compute ratio of latest monthly return to 12m average,
    normalized by std dev of monthly returns"""
    inputs = [Returns(window_length=MONTH)]
    window_length = YEAR

    def compute(self, today, assets, out, monthly_returns):
        df = pd.DataFrame(monthly_returns)
        out[:] = df.iloc[-1].sub(df.mean()).div(df.std())


def compute_factors():
    """Create factor pipeline incl. mean reversion,
    filtered by 30d Dollar Volume; capture factor ranks"""
    mean_reversion = MeanReversion()
    dollar_volume = AverageDollarVolume(window_length=30)
    return Pipeline(columns={'longs': mean_reversion.bottom(N_LONGS),
                             'shorts': mean_reversion.top(N_SHORTS),
                             'ranking': mean_reversion.rank(ascending=False)},
                    screen=dollar_volume.top(VOL_SCREEN))


def rebalance(context, data):
    """Compute long, short and obsolete holdings; place trade orders"""
    factor_data = context.factor_data
    assets = factor_data.index
    longs = assets[factor_data.longs]
    shorts = assets[factor_data.shorts]
    divest = context.portfolio.positions.keys() - longs.union(shorts)
    exec_trades(data, assets=divest, target_percent=0)
    exec_trades(data, assets=longs, target_percent=1 / N_LONGS if N_LONGS else 0)
    exec_trades(data, assets=shorts, target_percent=-1 / N_SHORTS if N_SHORTS else 0)


def exec_trades(data, assets, target_percent):
    """Place orders for assets using target portfolio percentage"""
    for asset in assets:
        if data.can_trade(asset) and not get_open_orders(asset):
            order_target_percent(asset, target_percent)


def before_trading_start(context, data):
    """Run factor pipeline"""
    context.factor_data = pipeline_output('factor_pipeline')
    record(factor_data=context.factor_data.ranking)
    assets = context.factor_data.index
    record(prices=data.current(assets, 'price'))


def initialize(context):
    """Setup: register pipeline, schedule rebalancing,
    and set trading params"""
    set_benchmark(symbol('INFY'))
    attach_pipeline(compute_factors(), 'factor_pipeline')
    schedule_function(rebalance,
                      date_rules.week_start(),
                      time_rules.market_open())
    set_commission(us_equities=commission.PerShare(cost=0.00075, min_trade_cost=.01))
    set_slippage(us_equities=slippage.VolumeShareSlippage(volume_limit=0.0025, price_impact=0.01))


backtest = run_algorithm(start=start,
                         end=end,
                         initialize=initialize,
                         before_trading_start=before_trading_start,
                         capital_base=capital_base,
                         data_frequency='daily',
                         bundle='nse_data')
zipline_example.ipynb is not working; zipline throws the error:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I tried this Stack Overflow solution, but it did not help:
https://stackoverflow.com/questions/56957791/getting-jsondecodeerror-expecting-value-line-1-column-1-char-0-with-python/58661288#58661288
I tested the examples and they work fine on 1 GPU. I was trying to have them work on multiple GPUs (2 to 8).
Any suggestions for training and testing on multiple GPUs?
Thank you for sharing this awesome work.
I am using Windows and following the Docker workflow; currently I am stuck at step 7, opening the Jupyter notebook.
(base) packt@1154ebfb6453:~/ml4t$ conda activate ml4t
(ml4t) packt@1154ebfb6453:~/ml4t$ jupyter notebook --ip 0.0.0.0 --no-browser --allow-root
[I 09:37:37.174 NotebookApp] [nb_conda_kernels] enabled, 4 kernels found
[I 09:37:37.184 NotebookApp] Writing notebook server cookie secret to
/home/packt/.local/share/jupyter/runtime/notebook_cookie_secret
[I 09:37:39.115 NotebookApp] JupyterLab extension loaded from /opt/conda/envs/ml4t/lib/python3.7/site-packages/jupyterlab
[I 09:37:39.115 NotebookApp] JupyterLab application directory is /opt/conda/envs/ml4t/share/jupyter/lab
[I 09:37:39.118 NotebookApp] Serving notebooks from local directory: /home/packt/ml4t
[I 09:37:39.118 NotebookApp] The Jupyter Notebook is running at:
[I 09:37:39.118 NotebookApp] http://1154ebfb6453:8888/?token=ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f
[I 09:37:39.118 NotebookApp] or http://127.0.0.1:8888/?token=ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f
[I 09:37:39.118 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 09:37:39.122 NotebookApp]
To access the notebook, open this file in a browser:
file:///home/packt/.local/share/jupyter/runtime/nbserver-296-open.html
Or copy and paste one of these URLs:
http://1154ebfb6453:8888/?token=ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f
or http://127.0.0.1:8888/?token=ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f
I copy and paste the URL http://127.0.0.1:8888/?token=ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f into the browser to open Jupyter Notebook, but I am redirected to an authentication page.
I have tried the following but still fail to access the notebook:
- entering the authentication token ffbf0a92a3c46f4d3cbbffab66a66d144dc3a23c0a93bb6f
- running jupyter notebook --ip 0.0.0.0 --no-browser --allow-root --NotebookApp.token='' to clear authentication
Thank you for your help.
I cannot run https://github.com/stefan-jansen/machine-learning-for-trading/blob/master/data/create_datasets.ipynb
I get a KeyError: 'code' in the section where we download the Wiki Prices metadata.
It seems that the link (https://www.quandl.com/databases/WIKIP/documentation) in section 3.1 also points to the prices data set, while it should point to a data set with metadata; the referenced data set does not contain the required 'code' and 'name' columns.
Has the metadata been moved?
Hi,
I'm trying to set up the environment on GCP and AWS because my laptop is underpowered.
I succeeded with docker run, but when it comes to zipline ingest, it works on AWS, while on GCP I get the error below:
PermissionError: [Errno 13] Permission denied: '/home/packt/ml4t/data/.zipline'
Is there any solution?
KR
Hi I tried running the zipline backtest in chapter 12, and upon following the notebook code by code, I came across the following error:
"ValueError: Failed to find any assets with country_code 'US' that traded between 2016-01-13 00:00:00+00:00 and 2016-01-21 00:00:00+00:00.
This probably means that your asset db is old or that it has incorrect country/exchange metadata."
This happened for any start_date I tried; the first week always raises this ValueError.
I also followed the book's instruction on ingesting quandl data.
Any suggestions?
Thanks in advance!
Also, for chapter 11, how is the 'stooq' bundle that is used for backtesting ingested?
Hello. I recently bought the book (first edition) and am enjoying it. Thank you.
I'm trying to practice the code. Using 'conda env create -f environment.yml' to install the environment, the following error occurred. What should I do? (I am a Windows user.)
`ResolvePackageNotFound:
binutils_impl_linux-64=2.28.1
gxx_impl_linux-64=7.2.0
gxx_linux-64=7.2.0
libgcc-ng=8.2.0
libstdcxx-ng=8.2.0
readline=7.0
gcc_linux-64=7.2.0
gmp=6.1.2
libuuid=1.0.3
gstreamer=1.14.0
graphviz=2.40.1
dbus=1.13.2
binutils_linux-64=7.2.0
expat=2.2.6
libgfortran-ng=7.3.0
gcc_impl_linux-64=7.2.0
ncurses=6.1
gst-plugins-base=1.14.0
libedit=3.1.20170329`
Within the feature_engineering notebook I keep getting the error below when trying to slice with pd.IndexSlice. I'm not sure what it means or how to overcome this issue.
TypeError: unhashable type: 'slice'
I've been working on deconstructing PGPortfolio, which you point to in the Chapter 22 reinforcement learning README. I wandered off into the Packt library, came across a new RL book released in September 2020, and listened to it fairly quickly with a Chrome extension. It gave me real inspiration for a trader, and I wrote out my first hypothesis: that reinforcement learning will let me learn the strategy the agent uses, so I could at least implement it for my own strategy and stop losing money. On Oct 1st I wasted the last of my October funds on poor signals, which gave me strong motivation to commit to my fullest capability.
Anyway, I know it's not your code, but I was hoping you might be kind enough to share the code you used with PGPortfolio to get the four-fold returns, so I could play with it further and try to deconstruct it into my own system. It would really help to have it working from the start. I've been taking notes on their system for most of the week to understand the architecture; I've only tried to implement the code a couple of times and haven't figured it out yet.
I was really hoping you might help me with a quick consult to get me on the right track, so I could see what code you ran in what order, and then pull PGPortfolio up in a Jupyter notebook and dissect it.
Thanks so much once again.
Hi, Stefan.
I'm working on a Russian translation of your book for a Saint Petersburg publisher and have found some omissions so far in the text of ch. 17 on DL (formulas) and ch. 18 on CNNs (picture #1 for filters).
Can you indicate where to find the amended versions of the chapters, or advise on what to do?
By the way, I found your book very enlightening, an excellent add-on to Advances in Financial Machine Learning by Marcos López de Prado.
B.R.,
Andrey Logunov
I got an issue when converting timestamps to integers, so I changed:
buy_per_min.timestamp = buy_per_min.timestamp.add(utc_offset).values.astype(int)
and the same for sell_per_min and trades_per_min.
But now I receive the error
OSError: [Errno 22] Invalid argument
in the last plot, on the line
xticks = [datetime.fromtimestamp(ts / 1e9).strftime('%H:%M') for ts in ax.get_xticks()]
as all timestamps are negative.
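For context, datetime.fromtimestamp(ts / 1e9) assumes the tick values are positive nanoseconds since the Unix epoch, which is exactly what .astype(int) on a datetime64[ns] column yields; negative ticks therefore suggest the axis values are no longer epoch nanoseconds (for instance, matplotlib supplying its own date units is my guess). A sketch of the round-trip the plotting line assumes (timestamp invented):

```python
from datetime import datetime

import pandas as pd

ts = pd.Timestamp('2019-10-09 09:30')
ns = ts.value                    # datetime64[ns] as integer: ns since epoch
print(ns > 0)                    # positive for any date after 1970
label = datetime.fromtimestamp(ns / 1e9).strftime('%H:%M')  # local-time label
print(label)
```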
Hi, I was wondering if you could point me in the right direction; I'm getting the error below. I thought it might have something to do with loading the Quandl data, but that seemed fine when I loaded it in another notebook. I thought I had it all working, then my computer went to sleep overnight and it would not work anymore. I am in the ml4t-dl conda environment using Docker, inside JupyterLab. Last night I first had to reinstall the TensorFlow estimator because of an error it was throwing, and then this was the next error; I can't get past it. My Quandl API key works well when reloading other notebooks. I thought you may be more familiar with this code than I am. Thank you kindly.
`INFO:trading_env:trading_env` logger started.
INFO:trading_env:loading data for AAPL...
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-10-f07228b2136b> in <module>
----> 1 trading_environment = gym.make('trading-v0')
2 trading_environment.env.trading_days = trading_days
3 trading_environment.env.trading_cost_bps = 1e-3
4 trading_environment.env.time_cost_bps = 1e-4
5 trading_environment.env.ticker = 'AAPL'
/opt/conda/envs/ml4t-dl/lib/python3.7/site-packages/gym/envs/registration.py in make(id, **kwargs)
143
144 def make(id, **kwargs):
--> 145 return registry.make(id, **kwargs)
146
147 def spec(id):
/opt/conda/envs/ml4t-dl/lib/python3.7/site-packages/gym/envs/registration.py in make(self, path, **kwargs)
88 logger.info('Making new env: %s', path)
89 spec = self.spec(path)
---> 90 env = spec.make(**kwargs)
91 # We used to have people override _reset/_step rather than
92 # reset/step. Set _gym_disable_underscore_compat = True on
/opt/conda/envs/ml4t-dl/lib/python3.7/site-packages/gym/envs/registration.py in make(self, **kwargs)
58 else:
59 cls = load(self.entry_point)
---> 60 env = cls(**_kwargs)
61
62 # Make the environment aware of which spec it came from.
~/ml4t/22_deep_reinforcement_learning/trading_env.py in __init__(self, trading_days, trading_cost_bps, time_cost_bps, ticker)
234 self.time_cost_bps = time_cost_bps
235 self.data_source = DataSource(trading_days=self.trading_days,
--> 236 ticker=ticker)
237 self.simulator = TradingSimulator(steps=self.trading_days,
238 trading_cost_bps=self.trading_cost_bps,
~/ml4t/22_deep_reinforcement_learning/trading_env.py in __init__(self, trading_days, ticker, normalize)
62 self.trading_days = trading_days
63 self.normalize = normalize
---> 64 self.data = self.load_data()
65 self.preprocess_data()
66 self.min_values = self.data.min()
~/ml4t/22_deep_reinforcement_learning/trading_env.py in load_data(self)
73 idx = pd.IndexSlice
74 with pd.HDFStore('../data/assets.h5') as store:
---> 75 df = (store['quandl/wiki/prices']
76 .loc[idx[:, self.ticker],
77 ['adj_close', 'adj_volume', 'adj_low', 'adj_high']]
/opt/conda/envs/ml4t-dl/lib/python3.7/site-packages/pandas/io/pytables.py in __getitem__(self, key)
551
552 def __getitem__(self, key: str):
--> 553 return self.get(key)
554
555 def __setitem__(self, key: str, value):
/opt/conda/envs/ml4t-dl/lib/python3.7/site-packages/pandas/io/pytables.py in get(self, key)
744 group = self.get_node(key)
745 if group is None:
--> 746 raise KeyError(f"No object named {key} in the file")
747 return self._read_group(group)
748
KeyError: 'No object named quandl/wiki/prices in the file'
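This KeyError means the HDF5 store exists but does not contain the `quandl/wiki/prices` table, usually because the Quandl Wiki download notebook has not been run. A minimal diagnostic sketch (the `assets.h5` path is taken from the traceback above) that lists what the store actually contains:

```python
import pandas as pd

def list_store_keys(path):
    """Return every object key stored in an HDF5 file."""
    with pd.HDFStore(path, mode='r') as store:
        # store.keys() yields paths like '/quandl/wiki/prices'
        return store.keys()

# Example, using the path from the traceback above:
# list_store_keys('../data/assets.h5')
# should include '/quandl/wiki/prices' once the Quandl download step has run
```

If the key is missing from the output, re-run the data-creation notebook before launching the trading environment.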
Hi Stefan,
I have gone through the algorithm (lagged returns, etc.), but in the end I did not understand how you identified the alpha factor. I'm basically missing the conclusion.
Thanks, Angelo
Hi, I managed to get the data to re-download correctly, but I've run into an error moving forward. In the block of code under the "train agent" section, I receive this error:
`---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
13 0.0 if done else 1.0)
14 if ddqn.train:
---> 15 ddqn.experience_replay()
16 if done:
17 break
in experience_replay(self)
107
108 q_values = self.online_network.predict_on_batch(states)
--> 109 q_values[[self.idx, actions]] = targets
110
111 loss = self.online_network.train_on_batch(x=states, y=q_values)
TypeError: 'tensorflow.python.framework.ops.EagerTensor' object does not support item assignment`
I greatly appreciate your guidance, thanks again.
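In TensorFlow 2, `predict_on_batch` returns an eager tensor, and eager tensors do not support NumPy-style item assignment. The usual workaround is to convert the prediction to a NumPy array first (via the tensor's `.numpy()` method). A minimal sketch with a NumPy stand-in for the predicted Q-values (the shapes and values here are hypothetical):

```python
import numpy as np

# Stand-in for predict_on_batch(states).numpy(); calling .numpy()
# on the eager tensor yields a writable ndarray copy.
q_values = np.arange(12, dtype=np.float32).reshape(4, 3)

idx = np.arange(4)                 # row index per sample in the batch
actions = np.array([0, 2, 1, 0])   # action taken in each row
targets = np.zeros(4, dtype=np.float32)

# Item assignment works now because q_values is a plain ndarray,
# not an EagerTensor.
q_values[idx, actions] = targets
```

In the snippet above, the change amounts to replacing `q_values = self.online_network.predict_on_batch(states)` with the same call followed by `.numpy()`.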
Hello Stefan,
I bought your book on Packt and am trying to do the exercises, but I am a bit confused: where is the file create_message_spec.py? I didn't understand what to do.
Chapter 2 - 01_build_itch_order_book.ipynb
I was running create_stooq_data.ipynb:

markets = {'jp': ['tse stocks'],
           'us': ['nasdaq etfs', 'nasdaq stocks', 'nyse etfs', 'nyse stocks', 'nysemkt stocks']}
frequency = 'daily'
idx = pd.IndexSlice
for market, asset_classes in markets.items():
    for asset_class in asset_classes:
        print(f'\n{asset_class}')
        prices, tickers = get_stooq_prices_and_tickers(frequency=frequency,
                                                       market=market,
                                                       asset_class=asset_class)
        prices = prices.sort_index().loc[idx[:, '2000': '2019'], :]
        names = prices.index.names
        prices = (prices
                  .reset_index()
                  .drop_duplicates()
                  .set_index(names)
                  .sort_index())
        print('\nNo. of observations per asset')
        print(prices.groupby('ticker').size().describe())
        key = f'stooq/{market}/{asset_class.replace(" ", "/")}/'
        print(prices.info(null_counts=True))
        prices.to_hdf(DATA_STORE, key + 'prices', format='t')
        print(tickers.info())
        tickers.to_hdf(DATA_STORE, key + 'tickers', format='t')
I got this error:
ValueError Traceback (most recent call last)
in
9 for asset_class in asset_classes:
10 print(f'\n{asset_class}')
---> 11 prices, tickers = get_stooq_prices_and_tickers(frequency=frequency,market=market,asset_class=asset_class)
12
13 prices = prices.sort_index().loc[idx[:, '2000': '2019'], :]
in get_stooq_prices_and_tickers(frequency, market, asset_class)
38 file.unlink()
39
---> 40 prices = (pd.concat(prices, ignore_index=True)
41 .rename(columns=str.lower)
42 .set_index(['ticker', date_label])
/opt/conda/envs/ml4t/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
279 verify_integrity=verify_integrity,
280 copy=copy,
--> 281 sort=sort,
282 )
283
/opt/conda/envs/ml4t/lib/python3.7/site-packages/pandas/core/reshape/concat.py in init(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
327
328 if len(objs) == 0:
--> 329 raise ValueError("No objects to concatenate")
330
331 if keys is None:
ValueError: No objects to concatenate
Any help is appreciated.
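`pd.concat` raises exactly this error when the list it receives is empty, which here typically means no Stooq files were found at the expected download path. A hedged sketch of a guard (the `frames` list and `source_hint` argument are hypothetical names) that turns the opaque ValueError into a readable failure:

```python
import pandas as pd

def safe_concat(frames, source_hint):
    """Concatenate a list of DataFrames, failing with a readable
    message when no files were actually read."""
    if not frames:
        raise FileNotFoundError(
            f'No data files were read from {source_hint}; '
            'check that the Stooq download was unpacked there.')
    return pd.concat(frames, ignore_index=True)
```

Wrapping the `pd.concat` call inside `get_stooq_prices_and_tickers` this way points directly at the missing-data directory instead of failing inside pandas.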
Hi Stefan,
What is the publishing and availability date for the 2nd edition? Is it the end of this month? I'm eagerly waiting for it.
Thanks
Stefan,
Is it fair to say that readers of the book using windows should focus on the docker installation method? I tried creating a conda environment on a windows machine using the linux .yml file and got the below:
ResolvePackageNotFound:
libxkbcommon=0.10.0
libgcc-ng=9.2.0
gmp=6.2.0
gxx_impl_linux-64=7.5.0
libedit=3.1.20191231
gxx_linux-64=7.5.0
libgfortran-ng=7.5.0
ld_impl_linux-64=2.34
dbus=1.13.14
graphviz=2.42.3
nspr=4.25
libuuid=2.32.1
libxgboost=1.0.2
libtool=2.4.6
readline=8.0
gcc_linux-64=7.5.0
_openmp_mutex=4.5
nss=3.47
gst-plugins-base=1.14.5
gcc_impl_linux-64=7.5.0
ta-lib-base=0.4.0
binutils_linux-64=2.34
py-xgboost=1.0.2
libgomp=9.2.0
ncurses=6.2
libstdcxx-ng=9.2.0
gstreamer=1.14.5
xgboost=1.0.2
binutils_impl_linux-64=2.34
As you mentioned in an earlier comment, I deleted these from the Linux file and tried again to create the conda environment, but it appeared to result in far too many conflicts. When do you expect to have a tested Windows .yml file (you mentioned this was something you were looking into), or should I just proceed with the Docker installation approach?
Thanks for your help, I'm looking forward to working through the book!
Hello Stefan,
I am getting the following error in create_stooq_data.ipynb while trying to download US assets from Stooq. Please note I am on Windows 10. I appreciate your help.
Thanks
Sabir
tse stocks
stooq/data/daily/jp/tse stocks
500
1000
1500
2000
2500
3000
3500
No. of observations per asset
count 3672.000000
mean 2804.119281
std 1176.615453
min 1.000000
25% 2146.000000
50% 3041.000000
75% 3621.000000
max 4905.000000
dtype: float64
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 10296726 entries, (1301.JP, 2005-03-22 00:00:00) to (9997.JP, 2019-12-30 00:00:00)
Data columns (total 5 columns):
open 10296726 non-null float64
high 10296726 non-null float64
low 10296726 non-null float64
close 10296726 non-null float64
volume 10296726 non-null int64
dtypes: float64(4), int64(1)
memory usage: 432.1+ MB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3719 entries, 0 to 3718
Data columns (total 2 columns):
ticker 3719 non-null object
name 3719 non-null object
dtypes: object(2)
memory usage: 58.2+ KB
None
nasdaq etfs
stooq/data/daily/us/nasdaq etfs
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-22-0e99a64c5acf> in <module>
11 prices, tickers = get_stooq_prices_and_tickers(frequency=frequency,
12 market=market,
---> 13 asset_class=asset_class)
14
15 prices = prices.sort_index().loc[idx[:, '2000': '2019'], :]
<ipython-input-21-65e60f1965c0> in get_stooq_prices_and_tickers(frequency, market, asset_class)
39
40 # print(prices)
---> 41 prices = (pd.concat(prices, ignore_index=True)
42 .rename(columns=str.lower)
43 .set_index(['ticker', date_label])
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
253 verify_integrity=verify_integrity,
254 copy=copy,
--> 255 sort=sort,
256 )
257
~\AppData\Roaming\Python\Python37\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy, sort)
302
303 if len(objs) == 0:
--> 304 raise ValueError("No objects to concatenate")
305
306 if keys is None:
ValueError: No objects to concatenate
The code in the cross-validation analysis sections of the random forest and boosting chapters is misleading and unclear. Why do the stored metrics and the evaluated metrics use different labels in the HDF file? Why are different HDF files used when running CV/storing versus evaluating?
It seems like the author ran the cross-validation code snippet somewhere else and pasted it into the repo, resulting in different file and metric names.
When running this code I keep getting this error:
docs = pd.DataFrame(doc_list, columns=['Category', 'Heading', 'Article'])
docs.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2226 entries, 0 to 2225
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Category 2226 non-null object
1 Heading 2226 non-null object
2 Article 2226 non-null object
dtypes: object(3)
memory usage: 52.3+ KB
train_docs, test_docs = train_test_split(docs,
stratify=docs.Category,
test_size=50,
random_state=42)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-11-7ce1db248c17> in <module>
----> 1 train_docs, test_docs = train_test_split(docs,
2 stratify=docs.Category,
3 test_size=50,
4 random_state=42)
~\miniconda3\envs\torch\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays, **options)
2150 random_state=random_state)
2151
-> 2152 train, test = next(cv.split(X=arrays[0], y=stratify))
2153
2154 return list(chain.from_iterable((_safe_indexing(a, train),
~\miniconda3\envs\torch\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
1339 """
1340 X, y, groups = indexable(X, y, groups)
-> 1341 for train, test in self._iter_indices(X, y, groups):
1342 yield train, test
1343
~\miniconda3\envs\torch\lib\site-packages\sklearn\model_selection\_split.py in _iter_indices(self, X, y, groups)
1666 class_counts = np.bincount(y_indices)
1667 if np.min(class_counts) < 2:
-> 1668 raise ValueError("The least populated class in y has only 1"
1669 " member, which is too few. The minimum"
1670 " number of groups for any class cannot"
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
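This ValueError means at least one `Category` value occurs only once in the corpus, so a stratified split cannot place that class on both sides. A minimal sketch (assuming a `Category` column as in the snippet above) that drops classes with fewer than two members before splitting:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def stratified_split(docs, label='Category', test_size=50, seed=42):
    # Keep only classes that appear at least twice, so that
    # stratification can put one sample on each side of the split.
    counts = docs[label].value_counts()
    docs = docs[docs[label].isin(counts[counts >= 2].index)]
    return train_test_split(docs, stratify=docs[label],
                            test_size=test_size, random_state=seed)
```

Note that `docs.info()` above reports 2226 non-null rows in each column, so a singleton class most likely comes from an incomplete or mislabeled download rather than from the split itself; the guard just makes the split robust either way.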
Hi,
I recently purchased your book and am working through the 4th chapter on alpha factor research. The get_data script in the Packt repo has a get_wiki_constituents function. I believe the Quandl link for that function might have changed; is there a new link I can try?
https://www.quandl.com/api/v3/databases/WIKI/codes?api_key=<API_KEY>
The link for the prices file works fine.
Is it intentional, or is some code missing?
https://github.com/stefan-jansen/machine-learning-for-trading/tree/master/20_reinforcement_learning
Stefan, congrats on the new book. I am trying to replicate the environments you have provided, but the directory has installations for Linux only. Zipline, on the other hand, is very fiddly. I will attempt the Docker images but would have preferred to work on my local computer. Thanks for your work and insights.
Hello
I have tried to create an environment using the environment.yml file provided but I get the following error:
Solving environment: failed
ResolvePackageNotFound:
gcc_linux-64=7.2.0
binutils_impl_linux-64=2.28.1
gxx_linux-64=7.2.0
gst-plugins-base=1.14.0
gstreamer=1.14.0
gmp=6.1.2
pango=1.42.4
dbus=1.13.2
gcc_impl_linux-64=7.2.0
binutils_linux-64=7.2.0
gxx_impl_linux-64=7.2.0
ncurses=6.1
libgcc-ng=8.2.0
libstdcxx-ng=8.2.0
libuuid=1.0.3
readline=7.0
expat=2.2.6
fribidi=1.0.5
libgfortran-ng=7.3.0
graphviz=2.40.1
libedit=3.1.20170329
Is there any way I can fix this? Thanks for the help.
Hi Stefan,
Eagerly waiting for the 2nd edition of your book. Meanwhile, I am working with the 1st edition and your GitHub repo.
I am using a Windows 10 environment.
I set up the env_zipline environment as per the environment.yml under
machine-learning-for-trading/02_market_and_fundamental_data/03_data_providers/05_zipline/.
I received errors for the following packages, removed them from environment.yml, and it then worked with no issues:
- libgfortran-ng=7.3.0
- libgfortran=3.0.0
- ncurses=6.2
- readline=7.0
- gst-plugins-base=1.14.5
- libstdcxx-ng=9.2.0
- libgomp=9.2.0
- gstreamer=1.14.5
- dbus=1.13.12
- requests=2.14.2
- libgcc-ng=9.2.0
- libuuid=2.32.1
- glib=2.63.1
- _openmp_mutex=4.5
- libedit=3.1.20181209
However, there is another environment.yml under
machine-learning-for-trading/05_strategy_evaluation/02_risk_metrics_pyfolio/ to set up the backtesting
environment. When I ran it on my Windows system, it threw a long list of packages not found (listed below). I appreciate your help.
` - markupsafe==1.0=py35h14c3975_1
- lz4-c==1.8.1.2=h14c3975_0
- gcc_linux-64==7.3.0=h553295d_7
- theano==1.0.2=py35h6bb024c_0
- tk==8.6.8=hbc83047_0
- nbformat==4.4.0=py35h12e6e07_0
- h5py==2.8.0=py35h989c5e5_3
- libgcc-ng==9.1.0=hdf63c60_0
- traitlets==4.3.2=py35ha522a97_0
- dbus==1.13.6=h746ee38_0
- mkl_random==1.0.1=py35h4414c95_1
- tornado==5.1.1=py35h7b6447c_0
- cyordereddict==1.0.0=py35h470a237_2
- glib==2.56.2=hd408876_0
- ncurses==6.1=he6710b0_1
- bzip2==1.0.6=h14c3975_5
- hdf5==1.10.2=hba1933b_1
- sqlite==3.28.0=h7b6447c_0
- gst-plugins-base==1.14.0=hbbd80ab_1
- libffi==3.2.1=hd88cf55_4
- pcre==8.43=he6710b0_0
- ptyprocess==0.6.0=py35_0
- libgpuarray==0.7.6=h14c3975_0
- numexpr==2.6.8=py35hd89afb7_0
- zeromq==4.2.5=hf484d3e_1
- bottleneck==1.2.1=py35h035aef0_1
- python==3.5.6=hc3d631a_0
- snappy==1.1.7=hbae5bb6_3
- mistune==0.8.3=py35h14c3975_1
- lxml==4.2.5=py35hefd8a0e_0
- gcc_impl_linux-64==7.3.0=habb00fd_1
- numpy==1.14.6=py35h3b04361_4
- scikit-learn==0.20.0=py35h4989274_1
- binutils_impl_linux-64==2.31.1=h6176602_1
- libsodium==1.0.16=h1bed415_0
- jpeg==9b=h024ee3a_2
- jupyter_console==5.2.0=py35h4044a63_1
- sqlalchemy==1.2.11=py35h7b6447c_0
- wcwidth==0.1.7=py35hcd08066_0
- kiwisolver==1.0.1=py35hf484d3e_0
- zstd==1.3.7=h0b5b093_0
- libpng==1.6.37=hbc83047_0
- pytables==3.4.4=py35ha205bf6_0
- libxslt==1.1.33=h7d1a2b0_0
- cffi==1.11.5=py35he75722e_1
- libuuid==1.0.3=h1bed415_2
- cython==0.28.5=py35hf484d3e_0
- mkl_fft==1.0.6=py35h7dd41cf_0
- jsonschema==2.6.0=py35h4395190_0
- icu==58.2=h9c2bf20_1
- xz==5.2.4=h14c3975_4
- sip==4.19.8=py35hf484d3e_0
- gstreamer==1.14.0=hb453b48_1
- readline==7.0=h7b6447c_5
- qt==5.9.6=h8703b6f_2
- fontconfig==2.13.0=h9420a91_0
- cryptography==2.3.1=py35hc365091_0
- pandas==0.22.0=py35hf484d3e_0
- blosc==1.16.3=hd408876_0
- statsmodels==0.9.0=py35h3010b51_0
- freetype==2.9.1=h8a8886c_1
- matplotlib==3.0.0=py35h5429711_0
- scipy==1.1.0=py35hd20e5f9_0
- expat==2.2.6=he6710b0_0
- numpy-base==1.14.6=py35h81de0dd_4
- pygpu==0.7.6=py35h3010b51_0
- gxx_impl_linux-64==7.3.0=hdf63c60_1
- gmp==6.1.2=h6c8ec71_1
- mkl-service==1.1.2=py35h90e4bf4_5
- binutils_linux-64==2.31.1=h6176602_7
- libstdcxx-ng==9.1.0=hdf63c60_0
- libgfortran-ng==7.3.0=hdf63c60_0
- ipython_genutils==0.2.0=py35hc9e07d0_0
- contextlib2==0.5.5=py35h6690dba_0
- cycler==0.10.0=py35hc4d5149_0
- libxcb==1.13=h1bed415_1
- intel-openmp==2019.4=243
- openssl==1.0.2s=h7b6447c_0
- zlib==1.2.11=h7b6447c_3
- pyzmq==17.1.2=py35h14c3975_0
- pyqt==5.9.2=py35h05f1152_2
- prompt_toolkit==1.0.15=py35hc09de7a_0
- pickleshare==0.7.4=py35hd57304d_0
- click==6.7=py35h353a69f_0
- libxml2==2.9.9=hea5a465_1
- lzo==2.10=h49e0be7_2
- gxx_linux-64==7.3.0=h553295d_7
- testpath==0.3.1=py35had42eaf_0
- libedit==3.1.20181209=hc058e9b_0
- wrapt==1.10.11=py35h14c3975_2`
20_autoencoders_for_conditional_risk_factors/06_conditional_autoencoder_for_asset_pricing_model.ipynb
Hi Stefan, I want to thank you first for the nice webinar aired on YouTube a few days back. However, I ran into the error below (list index out of range); here are some screenshots.
You will notice in the second screenshot that the error comes from the lines or dates, one of which is out of line with the other between training and testing.
Your input is highly appreciated.
Best regards
Hi Stefan,
Ridge and Lasso regression take a lot of time to execute, with a long list of warnings.
Ridge regression: wall time 25min 26s
Lasso: my laptop stopped responding after almost 1 hour and I had to reboot it.
My Env:
Windows 10
System Type: x64-based PC
Processor(s): 1 Processor(s) Installed.
[01]: Intel64 Family 6 Model 78 Stepping 3 GenuineIntel ~2601 Mhz
Python: Python 3.6.10
Virtual Env: ml4trading
Warnings:
C:\ProgramData\Anaconda3\envs\ml4trading\lib\site-packages\sklearn\pipeline.py:331: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.
Xt = transform.transform(Xt)
C:\ProgramData\Anaconda3\envs\ml4trading\lib\site-packages\sklearn\preprocessing\data.py:625: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.
return self.partial_fit(X, y)
C:\ProgramData\Anaconda3\envs\ml4trading\lib\site-packages\sklearn\base.py:465: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.
return self.fit(X, y, **fit_params).transform(X)
C:\ProgramData\Anaconda3\envs\ml4trading\lib\site-packages\sklearn\linear_model\coordinate_descent.py:492: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
ConvergenceWarning)
C:\ProgramData\Anaconda3\envs\ml4trading\lib\site-packages\sklearn\pipeline.py:331: DataConversionWarning: Data with input dtype uint8, float64 were all converted to float64 by StandardScaler.
Xt = transform.transform(Xt)
......
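The ConvergenceWarning above suggests two knobs worth trying before reaching for more hardware: raise `max_iter`, loosen `tol`, and avoid extremely small alphas (which the warning itself calls out). A minimal sketch with small, hypothetical data (not the book's dataset):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic stand-in data; the real notebooks fit a much larger panel.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(200)

# More iterations plus a looser tolerance trade a little precision
# for far fewer coordinate-descent sweeps; very small alphas are the
# usual cause of non-convergence, per the warning text.
model = Lasso(alpha=0.01, max_iter=10_000, tol=1e-3)
model.fit(X, y)
```

The DataConversionWarnings about uint8 input are harmless (StandardScaler upcasting dummy variables to float64), so they can be safely filtered; only the ConvergenceWarning is worth acting on.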
AttributeError Traceback (most recent call last)
in
1 file_name = may_be_download(FTP_URL + SOURCE_FILE)
----> 2 date = file_name.split('.')[0]
AttributeError: 'PosixPath' object has no attribute 'split'
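`may_be_download` evidently returns a `pathlib.Path`, and `Path` objects have no string `split` method; the filename pieces are exposed as attributes instead. A minimal sketch (the ITCH file name here is hypothetical):

```python
from pathlib import Path

# Hypothetical ITCH source file name as returned by may_be_download()
file_name = Path('01302019.NASDAQ_ITCH50.gz')

# Path objects are not strings: take the string .name first,
# or use the .stem / .suffix attributes directly.
date = file_name.name.split('.')[0]
```

In the failing cell, replacing `file_name.split('.')[0]` with `file_name.name.split('.')[0]` should resolve the AttributeError.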
NameError Traceback (most recent call last)
in
----> 1 message_labels = (df.loc[:, ['message_type', 'notes']]
2 .dropna()
3 .rename(columns={'notes': 'name'}))
4 message_labels.name = (message_labels.name
5 .str.lower()
NameError: name 'df' is not defined
In the notebook lda_with_sklearn, the LDA model predicts only 3 topics for documents in 5 classes.
By performing hyperparameter optimization with sklearn.model_selection.GridSearchCV, I was able to tune the min_df and max_df parameters of the TfidfVectorizer instance; max_df = 0.11 and min_df = 0.026 produce excellent results. My code and analysis are in my public GitHub repo here. You have my permission to use my code and analysis; I would only request the courtesy of being credited for any contribution I make toward your final product. (Thanks!)
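The search the commenter describes can be sketched as a `Pipeline` whose vectorizer parameters are tuned with `GridSearchCV`. The tiny corpus and parameter grid below are hypothetical stand-ins; the `max_df`/`min_df` values quoted above came from the commenter's own run, not from this sketch:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.model_selection import GridSearchCV

# Toy corpus with three obvious themes, repeated for enough samples.
docs = ['stock market prices rise', 'market prices fall today',
        'football match score goal', 'goal score football win',
        'election vote poll result', 'vote poll election winner'] * 5

pipe = Pipeline([('tfidf', TfidfVectorizer()),
                 ('lda', LatentDirichletAllocation(n_components=3,
                                                   random_state=42))])

# Grid over the vectorizer's document-frequency cutoffs; scoring uses
# LDA's default score (approximate log-likelihood) on held-out folds.
grid = GridSearchCV(pipe,
                    param_grid={'tfidf__min_df': [1, 2],
                                'tfidf__max_df': [0.5, 1.0]},
                    cv=3)
grid.fit(docs)
print(grid.best_params_)
```

Tuning the vectorizer jointly with the topic model this way avoids committing to a vocabulary cutoff before seeing how it affects held-out likelihood.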
Hi,
In Ch11, Notebook 06_alphalens_signals_quality.
test_tickers = best_predictions.index.get_level_values('ticker').unique()
trade_prices = get_trade_prices(test_tickers)
trade_prices.info()
The variable best_predictions is not defined before its usage. Please advise. Thanks!