quantopian / pyfolio
Portfolio and risk analytics in Python
Home Page: https://quantopian.github.io/pyfolio
License: Apache License 2.0
A simple version that doesn't include a benchmark or MAR (Minimum Acceptable Return). So it's just: (Total Profit %) / (Total Losses %).
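Under that definition, a minimal sketch could look like this (hypothetical helper name, not the actual quantrisk implementation):

```python
import pandas as pd

def simple_gain_to_pain(returns):
    """Simplified ratio with no benchmark or MAR subtracted:
    (sum of positive returns) / (absolute sum of negative returns)."""
    gains = returns[returns > 0].sum()
    losses = returns[returns < 0].abs().sum()
    if losses == 0:
        return float('inf')
    return gains / losses

rets = pd.Series([0.01, -0.02, 0.03, -0.01])
print(simple_gain_to_pain(rets))  # gains 0.04 / losses 0.03
```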
For the time being, quant_notebooks will depend on the internal quant_utils repo. Once things stabilize over here, we should make a concerted effort to port all our NBs over to use this new library. The overlapping parts in quant_utils will then get removed.
New things should, however, go into quantrisk.
Seeing this on the research server, using the tearsheet_quantrisk NB.
ValueError: List of boxplot statistics and `positions` values must have same the length
Repro expression:
rets, pos, txn_daily = quantrisk.internals.get_and_analyze_algo('5586028584f4829e9800025e',
backtest_min_years=5,
plot_risk_factors=True,
include_positions=True)
Error stack trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-f997b0a20ee9> in <module>()
2 backtest_min_years=5,
3 plot_risk_factors=True,
----> 4 include_positions=True)
/opt/code/quantrisk/quantrisk/internals.pyc in get_and_analyze_algo(*args, **kwargs)
281 fetcher_urls=fetcher_urls,
282 algo_create_date=algo_create_date,
--> 283 cone_std=cone_std, bayesian=bayesian)
284
285 return df_rets, df_pos, df_txn
/opt/code/quantrisk/quantrisk/tears.pyc in create_full_tear_sheet(df_rets, df_pos, df_txn, gross_lev, fetcher_urls, algo_create_date, bayesian, backtest_days_pct, cone_std)
227 benchmark2_rets = utils.get_symbol_rets('IEF') # 7-10yr Bond ETF.
228
--> 229 create_returns_tear_sheet(df_rets, algo_create_date=algo_create_date, backtest_days_pct=backtest_days_pct, cone_std=cone_std, benchmark_rets=benchmark_rets, benchmark2_rets=benchmark2_rets)
230
231 create_interesting_times_tear_sheet(df_rets, benchmark_rets=benchmark_rets)
/opt/code/quantrisk/quantrisk/tears.pyc in create_returns_tear_sheet(df_rets, algo_create_date, backtest_days_pct, cone_std, benchmark_rets, benchmark2_rets)
116 ax=ax_daily_similarity_no_var_no_mean)
117
--> 118 plotting.plot_return_quantiles(df_rets, df_weekly, df_monthly, ax=ax_return_quantiles)
119
120
/opt/code/quantrisk/quantrisk/plotting.pyc in plot_return_quantiles(df_rets, df_weekly, df_monthly, ax, **kwargs)
595 sns.boxplot([df_rets, df_weekly, df_monthly],
596 names=['daily', 'weekly', 'monthly'],
--> 597 ax=ax, **kwargs)
598 ax.set_title('Return quantiles')
599 return ax
/opt/miniconda/lib/python2.7/site-packages/seaborn/categorical.pyc in boxplot(x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, fliersize, linewidth, whis, notch, ax, **kwargs)
1621 kwargs.update(dict(whis=whis, notch=notch))
1622
-> 1623 plotter.plot(ax, kwargs)
1624 return ax
1625
/opt/miniconda/lib/python2.7/site-packages/seaborn/categorical.pyc in plot(self, ax, boxplot_kws)
516 def plot(self, ax, boxplot_kws):
517 """Make the plot."""
--> 518 self.draw_boxplot(ax, boxplot_kws)
519 self.annotate_axes(ax)
520 if self.orient == "h":
/opt/miniconda/lib/python2.7/site-packages/seaborn/categorical.pyc in draw_boxplot(self, ax, kws)
453 positions=[i],
454 widths=self.width,
--> 455 **kws)
456 color = self.colors[i]
457 self.restyle_boxplot(artist_dict, color, kws)
/opt/miniconda/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in boxplot(self, x, notch, sym, vert, whis, positions, widths, patch_artist, bootstrap, usermedians, conf_intervals, meanline, showmeans, showcaps, showbox, showfliers, boxprops, labels, flierprops, medianprops, meanprops, capprops, whiskerprops, manage_xticks)
3116 meanline=meanline, showfliers=showfliers,
3117 capprops=capprops, whiskerprops=whiskerprops,
-> 3118 manage_xticks=manage_xticks)
3119 return artists
3120
/opt/miniconda/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in bxp(self, bxpstats, positions, widths, vert, patch_artist, shownotches, showmeans, showcaps, showbox, showfliers, boxprops, whiskerprops, flierprops, medianprops, capprops, meanprops, meanline, manage_xticks)
3383 positions = list(xrange(1, N + 1))
3384 elif len(positions) != N:
-> 3385 raise ValueError(datashape_message.format("positions"))
3386
3387 # width
ValueError: List of boxplot statistics and `positions` values must have same the length
get_backtest() on the Quantopian research platform (QRP) returns backtest results, positions and transactions. For quantrisk to be useful on the QRP, it has to work with that format.
I imagine this could go two ways: either we switch internally to always use that format if it's convenient, or we transform whatever get_backtest() returns.
Most of the code that actually creates the tearsheet lives in https://github.com/quantopian/quant_notebooks/blob/master/analyses/algo_tear_sheet_contest.ipynb. We should refactor that NB in a major way so that the one huge function is broken up into individual plots. Then the tear sheet would just call the individual functions.
This is a darn legacy spelling typo on my part :(
https://en.wikipedia.org/wiki/Calmar_ratio
Gus, giving it to you since I figure you know best as to how renaming will affect dependencies.
Once quantrisk is in an installable state, we should add it to research so that we can start to remove the overlapping chunks from quant_utils.
We want two example notebooks, one that uses a single stock and one that uses a zipline algo.
First plot in returns tearsheet. I tried to fix this myself but I wasn't sure of an easy way.
Gonna assign to @justinlent, but if you're not sure/don't have time I can take another stab.
Use e.g. autopep8 to make the syntax conform to PEP 8. Also, the functions use different naming conventions; change all of them to use underscore_style rather than camelCase.
@twiecki I want to propose we change the default 'style' parameter used for annual_return to be "compound" instead of "calendar". Calendar isn't very conventional (it's just the simple arithmetic annual return) and was originally implemented just to match zipline results. Here's an example showing how different they can be, using the benchmark_rets timeseries in tearsheet_standalone. Returning the geometrically compounded value, which is what style='compound' does, simply makes more sense for a publicly facing API, I think.
Given that annual_return is called in the calculation of quite a few other performance statistics, this can have a huge impact downstream (especially for df_rets that span many years), so I think we should consider it carefully. Maybe you can take a quick look at the exact implementation of the code for a sanity check as well.
Thoughts?
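To illustrate the size of the difference, here is a rough sketch of the two styles (assumed semantics; the exact quantrisk implementation should be checked against this):

```python
import pandas as pd

TRADING_DAYS = 252

def annual_return_calendar(returns):
    # 'calendar' style: simple arithmetic annualization of the mean daily return.
    return returns.mean() * TRADING_DAYS

def annual_return_compound(returns):
    # 'compound' style: geometric annualization of the total compounded growth.
    num_years = len(returns) / TRADING_DAYS
    total_growth = (1 + returns).prod()
    return total_growth ** (1 / num_years) - 1

# A steady 0.1% daily return over four years:
rets = pd.Series([0.001] * (4 * TRADING_DAYS))
print(annual_return_calendar(rets))  # ~0.252
print(annual_return_compound(rets))  # ~0.2865; the gap widens with horizon
```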
I'm seeing this locally on master. It is not a problem on the internal research server, so one of two things is going on:
I've copied in some images below, but now I'm noticing that nearly all my plots don't have x-axis labels. Even the x-axis dates from the main cumulative returns plot are missing.
Somehow the title and y-axis got labeled "alpha". Also, some extraneous code that computed alpha, which we don't need right now, was merged into the body. I'm commenting out what I think is wrong and have committed it here for others' review: becac6c.
If all looks good we can get rid of the commented lines.
First off, we need to define which sids we want to use to proxy these. I think I said I'd do that, but if @justinlent wants to weigh in, please do!
Traditional single factor betas: Equity, Bonds, Credit, Gold, Crude Oil, Volatility
It will be a bit tricky in the interim period where we still have both libraries side-by-side. We definitely don't want to have to fix bugs twice.
One proposal is to first remove the overlapping functions in quant_utils, and then do a selective import from quantrisk.
If a function with the same name and call signature now exists in quantrisk, remove it from quant_utils and add an alias. For example, cum_returns exists in both quantrisk/timeseries.py and quant_utils/timeseries.py. Remove cum_returns from quant_utils/timeseries.py and add a from quantrisk.timeseries import cum_returns. That way, the imports from the NBs will continue to work. If the name changed, add an alias with the old name (e.g. if we previously had the name cumReturns, we'd do from quantrisk.timeseries import cum_returns as cumReturns).
When there is negative cash, that should likely be made obvious to the user.
Install quantrisk as a module onto the quantopian research platform. I don't think we need to blacklist anything.
Not sure who could help us with that, any idea @KarenRubin?
Example: https://github.com/quantopian/zipline/blob/master/zipline/algorithm.py#L1, for every Python file. Will probably just require us to adjust print statements.
I already went through and deleted some obvious cases but I bet there is more stale code lying around.
Just assuming equal weight for each algo is acceptable for the phase 1 implementation.
Currently, we have max_drawdown and get_max_draw_down, the latter likely being the better version.
Allow users to pass in a different number of benchmarks (not just two), most likely just in the cumulative returns plot.
The more each function is tested the better.
The question arises as to how to define ground truth. zipline does this via an Excel spreadsheet and we could copy that approach, but it's a bit cumbersome. Finding some small cases where the truth can be calculated manually would be a better first step.
Once that is in place we should activate Travis.
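As a first step toward hand-calculated ground truth, a test might look like this (the cum_returns below is a minimal reference reimplementation, assuming it compounds simple returns from a starting value as in quantrisk/timeseries.py):

```python
import pandas as pd

def cum_returns(returns, starting_value=1.0):
    # Reference implementation for the sketch: compound simple returns.
    return (1 + returns).cumprod() * starting_value

def test_cum_returns_hand_computed():
    rets = pd.Series([0.1, -0.05, 0.2])
    result = cum_returns(rets, starting_value=1.0)
    # By hand: 1.0 * 1.1 = 1.1; 1.1 * 0.95 = 1.045; 1.045 * 1.2 = 1.254
    expected = [1.1, 1.045, 1.254]
    for got, want in zip(result, expected):
        assert abs(got - want) < 1e-9

test_cum_returns_hand_computed()
```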
here is one example algo id: 557e96eb8ba119ea30000409
IndexError: index out of bounds
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-2-e2266cc245fd> in <module>()
3 # B algo: 54349eefcd2e3f7f57000025
4 #a, b, c = internals.get_and_analyze_algo('54349eefcd2e3f7f57000025', contest='MEGARUN_1');
----> 5 a, b, c = internals.get_and_analyze_algo('557e96eb8ba119ea30000409', backtest_min_years=5, cone_std=1.0);
/Users/jlent/github_projects/quantrisk/quantrisk/internals.pyc in get_and_analyze_algo(algo_id, contest, cone_std, plot_risk_factors, algo_live_date, include_positions, backtest_min_years, backtest_max_years)
283
284 tears.create_full_tear_sheet(df_rets, df_pos=df_pos, df_txn=df_txn_daily, gross_lev=gross_lev, fetcher_urls=fetcher_urls,
--> 285 algo_create_date=algo_create_date, cone_std=cone_std)
286
287 return df_rets, df_pos, df_txn_daily
/Users/jlent/github_projects/quantrisk/quantrisk/tears.pyc in create_full_tear_sheet(df_rets, df_pos, df_txn, gross_lev, fetcher_urls, algo_create_date, backtest_days_pct, cone_std)
111 backtest_days_pct=0.5, cone_std=1.0):
112
--> 113 create_returns_tear_sheet(df_rets, algo_create_date=algo_create_date, backtest_days_pct=backtest_days_pct, cone_std=cone_std)
114
115 if df_pos is not None:
/Users/jlent/github_projects/quantrisk/quantrisk/tears.pyc in create_returns_tear_sheet(df_rets, algo_create_date, backtest_days_pct, cone_std)
36 print '\n'
37
---> 38 plotting.show_perf_stats(df_rets, algo_create_date, benchmark_rets)
39
40 plotting.plot_rolling_returns(
/Users/jlent/github_projects/quantrisk/quantrisk/plotting.pyc in show_perf_stats(df_rets, algo_create_date, benchmark_rets)
299
300 diff_pct = timeseries.out_of_sample_vs_in_sample_returns_kde(timeseries.cum_returns(df_rets_backtest , 1.0),
--> 301 timeseries.cum_returns(df_rets_live, 1.0) )
302
303 consistency_pct = int( 100*(1.0 - diff_pct) )
/Users/jlent/github_projects/quantrisk/quantrisk/timeseries.pyc in cum_returns(df_rets, starting_value)
64 # Note that we can't add that ourselves as we don't know which dt
65 # to use.
---> 66 if pd.isnull(df_rets.iloc[0]):
67 df_rets.iloc[0] = 0.
68
/Users/jlent/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
1215 return self._getitem_tuple(key)
1216 else:
-> 1217 return self._getitem_axis(key, axis=0)
1218
1219 def _getitem_axis(self, key, axis=0):
/Users/jlent/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
1506 self._is_valid_integer(key, axis)
1507
-> 1508 return self._get_loc(key, axis=axis)
1509
1510 def _convert_to_indexer(self, obj, axis=0, is_setter=False):
/Users/jlent/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _get_loc(self, key, axis)
90
91 def _get_loc(self, key, axis=0):
---> 92 return self.obj._ixs(key, axis=axis)
93
94 def _slice(self, obj, axis=0, kind=None):
/Users/jlent/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in _ixs(self, i, axis)
486 values = self.values
487 if isinstance(values, np.ndarray):
--> 488 return _index.get_value_at(values, i)
489 else:
490 return values[i]
pandas/index.pyx in pandas.index.get_value_at (pandas/index.c:2358)()
pandas/src/util.pxd in util.get_value_at (pandas/index.c:15287)()
IndexError: index out of bounds
Talked to @justinlent yesterday and he said there were some updates to the cone code that have not made it into quantrisk yet.
Need to find a way to link up MS asset classifications, or some other external data source, to do this one.
Sectors (10-GIC levels to start, or whatever similar we have from Morningstar)
Table: Per year and overall sector weight longs, shorts, net, absolute and relative performance of each
Plot (timeseries): portfolio weight in each sector over time
Table: Top and Bottom 10 holdings by contribution to performance ever, and per year (long, short, overall) - performance is over the time position was held in portfolio.
Performance, weight and relative contribution for top 10 holdings, ranked from largest contribution to smallest
"best" and "worst" positions basically
When the full list of holdings from all time is large (>10 or so), it doesn't show up nicely in the tearsheet.
Might be useful to have a breakdown of win/loss percentage
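For example (hypothetical input format: a per-trade PnL series; the actual transaction data in quantrisk may be shaped differently):

```python
import pandas as pd

def win_loss_breakdown(pnl):
    """Fraction of winning vs. losing trades from a per-trade PnL series;
    flat (zero-PnL) trades are excluded from the denominator."""
    wins = (pnl > 0).sum()
    losses = (pnl < 0).sum()
    total = wins + losses
    return {'win_pct': wins / total, 'loss_pct': losses / total}

pnl = pd.Series([120.0, -40.0, 15.0, -5.0, 60.0])
print(win_loss_breakdown(pnl))  # 3 wins, 2 losses -> 60% / 40%
```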
Some functions are only useful to people with access to quantopian's DB. These should be distilled out into a separate file. Later, we can worry about how these internal tools are made available.
Ran this command:
rets, pos, txn_daily = quantrisk.internals.get_and_analyze_algo('5552d8b7c4b238fb42000578',
backtest_min_years=10,
plot_risk_factors=True,
include_positions=True,
contest='all_contest_entries')
Got this traceback:
--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-40-6670bc8a3ecd> in <module>()
3 backtest_min_years=5,
4 plot_risk_factors=True,
----> 5 include_positions=True)
/opt/code/quantrisk/quantrisk/internals.py in get_and_analyze_algo(*args, **kwargs)
295
296 tears.create_full_tear_sheet(df_rets, df_pos=df_pos, df_txn=df_txn, gross_lev=gross_lev, fetcher_urls=fetcher_urls,
--> 297 algo_create_date=algo_create_date, cone_std=cone_std)
298
299 return df_rets, df_pos, df_txn
/opt/code/quantrisk/quantrisk/tears.py in create_full_tear_sheet(df_rets, df_pos, df_txn, gross_lev, fetcher_urls, algo_create_date, backtest_days_pct, cone_std)
114
115 if df_pos is not None:
--> 116 create_position_tear_sheet(df_rets, df_pos, gross_lev=gross_lev)
117
118 if df_txn is not None:
/opt/code/quantrisk/quantrisk/tears.py in create_position_tear_sheet(df_rets, df_pos_val, gross_lev)
89 df_pos_alloc = positions.get_portfolio_alloc(df_pos_val)
90
---> 91 plotting.plot_exposures(df_cum_rets, df_pos_alloc)
92
93 plotting.show_and_plot_top_positions(df_cum_rets, df_pos_alloc)
/opt/code/quantrisk/quantrisk/plotting.py in plot_exposures(df_cum_rets, df_pos_alloc)
448 df_long_short = positions.get_long_short_pos(df_pos_alloc)
449 df_long_short.plot(
--> 450 kind='area', color=['lightblue', 'green', 'coral'], alpha=1.0)
451 plt.xlim((df_cum_rets.index[0], df_cum_rets.index[-1]))
452 plt.title("Long/Short/Cash Exposure")
/opt/miniconda/lib/python2.7/site-packages/pandas/tools/plotting.pyc in plot_frame(data, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)
2486 yerr=yerr, xerr=xerr,
2487 secondary_y=secondary_y, sort_columns=sort_columns,
-> 2488 **kwds)
2489
2490
/opt/miniconda/lib/python2.7/site-packages/pandas/tools/plotting.pyc in _plot(data, x, y, subplots, ax, kind, **kwds)
2322 plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
2323
-> 2324 plot_obj.generate()
2325 plot_obj.draw()
2326 return plot_obj.result
/opt/miniconda/lib/python2.7/site-packages/pandas/tools/plotting.pyc in generate(self)
912 self._compute_plot_data()
913 self._setup_subplots()
--> 914 self._make_plot()
915 self._add_table()
916 self._make_legend()
/opt/miniconda/lib/python2.7/site-packages/pandas/tools/plotting.pyc in _make_plot(self)
1623 kwds['label'] = label
1624
-> 1625 newlines = plotf(ax, x, y, style=style, column_num=i, **kwds)
1626 self._add_legend_handle(newlines[0], label, index=i)
1627
/opt/miniconda/lib/python2.7/site-packages/pandas/tools/plotting.pyc in plotf(ax, x, y, style, column_num, **kwds)
1743 if column_num == 0:
1744 self._initialize_prior(len(self.data))
-> 1745 y_values = self._get_stacked_values(y, kwds['label'])
1746 lines = f(ax, x, y_values, style=style, **kwds)
1747
/opt/miniconda/lib/python2.7/site-packages/pandas/tools/plotting.pyc in _get_stacked_values(self, y, label)
1638 else:
1639 raise ValueError('When stacked is True, each column must be either all positive or negative.'
-> 1640 '{0} contains both positive and negative values'.format(label))
1641 else:
1642 return y
ValueError: When stacked is True, each column must be either all positive or negative.cash contains both positive and negative values
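One possible workaround (a sketch, not the committed fix) is to split any mixed-sign column such as cash into non-negative and non-positive parts before the stacked-area plot, since pandas only requires each individual column to be single-signed:

```python
import pandas as pd

def split_mixed_sign_column(df, col):
    """Split one column with both signs into '<col>_pos' and '<col>_neg'
    so that df.plot(kind='area', stacked=True) can render it."""
    out = df.drop(columns=[col]).copy()
    out[col + '_pos'] = df[col].clip(lower=0)
    out[col + '_neg'] = df[col].clip(upper=0)
    return out

df_alloc = pd.DataFrame({'long': [0.6, 0.7], 'short': [-0.3, -0.2],
                         'cash': [0.1, -0.1]})
plottable = split_mixed_sign_column(df_alloc, 'cash')
# plottable.plot(kind='area', stacked=True)  # no longer raises ValueError
```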
Currently, every plotting function creates its own figure and then plots an axes object inside it (the axes is what is actually the plot; the figure is just the window that contains it). This is fairly inflexible, as you may want one large layout and to place the individual subplots into axes defined outside of the plotting function.
A common pattern, used e.g. by seaborn, is to have every plotting function take an ax kwarg (e.g. https://github.com/mwaskom/seaborn/blob/master/seaborn/distributions.py#L33). That way, the caller can control where the plot is going to be placed.
The pattern would be:
fig, (ax1, ax2) = plt.subplots(2, sharex=True)
quantrisk.plotting.plot_returns(df, ax=ax1)
quantrisk.plotting.plot_beta(df, ax=ax2)
Another nice pattern is to also take in kws that are passed on to the individual plotting calls (https://github.com/mwaskom/seaborn/blob/master/seaborn/distributions.py#L56). For example, if I want to change the line width of the plot, we wouldn't have to have an explicit linewidth kwarg in our function definition, but could rather do:
quantrisk.plotting.plot_returns(df, plot_kws={'linewidth': 3})
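A minimal sketch of what such an ax-accepting function could look like (hypothetical signature; not the actual quantrisk API):

```python
import matplotlib.pyplot as plt

def plot_returns(df_rets, ax=None, **kwargs):
    """Plot cumulative returns onto a caller-supplied axes.

    Falls back to the current axes if none is given, following the
    seaborn convention. Extra kwargs pass straight through to plot().
    """
    if ax is None:
        ax = plt.gca()
    ax.plot((1 + df_rets).cumprod() - 1, **kwargs)
    ax.set_title('Cumulative returns')
    return ax

# Caller controls the layout; linewidth passes through without an
# explicit kwarg in the function definition:
# fig, (ax1, ax2) = plt.subplots(2, sharex=True)
# plot_returns(df, ax=ax1, linewidth=3)
```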
Every function should have a docstring (in the numpy format: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt) explaining what it does, the arguments it takes and what those do, as well as what it returns. Preferably with a small example code snippet.
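For instance, a numpy-style docstring on a hypothetical helper (the function itself is only illustrative):

```python
import pandas as pd

def max_drawdown(returns):
    """Determine the maximum drawdown of a returns series.

    Parameters
    ----------
    returns : pd.Series
        Daily simple returns of the strategy, non-cumulative.

    Returns
    -------
    float
        Maximum drawdown as a negative fraction (e.g. -0.25 for a 25% drop).

    Examples
    --------
    >>> dd = max_drawdown(pd.Series([0.10, -0.50, 0.30]))  # ~ -0.5
    """
    cumulative = (1 + returns).cumprod()
    running_max = cumulative.cummax()
    return ((cumulative - running_max) / running_max).min()
```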
We should look into directly pulling this data since the csv can get out of date really quickly. Maybe we supply a default csv (if we can) as well as expose a function to pull the data from the university page
We also might want to confirm if there are any licenses or issues with redistributing it. @twiecki maybe you have some experience understanding redistribution of data in the OSS world?
What might make more sense visually is to have the long+short+cash plot's y-axis values match up with the gross exposure y-axis.
Assume you hedge the algo with its rolling 6-month SPY beta every day, to see if it improves the algo's Sharpe ratio. Plot it on the Cumulative Returns chart along with the algo and the benchmarks.
The number of holdings plotted daily can be difficult to read over a long backtest period.
It will be good to get a visual sense of how closely the two cones track one another, depending on the algo. It should be pretty easy to do, just by passing the ax to the existing cumulative performance figure.
Function definitions and calls need to be tidied up and standardized.
Old code that was commented out that can be removed should be removed.
plotly is building a cool dashboard (that can also be used without plotly) here: https://github.com/chriddyp/dash. Could be great to integrate that.
It's useful info, if it's available. In the screenshot, it sounds like the out-of-sample date was less than 5 days from the end of the available backtest data from the test harness run, and thus an automatic 85%/15% split was used (which is totally fine, and a good default). But if the out-of-sample date is available, let's just show it.
Same logic as issue #12: the daily plot is hard to read.