qmj's Issues
Library qmjdata does not automatically load when loading qmj
Loading qmj does not auto-load qmjdata, which prevents me from running:
library(qmj)
data(companies) # Or equivalent
I'll look into loading both packages whenever either one is loaded. Failing that, we could explicitly inform the user that the data live in the separate package "qmjdata".
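Standard R mechanisms cover both options. A package listed under Depends (rather than Imports) in qmj's DESCRIPTION file is attached whenever qmj is attached. Separately, data(companies, package = "qmjdata") works without attaching qmjdata at all, as long as it is installed. For the second option, here is a minimal sketch of a startup notice. The helper name qmjdata_notice is hypothetical, and the message wording is only a suggestion.

```r
# Hypothetical sketch, e.g. for qmj's R/zzz.R; qmjdata_notice is a made-up name.
qmjdata_notice <- function(data_pkg = "qmjdata") {
  if (!requireNamespace(data_pkg, quietly = TRUE)) {
    # Data package missing: tell the user where the data sets live.
    packageStartupMessage(
      "The qmj data sets live in the separate package '", data_pkg,
      "', which is not installed."
    )
  } else {
    # Data package present: show how to load a data set from it.
    packageStartupMessage(
      "qmj data sets are in '", data_pkg, "'; load them with, e.g., ",
      "data(companies, package = \"", data_pkg, "\")."
    )
  }
  invisible(NULL)
}

# Runs when library(qmj) attaches the package.
.onAttach <- function(libname, pkgname) {
  qmjdata_notice()
}
```

Using packageStartupMessage() rather than message() lets users silence the notice with suppressPackageStartupMessages(library(qmj)).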
The documentation for get_companies does not clearly explain how the function works.
The documentation implies the function automagically retrieves the data, formats it as a text file, and then reads it in. I intend to edit this to make the requisite steps clearer.
Working on this now. Will close when I submit the pull request.
?qmj leads to "No documentation for 'qmj' in specified packages and libraries."
??qmj returns the error:
Error in vignette_type(Outfile) : Vignette product ‘NA’ does not have a known filename extension (‘NA’)
Phantom Bugs #17 and #18
get_prices - quantmod getSymbols function changed
Providing a function to clean temporary data
The specific concern is this:
If I start get_prices, stop halfway through, and then resume get_prices a week later, get_prices will find the old temporary data and resume the download from there. However, the date range of the stock prices will then differ between the old and new chunks, corrupting our results to some degree.
I'm writing a function that, when the user calls it, removes all qmj temporary data from the temp directory.
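Here is a minimal sketch of such a cleanup function. It assumes, hypothetically, that get_prices writes its temporary files into tempdir() with a common "qmj" file-name prefix. The real prefix and location would have to match whatever get_prices actually writes, and clean_qmj_temp is a made-up name.

```r
# Hypothetical helper: the "qmj" prefix and tempdir() location are
# assumptions, not necessarily what get_prices uses.
clean_qmj_temp <- function(dir = tempdir(), prefix = "qmj") {
  # Match only files whose names start with the prefix.
  files <- list.files(dir, pattern = paste0("^", prefix), full.names = TRUE)
  removed <- file.remove(files)
  # Return the paths that were actually deleted, invisibly.
  invisible(files[removed])
}
```

Calling clean_qmj_temp() before a fresh get_prices run would force a full re-download, so that every chunk covers the same date range.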
Issue: Financials Data Table Overfilled with Erroneous Data, Missing Some Gathered Data
Either get_info or tidyinfo is mishandling some data, possibly by inserting anomalous values.
Off-hand Thought: Dealing With Missing Information
When a calculation returns NA or Inf (almost always the result of missing information), we currently cope by scaling the values and then setting certain of them to 0. We mostly do this in cases where a 0 has no effect on the resulting quality score or quality component score.
Option A, to maximize the number of quality scores produced: produce component sub-scores where possible, and ignore missing sub-scores.
I'm concerned that companies with questionable filings or missing data are not penalized. Meanwhile, the scaled values give z-scores to companies that provide a maximal amount of data, a generally desirable attribute.
As a contrived example, Rigged Company A could file documents that yield an enormous growth score. Through creative accounting, it could also supply just enough information in the other categories to receive a neutral score.
Rigged Company A is thus judged as high quality, and a brief overview of the quality data set does not clearly reveal the lack of information.
On top of that, high-accuracy companies are "penalized", because their component sub-scores are judged only against other high-accuracy companies. With the under-performers removed from the comparison, their z-scores are lower in absolute terms.
Option B, to maximize accuracy: when a sub-value is missing, simply do not produce the component score, and therefore produce no quality score.
I'll have to double-check what we do with specific sub-values, but following this method cuts roughly 400 companies from our results.
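To make the trade-off concrete, here is a minimal sketch of both options for a single component. It assumes, hypothetically, that the component's sub-values are already z-scored and stored as a numeric matrix with one row per company. Inf and -Inf are treated the same as NA. component_score is an illustrative name, not an existing qmj function.

```r
# Hypothetical illustration of Option A vs Option B for one component.
component_score <- function(z, option = c("A", "B")) {
  option <- match.arg(option)
  z <- as.matrix(z)
  # Treat Inf/-Inf (and NaN) the same as missing values.
  z[!is.finite(z)] <- NA
  # Option A averages whatever sub-values are present (a row with no
  # values at all gives NaN); Option B returns NA for any row with a
  # missing sub-value.
  rowMeans(z, na.rm = (option == "A"))
}

z <- matrix(c(1.5, Inf, 0.2,
              -0.3, 0.4, 0.5),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("company_a", "company_b"), c("g1", "g2", "g3")))
component_score(z, "A")  # company_a scored from its two usable sub-values
component_score(z, "B")  # company_a gets NA
```

Under Option A, company_a still receives a score despite a missing sub-value, which is the door Rigged Company A walks through. Under Option B it drops out entirely.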
qmj package documentation file is out of date
If this were 1984, it would refer to un-datasets.
Do we want to set up a (relatively painless) way of updating/retrieving the companies from the Russell 3000 Index?
If memory serves, parsing the original data directly or programmatically is extremely painful, because the Russell 3000 company list is distributed as a PDF.
Given that the list updates only once a year, and that it's relatively simple to build a data frame in the format we expect as input, I wouldn't call this crucial, but it's food for thought.
By the way, to get the original list we're currently using, I copied and pasted the text into Notepad. I then removed some unwanted artifacts by hand (typically things like a "page end" marker) and parsed the result with a line or two of R. In practice, the user would likely need to copy the text into an appropriate file and then call our function to produce the data frame.
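The "line or two of R" might look something like the sketch below. It assumes each company line has the form "COMPANY NAME TICKER" in capitals, with the ticker as the last whitespace-separated token. parse_company_lines is a hypothetical name, not the current get_companies code.

```r
# Hypothetical sketch: turn pasted Russell 3000 text into a data frame.
parse_company_lines <- function(lines) {
  lines <- trimws(lines)
  # Keep lines that contain capitals, no lowercase letters, and at least
  # one space (i.e. a name followed by a ticker). Headers and "page end"
  # style markers usually contain lowercase letters and are dropped.
  keep <- grepl("[A-Z]", lines) & !grepl("[a-z]", lines) & grepl("\\s", lines)
  lines <- lines[keep]
  data.frame(
    name   = sub("\\s+\\S+$", "", lines),  # everything before the last token
    ticker = sub("^.*\\s", "", lines),     # the last token
    stringsAsFactors = FALSE
  )
}

parse_company_lines(readLines("companies.txt"))  # file name is illustrative
```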
get_companies() regex cuts out several companies when directly copying and pasting from the Component List
Specifically, the problem appears on lines like this one: TRIPADVISOR INC TRIPAs of 06/26/2015 Russell Indexes.
We only parse lines that are entirely capitalized, so TRIP isn't read in as a company. This should be easy to fix: just discard the rest of the line after an "As of".
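Here is a minimal sketch of that fix. It assumes the stray text always begins with a mixed-case "As of". Company names in the list are fully capitalized, so a case-sensitive match on "As of" should not clip a legitimate name. The helper name clean_component_lines is hypothetical.

```r
# Hypothetical helper: strip trailing "As of ..." text, then keep the
# fully capitalized lines that the company parser expects.
clean_component_lines <- function(lines) {
  # Case-sensitive: "As of" is mixed case, company names are all caps.
  lines <- trimws(sub("As of.*$", "", lines))
  lines[nzchar(lines) & !grepl("[a-z]", lines)]
}

clean_component_lines("TRIPADVISOR INC TRIPAs of 06/26/2015 Russell Indexes.")
```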
Observation: get_prices is slow to aggregate the various chunks of raw price data into a single data object
Slow enough that I wondered if R had either crashed or frozen.
A quick stop-gap to reassure the user that work is happening is to print a notice each time a chunk is processed.
Speeding up the process can wait until the more serious issues have been dealt with.
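Here is a sketch of the stop-gap, combined with one common R speed-up. Growing a data frame with rbind() inside a loop copies the accumulated data on every iteration. Collecting the chunks in a list and binding them once with do.call(rbind, ...) avoids those repeated copies. The function name, the optional tidy argument, and the assumption that each chunk is a data frame with identical columns are all hypothetical.

```r
# Hypothetical sketch: report progress per chunk, then bind once.
combine_price_chunks <- function(chunks, tidy = identity, verbose = TRUE) {
  n <- length(chunks)
  labels <- if (is.null(names(chunks))) as.character(seq_len(n)) else names(chunks)
  out <- vector("list", n)
  for (i in seq_len(n)) {
    out[[i]] <- tidy(chunks[[i]])  # per-chunk processing, if any
    if (verbose) {
      message(sprintf("Processed chunk %d of %d (%s)", i, n, labels[i]))
    }
  }
  # A single rbind over all chunks instead of one rbind per iteration.
  do.call(rbind, out)
}
```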
Is It Worth Repeating the Explanation of the Russell 3000 in the Prices and Financials Data Documentation?
This is from an earlier push to the qmjdata repo, but one of the larger changes I implemented was to replace the detailed descriptions of the Russell 3000 Index in the prices and financials data sets with the single sentence "For more information on the Russell 3000 Index and why it was chosen, please see {link to companies data set}"
I don't see the repeated description of the Russell 3000 Index as critical to understanding the Prices/Financials data sets, aside from mentioning that it's the source of the chosen companies.
However, I'm having second thoughts about making readers go to another "page" if they want to find out more about the Index.
Your thoughts?
Impose consistency across variable names and function names
A sizable chunk of this is my fault.
Some code and variables use lower camel case, e.g., startDate.
A large amount of code also uses underscore separation, e.g., start_date.
Once the bigger things are taken care of, I'll go back and rewrite variables, functions, etc. to follow one style strictly, most likely underscore separation.
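If underscore separation wins, existing camel-case names could be converted mechanically. Here is a minimal sketch using a hypothetical helper, to_snake_case. It inserts an underscore before each capital letter that follows a lowercase letter or digit, then lowercases the result.

```r
# Hypothetical helper: convert lower camel case to underscore separation.
to_snake_case <- function(x) {
  # "startDate" -> "start_Date" -> "start_date"; names already in
  # snake_case pass through unchanged.
  tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", x))
}

to_snake_case(c("startDate", "start_date", "endDateTime"))
```

All-caps names such as TCSO would simply become tcso, so data set column names may deserve a separate decision.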
If statement in market_data
Specifically this statement:
if(length(which(financials$TCSO < 0))) {
  stop("Negative TCSO exists.")
}
I was in the midst of updating the documentation, and I don't quite understand the stop message, or why TCSO in particular is checked. Can someone explain it to me?
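For reference, which() returns the indices of TRUE elements, skipping NA. Inside if(), a non-zero length counts as TRUE, so the statement stops whenever any TCSO value is negative. If TCSO stands for total common shares outstanding (an assumption, not confirmed here), a negative value cannot be valid. If market_data multiplies it by price to get market capitalization, it would also yield a negative market cap. Here is an equivalent and arguably more readable form, wrapped in a hypothetical helper:

```r
# Hypothetical helper; same behavior as the if() statement in market_data.
check_tcso <- function(financials) {
  # any(..., na.rm = TRUE) ignores NA, just as which() does.
  if (any(financials$TCSO < 0, na.rm = TRUE)) {
    stop("Negative TCSO exists.")
  }
  invisible(TRUE)
}
```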
Case Study: Reducing the Number of Companies for Which We Produce No Quality Score
Here's a small sample of the companies for which we produce no quality score. There are currently 385 of them. (I've yet to seriously change how we handle missing data, which may shorten the list somewhat.)
Taking AAC as a sample, and comparing it side by side with a company for which we do produce a quality score (AMH), the main issue appears to be the absence of a couple of key figures. Growth, payouts, and profitability, for example, all use gross profit in their calculations. Since quantmod currently only lets us source our data from Google, it's worth looking into which figures are "reasonably implied" when not explicitly given. (Yahoo Finance, for example, gives AAC's gross profit as equal to its total revenue.)
I'm unsure about safety. Over the next few days I'll use this as a reference while trying to reduce the number of companies without a quality score.
My hunch is that the code should be robust enough to produce a rough score even when a key figure is missing here and there, so something else may be going on. I'll comment here and, ideally, resolve the issue once I've explored every reasonable way of producing a quality score for AAC.
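One way to explore "reasonably implied" figures is sketched below. The column names revenue, cogs, and gross_profit are hypothetical stand-ins for whatever the financials table actually uses. Gross profit is first derived as revenue minus cost of revenue. If cost of revenue is also missing, it falls back to total revenue, mirroring the Yahoo Finance figure for AAC. That second fallback is an assumption that would inflate gross profit for a company that simply omitted its cost of revenue, so imputed rows are flagged.

```r
# Hypothetical sketch; column names are illustrative, not qmj's.
impute_gross_profit <- function(fin) {
  gp <- fin$gross_profit
  # Flag rows whose gross profit was not reported directly.
  fin$gp_imputed <- is.na(gp)
  # First fallback: revenue minus cost of revenue, when both exist.
  derived <- fin$revenue - fin$cogs
  gp[is.na(gp)] <- derived[is.na(gp)]
  # Second fallback (assumption): missing cost of revenue treated as 0,
  # so gross profit equals total revenue.
  gp[is.na(gp)] <- fin$revenue[is.na(gp)]
  fin$gross_profit <- gp
  fin
}
```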
Missing data?
Hi - this is a really useful package for exploration, thanks for making it!
However, I've been going through the readme and noticed that:
#And more detailed data sets into what makes up quality
data(profitability)
data(growth)
data(payouts)
data(safety)
does not seem to work. Is this data available?
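To check which data sets an installed package actually ships, you can call utils::data() with the package argument. Here is a minimal sketch using a hypothetical helper, available_datasets, which returns an empty vector when the package is not installed.

```r
# Hypothetical helper: list the data sets an installed package provides.
available_datasets <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))
  }
  # data(package = ...) returns an object whose $results matrix has an
  # "Item" column naming each data set.
  utils::data(package = pkg)$results[, "Item"]
}
```

available_datasets("qmjdata") would then show whether profitability, growth, payouts, and safety are included.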
Thanks,
Alex
Cleaning up tidy_prices
qmjdata is not available
In RStudio, running under Ubuntu, I entered the commands:
library(devtools)
install_github("anttsou/qmj")
The response to the latter command was:
ERROR: dependency ‘qmjdata’ is not available for package ‘qmj’
Installation failed: Command failed (1)
What is the correct procedure for installing qmj?
-John
README markdown file for the GitHub repo is badly, badly out of date
Some data sets no longer exist. I have not yet checked whether the functions still work.