qmj's People

Contributors

anttsou, davidkane9, echoe95, rynkwn

qmj's Issues

Library qmjdata does not automatically load when loading qmj

Loading qmj does not auto-load qmjdata, which prevents me from running:
library(qmj)
data(companies) # Or equivalent

Will look into getting both to load when one or the other loads. Either that, or explicitly informing the user that the data is in a separate package, "qmjdata".
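A rough sketch of the second option, telling the user where the data lives. The helper name check_data_package is hypothetical and not part of qmj; requireNamespace() is base R.

```r
# Hypothetical helper: report whether the companion data package is installed.
# Returns TRUE/FALSE invisibly and prints a hint when the package is missing.
check_data_package <- function(pkg = "qmjdata") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("The data sets used by qmj live in the separate package '", pkg,
            "'. Install and load it before calling data().")
    return(invisible(FALSE))
  }
  invisible(TRUE)
}
```

Something like this could be called when qmj is attached, so the user sees the hint right away instead of a failed data() call.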

Phantom Bugs #17 and #18

Issue #18 and Issue #17 bother me. Our data should be recent enough that I shouldn't be seeing these discrepancies between my new data and the old data.

Will think further on this, and will resolve both issues once I've come to terms with their conceptual existence.

Providing a function to clean temporary data

Specifically, this concern exists:

If I start get_prices, stop halfway through, and then resume get_prices a week later, get_prices will find the old temporary data and then resume the download. However, the range of stock prices will now differ between the data sets, corrupting our results to some degree.

Working on writing a function that cleans out all qmj temporary data in the temp directory should the user call it.
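A minimal sketch of what such a function might look like. The name clean_qmj_temp and the "qmj_" file-name prefix are assumptions for illustration; the actual temporary file names used by get_prices may differ.

```r
# Hypothetical cleaner: delete files in `dir` whose names match `pattern`.
# The default pattern "^qmj_" is an assumed naming scheme, not qmj's real one.
clean_qmj_temp <- function(dir = tempdir(), pattern = "^qmj_") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) {
    return(invisible(character(0)))
  }
  removed <- file.remove(files)
  # Return the paths that were actually deleted.
  invisible(files[removed])
}
```

Calling this before a fresh get_prices run would force a full re-download, so the price ranges stay consistent across companies.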

Off-hand Thought: Dealing With Missing Information

In cases of NA or INF results for a calculation (almost always the result of missing information), our current way of coping with the result is by scaling it, and then setting certain values to 0; mostly in cases where the 0 value would have no effect on the resultant quality or quality component score.

Option A to Maximize Number of Quality Scores Produced: Produce component sub-scores Where Possible, Ignore Missing sub-scores

I'm concerned that companies with either questionable filings or missing data are not penalized, while the scaled values give z-scores only to companies that provide a maximal amount of data, which is a generally desirable attribute.

As a contrived example, Rigged Company A can produce documents which lead to an enormous growth score but, through creative accounting, give us just enough information in the other categories to assign a neutral score.

Rigged Company A is thus judged as high quality, and a brief overview of the quality data set does not clearly reveal the lack of information.

On top of that, high-accuracy companies are "penalized" as their component sub-scores are judged only relative to other high-accuracy companies. The under-performers are removed, so their z-scores are absolutely lower to reflect that.

Option B to Maximize Accuracy: When a sub-value is missing, simply fail to produce the component score, and thus the quality score.

I'll have to double check what we do with specific sub-values, but following this method costs us roughly 400 companies.
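To make the difference between the two options concrete, here is a small sketch on made-up component z-scores. The tickers and numbers are invented, and combining components by a plain sum is an assumption about how the quality score is built.

```r
# Invented component z-scores; NA marks a component that could not be computed.
scores <- data.frame(
  ticker        = c("AAA", "BBB", "CCC"),
  profitability = c(1.2, NA, -0.5),
  growth        = c(0.3, 2.5, NA),
  safety        = c(-0.1, 0.0, 0.4),
  payouts       = c(0.6, 0.0, -0.2),
  stringsAsFactors = FALSE
)
components <- as.matrix(scores[, -1])
rownames(components) <- scores$ticker

# Option A: ignore missing components and sum whatever is available.
option_a <- rowSums(components, na.rm = TRUE)

# Option B: any missing component means no quality score at all.
option_b <- rowSums(components)

# Number of missing components per company, which Option A would otherwise hide.
n_missing <- rowSums(is.na(components))
```

Under Option A, "BBB" gets a high score from growth alone, which is the Rigged Company A situation. Reporting n_missing next to the score would at least make the gap visible.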

Do we want to set up a (relatively painless) way of updating/retrieving the companies from the Russell 3000 Index?

If memory serves me correctly, parsing the original data directly/programmatically is extremely painful, given the .pdf encoding that the Russell 3000 company list is saved as.

Given that the list only updates once a year, and that it's (relatively) simple to create a data frame like what we expect as input, I also wouldn't say this is crucial, but it's food for thought.

By the way, what I did to get the original list that we're currently using was to copy and paste the text into Notepad, remove some unwanted artifacts by hand (typically something on the order of a "page end" marker), and then parse the result with a line or two of R. Practically, the user would likely need to copy the text into an appropriate file and then call our function in order to produce the data frame.
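For reference, the "line or two of R" could look roughly like the sketch below. It assumes each cleaned line is a company name followed by whitespace and a ticker; the real layout of the pasted text may differ, and parse_company_lines is a hypothetical name.

```r
# Hypothetical parser: assumes each line looks like "Company Name TICKER".
parse_company_lines <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]  # drop blank lines
  # The ticker is taken to be the last whitespace-separated token.
  ticker <- sub("^.*[[:space:]]+([^[:space:]]+)$", "\\1", lines)
  # The name is everything before that last token.
  name <- sub("[[:space:]]+[^[:space:]]+$", "", lines)
  data.frame(name = name, ticker = ticker, stringsAsFactors = FALSE)
}

# Usage with a file the user has pasted the list into (file name is an example):
# companies <- parse_company_lines(readLines("russell3000.txt"))
```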

Worth Repeating explanation of Russell 3000 in Prices and Financials data documentation?

This is from an earlier push to the qmjdata repo, but one of the larger changes I implemented was to replace the detailed descriptions of the Russell 3000 Index in the prices and financials data sets with the single sentence "For more information on the Russell 3000 Index and why it was chosen, please see {link to companies data set}".

I don't see the repeated description of the Russell 3000 Index to be critical to understanding the Prices/Financials data sets, aside from mentioning that it's the source for the chosen companies.

However, I'm having second thoughts about making an individual go to another "page" if they want to find out more about the Index.

Your thoughts?

Impose consistency across variable names and function names

A non-small chunk of this is my fault.

There are variables in lower camel case, e.g., startDate
And there's also a large amount of code that uses underscore separation, e.g., start_date

Once the bigger things are taken care of, I'll go back and rewrite variables, function names, etc. to follow one style strictly, likely underscore separation.
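As a starting point for the rename, a small helper can propose the underscore form of each camel-case name. This is only a sketch; to_snake_case is a hypothetical name, and it does not handle runs of capitals (e.g. acronyms) specially.

```r
# Hypothetical helper: convert lower camel case to underscore separation.
to_snake_case <- function(x) {
  # Insert "_" between a lowercase letter or digit and the capital after it.
  with_sep <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)
  tolower(with_sep)
}

to_snake_case(c("startDate", "start_date"))
```

Names that are already underscore separated pass through unchanged, so the helper can be run over the full list of identifiers.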

If statement in market_data

Specifically this statement:

if(length(which(financials$TCSO < 0))) {
  stop("Negative TCSO exists.")
}

Was in the midst of updating documentation, and I'm not quite sure I understand the stop message, or why TCSO is specifically chosen. Can someone explain it to me?
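For what it's worth, the condition itself just tests whether any TCSO value is negative, skipping NAs (which() drops them). A sketch of an equivalent form, with a hypothetical helper name and invented values:

```r
# Hypothetical helper: TRUE if any non-missing value is negative.
# Behaves like length(which(x < 0)) > 0.
has_negative <- function(x) {
  any(x < 0, na.rm = TRUE)
}

tcso_example <- c(1500, NA, -20)  # invented values
if (has_negative(tcso_example)) {
  message("Negative TCSO exists.")
}
```

This only restates what the check does; it does not answer why TCSO is the column being checked.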

For Case Study: Reducing the Number of Companies for which we produce No Quality Score

Here's a small snippet of companies for which we produce no quality scores. The current number is 385. (I've yet to seriously tamper with how we handle missing data, which may reduce the list somewhat.)

[Image: snippet of companies with no quality score]

Taking AAC as a sample, with a side-by-side comparison of a company we do produce a quality score for (AMH), the main issue appears to be the absence of key figures.

[Images: side-by-side INCOME STATEMENT, BALANCE SHEET, and CASH FLOW figures for AAC and AMH]

The main issue appears to be the absence of a couple key figures. Growth, Payouts, and Profitability all use gross profits in their calculations, for example. Since quantmod currently only allows us to source our data from Google, it'll be worth looking into what figures are "reasonably implied" when not explicitly given. (Yahoo Finance, for example, does give gross profits of AAC as equal to total revenue).
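To see which key figures are absent for each company, something like the following sketch could help. The column names and values are invented for illustration and are not the actual qmj financials layout; missing_figures is a hypothetical helper.

```r
# Invented financial figures; NA marks a figure that was not reported.
financials_example <- data.frame(
  ticker        = c("AAA", "BBB"),
  gross_profit  = c(NA, 120),
  total_revenue = c(300, 400),
  net_income    = c(10, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: list the missing columns for each row, keyed by ticker.
missing_figures <- function(df, id_col = "ticker") {
  value_cols <- setdiff(names(df), id_col)
  na_mat <- is.na(df[value_cols])  # logical matrix, one row per company
  missing <- lapply(seq_len(nrow(df)), function(i) value_cols[na_mat[i, ]])
  names(missing) <- df[[id_col]]
  missing
}

missing_figures(financials_example)
```

Running this over the companies without a quality score would show whether gross profits are the common gap or whether several different figures are involved.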

Safety I'm unsure of. I'll be looking at this over the next few days as reference in order to try to reduce the number of non-quantified companies.

The back of my head tells me that the code should be robust enough that missing a single key figure here and there should still be able to produce a rough score, so something else may be up. I'll comment here and (ideally) resolve the issue once I address any/all reasonable means of producing a quality score for AAC.

Missing data?

Hi - this is a really useful package for exploration, thanks for making it!

However, I've been going through the readme and noticed that:

#And more detailed data sets into what makes up quality
data(profitability)
data(growth)
data(payouts)
data(safety)

does not seem to work. Is this data available?

Thanks,
Alex
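A quick way to check which data sets a package actually ships is to query data() for its index. The sketch below wraps that in a hypothetical helper, list_datasets; the package must be installed for the call to succeed.

```r
# Hypothetical helper: names of the data sets shipped with an installed package.
list_datasets <- function(pkg) {
  unname(data(package = pkg)$results[, "Item"])
}

# Example with a package that ships with R:
head(list_datasets("datasets"))

# For this issue, the equivalent call would be:
# list_datasets("qmjdata")
```

If profitability, growth, payouts, and safety are not in the returned names, the readme is referring to data sets that are not in the installed version.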

qmjdata is not available

In RStudio, running under Ubuntu, I entered the commands:
library(devtools)
install_github("anttsou/qmj")

The response to the latter command was:
ERROR: dependency ‘qmjdata’ is not available for package ‘qmj’
Installation failed: Command failed (1)

What is the correct procedure for installing qmj?
-John