influentialinvestmentprediction's People

Contributors

aleahck, joyceccy, lingruivera

influentialinvestmentprediction's Issues

Project Proposal Peer Review

The project will examine the performance of companies that received large investments. Based on the proposal, there seems to be anecdotal evidence that some of these companies outperform what the investment itself would explain. The team seeks to determine whether this trend is merely anecdotal or whether there is statistical significance behind it.

Things I liked about the proposal:

  1. It seems like you are limiting your scope to just Goldman Sachs as an influential investor. I think this is good because it makes you have a clear focus and I think it might make your data collection process easier because you could just look through forms/filings for Goldman Sachs instead of for several different companies, which would all have data at different locations.
  2. I like that you seem to have clearly thought through what sorts of data you will need and where it will be located. This is a really important first step for what I think is a pretty ambitious and interesting project.
  3. I like the fact that you had a more interesting, nuanced project idea (influence by large investments) rather than just a boilerplate project about using lagged data to predict the stock's performance in the future. Your project seems like a nice twist (by adding in the details around getting a large investment) on the typical idea of predicting how a stock will perform.

Areas for improvement:

  1. With regard to the motivation for the project, do you have any evidence for "stock price raises beyond what can be directly attributed to the investment in the six months following the investment"? How can you measure this quantitatively, and will your model know the amount by which the stock price should be expected to rise after a particular investment by some big player in finance? I think this might be tricky to do.
  2. You said you're limiting yourselves to companies that Goldman Sachs has invested at least $1 million into. Is there any concern over whether you'll have enough data (or maybe too much)? I think you should delve into more detail on the expected size of the dataset in your project proposal.
  3. I think scraping large amounts of data from SEC filings might be more difficult than expected. Do you have a plan already for how you're going to do this or maybe have you experimented with something to do this already?

Midterm Peer Review by ar2293

Big data analysis can be very useful in financial applications and this group is trying to analyze the investment activities that relate to the stock price. It seems very reasonable to me as the stock price can be influenced by many factors, and investment activities should be a very crucial one.

The data cleaning and initial investment parts are reasonable. The data cleaning step gets rid of unnecessary data and leaves only what is used in the modeling. The initial analysis fit a linear model, which suggests a positive trend. However, from my point of view the linear fit is not very convincing, as most data points are clustered in the lower left corner of the plot.

There are also some things that can be improved in this midterm report: 1. The report exceeds the 3-page limit. 2. The first two plots lack labels, which makes them hard for the audience to understand.

Overall, the project seems very reasonable to me, and I hope the team can go deeper on the analysis and have useful results at the end.

peer review

This project is trying to predict how Goldman Sachs's investments can influence stock behavior. They are using data from the SEC's 13-F-HR/A forms for GS and performance data from Yahoo and Bloomberg.

-Topic is interesting
-I like how you had the distinct sections in the proposal such as the benefits and the feasibility of the project
-I like how you know exactly what data, where to find that data, and that you are actually getting the data yourself.

-Looking at just one company is good at first, but could looking at just GS potentially lead to overfitting the model? Also, who do you have in mind when you say the model can be extended to other influential investment banks or individuals?
-With your data collection, will your dataset be big enough/how big do you expect your dataset to be?
-Also is there a reason why 6 months was chosen? Will you not be looking at 9 or 12 months?

Midterm Peer Review

The project aims to predict the price movement of company stocks after Goldman Sachs heavily invests in them. The team is using data from the SEC database on Goldman Sachs' quarterly holdings in different companies, and the quarterly performance of these companies.

These are the few things I like about this project and midterm report:

  1. It looks like you put in a lot of thought and work to clean the data. It was especially interesting that you guys paid attention to the dataset details and realized that some companies may appear in some quarters, and not in others, so it was creative that you took care of this problem by doing a JOIN.
  2. It was also interesting that you came up with your own definition and research to see what it means to be an "influential" investment.
  3. Overall, I think this project is very applicable to the real world, and really could help investors make informed decisions.
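The JOIN praised in item 1 can be illustrated with a minimal sketch. It assumes each quarter's holdings are stored as a mapping from company name to shares held, and it reads the JOIN as a full outer join, which is only one possible interpretation. The company names, the figures, the function name `outer_join_quarters`, and the choice to fill a missing quarter with 0 are all hypothetical and not taken from the team's code.

```python
def outer_join_quarters(q1, q2, fill=0):
    """Full outer join of two {company: shares} mappings on company name.

    A company missing from one quarter gets `fill` for that quarter
    (assumption: absence from a filing is treated as zero shares held).
    """
    companies = sorted(set(q1) | set(q2))
    return {name: (q1.get(name, fill), q2.get(name, fill)) for name in companies}


# Hypothetical holdings for two consecutive quarters.
q1 = {"AAA": 1000, "BBB": 500}
q2 = {"BBB": 700, "CCC": 300}

joined = outer_join_quarters(q1, q2)
for name, (before, after) in joined.items():
    # Print the company, both quarterly positions, and the change in shares.
    print(name, before, after, after - before)
```

Whether a missing entry really means a zero position depends on the filing rules, so the fill value is a modeling decision worth stating in the report.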

There are some things that can be improved upon for the next project:

  1. When you are talking about cleaning your data, it could be helpful to describe why you decided to ignore data about puts and calls on stocks, and holdings that were not shares. For people who are not very familiar with terms in finance, these additional details could be helpful. It would also be helpful to describe exactly what features your team is focusing on.
  2. Include axis titles, and graph titles. The part of the midterm report where you show what constitutes an "influential" investment would be clearer if the graphs were labelled appropriately. Additionally, I think it would also be helpful if you could give more intuition into why your team chose 200,000 as the lower bound.
  3. I think the linear model you get does not necessarily show a positive correlation between the holdings in the company and the stock price of the company. It seems like there is one outlier point that your linear regression model is heavily dependent on because most of the points are in the lower left hand corner of the graph. Also, I think your report should include more information on how you are planning to avoid underfitting/overfitting.
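The concern in item 3 about a single outlier driving the fit can be checked directly. The sketch below uses synthetic numbers, not the team's data: it fits an ordinary least-squares line with and without one far-away point and compares the slopes. The helper name `least_squares` is an illustrative choice.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data: a flat cluster near the origin plus one far-away point.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 2.0, 1.0, 2.0]
outlier = (100.0, 50.0)

slope_cluster, _ = least_squares(xs, ys)
slope_all, _ = least_squares(xs + [outlier[0]], ys + [outlier[1]])

print(f"slope without outlier: {slope_cluster:.3f}")
print(f"slope with outlier:    {slope_all:.3f}")
```

If the slope changes substantially when one point is removed, the apparent positive trend rests on that point rather than on the bulk of the data.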

Some comments

This proposal aims to look at impacts of large investments from Goldman Sachs on investment receivers’ stock prices, by using data on quarterly holdings by Goldman Sachs and quarterly performance of the companies invested in. They already finished the first part of data collecting and plan to finish data collecting in two weeks.
There are several things I like about this proposal:

  1. The topic is interesting and meaningful.
  2. They collect data themselves, which is extra work but good experience.
  3. The proposal is clearly divided into four parts, which is easy to read.

However, there are some aspects I hope they can consider improving on.

  1. First of all, structurally, the proposal describes the objective, the importance of the topic, the data collection method, and the feasibility of collecting the data. The missing part is “why do you think the data can answer your question?”
  2. Without thinking carefully about “why do you think the data can answer your question?”, there would be some bias in the result if they use only the data series they mention collecting, because other factors also affect the stock prices of those companies. Failing to consider a complete pool of explanatory variables will lead to biased predictions.
  3. Another part which is missing is what kind of modeling they plan to utilize. They mention “with this information obtained from this model,…” without specifying any modeling details.
  4. For writing, “Abstract” of this proposal is more like an introduction, or background and objective, since an abstract should be an outline of the whole project/paper, including objective, methodology, and key conclusions.
  5. In “Abstract”, this sentence “However, we have noticed a trend where these companies stock prices rise beyond what can be directly attributed to the investment in the six months following the investment” is very confusing, even counterintuitive given the objective of this proposal. When I read it, I thought they wanted to research the factors, other than large investments from Goldman Sachs, that contribute to fluctuations in stock prices, which turned out not to be the case.
  6. In the last paragraph, they define “large investments”. For a formal report or paper, it is better to mention it earlier when they introduce large investments for the first time.

Final Peer Review (Edmond Mui)

The goal of this project was to try to predict stock prices based on whether or not large institutions invested in those stocks (trying to find evidence of a positive relationship). Thus, they used a data set consisting of the investment decisions of Goldman Sachs and tried to analyze these stocks' long-term prices.

Positive Feedback

  1. Good catches when conducting the data cleaning because simply including Puts and Calls into the cleaned data set might skew the models.
  2. Using feature engineering to try a polynomial fit was definitely a clever idea. From first-hand experience, this is used quite often in finance, and your implementation of it impressed me. The use of PCA was also a good idea for a task as complicated as predicting stock prices.
  3. Overall, I thought this was a very in-depth project and you attempted it to the best of your ability and came out with some interesting results.

Room for Improvement

  1. Most of your plots have extremely small text which makes it hard to understand the message you are trying to convey from those data visualizations (even if you explain the visualizations as well in the report).
  2. Although I understand that the goal was to try multiple models that we learned in class, I am curious as to why you specifically chose a quantile regression. There wasn't really any reasoning given for why you thought stock performance would be better predicted using a quantile regression.
  3. The results of your analysis seem quite unsatisfying since they all seem to conclude that investments made by GS do not play a large role in future stock prices. I feel that the project might have needed more time to get a fully comprehensive report completed and I am also interested to see what would have happened if you were able to analyze the results of how GS executed their calls and puts.

Midterm report peer review by bs774

The team intends to identify influential investments by finding a correlation between large investments made by Goldman Sachs (with "large" decided by an arbitrary threshold) and the change in stock prices. The underlying approach the team has decided on is good, but there are certain things I would suggest. Most importantly, I think the team needs to make sure that if such a correlation does emerge, it is not because of a separate factor that affects both the stock price and the bank's investment, such as a newly passed law or some breakthrough invention. Secondly, rather than just merging the separate investments in one company over a quarter into one and using that as a feature, the team could also include the span of time over which the investment was made as a feature. This could be a significant factor in influencing prices, because one sudden investment could have a greater impact on the price than a spread-out investment, even if the spread-out investment is larger.
The strategy adopted to come up with the stock price can also be improved as just dividing the holding value at a point by the number of units could lead to inaccurate predictions because it is not the market price of the stock which could already be on an upward trend. It would be better to use some kind of a feature that would also capture time.
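To make the calculation under discussion concrete, here is a minimal sketch of a price derived by dividing a reported holding value by the number of units, followed by the quarter-over-quarter percentage change. The filing figures and the function names `implied_price` and `pct_change` are hypothetical; the team's actual column names and units are not known from the report.

```python
def implied_price(value, shares):
    """Price per share implied by a reported holding value and share count."""
    if shares <= 0:
        raise ValueError("shares must be positive")
    return value / shares


def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


# Hypothetical quarter-end filings for one company:
# (holding value in dollars, shares held).
filings = [(1_000_000, 50_000), (1_320_000, 60_000)]

prices = [implied_price(value, shares) for value, shares in filings]
change = pct_change(prices[0], prices[1])
print(prices, change)
```

This yields one price snapshot per filing, so on its own it cannot show whether the stock was already trending upward within the quarter, which is why a feature that captures time is suggested.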
There are several points about the approach that are very good like ensuring that the source of data is authentic. Secondly the team has used some pretty innovative methods to extract the data they need in the form that they need it.
I wish the team the best of luck for the final leg!

Final Review Influential Investment Prediction

In essence, the group is trying to predict stock prices. The group took a unique perspective and studied the effect of big companies' investments in certain stocks on those stocks' prices.

I like this project, especially the thought the group put into it. Stock price prediction has always been a big topic in finance, and to this day no really good algorithms have been proposed. The group smartly observed the connection between the big investment agencies' behavior in the stock market and the future prices of these stocks. This is really interesting and thought-provoking. I liked the creativity behind it. The group also went on to try several different models, some of them quite complex, to see which one would give good and reliable results. Some of the models haven't even been covered in class, and it is great to take some initiative and explore them. The work they have done is very meaningful, and I learned quite a lot from reading their report.

Here are some things that I think they can improve upon. First, the wording of the report is not very good. It contained a lot of jargon early on and for someone who has no experience in finance like me, it is hard for me to understand what they are trying to do. Also, some of the sentences are written with possible grammar errors, which decreased the quality of the report. Second, I think the visuals that are created were great, but they should be accompanied by more explanations. Some plots (for example, the ones from Quantile Regression) were very complex and deserve more explanation. Third, when referring to terms in new models, it is helpful to offer some theoretical backup to them before directly applying them.

For future suggestions, I understand that this kind of prediction is hard in general, as the group concluded at the end of their report, but maybe the group could add some elements from traditional techniques such as time series analysis and see whether the returns of the stocks themselves are correlated. Then, combining this information with the new model they proposed might offer better prediction results.

In general, this is a good project with a cool concept. Although it didn't work out eventually, I think the group deserves recognition for their work.

Peer Review

I thought the topic was interesting, but had a few issues with the proposal:

  • Where are the methods? What machine learning methods will be used? While a vague mention of "predicting" is made, it is not clear whether this is supposed to be done manually or whether certain algorithms will be used. This point is very important considering that this class is on machine learning and scalable algorithms. The scale of the dataset and the challenges (besides just data collection) should be taken into consideration.

  • Why just Goldman? It seems like a very narrow window of analysis. Is this to limit the amount to collect and analyze?

Other than that, I believe that the topic itself could be interesting if proper machine learning and predictive analysis could be incorporated.

InfluentialInvestmentPrediction midterm report review by qs76

This project is very exciting, and it aims to predict the influential investments of Goldman Sachs. The dataset obtained by the team is neat and tidy, since there is no missing data and all of it is numerical. Therefore, no heavy data preprocessing is necessary.
The report states that the team chooses to ignore data that they think would behave differently from other transactions. However, there are no specific test results to support that. Such an approach can be too subjective and could have accidentally deleted crucial information. Moreover, merging the columns into a single table might cause duplicate counts, and running a correlation test on it is recommended.
In addition, the initial data analysis is unclear about its approach, and more explanation is suggested. After choosing a lower cutoff for "influential" data, the plot appeared to be log-normal, and a log transformation should be applied in the plot.
Finally, the team attempts to fit a linear model between the absolute investment and the percentage change of the stock price. The data set does not show a good fit to me, and I suggest plotting the residuals to see if a pattern exists. If one does, maybe another kind of model can fit the data set better.
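The residual check suggested above can be sketched without any plotting library. The example uses synthetic, deliberately curved data rather than the team's data set, and the helper names `fit_line` and `residuals` are illustrative.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Observed value minus fitted value for each point."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Synthetic curved data (y = x^2) fitted with a straight line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x for x in xs]

res = residuals(xs, ys)
for x, r in zip(xs, res):
    print(f"x={x:.0f} residual={r:+.1f}")
```

Residuals that are positive at both ends and negative in the middle, as happens here, form a systematic pattern and indicate that a straight line is the wrong functional form. Residuals scattered around zero with no pattern would support the linear model.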

Final Review

This group is trying to explore the correlation between investments made by huge companies and long-term stock prices. They studied Goldman Sachs's quarterly reports and tried to predict the subsequent quarter's stock price using the previous quarter's investments. Although no significant result was found after fitting the data to multiple models, the group was able to find some correlation over shorter terms.

3 things that the group did well:
A variety of regression models are applied to the data, and the analysis of the regression results is very thorough.
Terms are well explained for those who have no financial background, and the emphasis on the importance of the problem is clear.
Clear description of data and preprocessing / feature engineering.

3 things to improve:
The final dataset features section is confusing. What is a 4-digit date price? What is a 4-digit date amt? What about the last quarter in the dataset, which doesn't have a subsequent quarter?
The organization of the report is hard to follow. Instead of having two columns and flowing left to right overall like a research paper, this report flows left to right within each section, and the sections flow top to bottom. The format in general makes the report look a little unorganized.
Although the analysis of the regression is very clear, it is hard to match this analysis to the graphs, since it is unclear which graph, or which point on a graph, each result refers to.
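The question about the last quarter can be made concrete with a small sketch of how quarter-to-next-quarter training examples are usually built. How the report actually handles this is not stated; the sketch shows one common option, dropping the final quarter because it has no target. The records and the function name `lagged_pairs` are hypothetical.

```python
def lagged_pairs(series):
    """Pair each element with its successor.

    The final element has no successor, so it produces no pair.
    """
    return list(zip(series[:-1], series[1:]))


# Hypothetical quarterly records in time order:
# (quarter label, investment amount, price).
quarters = [("Q1", 100, 10.0), ("Q2", 150, 11.0), ("Q3", 120, 10.5)]

# Features come from the current quarter, the target from the next one.
pairs = [
    {"quarter": cur[0], "amount": cur[1], "next_price": nxt[2]}
    for cur, nxt in lagged_pairs(quarters)
]
for row in pairs:
    print(row)
```

With three quarters of data this gives two usable examples, so the report should state whether the last quarter was dropped or handled some other way.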

Peer Review by yj76

This project is going to study the influence of Goldman Sachs’s large investments on stock prices and then try to predict whether such an investment will lead to gains in the following two quarters. The datasets being used are the SEC’s 13-F-HR data for GS, and performance data from Yahoo Finance and Bloomberg for the companies GS invested in.

Things I like about the project:
(1) I really like the idea of your project. It’s an interesting project and you have a specific focus instead of just predicting stock prices in a general way.
(2) I appreciate that you evaluated the feasibility of the project and have made some progress on data collection, which you realized would be a difficult part of the project.
(3) I like that you chose to narrow down to one representative firm, which makes this project more feasible.

Things that I think can be improved and some concerns:
(1) Will it affect the performance of the model if you only fit the data on investments that are larger than $1 million? I think this is a part where you need to give more justification, because your models may not capture the features of relatively small investments.
(2) I think you may need to provide some evidence that “stock price raises beyond what can be directly attributed to the investment in the six months following the investment” is a valid trend in the financial market.
(3) You didn’t talk much about what kinds of methods you are going to use for this project. It seems to be a challenging project and I am interested in your approaches.

Midterm Report Peer Review by ts568

The focus of this project is on investments made by Goldman Sachs and from my understanding, the group hopes to understand how the investments made by Goldman Sachs may influence the stock price of companies they have invested in. The data sets used were drawn from the SEC Edgar database and include information about Goldman's holdings at the end of each quarter. In this report, the group details their process of data cleaning as well as some initial data analysis.

A few aspects of the report I like:

  1. The group attempts to tackle an interesting and relevant topic within the financial industry and has a very clean, quantitative data set to work with.
  2. It is evident that the group put a good amount of time and effort into merging and transforming their data so as to have the column features necessary for their analysis.
  3. The group addresses many important aspects of their data, such as what value constitutes an influential investment, and uses data visualizations to document that process.

A few areas for improvement:

  1. It would be helpful if the report began with background on why the group is pursuing this project and what the objective is. I only fully understood the goal after going back and reading the project proposal.
  2. The report should include information such as the number of features and examples in the data set as well as how the group plans to address over- and under-fitting, as was detailed in the midterm report rubric.
  3. The team may want to explore a different model to fit the data, as the early linear regression does not show a strong linear trend in quarterly correlation between change in amount held over the quarter and stock price over that quarter. At the end of the report, the group does mention that they would like to use decision trees to examine their data, which seems to be a good idea.

Final Peer Review by cct65

The goal of this project is to understand long-term stock price trends in relation to Goldman Sachs investments. In my opinion, it is an interesting project that has real-world applications. Overall, I think that the report is well organized. I really appreciate the multiple data visualizations that you provided; they helped me better understand your thought process. I also like the fact that you decided to apply some feature engineering and that you justified the use of each feature. I also find it great that you took the time to deeply analyze some of your results. Finally, I think that using PCA was a good idea for getting further insights into your prediction problem.

When you talk about cross validation, it would have been useful to specify the train and test sets that you used (the percentage of data that you chose for each). Moreover, be careful with the size of your figures; it is quite hard to read the values in your first chart, for example. Furthermore, I found it surprising that the test error is lower than the train error for the third model, and I would have liked to know your explanation for that. I think that it was a good idea to use quantile regression, but I would have advised you to also try adding a regularizer to help your model generalize and reduce variance. In your “Multi-Feature Models and PCA” part, you mention some results, but I think it would have been more helpful to display them so that the reader can better understand your choices. I feel like your project was a little too ambitious, as the question has not been entirely answered, but I like the fact that you mentioned possible ways to improve your prediction model and get all the answers you were looking for.
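Two of the suggestions above, reporting the train/test percentages and adding a regularizer to the quantile regression, can be sketched as follows. This is an illustration under assumptions, not the team's method: the quantile (pinball) loss is the standard one, but the L2 penalty, the 80/20 split, the random shuffling, and all function names are choices made here for the example.

```python
import random


def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss at quantile level tau in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-predictions are weighted by tau, over-predictions by 1 - tau.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def regularized_objective(y_true, y_pred, weights, tau, lam):
    """Pinball loss plus an L2 penalty lam * sum(w^2) on the model weights."""
    return pinball_loss(y_true, y_pred, tau) + lam * sum(w * w for w in weights)


def holdout_split(rows, test_fraction, seed=0):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical usage: 10 rows split 80/20, and a loss for a constant prediction.
train, test = holdout_split(range(10), test_fraction=0.2)
loss = pinball_loss([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], tau=0.5)
objective = regularized_objective([1.0], [1.0], [2.0], tau=0.5, lam=0.1)
print(len(train), len(test), loss, objective)
```

Stating `test_fraction` and the seed in the report makes the errors reproducible. For quarterly data, a split by time (earlier quarters for training, later ones for testing) may be more appropriate than a random shuffle.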

Overall, I think that it is a very interesting project.
