
edgar's People

Contributors

palafrank


edgar's Issues

Lookup Company CIK by Name

Hi there,

My fork is highly specialized to my needs at this point but thought I would post my code for how I do a company lookup by name and get the corresponding CIK.

Changes to parser.go:

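import (
	"errors"
	"io"
	"log"
	"net/http"
	"net/url"
	"regexp"
	"strings"

	"golang.org/x/net/html"
)

// cikRe matches a trailing "CIK=<digits>" query parameter in an href and
// captures only the digits (an optional "+" is skipped).
var cikRe = regexp.MustCompile(`CIK=\+?(\d{2,})$`)

// cikPostPageParser scans the result page of the SEC CIK lookup form for a
// link ending in "CIK=<digits>" and returns that CIK zero-padded to 10 digits.
// If several links match, the last one wins, as in the earlier version.
func cikPostPageParser(page io.Reader) (string, error) {
	doc, err := html.Parse(page)
	if err != nil {
		return "", err
	}
	var cik string
	var f func(*html.Node)
	f = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, a := range n.Attr {
				if a.Key == "href" {
					if m := cikRe.FindStringSubmatch(a.Val); m != nil {
						cik = m[1]
					}
					break
				}
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			f(c)
		}
	}
	f(doc)
	if cik == "" {
		return "", errors.New("could not find CIK")
	}
	// EDGAR CIKs are 10 digits, left-padded with zeros.
	if len(cik) < 10 {
		cik = strings.Repeat("0", 10-len(cik)) + cik
	}
	return cik, nil
}

// postPage submits the company-name search form at url1 and returns the
// response body, or nil if the request fails. The caller must close the body.
func postPage(url1 string, cn string) io.ReadCloser {
	resp, err := http.PostForm(url1, url.Values{"company": {cn}})
	if err != nil {
		// log.Fatal would exit here and make the nil return (and the
		// caller's nil check) unreachable, so log and return nil instead.
		log.Printf("Query to SEC page %s failed: %v", url1, err)
		return nil
	}
	if resp.StatusCode != http.StatusOK {
		resp.Body.Close()
		log.Printf("Query to SEC page %s failed: %s", url1, resp.Status)
		return nil
	}
	return resp.Body
}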
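The href matching and 10-digit padding in cikPostPageParser can be checked without hitting the SEC site by pulling them into a pure function. This is a minimal sketch, assuming the lookup page links look like the sample hrefs below; `extractCIK` and the sample hrefs are hypothetical, not part of the edgar package.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// cikHref matches a trailing "CIK=<digits>" query parameter and captures the
// digits, skipping an optional "+".
var cikHref = regexp.MustCompile(`CIK=\+?(\d{2,})$`)

// extractCIK (hypothetical helper) returns the CIK from an href, left-padded
// with zeros to 10 digits, and false if the href carries no CIK.
func extractCIK(href string) (string, bool) {
	m := cikHref.FindStringSubmatch(href)
	if m == nil {
		return "", false
	}
	cik := m[1]
	if len(cik) < 10 {
		cik = strings.Repeat("0", 10-len(cik)) + cik
	}
	return cik, true
}

func main() {
	// Hypothetical sample hrefs; the last one has no CIK parameter.
	for _, href := range []string{
		"/cgi-bin/browse-edgar?action=getcompany&CIK=320193",
		"/cgi-bin/browse-edgar?action=getcompany&CIK=0000051143",
		"/about.html",
	} {
		cik, ok := extractCIK(href)
		fmt.Printf("%q -> %q %v\n", href, cik, ok)
	}
}
```

Capturing the digits with a group avoids the `strings.Split` step and keeps a leading "+" out of the result.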

Changes to page.go:

var (
	baseURL   string = "https://www.sec.gov/"
	cikURL    string = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&output=xml&CIK=%s"
	backupCIK string = "https://www.sec.gov/cgi-bin/cik_lookup"
	queryURL  string = "cgi-bin/browse-edgar?action=getcompany&CIK=%s&type=%s&dateb=&owner=exclude&count=10"
	searchURL string = baseURL + queryURL
)


// getCompanyCIK resolves a ticker or company name to a 10-digit CIK.
// Needs "bytes" and "io" in page.go's imports (io/ioutil is no longer used).
func getCompanyCIK(ticker string) string {
	// If the "ticker" has a space in it, we assume it is a company name and go
	// straight to the name search. Otherwise we try it as a ticker first.
	if !strings.Contains(ticker, " ") {
		// getPage returns the unclosed resp.Body (an io.ReadCloser).
		if r := getPage(fmt.Sprintf(cikURL, ticker)); r != nil {
			// Read the body once so it can be both checked and parsed; this
			// avoids the second request the earlier version needed.
			rb, err := io.ReadAll(r)
			r.Close()
			if err == nil && !bytes.Contains(rb, []byte("No matching Ticker Symbol.")) {
				// io.NopCloser works whether cikPageParser takes an
				// io.Reader or an io.ReadCloser.
				if cik, err := cikPageParser(io.NopCloser(bytes.NewReader(rb))); err == nil {
					return cik
				}
				return ""
			}
		}
	}
	// Name search, also used when EDGAR does not know the ticker.
	r := postPage(backupCIK, ticker)
	if r == nil {
		return ""
	}
	defer r.Close()
	if cik, err := cikPostPageParser(r); err == nil {
		return cik
	}
	return ""
}

It works. It's not pretty, but it removes the restriction to searching only by CIK or ticker symbol (many smaller companies cannot be found that way).

Also, I am working on a mass collection of words to correlate them with tags, so the number of tags mapped to concepts should increase when I am finished. I can submit some additional tags for you if you want.

Generated data are not error-checked

Dps specifically needs to be checked; otherwise math.Inf or math.NaN values can break the JSON encoding and cause a fatal crash, because the encoding library does not handle these values for you: json.Marshal returns an "unsupported value" error instead of encoding them.

data_def.go changes:

import (
	"errors"
	"fmt"
	"log"
	"math"
	"reflect"
)

// -- snipped

// finiteOrZero returns v if it is a finite number and 0 otherwise, so NaN
// and ±Inf never reach the JSON encoder.
func finiteOrZero(v float64) float64 {
	if math.IsInf(v, 0) || math.IsNaN(v) {
		return 0
	}
	return v
}

func generateData(fin *financialReport, name string) float64 {
	log.Println("Generating data: ", name)
	switch name {
	case "GrossMargin":
		// Do this only when the parsing is complete for required fields
		if isCollectedDataSet(fin.Ops, "Revenue") && isCollectedDataSet(fin.Ops, "CostOfSales") {
			log.Println("Generating Gross Margin")
			return finiteOrZero(fin.Ops.Revenue - fin.Ops.CostOfSales)
		}
	case "Dps":
		// A zero share count can make these divisions produce ±Inf or NaN.
		if isCollectedDataSet(fin.Cf, "Dividends") {
			if isCollectedDataSet(fin.Ops, "WAShares") {
				return finiteOrZero(round(fin.Cf.Dividends * -1 / fin.Ops.WAShares))
			} else if isCollectedDataSet(fin.Entity, "ShareCount") {
				return finiteOrZero(round(fin.Cf.Dividends * -1 / fin.Entity.ShareCount))
			}
		}
	case "OpExpense":
		if isCollectedDataSet(fin.Ops, "Revenue") &&
			isCollectedDataSet(fin.Ops, "CostOfSales") &&
			isCollectedDataSet(fin.Ops, "OpIncome") {
			return finiteOrZero(round(fin.Ops.Revenue - fin.Ops.CostOfSales - fin.Ops.OpIncome))
		}
	}
	return 0
}

Saving filings

Hi Palafrank,

Thanks again for open-sourcing this. I was wondering: how do I save a company folder, and does it save the high-level information parsed from the company's documents, or the actual documents?

Thanks,
Brock

Parsing Missing fields

I ran every test in the edgar package, but the tests that use live data report missing fields. For example, "TestLiveAMGNParsing" reports:
Missing fields in Entity Info[ShareCount,]
Missing fields in Assets[Liab,]
When I changed the ticker from "AMGN" to "IBM", parsing completed with no missing fields. With another ticker, "AAPL", the parser failed again:
Missing fields in Entity Info[ShareCount,]
So this parsing error happens only with some companies, but I don't know how to fix it.

https://github.com/palafrank/edgar/blob/master/parser_test.go#L889
https://github.com/palafrank/edgar/blob/master/xbrltags.go
