irco's Issues

Process records with invalid affiliation mappings

Some records in the WoS dumps contain ambiguous affiliation data. Currently these records are parsed incorrectly and, as a consequence, invalid data is added to the database.

Correct format

The correct format of the C1 field uses square brackets [] to define to which authors a given affiliation belongs. Example (with added newlines):

[El Miedany, Y.] North Kent Hosp, Dartford, England;
[El Gaafary, M.; El Yassaki, A.; Youssef, S.] Ain Shams Univ, Cairo, Egypt;
[Ahmed, I.] Cairo Univ, Cairo, Egypt;
[Hegazi, M. O.] Al Adan Hosp, Kuwait, Kuwait;
[Palmer, D.] North Middlesex Univ Hosp, London, England

When records are parsed using the correct format, authors parsed from the 'AF' record can be matched to the correct institution by using the information provided in the square brackets.
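
The bracketed format can be split into author groups and institutions with a short regular expression. The following is only a minimal sketch, not irco's actual parser: the names `C1_GROUP` and `parse_c1` are hypothetical, and it assumes that every bracket group is followed by exactly one address.

```python
import re

# Hypothetical helper, not part of irco: matches "[Author; Author] Address"
# groups. Assumes each bracket group is followed by exactly one address.
C1_GROUP = re.compile(r"\[([^\]]+)\]\s*([^\[]+)")


def parse_c1(value):
    """Return a list of (author_names, affiliation) pairs from a bracketed C1 value."""
    pairs = []
    for authors, affiliation in C1_GROUP.findall(value):
        names = [name.strip() for name in authors.split(";") if name.strip()]
        # Drop the ";" separator that trails every address except the last one.
        pairs.append((names, affiliation.strip().rstrip(";").strip()))
    return pairs


c1 = (
    "[El Miedany, Y.] North Kent Hosp, Dartford, England; "
    "[El Gaafary, M.; El Yassaki, A.; Youssef, S.] Ain Shams Univ, Cairo, Egypt; "
    "[Ahmed, I.] Cairo Univ, Cairo, Egypt"
)
for names, affiliation in parse_c1(c1):
    print(names, "->", affiliation)
```

A C1 value without any square brackets yields an empty list, which is one way to detect the problematic records described next.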

Wrong format

Some records in the dumped file have a different structure for the C1 field: the square brackets are missing. Example (with added newlines):

Adan Hosp, Minist Hlth, Dept Med, Ahmadi City, Kuwait;
Adan Hosp, Minist Hlth, Intens Care Unit, Ahmadi City, Kuwait;
Adan Hosp, Minist Hlth, Dept Radiol, Ahmadi City, Kuwait;
Adan Hosp, Minist Hlth, Dept Med, Ahmadi City, Kuwait;
Adan Hosp, Minist Hlth, ICU, Ahmadi City, Kuwait

Records using this format don't contain enough information to match authors to their affiliations precisely. The example above was taken from a publication with the following content in the AF field:

Bitar, ZI;
Ashebu, SD;
Ahmed, S

As this example shows, positional matching of affiliations to authors is not possible as there are records with more authors than institutions, or more institutions than authors.
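
The mismatch in this example can be checked directly. The sketch below uses a hypothetical `is_ambiguous` helper (not part of irco) and assumes that a record is ambiguous when it has more than one author and its C1 value contains no bracketed author groups.

```python
def is_ambiguous(c1, authors):
    """Heuristic (an assumption, not irco's rule): several authors, no bracket groups."""
    has_brackets = "[" in c1 and "]" in c1
    return not has_brackets and len(authors) > 1


af = ["Bitar, ZI", "Ashebu, SD", "Ahmed, S"]
c1 = (
    "Adan Hosp, Minist Hlth, Dept Med, Ahmadi City, Kuwait; "
    "Adan Hosp, Minist Hlth, Intens Care Unit, Ahmadi City, Kuwait; "
    "Adan Hosp, Minist Hlth, Dept Radiol, Ahmadi City, Kuwait; "
    "Adan Hosp, Minist Hlth, Dept Med, Ahmadi City, Kuwait; "
    "Adan Hosp, Minist Hlth, ICU, Ahmadi City, Kuwait"
)
affiliations = [part.strip() for part in c1.split(";") if part.strip()]

print(len(af), len(affiliations))  # 3 authors, 5 affiliations
print(is_ambiguous(c1, af))        # True
```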


Given the issue presented above, I think that the cleanest solution would be to completely ignore these records, even if that means ignoring 25%* of the records.

* Actually, it is less than that. The 25% figure is the number of records with no Kuwaiti affiliation, including the ones that WoS erroneously includes because an affiliation contains Kuwait in its name.

adding first author affiliation field

This issue builds on the request in issue 14.

In many cases, the papers have missing reprint author information. In those cases, it would be useful if we can use first author affiliation instead. Can we add another field in the publications table that stores first author affiliation? This would then allow us to run queries in which we specify the corresponding author country and if that is missing, to then use first author affiliation country to search for publications.
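
The requested fallback could be expressed with SQL's `COALESCE`. The sketch below runs against an in-memory SQLite database; the table and column names (`publication`, `reprint_author_country`, `first_author_country`) are hypothetical and do not reflect irco's actual schema. It assumes that missing reprint author information is stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publication ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT,"
    " reprint_author_country TEXT,"  # hypothetical column name
    " first_author_country TEXT)"    # hypothetical column: the requested field
)
conn.executemany(
    "INSERT INTO publication (title, reprint_author_country, first_author_country)"
    " VALUES (?, ?, ?)",
    [
        ("Paper 1", "Kuwait", "Egypt"),
        ("Paper 2", None, "Kuwait"),  # reprint author missing: fall back
        ("Paper 3", None, "Egypt"),
    ],
)

# Use the reprint author country, or the first author country when it is NULL.
rows = conn.execute(
    "SELECT title FROM publication"
    " WHERE COALESCE(reprint_author_country, first_author_country) = ?"
    " ORDER BY id",
    ("Kuwait",),
).fetchall()
print(rows)  # [('Paper 1',), ('Paper 2',)]
```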

documentation request for irco-graph

It would be very helpful if you could document how each of the country, institution and author graphs is generated. What are the definitions of a node, an edge, the node weight and the edge weight in each graph?

Save data modification history into the database

The data modification history should be saved so that we are able to rebuild a consistent data set and to trace the actions that led to a given state.

Ideally, each entry would carry the IRCO version, timestamp, exact command,...
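
One possible shape for such an entry is sketched below. The `HistoryEntry` class and `make_entry` function are hypothetical, and the argument list passed in the example is purely illustrative; it is not meant to document the real irco-import invocation.

```python
import shlex
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class HistoryEntry:
    """Hypothetical history record; the field names are assumptions."""
    irco_version: str
    timestamp: str  # ISO 8601, UTC
    command: str    # exact command line, shell-quoted


def make_entry(version, argv):
    return HistoryEntry(
        irco_version=version,
        timestamp=datetime.now(timezone.utc).isoformat(),
        command=shlex.join(argv),
    )


# Illustrative arguments only, not the actual irco-import signature.
entry = make_entry("0.9", ["irco-import", "savedrecs-8.txt"])
print(entry)
```

Storing the command as a single shell-quoted string keeps it replayable, while the UTC timestamp lets entries be ordered unambiguously.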

improving author affiliation data

I found the following issue when analyzing detailed data for Qatar. For publication year 1985 there are a total of 15 records in the raw data file (savedrecs-8.txt); see rows 486 to 500 when you open this .txt file in Excel. Of these 15 records, 11 are imported into the database when I run the irco-import command. I checked the author affiliation information in the RP field and found that, of the four records that are not included (rows 489, 491, 497 and 498), two have clear reprint author information: rows 491 and 497. However, the affiliation of the other authors on these papers is unclear (as seen in field C1). In such cases, where reprint author information is available but the affiliations of the other authors are unclear, let's include the records as well, and put in 'NA' or some other placeholder for the missing affiliations of the other authors. This would allow a better count of papers by corresponding author country, which is an important part of the analysis.

feature request for directed graphs

It would be very useful if we could generate directed graphs of countries, institutions, and authors. The corresponding author (or reprint author, or first author) would be the source, and the co-authors would be the target nodes. In a country graph, the source would be the country of the corresponding author and the target node would be the country of a co-author. The weight of a directed edge in a country graph would be the number of papers between the countries. For example, if there are 50 papers in which the corresponding author is from country A and has a co-author in country B, and 20 papers in which the corresponding author is from country B and has a co-author in country A, the directed edge from A to B would have weight 50 and the directed edge from B to A would have weight 20. This type of graph would allow us to compute some reciprocity metrics (at country level and author level).
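
The edge weights in the example above can be reproduced with a simple counter. This is a minimal sketch with a hypothetical `directed_country_edges` function; it assumes that each target country is counted once per paper and that co-authors from the corresponding author's own country do not produce an edge.

```python
from collections import Counter


def directed_country_edges(papers):
    """papers: iterable of (corresponding_author_country, coauthor_countries)."""
    edges = Counter()
    for source, coauthor_countries in papers:
        # Assumption: count each target country once per paper, skip self-loops.
        for target in set(coauthor_countries) - {source}:
            edges[(source, target)] += 1
    return edges


# 50 papers led from country A with a co-author in B, 20 the other way round.
papers = [("A", ["B"])] * 50 + [("B", ["A"])] * 20
edges = directed_country_edges(papers)
print(edges[("A", "B")], edges[("B", "A")])  # 50 20
```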

feature request for data import report

The new irco-import function is working fine now. However, could a feature be added so that it generates a CSV file containing all records that were not included in the database due to ambiguous affiliation information? Could that CSV file also include a report of how many records in each publication year were ignored due to ambiguous information? This would allow for more precise information about the data that was omitted from the analysis due to insufficient affiliation information.
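
A report along these lines could be produced with the standard `csv` module. The sketch below is hypothetical (`write_ignored_report` is not an irco function) and assumes each ignored record is a dict keyed by WoS field tags, with PY as the publication year, UT as the record identifier and C1 as the affiliation field. The sample records are made up for illustration.

```python
import csv
import io
from collections import Counter


def write_ignored_report(ignored, out):
    """Write the ignored records, then a per-year summary, to a CSV stream."""
    writer = csv.writer(out)
    writer.writerow(["year", "record_id", "c1"])
    for rec in ignored:
        writer.writerow([rec["PY"], rec["UT"], rec["C1"]])

    # Blank row, then the number of ignored records per publication year.
    writer.writerow([])
    writer.writerow(["year", "ignored_count"])
    counts = Counter(rec["PY"] for rec in ignored)
    for year in sorted(counts):
        writer.writerow([year, counts[year]])
    return counts


# Made-up sample records, for illustration only.
ignored = [
    {"PY": "1985", "UT": "rec-1", "C1": "Inst A, City, Country"},
    {"PY": "1985", "UT": "rec-2", "C1": "Inst B, City, Country"},
    {"PY": "1986", "UT": "rec-3", "C1": "Inst C, City, Country"},
]
buffer = io.StringIO()
counts = write_ignored_report(ignored, buffer)
print(buffer.getvalue())
```

Writing both tables to one file follows the request literally; two separate files would be easier to load into other tools.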

country affiliation for UAE data

I have noticed that in my records the country affiliation for publications from the UAE is not being recorded. I get messages in the terminal window when running the irco-graph and irco-import commands. In many cases the name is given in the record as U Arab Emirates. I am attaching a screenshot here:

irco import not working with new release irco 0.9

I upgraded my irco version to 0.9 and I get an error when I run the irco-import command. I am sending you the error log. I was importing the same datafiles as I have used before with irco, so I don't think it is an issue with the data files.
