dvklopfenstein / pmidcite

Turbocharge a PubMed literature search rather than clicking and clicking and clicking on Google Scholar

License: GNU Affero General Public License v3.0

Makefile 0.03% Python 99.27% Jupyter Notebook 0.70% Shell 0.01%

citation-analysis citation-counts citation-downloader citedby command-line-tool library literature-review literature-search ncbi nih-citation-data pmid pubmed snowballing

pmidcite's Issues

ImportError: cannot import name 'summarize_papers' from 'pmidcite.scripts.icite'

Installed via pip3. icite works normally; however, when I run this as a test:

summarize_papers goatools_cites.txt -p TOP CIT CLI

I get

Traceback (most recent call last):
  File "/cluster/path/to/my/directory/env/bin/summarize_papers", line 5, in <module>
    from pmidcite.scripts.icite import summarize_papers
ImportError: cannot import name 'summarize_papers' from 'pmidcite.scripts.icite' (/cluster/path/to/my/directory/env/lib/pypy3.9/site-packages/pmidcite/scripts/icite.py)

function to download NIHOCC data in your python notebooks doesn't work

I downloaded two Python notebooks, and both functions, api.dnld_icite and dnldr.get_icite, give the same error:

icite command not working

I installed from the GitHub repo and ran the make command, but the icite command was not working. Can someone please help me with this?

AttributeError: 'NoneType' object has no attribute 'pmid'

If NIH citation data is not available for one or more requested PMIDs in a list of PMIDs, this error appears:

**WARNING: 1 NIH CITATION DATA NOT DOWNLOADED FOR PMIDs: 32809475
Traceback (most recent call last):
  File "src/bin/dnld_pmids.py", line 40, in <module>
    main()
  File "src/bin/dnld_pmids.py", line 36, in main
    obj.run(queries, dnld_idx)
  File "/cygdrive/c/Users/note2/Data/git/pmidcite/src/pmidcite/pubmedqueryicite.py", line 40, in run
    self.querypubmed_runicite(ntd.filename, ntd.pubmed_query)
  File "/cygdrive/c/Users/note2/Data/git/pmidcite/src/pmidcite/pubmedqueryicite.py", line 61, in querypubmed_runicite
    self.wr_icite(fout_icite, pmids)
  File "/cygdrive/c/Users/note2/Data/git/pmidcite/src/pmidcite/pubmedqueryicite.py", line 78, in wr_icite
    pmid2paper = dnldr.get_pmid2paper(pmids, self.pmid2note)
  File "/cygdrive/c/Users/note2/Data/git/pmidcite/src/pmidcite/icite/pmid_dnlder.py", line 144, in get_pmid2paper
    pmid2icite = {o.pmid:o for o in self.get_icites(pmids_top)}
  File "/cygdrive/c/Users/note2/Data/git/pmidcite/src/pmidcite/icite/pmid_dnlder.py", line 144, in <dictcomp>
    pmid2icite = {o.pmid:o for o in self.get_icites(pmids_top)}
AttributeError: 'NoneType' object has no attribute 'pmid'
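
The traceback shows a dict comprehension reading `.pmid` from an item that is `None`. A minimal sketch of one way to guard against that, assuming missing records arrive as `None`; the `Entry` class and `index_by_pmid` function are hypothetical stand-ins, not pmidcite's own code:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for one NIH citation record."""
    pmid: int
    title: str


def index_by_pmid(entries: Iterable[Optional[Entry]]) -> Dict[int, Entry]:
    """Map PMID -> record, skipping records that were not downloaded (None)."""
    return {e.pmid: e for e in entries if e is not None}


# The middle PMID had no NIH citation data, so its slot is None
downloaded = [Entry(1, "Paper A"), None, Entry(3, "Paper C")]
pmid2entry = index_by_pmid(downloaded)
print(sorted(pmid2entry))
```

Skipping the missing records keeps the remaining PMIDs usable; the warning line already reports which PMIDs were not downloaded.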

Add option to always download citations from the NIH (no temporary working files)

Add an option to always download citations from the NIH; this will now be the default setting.

The previous default mode let researchers combine working online with long periods of working offline: citation data was downloaded from the NIH (JSON format), converted to a Python dict, and written to a temporary working file (p.py). Under the former default setting, citation data previously downloaded from the NIH was loaded from the temporary Python working files rather than re-downloaded, unless the force_download argument was True.

With the new option of always downloading citations from the NIH, no temporary working files are written to disk, so the researcher always sees the latest citation data, at the cost of not being able to use previously downloaded data when working offline. One advantage of always downloading citations is a speed increase from not writing citation working data to disk. Another is a simpler user interface for researchers who use the Python library in their code. The ability to save NIH citation data to disk as Python dicts remains, but is now optional. No code changes are necessary for researchers already using this library in their code.
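
The two modes described above can be sketched as follows. This is an illustration under stated assumptions, not pmidcite's implementation: `get_citation`, `fake_fetch`, and the `cache_dir` parameter are hypothetical names, and the sketch stores JSON files for simplicity where the issue describes Python working files. Only `force_download` comes from the text above.

```python
import json
import tempfile
from pathlib import Path


def get_citation(pmid, fetch, cache_dir=None, force_download=False):
    """Return citation data for one PMID.

    cache_dir=None: always download; nothing is written to disk.
    cache_dir set:  reuse saved data unless force_download is True.
    """
    if cache_dir is None:
        return fetch(pmid)
    path = Path(cache_dir) / f"p{pmid}.json"  # hypothetical file naming
    if path.exists() and not force_download:
        return json.loads(path.read_text())
    data = fetch(pmid)
    path.write_text(json.dumps(data))
    return data


calls = []


def fake_fetch(pmid):
    """Stand-in for a network download; records each call."""
    calls.append(pmid)
    return {"pmid": pmid, "citation_count": 10}


with tempfile.TemporaryDirectory() as tmp:
    get_citation(123, fake_fetch)                 # always download
    get_citation(123, fake_fetch, cache_dir=tmp)  # download, then save
    get_citation(123, fake_fetch, cache_dir=tmp)  # loaded from disk
    get_citation(123, fake_fetch, cache_dir=tmp, force_download=True)
print(len(calls))
```

Only the third call avoids a download, which is the trade-off the issue describes: cached data works offline but can be stale.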

Annotate icite results to identify reviews

Hello,

Thank you so much for creating this literature search tool! It is very helpful for my work.

Can you add annotation to the icite results that identifies whether this paper is a review?

Thank you!

High memory usage by pmidcite

@dvklopfenstein pmidcite uses a huge amount of memory while accessing the PubMed IDs. I have a txt file that contains 90,000 PMIDs. When I run pmidcite, the run on the cluster is aborted because of the high memory footprint (about 16 GB). Can you please help me with this? I only require the headers and do not need the citation information.
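
A general way to bound memory for a long PMID list is to stream the file and handle it in fixed-size batches, so only one batch of results is held at a time. The sketch below is generic Python; `read_pmids` and `batched` are hypothetical helpers, and whether batching alone resolves pmidcite's memory use is an assumption not confirmed by the issue.

```python
from itertools import islice


def read_pmids(lines):
    """Yield integer PMIDs one at a time, skipping blank lines and comments."""
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            yield int(line)


def batched(items, size):
    """Yield lists of at most `size` items without loading everything at once."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# In practice `lines` would be an open file handle; a list keeps the sketch runnable
lines = ["1\n", "2\n", "\n", "# note\n", "3\n", "4\n", "5\n"]
batches = list(batched(read_pmids(lines), 2))
print(batches)
```

Each batch can be downloaded, written out, and discarded before the next one is read.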

Makefile does not execute completely

Hi,

I'm trying to set up your tool manually by cloning this repo and running your Makefile, but the Makefile doesn't seem to create the necessary files; it stops after executing just two find commands. Could you please add a few lines explaining how to set up your tool manually?

Thank you so much for your efforts! Looking forward to testing your tool :)

Here is my Makefile printout:

make -f makefile
find src -regextype posix-extended -regex ".*[a-z]+.py"
src/bin/dnld_pmids.py
src/bin/rpt_dates_top.py
src/bin/plot_pubmed_contents.py
src/bin/plt_guassian_nihperc.py
src/bin/scatter.py
src/bin/query_pubmed.py
src/bin/dnld_pubmed.py
src/bin/icite.py
src/bin/read_pmids.py
src/tests/args_dflt.py
src/tests/pmids_many.py
src/tests/test_cfg_icite.py
src/tests/test_nb_print_paper_sort_cites.py
src/tests/test_paper_sorts.py
src/tests/test_nb_nihocc_data_download_always.py
src/tests/test_speed_api_dnld.py
src/tests/dnld_pmids_100k.py
src/tests/test_cli_icite.py
src/tests/test_nb_print_paper_all_refs_cites.py
src/tests/prt_hms.py
src/tests/test_nb_query_pubmed.py
src/tests/test_nb_nihocc_data_download_or_import.py
src/tests/icite.py
src/tests/pmids.py
src/tests/test_dnld_cites_refs.py
src/tests/test_speed_dnld_load.py
src/tests/test_database_list.py
src/tests/test_icite_longreq.py
src/tests/test_print_paper.py
src/tests/test_dnld_pmids.py
src/pmidcite/eutils/cmds/pubmed.py
src/pmidcite/eutils/cmds/efetch.py
src/pmidcite/eutils/cmds/elink.py
src/pmidcite/eutils/cmds/cmdbase.py
src/pmidcite/eutils/cmds/esearch.py
src/pmidcite/eutils/cmds/base.py
src/pmidcite/eutils/cmds/query_ids.py
src/pmidcite/eutils/pubmed/terms.py
src/pmidcite/eutils/pubmed/query.py
src/pmidcite/eutils/pubmed/author.py
src/pmidcite/eutils/pubmed/qualifiers.py
src/pmidcite/eutils/pubmed/descriptors.py
src/pmidcite/eutils/pubmed/rdwr.py
src/pmidcite/eutils/pubmed/record.py
src/pmidcite/eutils/pubmed/authors.py
src/pmidcite/eutils/pubmed/counts/dnlded_data.py
src/pmidcite/eutils/pubmed/counts/dnld.py
src/pmidcite/eutils/pubmed/counts/plt.py
src/pmidcite/eutils/pubmed/counts/data.py
src/pmidcite/cfg.py
src/pmidcite/pubmedqueryicite.py
src/pmidcite/plot/nih_perc.py
src/pmidcite/plot/scatter.py
src/pmidcite/utils_module.py
src/pmidcite/_version.py
src/pmidcite/icite/pmid_dnlder.py
src/pmidcite/icite/downloader.py
src/pmidcite/icite/papers.py
src/pmidcite/icite/paper.py
src/pmidcite/icite/api.py
src/pmidcite/icite/utils.py
src/pmidcite/icite/entry.py
src/pmidcite/icite/dnldr/pmid_dnlder.py
src/pmidcite/icite/dnldr/pmid_loader.py
src/pmidcite/icite/dnldr/pmid_dnlder_base.py
src/pmidcite/icite/dnldr/pmid_dnlder_only.py
src/pmidcite/icite/nih_grouper.py
src/pmidcite/cli/readpmids.py
src/pmidcite/cli/rptdatestop.py
src/pmidcite/cli/querypubmed.py
src/pmidcite/cli/entry_keyset.py
src/pmidcite/cli/utils.py
src/pmidcite/cli/icite.py
src/pmidcite/cli/dnldpubmed.py
src/pmidcite/cfgini.py
find src -regextype posix-extended -regex "[a-z./]*" -type d
src
src/bin
src/tests
src/tests/data
src/pmidcite
src/pmidcite/eutils
src/pmidcite/eutils/cmds
src/pmidcite/eutils/pubmed
src/pmidcite/eutils/pubmed/counts
src/pmidcite/plot
src/pmidcite/icite
src/pmidcite/icite/dnldr
src/pmidcite/cli

Tested on macOS.

urllib3 warning seen

Thank you so much for writing this project and showing me how to use it!

I am seeing the following warning. I know it's not from your package, but I thought I would let you know.

 % icite 32976797
/Users/user1/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(

Thank you very much!
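
For Python code that imports a library depending on urllib3, one way to hide this specific warning is a message-based filter from the standard `warnings` module, installed before urllib3 is first imported. A minimal sketch; `silence_urllib3_ssl_warning` is a hypothetical helper name, and the demo raises a look-alike warning rather than importing urllib3:

```python
import warnings

# Prefix of the message shown in the report above
URLLIB3_MSG_PREFIX = "urllib3 v2 only supports OpenSSL"


def silence_urllib3_ssl_warning():
    """Ignore warnings whose message starts with the urllib3 OpenSSL notice."""
    warnings.filterwarnings("ignore", message=URLLIB3_MSG_PREFIX)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    silence_urllib3_ssl_warning()
    # Look-alike of the urllib3 warning: filtered out
    warnings.warn("urllib3 v2 only supports OpenSSL 1.1.1+, currently ...")
    # Unrelated warning: still reported
    warnings.warn("some other warning")
print(len(caught))
```

The filter hides the message only; the underlying LibreSSL/OpenSSL mismatch in the Python build remains.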

Add new functionality: PubMed search from command line

I love your tool! It is so amazing. Thank you so much for this.

I see you can do PubMed searches from the script but I need to do it from the command line. Would it be possible to do this from the command line?

icite -s 'HIV AND methylation AND (2017:2023[pdat])' -o HIV_meth_gt2017.txt
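
For context, a PubMed query string like the one above maps onto a request to the NCBI E-utilities ESearch endpoint. The sketch below only builds the request URL and makes no network call; `esearch_url` is a hypothetical helper, not part of pmidcite, and the parameter choices (`retmode`, `retmax`) are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(query, retmax=20):
    """Build an ESearch URL that asks PubMed for PMIDs matching `query`."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


query = "HIV AND methylation AND (2017:2023[pdat])"
url = esearch_url(query)

# Decoding the query string recovers the original search term
decoded = parse_qs(urlsplit(url).query)
print(decoded["term"][0])
```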

Error: Pip install not working

Hey & thx for your tool!

My issue:
After installing pmidcite via pip install pmidcite, calling icite does not work.

Error:
-bash: icite: command not found
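
One common cause of "command not found" after a successful pip install is that the directory where pip places console scripts is not on PATH. The check below uses only the standard library; `scripts_dir_on_path` is a hypothetical helper, and this is a diagnostic sketch, since the issue does not confirm that PATH was the cause here.

```python
import os
import sysconfig


def scripts_dir_on_path(scripts_dir, path_value):
    """Return True if `scripts_dir` is one of the entries in a PATH string."""
    wanted = os.path.normcase(os.path.normpath(scripts_dir))
    entries = [
        os.path.normcase(os.path.normpath(entry))
        for entry in path_value.split(os.pathsep)
        if entry
    ]
    return wanted in entries


# Scripts directory for the running interpreter's default install scheme
scripts = sysconfig.get_path("scripts")
print(scripts)
print(scripts_dir_on_path(scripts, os.environ.get("PATH", "")))
```

Installs made with `pip install --user` use a different scripts directory, so this check covers only the default scheme of the interpreter that runs it.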

Request to add 'fl' parameter to icite API

fl: only return publications with the given fields. Separate multiple fields with commas (no space). Field names are very specific and listed in Response example below. No fl param will return all fields.
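
As a sketch of how `fl` could be passed, the helper below builds a request URL with comma-separated PMIDs and field names and no spaces, as the description requires. The endpoint URL and the field names `pmid`, `year`, and `title` are taken to match the public iCite API and should be treated as assumptions here; `icite_url` is a hypothetical helper, not pmidcite code. No request is sent.

```python
from urllib.parse import urlencode

ICITE_PUBS = "https://icite.od.nih.gov/api/pubs"  # assumed endpoint


def icite_url(pmids, fields=None):
    """Build an iCite request URL, optionally limiting the returned fields."""
    query = {"pmids": ",".join(str(p) for p in pmids)}
    if fields:
        # Comma-separated, no spaces
        query["fl"] = ",".join(fields)
    # safe="," keeps commas readable instead of encoding them as %2C
    return f"{ICITE_PUBS}?{urlencode(query, safe=',')}"


url = icite_url([32976797, 32809475], fields=["pmid", "year", "title"])
print(url)
```

Omitting `fields` leaves `fl` out of the URL, which per the description returns all fields.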

error: package directory 'src/pmidcite/eutils/pubmed/mesh' does not exist

Hey all,

I made a fresh Python 3.8 environment and ran pip install pmidcite and got the following error:

Collecting pmidcite
  Using cached pmidcite-0.0.36.tar.gz (2.6 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [7 lines of output]
      running egg_info
      creating /private/var/folders/lg/xq84d0w15qv8wrgxqbrxznkm0000gn/T/pip-pip-egg-info-zi1zdbyn/pmidcite.egg-info
      writing /private/var/folders/lg/xq84d0w15qv8wrgxqbrxznkm0000gn/T/pip-pip-egg-info-zi1zdbyn/pmidcite.egg-info/PKG-INFO
      writing dependency_links to /private/var/folders/lg/xq84d0w15qv8wrgxqbrxznkm0000gn/T/pip-pip-egg-info-zi1zdbyn/pmidcite.egg-info/dependency_links.txt
      writing top-level names to /private/var/folders/lg/xq84d0w15qv8wrgxqbrxznkm0000gn/T/pip-pip-egg-info-zi1zdbyn/pmidcite.egg-info/top_level.txt
      writing manifest file '/private/var/folders/lg/xq84d0w15qv8wrgxqbrxznkm0000gn/T/pip-pip-egg-info-zi1zdbyn/pmidcite.egg-info/SOURCES.txt'
      error: package directory 'src/pmidcite/eutils/pubmed/mesh' does not exist
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I'm using an M1 Mac with macOS Monterey 12.3 and Python 3.8. Any ideas why?

bulk download of the citation network?

Great package, ty. Rather than making API requests for a list of PMIDs, is there a dump of the PubMed citation network somewhere? I would just need the list of edges in two columns (source, destination). I figured that using the examples provided would max out API requests fairly quickly. I know this is a big and maybe impossible ask, but I wanted to obtain as much of the network as possible to assist in a recommendation engine I'm working on. Thank you.
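
Whatever the source of the data, the two-column edge list can be produced from a mapping of each paper to the papers that cite it. A minimal sketch with made-up PMIDs; `citation_edges` and the input mapping are hypothetical, and the (source, destination) direction is assumed to mean (citing paper, cited paper):

```python
def citation_edges(pmid2cited_by):
    """Yield (source, destination) pairs where source cites destination."""
    for cited, citers in pmid2cited_by.items():
        for citer in citers:
            yield (citer, cited)


# Made-up example: paper 10 is cited by 11 and 12; paper 11 is cited by 12
pmid2cited_by = {10: [11, 12], 11: [12], 12: []}
edges = sorted(citation_edges(pmid2cited_by))
for source, destination in edges:
    print(f"{source}\t{destination}")
```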

Alternate delimiter on output file

icite -H $PMID -c > $PMID.txt

I would love to use this tool in a program, but the only issue I have is that it is meant to be human-readable, rather than machine-readable. I mean this in the sense that it is space-delimited to keep the columns in line, but the number of spaces is naturally inconsistent.

Is there an option for the output of the above command to be tab-delimited instead? Comma delimiting seems dangerous because paper names can contain commas. Standard commands that convert spaces to tabs fail because of the paper names, and limiting the conversion to consecutive spaces causes issues with columns 5, 6, and 7, which are often separated by only a single space.

Thanks!
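
To illustrate the requested format, the sketch below writes records as tab-separated values with Python's `csv` module, so titles containing commas or spaces stay in a single column. The records and column names are made up for the example, and `write_tsv` is a hypothetical helper, not an existing icite option.

```python
import csv
import io


def write_tsv(rows, fieldnames, stream):
    """Write dict records to `stream` as tab-separated values with a header."""
    writer = csv.DictWriter(
        stream, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Made-up records; real output would come from the citation data
rows = [
    {"pmid": 1, "year": 2020, "citations": 12, "title": "Genes, networks, and disease"},
    {"pmid": 2, "year": 2021, "citations": 3, "title": "A short title"},
]
buffer = io.StringIO()
write_tsv(rows, ["pmid", "year", "citations", "title"], buffer)
tsv_lines = buffer.getvalue().splitlines()
print(tsv_lines[1].split("\t"))
```

Splitting each line on tabs recovers the columns exactly, which is what the space-aligned output cannot guarantee.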