ietfdata's Introduction

ietfdata - Access the IETF Datatracker and related resources

This project contains Python 3 libraries to interact with, and access, the IETF datatracker, RFC index, and related resources.

Getting started

The project uses Pipenv for dependency management. To begin, run:

pipenv install --dev -e .

to create a Python virtual environment with the appropriate packages installed. Then, run:

pipenv shell

to start the virtual environment, within which you can run the scripts.

Once the virtual environment is started, running:

python3 tests/test_datatracker.py 

will run the test suite for the datatracker module. Running:

python3 tests/test_rfcindex.py

will test the rfcindex module.

Caching

The ietfdata library can use a MongoDB instance as a cache. Using a cache reduces the number of requests that are made directly to the Datatracker, improving performance, and reducing the impact on the IETF's infrastructure. While using a cache is optional when accessing the Datatracker, it is required when accessing the mail archive.

The hostname, port, username, and password for the MongoDB instance that is to be used as the cache can be set when instantiating the DataTracker or MailArchive objects. Alternatively, the following environment variables can be set:

  • IETFDATA_CACHE_HOST (defaults to localhost when accessing the mail archive)
  • IETFDATA_CACHE_PORT (defaults to 27017)
  • IETFDATA_CACHE_USER (optional)
  • IETFDATA_CACHE_PASSWORD (optional)
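
As a rough illustration of how these variables fit together, the sketch below reads them and builds a MongoDB connection URI. It is not the library's own code: `cache_uri_from_env` and its `default_host` parameter are hypothetical, and the name `IETFDATA_CACHE_PASSWORD` for the password variable is an assumption.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus


def cache_uri_from_env(env: Mapping[str, str] = os.environ,
                       default_host: Optional[str] = None) -> Optional[str]:
    """Build a MongoDB URI from the cache environment variables.

    Hypothetical helper for illustration only. Returns None when no host
    is configured and no default host is supplied (i.e., no cache).
    """
    host = env.get("IETFDATA_CACHE_HOST", default_host)
    if host is None:
        return None
    port = int(env.get("IETFDATA_CACHE_PORT", "27017"))
    user = env.get("IETFDATA_CACHE_USER")
    password = env.get("IETFDATA_CACHE_PASSWORD")  # assumed variable name
    if user is not None and password is not None:
        # Percent-encode credentials so characters such as '@' are safe in a URI
        return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}"
    return f"mongodb://{host}:{port}"
```

Passing `default_host="localhost"` mirrors the mail archive case described above, where the host falls back to localhost.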

Release Process

  • Edit CHANGELOG.md and ensure it is up to date
  • Edit setup.py to ensure the correct version number is present
  • Edit ietfdata/datatracker.py to fix version number in DataTracker::ua
  • Run make test to run the test suite. If any tests fail, fix them, then restart the release process
  • Commit changes and push to GitHub
  • Check that the GitHub Continuous Integration run succeeds, and fix any problems (this runs with a fresh cache, so can sometimes catch problems that aren't found by local tests).
  • Run python3 setup.py sdist bdist_wheel to prepare the package
  • Run python3 -m twine upload dist/* to upload the package
  • Commit the package files in dist/* and push to GitHub
  • Tag the release in GitHub

ietfdata's People

Contributors

brodiee121, csperkins, dependabot[bot], georgefourm, lumisota, mladenk42

ietfdata's Issues

No programmatic path from document to document type

It is possible to inspect a document to get a DocumentTypeURI:

$ doc.type
DocumentTypeURI(uri='/api/v1/name/doctypename/draft/', params={})

But retrieving the document type object from this URI is not currently possible without ad hoc string manipulation because the document type lookup function uses the slug, i.e.:

$ dt.document_type('draft')
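
The ad hoc string manipulation this issue refers to might look like the sketch below. `slug_from_uri` is a hypothetical helper, not part of ietfdata, and it assumes the slug is always the last path component of the URI.

```python
def slug_from_uri(uri: str) -> str:
    """Return the last path component of a Datatracker API URI.

    Hypothetical work-around: assumes URIs of the form
    '/api/v1/name/doctypename/<slug>/'.
    """
    return uri.rstrip("/").rsplit("/", 1)[-1]


# With the objects shown above, the lookup would then be along the lines of:
#   dt.document_type(slug_from_uri(doc.type.uri))
print(slug_from_uri("/api/v1/name/doctypename/draft/"))  # prints: draft
```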

PavlovaParsingError: Field: consent missing on dt.iab_chair().name

Running this code from the examples/ directory in a local notebook in an environment with ietfdata installed using pip install -U ietfdata:

from datetime                 import timedelta
from pathlib                  import Path
from ietfdata.datatracker     import *
from ietfdata.datatracker_ext import *
from dateutil.parser          import *

dt = DataTrackerExt()

# =============================================================================
# Information about the IAB:

print(F"The IAB chair is {dt.iab_chair().name}")

print("The IAB members are:")
for m in dt.iab_members():
    print(F"  {m.name}")
print("")

I'm getting the following error/stack trace:
https://gist.github.com/sbenthall/688b3e9fb920e0f1f7ec212718a491a0

I wonder if there's been an API change, or if I've mistaken the configuration/installation somehow.

No way to navigate from a document to its authors?

The resolution of #14 makes the author affiliation available given the document author, and adds the ability to search documents by author.

I don't think there's a way yet to get the authors of a document.

>>> dt.document(DocumentURI('/api/v1/doc/document/draft-gharai-avt-uncomp-video/'))
Document(resource_uri=DocumentURI(uri='/api/v1/doc/document/draft-gharai-avt-uncomp-video/', params={}), id=26290, name='draft-gharai-avt-uncomp-video', title='RTP Payload Format for Uncompressed Video', pages=11, words=2827, time=datetime.datetime(2009, 2, 24, 0, 0), notify='', expires='2002-12-26T00:00:00', type=DocumentTypeURI(uri='/api/v1/name/doctypename/draft/', params={}), rfc=None, rev='00', abstract='This memo specifies a packetization scheme for encapsulating\r\nuncompressed HDTV as defined by SMPTE 274M and SMPTE 296M into\r\na payload format for  the Real-Time Transport Protocol (RTP).\r\nSMPTE 274M  and SMPTE 296M  define the analog and digital\r\nrepresentation of HDTV with image formats of 1920x1080  and\r\n1280x720, respectively. The payload has been designed such\r\nthat it may scale to future higher resolutions, suhc as\r\nDigital Cinema.', internal_comments='', order=1, note='', ad=None, shepherd=None, group=GroupURI(uri='/api/v1/group/group/1027/', params={}), stream=None, intended_std_level=None, std_level=None, states=[DocumentStateURI(uri='/api/v1/doc/state/4/', params={}), DocumentStateURI(uri='/api/v1/doc/state/150/', params={})], submissions=[], tags=[], uploaded_filename='', external_url='')
>>> doc = dt.document(DocumentURI('/api/v1/doc/document/draft-gharai-avt-uncomp-video/'))

I wonder how to find the authors.

Unable to access role history correctly

I am trying to use this package to get records about working group leadership in the present and past.

Getting them for the present seems to work fine. As an example I'm using dnsop.
https://datatracker.ietf.org/group/dnsop/history/

> wg = dt.group_from_acronym("dnsop")
> [dt.person(r.person).name for r in dt.group_roles(group = wg)]
['Warren "Ace" Kumari', 'Benno Overeinder', 'Suzanne Woolf', 'Tim Wicinski']

These are indeed the names of the people currently in leadership of this group.

Now I'm trying to find the people historically but not currently in roles in this working group.

> [dt.person(r.person).name for r in dt.group_role_histories(group = wg)]
['Andrew Sullivan', 'Marc Blanchet']

I don't think these people have ever been in dnsop leadership. I'm expecting the names Peter Koch, Joel Jaeggli, and Ronald Bonica. These names are mentioned in certain GroupEvents about leadership change. Is there a link to these other persons maintained in the group history?

PavlovaParsingError: Field: consent missing on dt.document(...) call

Running this snippet from the examples/ directory in a local environment with the package installed via pip install -U ietfdata:

from pathlib              import Path
from ietfdata.datatracker import *

# =============================================================================
# Example: print information about document authors

dt = DataTracker()

doc = dt.document(DocumentURI('/api/v1/doc/document/draft-ietf-mmusic-rfc4566bis/'))
print("Title: {}".format(doc.title))

Gets error with the following stack trace:
https://gist.github.com/sbenthall/8382edbb0df5c2b09165cd4a9e55f630

HeaderDataMailHelper crashes with bad dates

The scan_message() method of HeaderDataMailHelper can crash if given a bad date:

INFO:ietfdata:scan message dnsext/015459 for metadata
Traceback (most recent call last):
  File "examples/emails_2019.py", line 48, in <module>
    ml = archive.mailing_list("dnsext")
  File "/Users/csp/Projects/glasgow-ipl/ietfdata/ietfdata/mailarchive.py", line 374, in mailing_list
    self._mailing_lists[mailing_list_name] = MailingList(self._cache_dir, mailing_list_name, self._helpers)
  File "/Users/csp/Projects/glasgow-ipl/ietfdata/ietfdata/mailarchive.py", line 189, in __init__
    self._msg_metadata[msg_id] = {**(helper.scan_message(message_text)), **(self._msg_metadata[msg_id])}
  File "/Users/csp/Projects/glasgow-ipl/ietfdata/ietfdata/mailhelper_headerdata.py", line 43, in scan_message
    timestamp = datetime.fromtimestamp(time.mktime(msg_date))
OverflowError: mktime argument out of range

The message in question here has Date: Wed, 14 Jun 100 05:41:34 -0700 (PDT) (i.e., a Y2K bug in the date string).

It may be worth checking how many message dates fail to parse across the entire archive, to see if it's worth writing a work-around.
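
One possible shape for such a work-around is sketched below. It is not the HeaderDataMailHelper code: `parse_message_date` is a hypothetical helper, and the 1970 cut-off for a plausible date is an arbitrary assumption. It avoids `time.mktime()` by using the standard library's `email.utils.parsedate_to_datetime`, and returns None for dates that cannot be parsed or are implausibly early.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Assumed cut-off: anything earlier is treated as a corrupt Date: header
EARLIEST_PLAUSIBLE = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_message_date(header: str) -> Optional[datetime]:
    """Parse a Date: header, returning None instead of raising on bad input."""
    try:
        parsed = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError
        return None
    if parsed.tzinfo is None:
        # No usable zone information in the header: treat the time as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    if parsed < EARLIEST_PLAUSIBLE:
        return None
    return parsed


# The header from the failing message parses to the year 100 and is rejected
print(parse_message_date("Wed, 14 Jun 100 05:41:34 -0700 (PDT)"))  # prints: None
```

Counting how often this returns None across the archive would give the failure rate mentioned above.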

DocumentTypeURI is not defined for recent pip installation of ietfdata

Today, I've pip installed ietfdata successfully into an environment.

Within that environment, I am able to import the DataTracker, but not able to import the DocumentTypeURI object:

$ python
Python 3.7.6 (default, Jan  8 2020, 19:59:22) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> from ietfdata.datatracker import *
>>> dt = DataTracker()
>>> DocumentTypeURI
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'DocumentTypeURI' is not defined

This is unexpected.

Group uri/resource_uri confusion?

dt = DataTracker(cache_dir=Path("cache"))
group = dt.group_from_acronym('mmusic')
drafts = dt.documents(group = group,
                      doctype = dt.document_type('draft'))

This errors with:

Traceback (most recent call last):
  File "affiliations.py", line 36, in <module>
    doctype = dt.document_type('draft')) ###
  File "/home/sb/projects/bigbang-multi/ietfdata/ietfdata/datatracker.py", line 1648, in document_type
    return self._retrieve(doc_type_uri, DocumentType)
  File "/home/sb/projects/bigbang-multi/ietfdata/ietfdata/datatracker.py", line 1367, in _retrieve
    if self._obj_is_cached(resource_uri):
  File "/home/sb/projects/bigbang-multi/ietfdata/ietfdata/datatracker.py", line 1338, in _obj_is_cached
    return self._cache_filepath(resource_uri).exists()
  File "/home/sb/projects/bigbang-multi/ietfdata/ietfdata/datatracker.py", line 1332, in _cache_filepath
    return Path(self.cache_dir, resource_uri.uri[1:-1] + ".json")
AttributeError: 'str' object has no attribute 'uri'

Submission authors are not tied to an affiliation

Submissions have author fields denoting the authors:

dt.submission(SubmissionURI('/api/v1/submit/submission/47519/'))
Submission(abstract='   This document describes an improved IS-IS neighbor management scheme\n   which can be used to enhance network performance by allowing\n   operators to quickly and accurately shift traffic away from a point-\n   to-point or multi-access LAN interface by allowing one IS-IS router\n   to signal to a second, adjacent IS-IS neighbor to adjust its IS-IS\n   metric that should be used to temporarily reach the first IS-IS\n   router during network maintenance events.\n', access_key='16722aed4cd4ff65aaa4c1f0398e6f28', auth_key='1c14c45f418f8172a9cee94f58f7645abdf3a1ad', authors="[{'email': '[email protected]', 'name': 'Naiming Shen'}, {'email': '[email protected]', 'name': 'Tony Li'}, {'email': '[email protected]', 'name': 'Shane Amante'}, {'email': '[email protected]', 'name': 'Mikael Abrahamsson'}]", checks=['/api/v1/submit/submissioncheck/45495/'],  ...

But the affiliation data, which is available in the RFC text, is lost.

The people can be looked up by email, but there's no way to recover, for example, that this person was affiliated with Apple Inc. for RFC 8500:

dt.person_from_email('[email protected]')
Person(resource_uri=PersonURI(uri='/api/v1/person/person/109004/'), id=109004, name='Shane Amante', name_from_draft='Shane Amante', ascii='Shane Amante', ascii_short=None, user='', time='2012-02-26T00:17:36', photo='None', photo_thumb='None', biography='', consent=False)

rfcindex.py

There's a duplicate test in line 265. Is it a typo?

elif (self.doc_id == "RFC2497") or (self.doc_id == "RFC2497") or \

Also, regarding the test in line 268 for RFC2708, as far as I can tell, U+0092 was introduced in draft-ietf-printmib-job-protomap-01 in multiple places. In -02, it was replaced with U+0027 everywhere except section 5.0. Somehow, that stray character became the corrupt text you identified.

PR #9 breaks the tests

I merged PR #9, but it breaks the tests:

(ietfdata) [mangole] > make test
mypy ietfdata/rfcindex.py
Success: no issues found in 1 source file
mypy ietfdata/datatracker.py
ietfdata/datatracker.py:1346: error: Unsupported operand types for + ("str" and "Document")
ietfdata/datatracker.py:1346: error: Unsupported operand types for + ("str" and "RelationshipType")
ietfdata/datatracker.py:1348: error: Unsupported operand types for + ("str" and "Document")
ietfdata/datatracker.py:1350: error: Unsupported operand types for + ("str" and "Document")
ietfdata/datatracker.py:1350: error: Unsupported operand types for + ("str" and "RelationshipType")
ietfdata/datatracker.py:1352: error: Unsupported operand types for + ("str" and "Document")
ietfdata/datatracker.py:1352: error: Unsupported operand types for + ("str" and "RelationshipType")
ietfdata/datatracker.py:1354: error: Unsupported operand types for + ("str" and "Document")
ietfdata/datatracker.py:1356: error: Unsupported operand types for + ("str" and "Document")
ietfdata/datatracker.py:1358: error: Unsupported operand types for + ("str" and "RelationshipType")
Found 10 errors in 1 file (checked 1 source file)
make: *** [Makefile:28: test] Error 1
(ietfdata) [mangole] > 

RFC documents no longer have submissions

A few months ago, the following code would report that there were multiple submissions for each of the documents queried:

from ietfdata.datatracker import *
from ietfdata.datatracker_ext import *
import numpy as np

dt = DataTrackerExt()

g = dt.group_from_acronym('quic')
docs = list(dt.documents(group=g, doctype=dt.document_type_from_slug("rfc")))

np.array([len(draft.submissions) for draft in docs])

This made it possible to collect information about the date and authors of the final draft (by looking at the submissions list, ordering it by submission_date, and pulling the author information).
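
The procedure described here can be sketched roughly as follows. `SubmissionRecord` is a hypothetical stand-in for a fetched submission, not the ietfdata class, and the sample data is placeholder data. The sketch assumes the authors field is a string-encoded list of dicts, as it appears in Datatracker submission output.

```python
import ast
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class SubmissionRecord:
    """Hypothetical stand-in for a Datatracker submission."""
    submission_date: date
    authors: str  # string-encoded list of {'email': ..., 'name': ...} dicts


def final_submission_authors(submissions: List[SubmissionRecord]) -> List[str]:
    """Return the author names from the most recent submission, if any."""
    if not submissions:
        return []
    latest = max(submissions, key=lambda s: s.submission_date)
    # literal_eval safely parses the string-encoded list without executing code
    return [author["name"] for author in ast.literal_eval(latest.authors)]


# Placeholder data for illustration only
subs = [
    SubmissionRecord(date(2020, 1, 10),
                     "[{'email': 'a@example.org', 'name': 'Author A'}]"),
    SubmissionRecord(date(2020, 6, 2),
                     "[{'email': 'a@example.org', 'name': 'Author A'}, "
                     "{'email': 'b@example.org', 'name': 'Author B'}]"),
]
print(final_submission_authors(subs))  # prints: ['Author A', 'Author B']
```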

Now it seems that something has changed, and RFC documents appear to, as a rule, have 0 submissions.

What was the change?
Is it documented anywhere?

How can one do the following:

  • find all the RFCs for a working group
  • get the author and date information from that RFC document (the last document in the series)

Thanks!

Unify CI and Makefile test scripts

Presently, there are two test scripts: the Makefile's test target, and the CI configuration's build-and-test job. As these are separate, it is possible that they would produce different results, making it difficult to determine if changes that pass one set of tests will also pass the other.

These should be unified, ideally around the Makefile's test target. Unifying these test scripts should also include ensuring that sensible, human-readable output is produced locally, while reports in the required archival format are produced by the CI build.

tests fail on fresh checkout

On a fresh clone, after running through the pipenv install instructions, I get the following error when running the tests:

$ python tests/test_datatracker.py 
....................................F............
======================================================================
FAIL: test_meeting_session_assignments (__main__.TestDatatracker)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_datatracker.py", line 1103, in test_meeting_session_assignments
    self.fail("not implemented")
AssertionError: not implemented

----------------------------------------------------------------------
Ran 49 tests in 12.185s

FAILED (failures=1)
