
Comments (4)

fcbond commented on July 30, 2024

Hi,

if you have a wordnet derived from PWN 3.0 with the same offsets, then it can be done as follows:

>>> import wn
>>> ewn = wn.Wordnet('omw-en:1.4')
>>> ewn.synset('omw-en-00981304-s')
Synset('omw-en-00981304-s')

Many people (including omw 1.0) treat all satellite adjectives (pos 's') as adjectives (pos 'a').
wn does not, so if you look up something with pos 'a' and it doesn't work, then it is worth also looking up 's'. So something like the following should get you what you want.

def offset2synset(wnet, offset):
    # offset looks like 'wn:00981304a': drop the 'wn:' prefix and split off the pos
    wnid = f'omw-en-{offset[3:-1]}-{offset[-1]}'
    try:
        return wnet.synset(wnid)
    except wn.Error:
        if offset[-1] == 'a':
            # not found as 'a', so retry as a satellite adjective ('s')
            try:
                return wnet.synset(f'omw-en-{offset[3:-1]}-s')
            except wn.Error:
                pass
    return None
>>> print(offset2synset(ewn, 'wn:00981304a'))
Synset('omw-en-00981304-s')
>>> print(offset2synset(ewn, 'wn:02001858v'))
Synset('omw-en-02001858-v')


goodmami commented on July 30, 2024

@BramVanroy thanks for the good questions (here and on the https://github.com/goodmami/penman project, too 👋). I agree that the documentation could be improved in this area, possibly in the NLTK migration guide.

And thanks, @fcbond, for the good description and solution.

The basic problem is that synset offsets (which are specific to each wordnet version) are not an inherent part of the WN-LMF formatted lexicons that Wn uses. For some lexicons (mainly the omw- ones), however, the WordNet 3.0 offsets are conventionally used in the synset identifiers, so you just need to reformat the identifier appropriately, as @fcbond demonstrated.
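As a rough illustration of that reformatting step, here is a minimal sketch in plain Python. The helper name `offset_to_synset_id` and its parameters are hypothetical (they are not part of Wn), and the `omw-en` prefix with an 8-digit zero-padded offset is assumed from the identifiers shown earlier in this thread.

```python
def offset_to_synset_id(offset: int, pos: str, prefix: str = "omw-en") -> str:
    """Build an identifier such as 'omw-en-00981304-s' (hypothetical helper)."""
    # Assumption: the lexicon embeds the WordNet 3.0 offset, zero-padded to 8 digits.
    return f"{prefix}-{offset:08d}-{pos}"


print(offset_to_synset_id(981304, "s"))   # omw-en-00981304-s
print(offset_to_synset_id(2001858, "v"))  # omw-en-02001858-v
```

The resulting string would then be passed to the lexicon's synset lookup. Whether it resolves depends on the lexicon actually following this identifier convention.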

Note that I also have an unmerged nltk branch that tries to implement the NLTK's API as a shim on top of Wn, and its of2ss() function is implemented using the same wn.util.synset_id_formatter() function you linked to above:

wn/wn/nltk_api.py

Lines 329 to 342 in 5092e62

_ssid_from_pos_and_offset = _synset_id_formatter(prefix='omw-en')
def of2ss(of: str) -> Synset:
    pos = of[-1]
    offset = int(of[:8])
    ssid = _ssid_from_pos_and_offset(pos=pos, offset=offset)
    try:
        synset = Synset(_wn30.synset(ssid))
    except _wn.Error:
        raise _wn.Error(
            f'No WordNet synset found for pos={pos} at offset={offset}.'
        )
    return synset

@fcbond said:

Many people (including omw 1.0) treat all satellite adjectives (pos 's') as adjectives (pos 'a').
wn does not

This is not entirely true. Wn does conflate s and a in the wn.ic, wn.morphy, wn.similarity, and wn.taxonomy modules, but it's true that it does not do so on the standard synset-lookup functions.
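Since the standard lookup functions keep 's' and 'a' apart, one way to handle the fallback is to generate both candidate identifiers up front and try them in order. The sketch below is plain Python and does not call Wn. The name `candidate_synset_ids` is hypothetical, and the `omw-en` prefix and 8-digit padding are assumptions taken from the examples in this thread.

```python
from typing import List


def candidate_synset_ids(offset: str, prefix: str = "omw-en") -> List[str]:
    """List the identifiers worth trying for an offset like 'wn:00981304a'."""
    bare = offset.split(":")[-1]            # drop an optional 'wn:' prefix
    digits, pos = bare[:-1].zfill(8), bare[-1]
    ids = [f"{prefix}-{digits}-{pos}"]
    if pos == "a":
        # Satellite adjectives are stored under 's', so queue that as a fallback.
        ids.append(f"{prefix}-{digits}-s")
    return ids


print(candidate_synset_ids("wn:00981304a"))
# ['omw-en-00981304-a', 'omw-en-00981304-s']
```

A caller could loop over the returned list, attempt a synset lookup for each identifier, and stop at the first one that does not raise an error.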


BramVanroy commented on July 30, 2024

Hello @fcbond and @goodmami

First, thanks for the help! I settled for this:

import logging
from typing import Optional

import wn


def offset2omw_synset(wnet: wn.Wordnet, offset: str) -> Optional[wn.Synset]:
    # Normalize e.g. "wn:981304a" to 8 zero-padded digits plus the pos letter
    offset = offset.replace("wn:", "")
    offset = offset.rjust(9, "0")
    wnid = f"omw-en-{offset[:-1]}-{offset[-1]}"
    wnid_s = None

    try:
        return wnet.synset(wnid)
    except wn.Error:
        if wnid[-1] == "a":
            # wnid already starts with "omw-en-", so only swap the pos suffix
            wnid_s = f"{wnid[:-2]}-s"
            try:
                return wnet.synset(wnid_s)
            except wn.Error:
                pass

    logging.warning(f"Could not find offset {offset} ({wnid}{' or ' + wnid_s if wnid_s else ''}) in {wnet.lexicons()}")
    return None

@goodmami, I looked at the NLTK branch, and while I think it would be very useful, I just needed a quick function that I could easily plug into my code (without having to install from GitHub). I do think it'd be a useful API to have, although I can imagine it is a lot of work!

And thank you for your work. It seems a coincidence that you are providing exactly the tools that I need for my work. I am very thankful and motivated that you created these libraries - and that they work so well and are well-documented! I've also peeked at the internals/API and documentation to inspire my own work, so a big thank you!


goodmami commented on July 30, 2024

Thanks for the kind words, @BramVanroy! And I'm glad you were able to find a solution. I'm going to keep the issue open because, as the issue title states, I think this sort of information would be useful in the documentation, so the issue should be closed when that happens.
