aryamanarora / carmls-hi
Hindi SNACS (Semantic Network of Adposition and Case Supersenses; Schneider et al., 2018) annotation scheme and guidelines.
From The Little Prince:
सितारों से तेरा क्या मतलब?
stars COM you-GEN what meaning?
What do you have to do with stars/What's your deal with stars?
Also note #8 and "मैं इस बात से सहमत हूँ।" "I agree with this matter."
Scene role is definitely Topic. Does this need a new function Topic or do we use Ancillary? This does fall into the general meaning of regard that is one of the uses of comitative से, but with an inanimate object Ancillary as function seems weird. On the other hand, से has a lot of functions as is.
I-DAT/GEN he-COM what take-give?
"What dealings do I have with him?/What do I have to do with him?"
Seems like it is broader than this use case but can't think of anything similar yet.
1_9_5: "छह साल का" Characteristic↝Characteristic
Compound verbs formed with a noun and करना appear to regularly use the genitive to mark their Themes. However, कोशिश करना presents a strange case in that it behaves more like an infinitival `i (as in "try to verb"). Is Theme sufficient for these cases?
Query for UD: _ >nmod _ <compound _
e.g. in lp_hi_7_1, token 19
# sent_id = lp_hi_7_1
# text = जैसे वह किसी समस्या पर चुपचाप बहुत देर से विचार कर रहा हो , उसने बिना किसी भूमिका के झट से कहा ,
1 जैसे जैसे ADV RB None 11 obl _ _ _ P जैसे p.ComparisonRef p.ComparisonRef _ _ _ _
2 वह वह PRON PRP Case=Nom|Number=Sing|Person=3|PronType=Prs 11 nsubj _ _ _ _ _ _ _ _ _ _ _
3 किसी कोई PRON PRP Case=Acc|Number=Sing|Person=3|PronType=Prs 4 nsubj _ _ _ _ _ _ _ _ _ _ _
4 समस्या समस्या NOUN NN Case=Acc|Gender=Fem|Number=Sing|Person=3 11 obl _ _ _ _ _ _ _ _ _ _ _
5 पर पर ADP PSP AdpType=Post 4 case _ _ _ P पर p.Topic p.Theme _ _ _ _
6 चुपचाप चुपचाप ADV RB None 11 advmod _ _ _ _ _ _ _ _ _ _ _
7 बहुत बहुत DET QF PronType=Ind 8 det _ _ _ _ _ _ _ _ _ _ _
8 देर देर NOUN NN Case=Acc|Gender=Fem|Number=Sing|Person=3 11 obl _ _ _ _ _ _ _ _ _ _ _
9 से से ADP PSP AdpType=Post 8 case _ _ _ P से p.StartTime p.Interval _ _ _ _
10 विचार विचार NOUN NN Case=Nom|Gender=Masc|Number=Sing|Person=3 11 compound _ _ _ _ _ _ _ _ _ _ _
11 कर कर VERB VM Gender=Masc|Voice=Act 15 acl:relcl _ _ _ _ _ _ _ _ _ _ _
12 रहा रह AUX VAUX Aspect=Perf|Gender=Masc|Number=Sing|VerbForm=Part 11 aux _ _ _ _ _ _ _ _ _ _ _
13 हो हो AUX VAUX None 11 aux:pass _ _ _ _ _ _ _ _ _ _ _
14 , COMMA PUNCT SYM None 11 punct _ _ _ _ _ _ _ _ _ _ _
15 उसने वह PRON PRP Case=Acc,Erg|Number=Sing|Person=3|PronType=Prs 22 nsubj _ _ _ P ने p.Originator p.Agent _ _ _ _
16 बिना बिना PART RP None 17 dep _ _ _ P बिना p.Circumstance p.Circumstance _ _ _ _
17 किसी कोई PRON PRP Case=Acc|Number=Sing|Person=3|PronType=Prs 18 dep _ _ _ _ _ _ _ _ _ _ _
18 भूमिका भूमिका NOUN NN Case=Acc|Gender=Fem|Number=Sing|Person=3 20 nmod _ _ _ _ _ _ _ _ _ _ _
19 के का ADP PSP AdpType=Post|Case=Acc|Gender=Masc|Number=Sing 18 case _ _ _ _ _ _ _ _ _ _ _
20 झट झट NOUN NN Case=Acc|Gender=Masc|Number=Sing|Person=3 22 obl _ _ _ _ _ _ _ _ _ _ _
21 से से ADP PSP AdpType=Post 20 case _ _ _ P से p.Manner p.Manner _ _ _ _
22 कहा कह VERB VM Aspect=Perf|Gender=Masc|Number=Sing|VerbForm=Part|Voice=Act 0 root _ _ _ _ _ _ _ _ _ _ _
23 , COMMA PUNCT SYM None 22 punct _ _ _ _ _ _ _ _ _ _ _
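A minimal sketch of how the UD query above could be run offline, assuming it is read as "a token that has an nmod dependent and is itself attached to its head as compound". That reading, the helper names `parse_conllu` and `find_compound_with_nmod`, and the four-token toy parse are all assumptions for illustration, not part of the corpus or of any SNACS tooling.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    head: int
    deprel: str


def parse_conllu(block: str) -> list[Token]:
    """Read the ID, FORM, HEAD and DEPREL columns of a tab-separated CoNLL-U block."""
    tokens = []
    for line in block.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and metadata comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        # Assumes HEAD is always an integer, as in the toy data below.
        tokens.append(Token(int(cols[0]), cols[1], int(cols[6]), cols[7]))
    return tokens


def find_compound_with_nmod(tokens: list[Token]) -> list[Token]:
    """Tokens attached as `compound` that also govern at least one `nmod`."""
    heads_of_nmod = {t.head for t in tokens if t.deprel == "nmod"}
    return [t for t in tokens if t.deprel == "compound" and t.id in heads_of_nmod]


# Hypothetical toy parse (not taken from the corpus): a genitive-marked
# noun depending on the nominal part of a noun + करना compound verb.
TOY = "\n".join([
    "# text = समस्या का समाधान करना",
    "1\tसमस्या\tसमस्या\tNOUN\t_\t_\t3\tnmod\t_\t_",
    "2\tका\tका\tADP\t_\t_\t1\tcase\t_\t_",
    "3\tसमाधान\tसमाधान\tNOUN\t_\t_\t4\tcompound\t_\t_",
    "4\tकरना\tकर\tVERB\t_\t_\t0\troot\t_\t_",
])

matches = find_compound_with_nmod(parse_conllu(TOY))
print([(t.id, t.form) for t in matches])
```

On the toy parse this picks out token 3, the nominal half of the compound verb. The genitive postposition to be annotated is then the case dependent of that token's nmod child.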
Beneficiary↝Theme? Just Theme?
SMWE 1 begins with token 4, but it only has one token.
All of these seem to treat the object of वाला as an adjective, i.e. "pink-bricked house", "upstairs room", "hat-wearing man", "drinkable clean water", "alcohol-drinking".
In that sense, Characteristic seems to be a good function for these. However, वाला also tends to be used to pick out one specific item.
So maybe Identity is a better function?
Current treatment: Agent↝Instrument
Some of the uses of बिना from the Hindi Dependency Treebank (HDTB).
nmod uses seem to be PartPortion, but then it's confusing what to do with का/वाला following the nominal. Note that one can do without बिना: चीनी (और दूध) की चाय seems to be PartPortion as it is listing incomplete ingredients. So is बिना even worth annotating here?
बिना नाम और पते वाली चिट्ठी “a letter without name and address”
बिना चीनी की चाय “tea without sugar”
पुलिस ने इस दौरान बिना नंबर के चार वाहनों को भी पकड़ा । “four vehicles without a number”
Manner.
बिना लाग लपेट “without any delay”
बिना वजह नुक़्ताचीनी करना “nitpicking without any reason”
बिना पासपोर्ट पाकिस्तान जाना “going to Pakistan without a passport”
Manner? This is such a strange construction though; I never see the genitive marking an adjunct to a verb like that. So how should the genitive be labelled? Maybe here, बिना ___ के is a real circumposition, labelled as an MWE, as Manner.
बिना घटना के गुज़ारना “continue without any incident”
बिना कागज के अदालत में काम हो सकेगा “work can be done at the court without paperwork”
बिना मेहनत के पास करना “pass without any hard work”
बिना पासपोर्ट के पाकिस्तान जाना “going to Pakistan without a passport”
बिना पानी के भी बीस साल जीवित रहना “staying alive for twenty years without water”
बिना नोटिस के पैसा लेना “taking money without a notice”
Manner.
बिना समय गँवायें “without wasting any time”
बिना काम किए “without having done work”
बिना झुकाव दिखाए “without yielding”
How to annotate the में? Following #19 perhaps Goal↝Locus but reading it as a metaphor? (This is what I've done so far.)
All these seem to be stretching the limits of Experiencer↝Recipient. Can you experience the passage of time? lots of work => experience of being busy?
Sentences from https://repositories.lib.utexas.edu/bitstream/handle/2152/41448/Hindi_Ko.pdf?sequence=2 btw
X ABL/INS/COM? ask "to ask X"
Potentially:
These seem to be a temporal use of Extent, but what function fits them?
So far, these have been annotated Stimulus↝Source since they appear to be using the ablative sense of से. Also note that these can be phrased in a way that they create a clear start-end relation:
The problem here is that the Causer function also seems apt for से in some cases. It makes sense that those cases should be more closely grouped with the ones above.
To introduce more confusion, there appears to be overlap with the Explanation function as well:
Or maybe these should all just be Source for function and I'm overthinking!
Scene role is probably ComparisonRef (comparing the matter to my abilities; the matter is within my abilities).
Function is more difficult to ascertain. As a declaration, one would say: [बात] [मेरे बस की] है। A similar alteration is "[नीले रंग का] [घर]"/"[घर] [नीले रंग का] है", which is Characteristic↝Identity. So is this Identity? "The task is of my abilities".
Instead of बात one can also use काम (work/task) and other similar words.
@aryamanarora's annotations are misaligned by a few rows starting row 174 in lp_adjudicated/ch_14.csv
lp_hi_2_5 has two SMWEs with ID 1
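A rough sketch of a check for the two SMWE problems reported in these notes: an ID that starts more than once in a sentence, and an SMWE made of a single token. It assumes the SMWE column holds values of the form `id:position` (for example `1:1`, `1:2`) and `_` elsewhere. The function name `check_smwes` and the message wording are hypothetical and not taken from the actual validator.

```python
from collections import defaultdict


def check_smwes(smwe_column: list[str]) -> list[str]:
    """Return problems found in one sentence's SMWE column.

    smwe_column[i] is the SMWE value of token i + 1: "_" if the token is
    not part of a strong MWE, otherwise "id:position" (assumed format).
    """
    members = defaultdict(list)  # MWE id -> [(position, token id), ...]
    for tok_id, value in enumerate(smwe_column, start=1):
        if value == "_":
            continue
        mwe_id, position = value.split(":")
        members[int(mwe_id)].append((int(position), tok_id))

    problems = []
    for mwe_id, toks in sorted(members.items()):
        starts = [tok for pos, tok in toks if pos == 1]
        positions = [pos for pos, _ in toks]
        if len(starts) > 1:
            # The same ID opens twice: probably two MWEs sharing one ID.
            problems.append(f"SMWE {mwe_id} starts more than once (tokens {starts})")
        elif positions != list(range(1, len(toks) + 1)):
            problems.append(f"SMWE {mwe_id} has non-consecutive positions {positions}")
        elif len(toks) < 2:
            problems.append(
                f"SMWE {mwe_id} begins with token {toks[0][1]} but has only one token"
            )
    return problems


print(check_smwes(["_", "_", "_", "1:1", "_"]))
print(check_smwes(["1:1", "1:2", "_", "1:1", "1:2"]))
```

The first call flags a one-token SMWE at token 4, the second flags ID 1 opening at tokens 1 and 4. A well-formed column such as `["1:1", "1:2"]` returns an empty list.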
The validator expects the supersense columns to be blank in this case.
This would bring this corpus in line with what other SNACS corpora call this metadatum, and would be practically useful since some SNACS tooling expects a key sent_id to exist.
Certain cognition verbs such as समझना, मानना take, as arguments, an attribute and a target for that attribute.
Currently I have labelled this as a separate function Topic↝Topic. That doesn't really seem broad enough to warrant its own function. Some candidates for this:
X GEN what be-FUT? "What will come of X?"
Seems like Topic. Using the prototypical Topic postposition, के बारे, is a little weird here nevertheless. Could be `d?
ने seems to be used for some obviously non-Agent subjects, in the context of certain verbs:
One option is creating a Theme function for these. But purely grammatically, these are definitely not Themes, since ने is so prototypically agentive.
Another option is using Agent, but these are not actually agentive. Nevertheless, Agent is the prototypical function for an ergative-marked subject.
Seems to be Beneficiary↝Ancillary (like targets of emotion from the English guidelines).
I think the best label for it is `d.
From The Little Prince:
सर पर मौत मँडरा रही है
head on death gather-IMPV CONT be-PRS
death is closing upon my head/death is threatening me/I am in danger of death
This is highly idiomatic. The verb मँडराना is most commonly used literally to refer to the gathering of clouds, signalling an impending thunderstorm. The figurative situation here is that death is ominously looming, so some kind of a reverse Circumstance (since the object is the participant, not the circumstance)? On the other hand, I could just do Goal↝Locus as the literal reading (endpoint of motion).
Also note things like सर पर नशा चढ़ना "to have intoxication rise on the head=>to be intoxicated", सर पर सवार "mounting my head=>occupying my thoughts".
ऐसे होते हैं ये लोग, उनसे ऐसी ही उम्मीद रखनी चाहिए।
this-like be-IMPV be-PRS 3PL people, 3PL-OBL-ABL this-like EMPH hope keep should
Such people are like this, you should only hope for this much from them.
Uncertain between COM and ABL (leaning more towards ABL as I write this, but thought it was COM initially). Probably the scene role is Stimulus, in which case Source as the function is more normal (#2).
पर (in the table, LOC-on) marks the proto-Patient for a large class of verbs. I have been using the usual Locus function for these, but it may just need a separate Theme function because of how broad it is.
The first one seems to actually be just Topic↝Topic. The second is Theme (or Theme↝Locus). The third is Stimulus↝Theme (or Stimulus↝Locus). Merging the second too by creating a Theme function may make sense.
मुझे धीरे-धीरे उसकी बातों से सब कुछ पता चला।
I-DAT slowly he-GEN talks ABL all something knowledge go-PFV
I slowly came to know everything from what he said.
Source? Explanation↝Source? Causer↝Source?
I guess the third one would require a rephrasing where ने is possible (since that can also be Causer), but that doesn't appear to be possible.
चीन हो या अरीजोना, एक ही झलक में पहचान सकता था।
China be or Arizona, one EMPH glance LOC recognize be-abl-IMPV be-PST
Whether it was China or Arizona, I could recognize it at a glance.
Note also एक घुट में "in one gulp" exists.
These could have Manner (कैसे पहचाना?) or Duration (कितना जल्दी पहचाना?) I think. Maybe also Means?