aryamanarora / carmls-hi

Hindi SNACS (Semantic Network of Adposition and Case Supersenses; Schneider et al., 2018) annotation scheme and guidelines.

Python 52.30% HTML 17.49% Jupyter Notebook 30.21%
carmls hindi lexical-semantics

Contributors: aryamanarora, nitinvwaran

carmls-hi's Issues

सितारों से तेरा क्या मतलब?

From The Little Prince:

सितारों से तेरा क्या मतलब?
stars COM you-GEN what meaning?
What do you have to do with stars/What's your deal with stars?

Also note #8 and "मैं इस बात से सहमत हूँ।" "I agree with this matter."

Scene role is definitely Topic. Does this need a new function, Topic, or do we use Ancillary? This does fall under the general meaning of "regard" that is one of the uses of comitative से, but with an inanimate object, Ancillary as the function seems odd. On the other hand, से already has a lot of functions as it is.

Validator issues

1_9_5: "छह साल का" Characteristic~Characteristic

Compound verbs affecting genitive object [Theme, Topic, Stimulus↝Topic]

  • मैंने उसकी पिटाई की।
    I-ERG he-GEN beating do-PRF
    I beat him up.
  • मैंने भागने की कोशिश की।
    I-ERG run-OBL GEN attempt do-PRF
    I tried to run.

Compound verbs formed with a noun and करना appear to regularly use the genitive to mark their Themes. However, कोशिश करना presents a strange case, in that it behaves more like a verb taking an infinitival complement (as in "try to [verb]"). Is Theme sufficient for these cases?

Query for UD: _ >nmod _ <compound _
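The query above can be approximated without a query engine. A minimal sketch, assuming standard CoNLL-U input and that the noun of a noun+करना compound verb is attached to the verb as `compound` (as विचार is to कर in lp_hi_7_1): it reports every `compound` noun that itself has an `nmod` dependent, which is where genitive objects like उसकी in उसकी पिटाई की would surface. This is one reading of the query, not the query language's own semantics; `parse_conllu`, `find_genitive_objects` and the sample sentence are hypothetical.

```python
from typing import Dict, List, Tuple

def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences of token dicts (ID, FORM, HEAD, DEPREL only)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword ranges ("1-2") and empty nodes ("1.1")
            continue
        current.append({"id": cols[0], "form": cols[1], "head": cols[6], "deprel": cols[7]})
    if current:
        sentences.append(current)
    return sentences

def find_genitive_objects(text: str) -> List[Tuple[str, str, str]]:
    """Return (nmod dependent, compound noun, governing verb) triples."""
    hits = []
    for sent in parse_conllu(text):
        by_id = {tok["id"]: tok for tok in sent}
        for noun in sent:
            if noun["deprel"] != "compound":
                continue
            verb = by_id.get(noun["head"])  # None if the noun hangs off the root
            for dep in sent:
                if dep["head"] == noun["id"] and dep["deprel"].startswith("nmod"):
                    hits.append((dep["form"], noun["form"], verb["form"] if verb else "ROOT"))
    return hits

# Hypothetical hand-built parse of मैंने उसकी पिटाई की।
sample = "\n".join([
    "\t".join(["1", "मैंने", "मैं", "PRON", "_", "_", "4", "nsubj", "_", "_"]),
    "\t".join(["2", "उसकी", "वह", "PRON", "_", "_", "3", "nmod", "_", "_"]),
    "\t".join(["3", "पिटाई", "पिटाई", "NOUN", "_", "_", "4", "compound", "_", "_"]),
    "\t".join(["4", "की", "कर", "VERB", "_", "_", "0", "root", "_", "_"]),
])
print(find_genitive_objects(sample))  # [('उसकी', 'पिटाई', 'की')]
```

Only columns 1, 2, 7 and 8 are read, so the same function also works on the 19-column annotated files.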

Circumpositional बिना ... के needs analysis

e.g. in lp_hi_7_1, token 19

# sent_id = lp_hi_7_1
# text = जैसे वह किसी समस्या पर चुपचाप बहुत देर से विचार कर रहा हो , उसने बिना किसी भूमिका के झट से कहा ,
1	जैसे	जैसे	ADV	RB	None	11	obl	_	_	_	P	जैसे	p.ComparisonRef	p.ComparisonRef	_	_	_	_
2	वह	वह	PRON	PRP	Case=Nom|Number=Sing|Person=3|PronType=Prs	11	nsubj	_	_	_	_	_	_	_	_	_	_	_
3	किसी	कोई	PRON	PRP	Case=Acc|Number=Sing|Person=3|PronType=Prs	4	nsubj	_	_	_	_	_	_	_	_	_	_	_
4	समस्या	समस्या	NOUN	NN	Case=Acc|Gender=Fem|Number=Sing|Person=3	11	obl	_	_	_	_	_	_	_	_	_	_	_
5	पर	पर	ADP	PSP	AdpType=Post	4	case	_	_	_	P	पर	p.Topic	p.Theme	_	_	_	_
6	चुपचाप	चुपचाप	ADV	RB	None	11	advmod	_	_	_	_	_	_	_	_	_	_	_
7	बहुत	बहुत	DET	QF	PronType=Ind	8	det	_	_	_	_	_	_	_	_	_	_	_
8	देर	देर	NOUN	NN	Case=Acc|Gender=Fem|Number=Sing|Person=3	11	obl	_	_	_	_	_	_	_	_	_	_	_
9	से	से	ADP	PSP	AdpType=Post	8	case	_	_	_	P	से	p.StartTime	p.Interval	_	_	_	_
10	विचार	विचार	NOUN	NN	Case=Nom|Gender=Masc|Number=Sing|Person=3	11	compound	_	_	_	_	_	_	_	_	_	_	_
11	कर	कर	VERB	VM	Gender=Masc|Voice=Act	15	acl:relcl	_	_	_	_	_	_	_	_	_	_	_
12	रहा	रह	AUX	VAUX	Aspect=Perf|Gender=Masc|Number=Sing|VerbForm=Part	11	aux	_	_	_	_	_	_	_	_	_	_	_
13	हो	हो	AUX	VAUX	None	11	aux:pass	_	_	_	_	_	_	_	_	_	_	_
14	,	COMMA	PUNCT	SYM	None	11	punct	_	_	_	_	_	_	_	_	_	_	_
15	उसने	वह	PRON	PRP	Case=Acc,Erg|Number=Sing|Person=3|PronType=Prs	22	nsubj	_	_	_	P	ने	p.Originator	p.Agent	_	_	_	_
16	बिना	बिना	PART	RP	None	17	dep	_	_	_	P	बिना	p.Circumstance	p.Circumstance	_	_	_	_
17	किसी	कोई	PRON	PRP	Case=Acc|Number=Sing|Person=3|PronType=Prs	18	dep	_	_	_	_	_	_	_	_	_	_	_
18	भूमिका	भूमिका	NOUN	NN	Case=Acc|Gender=Fem|Number=Sing|Person=3	20	nmod	_	_	_	_	_	_	_	_	_	_	_
19	के	का	ADP	PSP	AdpType=Post|Case=Acc|Gender=Masc|Number=Sing	18	case	_	_	_	_	_	_	_	_	_	_	_
20	झट	झट	NOUN	NN	Case=Acc|Gender=Masc|Number=Sing|Person=3	22	obl	_	_	_	_	_	_	_	_	_	_	_
21	से	से	ADP	PSP	AdpType=Post	20	case	_	_	_	P	से	p.Manner	p.Manner	_	_	_	_
22	कहा	कह	VERB	VM	Aspect=Perf|Gender=Masc|Number=Sing|VerbForm=Part|Voice=Act	0	root	_	_	_	_	_	_	_	_	_	_	_
23	,	COMMA	PUNCT	SYM	None	22	punct	_	_	_	_	_	_	_	_	_	_	_
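The annotated rows above appear to use the 19-column CoNLL-U-Lex layout from STREUSLE, in which columns 14 and 15 (`SS`, `SS2`) hold the scene role and the function; token 5 (पर), for example, has p.Topic in column 14 and p.Theme in column 15. Assuming that layout (this repository's own tooling may read it differently), the sketch below lists the annotated tokens and flags case markers that carry no supersense, which is the state of the genitive के (token 19) in this sentence. `read_snacs`, `unannotated_case_markers` and `make_row` are hypothetical helpers.

```python
from typing import List, NamedTuple

class Token(NamedTuple):
    id: int
    form: str
    upos: str
    head: int
    deprel: str
    scene_role: str  # column 14 (SS), e.g. "p.Manner"; "_" if unannotated
    function: str    # column 15 (SS2)

def read_snacs(lines: List[str]) -> List[Token]:
    """Parse 19-column CoNLL-U-Lex token lines, skipping comments and blanks."""
    tokens = []
    for line in lines:
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 19:
            raise ValueError(f"expected 19 columns, got {len(cols)}: {line!r}")
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]),
                            cols[7], cols[13], cols[14]))
    return tokens

def unannotated_case_markers(tokens: List[Token]) -> List[Token]:
    """ADP tokens attached as `case` that carry no scene role."""
    return [t for t in tokens
            if t.upos == "ADP" and t.deprel == "case" and t.scene_role == "_"]

def make_row(*cols: str) -> str:
    """Build a tab-separated row, padding the remaining columns with "_"."""
    return "\t".join(list(cols) + ["_"] * (19 - len(cols)))

# A few rows from lp_hi_7_1 (FEATS abbreviated).
sentence = [
    "# sent_id = lp_hi_7_1",
    make_row("16", "बिना", "बिना", "PART", "RP", "None", "17", "dep",
             "_", "_", "_", "P", "बिना", "p.Circumstance", "p.Circumstance"),
    make_row("18", "भूमिका", "भूमिका", "NOUN", "NN", "_", "20", "nmod"),
    make_row("19", "के", "का", "ADP", "PSP", "AdpType=Post", "18", "case"),
    make_row("21", "से", "से", "ADP", "PSP", "AdpType=Post", "20", "case",
             "_", "_", "_", "P", "से", "p.Manner", "p.Manner"),
]
tokens = read_snacs(sentence)
for t in tokens:
    if t.scene_role != "_":
        print(t.id, t.form, f"{t.scene_role}↝{t.function}")
print([t.form for t in unannotated_case_markers(tokens)])  # ['के']
```

Run over a whole file, the same check would list every case marker still waiting for a decision like the one this issue asks for.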

the many many uses of वाला [Characteristic]

  • ग़ुलाबी ईंटों वाला मकान
    pink bricks vālā house
    the house that has pink bricks (Characteristic↝?)
  • नीला वाला
    blue vālā
    the blue one (Characteristic↝?)
  • ऊपर वाला कमरा
    up vālā room
    the upstairs room (Locus↝?)
  • टोपी वाला आदमी
    hat vālā man
    the man wearing a hat (Possession?↝?)
  • पीने वाला साफ़ पानी
    drink-OBL vālā clean water
    clean water that is for drinking (Purpose↝?)
  • शराब पीने वाला
    alcohol drink-OBL vālā
    alcoholic/drunkard (Identity??↝?)

All of these seem to treat the object of वाला as an adjective, i.e. "pink-bricked house", "upstairs room", "hat-wearing man", "drinkable clean water", "alcohol-drinking".

In that sense, Characteristic seems to be a good function for these. However, वाला also tends to be used to pick out one specific item.

  • मुझे सिर्फ़ नीला वाला चाहिए।
    I-DAT only blue vālā wanted
    I only want the blue one.

So maybe Identity is a better function?

बिना

Some of the uses of बिना from the Hindi Dependency Treebank (HDTB).

w/ NOUN and postpositions का/वाला forming an nmod

nmod uses seem to be PartPortion, but then it is unclear what to do with the का/वाला following the nominal. Note that बिना can be left out: चीनी (और दूध) की चाय "tea with sugar (and milk)" seems to be PartPortion, since it lists a partial set of ingredients. So is बिना even worth annotating here?

बिना नाम और पते वाली चिट्ठी “a letter without name and address”
बिना चीनी की चाय “tea without sugar”
पुलिस ने इस दौरान बिना नंबर के चार वाहनों को भी पकड़ा । “four vehicles without a number”

w/ NOUN forming an obl

Manner.

बिना लाग लपेट “without beating around the bush”
बिना वजह नुक़्ताचीनी करना “nitpicking without any reason”
बिना पासपोर्ट पाकिस्तान जाना “going to Pakistan without a passport”

w/ NOUN and postposition के forming an obl

Manner? This is a strange construction, though: I have never seen the genitive mark a verbal adjunct like that elsewhere. So how should the genitive be labelled? Perhaps here बिना ... के is a true circumposition, annotated as an MWE with Manner.

बिना घटना के गुज़ारना “continue without any incident”
बिना कागज के अदालत में काम हो सकेगा “work can be done at the court without paperwork”
बिना मेहनत के पास करना “pass without any hard work”
बिना पासपोर्ट के पाकिस्तान जाना “going to Pakistan without a passport”
बिना पानी के भी बीस साल जीवित रहना “staying alive for twenty years even without water”
बिना नोटिस के पैसा लेना “taking money without a notice”
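Part of this classification is visible on the surface: whether a postposition closes the बिना phrase, and which one. A rough heuristic sketch, assuming whitespace-tokenized text; the closer set `CLOSERS`, the `max_gap` window and the function name are illustrative assumptions. It cannot separate the nmod use (बिना नंबर के चार वाहनों) from the obl use (बिना पासपोर्ट के पाकिस्तान जाना), since both end in के; that needs the dependency relation of the noun.

```python
from typing import List, Optional, Tuple

# Postpositions assumed to be able to close a बिना phrase (not exhaustive).
CLOSERS = {"के", "की", "का", "वाला", "वाली", "वाले"}

def bina_spans(forms: List[str], max_gap: int = 3) -> List[Tuple[int, Optional[int], Optional[str]]]:
    """For each बिना, find the first closer 1..max_gap tokens later.

    Returns (start, end, closer); end and closer are None for bare बिना.
    """
    spans = []
    for i, form in enumerate(forms):
        if form != "बिना":
            continue
        span: Tuple[int, Optional[int], Optional[str]] = (i, None, None)
        # j - i - 1 is the number of tokens between बिना and the candidate closer.
        for j in range(i + 2, min(len(forms), i + 2 + max_gap)):
            if forms[j] in CLOSERS:
                span = (i, j, forms[j])
                break
        spans.append(span)
    return spans

examples = [
    "बिना किसी भूमिका के झट से कहा",
    "बिना पासपोर्ट पाकिस्तान जाना",
    "बिना नाम और पते वाली चिट्ठी",
    "बिना काम किए",
]
for sentence in examples:
    forms = sentence.split()
    for start, end, closer in bina_spans(forms):
        stop = end + 1 if end is not None else start + 2
        label = "bare बिना" if closer is None else f"बिना ... {closer}"
        print(label, "|", " ".join(forms[start:stop]))
```

The window keeps the search local, so a के belonging to a later phrase is less likely to be picked up, at the cost of missing long nominals between बिना and its closer.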

w/ VERB forming advcl

Manner.

बिना समय गँवायें “without wasting any time”
बिना काम किए “without having done work”
बिना झुकाव दिखाए “without yielding”

को as dative subject but not a Recipient/Experiencer [various scene roles↝Recipient]

  • मुझको फ़ुर्सत है।
    I-DAT leisure be-PRS
    "I have free time/I'm free."
  • मुझको काम है।
    I-DAT work be-PRS
    "I have work/I'm busy."
  • मुझको बहुत वक़्त लगा।
    I-DAT much time feel-PRF
    "I took a lot of time."

All of these seem to stretch the limits of Experiencer↝Recipient. Can one experience the passage of time? Does having lots of work amount to experiencing being busy?

Sentences from https://repositories.lib.utexas.edu/bitstream/handle/2152/41448/Hindi_Ko.pdf?sequence=2 btw

Resolve these

  • 3:160 Characteristic~Characteristic
  • 3:218 Gestalt~Gestalt
  • 3:388 Identity~Identity
  • 5:36 Time~Whole
  • 5:439 PartPortion~PartPortion
  • 5:512 Topic~Topic
  • 5:566 Characteristic~Characteristic
  • 5:614 Source~Source
  • 6:41 Time~Whole
  • 6:164 Gestalt~Whole
  • 7:200 Originator~Direction
  • 7:736 Means~Means
  • 8:64 SocialRel~Gestalt
  • 9:188 Goal~Theme
  • 9:192 Originator~Agent
  • 10:796 Source~Source
  • [13:21 remove space]
  • 13:429 Gestalt~Gestalt
  • 14:461 Stimulus~Direction
  • 14:650 Ancillary~Ancillary
  • 15:290 Theme~Theme
  • 17:16 NONSNACS~NONSNACS
  • 19:200 Topic~Topic
  • 21:32 [fix]
  • 21:369 Topic~Topic
  • 21:701 Gestalt~Locus
  • 21:780 Source~Source
  • 21:1030 Theme~Theme
  • 21:1338 Characteristic~Characteristic
  • 21:1382 Gestalt~Gestalt
  • 22:210 Locus~Ancillary
  • 24:291 Topic~Gestalt
  • 25:404 Theme~Theme
  • 25:405 Agent~Recipient
  • 25:428 Theme~Theme
  • 25:429 Agent~Recipient
  • 26:141 Gestalt~Gestalt
  • 26:295 Theme~Theme
  • 26:1681 Beneficiary~Direction
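Most entries in this list share the shape `N:M Label~Label`, so a small sketch can triage them, for instance separating identical pairs (like the Characteristic~Characteristic validator issue) from divergent ones. The regex, the `Flag` type and its field names are hypothetical; the sketch assumes the first label is the scene role and the second the function, and keeps the two numbers as opaque identifiers.

```python
import re
from typing import List, NamedTuple

# "N:M Scene~Function"; the two numbers are treated as opaque IDs.
ENTRY = re.compile(r"(\d+):(\d+)\s+(\w+)~(\w+)")

class Flag(NamedTuple):
    chapter: int
    token: int
    scene_role: str
    function: str

def parse_flags(lines: List[str]) -> List[Flag]:
    """Extract labelled entries; notes like "[13:21 remove space]" are skipped."""
    flags = []
    for line in lines:
        m = ENTRY.search(line)
        if m:
            flags.append(Flag(int(m[1]), int(m[2]), m[3], m[4]))
    return flags

todo = [
    "3:160 Characteristic~Characteristic",
    "5:36 Time~Whole",
    "[13:21 remove space]",
    "21:32 [fix]",
    "25:405 Agent~Recipient",
]
flags = parse_flags(todo)
identical = [f for f in flags if f.scene_role == f.function]
divergent = [f for f in flags if f.scene_role != f.function]
print(len(identical), len(divergent))  # 1 2
```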

time-related uses of तक and से [StartTime↝Interval etc.]

  • मैं दो घंटों तक बैठा रहा।
    I two hours until sit-PRF CONT
    I kept sitting for two hours.
    Duration↝EndTime?
  • कल तक काम कर।
    tomorrow until work do-IMP
    Keep doing the work until tomorrow.
    EndTime↝EndTime

These seem to be a temporal use of Extent, but what function fits them?

function for Stimulus uses of से [Stimulus↝Source]

  • तुम्हारे बर्ताव से मैं ग़ुस्सा हूँ।
    you-GEN behaviour ABL I angry be-PRS
    I am angry with your behaviour.
  • मैं तुमसे डरता हूँ।
    I you-ABL fear-HAB be-PRS
    I am afraid of you.

So far, these have been annotated Stimulus↝Source, since they appear to use the ablative sense of से. Note also that they can be rephrased to create a clear start-end relation:

  • मुझे (Experiencer↝Recipient) तुमसे (Stimulus↝Source) डर लगता है।
    I-DAT you-ABL fear feel-HAB be-PRS
    I feel afraid of you.

The problem here is that the Causer function also seems apt for से in some cases, and it would make sense to group those cases closely with the ones above.

  • मुझे आग से चोट लग गयी।
    I was hurt by the fire.
  • मुझे भालू से डर लगा।
    I felt afraid of the bear.
  • मैं भालू से डर गया।
    I got scared by (/of?) the bear.

To introduce more confusion, there appears to be overlap with the Explanation function as well:

  • मैं ज़्यादा काम करने से आज थक गया।
    I got tired due to working a lot today.

Or maybe these should all just be Source for function and I'm overthinking!

मेरे बस/वश की बात

  • मेरे बस/वश की बात
    I-GEN strength GEN matter
    a matter within my abilities

Scene role is probably ComparisonRef (comparing the matter to my abilities; the matter is within my abilities).

Function is more difficult to ascertain. As a predicate, one would say: [बात] [मेरे बस की] है। A similar alternation is "[नीले रंग का] [घर]" "the blue-coloured house" / "[घर] [नीले रंग का] है" "the house is blue-coloured", which is Characteristic↝Identity. So is this Identity? "The task is of my abilities".

Instead of बात one can also use काम (work/task) and other similar words.

Rename metadata key `id` to `sent_id`

This would bring this corpus in line with what other SNACS corpora call this metadatum, and would be practically useful since some SNACS tooling expects a sent_id key to exist.
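The rename itself is a single substitution over comment lines. A minimal sketch, assuming the metadata is stored as CoNLL-U-style `# key = value` comment lines; `rename_id_key` and `rename_in_file` are hypothetical helpers, and nothing is assumed about where the repository keeps its files.

```python
import re
from pathlib import Path

# A comment line whose key is exactly "id" (not "sent_id"), e.g. "# id = lp_hi_1_1".
# [ \t]* rather than \s* so the match can never run across a newline.
ID_LINE = re.compile(r"^#[ \t]*id[ \t]*=[ \t]*", re.MULTILINE)

def rename_id_key(conllu_text: str) -> str:
    """Rewrite '# id = X' as '# sent_id = X', leaving every other line untouched."""
    return ID_LINE.sub("# sent_id = ", conllu_text)

def rename_in_file(path: Path) -> None:
    """Apply rename_id_key to one file in place (UTF-8 assumed)."""
    path.write_text(rename_id_key(path.read_text(encoding="utf-8")), encoding="utf-8")

sample = "# id = lp_hi_1_1\n# text = नमस्ते\n"
print(rename_id_key(sample))
```

Because an existing `# sent_id` line never matches, running the script twice leaves files unchanged.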

(indirect?) objects of cognition verbs [Topic↝Theme]

Certain cognition verbs, such as समझना and मानना, take two arguments: an attribute and a target for that attribute.

  • मैं तुमको बच्चा समझता हूँ।
    I you-DAT child understand-HAB be-PRS
    I think of you as a child.

Currently I have labelled this Topic↝Topic, with Topic as a separate function. That doesn't really seem broad enough to warrant its own function. Some candidates:

  • Topic↝Theme: The attribute is actually attached to the verb rather than being the direct object. "I (think child) of you."
  • Topic↝Goal: The attribute is being applied to the Goal. However, note that usually को-Goals, which add secondary information, can be dropped without making the sentence ungrammatical. That doesn't seem to be the case here.

non-agentive Agents

ने seems to be used for some obviously non-Agent subjects, in the context of certain verbs:

  • मैंने बहुत दुःख सहा है।
    "I have suffered a lot of grief."
    Experiencer↝??
  • तीन साल में हमने क्या पाया?
    "In three years, what have we earned?"
    Recipient↝??
  • मैंने पिछले साल बहुत भुगता है।
    "I have endured a lot in the past year."
    Experiencer↝??
  • राम ने बहुत मार खायी।
    "Ram got severely beaten up."
    Theme↝??

One option is creating a Theme function for these. But purely grammatically, these are definitely not Themes, since ने is so prototypically agentive.

Another option is using Agent, but these are not actually agentive. Nevertheless, Agent is the prototypical function for an ergative-marked subject.

के साथ as target of emotion/behaviour [Beneficiary↝Ancillary]

  • तू अपने साथ न्याय कर सका
    you self-OBL with justice do be-able-PRF
    "[if] you are able to be just towards yourself" (lpp_10)
  • तुमने उसके साथ कपट किया।
    you-ERG she-OBL with deception do-PRF
    "You deceived her."
  • तुमने मेरे साथ बुरा बर्ताव किया।
    you-ERG I-OBL with bad behaviour do-PRF
    "You behaved badly towards me."

Seems to be Beneficiary↝Ancillary (like targets of emotion from the English guidelines).

सर पर मौत मँडरा रही है [Goal~Locus]

From The Little Prince:

सर पर मौत मँडरा रही है
head on death gather-IMPV CONT be-PRS
death is closing upon my head/death is threatening me/I am in danger of death

This is highly idiomatic. The verb मँडराना is most commonly used literally to refer to the gathering of clouds, signalling an impending thunderstorm. The figurative situation here is that death is ominously looming, so perhaps some kind of reverse Circumstance (since the object of पर is the participant, not the circumstance)? On the other hand, I could simply use Goal↝Locus, following the literal reading (endpoint of motion).

Also note things like सर पर नशा चढ़ना "to have intoxication rise on the head=>to be intoxicated", सर पर सवार "mounting my head=>occupying my thoughts".

X से उम्मीद रखना [Stimulus↝Source]

ऐसे होते हैं ये लोग, उनसे ऐसी ही उम्मीद रखनी चाहिए।
this-like be-IMPV be-PRS 3PL people, 3PL-OBL-ABL this-like EMPH hope keep should
Such people are like this, you should only hope for this much from them.

Uncertain between COM and ABL (leaning towards ABL as I write this, but I thought it was COM initially). The scene role is probably Stimulus, in which case Source is the more typical function (#2).

पर needs Theme function?

[Image: table from Khan (2009)]

पर (LOC-on in the table) marks the proto-Patient for a large class of verbs. I have been using the usual Locus function for these, but पर may need a separate Theme function because this use is so broad.

  • मैं उसकी बातों पर सोचता रहा।
    I he-GEN talks on think-IPFV CONT
    I kept thinking about what he said.
  • मुझपर हमला हुआ।
    I-on attack be-PRF
    I was attacked. [An attack befell me?]
  • मुझे तुम पर विश्वास है।
    I-DAT you on belief be-PRS
    I believe/trust you.

The first seems to actually be just Topic↝Topic. The second is Theme (or Theme↝Locus). The third is Stimulus↝Theme (or Stimulus↝Locus). Creating a Theme function that covers the second case as well as the third may make sense.

X से पता चला etc. [Originator/Source~Source]

मुझे धीरे-धीरे उसकी बातों से सब कुछ पता चला।
I-DAT slowly he-GEN talks ABL all something knowledge go-PFV
I slowly came to know everything from what he said.

Source? Explanation~Source? Causer~Source?

I guess Causer would require a rephrasing in which ने is possible (since ने can also mark Causers), but that doesn't appear to be possible.

झलक में

चीन हो या अरीजोना, एक ही झलक में पहचान सकता था।
China be or Arizona, one EMPH glance LOC recognize be-able-IMPV be-PST
Whether it was China or Arizona, I could recognize it at a glance.

Note that एक घूँट में "in one gulp" also exists.

These could be Manner (कैसे पहचाना? "how was it recognized?") or Duration (कितना जल्दी पहचाना? "how quickly was it recognized?"), I think. Maybe also Means?
