Giter Site home page Giter Site logo

parsi.io's Introduction

parsi.io

Address extractor

Supported marker

  • Address, Email, URL, Phone Number extractor

Example

from parsi_io.modules.address_extractions import AddressExtraction
extractor = AddressExtraction()
extractor.run('آدرس خیابان شهیدبهشتی می‌باشد و برای اطلاعات بیشتر به page.com مراجعه فرمایید')

Output

{
	'address': ['آدرس خیابان شهیدبهشتی'],
	'email': [],
	'url': ['page.com'],
	'number': [],
	'address_span': [0, 21],
	'email_span': [],
	'url_span': [54, 62],
	'number_span': []
}

Cause and Effect extractor

  • Determines whether a sentence is causal

Supported marker

  • Causal marker extarctor

Example

from parsi_io.modules.cause_effect_extractions import CauseEffectExtraction
extractor = CauseEffectExtraction()
extractor.run('چون نمی‌خواستم اون چیزی از ماجرا بفهمه، مجبور به تظاهر شدم.')

Output

{
      "flag": "بله",
      "marker": "چون",
      "marker_span": "[0, 3]"
}

Number extractor

  • Extracts persian numbers both in numeral form or text form or mixed form.

Example

from parsi_io.modules.number_extractor import NumberExtractor
extractor = NumberExtractor()
extractor.run('من در بیست و پنجمین روز فروردین سوار اتوبوس ۱۲ شدم.')

Output

[
  {
    "span": [6,16],
    "phrase": "بیست و پنج",
    "value": 25
  },
  {
    "span": [44,46],
    "phrase": "۱۲",
    "value": 12
  }
]

Quranic extractor

  • Extracts parts (at least two words) of The Holy Quran verses with their surah and verse number from input text

Supported marker

  • The Holy Quran verses extractor

Example

from parsi_io.modules.quranic_extractions import QuranicExtraction
extractor = QuranicExtraction(model = 'exact', precompiled_patterns='prebuilt')
extractor.run('شان نزول آیه ی انما وليكم اللّه ورسوله والّذين امنوا')

Output

{
    'input_span': [15, 52],
    'extracted': 'انما وليكم اللّه ورسوله والّذين امنوا',
    'quran_id': '5##55',
    'verse': 'إِنَّمَا وَلِيُّكُمُ اللَّهُ وَ رَسُولُهُ وَ الَّذِينَ آمَنُوا الَّذِينَ يُقِيمُونَ الصَّلَاةَ وَ يُؤْتُونَ الزَّكَاةَ وَ هُمْ رَاكِعُونَ'
}

TimeDate extractor

  • Extracts Time Date Markers (stable)
  • Extract Values (Unstable)

Supported marker

  • All Time and Date Markers

Example

from parsi_io.modules.time_extractions import TimeExtraction
extractor = TimeExtraction()
extractor.run("ماریا شنبه عصر در ساعت نه و پنجاه نه دقیقه - مورخ 13 می 1999 با نادیا تماس گرفت اما نادیا بعدا در 1100/09/09 قمری به پرسش او پاسخ داد.")

Output

{'markers': {'date': {'[50, 60]': '13 می 1999',
                      '[6, 10]': 'شنبه',
                      '[98, 113]': '1100/09/09 قمری'},
             'datetime': {'[18, 42]': 'ساعت نه و پنجاه نه دقیقه',
                          '[50, 60]': '13 می 1999',
                          '[6, 14]': 'شنبه عصر',
                          '[98, 113]': '1100/09/09 قمری'},
             'time': {'[11, 14]': 'عصر',
                      '[18, 42]': 'ساعت نه و پنجاه نه دقیقه'}},
 'ner': [('ماریا', 'O'),
         ('شنبه', 'B-DAT'),
         ('عصر', 'I-DAT'),
         ('در', 'O'),
         ('ساعت', 'B-DAT'),
         ('نه', 'I-DAT'),
         ('و', 'I-DAT'),
         ('پنجاه', 'I-DAT'),
         ('نه', 'I-DAT'),
         ('دقیقه', 'I-DAT'),
         ('-', 'O'),
         ('مورخ', 'O'),
         ('13', 'B-DAT'),
         ('می', 'I-DAT'),
         ('1999', 'I-DAT'),
         ('با', 'O'),
         ('نادیا', 'O'),
         ('تماس', 'O'),
         ('گرفت', 'O'),
         ('اما', 'O'),
         ('نادیا', 'O'),
         ('بعدا', 'O'),
         ('در', 'O'),
         ('1100/09/09', 'B-DAT'),
         ('قمری', 'I-DAT'),
         ('به', 'O'),
         ('پرسش', 'O'),
         ('او', 'O'),
         ('پاسخ', 'O'),
         ('داد', 'O'),
         ('.', 'O')],
 'spans': {'date': [[6, 10], [50, 60], [98, 113]],
           'datetime': [[6, 14], [18, 42], [50, 60], [98, 113]],
           'time': [[11, 14], [18, 42]]},
 'values': {'date': {'[50, 60]': '13/05/1999',
                     '[6, 10]': 'شنبه',
                     '[98, 113]': '1100/09/09 ه.ق'},
            'time': {'[11, 14]': 'عصر', '[18, 42]': '09:59:00'}}}

Test

Add test cases to parsi_io/test/testcases/[marker_name].json in the following template

Template

[
    {
        "id":test ID,
        "input":input text,
        "outputs":output dictionary
    },
    ...
]

Contributors

Marker Contributors
Address Extraction Amirreza Mozayani, Arya Kosari, Seyyed Mohammadjavad Feyzabadi, Omid Ghahroodi
CauseEffect Extraction Rozhan Ahmadi, Mohammad Azizmalayeri, Mohammadreza Fereiduni, Saeed Hematian, Seyyed Ali Marashian, Maryam Gheysari
Number Extraction Mohammad Ali Sadraei Javaheri, Mohammad Mozafari
Quranic Extraction Seyyed Mohammad Aref Jahanmir, Alireza Sahebi, Ali Safarpoor Dehkordi, Mohammad Mehdi Hemmatyar, Morteza Abolghasemi, Saman Hadian
Time Date Extraction Parstdex Team

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.