
abdulla20-8 / dataset-for-kurdish-digits-and-isolated-characters-recognition


Kurdish language dialects are used across four main nation-states in the Middle East, and only one dialect, Sorani, has official status in one of these nation-states. One of the two main dialects of Kurdish, known as Central Kurdish (Sorani), is spoken by an estimated 9 to 10 million people.

dataset digit-dataset kurdish characters-dataset kurdish-dastaset kurdish-handwriting-dataset letters-dataset number-dataset handwriting-recognition recognition

dataset-for-kurdish-digits-and-isolated-characters-recognition's Introduction

Central Kurdish Handwritten Dataset

  • link of the dataset here

Authors

  • Peshraw Ahmed Abdalla

  • Abdalla Taha Jabar

  • Ali Abdalla Salam

  • Hedi Hamid Hama Amin

Description

Kurdish language dialects are used across four main nation-states in the Middle East, and only one dialect, Sorani, has official status in one of these nation-states. The majority of Kurdish-speaking regions are located in Turkey, Iraq, Iran, and Syria. According to estimates, more than 30 million people speak Kurdish in total. Central Kurdish (Sorani), one of the two main dialects of Kurdish, is spoken by an estimated 9 to 10 million people. It is mostly written with a 35-character modified Arabic/Persian alphabet, and some characters have recently been replaced: for example, (ك) is no longer used in Kurdish and has been replaced with (ک).

This work presents two large datasets of Central Kurdish handwritten digits and isolated characters, named K-ZHMARA and K-PIT. The K-ZHMARA dataset contains 70,000 images of Kurdish digits, 7,000 for each digit; the data was collected on printed A4 sheets with a 10 × 10 grid. The K-PIT dataset contains 245,000 images covering all Kurdish characters, 7,000 for each character; the data was collected on printed A4 sheets with a 12 × 10 grid. Together, the two datasets contain 315,000 images.

Each sheet was scanned, and the scans were then segmented, cropped, resized, binarized, and inverted in Python using edge detection and other image processing techniques. The forms were filled out mostly by volunteer students from the University of Halabja and from primary and preparatory schools in the Halabja governorate. These datasets are suitable for optical recognition of isolated Kurdish handwritten digits and characters.

Labeling and organizing: Each image is labeled with an ID number, and each folder in a dataset is named with the ID of the single digit or character it contains. For example, folder number 02 in the K-PIT dataset holds the letter Alef (ا), and folder number 03 in the K-ZHMARA dataset holds the digit three (٣). Each folder contains 6,000 images of its letter or digit for training and 1,000 images for testing.
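The figures above can be checked with a short Python sketch. It records only the per-class counts stated in the description and derives the dataset totals from them. The `DatasetSpec` class and the `folder_name` helper are hypothetical names introduced for illustration, and the two-digit, zero-padded folder naming is an assumption inferred from the examples "02" and "03"; the repository may organize its files differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetSpec:
    """Per-class counts as stated in the description (hypothetical helper)."""
    name: str
    num_classes: int            # 10 digits or 35 characters
    train_per_class: int = 6000
    test_per_class: int = 1000

    @property
    def images_per_class(self) -> int:
        return self.train_per_class + self.test_per_class

    @property
    def total_images(self) -> int:
        return self.num_classes * self.images_per_class


K_ZHMARA = DatasetSpec("K-ZHMARA", num_classes=10)
K_PIT = DatasetSpec("K-PIT", num_classes=35)


def folder_name(class_id: int) -> str:
    """Assumption: folders are named with two-digit, zero-padded IDs."""
    return f"{class_id:02d}"


if __name__ == "__main__":
    for spec in (K_ZHMARA, K_PIT):
        print(spec.name, spec.images_per_class, spec.total_images)
    print("combined:", K_ZHMARA.total_images + K_PIT.total_images)
```

The derived totals (70,000, 245,000, and 315,000 combined) agree with the numbers given in the description. The sketch does not map IDs to specific characters, since only the two examples above are documented.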

Image

  • Character Example

  • Number Example
