humannameparser.php's Introduction

Name: HumanNameParse.php

Version: 0.2

Date: 6 Sept. 2010

Author: Jason Priem

Website: http://jasonpriem.com/human-name-parse

License: http://www.opensource.org/licenses/mit-license.php

Description

Takes human names of arbitrary complexity and various wacky formats like:

  • J. Walter Weatherman
  • de la Cruz, Ana M.
  • James C. ('Jimmy') O'Dell, Jr.

and parses out the:

  • leading initial (like "J." in "J. Walter Weatherman")
  • first name (or first initial in a name like 'R. Crumb')
  • nicknames (like "Jimmy" in "James C. ('Jimmy') O'Dell, Jr.")
  • middle names
  • last name (including compound ones like "van der Sar" and "Ortega y Gasset"), and
  • suffix (like 'Jr.', 'III')

humannameparser.php's Issues

Cases with a single name throw an exception

Hi,

So I tried using your library to parse customer names coming in from an online ordering system.

Some names tend to just be one word, like "Henry". Your library throws an exception and halts execution if I try to get a last name from cases where there is only a single word. Why not just return ""? Or maybe an option for how strict it should be when returning name parts?

For example:

  • level 1: halt for all missing pieces
  • level 2: just return an empty string for missing pieces, with no errors
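
The two levels suggested above could be sketched roughly as follows. This is only an illustration of the idea: `STRICT_HALT`, `STRICT_EMPTY`, and `getNamePart()` are hypothetical names that do not exist in HumanNameParser.php, and the parsed name is assumed to be a plain array of parts.

```php
<?php
// Hypothetical sketch: none of these names come from HumanNameParser.php.

const STRICT_HALT = 1;   // level 1: throw when a requested part is missing
const STRICT_EMPTY = 2;  // level 2: return "" when a requested part is missing

/**
 * Return one part of an already-parsed name, honoring the strictness level.
 * $parts is assumed to look like ['first' => 'Henry', 'last' => '...'].
 */
function getNamePart(array $parts, string $key, int $strictness = STRICT_HALT): string
{
    if (isset($parts[$key]) && $parts[$key] !== '') {
        return $parts[$key];
    }
    if ($strictness === STRICT_EMPTY) {
        return '';
    }
    throw new RuntimeException("Couldn't find a $key name.");
}

// A single-word name such as "Henry" has a first name but no last name.
$henry = ['first' => 'Henry'];

echo getNamePart($henry, 'first', STRICT_EMPTY), "\n";
var_dump(getNamePart($henry, 'last', STRICT_EMPTY)); // empty string, no exception

try {
    getNamePart($henry, 'last', STRICT_HALT);
} catch (RuntimeException $e) {
    echo 'level 1 halted: ', $e->getMessage(), "\n";
}
```

Defaulting to the strict level keeps the existing behaviour for current callers, while an ordering system like the one described could opt in to the lenient level.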

Port to R?

Wow, this would be great to have available in R.

Professional titles

Please parse out professional titles like MD, CFA, RN, CFP, PA...

And military suffixes.

Other common name combinations

Hi,
Your test cases do not include common forms like
Чайковский В. Н.
Чайковский ВН
which are written without a comma in Russian.

Maybe you should explicitly warn that certain Asian languages write names without a comma and in the opposite order to English, which is hard to detect. Similarly, the German naming tradition has no strict order, so the categories "first name" and "middle name" do not apply. In that tradition there are only given names, and an arbitrary non-empty subset of them is marked as calling names (comparable to the first name in the USA or Russia). The others are treated like middle names in the USA, although they are freely chosen by the parents.
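
As a rough illustration of the Russian forms mentioned above, a surname followed by two initials can be recognised with a Unicode-aware pattern. This is a heuristic sketch only: `parseSurnameInitials()` is a hypothetical helper, not part of HumanNameParser.php, and it assumes a single-word surname followed by exactly two uppercase initials.

```php
<?php
/**
 * Hypothetical helper (not part of HumanNameParser.php): recognise
 * "Surname I. O." and "Surname IO" written without a comma.
 * Returns null when the input does not have that shape.
 */
function parseSurnameInitials(string $name): ?array
{
    // \p{L} = any letter, \p{Lu} = uppercase letter; the /u flag enables UTF-8 mode.
    $pattern = '/^(\p{L}+)\s+(\p{Lu})\.?\s*(\p{Lu})\.?$/u';
    if (preg_match($pattern, trim($name), $m) !== 1) {
        return null;
    }
    return ['last' => $m[1], 'initials' => [$m[2], $m[3]]];
}

foreach (['Чайковский В. Н.', 'Чайковский ВН', 'J. Walter Weatherman'] as $name) {
    $parsed = parseSurnameInitials($name);
    echo $name, ' => ',
        $parsed === null ? 'no match' : json_encode($parsed, JSON_UNESCAPED_UNICODE),
        "\n";
}
```

Both Russian forms yield the same surname and initials, while "J. Walter Weatherman" does not match. A pattern like this cannot tell a surname-first name from a Western name that happens to end in two capital letters, which is the detection problem the issue describes.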

Test suite

Google contacts has EXCELLENT name parsing for all languages.

https://www.google.com/contacts/#contacts

"API" at: https://clients6.google.com/plusi/v2/ozInternal/contactstoremutate?key=AIzaSyBuUpn1wi2-0JpM3S-tq2csYx0z2_m_pqc&alt=json

To illustrate: it knows that 诸葛亮 is last name 诸葛 and first name 亮, but it also knows that 柏夫人 is last name 柏 first name 夫人. This is done without language hinting, and it even recognizes the difference between Chinese and Japanese names, which could even use the same characters.


Although your library does not support this today, I request that these and other examples be added to the test suite. They will fail, but they will demonstrate the scope and limits of this library.
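
A minimal sketch of how such known-failing examples could be recorded follows. The expected splits are the ones quoted above; `failingCases()` and the `$naive` stand-in parser are hypothetical and are not part of HumanNameParser.php or its existing tests.

```php
<?php
// Expected splits are the ones quoted in the issue text.
$cases = [
    ['name' => '诸葛亮', 'last' => '诸葛', 'first' => '亮'],
    ['name' => '柏夫人', 'last' => '柏',   'first' => '夫人'],
];

/**
 * Run each case through $parse (any callable returning
 * ['last' => ..., 'first' => ...]) and collect the names whose
 * result differs from the expectation.
 */
function failingCases(array $cases, callable $parse): array
{
    $failures = [];
    foreach ($cases as $case) {
        $got = $parse($case['name']);
        if (($got['last'] ?? null) !== $case['last']
            || ($got['first'] ?? null) !== $case['first']) {
            $failures[] = $case['name'];
        }
    }
    return $failures;
}

// Stand-in parser that splits on spaces, so it cannot separate unspaced names.
$naive = function (string $name): array {
    $words = explode(' ', $name);
    return ['first' => $words[0], 'last' => count($words) > 1 ? end($words) : ''];
};

print_r(failingCases($cases, $naive)); // both names are reported as failures
```

Keeping the expectations as data means the same list can later be run against a real parser to show which cases have started to pass.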

Regex patterns for Suffix and Title need updating

This is a good little parser, but when trying it with PHP 7 I found that the Title and Suffix expressions no longer work as expected. The patterns were not detecting the space between the title or suffix and the name, so names like "Frank Tester" were becoming "ank Tester".

I will try to come back later and do a proper pull request, but for now, here's the update needed in Parse.php:

         $nicknamesRegex
             = "/ ('|\"|\(\"*'*)(.+?)('|\"|\"*'*\)) /"; // names that starts or end w/ an apostrophe break this
 
-        $titleRegex = "/^($titles)/";
-        $suffixRegex = "/,* *($suffixes)$/";
+        $titleRegex = "/^($titles)\s+/";
+        $suffixRegex = "/\,?\s+($suffixes)$/";
         $lastRegex = "/(?!^)\b([^ ]+ y |$prefixes)*[^ ]+$/";
         $leadingInitRegex = "/^(.\.*)(?= \p{L}{2})/"; // note the lookahead, which isn't returned or replaced
         $firstRegex = "/^[^ ]+/"; //
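
The effect of the title change can be reproduced in isolation. The sketch below assumes a shortened title list that contains "Fr" (the library's real list is not shown here), so it demonstrates one plausible way the "ank Tester" result can arise rather than the confirmed cause.

```php
<?php
// Assumed, shortened title list; the library's real list is not shown in this issue.
$titles = 'Dr\.?|Mr\.?|Fr\.?';

$oldTitleRegex = "/^($titles)/";     // no separator required after the title
$newTitleRegex = "/^($titles)\s+/";  // title must be followed by whitespace

foreach (['Frank Tester', 'Fr. Frank Tester'] as $name) {
    // Strip a leading title with each pattern and compare the results.
    $old = preg_replace($oldTitleRegex, '', $name);
    $new = preg_replace($newTitleRegex, '', $name);
    echo $name, "\n  old: '", $old, "'\n  new: '", $new, "'\n";
}
```

With the old pattern, "Frank Tester" loses its first two letters because "Fr" matches at the start of the string. With the new pattern the name is left intact, and a genuine title such as "Fr." is removed together with the space that follows it.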

Port to npm/node?

It looks like this may handle middle names better than the only package on npmjs.org that does name parsing.

Port to Java

Hi, after reading about the R and Node ports, and needing to use a similar library in Java, here's my take on a Java port of HumanNameParser.php:

https://github.com/tupilabs/HumanNameParser.java

I tried to keep the design, code, if/else's, and even tests (used testNames.txt too in the tests).

The code is ready to be deployed to the Maven Central repository if there is no objection. I've added the copyright, kept the license, and tried to make it clear that the code is a port of the original PHP library.

Thank you!
