
gender_detector's Introduction

Gender Detector


Gender Detector is a Ruby library that will tell you the most likely gender of a person based on first name. It uses the underlying data from the program “gender” by Jorg Michael (described here).

Installation

Add this line to your application’s Gemfile:

gem 'gender_detector'

And then execute:

$ bundle

Or install it yourself as:

$ gem install gender_detector

Usage

Its use is pretty straightforward:

>> require 'gender_detector'
>> d = GenderDetector.new
>> d.get_gender("Bob")
:male
>> d.get_gender("Sally")
:female
>> d.get_gender("Pauley") # should be androgynous
:andy

The result will be one of andy (androgynous), male, female, mostly_male, or mostly_female. Any unknown names are considered andies.
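As a rough sketch of how calling code might consume these results, the helper below maps each symbol to a display label. The `gender_label` method, the `GENDER_LABELS` constant, and the label strings are assumptions made for illustration and are not part of the gem; only the five symbols come from the list above.

```ruby
# Hypothetical helper (not part of gender_detector): turns a result
# symbol into a human-readable label.
GENDER_LABELS = {
  male: 'male',
  mostly_male: 'probably male',
  female: 'female',
  mostly_female: 'probably female',
  andy: 'androgynous or unknown'
}.freeze

def gender_label(result)
  # fetch raises KeyError for anything outside the five documented symbols
  GENDER_LABELS.fetch(result)
end

puts gender_label(:mostly_female)
```

Because unknown names also come back as andy, the label for that symbol cannot distinguish "androgynous" from "not in the data file".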

I18n is supported if either UnicodeUtils or ActiveSupport is present. To get I18n support, add either one to your Gemfile:

gem 'unicode_utils' # or gem 'activesupport'

Afterwards, gender detection will work for names with non-ASCII characters as well:

>> d.get_gender("Álfrún")
:female

Additionally, you can give preference to specific countries:

>> d.get_gender("Jamie")
 => :female
>> d.get_gender("Jamie", :great_britain)
 => :mostly_male

If you have an alternative data file, you can pass that in as an optional filename argument to the GenderDetector. Additionally, you can create a detector that is not case sensitive (the default is to be case sensitive):

>> d = GenderDetector.new(:case_sensitive => false)
>> d.get_gender "sally"
 => :female
>> d.get_gender "Sally"
 => :female

Try to avoid creating many GenderDetectors, as each creation means reading in the data file.
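One way to follow this advice is to build the detector once and share it. The sketch below is a minimal illustration, assuming a hypothetical `SharedDetector` module that is not part of the gem; the block passed to `fetch` is where `GenderDetector.new` would go in a real application.

```ruby
# Hypothetical wrapper (not part of gender_detector): builds the detector
# once and hands the same instance to every caller.
module SharedDetector
  @mutex = Mutex.new
  @instance = nil

  # The block only runs on the first call; later calls reuse its result.
  def self.fetch
    @mutex.synchronize { @instance ||= yield }
  end
end

# In an application this might look like:
#   detector = SharedDetector.fetch { GenderDetector.new }
#   detector.get_gender("Sally")
```

The mutex keeps two threads from both reading the data file at startup; in a single-threaded script a plain `@instance ||= yield` would do.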

Licenses

The gender_detector code is distributed under the GPLv3. The data file nam_dict.txt is released under the GNU Free Documentation License.

gender_detector's People

Contributors

andyatkinson, bmuller, davy, fschwahn, halan, michaeltomko, stephencelis, torsten


gender_detector's Issues

speed

Thanks for creating such a great gem! A few suggestions for ya... In my informal testing I can reduce startup time from 800ms to 170ms with the following parse code. Does this look right to you?

  def parse(fname)
    @names = {}
    File.open(fname, 'r:iso8859-1:utf-8') do |f|
      # old code used each_line
      f.read.split("\n").each do |line|
        fast_eat_name_line line
      end
    end
  end

  private

  def fast_eat_name_line(line)
    return if line.start_with?('#', '=')

    # old code used split, strip, downcase without bang, etc.
    code = line[0...2]
    name = line[3...29]
    country_values = line[30..-1]
    name.rstrip!
    name.downcase! unless @case_sensitive

    case code
    when 'M ' then set(name, :male, country_values)
    when '1M', '?M' then set(name, :mostly_male, country_values)
    when 'F ' then set(name, :female, country_values)
    when '1F', '?F' then set(name, :mostly_female, country_values)
    when '? ' then set(name, :andy, country_values)
    end
  end

Invalid entry for "Jamie"

While checking out my own first name, I get this:

1.9.3p194 :003 > d.get_gender 'Jamie'
 => :female

Now, it's true that in the United States, "Jamie" is more commonly given to women, but that is not always the case. In Commonwealth nations, "Jamie" is almost universally masculine.

Looking into the data file, it has both masculine and feminine versions of my name on separate lines. A quick look at the code shows that it checks for femininity first; if a name can be female, it marks it as such. I didn't check too in-depth on it, but I figure this might be a case for many other names.
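To illustrate the kind of order dependence described here, the following sketch uses a made-up in-memory list rather than the gem's real data structures. `first_match` mimics a lookup that tests for a female entry first, while `merged` is one hypothetical alternative that reports `:andy` when a name is listed under both genders. Neither method is taken from the gem's source.

```ruby
# Made-up entries standing in for two separate lines of the data file.
ENTRIES = [
  ['Jamie', :male],
  ['Jamie', :female],
  ['Bob', :male]
].freeze

# Checks for a female entry first, so a name listed under both genders
# always comes back as :female.
def first_match(name)
  genders = ENTRIES.select { |n, _| n == name }.map(&:last)
  return :female if genders.include?(:female)
  return :male if genders.include?(:male)

  :andy
end

# Hypothetical alternative: conflicting entries collapse to :andy.
def merged(name)
  genders = ENTRIES.select { |n, _| n == name }.map(&:last).uniq
  genders.size == 1 ? genders.first : :andy
end

p first_match('Jamie') # => :female
p merged('Jamie')      # => :andy
```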

Motivation

Hey, what's the motivation for this? It seems somehow counter-productive to the issue of gender discrimination in tech.

Remove dependency on unicode_utils ?

Hi,
I'm currently optimizing the boot time of my Rails app, and realized that unicode_utils takes a lot of time to load (half a second on my development machine; on Windows it seems worse: lang/unicode_utils#6). What would you think about removing this dependency, and instead using it only if it is defined? Furthermore, ActiveSupport could be used if already present. A note could be added to the README that using UnicodeUtils is preferred, but that the other ways are possible as well.

def downcase(name)
  if defined?(UnicodeUtils)
    UnicodeUtils.downcase(name)
  elsif defined?(ActiveSupport::Multibyte::Chars)
    name.mb_chars.downcase.to_s
  else
    name.downcase
  end
end

I'd be willing to put together a PR if you think this is worthwhile.

All the best,
Fabian

Add class method shortcuts

Just an idea, but what if instead of creating a new detector object, you could simply run:

GenderDetector.get_gender('Sally') # => :female
GenderDetector.female?('Sally') # => true
GenderDetector.male?('Sally') # => false
GenderDetector.androgynous?('Sally') # => false

In this case, configuration would be done like so:

GenderDetector.case_sensitive = false

Or, alternatively:

GenderDetector.configure do |config|
  config.case_sensitive = false
end

I’d be open to creating a pull request.
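A minimal sketch of how such shortcuts could delegate to one shared detector is shown below. `GenderShortcuts` and `FakeDetector` are hypothetical names introduced for illustration; the sketch deliberately avoids reopening `GenderDetector` itself and only assumes an object that responds to `get_gender(name)`.

```ruby
# Hypothetical module (not part of gender_detector) exposing module-level
# shortcuts that delegate to a single shared detector object.
module GenderShortcuts
  class << self
    # Assign once at boot, e.g. GenderShortcuts.detector = GenderDetector.new
    attr_accessor :detector

    def get_gender(name)
      detector.get_gender(name)
    end

    def female?(name)
      get_gender(name) == :female
    end

    def male?(name)
      get_gender(name) == :male
    end

    def androgynous?(name)
      get_gender(name) == :andy
    end
  end
end

# Stand-in detector so the sketch runs without the gem or its data file.
class FakeDetector
  def get_gender(name)
    { 'Sally' => :female, 'Bob' => :male }.fetch(name, :andy)
  end
end

GenderShortcuts.detector = FakeDetector.new
p GenderShortcuts.female?('Sally') # => true
p GenderShortcuts.male?('Sally')   # => false
```

Whether `female?` should also return true for `:mostly_female` is a design decision the proposal leaves open; this sketch uses strict equality.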

parse error with Jose Maria

I have no idea why, but a handful of names contain whitespace. For example, Jose Maria. I think maybe parsing of those names is busted. My speedy parse code might work better for those.
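The sketch below shows why a name containing a space could trip up a whitespace-based parser while fixed-width slicing would not. The sample line is fabricated to match the column offsets used in the parse code from the speed issue (code in columns 0-1, name in columns 3-28, country values from column 30); it is not copied from nam_dict.txt, and the assumption that the older parser split on whitespace comes only from the comment in that code.

```ruby
# Fabricated sample line, padded to the assumed fixed-width layout.
line = 'M  ' + 'Jose Maria'.ljust(26) + ' ' + '1' * 5

# Whitespace splitting cuts the name at its first space.
split_name = line.split[1]

# Fixed-width slicing keeps the whole field, then trims the padding.
fixed_name = line[3...29].rstrip

p split_name # => "Jose"
p fixed_name # => "Jose Maria"
```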

Anyway, thanks again for the great gem!
