
instapy-gender-classification's Introduction

This tooling is inactive at the moment. Please don't work on it or send any data.

Once we have finished more work on InstaPy, we will come back to this.


instapy-gender-classification

To add general gender classification to InstaPy (even if you don't have a Business Account), we are evaluating machine learning techniques to test whether profiles can be classified by gender given only the way their bio is written, the descriptions on their posts and some other features.
This requires a lot of data, so this tool gives you an easy way to browse profiles based on tags.
For every profile, the tool asks you whether it belongs to a male person, a female person or cannot be defined (like business pages, e.g. bike shops).

A section on the tested approaches will be added once we have more data and can get some insights.

Getting Started

1. git clone https://github.com/timgrossmann/instapy-gender-classification.git
2. cd instapy-gender-classification
3. pip install .
   or
   python setup.py install
4. Download chromedriver for your system and put it in the /assets folder (create the folder if it does not exist).
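
Before starting the tool, it can help to confirm that the driver is where the script expects it. The sketch below is not part of the repository: `find_chromedriver` is a hypothetical helper that assumes the binary is called `chromedriver` (on Windows it would be `chromedriver.exe`) and sits in the assets folder. As a fallback it searches the directories listed in PATH.

```python
import os


def find_chromedriver(assets_dir="assets", name="chromedriver"):
    """Return a path to an executable chromedriver, or None if none is found.

    Hypothetical helper: looks in the assets folder first, then in every
    directory listed in the PATH environment variable.
    """
    candidates = [os.path.join(assets_dir, name)]
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        if directory:
            candidates.append(os.path.join(directory, name))
    for candidate in candidates:
        # The file must exist and be marked as executable.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    path = find_chromedriver()
    if path is None:
        print("chromedriver not found; put it in the assets folder")
    else:
        print("using chromedriver at " + path)
```

If the helper prints that nothing was found, the download in step 4 is most likely missing or the file is not marked as executable.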

Starting the tool

Please make sure to use Python 2.

Once you've installed the dependencies and put the chromedriver in the assets folder, you can start the tool from the command line inside the project directory. (After the installation steps you should already be in the right directory.)

python classify_profiles.py <list_of_tags>

e.g.

python classify_profiles.py fun good car shoes nature food
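
How the script consumes the tags is not documented here. As a rough illustration only, the sketch below shows one way a list of tags from the command line could be turned into tag pages to browse. `tag_urls` is a hypothetical helper, and the URL pattern is an assumption about what the tool visits, not something taken from classify_profiles.py.

```python
import sys


def tag_urls(tags):
    """Map tag names to Instagram tag-page URLs (assumed URL pattern)."""
    urls = []
    for tag in tags:
        # Tolerate surrounding whitespace and a leading '#'.
        cleaned = tag.strip().lstrip("#")
        if cleaned:
            urls.append("https://www.instagram.com/explore/tags/{}/".format(cleaned))
    return urls


if __name__ == "__main__":
    # sys.argv[0] is the script name; everything after it is treated as a tag.
    for url in tag_urls(sys.argv[1:]):
        print(url)
```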

How to classify?

There are 4 classes:

  • m (male)
  • f (female)
  • x (third gender)
  • - (none - company, products etc.)

When the script asks you to enter the gender of the profile, enter one of the symbols listed above.

Note: the symbol - is meant to be used for e.g. shops and other businesses that definitely don't have any gender. For example, a bike shop profile should be classified with -.
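
To make the accepted answers concrete, here is a minimal sketch of input validation that follows the list of classes above. It is not the code used by classify_profiles.py; `LABELS`, `parse_label` and `ask_label` are hypothetical names. The sketch is written for Python 3, so under Python 2 you would pass `raw_input` as the `read` argument.

```python
# Symbols taken from the list of classes; the descriptions are for display only.
LABELS = {
    "m": "male",
    "f": "female",
    "x": "third gender",
    "-": "none (company, products etc.)",
}


def parse_label(answer):
    """Return the normalized label, or None if the answer is not a known class."""
    label = answer.strip().lower()
    return label if label in LABELS else None


def ask_label(read=input):
    """Keep prompting until one of the known labels is entered."""
    while True:
        label = parse_label(read("Gender (m/f/x/-): "))
        if label is not None:
            return label
        print("Please enter one of: " + ", ".join(sorted(LABELS)))
```

Rejecting anything outside the four symbols keeps typos out of the collected data.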

Contributing your classifications

Your gathered data is important! It is the key outcome of this tool and will help us build an AI model that can predict the gender of a person based on their profile page.

If you have a Gmail account you can run:

python sendData.py

To speed up the process, type your email and password into the file sendData.py.

If you do not have a Gmail account:
The logs folder contains the profiles you have classified. Please send me an email at [email protected] with all of the JSON files in your logs folder.
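
If you go the email route, collecting the files into a single attachment can be convenient. The following is a minimal sketch, assuming the classifications are stored as .json files directly inside the logs folder. `bundle_logs` and the archive name `classifications.zip` are hypothetical and not part of the tool.

```python
import glob
import os
import zipfile


def bundle_logs(logs_dir="logs", archive="classifications.zip"):
    """Zip every .json file in logs_dir and return the names that were added."""
    paths = sorted(glob.glob(os.path.join(logs_dir, "*.json")))
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as bundle:
        for path in paths:
            # Store only the file name so the archive contains no local folders.
            bundle.write(path, arcname=os.path.basename(path))
    return [os.path.basename(path) for path in paths]


if __name__ == "__main__":
    added = bundle_logs()
    print("bundled {} file(s)".format(len(added)))
```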

Thank you very much for contributing to InstaPy!

instapy-gender-classification's People

Contributors

gabrielecalarota, timgrossmann

instapy-gender-classification's Issues

Bundle instapy-chromedriver as a dependency

Currently you need to explicitly add it to PATH if not already done. This was the error for me.
Ideally it should work out of the box like InstaPy.

Traceback (most recent call last):
File "classify_profiles.py", line 21, in <module>
browser = webdriver.Chrome('./assets/chromedriver', chrome_options=chrome_options)
File "/Users/ishandutta2007/.pyenv/versions/3.6.0/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
self.service.start()
File "/Users/ishandutta2007/.pyenv/versions/3.6.0/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 83, in start
os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

error starting instabot

pi@raspberrypi:~/InstaPy $ python3.7 quickstart.py

.. .. .. .. .. .. .. ..
Workspace in use: "/home/pi/InstaPy"

OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
INFO [2019-04-13 22:13:33] ['''''''''''] Session started!
oooooooooooooooooooooooooooooooooooooooooooooooooooooo

Cookie file not found, creating cookie...

INFO [2019-04-13 22:13:39] [''''''''''] Sessional Live Report:
|> No any statistics to show

[Session lasted 11.22 seconds]

OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
INFO [2019-04-13 22:13:39] ['''''''''''] Session ended!
ooooooooooooooooooooooooooooooooooooooooooooooooooooo

Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 384, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 380, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.7/http/client.py", line 1321, in getresponse
response.begin()
File "/usr/local/lib/python3.7/http/client.py", line 296, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.7/http/client.py", line 265, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "quickstart.py", line 20, in <module>
with smart_run(session):
File "/usr/local/lib/python3.7/contextlib.py", line 112, in __enter__
return next(self.gen)
File "/home/pi/.local/lib/python3.7/site-packages/instapy/util.py", line 1683, in smart_run
session.login()
File "/home/pi/.local/lib/python3.7/site-packages/instapy/instapy.py", line 393, in login
self.bypass_with_mobile):
File "/home/pi/.local/lib/python3.7/site-packages/instapy/login_util.py", line 173, in login_user
reload_webpage(browser)
File "/home/pi/.local/lib/python3.7/site-packages/instapy/util.py", line 1710, in reload_webpage
browser.execute_script("location.reload()")
File "/home/pi/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 636, in execute_script
'args': converted_args})['value']
File "/home/pi/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 319, in execute
response = self.command_executor.execute(driver_command, params)
File "/home/pi/.local/lib/python3.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 374, in execute
return self._request(command_info[0], url, body=data)
File "/home/pi/.local/lib/python3.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 397, in _request
resp = self._conn.request(method, url, body=body, headers=headers)
File "/usr/local/lib/python3.7/site-packages/urllib3/request.py", line 72, in request
**urlopen_kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/request.py", line 150, in request_encode_body
return self.urlopen(method, url, **extra_kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/poolmanager.py", line 323, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 367, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 384, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 380, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.7/http/client.py", line 1321, in getresponse
response.begin()
File "/usr/local/lib/python3.7/http/client.py", line 296, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.7/http/client.py", line 265, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
pi@raspberrypi:~/InstaPy $
