License: MIT License
instagram-profilecrawl's Introduction

Instagram-Profilecrawl

Quickly crawl the information (e.g. followers, tags, etc.) of an Instagram profile. No login required!

Automation script for crawling information from one's Instagram profile, e.g. the number of posts, followers, and the tags of the posts.

Guide to Bot Creation: Learn to Build your own Bots and Automations with the Creators of InstaPy

Getting started

Just do:

git clone https://github.com/timgrossmann/instagram-profilecrawl.git

It uses selenium and requests to get all the information, so install them with:

pip install -r requirements.txt

Copy the .env.example to .env

cp .env.example .env

Modify your IG profile inside .env

IG_USERNAME=<Your Instagram Username>
IG_PASSWORD=<Your Instagram Password>

Install the proper chromedriver for your operating system. Once you have downloaded it, drag and drop it into the instagram-profilecrawl/assets directory.

Use it!

Now you can start using it following this example:

python3.7 crawl_profile.py username1 username2 ... usernameX

Download the post images to your local machine:

python3.7 extract_image.py <colected_profiles_path>

Settings: To limit the amount of posts to be analyzed, change variable limit_amount in settings.py. Default value is 12000.

Optional login

If you want to access more features (e.g. private accounts that you follow with your own account become accessible), you can enter your username and password in settings.py. Remember, it's optional.

Here are the steps to do so:

  1. Open settings.py
  2. Search for login_username & login_password
  3. Put your information inside the quotation marks

Second option: just set them from your script:

Settings.login_username = 'my_insta_account'
Settings.login_password = 'my_password_xxx'
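If you would rather not hardcode credentials, the same two settings can be filled from the environment variables defined in .env earlier. A minimal sketch, assuming the attribute names documented above; the stand-in `Settings` class here replaces the project's `from util.settings import Settings`:

```python
import os

# Illustrative stand-in for the project's util.settings.Settings object.
class Settings:
    login_username = ''
    login_password = ''

# Pull the credentials from the environment (matching the .env keys
# IG_USERNAME / IG_PASSWORD) instead of writing them into the script.
Settings.login_username = os.environ.get('IG_USERNAME', '')
Settings.login_password = os.environ.get('IG_PASSWORD', '')
```

This keeps passwords out of version control while still using the optional-login mechanism described above.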

Run on Raspberry Pi

To run the crawler on Raspberry Pi with Firefox, follow these steps:

  1. Install Firefox: sudo apt-get install firefox-esr
  2. Get the geckodriver as described here
  3. Install pyvirtualdisplay: sudo pip3 install pyvirtualdisplay
  4. Run the script for RPi: python3 crawl_profile_pi.py username1 username2 ...

Collecting stats:

If you are interested in collecting and logging stats from a crawled profile, use the log_stats.py script after running crawl_profile.py (or crawl_profile_pi.py). For example, on a Raspberry Pi run:

  1. Run python3 crawl_profile_pi.py username
  2. Run python3 log_stats.py -u username for a specific user, or python3 log_stats.py for all users

This appends the collected profile info to stats.csv, which can be useful for monitoring the growth of an Instagram account over time. The logged stats are: time, username, total number of followers, following, posts, likes, and comments. The two commands can simply be triggered using crontab (make sure to trigger log_stats.py several minutes after crawl_profile_pi.py).
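The crontab pairing described above might look like the following (paths and times are illustrative; adjust them to your setup, and keep a gap between the two jobs so the crawl finishes first):

```
# m  h  dom mon dow  command
0    3  *   *   *    cd /home/pi/instagram-profilecrawl && python3 crawl_profile_pi.py username
30   3  *   *   *    cd /home/pi/instagram-profilecrawl && python3 log_stats.py -u username
```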

Settings:

Path to save the profile JSONs:

Settings.profile_location = os.path.join(BASE_DIR, 'profiles')

Whether the profile JSON file should get a timestamp:

Settings.profile_file_with_timestamp = True

Path to save the commenters:

Settings.profile_commentors_location = os.path.join(BASE_DIR, 'profiles')

Whether the commenters file should get a timestamp:

Settings.profile_commentors_file_with_timestamp = True

Scrape & save the post JSONs:

Settings.scrape_posts_infos = True

Maximum number of posts to scrape:

Settings.limit_amount = 12000

Whether the comments should also be saved in JSON files:

Settings.output_comments = False

Whether the mentions in the post image should be saved in JSON files:

Settings.mentions = True

Whether the users who liked the post should be saved in JSON files. Attention: this takes a lot of time, since the script can only load 12 likes at once before pausing and loading again:

Settings.scrape_posts_likers = True

Whether the profile's followers should be scraped. Attention: the crawler must be logged in (see above); it sometimes crashes on huge accounts:

Settings.scrape_follower = True

Time between post scrolling (increase if you get errors):

Settings.sleep_time_between_post_scroll = 1.5

Time between comment scrolling (increase if you get errors):

Settings.sleep_time_between_comment_loading = 1.5

Output debug messages to Console

Settings.log_output_toconsole = True

Path to the logfile

Settings.log_location = os.path.join(BASE_DIR, 'logs')

Output debug messages to File

Settings.log_output_tofile = True

New logfile for every run

Settings.log_file_per_run = False

The information will be saved in a JSON file at ./profiles/{username}.json

Example of a file's data:

{
  "alias": "Tim Gro\u00dfmann",
  "username": "grossertim",
  "num_of_posts": 127,
  "posts": [
    {
      "caption": "It was a good day",
      "location": {
        "location_url": "https://www.instagram.com/explore/locations/345421482541133/caffe-fernet/",
        "location_name": "Caffe Fernet",
        "location_id": "345421482541133",
        "latitude": 1.2839,
        "longitude": 103.85333
      },
      "img": "https://scontent.cdninstagram.com/t51.2885-15/e15/p640x640/16585292_1355568261161749_3055111083476910080_n.jpg?ig_cache_key=MTQ0ODY3MjA3MTQyMDA3Njg4MA%3D%3D.2",
      "date": "2018-04-26T15:07:32.000Z",
      "tags": ["#fun", "#good", "#goodday", "#goodlife", "#happy", "#goodtime", "#funny", ...],
      "likes": 284,
      "comments": {
        "count": 0,
        "list": [],
       },
     },
     {
      "caption": "Wild Rocket Salad with Japanese Sesame Sauce",
      "location": {
        "location_url": "https://www.instagram.com/explore/locations/318744905241462/junior-kuppanna-restaurant-singapore/",
        "location_name": "Junior Kuppanna Restaurant, Singapore",
        "location_id": "318744905241462",
        "latitude": 1.31011,
        "longitude": 103.85672
      },
      "img": "https://scontent.cdninstagram.com/t51.2885-15/e35/16122741_405776919775271_8171424637851271168_n.jpg?ig_cache_key=MTQ0Nzk0Nzg2NDI2ODc5MTYzNw%3D%3D.2",
      "date": "2018-04-26T15:07:32.000Z",
      "tags": ["#vegan", "#veganfood", "#vegansofig", "#veganfoodporn", "#vegansofig", ...],
      "likes": 206,
      "comments": {
        "count": 1,
        "list": [
          {
            "user": "pastaglueck",
            "comment": "nice veganfood"
           },
         ],
       },
     },
     .
     .
     .
     ],
  "prof_img": "https://scontent.cdninstagram.com/t51.2885-19/s320x320/14564896_1313394225351599_6953533639699202048_a.jpg",
  "followers": 1950,
  "following": 310
}

The script also collects the usernames of users who commented on the posts and saves them, sorted by comment frequency, in the ./profiles/{username}_commenters.txt file.
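The frequency-sorted commenters list can also be rebuilt from a saved profile JSON. A minimal sketch, assuming the JSON follows the example structure shown above (the exact layout of the `_commenters.txt` file itself may differ):

```python
from collections import Counter

def rank_commenters(profile):
    """Count how often each user commented across all posts,
    most frequent first (mirrors {username}_commenters.txt)."""
    counts = Counter(
        comment["user"]
        for post in profile.get("posts", [])
        for comment in post.get("comments", {}).get("list", [])
    )
    return counts.most_common()

# Tiny example profile in the documented shape:
profile = {"posts": [
    {"comments": {"count": 2, "list": [{"user": "a"}, {"user": "b"}]}},
    {"comments": {"count": 1, "list": [{"user": "a"}]}},
]}
print(rank_commenters(profile))   # [('a', 2), ('b', 1)]
```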

With the help of Wordcloud, you could do something like this with your used tags.
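The tag frequencies behind such a word cloud can be computed from the saved profile JSON with the standard library alone; the rendering step would then use the third-party wordcloud package. A sketch, assuming the JSON structure from the example above:

```python
from collections import Counter

def tag_frequencies(profile):
    """Flatten the tags of all posts into a frequency map, suitable as
    input for e.g. WordCloud.generate_from_frequencies."""
    return Counter(
        tag
        for post in profile.get("posts", [])
        for tag in post.get("tags", [])
    )

profile = {"posts": [{"tags": ["#fun", "#happy"]}, {"tags": ["#fun"]}]}
print(tag_frequencies(profile))   # Counter({'#fun': 2, '#happy': 1})
```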


Have Fun & Feel Free to report any issues

instagram-profilecrawl's People

Contributors

0rc0, alexroan, calpt, dependabot[bot], estebancortero, hyomin14, imansh77, jayphen, justdvl, kusw3, mschrader15, noiob, omarrr, prafulfillment, psh0502, raviriley, rtpharry, tcvieira, timgrossmann, timmoh, tranvansang, valentin0h


instagram-profilecrawl's Issues

Unable to locate element.

Here's my error. The script ran fine a few times but is now unable to locate this element.

[selenium.webdriver.remote.remote_connection] DEBUG: Finished Request {'sessionId': '202cae480ffc0f2e7f76665155ad5760', 'status': 7, 'value': {'message': 'no such element: Unable to locate element: {"method":"xpath","selector":"//a[contains(@Class, "_1cr2e _epyes")]"}\n (Session info: chrome=64.0.3282.140)\n (Driver info: chromedriver=2.35.528157 (4429ca2590d6988c0745c24c8858745aaaec01ef),platform=Mac OS X 10.12.6 x86_64)'}}

Read user's bio?

Is there a way to extract a user's bio text and write it into a file?

Followers list?

I used your tool, but it didn't output the followers list. I checked the settings file and there was nothing in there about followers either. But the repo description says the followers list is supported?!

Unable to locate element: class name "_mesn5"

Trying to scrape a public Instagram profile, I got this error:


Extracting information from ???
Traceback (most recent call last):
  File "crawl_profile.py", line 28, in <module>
    information = extract_information(browser, username)
  File "C:\Users\Stefan\git-repos\instagram-profilecrawl\util\extractor.py", line 89, in extract_information
    = get_user_info(browser)
  File "C:\Users\Stefan\git-repos\instagram-profilecrawl\util\extractor.py", line 11, in get_user_info
    container = browser.find_element_by_class_name('_mesn5')
  File "C:\Program Files\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 485, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "C:\Program Files\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 855, in find_element
    'value': value})['value']
  File "C:\Program Files\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 308, in execute
    self.error_handler.check_response(response)
  File "C:\Program Files\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"_mesn5"}
  (Session info: headless chrome=63.0.3239.132)
  (Driver info: chromedriver=2.35.528161 (5b82f2d2aae0ca24b877009200ced9065a772e73),platform=Windows NT 10.0.16299 x86_64)

Am I missing something? Why is it using headless chrome? I actually appreciate headless, but I read in another issue that it's apparently not yet supported?
Or is this an issue with Python 3.6?

Can't start running

I ran it using nohup and got this error:

/Users/phongyewtong/Desktop/InstaPy-master/chainingExample.py: line 3: syntax error near unexpected token `username='test','
/Users/phongyewtong/Desktop/InstaPy-master/chainingExample.py: line 3: `InstaPy(username='test', password='test')'

Could not get information from post...

Everything works great. After the script runs it creates the json file and outputs the profile information. However, when it starts crawling individual posts, although I can see the driver scanning the correct post, it fails to grab any information relevant to any post.

To troubleshoot, I added some print statements in the extract_post_info function to see if it was grabbing info. It's able to perform well up to here:

if len(imgs) >= 2:
    img = imgs[1].get_attribute('src')
    print(img)  # added print statement

After that, I tried adding some print statements here:

likes = likes.split(' ')

print("likes is: ", likes)  # my addition

# count the names if there is no number displayed
if len(likes) > 2:
    likes = len([word for word in likes if word not in ['and', 'like', 'this']])
    print(likes)  # my addition
else:
    likes = likes[0]
    likes = likes.replace(',', '').replace('.', '')
    likes = likes.replace('k', '00')

But nothing prints out. I assume this must be related to the function not returning anything, thus leading to the except NoSuchElementException: print('- Could not get information from post: ' + link)

What do you think could be wrong? I don't want to alter the code too much since I'm not particularly familiar with selenium.

Any help would be awesome!

Thanks!

selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally

InstaPy is running perfectly; however, I'm having trouble starting profilecrawl. Here's the output after trying to start it:

Traceback (most recent call last):
  File "crawl_profile.py", line 18, in <module>
    browser = webdriver.Chrome('./assets/chromedriver', chrome_options=chrome_options)
  File "/home/jwkoch/.local/lib/python2.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
    desired_capabilities=desired_capabilities)
  File "/home/jwkoch/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 98, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/home/jwkoch/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 188, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/jwkoch/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 256, in execute
    self.error_handler.check_response(response)
  File "/home/jwkoch/.local/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
  (Driver info: chromedriver=2.29.461571 (8a88bbe0775e2a23afda0ceaf2ef7ee74e822cc5),platform=Linux 4.4.0-83-generic x86_64)

You got any hints/ideas?

Instagram Update - Class Name Issue

Hello,

I keep getting an error when I run extractor.py, particularly on line 14. I think Instagram updated their class name, because this is the error I get after changing the error code in lines 301-304. I'm not sure how to find the correct class name. All help is greatly appreciated!

Message: no such element: Unable to locate element: {"method":"class name","selector":"v9tJq"}

Error Code I put in:

except Exception as e:
    print(e)
    print("\nError: Couldn't get user profile.\nTerminating")
    quit()

TypeError on save_profile_json()

Traceback (most recent call last):
  File "crawl_profile.py", line 33, in <module>
    Datasaver.save_profile_json(username, information)
TypeError: unbound method save_profile_json() must be called with Datasaver instance as first argument (got str instance instead)

I haven't made any changes to the code so far.

Incorrect followers/following value

The followers/following value is incorrect for IG accounts that have no decimal in their followers/following count.
(e.g. 630k followers becomes 63000 instead of 630000)

followers = infos[1].text.split(' ')[0].replace(',', '').replace('.', '')
followers = int(followers.replace('k', '00').replace('m', '00000'))

This can be corrected with the following lines of code:

followers = str(infos[1].text.split(' ')[0].replace(',', ''))
if followers.find('.') != -1:
  followers = followers.replace('.', '')
  followers = int(followers.replace('k', '00').replace('m', '00000'))
else:
  followers = int(followers.replace('k', '000').replace('m', '000000'))

following = str(infos[2].text.split(' ')[0].replace(',', ''))
if following.find('.') != -1:
  following = following.replace('.', '')
  following = int(following.replace('k', '00').replace('m', '00000'))
else:
  following = int(following.replace('k', '000').replace('m', '000000'))
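The two branches of the fix above (a 'k'/'m' suffix with a decimal point versus one without) can be folded into a single helper. This is an illustrative sketch, not code from the repo:

```python
def parse_count(text):
    """Convert Instagram-style counts such as '1,234', '1.2k' or '3m'
    into integers, handling both decimal and non-decimal forms."""
    text = text.replace(',', '')
    multiplier = 1
    if text.endswith('k'):
        multiplier, text = 1_000, text[:-1]
    elif text.endswith('m'):
        multiplier, text = 1_000_000, text[:-1]
    # float() handles '630' and '1.2' alike; multiply, then truncate.
    return int(float(text) * multiplier)

print(parse_count('630k'))    # 630000
print(parse_count('1.2k'))    # 1200
print(parse_count('1,234'))   # 1234
```

Multiplying by the suffix value instead of doing string replacement avoids the off-by-a-factor-of-ten bug entirely.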

Run using firefox on RPi 3

I'd like to run the crawler headless on my RPi3, ideally just using Firefox/geckodriver. Installing chrome on RPi is always kind of a mess. Is there a simple workaround? It should be possible with selenium, right?

Minor error: replacing k,m with '000'/'000000'

Inside the code block in utils/extractor.py, k is replaced with '00' but should be replaced with '000'; similarly, m is replaced with '00000' but should be replaced with '000000'.

  followers = int(followers.replace('k', '00').replace('m', '00000'))
  following = infos[2].text.split(' ')[0].replace(',', '').replace('.', '')
  following = int(following.replace('k', '00'))

'Service' object has no attribute 'process'

Hey guys, it's me again...

I'm trying to run it on a DigitalOcean Ubuntu droplet, but I get this error when I try to run the script. If someone has any tip, it would be helpful.

[image: screenshot of the error]

On the page it says to run Python 3.5, etc. Is that mandatory?

If that's the problem, I'm sorry, but I want to try it with the 2.7 version.

Thanks guys.

.... is not clickable at point (943, 933)

Hi folks,

just wanted to tell you about a problem I ran into and fixed:

I got this error:

Traceback (most recent call last):
  File "./crawl_profile.py", line 30, in <module>
    information = extract_information(browser, username)
  File "/home/pi/instagram/instagram-profilecrawl/util/extractor.py", line 102, in extract_information
    load_button.click()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 78, in click
    self._execute(Command.CLICK_ELEMENT)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 499, in _execute
    return self._parent.execute(command, params)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 297, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Element <a href="/freddyfrog/?max_id=1594149044930567909" class="_1cr2e _epyes">...</a> is not clickable at point (943, 933). Other element would receive the click: <div class="_8c4cy">...</div>
  (Session info: headless chrome=60.0.3112.113)
  (Driver info: chromedriver=2.29 (8e8216e581c512667203931f81c1a1ead47222e5),platform=Linux 4.9.50-v7+ armv7l)

Solution:
the problem is that the page does not seem to be fully loaded, so just go ahead and raise the sleep before this statement in utils/extractor.py:

      ...
      load_button = body_elem.find_element_by_xpath(
          '//a[contains(@class, "_1cr2e _epyes")]')
      body_elem.send_keys(Keys.END)
      sleep(3)   # <-- raised sleep so the page finishes loading
      load_button.click()
      ...

IndexError: list index out of range - Video post

Hi @timgrossmann

Thanks for your amazing work!!!
For some reason your app stops working when there are video posts. The error is:

img = imgs[1].get_attribute('src')
IndexError: list index out of range

Do you have an idea about this issue?
Regards,

Laurent

Cannot find Chrome Binary

Hello, I'm trying to run this with Python 3.5 and chromedriver 2.40, but I'm getting the following error:
Traceback (most recent call last):
  File "crawl_profile.py", line 21, in <module>
    browser = webdriver.Chrome('./assets/chromedriver', chrome_options=chrome_options)
  File "/home/psyco/.local/lib/python3.5/site-packages/selenium/webdriver/chrome/webdriver.py", line 75, in __init__
    desired_capabilities=desired_capabilities)
  File "/home/psyco/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 156, in __init__
    self.start_session(capabilities, browser_profile)
  File "/home/psyco/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 245, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/psyco/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 314, in execute
    self.error_handler.check_response(response)
  File "/home/psyco/.local/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary
  (Driver info: chromedriver=2.40.565383 (76257d1ab79276b2d53ee976b2c3e3b9f335cde7),platform=Linux 4.13.0-45-generic x86_64)

Does anyone know something I can try to fix it?

Empty Caption

Hello.
Lines 50 to 57 of the extractor.py file are where the post's caption is read, but it seems that with the new version of Instagram's HTML it doesn't work properly.
I fixed this issue; just replace all of the code in the try block with this line:
caption = post.find_element_by_class_name('gElp9').find_element_by_tag_name('span').text
Be lucky :-)

Can't work since yesterday

The script can't get information since yesterday. It seems that Instagram has changed the tag and class names on the HTML page?
Error message:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"_de9bg"}

absolute import instead of a relative one?

File "crawl_profile.py", line 5, in
from .settings import Settings
SystemError: Parent module '' not loaded, cannot perform relative import

I was running into this issue; I googled a bit and found a workaround,

so I changed line 5 to:

#!/usr/bin/env python3.5
"""Goes through all usernames and collects their information"""
import json
from util.settings import Settings

works for me

Logging In?

Sorry if this is spelled out in the documentation (or InstaPy), but I was wondering if there is a way to log in during the session. I'd like to get the likes of a friend's pics, but his profile is private. I could access it if I could figure out a way to log in during the headless session. Thanks!

extractor.py

please help!

File "crawl_profile.py", line 27, in <module>
    information = extract_information(browser, username)
  File "/home/kurozone/instagram-profilecrawl/util/extractor.py", line 127, in extract_information
    img, tags, likes, comments = extract_post_info(browser)
  File "/home/kurozone/instagram-profilecrawl/util/extractor.py", line 80, in extract_post_info
    return img, tags, int(likes), int(len(comments) - 1)
TypeError: object of type 'int' has no len()

fixed issue - comments problem

Sorry, I'm not great with GitHub, but there was a problem where comments = 0 on line 61 of extractor.py, and the code later tried to take the len of it, which caused:

 return img, tags, int(likes), int(len(comments) - 1)
TypeError: object of type 'int' has no len()

If you just change comments = 0 to comments = [], then it works fine.
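The reported TypeError and the one-line fix can be seen in isolation with a small sketch (variable names are illustrative, not tied to the repo):

```python
# Reproduce the bug: initializing comments as an int breaks len().
comments = 0
try:
    len(comments)              # raises: object of type 'int' has no len()
except TypeError:
    pass

# The fix: start with an empty list so len() is always valid.
comments = []
assert int(len(comments) - 1) == -1   # -1 when no comments were scraped
```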

WebDriverException: DevToolsActivePort file doesn't exist

I've installed Chrome on EC2 running the Amazon Linux AMI.
When running crawl_profile.py, a WebDriverException pops up and the script stops.
What's the problem and how do I fix it?
Thank you in advance.
Here's the error message:

Traceback (most recent call last):
  File "crawl_profile.py", line 21, in <module>
    browser = webdriver.Chrome('./assets/chromedriver', chrome_options=chrome_options)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 75, in __init__
    desired_capabilities=desired_capabilities)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 156, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 251, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: DevToolsActivePort file doesn't exist
  (Driver info: chromedriver=2.40.565383 (76257d1ab79276b2d53ee976b2c3e3b9f335cde7),platform=Linux 4.1.7-15.23.amzn1.x86_64 x86_64)

Broken after Instagram updated profiles.

The error message is suppressed, so I can't post anything beyond:

$python crawl_profile.py john
Waiting 10 sec
Extracting information from john

Error: Couldn't get user profile.
Terminating

TypeError: 'NoneType' object is not iterable

Getting the following error after scrolling the profile and scrapping the first link:

Traceback (most recent call last):
  File "crawl_profile.py", line 33, in <module>
    information, user_commented_list = extract_information(browser, username, limit_amount)
  File "/Users/kevinleahey/Git/instagram-profilecrawl/util/extractor.py", line 225, in extract_information
    caption, location_url, location_name, location_id, lat, lng, img, tags, likes, comments, date, user_commented_list = extract_post_info(browser)
TypeError: 'NoneType' object is not iterable

I looked at the extract_post_info method, but nothing stuck out to me. Any thoughts?

The script doesn't get past the posts and shows an error

root@ubuntu:/home/aria/instagram-profilecrawl# python3.5 crawl_profile.py sabaasafari
Extracting information from sabaasafari
BEFORE IMG
- Could not get information from post: https://www.instagram.com/p/BWc05WaHn8u/?taken-by=sabaasafari
BEFORE IMG
	- Could not get information from post: https://www.instagram.com/p/BWYS2inHYsN/?taken-by=sabaasafari
BEFORE IMG
- Could not get information from post: https://www.instagram.com/p/BWUkEu5HAEe/?taken-by=sabaasafari
BEFORE IMG
- Could not get information from post: https://www.instagram.com/p/BWTDPuxnG79/?taken-by=sabaasafari

Can you test this and see if it happens on your side as well? If yes, can you provide a fix?

Bug report

I got this error. I must be the most annoying user of your script, but I can't figure out what I did wrong.

Traceback (most recent call last):
  File "crawl_profile.py", line 27, in <module>
    information = extract_information(browser, username)
  File "/Users/Desktop/instagram-profilecrawl/util/extractor.py", line 117, in extract_information
    img, tags, likes, comments = extract_post_info(browser)
  File "/Users/Desktop/instagram-profilecrawl/util/extractor.py", line 34, in extract_post_info
    img = imgs[1].get_attribute('src')
IndexError: list index out of range
Thanks for the support.

WebDriver Issue

Excuse me, I've updated the webdriver to the newest Chrome version,
but the same error still happens. How do I fix it?

Traceback (most recent call last):
  File "/Users/edward/instagram-profilecrawl/crawl_profile.py", line 11, in <module>
    browser = webdriver.Chrome('./assets/chromedriver')
  File "/Library/Python/2.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 62, in __init__
    self.service.start()
  File "/Library/Python/2.7/site-packages/selenium/webdriver/common/service.py", line 81, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

Comments not exceeding 25?

The number of comments is stuck at 25 for me, like so:

"likes": 1029020,
"comments": 25

Can anyone help me to fix this?

Couldn't get user profile.

Hello,

Now I am running into errors straight away.

When I execute the script, I get the following output:

Couldn't get user profile.

Anyone any idea?

Caption and location information

I have modified my local repo to extract caption, location name, and location url for each post. I would love to contribute if this can be considered as an enhancement.

Terminate

I'm encountering
"bio:
Error: Couldn't get user profile.
Terminating"
How can I solve this?
I will be grateful for any help you can provide.

Scrolling Profile no stop

Good morning, people.

I started using instagram-profilecrawl, but there's a problem.

It does not load all the posts.

The message at the prompt is:

...
Scrolling profile 324/380
Scrolling profile 336/380
Scrolling profile 348/380
Scrolling profile 360/380
Scrolling profile 372/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380
Scrolling profile 379/380

I already tried increasing the sleep in extractor.py, but it didn't work.

error code 127, selenium and chromedriver issues on ubuntu 17.04

Hi, this is what I'm getting (tested on my DO droplet and my local desktop Ubuntu), both with the same message:

root@ubuntu:/home/aria/instagram-profilecrawl# python3.5 crawl_profile.py ashishegaran
Traceback (most recent call last):
  File "crawl_profile.py", line 17, in <module>
    browser = webdriver.Chrome('./assets/chromedriver', chrome_options=chrome_options)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/chrome/webdriver.py", line 62, in __init__
    self.service.start()
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/common/service.py", line 96, in start
    self.assert_process_still_running()
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/common/service.py", line 109, in assert_process_still_running
    % (self.path, return_code)
selenium.common.exceptions.WebDriverException: Message: Service ./assets/chromedriver unexpectedly exited. Status code was: 127

First it said this:

root@ubuntu:/home/aria/instagram-profilecrawl# python3.5 crawl_profile.py ashishegaran
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/common/service.py", line 74, in start
    stdout=self.log_file, stderr=self.log_file)
  File "/usr/lib/python3.5/subprocess.py", line 676, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.5/subprocess.py", line 1282, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: './assets/chromedriver'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "crawl_profile.py", line 17, in <module>
    browser = webdriver.Chrome('./assets/chromedriver', chrome_options=chrome_options)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/chrome/webdriver.py", line 62, in __init__
    self.service.start()
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/common/service.py", line 81, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home


It seemed the script needs an "assets" folder, which wasn't there, so I created it and got back to the first error. I don't know what is happening or why it shows this; I installed selenium with both pip and pip3, Google Chrome is installed, and so is the x64 Linux chromedriver.

unknown error: call function result missing 'value'

Traceback (most recent call last):
  File "C:/Users/kk703.DESKTOP-J939SLP/PycharmProjects/POM/TestScripts/Login_Test.py", line 7, in <module>
    loginPage.login("admin","manager")
  File "C:\Users\kk703.DESKTOP-J939SLP\PycharmProjects\POM\PageClasses\LoginPage.py", line 11, in login
    self.__username.send_keys (user_name)
  File "C:\Users\kk703.DESKTOP-J939SLP\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 479, in send_keys
    'value': keys_to_typing(value)})
  File "C:\Users\kk703.DESKTOP-J939SLP\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 628, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\kk703.DESKTOP-J939SLP\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 312, in execute
    self.error_handler.check_response(response)
  File "C:\Users\kk703.DESKTOP-J939SLP\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: call function result missing 'value'
(Session info: chrome=65.0.3325.181)
(Driver info: chromedriver=2.33.506120 (e3e53437346286c0bc2d2dc9aa4915ba81d9023f),platform=Windows NT 10.0.16299 x86_64)

Process finished with exit code 1
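The `(Session info: chrome=65.0.3325.181)` / `(Driver info: chromedriver=2.33.506120 ...)` lines point at the usual cause of `call function result missing 'value'`: a chromedriver build older than the installed Chrome. chromedriver 2.33 predates Chrome 65, and upgrading the driver to a release that supports Chrome 65 typically clears this error. A tiny helper (illustrative only, not part of the project) to pull the version numbers out of those traceback lines:

```python
def major(version_string):
    """Extract the leading version component from strings such as
    'chrome=65.0.3325.181' or 'chromedriver=2.33.506120 (e3e5...)'."""
    num = version_string.split("=", 1)[-1].split(" ", 1)[0]
    return num.split(".")[0]

# From the traceback above: Chrome 65 paired with a 2.33 chromedriver,
# which was released before Chrome 65 existed.
print(major("chrome=65.0.3325.181"))      # 65
print(major("chromedriver=2.33.506120"))  # 2
```

The practical fix is simply to download a newer chromedriver into `instagram-profilecrawl/assets`, matching the installed Chrome version.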

Having a problem crawling a complete profile

root@ubuntu:/home/aria/instagram-profilecrawl# ./crawl_profile.py behzadshishegaran
Extracting information from behzadshishegaran
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
BEFORE IMG
Traceback (most recent call last):
  File "./crawl_profile.py", line 27, in <module>
    information = extract_information(browser, username)
  File "/home/aria/instagram-profilecrawl/util/extractor.py", line 128, in extract_information
    img, tags, likes, comments = extract_post_info(browser)
  File "/home/aria/instagram-profilecrawl/util/extractor.py", line 68, in extract_post_info
    while (comments[1].text == 'load more comments'):
IndexError: list index out of range

The script quits without any further detail on what happened. What's the problem here?
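The traceback shows `util/extractor.py` indexing `comments[1]` without checking the list length, so a post whose page exposes fewer than two comment elements raises `IndexError`. A hedged sketch of a guarded version of that loop (the names `Stub` and `click_load_more` are illustrative, not the project's own):

```python
# Defensive sketch of the loop at util/extractor.py line 68: guard the
# index before dereferencing, so posts with no (or one) comment element
# return cleanly instead of crashing the whole crawl.

class Stub:
    """Stands in for a selenium WebElement in this sketch."""
    def __init__(self, text):
        self.text = text
    def click(self):
        pass

def click_load_more(comments):
    """Click 'load more comments' while it is present; safe on short lists."""
    clicks = 0
    while len(comments) > 1 and comments[1].text == 'load more comments':
        comments[1].click()
        clicks += 1
        comments = comments[2:]  # the real extractor would re-read the page here
    return clicks

print(click_load_more([]))  # 0, instead of IndexError
```

Wrapping the vulnerable indexing in a length check (or a `try`/`except IndexError`) lets the crawler skip comment-less posts and continue with the rest of the profile.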

We can't use pyvirtualdisplay on Windows

Hi,
On Windows we have a problem with the Display function (from pyvirtualdisplay import Display): pyvirtualdisplay can't be used on Windows at all.
It is just a wrapper that calls Xvfb, a headless display server for the X Window System, and Windows does not use the X Window System.

How can I use this on Windows?
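One workaround worth noting: since pyvirtualdisplay exists only to hide the browser window via Xvfb, on Windows the same effect can be had from Chrome's own headless mode, with no virtual display at all. A hedged sketch of the platform split (assumed names; the script's actual option handling may differ, and the commented lines use the selenium 3.x `chrome_options` keyword seen in the tracebacks above):

```python
import sys

def needs_virtual_display():
    """pyvirtualdisplay just wraps Xvfb, an X Window System display
    server, so it only makes sense on non-Windows platforms."""
    return not sys.platform.startswith("win")

# Sketch of a cross-platform setup:
#
#   if needs_virtual_display():
#       from pyvirtualdisplay import Display      # Linux path
#       Display(visible=0, size=(800, 600)).start()
#   else:
#       chrome_options.add_argument("--headless")  # Chrome renders off-screen itself
#       chrome_options.add_argument("--disable-gpu")

print(needs_virtual_display())
```

Guarding the `from pyvirtualdisplay import Display` line behind such a check would let the same script run on Windows without the Xvfb dependency.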
