
Scrapes the front end of Facebook pages with no limitations and provides a feature to turn the data into structured JSON or CSV

Home Page: https://pypi.org/project/facebook-page-scraper/

License: MIT License

Python 100.00%
facebook-scraper facebook-page facebook-page-scraper facebook-page-post web-scraping web-scraper facebook csv python selenium scraper facebook-apis social-media fb-scrapper fb facebook-page-post-scraper open-source hacktoberfest

facebook_page_scraper's Introduction

Facebook Page Scraper

Maintenance PyPI license Python >=3.6.9

No API key needed, no limit on the number of requests. Import the library and just do it!

Table of Contents

  1. Getting Started
  2. Usage
  3. Tech
  4. License

Prerequisites

  • Internet Connection
  • Python 3.7+
  • Chrome or Firefox browser installed on your machine

Installation:

Installing from source:

git clone https://github.com/shaikhsajid1111/facebook_page_scraper

Inside the project's directory, run:

python3 setup.py install

Installing from PyPI:

pip3 install facebook-page-scraper


How to use?

#import the Facebook_scraper class from facebook_page_scraper
import os

from facebook_page_scraper import Facebook_scraper

#instantiate the Facebook_scraper class

page_or_group_name = "Meta"
posts_count = 10
browser = "firefox"
proxy = "IP:PORT" #if the proxy requires authentication, use user:password@IP:PORT
timeout = 600 #in seconds
headless = True
#read login credentials from environment variables (recommended over hard-coding)
fb_password = os.getenv('fb_password')
fb_email = os.getenv('fb_email')
#indicates whether the Facebook target is a group or a page
isGroup = False
#pass username=fb_email, password=fb_password as well to scrape while logged in
meta_ai = Facebook_scraper(page_or_group_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless, isGroup=isGroup)

Parameters for the Facebook_scraper(page_or_group_name, posts_count, browser, proxy, timeout, headless, isGroup, username, password) class

| Parameter Name | Parameter Type | Description |
| --- | --- | --- |
| page_or_group_name | String | Name of the Facebook page or group |
| posts_count | Integer | Number of posts to scrape; if not passed, the default is 10 |
| browser | String | Which browser to use, either "chrome" or "firefox"; if not passed, the default is "chrome" |
| proxy (optional) | String | Proxy to route requests through; if the proxy requires authentication, the format is user:password@IP:PORT |
| timeout | Integer | Maximum amount of time the bot should run for; if not passed, the default is 600 seconds (10 minutes) |
| headless | Boolean | Whether to run the browser in headless mode; default is True |
| isGroup | Boolean | Whether the Facebook target is a group or a page; default is False |
| username | String | Username to log into Facebook when scraping (recommended: load from a .env file) |
| password | String | Password to log into Facebook when scraping (recommended: load from a .env file) |



⚠️ Warning: Use Logged-In Scraping at Your Own Risk ⚠️

Using logged-in scraping methods may result in the permanent suspension of your account. Proceed with caution, as violating a platform's terms of service can lead to severe consequences. Exercise discretion and adhere to ethical practices when collecting data through scraping. The library/provider assumes no responsibility for any consequences resulting from the misuse of scraping methods.

Done with instantiation? Let the scraping begin!


For posts' data in JSON format:

#call the scrap_to_json() method

json_data = meta_ai.scrap_to_json()
print(json_data)

Output:

{
  "2024182624425347": {
    "name": "Meta AI",
    "shares": 0,
    "reactions": {
      "likes": 154,
      "loves": 19,
      "wow": 0,
      "cares": 0,
      "sad": 0,
      "angry": 0,
      "haha": 0
    },
    "reaction_count": 173,
    "comments": 2,
    "content": "We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code:  https://ai.facebook.com/…/the-first-high-performance-self-s…",
    "posted_on": "2022-01-20T22:43:35",
    "video": [],
    "image": [
      "https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71"
    ],
    "post_url": "https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARBoSaQ-pAC_ApucZNHZ6R-BI3YUSjH4sXsfdZRQ2zZFOwgWGhjt6dmg0VOcmGCLhSFyXpecOY9g1A94vrzU_T-GtYFagqDkJjHuhoyPW2vnkn7fvfzx-ql7fsBYxL5DgQVSsiC1cPoycdCvHmi6BV5Sc4fKADdgDhdFvVvr-ttzXG1ng2DbLzU-XfSes7SAnrPs-gxjODPKJ7AdqkqkSQJ4HrsLgxMgcLFdCsE6feWL7rXjptVWegMVMthhJNVqO0JHu986XBfKKqB60aBFvyAzTSEwJD6o72GtnyzQ-BcH7JxmLtb2_A&__tn__=-R"
  }, ...

}
Output Structure for JSON format:
{
    "id": {
        "name": string,
        "shares": integer,
        "reactions": {
            "likes": integer,
            "loves": integer,
            "wow": integer,
            "cares": integer,
            "sad": integer,
            "angry": integer,
            "haha": integer
        },
        "reaction_count": integer,
        "comments": integer,
        "content": string,
        "video" : list,
        "image" : list,
        "posted_on": datetime,  //string containing datetime in ISO 8601
        "post_url": string
    }
}
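Note that the top-level keys of this structure are the post IDs themselves. A minimal sketch of walking the result (the `json_data` string below is a shortened stand-in for the real output of `scrap_to_json()`):

```python
import json

# stand-in string shaped like scrap_to_json() output (sample values, not real data)
json_data = '{"2024182624425347": {"name": "Meta AI", "shares": 0, "comments": 2}}'

posts = json.loads(json_data)        # dict keyed by post id
for post_id, post in posts.items():  # each value holds one post's fields
    print(post_id, post["name"], post["comments"])
```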



For saving posts' data directly to a CSV file:

#call the scrap_to_csv(filename, directory) method


filename = "data_file"  #file name without the .csv extension, where data will be saved
directory = r"E:\data"  #directory where the CSV file will be saved (raw string avoids backslash escapes)
meta_ai.scrap_to_csv(filename, directory)

content of data_file.csv:

id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,comments,content,posted_on,video,image,post_url
2024182624425347,Meta AI,0,154,19,0,0,0,0,0,173,2,"We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code:  https://ai.facebook.com/…/the-first-high-performance-self-s…",2022-01-20T22:43:35,,https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71,https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARAse4eiZmZQDOZumNZEDR0tQkE5B6g50K6S66JJPccb-KaWJWg6Yz4v19BQFSZRMd04MeBmV24VqvqMB3oyjAwMDJUtpmgkMiITtSP8HOgy8QEx_vFlq1j-UEImZkzeEgSAJYINndnR5aSQn0GUwL54L3x2BsxEqL1lElL7SnHfTVvIFUDyNfAqUWIsXrkI8X5KjoDchUj7aHRga1HB5EE0x60dZcHogUMb1sJDRmKCcx8xisRgk5XzdZKCQDDdEkUqN-Ch9_NYTMtxlchz1KfR0w9wRt8y9l7E7BNhfLrmm4qyxo-ZpA&__tn__=-R
...
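Since the output is a plain CSV with the header row shown above, it can be read back with the standard library alone; a sketch, using an in-memory stand-in for the real file (only a few of the actual columns are shown):

```python
import csv
import io

# stand-in for open("data_file.csv"); the header mirrors the layout shown above
sample = io.StringIO(
    "id,name,shares,likes,comments\n"
    "2024182624425347,Meta AI,0,154,2\n"
)

rows = list(csv.DictReader(sample))  # one dict per post, keyed by the header row
for row in rows:
    print(row["id"], row["name"], row["likes"])
```

Keep in mind that `csv.DictReader` yields every field as a string, so counts like `likes` need an explicit `int(...)` conversion before arithmetic.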



Parameters for the scrap_to_csv(filename, directory) method.

| Parameter Name | Parameter Type | Description |
| --- | --- | --- |
| filename | String | Name of the CSV file where posts' data will be saved |
| directory | String | Directory where the CSV file has to be stored |



Keys of the outputs:

| Key | Type | Description |
| --- | --- | --- |
| id | String | Post identifier (an integer cast to a string) |
| name | String | Name of the page |
| shares | Integer | Share count of the post |
| reactions | Dictionary | Dictionary with reaction names as keys and their counts as values. Keys => ["likes", "loves", "wow", "cares", "sad", "angry", "haha"] |
| reaction_count | Integer | Total reaction count of the post |
| comments | Integer | Comment count of the post |
| content | String | Content of the post as text |
| video | List | URLs of videos present in the post |
| image | List | URLs of all images present in the post |
| posted_on | Datetime | Time at which the post was posted (string in ISO 8601 format) |
| post_url | String | URL of the post |


Tech

This project uses different libraries to work properly.



If you encounter anything unusual, please feel free to create an issue here

LICENSE

MIT

facebook_page_scraper's People

Contributors

edmond7450, lrudolph333, peter279k, shaikhsajid1111


facebook_page_scraper's Issues

Facebook Login page popup, --- facebook_page_scraper.driver_utilities - CRITICAL - No posts were found!

Here is my result running this code:

[WDM] - Current google-chrome version is 108.0.5359
[WDM] - Get LATEST driver version for 108.0.5359

[WDM] - Driver [C:\Users\Zoey.wdm\drivers\chromedriver\win32\108.0.5359.71\chromedriver.exe] found in cache
2023-01-06 22:06:59,049 - facebook_page_scraper.driver_utilities - CRITICAL - No posts were found!

--
I use a US proxy in the code.

After the first three lines of output, the Facebook login page pops up in Chrome. Then, after a few seconds of timeout, it reports that no posts were found.

No results are obtained in new facebook page template

Hello, on pages that have migrated to the new template, it is not possible to retrieve the posts.
On pages that keep the old template, it works without problems.

Do you plan to support the new Facebook template in the future?

No posts were found - with newest version

Hey,

I get this error:
2022-11-07 20:10:22,447 - facebook_page_scraper.driver_utilities - CRITICAL - No posts were found!

running: python3 posts.py

posts.py file (unchanged from readme.md suggestion):
#import Facebook_scraper class from facebook_page_scraper
from facebook_page_scraper import Facebook_scraper

#instantiate the Facebook_scraper class

page_name = "metaai"
posts_count = 10
browser = "chrome"
proxy = "IP:PORT" #if proxy requires authentication then user:password@IP:PORT
timeout = 600 #600 seconds
headless = True
meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)

json_data = meta_ai.scrap_to_json()
print(json_data)

I tried the master and 4.x branches with the same result. I checked the code, and I can't find that CSS selector on Facebook myself either, so will this work, or did Facebook change everything? Thanks

At Import: AttributeError: module 'OpenSSL.SSL' has no attribute 'SSLv2_METHOD'

When trying to import the module, I get the following:

>>> from facebook_page_scraper import Facebook_scraper as fbscrape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/USER/.local/lib/python3.9/site-packages/facebook_page_scraper/__init__.py", line 1, in <module>
    from .driver_initialization import Initializer
  File "/home/USER/.local/lib/python3.9/site-packages/facebook_page_scraper/driver_initialization.py", line 3, in <module>
    from seleniumwire import webdriver
  File "/home/USER/.local/lib/python3.9/site-packages/seleniumwire/webdriver.py", line 13, in <module>
    from seleniumwire import backend
  File "/home/USER/.local/lib/python3.9/site-packages/seleniumwire/backend.py", line 4, in <module>
    from seleniumwire.server import MitmProxy
  File "/home/USER/.local/lib/python3.9/site-packages/seleniumwire/server.py", line 4, in <module>
    from seleniumwire.handler import InterceptRequestHandler
  File "/home/USER/.local/lib/python3.9/site-packages/seleniumwire/handler.py", line 5, in <module>
    from seleniumwire import har
  File "/home/USER/.local/lib/python3.9/site-packages/seleniumwire/har.py", line 11, in <module>
    from seleniumwire.thirdparty.mitmproxy import connections
  File "/home/USER/.local/lib/python3.9/site-packages/seleniumwire/thirdparty/mitmproxy/connections.py", line 9, in <module>
    from seleniumwire.thirdparty.mitmproxy.net import tls, tcp
  File "/home/USER/.local/lib/python3.9/site-packages/seleniumwire/thirdparty/mitmproxy/net/tls.py", line 43, in <module>
    "SSLv2": (SSL.SSLv2_METHOD, BASIC_OPTIONS),
AttributeError: module 'OpenSSL.SSL' has no attribute 'SSLv2_METHOD'

Is this a known issue? Searching the internet, I didn't find any useful solutions...

Thanks in advance

Only likes and loves are scraped properly

Hi,

Great tool, congrats! I am using the following code:

from facebook_page_scraper import Facebook_scraper
import os
import re  # needed by the init_msg_handler lambda below
import stem.process
SOCKS_PORT = 9050
TOR_PATH = os.path.normpath(os.getcwd()+"\\Tor\\tor\\tor.exe")
tor_process = stem.process.launch_tor_with_config(
  config = {
    'SocksPort': str(SOCKS_PORT),
  },
  init_msg_handler = lambda line: print(line) if re.search('Bootstrapped', line) else False,
  tor_cmd = TOR_PATH
)
page_name = "metaai"
posts_count = 10
browser = "firefox"
proxy = "socks5://127.0.0.1:9050"
timeout = 600
headless = False
meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)
json_data = meta_ai.scrap_to_json()
print(json_data)

The numbers of likes and loves are correct; shares and the other reactions seem to always be zero, and for the number of comments I am getting a different (lower) number.

Same without proxy.

I am using the tool from Europe with language set to English(UK), although I am not sure about the correct way to select language without using authentication.


I would appreciate any advice you may have for me.

I'm not getting any reactions

Hello,
I am having this problem when I pull information from a page: in all cases the reactions are all zero. Do you have any advice about this issue?

Thanks for all your work!

Update number of reactions and comments

Hello, we all know that the number of reactions and comments changes every day. Does facebook_page_scraper offer a way to update them?
And my second question: can we scrape comments as text together with the name of the user who commented?

thanks

Does not scrape reactions

Hi I tried using your scraper,

However it does not seem to accurately scrape the reactions (the emoticons).

It does show up as a key, but the value is just always zero, where on the Facebook it does have the reactions.

Scraping posts by date

My question is: is it possible to scrape posts posted on a given date, for example posts posted on 24/04/2023?
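The library has no date-filter parameter, but since each post carries an ISO 8601 `posted_on` value, the JSON output can be filtered after scraping; a sketch with a made-up stand-in for the scraper's output:

```python
import json
from datetime import date, datetime

# stand-in for scrap_to_json() output; only the posted_on field matters here
json_data = (
    '{"1": {"posted_on": "2023-04-24T10:00:00"},'
    ' "2": {"posted_on": "2023-04-25T09:30:00"}}'
)

target = date(2023, 4, 24)
posts = json.loads(json_data)
# keep only posts whose posted_on date matches the target day
matching = {
    pid: post
    for pid, post in posts.items()
    if datetime.fromisoformat(post["posted_on"]).date() == target
}
print(list(matching))  # prints ['1']
```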

TypeError: __init__() got an unexpected keyword argument 'proxy'

Hi guys, im trying to work with a proxy.
And i get this error:
TypeError: __init__() got an unexpected keyword argument 'proxy'

my setting:

page_name = "FacebookAI"
posts_count = 25
browser = "chrome"
proxy = "proxyIP:9999" #if proxy requires authentication then user:password@IP:PORT
facebook_ai = Facebook_scraper(page_name,posts_count,browser,proxy=proxy)

Script scrapes incomplete Facebook posts

When I scrape longer messages, the script scrapes only the visible part of the post and not what is hidden under "see more". The resulting message always ends with the string 'see more'.

Getting Data of META AI

I am changing the page name, but I am still getting data for Meta AI. Could you please resolve this issue?

Scraping multiple page_names

I want to scrape multiple pages, so I define a list of page names in a variable

page_names = ["pagename1", "pagename2", "pagename3"]

and then I iterate Facebook_scraper over the page_names with this code

results = []
posts_count = 2
browser = "firefox"
proxy = "IP:PORT" #if proxy requires authentication then user:password@IP:PORT
timeout = 600 #600 seconds
headless = True

for name in page_names:
    meta_ai = Facebook_scraper(name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)
    json_data = meta_ai.scrap_to_json()
    results.append(json_data)
print(results)

The first iteration was successful, but each subsequent iteration still scrapes the first page_name. Is this because the cache is not deleted between iterations, or is there another approach to reach my objective?

README.MD FileNotFound on install

When installing 0.1.8 either through pip3 or setup.py it fails with a FileNotFound on "README.MD"

Workaround - rename README.md to README.MD.

delete first json's element

The result I get after scraping is this:
[{"1766004853797710": {"username": "2M.ma", "shares": 0, "likecount": 101, "replycount": 0,
I want to remove the key, which in this case is 1766004853797710, and instead include it as a field, i.e. "id_post": 1766004853797710, next to username, likecount, etc.
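One way to get that shape is to lift each top-level key into an explicit `id_post` field after parsing; a sketch, using a shortened stand-in for the scraper's output shown above:

```python
import json

# stand-in for the output above (a list wrapping one dict keyed by post id)
json_data = '[{"1766004853797710": {"username": "2M.ma", "shares": 0, "likecount": 101}}]'

records = []
for chunk in json.loads(json_data):
    for post_id, fields in chunk.items():
        # turn the key into an id_post field alongside the other fields
        records.append({"id_post": int(post_id), **fields})
print(records)
```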

AttributeError: module 'OpenSSL.SSL' has no attribute 'SSLv2_METHOD'. Did you mean: 'SSLv23_METHOD'?

Hello,

When I try to import :

from facebook_page_scraper import Facebook_scraper

I get the following error :

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mpl/Desktop/FaceBookPagesScraper/facebook_page_scraper/facebook_page_scraper/__init__.py", line 1, in <module>
    from .driver_initialization import Initializer
  File "/Users/mpl/Desktop/FaceBookPagesScraper/facebook_page_scraper/facebook_page_scraper/driver_initialization.py", line 3, in <module>
    from seleniumwire import webdriver
  File "/opt/homebrew/lib/python3.10/site-packages/selenium_wire-4.3.1-py3.10.egg/seleniumwire/webdriver.py", line 13, in <module>
    from seleniumwire import backend
  File "/opt/homebrew/lib/python3.10/site-packages/selenium_wire-4.3.1-py3.10.egg/seleniumwire/backend.py", line 4, in <module>
    from seleniumwire.server import MitmProxy
  File "/opt/homebrew/lib/python3.10/site-packages/selenium_wire-4.3.1-py3.10.egg/seleniumwire/server.py", line 4, in <module>
    from seleniumwire.handler import InterceptRequestHandler
  File "/opt/homebrew/lib/python3.10/site-packages/selenium_wire-4.3.1-py3.10.egg/seleniumwire/handler.py", line 5, in <module>
    from seleniumwire import har
  File "/opt/homebrew/lib/python3.10/site-packages/selenium_wire-4.3.1-py3.10.egg/seleniumwire/har.py", line 11, in <module>
    from seleniumwire.thirdparty.mitmproxy import connections
  File "/opt/homebrew/lib/python3.10/site-packages/selenium_wire-4.3.1-py3.10.egg/seleniumwire/thirdparty/mitmproxy/connections.py", line 9, in <module>
    from seleniumwire.thirdparty.mitmproxy.net import tls, tcp
  File "/opt/homebrew/lib/python3.10/site-packages/selenium_wire-4.3.1-py3.10.egg/seleniumwire/thirdparty/mitmproxy/net/tls.py", line 43, in <module>
    "SSLv2": (SSL.SSLv2_METHOD, BASIC_OPTIONS),
AttributeError: module 'OpenSSL.SSL' has no attribute 'SSLv2_METHOD'. Did you mean: 'SSLv23_METHOD'?

I have tried downgrading to PyOpenSSL==22.0.0.0, but it didn't resolve the issue.

issue with geckodriver when running inside a dockerized python app

I work with a FastAPI application. This worked for me locally, but when I tried to dockerize the app I got this exception:

  raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command li

It seems to be an exception related to the path of geckodriver.

Allow providing driver's executable_path explicitly without going through .install() method

I have a funny env: WSLv1 Linux on Windows. Python runs inside Linux emulation, while Chrome and chromedriver.exe are running on Windows. I have a symlink /usr/bin/chromedriver pointing to chromedriver.exe. This all works out well, but the automatic driver installer may get confused. So having an option to specify driver's executable_path explicitly to Facebook_scraper instance would be nice! Thanks! (when I patch Initializer manually, all works well!)

Error message when attempting to install-- Please Help a Qualitative Graduate Researcher

Hello,

I am new to coding, especially using Python. For my dissertation, I want to pull Tweets and Facebook page data from various organizations. I had no issues eventually figuring out how to install and run twitter-scraper, but I need help getting the Facebook scraper to install. Every time I run pip install facebook-scraper or pip install git+https://github.com/kevinzg/facebook-scraper.git, I get an error message about either an invalid syntax error or no parent package being present once I attempt to address the code. At one point, I was able to run # !pip install git+https://github.com/kevinzg/facebook-scraper.git, which did not result in an error code but also didn't install anything. This is the code I used to install the Twitter scraper, so I thought it was worth a shot. I am using the latest (free) versions of Python and PyCharm on Mac.

Thanks in advance for any insight!

Implement login

I'm running into the problem that I quickly hit a login wall and therefore can't scrape much more.
Is it possible to implement a login function? I.e., something like

facebook.scrap_to_json(credentials = {email: email, pass: pass})
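For what it's worth, the parameter table earlier in this README lists username and password keyword arguments on the constructor, so a login can be attempted that way (assuming the installed version supports those keywords; mind the account-suspension warning above). A sketch that only assembles the arguments, where scraper_kwargs is a hypothetical helper, not part of the library:

```python
import os

# hypothetical helper: build the positional and keyword arguments for
# Facebook_scraper, attaching credentials from the environment when present
def scraper_kwargs(page, count=10, browser="firefox"):
    kwargs = {"timeout": 600, "headless": True, "isGroup": False}
    email = os.getenv("fb_email")
    password = os.getenv("fb_password")
    if email and password:  # only pass credentials when both are set
        kwargs.update(username=email, password=password)
    return (page, count, browser), kwargs

args, kwargs = scraper_kwargs("Meta")
# Facebook_scraper(*args, **kwargs).scrap_to_json() would then scrape logged in
```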

About parsing the json file

Hi,
I am testing your nice project... After getting the JSON file, I am wondering what the "key" of the whole JSON file is, i.e. the key of the values that hold the collected data, because I want to parse it in a Flutter app.

For example, what I mean by "key": in the following JSON file, the key is "items":

{ "items": [ { "id": "p1", "name": "Item 1", "description": "Description 1" }, { "id": "p2", "name": "Item 2", "description": "Description 2" }, { "id": "p3", "name": "Item 3", "description": "Description 3" } ] }

I hope you understand my request, and thank you in advance.

"posted_on": "Failed to fetch!"

I’m getting this error "posted_on": "Failed to fetch!" on specific posts. Other posts in the same feed are fine. Any advice?

selenium.common.exceptions.ElementClickInterceptedException: Message: Element is not clickable at point ([x],[y]) because another element <div class=> obscures it

I am using Firefox as a Browser.

When trying to connect to the Facebook page, I sometimes face the error mentioned in the issue title.

This IS related to the cookie banner. However, searching the internet, I found the following link: https://proxyway.com/guides/how-to-scrape-facebook which gives some advice about adding code to driver_utilities.py.

The weird part is that adding the following code to the .py module helps, but only sometimes:

allow_span = driver.find_element(
    By.XPATH, '//div[contains(@aria-label, "Allow")]/../following-sibling::div')
allow_span.click()

I'm not sure if someone can reproduce this weird behavior.

error at find_elements method : local variable 'status_link' referenced before assignment

I just started getting this error message today and can't find a way around it. I need the data for my research. Any help will be really appreciated.

I ran the code below
#import Facebook_scraper class from facebook_page_scraper
from facebook_page_scraper import Facebook_scraper

#instantiate the Facebook_scraper class

page_name = "metaai"
posts_count = 10
browser = "firefox"
proxy = "IP:PORT" #if proxy requires authentication then user:password@IP:PORT
timeout = 600 #600 seconds
headless = True
meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout)

Running the library in Linux Debian 8.11

I have a server built with Debian 8.11 (jessie). When I try to run the code, I get this error:

[WDM] - Driver [/home/cucakrowo/.wdm/drivers/geckodriver/linux64/v0.32.1/geckodriver] found in cache
2023-02-06 18:20:25,350 - facebook_page_scraper.scraper - ERROR - Error at scrap_to_csv : Message: Process unexpectedly closed with status 1
Traceback (most recent call last):
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/facebook_page_scraper/scraper.py", line 151, in scrap_to_csv
    data = self.scrap_to_json()  # get the data in JSON format from the same class method
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/facebook_page_scraper/scraper.py", line 80, in scrap_to_json
    self.__start_driver()
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/facebook_page_scraper/scraper.py", line 60, in __start_driver
    self.browser, self.proxy, self.headless, self.browser_profile).init()
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/facebook_page_scraper/driver_initialization.py", line 90, in init
    driver = self.set_driver_for_browser(self.browser_name)
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/facebook_page_scraper/driver_initialization.py", line 83, in set_driver_for_browser
    return webdriver.Firefox(executable_path=GeckoDriverManager().install(), options=self.set_properties(browser_option))
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/seleniumwire/webdriver.py", line 75, in __init__
    super().__init__(*args, **kwargs)
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/selenium/webdriver/firefox/webdriver.py", line 183, in __init__
    keep_alive=True)
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 268, in __init__
    self.start_session(capabilities, browser_profile)
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 359, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "/home/cucakrowo/.local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status 1

Is this because the library is not compatible with my OS, or is there some configuration that I have to set?

The csv does not reflect the required number of posts.

Hello, first of all, thank you for this very useful code :)
I tried to run it requesting 100 posts instead of 10:
posts_count = 100
Everything runs perfectly, but when I open the CSV only 5 to 10 posts appear, seemingly at random. I have run it several times and the result varies (the same posts but a different total number each time), but it never reaches the 100 requested.

ModuleNotFoundError: No module named 'facebook_page_scraper'

Hello, I followed each of the described steps but I get this error. I don't know what I am doing wrong; any suggestions?
I am sharing the file that I run:
from facebook_page_scraper import Facebook_scraper

#instantiate the Facebook_scraper class

page_name = "Turismogsm"
posts_count = 2
browser = "chrome"
proxy = "IP:PORT" #if proxy requires authentication then user:password@IP:PORT
timeout = 600 #600 seconds
headless = True
meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)

json_data = meta_ai.scrap_to_json()
print(json_data)

#filename = "extraccionminpei" #file name without CSV extension,where data will be saved
#directory = "C:\Users\cgpal\Desktop\Julieta\IPECD\Web_scraping" #directory where CSV file will be saved
#meta_ai.scrap_to_csv(filename, directory)

SSL_CTX_set_ecdh_auto Error

I'm trying to get the example in the README to work. I first encountered the issue with SSL and downgraded to version 21.0.0. I then ran the example code with the only change to the browser type. I used "chrome" instead of "firefox". I got the error below. I'm on a Mac (OS 10.15.7), Intel hardware, running Python 3.11 with the following packages installed:

async-generator 1.10
attrs 22.1.0
beautifulsoup4 4.11.1
blinker 1.5
certifi 2022.9.24
cffi 1.15.1
charset-normalizer 2.1.1
colorama 0.4.6
configparser 5.3.0
crayons 0.4.0
cryptography 38.0.3
facebook-page-scraper 4.0.1
google 3.0.0
h11 0.14.0
h2 4.1.0
hpack 4.0.0
html5lib 1.1
hyperframe 6.0.1
idna 3.4
kaitaistruct 0.10
outcome 1.2.0
pip 22.3.1
pyasn1 0.4.8
pycparser 2.21
pyOpenSSL 21.0.0
pyparsing 3.0.9
PySocks 1.7.1
python-dateutil 2.8.2
requests 2.28.1
selenium 4.1.0
selenium-wire 4.3.1
setuptools 65.5.0
six 1.16.0
sniffio 1.3.0
sortedcontainers 2.4.0
soupsieve 2.3.2.post1
termcolor 2.1.0
trio 0.22.0
trio-websocket 0.9.2
urllib3 1.26.12
urllib3-secure-extra 0.1.0
webdriver-manager 3.2.2
webencodings 0.5.1
wsproto 1.2.0

Also, I'm trying to scrape information from the top-level page, specifically the email address. Can this library do that?

[WDM] - Current google-chrome version is 107.0.5304
[WDM] - Get LATEST driver version for 107.0.5304
[WDM] - There is no [mac64] chromedriver for browser 107.0.5304 in cache
[WDM] - Get LATEST driver version for 107.0.5304
[WDM] - Trying to download new driver from http://chromedriver.storage.googleapis.com/107.0.5304.62/chromedriver_mac64.zip
[WDM] - Driver has been saved in cache [/Users/bwright/.wdm/drivers/chromedriver/mac64/107.0.5304.62]
127.0.0.1:64896: Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/server.py", line 113, in handle
    root_layer()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/modes/http_proxy.py", line 9, in __call__
    layer()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 285, in __call__
    layer()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http1.py", line 100, in __call__
    layer()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http.py", line 205, in __call__
    if not self._process_flow(flow):
       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http.py", line 304, in _process_flow
    return self.handle_regular_connect(f)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http.py", line 223, in handle_regular_connect
    layer()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 278, in __call__
    self._establish_tls_with_client_and_server()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 358, in _establish_tls_with_client_and_server
    self._establish_tls_with_server()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 445, in _establish_tls_with_server
    self.server_conn.establish_tls(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/connections.py", line 290, in establish_tls
    self.convert_to_tls(cert=client_cert, sni=sni, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/net/tcp.py", line 382, in convert_to_tls
    context = tls.create_client_context(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/net/tls.py", line 276, in create_client_context
    context = _create_ssl_context(
              ^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/seleniumwire/thirdparty/mitmproxy/net/tls.py", line 163, in _create_ssl_context
    context = SSL.Context(method)
              ^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/OpenSSL/SSL.py", line 674, in __init__
    res = _lib.SSL_CTX_set_ecdh_auto(context, 1)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'lib' has no attribute 'SSL_CTX_set_ecdh_auto'

posts_count bigger than 19 results in only 19 scraped posts

Hi,

When I want to scrape the last 100 posts on a Facebook page:

facebook_ai = Facebook_scraper("facebookai",100,"chrome")
json_data = facebook_ai.scrap_to_json()
print(json_data)

Only 19 posts are scraped. I tried other pages too, with the same result.

Any idea what is going wrong?
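As a quick diagnostic, `scrap_to_json()` returns a JSON string whose top-level object appears to be keyed by post id, so the number of posts actually returned can be counted directly (the string below is a stand-in for real scraper output):

```python
import json

# Stand-in for the string returned by scrap_to_json(); the real output is
# a JSON object with one entry per scraped post.
json_data = '{"1234": {"content": "first post"}, "5678": {"content": "second post"}}'

posts = json.loads(json_data)
print(len(posts))  # number of posts actually scraped
```

Comparing this count against `posts_count` makes it clear whether the scraper stopped early.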

Facing some webdriver exceptions

Inside facebook_page_scraper I have run setup.py as per the instructions.
When I try to run the same example I get an error:
[WDM] - Driver [/root/.wdm/drivers/geckodriver/linux64/v0.29.0/geckodriver] found in cache
Traceback (most recent call last):
File "face.py", line 8, in
json_data = facebook_ai.scrap_to_json()..

raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status 1

Can anyone figure out this error and provide a running example?

No module named 'seleniumwire' error

I am encountering the error shown below when trying to execute the driver_initialization.py file. I have tried installing both selenium and seleniumwire using pip:

!pip install seleniumwire
!pip install selenium

But I am still unable to resolve this error

#import Facebook_scraper class from facebook_page_scraper
----> 2 from facebook_page_scraper import Facebook_scraper
3
4 #instantiate the Facebook_scraper class
5

1 frames

/content/facebook_page_scraper/facebook_page_scraper/driver_initialization.py in
1 #!/usr/bin/env python3
2
----> 3 from seleniumwire import webdriver
4 # to add capabilities for chrome and firefox, import their Options with different aliases
5 from selenium.webdriver.chrome.options import Options as ChromeOptions

ModuleNotFoundError: No module named 'seleniumwire'
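A common cause in notebooks is that `!pip install` runs against a different interpreter than the one the kernel uses. A small sketch to check which interpreter is running and whether it can actually see the package:

```python
import importlib.util
import sys

def has_module(name: str) -> bool:
    """Return True if the running interpreter can import `name`."""
    return importlib.util.find_spec(name) is not None

# If this prints a different Python than the one pip installed into, install
# with the kernel's own interpreter: !{sys.executable} -m pip install seleniumwire
print(sys.executable)
print("seleniumwire importable:", has_module("seleniumwire"))
```

If the check prints `False`, reinstalling with `{sys.executable} -m pip` usually resolves the mismatch.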

Get Comments content

Hello,
Is there a way to get the content of the comments on the posts? Your code only gets the number of comments (I had to change the list index to 1 instead of 0 in the method that retrieves this number), but I would like to know whether the text and reactions of a comment can be extracted from a post. Do you have any advice?

List index out of range

I have followed the GitHub documentation exactly to get posts, and I am encountering this error:

File "\facebook_page_scraper-4.0.1-py3.11.egg\facebook_page_scraper\element_finder.py", line 374, in __accept_cookies
button[-1].click()
~~~~~~^^^^
IndexError: list index out of range

With firefox as the browser.
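The crash happens because `__accept_cookies` assumes the cookie-banner button list is non-empty. A defensive variant (a sketch only; the real lookup lives in `element_finder.py`, and the stand-in button class here is hypothetical) would treat a missing banner as a no-op:

```python
def click_last_button(buttons):
    """Click the last button if any were found; skip quietly otherwise."""
    if not buttons:  # cookie banner absent, or the page layout changed
        return False
    buttons[-1].click()
    return True

# Minimal stand-in for a Selenium WebElement, just to exercise the helper:
class FakeButton:
    def __init__(self):
        self.clicked = False
    def click(self):
        self.clicked = True

print(click_last_button([]))  # no banner -> False instead of IndexError
btn = FakeButton()
print(click_last_button([btn]), btn.clicked)
```

Guarding the indexing this way lets scraping continue when Facebook does not show the banner at all.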

PHP version?

Without going too much into the background, Facebook have disabled the 'Like' button plugin (for 3rd party websites) in Europe except for users who are logged in and have consented to the relevant cookies.

After two years, Facebook has failed to come up with an alternative (such as a simple link showing the number of followers, as Twitter has).

Small businesses need to use social media to keep pace. A simple 'Like on Facebook' button showing the number of followers/likes is all they need. But Facebook has taken that away out of mindless self-interest (probably disgruntlement at the court rulings, and perhaps whilst continuing to collect data illegally). They do, however, provide a 'brand asset pack' which, in conjunction with your scraper, could be used to recreate the same, with the bonus of not leaking information to Facebook.

However, you've used Python, which is not so convenient to incorporate into a web application, particularly portably as a library. Would it be easy to port the Python code to PHP?

Why do I get only one post?

Hello and thank you for this tool.

The script seems to work, but I receive only one post?

json_data = meta_ai.scrap_to_json()
print(json_data)

Can you explain how I can receive all the posts from one Facebook profile?

Many Thanks

No posts were found!

Hey! Thanks for your script.
But I was trying to run your example and got the 'no posts were found' error.
Is it because of the new layout?
Thanks!

Failure when calling scrap_to_json()

Error:

[WDM] - Current google-chrome version is 112.0.5615
[WDM] - Get LATEST driver version for 112.0.5615
[WDM] - Driver [***] found in cache

DevTools listening on ws://127.0.0.1:56478/devtools/browser/e6c8b2dc-2403-490c-9eeb-36b531efcdea
127.0.0.1:56489: Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\server.py", line 113, in handle
    root_layer()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\modes\http_proxy.py", line 9, in __call__
    layer()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\tls.py", line 285, in __call__
    layer()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\http1.py", line 100, in __call__
    layer()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\http.py", line 205, in __call__
    if not self._process_flow(flow):
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\http.py", line 304, in _process_flow
    return self.handle_regular_connect(f)
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\http.py", line 223, in handle_regular_connect
    layer()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\tls.py", line 278, in __call__
    self._establish_tls_with_client_and_server()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\tls.py", line 358, in _establish_tls_with_client_and_server
    self._establish_tls_with_server()
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\server\protocol\tls.py", line 445, in _establish_tls_with_server
    self.server_conn.establish_tls(
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\connections.py", line 290, in establish_tls
    self.convert_to_tls(cert=client_cert, sni=sni, **kwargs)
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\net\tcp.py", line 382, in convert_to_tls
    context = tls.create_client_context(
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\net\tls.py", line 276, in create_client_context
    context = _create_ssl_context(
  File "C:\Python310\lib\site-packages\seleniumwire\thirdparty\mitmproxy\net\tls.py", line 163, in _create_ssl_context
    context = SSL.Context(method)
  File "C:\Python310\lib\site-packages\OpenSSL\SSL.py", line 674, in __init__
    res = _lib.SSL_CTX_set_ecdh_auto(context, 1)
AttributeError: module 'lib' has no attribute 'SSL_CTX_set_ecdh_auto'

[0422/154849.820:ERROR:ssl_client_socket_impl.cc(992)] handshake failed; returned -1, SSL error code 1, net_error -100
2023-04-22 15:48:50,068 - facebook_page_scraper.scraper - ERROR - Error at scrap_to_csv : Message: unknown error: net::ERR_CONNECTION_CLOSED
  (Session info: headless chrome=112.0.5615.138)
Stacktrace:
Backtrace:
        GetHandleVerifier [0x00B8DCE3+50899]
        (No symbol) [0x00B1E111]
        (No symbol) [0x00A25588]
        (No symbol) [0x00A21D87]
        (No symbol) [0x00A18B45]
        (No symbol) [0x00A19B1A]
        (No symbol) [0x00A18E20]
        (No symbol) [0x00A18275]
        (No symbol) [0x00A1820C]
        (No symbol) [0x00A16F06]
        (No symbol) [0x00A17668]
        (No symbol) [0x00A26D22]
        (No symbol) [0x00A7E631]
        (No symbol) [0x00A6B8FC]
        (No symbol) [0x00A7E01C]
        (No symbol) [0x00A6B6F6]
        (No symbol) [0x00A47708]
        (No symbol) [0x00A4886D]
        GetHandleVerifier [0x00DF3EAE+2566302]
        GetHandleVerifier [0x00E292B1+2784417]
        GetHandleVerifier [0x00E2327C+2759788]
        GetHandleVerifier [0x00C25740+672048]
        (No symbol) [0x00B28872]
        (No symbol) [0x00B241C8]
        (No symbol) [0x00B242AB]
        (No symbol) [0x00B171B7]
        BaseThreadInitThunk [0x75627D49+25]
        RtlInitializeExceptionChain [0x7781B74B+107]
        RtlClearBits [0x7781B6CF+191]
Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\facebook_page_scraper\scraper.py", line 151, in scrap_to_csv
    data = self.scrap_to_json()  # get the data in JSON format from the same class method
  File "C:\Python310\lib\site-packages\facebook_page_scraper\scraper.py", line 83, in scrap_to_json
    self.__driver.get(self.URL)
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 436, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "C:\Python310\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: net::ERR_CONNECTION_CLOSED
  (Session info: headless chrome=112.0.5615.138)
Stacktrace:
Backtrace:
        GetHandleVerifier [0x00B8DCE3+50899]
        (No symbol) [0x00B1E111]
        (No symbol) [0x00A25588]
        (No symbol) [0x00A21D87]
        (No symbol) [0x00A18B45]
        (No symbol) [0x00A19B1A]
        (No symbol) [0x00A18E20]
        (No symbol) [0x00A18275]
        (No symbol) [0x00A1820C]
        (No symbol) [0x00A16F06]
        (No symbol) [0x00A17668]
        (No symbol) [0x00A26D22]
        (No symbol) [0x00A7E631]
        (No symbol) [0x00A6B8FC]
        (No symbol) [0x00A7E01C]
        (No symbol) [0x00A6B6F6]
        (No symbol) [0x00A47708]
        (No symbol) [0x00A4886D]
        GetHandleVerifier [0x00DF3EAE+2566302]
        GetHandleVerifier [0x00E292B1+2784417]
        GetHandleVerifier [0x00E2327C+2759788]
        GetHandleVerifier [0x00C25740+672048]
        (No symbol) [0x00B28872]
        (No symbol) [0x00B241C8]
        (No symbol) [0x00B242AB]
        (No symbol) [0x00B171B7]
        BaseThreadInitThunk [0x75627D49+25]
        RtlInitializeExceptionChain [0x7781B74B+107]
        RtlClearBits [0x7781B6CF+191]
Code:

from facebook_page_scraper import Facebook_scraper

page_name = "metaai"
posts_count = 10
browser = "chrome"
proxy = "IP:PORT"
timeout = 10
headless = True

meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)
json_data = meta_ai.scrap_to_json()
print(json_data)

Packages:

pyOpenSSL     21.0.0
selenium      4.1.0
cryptography  38.0.4

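The `SSL_CTX_set_ecdh_auto` AttributeError is generally a pyOpenSSL/cryptography version mismatch: newer cryptography releases dropped that binding while older pyOpenSSL still calls it, so upgrading pyOpenSSL (e.g. `pip install --upgrade pyOpenSSL`) usually restores a compatible pair. A quick way to see which versions are actually installed:

```python
from importlib.metadata import PackageNotFoundError, version

# Print the installed versions of the packages involved in the TLS error;
# mismatched pyOpenSSL/cryptography pairs are the usual culprit.
for pkg in ("pyOpenSSL", "cryptography", "selenium"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```

Posting this output alongside the traceback makes version-mismatch issues much faster to diagnose.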
Logging in

I need to scrape from my own Facebook account. How can I do that?
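The README example reads credentials from environment variables; whether and how they are passed to the scraper depends on the installed version, so treat the variable names below as assumptions. A sketch for keeping credentials out of source code:

```python
import os

# Hypothetical variable names; set them in the shell before running, e.g.
#   export fb_email="you@example.com"
#   export fb_password="..."
fb_email = os.getenv("fb_email")
fb_password = os.getenv("fb_password")

print("credentials set:", bool(fb_email and fb_password))
```

Keeping credentials in the environment (rather than hard-coded) also makes it safe to share scripts when asking for help.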

selenium.common.exceptions.SessionNotCreatedException - Issues with GeckoDriver

Hi, thanks for the code! I was trying out this package and ran into the issue below when running the sample code in the README. I am not very familiar with web scraping, but it seems to be some issue with GeckoDriver; I'm not sure why this error is appearing. I would also like to check whether this package works with Facebook groups (not pages). Please help!

>>> json_data = facebook_ai.scrap_to_json()
[WDM] - Driver [C:\Users\Jiayi\.wdm\drivers\geckodriver\win64\v0.29.0\geckodriver.exe] found in cache
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Jiayi\Documents\GitHub\facebook_page_scraper\facebook_page_scraper\scraper.py", line 56, in scrap_to_json
    self.__start_driver()
  File "C:\Users\Jiayi\Documents\GitHub\facebook_page_scraper\facebook_page_scraper\scraper.py", line 52, in __start_driver
    self.__driver = Initializer(self.browser).init()
  File "C:\Users\Jiayi\Documents\GitHub\facebook_page_scraper\facebook_page_scraper\driver_initialization.py", line 48, in init
    driver = self.set_driver_for_browser(self.browser_name)
  File "C:\Users\Jiayi\Documents\GitHub\facebook_page_scraper\facebook_page_scraper\driver_initialization.py", line 42, in set_driver_for_browser
    return webdriver.Firefox(executable_path=GeckoDriverManager().install(),options=self.set_properties(browser_option))
  File "C:\Users\Jiayi\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\firefox\webdriver.py", line 170, in __init__
    RemoteWebDriver.__init__(
  File "C:\Users\Jiayi\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "C:\Users\Jiayi\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "C:\Users\Jiayi\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Jiayi\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response       
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line
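This particular SessionNotCreatedException means GeckoDriver started but could not locate a Firefox binary, so installing Firefox (or pointing Selenium at the binary explicitly) should resolve it. A quick PATH check, as a sketch:

```python
import shutil

# None here means Selenium will not find Firefox either; install Firefox
# or configure the binary location explicitly in the Firefox options.
binary = shutil.which("firefox")
print(binary or "firefox not found on PATH")
```

If the binary lives in a non-standard location, the path Selenium should use can be supplied via Firefox's options rather than relying on PATH.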
