
makepad-fr / fbjs


Tooling that automates your Facebook interactions.

Home Page: https://www.npmjs.com/package/@makepad/fbjs

License: GNU General Public License v3.0

JavaScript 2.76% TypeScript 97.24%
puppeteer facebook typescript typescript-library npm-module automation social-media scraper facebook-groups social-media-mining

fbjs's People

Contributors

dependabot[bot], gakowalski, idilsaglam, kaanyagci, svoeth


fbjs's Issues

More detailed documentation

Currently, the README contains little information, and we have no proper documentation. We need to fix this to help newcomers.

TimeoutError: waiting for selector "#login_form" failed: timeout 30000ms exceeded

Hello.

I'm excited to try out this tool. I just installed it and ran it but got this error:

(node:47502) UnhandledPromiseRejectionWarning: TimeoutError: waiting for selector "#login_form" failed: timeout 30000ms exceeded.

This is the command I used:

fgps --group-ids ##########

I then ran it with the --headful parameter and could see the browser open and load facebook.com, but it doesn't fill in the username/password.

Scraping videos from Facebook

I have been searching and found this browser API: captureStream().
It allows you to capture a stream from HTML video/audio/canvas elements:

const video = document.querySelector('video');
// Capture the element's media stream. A frame-rate argument is only accepted by
// HTMLCanvasElement.captureStream(); on <video> elements the method takes no parameters.
const stream = video.captureStream();

This can be used to scrape Facebook videos by recording the captured stream with the browser's MediaRecorder API.

Get the post id

The post id is useful as a kind of key for detecting changes in a post. For instance, if a post changes over time, we just scrape that post again. The post id can be obtained from the href attribute of the date link (the a element) below the author's name. Once the post id is available, the GroupPost class should be updated.
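
The extraction described above could be sketched roughly as follows. This is only an illustration: the function name extractPostId is hypothetical, and the two URL shapes it accepts (/groups/<group>/permalink/<id> and /groups/<group>/posts/<id>) are assumptions about what the date link's href looks like, not something confirmed by the library.

```typescript
// Hypothetical helper, not part of fbjs.
// Assumption: the date link's href contains either
//   /groups/<group>/permalink/<postId>  or  /groups/<group>/posts/<postId>
function extractPostId(href: string): string | null {
  const match = /\/groups\/[^/]+\/(?:permalink|posts)\/(\d+)/.exec(href);
  // Return the numeric id as a string, or null when the href has another shape.
  return match ? match[1] : null;
}
```

Returning null instead of throwing lets the caller decide what to do with posts whose link does not match, for example skipping change detection for them.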

Unable to run on my machine

Hi there! I can't run the example script on my local machine.

Steps I took:

  • Run git clone git@github.com:mihailthebuilder/fbjs.git && cd fbjs/example.
  • Run npm install inside the example folder.
  • Run this as per the README:
FACEBOOK_USERNAME="<your_facebook_username>" FACEBOOK_PASSWORD="<your_facebook_password>" FACEBOOK_2FA_CODE="<facebook_2fa_code>" FACEBOOK_GROUP_ID="<ffacebook_group_id>" npm start
  • I get a number of errors above caused by the fact that I don't have the packages installed.
    image
  • Run cd .. && npm install to fix above errors (takes a good few minutes to install)
  • Run this again in example folder:
cd example
FACEBOOK_USERNAME="<your_facebook_username>" FACEBOOK_PASSWORD="<your_facebook_password>" FACEBOOK_2FA_CODE="<facebook_2fa_code>" FACEBOOK_GROUP_ID="<ffacebook_group_id>" npm start

but I get a MODULE_NOT_FOUND error
image

Local environment:

  • OS: Ubuntu 20.04.3 LTS
  • node 16.13.0
  • npm 8.1.0

Is there any chance you could add a Docker image?

Merge cli in this repository

  • We need to merge the fbjs-cli repository content in this repository as a cli folder
  • Update the package.json to compile both library and CLI with a single script
  • Update the package.json to publish the CLI

Fix CI issue

Currently, CircleCI is not working for deployment. It only checks for linter errors. We need to deploy automatically via CircleCI.

Refactor existing code

Currently, all our functions are in the FB class. If we keep adding new features, this will become very ugly. We need to move the existing functions related to group posts into a FacebookGroup class that will contain all features related to Facebook groups.

Add new friend

Given a user id, we want to add this user as a friend if they are not already on our friend list. If the given user is already on our friend list, do nothing.

UnhandledPromiseRejectionWarning: TimeoutError

fgps --group-ids 610355872400171 --output C:/Users/xx/Documents/
Cookie banner did not appear
(node:9572) UnhandledPromiseRejectionWarning: TimeoutError: waiting for XPath "//div[@data-pagelet="Stories"]" failed: timeout 30000ms exceeded
    at new WaitTask (C:\Users\xx\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\puppeteer\lib\DOMWorld.js:549:28)
    at DOMWorld._waitForSelectorOrXPath (C:\Users\xx\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\puppeteer\lib\DOMWorld.js:478:22)
    at DOMWorld.waitForXPath (C:\Users\xx\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\puppeteer\lib\DOMWorld.js:441:17)
    at Frame.waitForXPath (C:\Users\xx\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\puppeteer\lib\FrameManager.js:642:47)
    at Frame.<anonymous> (C:\Users\xx\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\puppeteer\lib\helper.js:112:23)
    at Page.waitForXPath (C:\Users\xx\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\puppeteer\lib\Page.js:1131:29)
    at facebookLogIn (C:\Users\xx\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\src\index.js:366:14)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async main (C:\Users\xx\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\src\index.js:612:10)
(Use node --trace-warnings ... to show where the warning was created)
(node:9572) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:9572) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
(node:9572) UnhandledPromiseRejectionWarning: Error: Page crashed!
    at Page._onTargetCrashed (C:\Users\xx\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\puppeteer\lib\Page.js:213:24)
    at CDPSession.<anonymous> (C:\Users\xx\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\puppeteer\lib\Page.js:122:56)
    at CDPSession.emit (events.js:315:20)
    at CDPSession._onMessage (C:\Users\xx\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\puppeteer\lib\Connection.js:200:12)
    at Connection._onMessage (C:\Users\xx\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\puppeteer\lib\Connection.js:112:17)
    at WebSocket.<anonymous> (C:\Users\xx\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\puppeteer\lib\WebSocketTransport.js:44:24)
    at WebSocket.onMessage (C:\Users\xx\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\ws\lib\event-target.js:120:16)
    at WebSocket.emit (events.js:315:20)
    at Receiver.receiverOnMessage (C:\Users\xx\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\ws\lib\websocket.js:789:20)
    at Receiver.emit (events.js:315:20)
(node:9572) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 2)
ERROR: The process with PID 2340 (child process of PID 9572) could not be terminated. Reason: There is no running instance of the task.

Hi, I'm experiencing this error after installing 2.4.0.

A solution to scrape Markdown from posts

Note: This solution applies to the desktop version of the Facebook website, like the other solutions I'm proposing to improve this library. You should switch from the mobile version first; then I'll start making pull requests.

Scraping text from posts on the desktop version is much more complicated than on the mobile version, since the text comes as HTML elements rather than plain text. The key is finding the right selector for the post body. The other elements we need to scrape, such as images, videos, and the submission permalink, need a separate issue and a deeper discussion.

Anyway, using the browser inspector we can see what it looks like under the hood:

image

You'll notice that it's located between two pseudo-elements (::before and ::after). We just need to copy the .innerHTML of the parent element and convert it to Markdown; there is a very good library for that called turndown. As you can see from the images below, we MADE IT!

image
image

Another issue is the See More button: you should click it first so that the rest of the text appears:

image

And that's all. I hope this information helps <3

Problem

fb groups:get:posts -i 689598851175466 --headfull --output=C:\Users\myname\Desktop\fc.json -c 10
Running in headless mode ? true
Спомени за родната казарма* публична група | Facebook Facebook group's posts scraped: 156 posts found
Error: ENOENT: no such file or directory, open 'C:\Users\apollo\Desktop\fc.jsongroupname* публична група | Facebook.json'
Code: ENOENT
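
One plausible reading of the error above is that the scraped page title is appended to the --output value to build the file name, and that title contains * and |, which are not allowed in Windows file names. A minimal sketch of sanitizing such a title before using it as a file name, assuming that reading is correct; toSafeFileName is a hypothetical helper, not part of the CLI:

```typescript
// Hypothetical helper, not part of the fbjs CLI.
// Replaces characters that Windows rejects in file names: < > : " / \ | ? *
// and ASCII control characters. Non-ASCII letters (e.g. Cyrillic) are kept.
function toSafeFileName(title: string, replacement: string = "_"): string {
  const cleaned = title
    .replace(/[<>:"/\\|?*\u0000-\u001f]/g, replacement)
    .replace(/[. ]+$/, "") // Windows also rejects trailing dots and spaces
    .trim();
  // Fall back to a fixed name when nothing usable is left.
  return cleaned.length > 0 ? cleaned : "untitled";
}
```

The sanitized name would then be joined to an output directory rather than concatenated onto a value such as fc.json, which in the log above ends up in the middle of the path.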

Get user profile by user id

Given a user id, we want to return public information for the user.

  • Check what information is public
  • Create a UserProfile which contains this information
  • Create a profile module which will contain all functions related to the user profile.

Consider using `mbasic.facebook.com`

Facebook is a heavy beast. You guys use puppeteer and wait a random timeout before scrolling down. This seems to be slow AF.

mbasic.facebook.com does not use any JS files; it's simply an HTML page that can be parsed. There is no need to load all of the images and videos while scrolling.

  • You can easily find posts with the #m_group_stories_container > section > article selector.
  • The next-page link can be found with #m_group_stories_container > section + div > a.

It would also make implementing #15 much easier.

The only downside I can see is one additional request per post to get the URL of the full-size image.

Example

Can you please give an example of how this could be used in a script?

Is there usage as module?

I'd love to have RSS feeds generated from private Facebook groups. This repo has pretty much everything I need, but instead of forking it and maintaining a modified version, I'd rather use it as a module.

Implement tests

Currently, there are no tests at all apart from manual testing. We need automated tests.

Error when running script with credentials

  1. Everything is installed correctly (version 2.3.2)
  2. Added credentials (fgps init)

Running with this example: fgps --group-ids 196322762283544 --output /path

I get this error:

(node:45896) UnhandledPromiseRejectionWarning: TimeoutError: waiting for XPath "//button[@data-cookiebanner="accept_button"]" failed: timeout 30000ms exceeded
    at new WaitTask (/Users/assafelovic/.nvm/versions/node/v12.9.1/lib/node_modules/facebook-group-posts-scraper/node_modules/puppeteer/lib/DOMWorld.js:549:28)
    at DOMWorld._waitForSelectorOrXPath (/Users/assafelovic/.nvm/versions/node/v12.9.1/lib/node_modules/facebook-group-posts-scraper/node_modules/puppeteer/lib/DOMWorld.js:478:22)
    at DOMWorld.waitForXPath (/Users/assafelovic/.nvm/versions/node/v12.9.1/lib/node_modules/facebook-group-posts-scraper/node_modules/puppeteer/lib/DOMWorld.js:441:17)
    at Frame.waitForXPath (/Users/assafelovic/.nvm/versions/node/v12.9.1/lib/node_modules/facebook-group-posts-scraper/node_modules/puppeteer/lib/FrameManager.js:642:47)
    at Frame.<anonymous> (/Users/assafelovic/.nvm/versions/node/v12.9.1/lib/node_modules/facebook-group-posts-scraper/node_modules/puppeteer/lib/helper.js:112:23)
    at Page.waitForXPath (/Users/assafelovic/.nvm/versions/node/v12.9.1/lib/node_modules/facebook-group-posts-scraper/node_modules/puppeteer/lib/Page.js:1131:29)
    at facebookLogIn (/Users/assafelovic/.nvm/versions/node/v12.9.1/lib/node_modules/facebook-group-posts-scraper/src/index.js:331:14)
    at processTicksAndRejections (internal/process/task_queues.js:85:5)
    at async main (/Users/assafelovic/.nvm/versions/node/v12.9.1/lib/node_modules/facebook-group-posts-scraper/src/index.js:608:10)
(node:45896) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:45896) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

CentOS without GUI: cannot open shared object file

On CentOS 8 without GUI:

(node:1136594) UnhandledPromiseRejectionWarning: Error: Failed to launch the browser process!
/home/grzegorz.kowalski/facebook-group-posts-scraper/node_modules/puppeteer/.local-chromium/linux-722234/chrome-linux/chrome: error while loading shared libraries: libX11-xcb.so.1: cannot open shared object file: No such file or directory

To fix it (and a host of similar errors) I ran:

sudo dnf install libX11-xcb libXcomposite libXcursor libXdamage libXi libXtst libXss cups libXScrnSaver alsa-lib atk at-spi2-atk pango gtk3

TimeoutError and then app hangs

Installed just now with

npm i facebook-group-posts-scraper -g --unsafe-perm

Version:

# fgps --version
2.3.0

(node:35) UnhandledPromiseRejectionWarning: TimeoutError: waiting for selector "#login_form" failed: timeout 30000ms exceeded
at new WaitTask (/usr/local/lib/node_modules/facebook-group-posts-scraper/node_modules/puppeteer/lib/DOMWorld.js:549:28)
at DOMWorld._waitForSelectorOrXPath (/usr/local/lib/node_modules/facebook-group-posts-scraper/node_modules/puppeteer/lib/DOMWorld.js:478:22)
at DOMWorld.waitForSelector (/usr/local/lib/node_modules/facebook-group-posts-scraper/node_modules/puppeteer/lib/DOMWorld.js:432:17)
at Frame.waitForSelector (/usr/local/lib/node_modules/facebook-group-posts-scraper/node_modules/puppeteer/lib/FrameManager.js:627:47)
at Frame.<anonymous> (/usr/local/lib/node_modules/facebook-group-posts-scraper/node_modules/puppeteer/lib/helper.js:112:23)
at Page.waitForSelector (/usr/local/lib/node_modules/facebook-group-posts-scraper/node_modules/puppeteer/lib/Page.js:1122:29)
at facebookLogIn (/usr/local/lib/node_modules/facebook-group-posts-scraper/src/index.js:335:14)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async main (/usr/local/lib/node_modules/facebook-group-posts-scraper/src/index.js:599:10)
(node:35) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:35) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

remove static folder in output

The example folder contains a static file with the output of the group messages.

  • Be sure it's included in .gitignore
  • Delete the file

Ability to login with cookies

I don't quite see how I'm going to pass the 2FA code every time I want to scrape the group. Also, one may perceive entering a password as a security threat, especially if it's stored in plaintext (I have no idea if it is).

Fix CI

CI is broken for now.

  • Add linter
  • Add automatic npm deployment on each new tag creation
  • #69

Get comments from given post

Currently, we only scrape messages published in a Facebook group. We also need to get the comments on a message and the replies to those comments.

  • #52
  • A comment object could look as follows:
{
  id: String,
  parent: String,
  author: String,
  content: String
}

Here id is the id of the comment (obtained, as for a post, from the href of the date link), parent is the id of the parent, author is the name of the author, and content is the content of the comment.
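
Because each comment only carries the id of its parent, nested replies have to be reassembled after scraping. A minimal sketch of that step, assuming the four fields above are plain strings and that a top-level comment's parent is an id that is not itself a comment in the list (for example the post id); GroupComment, CommentNode and buildCommentTree are hypothetical names, not part of the library:

```typescript
// Hypothetical types mirroring the proposed comment object.
interface GroupComment {
  id: string;
  parent: string; // id of the parent comment, or of the post (assumption)
  author: string;
  content: string;
}

interface CommentNode extends GroupComment {
  replies: CommentNode[];
}

// Turns a flat list of comments into a tree, keeping the input order.
function buildCommentTree(comments: GroupComment[]): CommentNode[] {
  // Prototype-free lookup table so ids never collide with Object.prototype keys.
  const byId: { [id: string]: CommentNode | undefined } = Object.create(null);
  const nodes = comments.map((c): CommentNode => ({ ...c, replies: [] }));
  for (const node of nodes) byId[node.id] = node;

  const roots: CommentNode[] = [];
  for (const node of nodes) {
    const parent = byId[node.parent];
    // A comment whose parent is not a known comment is treated as top-level.
    if (parent && parent !== node) parent.replies.push(node);
    else roots.push(node);
  }
  return roots;
}
```

Keeping the scraped data flat and building the tree separately means the stored format stays exactly as proposed in the issue.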

Update min node and npm version in README

For the README lines, you're right. The version written there is the one I'm using on my computer, but if node v14.17.2 works great for you, I'll update this. Do you mind sharing your npm version too?

@kaanyagci Sure, my npm version is 6.14.13, but that doesn't change the fact that it may work on a lower version. Technically, you should target the minimum version supported by puppeteer (node v8.x should work fine, check this).
On the other hand, I couldn't install the dev dependencies with node v10 (which I had before upgrading to v14), so the minimum supported version for a development machine is higher than the minimum version for production.

Originally posted by @iMrDJAi in #55 (comment)

Error when using --headful

I want to download all the memes in this group

C:\Users\Abdullah>fgps --group-ids sadnibbahourshitposting --output "C:\Users\Abdullah\Desktop\fb test" --headful
Cookie banner did not appear
node:internal/process/promises:246
          triggerUncaughtException(err, true /* fromPromise */);
          ^

Error: No node found for selector: input#email
    at assert (C:\Users\Abdullah\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\puppeteer\lib\helper.js:283:11)
    at DOMWorld.focus (C:\Users\Abdullah\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\puppeteer\lib\DOMWorld.js:376:5)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async facebookLogIn (C:\Users\Abdullah\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\src\index.js:349:3)
    at async main (C:\Users\Abdullah\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\src\index.js:612:10)
  -- ASYNC --
    at Frame.<anonymous> (C:\Users\Abdullah\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\puppeteer\lib\helper.js:111:15)
    at Page.focus (C:\Users\Abdullah\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\node_modules\puppeteer\lib\Page.js:1071:29)
    at facebookLogIn (C:\Users\Abdullah\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\src\index.js:349:14)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async main (C:\Users\Abdullah\AppData\Roaming\npm\node_modules\facebook-group-posts-scraper\src\index.js:612:10)

Node.js v17.2.0

I'm using headful mode because if I don't pass the --headful flag, I don't see anything in my output folder, so the program is essentially doing nothing.

Comment on a post

We want to be able to post a comment on a given post. We can do that with a function void comment(String content) in the Post class.

Scraping post date

Hi all,

Great lib, it works very well. Can we add a way to scrape the post date too? I tried to look into the lib's inner workings, but I cannot figure out the right selector to extract the date. Apart from that, I've written a function that properly parses the date from a string into an actual Date object.

Using this without authentication?

Should we have to log in to see posts from public groups? No.
I think you should add an option to skip authentication, since it is pointless in this case, unless there are limitations I'm not aware of.
Good job btw, I'm very excited for the Node.js module support! <3
