dchrastil / scrapedin
A tool to scrape LinkedIn without API restrictions for data reconnaissance
The URL "https://www.linkedin.com/voyager/api/search/cluster?count=40&guides=List(v-%%3EPEOPLE,facetGeoRegion-%%3Ear%%3A0)&keywords=%s&origin=FACETED_SEARCH&q=guided&start=0" does not return a JSON object for me, so decoding fails.
Here is an example:
error.txt
Attempting to run it on a remote server like Heroku, I encountered this error:
(node:9167) UnhandledPromiseRejectionWarning: Error: linkedin: manual check was required, verify if your login is properly working manually or report this issue: https://github.com/leonardiwagner/scrapedin/issues
at page.waitFor.then.catch (/var/app/current/node_modules/scrapedin/src/login.js:62:31)
at process._tickCallback (internal/process/next_tick.js:68:7)
(node:9167) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 2)
(node:9167) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
(node:9167) UnhandledPromiseRejectionWarning: Error: Protocol error (Runtime.callFunctionOn): Target closed.
at Promise (/var/app/current/node_modules/puppeteer/lib/Connection.js:183:56)
at new Promise ()
at CDPSession.send (/var/app/current/node_modules/puppeteer/lib/Connection.js:182:12)
at ExecutionContext.evaluateHandle (/var/app/current/node_modules/puppeteer/lib/ExecutionContext.js:106:44)
at ExecutionContext. (/var/app/current/node_modules/puppeteer/lib/helper.js:109:23)
at ElementHandle.$ (/var/app/current/node_modules/puppeteer/lib/JSHandle.js:378:50)
at ElementHandle. (/var/app/current/node_modules/puppeteer/lib/helper.js:109:23)
at DOMWorld.$ (/var/app/current/node_modules/puppeteer/lib/DOMWorld.js:114:34)
at process._tickCallback (internal/process/next_tick.js:68:7)
-- ASYNC --
at Frame. (/var/app/current/node_modules/puppeteer/lib/helper.js:108:27)
at Page.$ (/var/app/current/node_modules/puppeteer/lib/Page.js:300:29)
at Page. (/var/app/current/node_modules/puppeteer/lib/helper.js:109:23)
at page.waitFor.then.catch (/var/app/current/node_modules/scrapedin/src/login.js:60:16)
at process._tickCallback (internal/process/next_tick.js:68:7)
Manual verification is not possible here, since there is no way to open a browser on such a server and complete the check. Is there any workaround for this issue?
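One workaround sometimes used for headless servers is to log in once from a local browser, copy the value of LinkedIn's `li_at` session cookie, and reuse it on the server so the interactive check never has to run there. A minimal, hypothetical sketch with requests (the cookie name and the env-var handoff are assumptions, not part of this tool):

```python
import requests


def session_from_cookie(li_at: str) -> requests.Session:
    """Build a requests session that reuses an existing LinkedIn login,
    identified by the "li_at" cookie copied from a local browser session.
    This is a workaround sketch, not part of scrapedin itself."""
    s = requests.Session()
    # Attach the pre-authenticated cookie so no interactive login is needed.
    s.cookies.set("li_at", li_at, domain=".linkedin.com")
    s.headers.update({"User-Agent": "Mozilla/5.0"})
    return s


# Usage (network call; an expired cookie redirects to the login page):
# resp = session_from_cookie("AQED...").get(
#     "https://www.linkedin.com/feed/", allow_redirects=False)
```

Session cookies expire, so this has to be refreshed periodically, but it avoids the manual-check prompt on machines with no display.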
Hello,
I have configured the credentials in the configuration file and in the environment variables:
cat config.py~:
...
linkedin = dict(
username = '[email protected]',
password = '#ExamplePwd123!'
)
...
export LI_USERNAME={[email protected]}
export LI_PASSWORD={#ExamplePwd123!}
and
export [email protected]
export LI_PASSWORD=#ExamplePwd123!
I keep getting the same validation error.
Is the endpoint still correct?
Because of all the pinned dependencies this tool requires, I strongly feel there should be built-in support for running it with pipenv. I've built a LinkedIn scraper before, so I can appreciate how tedious the process is; pipenv compatibility would be a big help for both the developer and the user here.
Are there any plans to move away from the username/password environment-variable method and introduce an option to supply these values at start-up? It would also be safer, since the credentials would not linger in shell history or config files (and would ideally be freed from memory once the run ends).
Has this been considered?
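The request above (credentials supplied at start-up instead of via config.py or LI_USERNAME/LI_PASSWORD) could look roughly like this; the flag name and function are hypothetical, not the tool's actual interface:

```python
import argparse
import getpass


def parse_args(argv=None):
    """Hypothetical CLI front-end: take the username as a flag and never
    accept the password on the command line or in the environment."""
    parser = argparse.ArgumentParser(description="ScrapedIn (sketch)")
    parser.add_argument("--username", help="LinkedIn username (prompted if omitted)")
    return parser.parse_args(argv)


def read_credentials(argv=None):
    args = parse_args(argv)
    username = args.username or input("LinkedIn username: ")
    # getpass prompts without echoing, so the password never appears in
    # shell history, process listings, or config files.
    password = getpass.getpass("LinkedIn password: ")
    return username, password
```

getpass is standard library, so this adds no dependency to the already long requirements list.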
I am trying to filter people by company => facetCurrentCompany.
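Going by the guides=List(...) format of the Voyager URL quoted earlier in this thread, a company filter would presumably be another guide entry. A speculative sketch (the parameter layout mirrors the facetGeoRegion example above; whether facetCurrentCompany takes a numeric company ID, as assumed here, would need verifying):

```python
from urllib.parse import quote


def search_url(keywords: str, company_id: str, start: int = 0, count: int = 40) -> str:
    """Build a Voyager faceted-search URL filtered by current company.
    Assumption: facetCurrentCompany expects LinkedIn's numeric company ID,
    not the company name."""
    guides = quote(f"List(v->PEOPLE,facetCurrentCompany->{company_id})", safe="(),:")
    return ("https://www.linkedin.com/voyager/api/search/cluster"
            f"?count={count}&guides={guides}&keywords={quote(keywords)}"
            f"&origin=FACETED_SEARCH&q=guided&start={start}")
```

The `->` separators are percent-encoded to `%3E`, matching the `facetGeoRegion-%3Ear%3A0` pattern in the URL quoted above.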
Hi, as the subject says, can we chat?
I've installed the latest "requests" and tried to update it, it says it's at the newest version.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. tweepy 4.14.0 requires requests<3,>=2.27.0, but you have requests 2.20.0 which is incompatible.
I encountered a lot of errors while executing, and tried to fix them one by one, but I can't even install urllib2 on Python 3.6. Which version of Python should I use?
UnboundLocalError: local variable 'mycookies' referenced before assignment
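An UnboundLocalError like this usually means `mycookies` is only assigned on the success path (e.g. when the login response actually carries a session cookie) and is then referenced unconditionally. A defensive sketch of that pattern; the function and cookie names are illustrative, not the tool's actual code:

```python
def extract_session_cookies(response):
    """Return the session cookies from a login response, failing loudly
    instead of leaving the variable unbound when login did not succeed."""
    mycookies = None  # always bound, even when login fails
    if response is not None and "li_at" in response.cookies:
        mycookies = response.cookies
    if mycookies is None:
        # A clear error beats an UnboundLocalError three calls later.
        raise RuntimeError("login failed: no session cookie in response")
    return mycookies
```

Initialising the variable before the conditional turns a confusing traceback into an actionable "login failed" message.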
Hi,
On 2 different cases, I'm not able to catch images of LinkedIn profiles.
Those cases are:
There's no error message during process.
I just tried to run sudo pip install -r requirements.txt
and this is the result:
Collecting beautifulsoup4==4.6.0 (from -r requirements.txt (line 1))
Downloading https://files.pythonhosted.org/packages/9e/d4/10f46e5cfac773e22707237bfcd51bbffeaf0a576b0a847ec7ab15bd7ace/beautifulsoup4-4.6.0-py3-none-any.whl (86kB)
100% |████████████████████████████████| 92kB 1.8MB/s
Collecting certifi==2023.7.22 (from -r requirements.txt (line 2))
Downloading https://files.pythonhosted.org/packages/4c/dd/2234eab22353ffc7d94e8d13177aaa050113286e93e7b40eae01fbf7c3d9/certifi-2023.7.22-py3-none-any.whl (158kB)
100% |████████████████████████████████| 163kB 3.4MB/s
Requirement already satisfied: chardet==3.0.4 in /usr/lib/python3/dist-packages (from -r requirements.txt (line 3)) (3.0.4)
Collecting cryptography==41.0.6 (from -r requirements.txt (line 4))
Downloading https://files.pythonhosted.org/packages/4d/b4/828991d82d3f1b6f21a0f8cfa54337ed33fdb52135f694130060839cfc33/cryptography-41.0.6.tar.gz (630kB)
100% |████████████████████████████████| 634kB 2.5MB/s
Installing build dependencies ... done
Collecting enum34==1.1.6 (from -r requirements.txt (line 5))
Downloading https://files.pythonhosted.org/packages/af/42/cb9355df32c69b553e72a2e28daee25d1611d2c0d9c272aa1d34204205b2/enum34-1.1.6-py3-none-any.whl
Collecting futures==3.2.0 (from -r requirements.txt (line 6))
Could not find a version that satisfies the requirement futures==3.2.0 (from -r requirements.txt (line 6)) (from versions: 0.2.python3, 0.1, 0.2, 1.0, 2.0, 2.1, 2.1.1, 2.1.2, 2.1.3, 2.1.4, 2.1.5, 2.1.6, 2.2.0, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5, 3.1.0, 3.1.1)
No matching distribution found for futures==3.2.0 (from -r requirements.txt (line 6))
So I tried to run python ScrapedIn.py
and I receive:
Traceback (most recent call last):
File "ScrapedIn.py", line 22, in <module>
from thready import threaded
ModuleNotFoundError: No module named 'thready'
So I try to install it through pip install threaded
and it works:
Collecting threaded
Downloading https://files.pythonhosted.org/packages/13/e4/87977aafea1cb6c1f7064f5bd6eaad0f7fadc30c82b21c0bce695c4455c0/threaded-4.1.0-cp37-cp37m-manylinux1_x86_64.whl (813kB)
100% |████████████████████████████████| 819kB 1.7MB/s
Installing collected packages: threaded
Successfully installed threaded-4.1.0
I now try again python ScrapedIn.py
and I have:
Traceback (most recent call last):
File "ScrapedIn.py", line 22, in <module>
from thready import threaded
ModuleNotFoundError: No module named 'thready'
There must be some problem with that library
Images are missing from the readme.
Hi,
I am running a 64-bit Ubuntu 16.04 (KDE) machine. After installing xlsxwriter and thready, I ran the script and got this error.
[Info] Obtained new session: error
Traceback (most recent call last):
File "./ScrapedIn.py", line 156, in <module>
get_search()
File "./ScrapedIn.py", line 41, in get_search
r = requests.get(url, cookies=cookies, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 640, in send
history = [resp for resp in gen] if allow_redirects else []
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 140, in resolve_redirects
raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
Exception Exception: Exception('Exception caught in workbook destructor. Explicit close() may be required for workbook.',) in <bound method Workbook.__del__ of <xlsxwriter.workbook.Workbook object at 0x7f7e40cf9cd0>> ignored
Please check it and update with a solution.
Thanks.
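The TooManyRedirects loop above typically means LinkedIn keeps bouncing the request toward its login or security-checkpoint page because the session cookie is stale. A hypothetical diagnostic (the helper names and path heuristics are assumptions, not part of ScrapedIn): fetch once with `allow_redirects=False` so the loop never starts, then look at where the server wanted to send you.

```python
from typing import Optional

import requests


def classify_redirect(location: Optional[str]) -> str:
    """Interpret LinkedIn's redirect target (heuristic, not an official API)."""
    if location is None:
        return "no-redirect"
    if "/uas/login" in location or "/checkpoint/" in location:
        return "login-required"
    return "other"


def diagnose(url, cookies, headers):
    # One request, no redirect-following: the Location header shows
    # where the 30-redirect loop would have gone.
    r = requests.get(url, cookies=cookies, headers=headers, allow_redirects=False)
    return classify_redirect(r.headers.get("Location"))
```

A "login-required" result points at re-authenticating (fresh cookies) rather than at a bug in the request loop itself.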
Traceback (most recent call last):
File "/home/admin/Tools/ScrapedIn/ScrapedIn.py", line 358, in
companyResults = companyLookup(companyName)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/admin/Tools/ScrapedIn/ScrapedIn.py", line 152, in companyLookup
if c['item']['entityResult']['title']['text']:
~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
Here's my input and the stacktrace:
inputAndStacktrace.txt
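The TypeError above fires because some clusters come back with `item` or `entityResult` set to null, so subscripting `c['item']['entityResult'][...]` hits a None. A defensive version of that lookup (the function name is illustrative; the key path matches the traceback):

```python
def result_title(cluster_item):
    """Walk item -> entityResult -> title -> text, returning None instead
    of raising when any level of the Voyager response is null."""
    entity = (cluster_item.get("item") or {}).get("entityResult") or {}
    title = (entity.get("title") or {}).get("text")
    return title  # None when any level is missing
```

Callers can then simply skip results where the title is None instead of crashing mid-scrape.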
I can't find a way around the "too many redirects" error.
When I try to launch the python script I get the error:
root@kali:~/ScrapedIn# python ScrapedIn.py
Traceback (most recent call last):
File "ScrapedIn.py", line 20, in
from thready import threaded
ImportError: No module named thready
I've run pip install thready, but even though the requirement is supposedly satisfied, the error persists.