dchrastil / scrapedin
A tool to scrape LinkedIn without API restrictions for data reconnaissance
The URL "https://www.linkedin.com/voyager/api/search/cluster?count=40&guides=List(v-%%3EPEOPLE,facetGeoRegion-%%3Ear%%3A0)&keywords=%s&origin=FACETED_SEARCH&q=guided&start=0" does not return a JSON object for me, so decoding fails.
Here is an example:
error.txt
Attempting to run it on a remote server like Heroku, I encountered this error:
(node:9167) UnhandledPromiseRejectionWarning: Error: linkedin: manual check was required, verify if your login is properly working manually or report this issue: https://github.com/leonardiwagner/scrapedin/issues
at page.waitFor.then.catch (/var/app/current/node_modules/scrapedin/src/login.js:62:31)
at process._tickCallback (internal/process/next_tick.js:68:7)
(node:9167) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 2)
(node:9167) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
(node:9167) UnhandledPromiseRejectionWarning: Error: Protocol error (Runtime.callFunctionOn): Target closed.
at Promise (/var/app/current/node_modules/puppeteer/lib/Connection.js:183:56)
at new Promise ()
at CDPSession.send (/var/app/current/node_modules/puppeteer/lib/Connection.js:182:12)
at ExecutionContext.evaluateHandle (/var/app/current/node_modules/puppeteer/lib/ExecutionContext.js:106:44)
at ExecutionContext. (/var/app/current/node_modules/puppeteer/lib/helper.js:109:23)
at ElementHandle.$ (/var/app/current/node_modules/puppeteer/lib/JSHandle.js:378:50)
at ElementHandle. (/var/app/current/node_modules/puppeteer/lib/helper.js:109:23)
at DOMWorld.$ (/var/app/current/node_modules/puppeteer/lib/DOMWorld.js:114:34)
at process._tickCallback (internal/process/next_tick.js:68:7)
-- ASYNC --
at Frame. (/var/app/current/node_modules/puppeteer/lib/helper.js:108:27)
at Page.$ (/var/app/current/node_modules/puppeteer/lib/Page.js:300:29)
at Page. (/var/app/current/node_modules/puppeteer/lib/helper.js:109:23)
at page.waitFor.then.catch (/var/app/current/node_modules/scrapedin/src/login.js:60:16)
at process._tickCallback (internal/process/next_tick.js:68:7)
Manual verification is not possible here, since there is no way to open a browser on such a server and complete the check. Is there any workaround for this issue?
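One workaround sometimes used for headless servers is to log in once from a local browser, copy the value of LinkedIn's `li_at` session cookie, and reuse it on the server so the interactive check never has to run there. A minimal, hypothetical sketch with requests (the cookie name and the env-var handoff are assumptions, not part of this tool):

```python
import requests


def session_from_cookie(li_at: str) -> requests.Session:
    """Build a requests session that reuses an existing LinkedIn login,
    identified by the "li_at" cookie copied from a local browser session.
    This is a workaround sketch, not part of scrapedin itself."""
    s = requests.Session()
    # Attach the pre-authenticated cookie so no interactive login is needed.
    s.cookies.set("li_at", li_at, domain=".linkedin.com")
    s.headers.update({"User-Agent": "Mozilla/5.0"})
    return s


# Usage (network call; an expired cookie redirects to the login page):
# resp = session_from_cookie("AQED...").get(
#     "https://www.linkedin.com/feed/", allow_redirects=False)
```

Session cookies expire, so this has to be refreshed periodically, but it avoids the manual-check prompt on machines with no display.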
Hello,
I have configured the credentials in the configuration file and in the environment variables:
cat config.py~:
...
linkedin = dict(
username = '[email protected]',
password = '#ExamplePwd123!'
)
...
export LI_USERNAME={[email protected]}
export LI_PASSWORD={#ExamplePwd123!}
and
export [email protected]
export LI_PASSWORD=#ExamplePwd123!
I keep getting the same validation error.
Is the endpoint still correct?
Because of all the pinned dependencies this tool requires, I strongly feel there should be built-in support for running it with pipenv. I've built a LinkedIn scraper before, so I can appreciate how tedious the process is; pipenv compatibility would be a big help for both the developer and the user here.
Are there any plans to move away from the username/password environment-variable method and introduce an option to supply these values at start-up? It would also be safer, since the credentials would not linger in shell history or config files (and would ideally be freed from memory once the run ends).
Has this been considered?
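The request above (credentials supplied at start-up instead of via config.py or LI_USERNAME/LI_PASSWORD) could look roughly like this; the flag name and function are hypothetical, not the tool's actual interface:

```python
import argparse
import getpass


def parse_args(argv=None):
    """Hypothetical CLI front-end: take the username as a flag and never
    accept the password on the command line or in the environment."""
    parser = argparse.ArgumentParser(description="ScrapedIn (sketch)")
    parser.add_argument("--username", help="LinkedIn username (prompted if omitted)")
    return parser.parse_args(argv)


def read_credentials(argv=None):
    args = parse_args(argv)
    username = args.username or input("LinkedIn username: ")
    # getpass prompts without echoing, so the password never appears in
    # shell history, process listings, or config files.
    password = getpass.getpass("LinkedIn password: ")
    return username, password
```

getpass is standard library, so this adds no dependency to the already long requirements list.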
I am trying to filter people by company => facetCurrentCompany.
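Going by the guides=List(...) format of the Voyager URL quoted earlier in this thread, a company filter would presumably be another guide entry. A speculative sketch (the parameter layout mirrors the facetGeoRegion example above; whether facetCurrentCompany takes a numeric company ID, as assumed here, would need verifying):

```python
from urllib.parse import quote


def search_url(keywords: str, company_id: str, start: int = 0, count: int = 40) -> str:
    """Build a Voyager faceted-search URL filtered by current company.
    Assumption: facetCurrentCompany expects LinkedIn's numeric company ID,
    not the company name."""
    guides = quote(f"List(v->PEOPLE,facetCurrentCompany->{company_id})", safe="(),:")
    return ("https://www.linkedin.com/voyager/api/search/cluster"
            f"?count={count}&guides={guides}&keywords={quote(keywords)}"
            f"&origin=FACETED_SEARCH&q=guided&start={start}")
```

The `->` separators are percent-encoded to `%3E`, matching the `facetGeoRegion-%3Ear%3A0` pattern in the URL quoted above.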
Hi, as the subject says, can we chat?
I've installed the latest "requests" and tried to update it, it says it's at the newest version.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. tweepy 4.14.0 requires requests<3,>=2.27.0, but you have requests 2.20.0 which is incompatible.
I encountered a lot of errors while executing, and tried to fix them one by one, but I can't even install urllib2 on Python 3.6. Which version of Python should I use?
UnboundLocalError: local variable 'mycookies' referenced before assignment
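An UnboundLocalError like this usually means `mycookies` is only assigned on the success path (e.g. when the login response actually carries a session cookie) and is then referenced unconditionally. A defensive sketch of that pattern; the function and cookie names are illustrative, not the tool's actual code:

```python
def extract_session_cookies(response):
    """Return the session cookies from a login response, failing loudly
    instead of leaving the variable unbound when login did not succeed."""
    mycookies = None  # always bound, even when login fails
    if response is not None and "li_at" in response.cookies:
        mycookies = response.cookies
    if mycookies is None:
        # A clear error beats an UnboundLocalError three calls later.
        raise RuntimeError("login failed: no session cookie in response")
    return mycookies
```

Initialising the variable before the conditional turns a confusing traceback into an actionable "login failed" message.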
Hi,
On 2 different cases, I'm not able to catch images of LinkedIn profiles.
Those cases are:
There's no error message during process.
I just tried to run sudo pip install -r requirements.txt
and this is the result:
Collecting beautifulsoup4==4.6.0 (from -r requirements.txt (line 1))
Downloading https://files.pythonhosted.org/packages/9e/d4/10f46e5cfac773e22707237bfcd51bbffeaf0a576b0a847ec7ab15bd7ace/beautifulsoup4-4.6.0-py3-none-any.whl (86kB)
100% |████████████████████████████████| 92kB 1.8MB/s
Collecting certifi==2023.7.22 (from -r requirements.txt (line 2))
Downloading https://files.pythonhosted.org/packages/4c/dd/2234eab22353ffc7d94e8d13177aaa050113286e93e7b40eae01fbf7c3d9/certifi-2023.7.22-py3-none-any.whl (158kB)
100% |████████████████████████████████| 163kB 3.4MB/s
Requirement already satisfied: chardet==3.0.4 in /usr/lib/python3/dist-packages (from -r requirements.txt (line 3)) (3.0.4)
Collecting cryptography==41.0.6 (from -r requirements.txt (line 4))
Downloading https://files.pythonhosted.org/packages/4d/b4/828991d82d3f1b6f21a0f8cfa54337ed33fdb52135f694130060839cfc33/cryptography-41.0.6.tar.gz (630kB)
100% |████████████████████████████████| 634kB 2.5MB/s
Installing build dependencies ... done
Collecting enum34==1.1.6 (from -r requirements.txt (line 5))
Downloading https://files.pythonhosted.org/packages/af/42/cb9355df32c69b553e72a2e28daee25d1611d2c0d9c272aa1d34204205b2/enum34-1.1.6-py3-none-any.whl
Collecting futures==3.2.0 (from -r requirements.txt (line 6))
Could not find a version that satisfies the requirement futures==3.2.0 (from -r requirements.txt (line 6)) (from versions: 0.2.python3, 0.1, 0.2, 1.0, 2.0, 2.1, 2.1.1, 2.1.2, 2.1.3, 2.1.4, 2.1.5, 2.1.6, 2.2.0, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5, 3.1.0, 3.1.1)
No matching distribution found for futures==3.2.0 (from -r requirements.txt (line 6))
So I tried to run python ScrapedIn.py
and I receive:
Traceback (most recent call last):
File "ScrapedIn.py", line 22, in <module>
from thready import threaded
ModuleNotFoundError: No module named 'thready'
So I try to install it through pip install threaded
and it works:
Collecting threaded
Downloading https://files.pythonhosted.org/packages/13/e4/87977aafea1cb6c1f7064f5bd6eaad0f7fadc30c82b21c0bce695c4455c0/threaded-4.1.0-cp37-cp37m-manylinux1_x86_64.whl (813kB)
100% |████████████████████████████████| 819kB 1.7MB/s
Installing collected packages: threaded
Successfully installed threaded-4.1.0
I now try again python ScrapedIn.py
and I have:
Traceback (most recent call last):
File "ScrapedIn.py", line 22, in <module>
from thready import threaded
ModuleNotFoundError: No module named 'thready'
There must be some problem with that library
Images are missing from the readme.
Hi,
I am running a 64-bit Ubuntu 16.04 (KDE) machine. After installing xlsxwriter and thready, I ran the script and got this error.
[Info] Obtained new session: error
Traceback (most recent call last):
File "./ScrapedIn.py", line 156, in <module>
get_search()
File "./ScrapedIn.py", line 41, in get_search
r = requests.get(url, cookies=cookies, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 640, in send
history = [resp for resp in gen] if allow_redirects else []
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 140, in resolve_redirects
raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
Exception Exception: Exception('Exception caught in workbook destructor. Explicit close() may be required for workbook.',) in <bound method Workbook.__del__ of <xlsxwriter.workbook.Workbook object at 0x7f7e40cf9cd0>> ignored
Please check it and update with a solution.
Thanks.
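The TooManyRedirects loop above typically means LinkedIn keeps bouncing the request toward its login or security-checkpoint page because the session cookie is stale. A hypothetical diagnostic (the helper names and path heuristics are assumptions, not part of ScrapedIn): fetch once with `allow_redirects=False` so the loop never starts, then look at where the server wanted to send you.

```python
from typing import Optional

import requests


def classify_redirect(location: Optional[str]) -> str:
    """Interpret LinkedIn's redirect target (heuristic, not an official API)."""
    if location is None:
        return "no-redirect"
    if "/uas/login" in location or "/checkpoint/" in location:
        return "login-required"
    return "other"


def diagnose(url, cookies, headers):
    # One request, no redirect-following: the Location header shows
    # where the 30-redirect loop would have gone.
    r = requests.get(url, cookies=cookies, headers=headers, allow_redirects=False)
    return classify_redirect(r.headers.get("Location"))
```

A "login-required" result points at re-authenticating (fresh cookies) rather than at a bug in the request loop itself.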
Traceback (most recent call last):
File "/home/admin/Tools/ScrapedIn/ScrapedIn.py", line 358, in
companyResults = companyLookup(companyName)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/admin/Tools/ScrapedIn/ScrapedIn.py", line 152, in companyLookup
if c['item']['entityResult']['title']['text']:
~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
Here's my input and the stacktrace:
inputAndStacktrace.txt
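The TypeError above fires because some clusters come back with `item` or `entityResult` set to null, so subscripting `c['item']['entityResult'][...]` hits a None. A defensive version of that lookup (the function name is illustrative; the key path matches the traceback):

```python
def result_title(cluster_item):
    """Walk item -> entityResult -> title -> text, returning None instead
    of raising when any level of the Voyager response is null."""
    entity = (cluster_item.get("item") or {}).get("entityResult") or {}
    title = (entity.get("title") or {}).get("text")
    return title  # None when any level is missing
```

Callers can then simply skip results where the title is None instead of crashing mid-scrape.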
I can't find a way around the "too many redirects" error.
When I try to launch the python script I get the error:
root@kali:~/ScrapedIn# python ScrapedIn.py
Traceback (most recent call last):
File "ScrapedIn.py", line 20, in
from thready import threaded
ImportError: No module named thready
I've run pip install thready, but even though the requirement is supposedly satisfied, the error persists.