Yoyo! I'm mez0.
Targeted Ops @ TrustedSec
- Residing at: TrustedSec
- Write stuff at: mez0.cc
- Reach me at: mez0
Yet Another LinkedIn Scraper...
License: MIT License
On the bigger tables, the page takes forever to load. Sometimes it doesn't load at all, lol.
To fix this, it's probably going to be DataTables.
Hi guys,
Hope you are all well!
Can you add instructions on how to create the cookie.txt?
I tried using a Google Chrome extension to export the following cookie.txt:
# HTTP Cookie File for linkedin.com by Genuinous @genuinous.
# To download cookies for this tab click here, or download all cookies.
# Usage Examples:
# 1) wget -x --load-cookies cookies.txt "https://www.linkedin.com/feed/"
# 2) curl --cookie cookies.txt "https://www.linkedin.com/feed/"
# 3) aria2c --load-cookies cookies.txt "https://www.linkedin.com/feed/"
#
.linkedin.com TRUE / TRUE 1618248615 lissc 1
.linkedin.com TRUE / TRUE 1649826467 bcookie "v=2&xxxxxxx-8363-43b4-8e35-86474cbe7b7a"
.www.linkedin.com TRUE / TRUE 1649826467 bscookie "v=1&20200412173014413a8dea-b4e9-47c3-8a4c-xxxxxxx-0PWt0"
.linkedin.com TRUE / FALSE 1650194049 _ga GA1.2.1274459269.1586712618
.linkedin.com TRUE / FALSE 0 xxxxxxx%40AdobeOrg 1
.linkedin.com TRUE / FALSE 1589714054 aam_uuid xxxxxxx
.linkedin.com TRUE / TRUE 1594488621 liap true
.www.linkedin.com TRUE / TRUE 1594488621 sl v=1&7MCd6
.www.linkedin.com TRUE / TRUE 1618248621 li_at xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxx-xxxxxxx-xxxxxxx
.www.linkedin.com TRUE / TRUE 1594488621 JSESSIONID "ajax:xxxxxxxxxxxxxx"
.www.linkedin.com TRUE / TRUE 1589304621 lissc1 1
.www.linkedin.com TRUE / TRUE 1589304621 lissc2 1
.linkedin.com TRUE / TRUE 0 lang v=2&lang=en-us
.www.linkedin.com TRUE / FALSE 0 spectroscopyId xxxxxxx-11b4-4bbc-87ed-0b94ca1ac8b2
.linkedin.com TRUE / TRUE 1589457227 UserMatchHistory AQLcZ6t7X8SGswAAAXF4icNceJceoCM3yIrOyZ5LDyxqeYijA1v3bvrL44cubHKRTrzQgabW7eM
.linkedin.com TRUE / TRUE 1589457228 li_oatml AQGGJyC67vacYgAAAXF4icm74JOhMgG6hSaPHkK4S0OKPIdHmIVoahhXq6gmV_L3n9CGdvSIqoaJOvzbQVcjLciuyBBnpxih
.linkedin.com TRUE / FALSE 1650191868 AMCV_14215E3D5995C57C0A495C55%40AdobeOrg -1303530583%7CMCIDTS%7C18369%7CMCMID%7C13793629839677649112930854099465204048%7CMCAAMLH-1587724668%7C6%7CMCAAMB-1587724668%7C6G1ynYcLPuiQxYZrsz_pkqfLG9yMXBpb2zX5dvJdYQJzPXImdj0y%7CMCOPTOUT-1587127068s%7CNONE%7CvVersion%7C3.3.0%7CMCCIDH%7C1633815264
.www.linkedin.com TRUE / TRUE 1589714047 UserMatchHistory AQKKFOW2MEDobgAAAXGH2IBYHTH4Xi8S7FRUDgiL26hEFR_tZ4tSPILYHLh8HiCD4x5LAoWKtO5Du0NNALe83jMbAA8NJDn2JbfwjzhTB8I63Mf1ScxO18qTe8Lktm9jpAPEwsHIyjQhUEDZXoHTASKC9PAVZWGf_A5KiXzbKaFqXTInnNspIJmZu4yZi0yN4K1-T-pzPdCQhFy2V_Pml333G1Enn4eF
.linkedin.com TRUE / FALSE 1587122649 _gat 1
.linkedin.com TRUE / TRUE 1587139709 lidc "b=TB89:g=2260:u=502:i=1587122050:t=1587139708:s=AQFY26mnb2z2_H8kMmGLt5okV0oI0aYo"
But it did not work.
PS: I replaced the real values with xxxxxxxx.
How can I fix it?
Cheers,
X
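The thread never states which cookie format linky expects, so the following is only a sketch of how a Netscape-style export like the one above could be read into name/value pairs. The function name `parse_cookie_export` is hypothetical and is not part of linky.

```python
def parse_cookie_export(text):
    """Map cookie name -> value from Netscape-style export lines."""
    cookies = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' comment header the extension writes.
        if not line or line.startswith("#"):
            continue
        # Fields: domain, subdomain flag, path, secure flag, expiry, name, value.
        # split(None, 6) accepts tabs or spaces and keeps the value intact.
        fields = line.split(None, 6)
        if len(fields) == 7:
            cookies[fields[5]] = fields[6]
    return cookies


sample = """# HTTP Cookie File for linkedin.com
.www.linkedin.com TRUE / TRUE 1618248621 li_at xxxxxxx
.linkedin.com TRUE / TRUE 0 lang v=2&lang=en-us
"""
print(parse_cookie_export(sample))
```

Whether the tool wants the whole export, a single cookie, or a raw Cookie header is an open question that only the maintainer can answer.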
With the URLs returned from the API, each profile can be requested and data can be extracted from the user bios.
Using regex, it could be possible to extract strings such as Windows Server 2012 R2 with something like [Ww]indows [Ss]erver (\d{4}|\d{2})( [Rr]\d)? (for example).
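As a rough sketch of the bio-matching idea, the pattern below looks for "Windows Server", a two- or four-digit year, and an optional release suffix such as R2. The pattern and the helper name `find_server_versions` are assumptions for illustration; real bios will phrase versions in many more ways.

```python
import re

# Assumed pattern: "Windows Server", a 2- or 4-digit year, optional " R<digit>".
WINDOWS_SERVER = re.compile(r"[Ww]indows [Ss]erver (?:\d{4}|\d{2})(?: [Rr]\d)?")


def find_server_versions(bio):
    """Return every Windows Server version string mentioned in a bio."""
    return WINDOWS_SERVER.findall(bio)


bio = "Sysadmin. Migrated shares from Windows Server 2008 to windows server 2012 R2."
print(find_server_versions(bio))
```

The groups are non-capturing so that `findall` returns the full matched text and not just the year.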
Codename: UhOh365
[20/11/19, 01:35:53] >> Please add the cookie to a file
Hi, where do I have to add the cookie file and in what format should the data be present in it?
Add office365enum and/or Hunter API to validate emails.
Can we change the script to search based on a keyword without a specific company?
Thank you for your help.
The call users = parse_users(data, userdata_per_page) in linkedin_scraper.py could probably do with threading, especially when validation is being used.
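One way the threading could look, as a sketch only: the body of parse_users is not shown here, so `validate_email` is a hypothetical stand-in for whatever slow per-user validation call is made, and `validate_all` is an invented helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def validate_email(email):
    """Hypothetical stand-in for a slow, network-bound validation call."""
    return email.endswith("@example.com")


def validate_all(emails, max_workers=8):
    """Validate emails concurrently; pool.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(emails, pool.map(validate_email, emails)))


print(validate_all(["a.b@example.com", "c.d@other.org"]))
```

Threads suit this case because validation would spend most of its time waiting on the network, not on the CPU.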
Currently, the api only responds with 1000 results.
url='https://www.linkedin.com/voyager/api/search/cluster?count=40&guides=List(v->PEOPLE,facetCurrentCompany->%s)&origin=OTHER&q=guided&start=0' % company_id
Here, start=0 determines the offset the results start from. Potentially, when the number of results approaches 1000, this value could be set to 1000 and the process started again.
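To make the role of start concrete, here is a sketch that builds one URL per page by stepping start in increments of count. The helper `page_urls` and its `cap` parameter are assumptions; whether the API actually serves anything from offset 1000 onwards is untested here.

```python
SEARCH_URL = (
    "https://www.linkedin.com/voyager/api/search/cluster"
    "?count={count}&guides=List(v->PEOPLE,facetCurrentCompany->{company_id})"
    "&origin=OTHER&q=guided&start={start}"
)


def page_urls(company_id, total, count=40, cap=1000):
    """Yield one search URL per page, stopping at min(total, cap) results."""
    for start in range(0, min(total, cap), count):
        yield SEARCH_URL.format(count=count, company_id=company_id, start=start)


urls = list(page_urls(1441, total=100))
print(len(urls))
print(urls[-1])
```

With total=100 and count=40 this yields offsets 0, 40, and 80.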
Traceback (most recent call last):
File "linky.py", line 147, in <module>
users=core.run(data)
File "C:\Users\Clint\Downloads\Linkedin scraper\lib\core.py", line 63, in run
logger.dump(users,validation)
File "C:\Users\Clint\Downloads\Linkedin scraper\lib\logger.py", line 211, in dump
green('%s (%s): %s at %s' % (GREEN(fullname),email,current_role,GREEN(current_company)))
File "C:\Users\Clint\Downloads\Linkedin scraper\lib\logger.py", line 57, in green
print('['+log_time+']'+GREEN(' >> ' )+string)
File "C:\Users\Clint\DOWNLO1\LINKED1\env\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 108-109: character maps to <undefined>
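The traceback shows print() failing because the Windows console encoding (cp1252) cannot represent some characters in a scraped name. A possible workaround, sketched below and not taken from the project, is to replace unencodable characters before printing. The helper name `printable` is hypothetical.

```python
import sys


def printable(text, encoding=None):
    """Replace characters the target encoding cannot represent with '?'."""
    encoding = encoding or sys.stdout.encoding or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)


# The star is not in cp1252, so it is swapped for '?'.
# ascii() is used only to keep this demo output safe on any console.
print(ascii(printable("Zoë ★", encoding="cp1252")))
```

On Python 3.7 and later, calling sys.stdout.reconfigure(encoding="utf-8") at startup is another option, provided the terminal can display UTF-8.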
python3 -r install requirements.txt
Unknown option: -r
usage: python3 [option] ... [-c cmd | -m mod | file | -] [arg] ...
Try `python -h' for more information.
┌─[✗]─[user@parrot]─[~/Desktop/linky]
└──╼ $python3 install -r requirements.txt
python3: can't open file 'install': [Errno 2] No such file or directory
can you help me?
--keyword needs to support multiple roles.
Currently, this filth will work:
#!/bin/bash
ROLES='developer engineer director'
ID=1441
COMPANY='google'
DOMAIN='google.com'
for ROLE in $ROLES; do
    # Braces stop bash from reading $COMPANY_employees_ as a single (unset) variable name.
    ./linky.py --cookie cookie.txt --company-id "$ID" --domain "$DOMAIN" --output "${COMPANY}_employees_${ROLE}" --format 'firstname.surname' --keyword "$ROLE"
    sleep 5
done
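Inside the script itself, multi-role support could be sketched with argparse's nargs="+", which collects several values for one flag into a list. How linky currently defines --keyword is not shown here, so this parser is an assumption and not the project's actual argument handling.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a multi-role --keyword option")
    # nargs="+" gathers one or more space-separated roles into a list.
    parser.add_argument("--keyword", nargs="+", default=[], help="one or more roles")
    return parser


args = build_parser().parse_args(["--keyword", "developer", "engineer", "director"])
for role in args.keyword:
    print(role)
```

The scraper could then loop over args.keyword internally, which would replace the bash wrapper.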
As of now, the naming scheme has to be set manually. A potential way around this would be to take a sample of the users and generate multiple naming schemes per name.
Then, attempt to validate these candidates and identify which naming scheme came back positive.
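A minimal sketch of that idea follows. Only 'firstname.surname' comes from the existing --format usage; the other scheme names, both helper functions, and the `is_valid` callback are hypothetical. Names are assumed to be non-empty.

```python
from collections import Counter


def candidate_emails(firstname, surname, domain):
    """Build one candidate address per naming scheme for a single user."""
    first, last = firstname.lower(), surname.lower()
    schemes = {
        "firstname.surname": f"{first}.{last}",
        "firstnamesurname": f"{first}{last}",
        "f.surname": f"{first[0]}.{last}",
        "fsurname": f"{first[0]}{last}",
    }
    return {name: f"{local}@{domain}" for name, local in schemes.items()}


def best_scheme(sample, domain, is_valid):
    """Return the scheme with the most validated hits, or None if none hit."""
    hits = Counter()
    for first, last in sample:
        for scheme, email in candidate_emails(first, last, domain).items():
            if is_valid(email):
                hits[scheme] += 1
    return hits.most_common(1)[0][0] if hits else None


# Fake validator for the demo: a set of addresses assumed to exist.
known = {"jane.doe@example.com", "john.smith@example.com"}
print(best_scheme([("Jane", "Doe"), ("John", "Smith")], "example.com", known.__contains__))
```

In practice `is_valid` would wrap whichever validation backend is in use, and the winning scheme would then be applied to the full user list.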