m8sec / crosslinked

LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping

License: GNU General Public License v3.0

Python 100.00%
webscraping python3 osint enumeration username-generator pentest-tool pentest-scripts linkedin-scraper

crosslinked's Introduction

CrossLinked

     

CrossLinked is a LinkedIn enumeration tool that uses search engine scraping to collect valid employee names from an organization. This technique provides accurate results without API keys, credentials, or direct access to LinkedIn!

Install

PyPi

Install the latest stable release from PyPI:

pip3 install crosslinked

Poetry

Install and run the latest code using Poetry:

git clone https://github.com/m8sec/crosslinked
cd crosslinked
poetry install
poetry run crosslinked -h

Python

Install the most recent code from GitHub:

git clone https://github.com/m8sec/crosslinked
cd crosslinked
pip3 install .

Prerequisites

CrossLinked assumes the organization's account naming convention has already been identified. The convention is required for execution and must be passed in the command-line arguments in the form you expect in the output. See the Naming Format and Example Usage sections below:

Naming Format

{first}.{last}          = john.smith
CMP\{first}{l}          = CMP\johns
{f}{last}@company.com   = jsmith@company.com
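
The mapping above can be sketched in a few lines of Python. This is only an illustration of how the four tokens expand, not CrossLinked's own code; the function name `expand_format` is hypothetical.

```python
def expand_format(template: str, first: str, last: str) -> str:
    """Fill a naming template such as '{f}{last}@company.com'.

    Hypothetical helper for illustration; not part of CrossLinked.
    """
    first, last = first.lower(), last.lower()
    tokens = {
        "{first}": first,      # full first name
        "{last}": last,        # full last name
        "{f}": first[:1],      # first initial
        "{l}": last[:1],       # last initial
    }
    for token, value in tokens.items():
        template = template.replace(token, value)
    return template


print(expand_format("{first}.{last}", "John", "Smith"))         # john.smith
print(expand_format(r"CMP\{first}{l}", "John", "Smith"))        # CMP\johns
print(expand_format("{f}{last}@company.com", "John", "Smith"))  # jsmith@company.com
```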

🦖 Still stuck? Metadata is always a good place to check for hidden information such as the account naming convention. See PyMeta for more.


Advanced Formatting

💥 New Feature 💥

To be compatible with alternate naming conventions, CrossLinked allows users to control the index position of the name extracted from the search text. If the name is not long enough, or errors are encountered with the search string, CrossLinked reverts to its default format.

Note: the search string array starts at 0. Negative numbers can also be used to count backwards from the last value.

# Default output
python3 crosslinked.py -f '{first}.{last}@company.com' Company
John David Smith = john.smith@company.com

# Use the second-to-last name as "last"
python3 crosslinked.py -f '{0:first}.{-2:last}@company.com' Company
John David Smith    = john.david@company.com
Jane Doe            = jane.doe@company.com

# Use the second item in the array as "last"
python3 crosslinked.py -f '{first}.{1:last}@company.com' Company
John David Smith    = john.david@company.com
Jane Doe            = jane.doe@company.com
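
The index rules can be sketched as follows. This is an illustration under stated assumptions, not CrossLinked's implementation: the helper name `format_name` and the regular expression are hypothetical, and "not long enough" is assumed to mean a name of two words or fewer, in which case custom indexes are ignored and the defaults (first word, last word) apply.

```python
import re

# Matches {first}, {last}, {f}, {l}, optionally prefixed by an index: {0:first}, {-2:last}
TOKEN = re.compile(r"\{(?:(-?\d+):)?(first|last|f|l)\}")


def format_name(template: str, full_name: str) -> str:
    """Expand a template, honouring optional index prefixes (hypothetical sketch)."""
    parts = full_name.lower().split()
    if not parts:
        return ""

    def pick(match: re.Match) -> str:
        index, field = match.group(1), match.group(2)
        default = 0 if field in ("first", "f") else -1
        try:
            # Assumption: custom indexes only apply to names with more than two words.
            use_index = index is not None and len(parts) > 2
            word = parts[int(index)] if use_index else parts[default]
        except IndexError:
            word = parts[default]  # index out of range: fall back to the default
        return word if field in ("first", "last") else word[:1]

    return TOKEN.sub(pick, template)


print(format_name("{first}.{last}@company.com", "John David Smith"))       # john.smith@company.com
print(format_name("{0:first}.{-2:last}@company.com", "John David Smith"))  # john.david@company.com
print(format_name("{first}.{1:last}@company.com", "Jane Doe"))             # jane.doe@company.com
```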

Search

By default, CrossLinked uses the Google and Bing search engines to identify employees of the target organization. After execution, two files (names.txt & names.csv) will appear in the current directory, unless the output name is changed in the command-line arguments.

  • names.txt - List of unique user accounts in the specified format.
  • names.csv - Raw search data. See the Parse section below for more.

Example Usage

python3 crosslinked.py -f '{first}.{last}@domain.com' company_name
python3 crosslinked.py -f 'domain\{f}{last}' -t 15 -j 2 company_name

⚠️ For best results, use the company name as it appears on LinkedIn (e.g. "Target Company"), not the domain name.

Parse

Account naming convention changed after execution and now you're hitting CAPTCHA requests? No problem!

CrossLinked writes a names.csv output file, which stores all scraping data including name, job title, and URL. This file can be ingested and parsed to reformat user accounts as needed.

Example Usage

python3 crosslinked.py -f '{f}{last}@domain.com' names.csv
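
For readers who want to post-process the CSV themselves, a minimal sketch follows. It assumes the file has a header row with a column literally called `name`; the real column names are not documented here, so adjust accordingly. The function name `reformat_names` is hypothetical, and the sketch only handles the plain `{first}`, `{last}`, `{f}`, `{l}` tokens.

```python
import csv
import io


def reformat_names(csv_file, template):
    """Yield one formatted account per unique name in the CSV (hypothetical sketch)."""
    seen = set()
    for row in csv.DictReader(csv_file):
        parts = row["name"].lower().split()  # "name" column is an assumption
        if len(parts) < 2:
            continue  # skip rows without both a first and a last name
        account = template.format(
            first=parts[0], last=parts[-1], f=parts[0][0], l=parts[-1][0]
        )
        if account not in seen:
            seen.add(account)
            yield account


# Made-up sample data standing in for names.csv
sample = io.StringIO(
    "name,title,url\n"
    "John Smith,Engineer,https://example.com/a\n"
    "John Smith,Engineer,https://example.com/b\n"
    "Jane Doe,Analyst,https://example.com/c\n"
)
print(list(reformat_names(sample, "{f}{last}@domain.com")))
# ['jsmith@domain.com', 'jdoe@domain.com']
```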

Additional Options

Proxy Rotation

The latest version of CrossLinked provides proxy support to rotate source addresses. Users can input a single proxy with --proxy 127.0.0.1:8080 or use multiple via --proxy-file proxies.txt.

> cat proxies.txt
127.0.0.1:8080
socks4://111.111.111.111
socks5://222.222.222.222

> python3 crosslinked.py --proxy-file proxies.txt -f '{first}.{last}@company.com' -t 10 "Company"

⚠️ HTTP/S proxies can be added by IP:Port notation. However, socks proxies will require a socks4:// or socks5:// prefix.
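
A rough sketch of how such a proxy list could be normalized and rotated is shown below. The helper name `load_proxies`, the `http://` default for bare IP:Port entries, and the round-robin order are all assumptions made for illustration; CrossLinked may select proxies differently.

```python
from itertools import cycle


def load_proxies(lines):
    """Turn proxy-file lines into full proxy URLs (hypothetical sketch)."""
    proxies = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        # Bare IP:Port entries are treated as HTTP proxies; socks entries
        # already carry their socks4:// or socks5:// prefix.
        if "://" not in entry:
            entry = "http://" + entry
        proxies.append(entry)
    return proxies


rotation = cycle(load_proxies([
    "127.0.0.1:8080",
    "socks4://111.111.111.111",
    "socks5://222.222.222.222",
]))

# Each request takes the next proxy; the list wraps around when exhausted.
for _ in range(4):
    print(next(rotation))
```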

Command-Line Arguments

positional arguments:
  company_name        Target company name

optional arguments:
  -h, --help          show help message and exit
  -t TIMEOUT          Max timeout per search (Default=15)
  -j JITTER           Jitter between requests (Default=1)

Search arguments:
  --search ENGINE     Search Engine (Default='google,bing')

Output arguments:
  -f NFORMAT          Format names, ex: 'domain\{f}{last}', '{first}.{last}@domain.com'
  -o OUTFILE          Change name of output file (omit_extension)

Proxy arguments:
  --proxy PROXY       Proxy requests (IP:Port)
  --proxy-file PROXY  Load proxies from file for rotation
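
For scripting around the tool, the option layout above can be mirrored with argparse. This is a sketch reconstructed from the help text only: the `dest` names, the integer types, and the `names` default for `-o` are assumptions rather than CrossLinked's actual parser.

```python
import argparse


def build_parser():
    """Rebuild the documented options with argparse (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="crosslinked")
    parser.add_argument("company_name", help="Target company name")
    parser.add_argument("-t", dest="timeout", type=int, default=15,
                        help="Max timeout per search")
    parser.add_argument("-j", dest="jitter", type=int, default=1,
                        help="Jitter between requests")

    search = parser.add_argument_group("Search arguments")
    # argparse also applies `type` to string defaults, so the default becomes a list.
    search.add_argument("--search", dest="engine", default="google,bing",
                        type=lambda value: value.split(","), help="Search engine(s)")

    output = parser.add_argument_group("Output arguments")
    output.add_argument("-f", dest="nformat", help="Name format")
    output.add_argument("-o", dest="outfile", default="names",
                        help="Output file name, without extension")

    proxy = parser.add_argument_group("Proxy arguments")
    proxy.add_argument("--proxy", dest="proxy", default=None, help="IP:Port")
    proxy.add_argument("--proxy-file", dest="proxy_file", default=None,
                       help="File of proxies for rotation")
    return parser


args = build_parser().parse_args(
    ["-f", r"domain\{f}{last}", "-t", "15", "-j", "2", "company_name"]
)
print(args.company_name, args.timeout, args.jitter, args.engine)
# company_name 15 2 ['google', 'bing']
```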

Contribute

Contribute to the project by:

  • Liking and sharing the tool!
  • Creating an issue to report any problems or, better yet, initiating a PR.
  • Reaching out with any potential features or improvements @m8sec.

crosslinked's People

Contributors

a7t0fwa7, c3l3si4n, dependabot[bot], frapava98, m8sec, mplattner

crosslinked's Issues

Remove accents/diacritics

Very nice app.... works like a charm.

For some languages (like Portuguese), it should remove accents/diacritics, because they are not used in e-mail addresses.

I got it done with this function...

import unicodedata

def strip_accents(text):
    try:
        # Python 2 only: decode byte strings to unicode first
        text = unicode(text, 'utf-8')
    except NameError:  # Python 3: str is already Unicode
        pass

    # Decompose accented characters, then drop the non-ASCII combining marks
    text = unicodedata.normalize('NFD', text)\
           .encode('ascii', 'ignore')\
           .decode("utf-8")

    return str(text)

...called before saving to file!

How to use CrossLinked with latin names?

In Spain and Latin America, people have two surnames: their father's and their mother's.

For instance, in José Perez Rodriguez, José is the given name, Perez the first surname, and Rodriguez the second surname.

For email addresses, we generally use name.surname1@company.com (jose.perez@company.com). Right now, with the current logic, it is taking name.surname2@company.com (jose.rodriguez@company.com). I assume that CrossLinked considers surname1 to be the second part of the first name.

Is there any command option to take the first two strings as firstname+surname, ignoring the third string?

cloud API with similar concept: search and extract LinkedIn profiles

I don't see a lot of action across CrossLinked issues recently, though I know a lot of people are struggling to extract data from Google Search and LinkedIn.

So, I decided to share with you guys a managed API which tries to solve a problem of extracting LinkedIn data reliably:
https://rapidapi.com/restyler/api/linkedin-profiles1

The best part? It does not require your login or cookies.

I am the author and though the API was recently launched, I already see a great interest from a lot of developers.

The API has just two endpoints: /search to get a list of LinkedIn URLs for a person or a company, and /extract to get JSON-structured data from a LinkedIn URL. The API uses a combination of proxies and smart retry strategies to get the information.

Does Google ban Crosslinked?

I successfully cloned the CrossLinked directory and ran the algorithm. I tried to get a list of people working for a company. Google returned 280 hits and Bing returned about 80. Both numbers are pretty low, given that this particular company has more than 5000 employees. For now, that's a minor issue. :-) However, running the algorithm a second time returned zero hits for Google, and since that second run Google has not returned any hits at all. Bing works, sort of, all right. Does Google have a policy of banning CrossLinked after recognizing the IP address? Or is this an old issue?

With Proxy shows False in resp

Hi, when I use a proxy it shows resp as False and returns zero results.

python3 crosslinked.py --proxy 119.160.98.147:8080 -f "{first}.{last}@purelogics.net" "purelogics"

No results

Hello,
I tried to execute the following command: python3 test.py -f 'domain\{f}{last}' -t 15 -j 10 'capgemini' to get the LinkedIn profiles of Capgemini employees, but I get 0 results.
When I send the requests (URLs taken from the logs) manually from the browser, I see links to the LinkedIn profiles. But when I tried with Insomnia, I got an HTML response with a 200 status but 0 search results.

Error in install documentation

Hello sir.

Love your software by the way. Great job.

Just wanted to let you know that the installation instructions say
python3 setup install but it should actually be: python3 ./setup.py install

Thanks :)

--proxy-file isn't working

Having some issues using crosslinked with multiple proxies. I am able to execute things fine with no proxy, and specifying a single proxy, however the --proxy-file switch seems to break.

└─$ crosslinked --proxy-file 'proxies.txt' -f '{f}{last}@microsoft.com' -j 3 -t 10 "Microsoft"

     _____                    _             _            _ 
    /  __ \                  | |   (x)     | |          | |
    | /  \/_ __ ___  ___ ___ | |    _ _ __ | | _____  __| |
    | |   | '__/ _ \/ __/ __|| |   | | '_ \| |/ / _ \/ _` |
    | \__/\ | | (_) \__ \__ \| |___| | | | |   <  __/ (_| | @m8sec
     \____/_|  \___/|___/___/\_____/_|_| |_|_|\_\___|\__,_| v0.2.0

    
Traceback (most recent call last):
  File "/home/test/tools/crosslinked/bin/crosslinked", line 8, in <module>
    sys.exit(main())
  File "/home/test/tools/crosslinked/lib/python3.10/site-packages/crosslinked/__init__.py", line 102, in main
    args = cli()
  File "/home/test/tools/crosslinked/lib/python3.10/site-packages/crosslinked/__init__.py", line 48, in cli
    return args.parse_args()
  File "/usr/lib/python3.10/argparse.py", line 1838, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "/usr/lib/python3.10/argparse.py", line 1871, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "/usr/lib/python3.10/argparse.py", line 2084, in _parse_known_args
    start_index = consume_optional(start_index)
  File "/usr/lib/python3.10/argparse.py", line 2024, in consume_optional
    take_action(action, args, option_string)
  File "/usr/lib/python3.10/argparse.py", line 1932, in take_action
    argument_values = self._get_values(action, argument_strings)
  File "/usr/lib/python3.10/argparse.py", line 2467, in _get_values
    value = self._get_value(action, arg_string)
  File "/usr/lib/python3.10/argparse.py", line 2500, in _get_value
    result = type_func(arg_string)
  File "/home/test/tools/crosslinked/lib/python3.10/site-packages/crosslinked/__init__.py", line 47, in <lambda>
    pr.add_argument('--proxy-file', dest='proxy', default=False, type=lambda x: utils.file_exists(x), help='Load proxies from file for rotation')
  File "/home/test/tools/crosslinked/lib/python3.10/site-packages/crosslinked/utils.py", line 20, in file_exists
    return [line.strip() for line in open('filename')] if contents else filename
FileNotFoundError: [Errno 2] No such file or directory: 'filename'

content of proxies.txt

┌──(crosslinked)─(test㉿test)-[~/tools/crosslinked/bin]
└─$ cat proxies.txt                                                                                           
workingproxy1:8888
workingproxy2:8888
workingproxy3:8888
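
The last frame of the traceback opens the string literal 'filename' rather than the `filename` variable, which would explain the FileNotFoundError regardless of what proxies.txt contains. A sketch of a corrected helper is shown below. The `contents=True` default and the use of `argparse.ArgumentTypeError` for a missing file are assumptions; only the one-line return statement is visible in the traceback.

```python
import argparse
import os
import tempfile


def file_exists(filename, contents=True):
    """Return the file's stripped lines, or the path itself if contents is False.

    Sketch of a fix; everything beyond the return statement is assumed.
    """
    if not os.path.isfile(filename):
        raise argparse.ArgumentTypeError(f"Input file not found: {filename}")
    if not contents:
        return filename
    with open(filename) as handle:  # open the variable, not the literal 'filename'
        return [line.strip() for line in handle if line.strip()]


# Demonstration with a temporary stand-in for proxies.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("workingproxy1:8888\nworkingproxy2:8888\nworkingproxy3:8888\n")

print(file_exists(tmp.name))
# ['workingproxy1:8888', 'workingproxy2:8888', 'workingproxy3:8888']
os.remove(tmp.name)
```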

Google/Bing Search Limit

When using CrossLinked on large organizations, it appears to be limited by Google's 300 result limit (source)(source) and Bing's 1000 result limit (source). In practical application, I can't pull more than 300ish results for a single organization, regardless of the number of employees on LinkedIn.

Using The Hershey Company as an example, there are 7,360 potential employees. For this example, a fresh IP address from NordVPN will be used, and relatively high jitter and timeout values were set.
image

We can run CrossLinked for Hershey:
image

We can observe that Google only returns 300 results, as expected. I don't know what is going on with Bing, but I killed it after it appeared to hang. I'm not sure what is happening here, as 34 isn't anywhere close to Bing's limit.

If I am correct (and feel free to let me know if I am not), I fully recognize this is likely going to be a very difficult problem to correct. For now, it might be worth mentioning the limitation in the ReadMe.
