scrapedin's People

Contributors

dchrastil, dependabot[bot], disk0nn3ct, steve-offutt, wikijm

scrapedin's Issues

Image files are missing

Hi,

In two different environments, I'm not able to retrieve the images of LinkedIn profiles.
Those environments are:

  • Kali (latest Windows 10 Microsoft Store app)
  • Kali (latest official VMware VM)

No error message is shown while the script runs.

No cookie found

UnboundLocalError: local variable 'mycookies' referenced before assignment
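
A common way to hit this error is to assign a variable only inside a branch or loop body that never runs, and then read it afterwards. The following is a minimal sketch of the usual guard, not the project's actual code: `load_cookies` and its `cookie_sources` argument are hypothetical names used only for illustration.

```python
def load_cookies(cookie_sources):
    """Return the first non-empty cookie dict, or None if none is found."""
    # Bind the name up front so it exists on every code path.
    mycookies = None
    for source in cookie_sources:
        if source:  # hypothetical check: each source is a dict of cookies or None
            mycookies = source
            break
    return mycookies


if __name__ == "__main__":
    cookies = load_cookies([{}, None])
    if cookies is None:
        print("No cookie found")
```

With the name bound before the loop, the "no cookie" case becomes an explicit `None` check instead of an `UnboundLocalError`.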

Traceback Error : TypeError: 'NoneType' object is not subscriptable

Traceback (most recent call last):
  File "/home/admin/Tools/ScrapedIn/ScrapedIn.py", line 358, in <module>
    companyResults = companyLookup(companyName)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/Tools/ScrapedIn/ScrapedIn.py", line 152, in companyLookup
    if c['item']['entityResult']['title']['text']:
       ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
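
The `~` and `^` markers in the traceback appear to point at `c['item']['entityResult']` being `None` when `['title']` is applied to it. A minimal sketch of a defensive lookup follows; the `dig` helper and the sample `results` list are assumptions for illustration and are not part of ScrapedIn.

```python
def dig(data, *keys):
    """Walk nested dicts, returning None as soon as a level is missing or not a dict."""
    current = data
    for key in keys:
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


# Hypothetical sample shaped like the lookup in the traceback.
results = [
    {"item": {"entityResult": {"title": {"text": "Example Corp"}}}},
    {"item": {"entityResult": None}},  # the shape that would raise the TypeError
]

titles = []
for c in results:
    text = dig(c, "item", "entityResult", "title", "text")
    if text:
        titles.append(text)

print(titles)
```

Entries without an `entityResult` are skipped rather than aborting the whole company lookup.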

tweepy 4.14.0 requires requests 2.27.0

I've installed the latest "requests" and tried to update it, but pip says it is already at the newest version.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. tweepy 4.14.0 requires requests<3,>=2.27.0, but you have requests 2.20.0 which is incompatible.
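
One possible explanation is that the upgrade ran against a different Python environment than the one holding requests 2.20.0. The sketch below checks which version the current interpreter actually sees and tests it against the constraint quoted in the pip message. It assumes Python 3.8+ (for `importlib.metadata`) and plain numeric version strings; the helper names are illustrative.

```python
import sys
from importlib import metadata


def parse(version):
    """Turn '2.20.0' into (2, 20, 0); assumes purely numeric release segments."""
    return tuple(int(part) for part in version.split("."))


def satisfies_tweepy_pin(version):
    """Check the constraint from the pip message: requests>=2.27.0,<3."""
    return parse("2.27.0") <= parse(version) < parse("3")


def installed_version(name):
    """Return the version of a distribution in this interpreter, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("requests:", installed_version("requests"))
```

Running the upgrade as `python -m pip install --upgrade requests` with the same interpreter printed above keeps pip and the script in the same environment.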

Multiple Errors while running python script.

Hi,
I am running a 64-bit Ubuntu 16.04 (KDE) machine. After installing xlsxwriter and thready, I ran the script and got this error.

    [Info] Obtained new session: error
    Traceback (most recent call last):
    File "./ScrapedIn.py", line 156, in <module>
        get_search()
    File "./ScrapedIn.py", line 41, in get_search
        r = requests.get(url, cookies=cookies, headers=headers)
    File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 72, in get
        return request('get', url, params=params, **kwargs)
    File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request
        return session.request(method=method, url=url, **kwargs)
    File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 508, in request
        resp = self.send(prep, **send_kwargs)
    File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 640, in send
        history = [resp for resp in gen] if allow_redirects else []
    File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 140, in resolve_redirects
        raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
    requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
    Exception Exception: Exception('Exception caught in workbook destructor. Explicit close() may be required for workbook.',) in <bound method Workbook.__del__ of <xlsxwriter.workbook.Workbook object at 0x7f7e40cf9cd0>> ignored

Please look into it and post a solution.

Thanks.
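
The first log line, "Obtained new session: error", suggests the login step had already failed before the search request ran into the redirect loop. A minimal sketch of failing early in that situation is shown below. It assumes the login routine returns the literal string "error" on failure, as the log line implies; `require_session` and `SessionError` are hypothetical names.

```python
class SessionError(RuntimeError):
    """Raised when no usable session value was obtained."""


def require_session(session_value):
    """Return the session value, or raise if it is empty or the literal 'error'."""
    # Assumption: a failed login is reported as the string "error",
    # matching the "[Info] Obtained new session: error" log line.
    if not session_value or session_value == "error":
        raise SessionError("login did not return a usable session; check credentials")
    return session_value


if __name__ == "__main__":
    try:
        require_session("error")
    except SessionError as exc:
        print("[Fatal]", exc)
```

Stopping here avoids sending an unauthenticated search request, which is the request that ends in `TooManyRedirects` in the traceback.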

Python 3.6 incompatible?

I encountered a lot of errors while executing the script. I tried to fix them one by one, but I can't even install urllib2 on Python 3.6. Which version of Python should I use?
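
For context, `urllib2` is part of the Python 2 standard library and cannot be installed with pip; in Python 3 its contents live in `urllib.request` and `urllib.error`. A minimal compatibility sketch follows. The `build_request` helper and the user-agent string are illustrative assumptions, not code from ScrapedIn.

```python
try:
    # Python 2: the module the script imports.
    from urllib2 import Request, urlopen
except ImportError:
    # Python 3: the same names moved to urllib.request.
    from urllib.request import Request, urlopen


def build_request(url, user_agent="example-agent/0.1"):
    """Create a request object that works under either Python version."""
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_request("https://example.com/")
    print(req.get_full_url())
```

A shim like this only covers the import; other Python 2 constructs in a script would still need porting.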

No module named Thready

When I try to launch the python script I get the error:

root@kali:~/ScrapedIn# python ScrapedIn.py
Traceback (most recent call last):
  File "ScrapedIn.py", line 20, in <module>
    from thready import threaded
ImportError: No module named thready

I ran pip install thready, but even though the requirement is supposedly satisfied, the error persists.
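
When pip reports a requirement as satisfied but the import still fails, one frequent cause is that `pip` and `python` belong to different interpreters. The sketch below checks which modules the running interpreter can actually find and prints an install command bound to that same interpreter. The helper names are illustrative.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def install_hint(names):
    """Build a pip command that targets the interpreter running this script."""
    return "{} -m pip install {}".format(sys.executable, " ".join(names))


if __name__ == "__main__":
    missing = missing_modules(["thready", "xlsxwriter"])
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try:", install_hint(missing))
```

Running it with the same `python` used to launch ScrapedIn.py shows whether `thready` is visible to that interpreter.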

No matching distribution found for futures==3.2.0 (from -r requirements.txt (line 6))

I just tried to run sudo pip install -r requirements.txt and this is the result:

Collecting beautifulsoup4==4.6.0 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/9e/d4/10f46e5cfac773e22707237bfcd51bbffeaf0a576b0a847ec7ab15bd7ace/beautifulsoup4-4.6.0-py3-none-any.whl (86kB)
    100% |████████████████████████████████| 92kB 1.8MB/s
Collecting certifi==2023.7.22 (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/4c/dd/2234eab22353ffc7d94e8d13177aaa050113286e93e7b40eae01fbf7c3d9/certifi-2023.7.22-py3-none-any.whl (158kB)
    100% |████████████████████████████████| 163kB 3.4MB/s
Requirement already satisfied: chardet==3.0.4 in /usr/lib/python3/dist-packages (from -r requirements.txt (line 3)) (3.0.4)
Collecting cryptography==41.0.6 (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/4d/b4/828991d82d3f1b6f21a0f8cfa54337ed33fdb52135f694130060839cfc33/cryptography-41.0.6.tar.gz (630kB)
    100% |████████████████████████████████| 634kB 2.5MB/s
  Installing build dependencies ... done
Collecting enum34==1.1.6 (from -r requirements.txt (line 5))
  Downloading https://files.pythonhosted.org/packages/af/42/cb9355df32c69b553e72a2e28daee25d1611d2c0d9c272aa1d34204205b2/enum34-1.1.6-py3-none-any.whl
Collecting futures==3.2.0 (from -r requirements.txt (line 6))
  Could not find a version that satisfies the requirement futures==3.2.0 (from -r requirements.txt (line 6)) (from versions: 0.2.python3, 0.1, 0.2, 1.0, 2.0, 2.1, 2.1.1, 2.1.2, 2.1.3, 2.1.4, 2.1.5, 2.1.6, 2.2.0, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5, 3.1.0, 3.1.1)
No matching distribution found for futures==3.2.0 (from -r requirements.txt (line 6))

So I tried to run python ScrapedIn.py and received:

Traceback (most recent call last):
  File "ScrapedIn.py", line 22, in <module>
    from thready import threaded
ModuleNotFoundError: No module named 'thready'

So I tried to install it through pip install threaded, and the installation succeeded:

Collecting threaded
  Downloading https://files.pythonhosted.org/packages/13/e4/87977aafea1cb6c1f7064f5bd6eaad0f7fadc30c82b21c0bce695c4455c0/threaded-4.1.0-cp37-cp37m-manylinux1_x86_64.whl (813kB)
    100% |████████████████████████████████| 819kB 1.7MB/s
Installing collected packages: threaded
Successfully installed threaded-4.1.0

I then ran python ScrapedIn.py again and got:

Traceback (most recent call last):
  File "ScrapedIn.py", line 22, in <module>
    from thready import threaded
ModuleNotFoundError: No module named 'thready'

There must be some problem with that library.
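
Two things stand out in the logs above. First, `threaded` and `thready` are different distributions, so installing `threaded` does not provide the `thready` module. Second, `futures` and `enum34` are backports of modules that ship with Python 3 (`concurrent.futures` since 3.2, `enum` since 3.4), which is probably why pip finds no matching `futures==3.2.0` under Python 3.7. A minimal sketch of deciding which backports an interpreter needs; the `BACKPORTS` table and function name are illustrative.

```python
import sys

# Backport distribution -> first Python version that ships the module natively.
BACKPORTS = {
    "futures": (3, 2),  # concurrent.futures
    "enum34": (3, 4),   # enum
}


def needed_backports(version_info=sys.version_info):
    """Return the backport distributions required for the given Python version."""
    current = tuple(version_info[:2])
    return sorted(name for name, since in BACKPORTS.items() if current < since)


if __name__ == "__main__":
    print("Backports needed here:", needed_backports())
```

On a Python 3 interpreter the list is empty, so those two pinned lines could be skipped there.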

Current setup/configuration method is incompatible with pipenv

Because of all the pinned dependencies this tool requires, I strongly feel that there should be some built-in expectation for users to be able to run this with pipenv. I've built a LinkedIn scraper before, so I can appreciate how tedious the process is, hence why pipenv compatibility is a big help for both the dev & user in this instance.

Are there any plans to do away with the user/password environment variable method and introduce an option to supply these values at start? It would be safer as well, since Python's garbage collector would (ideally) remove the values from memory at the end of the run.

Has this been considered?
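
A minimal sketch of what such an option could look like, using the standard `getpass` module so the password is not echoed. The environment variable names `LI_USERNAME` and `LI_PASSWORD` are taken from another report on this page; the function itself and its injectable prompt parameters are assumptions for illustration.

```python
import getpass
import os


def get_credentials(env=os.environ, prompt_user=input, prompt_secret=getpass.getpass):
    """Use environment variables when set, otherwise prompt at start-up."""
    username = env.get("LI_USERNAME") or prompt_user("LinkedIn username: ")
    password = env.get("LI_PASSWORD") or prompt_secret("LinkedIn password: ")
    return username, password
```

Calling `get_credentials()` with no arguments prompts interactively for anything missing from the environment; the prompt functions are parameters only so the behavior can be exercised without a terminal.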

Not an issue!

Have a look at this. Are you interested in some kind of support? We like your project.

[Fatal] Could not authenticate to linkedin. Set credentials in your environment variables.

Hello,

I have configured the credentials in the configuration file and in the environment variables:

cat config.py~:

...
linkedin = dict(
    username = '[email protected]',
    password = '#ExamplePwd123!'
)
...

export LI_USERNAME={[email protected]}
export LI_PASSWORD={#ExamplePwd123!}

and

export LI_USERNAME=[email protected]
export LI_PASSWORD=#ExamplePwd123!

I keep getting the same validation error.

Is the endpoint still correct?
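
One detail in the first pair of exports: in a POSIX shell the braces in `export NAME={value}` are kept as part of the value, so the script would receive them literally. Whether that is the cause here is not confirmed. The sketch below normalizes such a value before use; `clean_env_value` is a hypothetical helper, not part of ScrapedIn.

```python
import os


def clean_env_value(raw):
    """Strip surrounding whitespace and one pair of wrapping braces, if present."""
    value = raw.strip()
    if len(value) >= 2 and value[0] == "{" and value[-1] == "}":
        value = value[1:-1]
    return value


if __name__ == "__main__":
    raw = os.environ.get("LI_USERNAME", "")
    print(repr(clean_env_value(raw)))
```

Single-quoting the value in the shell, as in `export LI_PASSWORD='...'`, also removes any doubt about how characters such as `#` and `!` are interpreted.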

Not able to deploy it in remote servers

I attempted to run it on a remote server such as Heroku and encountered this error:

(node:9167) UnhandledPromiseRejectionWarning: Error: linkedin: manual check was required, verify if your login is properly working manually or report this issue: https://github.com/leonardiwagner/scrapedin/issues
at page.waitFor.then.catch (/var/app/current/node_modules/scrapedin/src/login.js:62:31)
at process._tickCallback (internal/process/next_tick.js:68:7)
(node:9167) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 2)
(node:9167) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
(node:9167) UnhandledPromiseRejectionWarning: Error: Protocol error (Runtime.callFunctionOn): Target closed.
at Promise (/var/app/current/node_modules/puppeteer/lib/Connection.js:183:56)
at new Promise ()
at CDPSession.send (/var/app/current/node_modules/puppeteer/lib/Connection.js:182:12)
at ExecutionContext.evaluateHandle (/var/app/current/node_modules/puppeteer/lib/ExecutionContext.js:106:44)
at ExecutionContext. (/var/app/current/node_modules/puppeteer/lib/helper.js:109:23)
at ElementHandle.$ (/var/app/current/node_modules/puppeteer/lib/JSHandle.js:378:50)
at ElementHandle. (/var/app/current/node_modules/puppeteer/lib/helper.js:109:23)
at DOMWorld.$ (/var/app/current/node_modules/puppeteer/lib/DOMWorld.js:114:34)
at process._tickCallback (internal/process/next_tick.js:68:7)
-- ASYNC --
at Frame. (/var/app/current/node_modules/puppeteer/lib/helper.js:108:27)
at Page.$ (/var/app/current/node_modules/puppeteer/lib/Page.js:300:29)
at Page. (/var/app/current/node_modules/puppeteer/lib/helper.js:109:23)
at page.waitFor.then.catch (/var/app/current/node_modules/scrapedin/src/login.js:60:16)
at process._tickCallback (internal/process/next_tick.js:68:7)

Manual verification is not possible because a browser cannot be opened on such a server. Is there any workaround for this issue?
