udemy_bot's Issues

Cloudflare prevents complete execution

When I run the bot, it is unable to get past Cloudflare protection. I increased the sleep timeout so I could solve the captchas manually, but Cloudflare serves an endless loop of captchas on the Udemy site when the browser is driven by Selenium.

I tried using undetected_chromedriver as a replacement for chromedriver, but it raises an error at line 207:

https://github.com/dimakiss/Udemy_bot/blob/main/Udemy_bot.py#L207
elif is_account_exist(sys.argv[1], sys.argv[2]):

selenium.common.exceptions.SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 88 Current browser version is 87.0.4280.141 with binary path /Applications/Google Chrome.app/Contents/MacOS/Google Chrome

 

Another thing I've tried is manually specifying the Chrome driver version and binary for undetected_chromedriver:

import undetected_chromedriver as uc
uc.TARGET_VERSION = 87
uc.install(
    executable_path='/usr/local/bin/chromedriver',
)

but it still gives the error above. Chrome 88 isn't available yet when I check for updates.
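
The two version numbers in the error can be read out programmatically before deciding which driver build to fetch. This is a minimal sketch, assuming the message keeps the wording quoted above; `parse_version_mismatch` is a hypothetical helper, not part of Selenium or undetected_chromedriver.

```python
import re


def parse_version_mismatch(message):
    """Return (driver_supports, browser_major) from a "session not created"
    message, or None if the message does not have the expected wording."""
    supported = re.search(r"only supports Chrome version (\d+)", message)
    current = re.search(r"Current browser version is (\d+)", message)
    if not (supported and current):
        return None
    return int(supported.group(1)), int(current.group(1))


# Message text copied from the error reported in this issue.
msg = (
    "session not created: This version of ChromeDriver only supports "
    "Chrome version 88 Current browser version is 87.0.4280.141"
)
print(parse_version_mismatch(msg))
```

In this case the driver targets a newer major version than the installed browser, so the driver that actually gets launched is not the version-87 build, whatever TARGET_VERSION was set to.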

Credentials not correct but they are. Login fails

#10
In reference to the above, I successfully resolved the bot error, but now I'm facing a login problem.
I receive the "There was a problem logging in. Check your email and password or create an account." error.

The email and password are correct: I copied and pasted them into the Udemy site and logged in successfully.

Debugging a bit, I see that in the function "is_account_exist" in udemy_bot.py the line
is_exist = temp_url == browser.current_url

reports that browser.current_url is not defined,

which is probably the culprit of the whole thing.
Any idea why it doesn't work?

EDIT: Further investigation (printing the elements found) gives this:

Checking if the email and password are correct
browser_email: <input name="email" required="" maxlength="64" minlength="7" placeholder="Email" data-purpose="email" type="email" id="email--1" class="form-control" value="">
browser_password: <input type="password" name="password" required="" placeholder="Password" class="textinput textInput form-control" maxlength="64" data-purpose="password" id="id_password" minlength="6">
current_url: https://www.udemy.com/join/login-popup/
browser_submit: <input type="submit" name="submit" value="Log In" class="btn btn-primary " id="submit-id-submit" data-purpose="do-login">
is_exist True
There was a problem logging in. Check your email and password or create an account.

So the elements are all found, but temp_url and browser.current_url still remain equal and the login doesn't go through.

EDIT:
Out of desperation, I tried again to fix this, but the bot is a nest of bugs and won't work in any way.
We've already seen that the bot now finds the HTML elements correctly.
So I imported these libraries:

from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys

Then I tried to trigger the submit in two ways:

The original way in which the bot triggers the submit:
browser.find_element_by_id("submit-id-submit").click()

and this one
browser.find_element_by_id("submit-id-submit").send_keys(Keys.ENTER)

Then, when checking whether browser.current_url has changed, I tried Selenium's WebDriverWait, which polls a condition until a timeout expires, instead of
sleep(2)
which is a workaround.

So I did:

from selenium.common.exceptions import NoSuchElementException, TimeoutException

try:
    submit = browser.find_element_by_id("submit-id-submit")
    print('browser_submit ' + submit.get_attribute('outerHTML'))
    submit.send_keys(Keys.ENTER)
except NoSuchElementException:
    print("No element found submit")

print('Waiting max 30 seconds for url change')
wait = WebDriverWait(browser, 30)
try:
    # Poll until the browser leaves the login URL stored in temp_url.
    wait.until(lambda driver: driver.current_url != temp_url)
except TimeoutException:
    print('url did not change in 30 seconds')

But even after 30 seconds nothing happened, and I ended up in the TimeoutException branch.

Why doesn't the URL change? Is that what's causing the problem?
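
For reference, the kind of polling that a URL-change wait performs can be sketched in plain Python without Selenium. `wait_for_url_change` and its parameters are hypothetical names for illustration; `get_url` stands in for something like `lambda: browser.current_url`.

```python
import time


def wait_for_url_change(get_url, start_url, timeout=30.0, poll=0.5):
    """Poll get_url() until it differs from start_url.

    Returns True as soon as the URL changes, or False once `timeout`
    seconds have passed without a change.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_url() != start_url:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Simulated browser whose URL changes on the third check.
urls = iter([
    "https://www.udemy.com/join/login-popup/",
    "https://www.udemy.com/join/login-popup/",
    "https://www.udemy.com/",
])
changed = wait_for_url_change(
    lambda: next(urls), "https://www.udemy.com/join/login-popup/", timeout=5, poll=0
)
print(changed)
```

A False result only says the page never navigated away within the timeout; it does not say why, so it is worth inspecting the page content at that point as well.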

I tried skipping the is_account_exist check, and the bot printed messages about scraping X potential courses and adding them to my account, but in reality nothing was added.

Please help!

Selenium package gives error installing with pip3

RUN pip3 install -r requirements.txt
---> Running in 88829269d1f6
Collecting bs4==0.0.1
Downloading bs4-0.0.1.tar.gz (1.1 kB)
Collecting requests==2.23.0
Downloading requests-2.23.0-py2.py3-none-any.whl (58 kB)
Collecting lxml>=4.6.2
Downloading lxml-4.6.2-cp39-cp39-manylinux1_x86_64.whl (5.4 MB)
ERROR: Could not find a version that satisfies the requirement selenium==1.25.9
ERROR: No matching distribution found for selenium==1.25.9

Error when validating mail (id="email--1") and error with selenium( DevToolsActivePort file doesn't exist)

As mentioned, I tried to run the bot in a Docker container, which is equivalent to running it in a Linux environment.
I'll spare you the hell I went through to make this bot work. I cried for long hours and pulled my hair out in every way. Believe me.
In the end I managed to do it, but I got this:

Checking if the email and password are correct
Traceback (most recent call last):
  File "/etc/udemybot/Udemy_bot.py", line 203, in <module>
    elif is_account_exist(sys.argv[1], sys.argv[2]):
  File "/etc/udemybot/Udemy_bot.py", line 182, in is_account_exist
    browser = webdriver.Chrome(options=options)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/chrome/webdriver.py", line 76, in __init__
    RemoteWebDriver.__init__(
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

The only way I found to work around this is to add the following to the code:

print("Checking if the email and password are correct")
options = Options()
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--incognito")
options.add_argument("--headless")

Have you got a better solution?
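
One way to keep this workaround tidy is to collect the container-specific flags in one place. This is a minimal sketch, not the bot's own code: `CONTAINER_FLAGS` and `apply_container_flags` are hypothetical names, and `--incognito` is left out on the assumption that it is unrelated to the startup crash.

```python
# Chrome flags commonly needed when Chrome runs inside a container.
CONTAINER_FLAGS = (
    "--no-sandbox",             # Chrome's sandbox refuses to start when running as root
    "--disable-dev-shm-usage",  # avoid the small default /dev/shm in Docker
    "--headless",               # no display server is available in the container
)


def apply_container_flags(options, flags=CONTAINER_FLAGS):
    """Add each flag to any object exposing add_argument(), then return it."""
    for flag in flags:
        options.add_argument(flag)
    return options


# Usage with Selenium (not executed here):
#   from selenium import webdriver
#   from selenium.webdriver.chrome.options import Options
#   browser = webdriver.Chrome(options=apply_container_flags(Options()))
```

Keeping the flags in a tuple makes it easy to drop them again when the bot runs on a desktop with a visible browser.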

Then, after fixing this with a clunky workaround, I get this:

Checking if the email and password are correct
Traceback (most recent call last):
  File "/etc/udemybot/Udemy_bot.py", line 203, in <module>
    elif is_account_exist(sys.argv[1], sys.argv[2]):
  File "/etc/udemybot/Udemy_bot.py", line 186, in is_account_exist
    browser.find_element_by_id("email--1").send_keys(email)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 360, in find_element_by_id
    return self.find_element(by=By.ID, value=id_)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 976, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="email--1"]"}
(Session info: headless chrome=88.0.4324.150)

What could it be? I suspect the HTML parser is missing something, but debugging inside the container is hard.
Do you have any idea what's going on?

EDIT: I printed the html with
print(browser.page_source)

And what you get is this:

<div aria-hidden="true" style="background-color: rgb(255, 255, 255); border: 1px solid rgb(208, 208, 208); box-shadow: rgba(0, 0, 0, 0.1) 0px 0px 4px; border-radius: 4px; left: -10000px; top: -10000px; z-index: -2147483648; position: absolute; transition: opacity 0.15s ease-out 0s; opacity: 0; visibility: hidden;"><div style="position: relative;"><iframe src="https://assets.hcaptcha.com/captcha/v1/40446ab/static/hcaptcha-challenge.html#id=0lcxfl97hd5&amp;host=www.udemy.com&amp;sentry=true&amp;reportapi=https%3A%2F%2Faccounts.hcaptcha.com&amp;recaptchacompat=off&amp;sitekey=33f96e6a-38cd-421b-bb68-7806e1764460" title="Main content of the hCaptcha challenge" frameborder="0" scrolling="no" style="border: 0px; z-index: 2000000000; position: relative;"></iframe></div><div style="width: 100%; height: 100%; position: fixed; pointer-events: none; top: 0px; left: 0px; z-index: 0; background-color: rgb(255, 255, 255); opacity: 0.05;"></div><div style="border-width: 11px; position: absolute; pointer-events: none; margin-top: -11px; z-index: 1; right: 100%;"><div style="border-width: 10px; border-style: solid; border-color: transparent rgb(255, 255, 255) transparent transparent; position: relative; top: 10px; z-index: 1;"></div><div style="border-width: 11px; border-style: solid; border-color: transparent rgb(208, 208, 208) transparent transparent; position: relative; top: -11px; z-index: 0;"></div></div></div></body></html>

This looks like an hCaptcha bot-protection challenge, related to https://www.botstop.com/?utm_source=hcaptcha1 or something like that.

Any idea?
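
A quick check on the page source can tell a captcha page apart from a page where the login form is simply missing. This is a rough sketch: `looks_like_captcha` and the marker strings are assumptions based on the iframe URL in the HTML dump above, not an official detection method.

```python
# Substrings taken from the hCaptcha iframe URL seen in the page source above.
CAPTCHA_MARKERS = ("hcaptcha.com", "hcaptcha-challenge")


def looks_like_captcha(page_source):
    """Return True if the HTML contains any known captcha marker."""
    lowered = page_source.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


blocked = '<iframe src="https://assets.hcaptcha.com/captcha/v1/40446ab/static/hcaptcha-challenge.html"></iframe>'
login_form = '<input type="email" id="email--1" name="email">'
print(looks_like_captcha(blocked), looks_like_captcha(login_form))
```

Calling this on `browser.page_source` before `find_element_by_id("email--1")` would let the bot report "blocked by captcha" instead of failing with NoSuchElementException.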

All categories enabled by default, README.md states otherwise

I believe courses from all categories are being added by default, e.g.

https://www.udemy.com/course/total-beginner-guitar-lessons/ is added

README.md contains
The current default categories are IT and Software and Development

The config area of udemy_bot.py contains

### CONFIG ###

categories_list = [
    'business',
    'design',
    'development',
    'finance-and-accounting',
    'health-and-fitness',
    'it-and-software',
    'lifestyle',
    'marketing',
    'music',
    'office-productivity',
    'personal-development',
    'photography',
    'photography-and-video',
    'teaching-and-academics'
]
#Personal preference for example
#categories_list=[
#    'development',
#    'it-and-software'
#]
rating_stars = 4.2
rating_people = 200

#### END OF CONFIG ###

Cheers - Scott.
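
To illustrate the behaviour the README describes, here is a sketch of a filter using the narrower two-category list. The variable names mirror the config block above, but `passes_filters` and the shape of the `course` dictionary are hypothetical and not taken from the bot's code.

```python
# Narrow defaults matching what the README describes.
categories_list = ["development", "it-and-software"]
rating_stars = 4.2
rating_people = 200


def passes_filters(course, categories=categories_list,
                   min_stars=rating_stars, min_people=rating_people):
    """Keep a course only if its category is allowed and its rating is high enough."""
    return (
        course["category"] in categories
        and course["stars"] >= min_stars
        and course["people"] >= min_people
    )


# Illustrative records; the rating numbers are made up for the example.
guitar = {"category": "music", "stars": 4.6, "people": 5000}
python_course = {"category": "development", "stars": 4.5, "people": 1200}
print(passes_filters(guitar), passes_filters(python_course))
```

With the full category list from the config block, the music course would pass as well, which is consistent with the guitar course being added.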

Adding feature

Add an option to read URLs from a previous run's text file and make sure that URLs are not repeated.
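
One possible shape for this feature, as a sketch only: the file format (one URL per line) and the helper names `load_seen_urls` and `record_new_urls` are assumptions, not the bot's existing behaviour.

```python
from pathlib import Path


def load_seen_urls(path):
    """Return the set of URLs stored one per line in `path` (empty if missing)."""
    file = Path(path)
    if not file.exists():
        return set()
    lines = file.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def record_new_urls(path, candidates):
    """Append URLs not seen before to `path` and return them in order."""
    seen = load_seen_urls(path)
    fresh = []
    for url in candidates:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            fresh.append(url)
    if fresh:
        file = Path(path)
        # Make sure appended URLs start on their own line.
        existing = file.read_text(encoding="utf-8") if file.exists() else ""
        prefix = "" if existing == "" or existing.endswith("\n") else "\n"
        with open(file, "a", encoding="utf-8") as handle:
            handle.write(prefix + "".join(url + "\n" for url in fresh))
    return fresh
```

The bot would then process only the list returned by `record_new_urls`, so a URL collected in an earlier run is skipped in later ones.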
