
script.module.openscrapers's Introduction

#               ██████╗ ██████╗ ███████╗███╗   ██╗                 
#              ██╔═══██╗██╔══██╗██╔════╝████╗  ██║                 
#              ██║   ██║██████╔╝█████╗  ██╔██╗ ██║                 
#              ██║   ██║██╔═══╝ ██╔══╝  ██║╚██╗██║                 
#              ╚██████╔╝██║     ███████╗██║ ╚████║                 
#               ╚═════╝ ╚═╝     ╚══════╝╚═╝  ╚═══╝                 
#                                                                  
#  ███████╗ ██████╗██████╗  █████╗ ██████╗ ███████╗██████╗ ███████╗
#  ██╔════╝██╔════╝██╔══██╗██╔══██╗██╔══██╗██╔════╝██╔══██╗██╔════╝
#  ███████╗██║     ██████╔╝███████║██████╔╝█████╗  ██████╔╝███████╗
#  ╚════██║██║     ██╔══██╗██╔══██║██╔═══╝ ██╔══╝  ██╔══██╗╚════██║
#  ███████║╚██████╗██║  ██║██║  ██║██║     ███████╗██║  ██║███████║
#  ╚══════╝ ╚═════╝╚═╝  ╚═╝╚═╝  ╚═╝╚═╝     ╚══════╝╚═╝  ╚═╝╚══════╝
#                                                                  

Welcome to the OpenScrapers project.

The hope of this project is to unify the community behind a single scraper pack for multi-scraper add-ons, so the repo doesn't go dead or disappear. The goal is to put aside the drama and the egos, work together, and make this a great scraper pack that benefits the whole community. Addons4Kodi takes no credit for putting this together, and we thank all the devs who have contributed to the various projects over time.

OpenScrapers Repo

You can add the source directory to your own repository for convenience and updates:

<dir>
    <info compressed="false">https://raw.githubusercontent.com/a4k-openproject/repository.openscrapers/master/zips/addons.xml</info>
    <checksum>https://raw.githubusercontent.com/a4k-openproject/repository.openscrapers/master/zips/addons.xml.md5</checksum>
    <datadir zip="true">https://raw.githubusercontent.com/a4k-openproject/repository.openscrapers/master/zips/</datadir>
</dir>

How to Import Open Scrapers Into Any Addon

Any multi-source Kodi addon can be altered to use these scrapers instead of its own; follow the instructions below to get things updated. When applying this to a different addon, replace "name_of_addon" with the name of the addon.

Open the addons/plugin.video.name_of_addon/addon.xml.

Add the following line to the addon.xml file:

<import addon="script.module.openscrapers"/>

Open addons/script.module.name_of_addon/lib/resources/lib/modules/sources.py

Add the following line to the sources.py file:

import openscrapers

Add it right after the line that says:

import re

You will also need to change a few lines in the getConstants(self) function in the sources.py file:

Find the line that says:

from resources.lib.sources import sources

Comment out that line by adding a pound/hashtag at the beginning like this:

#from resources.lib.sources import sources

Then add the following line:

from openscrapers import sources
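Putting the two changes together, the top of the edited getConstants() would look roughly like this (a sketch only; the rest of the function body stays whatever your addon already had):

```python
def getConstants(self):
    # from resources.lib.sources import sources   # original line, commented out
    from openscrapers import sources              # OpenScrapers replacement
    # ...rest of the original function unchanged...
```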

External Scraper Tester

With the help of Jabaxtor, we now have an external scraper tester that can test any scraper folder in lib\openscrapers\sources_openscrapers. This also means you can bring in scraper folders from other addons and add them to this directory, though you will have to do a little work to get them running right; read below for more info.

In the root directory of OpenScrapers you will find two files:

scrape-test.py and Scraper Tester.bat

scrape-test.py is where all the magic happens.

Requirements: the latest version of Python 2, with the bs4 dependency installed:

pip install bs4

Command Arguments

folders=(name of scraper folder, e.g. en,en_DebridOnly)

test_type=(1 or 0)

test_mode=(movie or episode)

timeout_mode=(true, y, True, false, False, n)

number_of_tests=(1-500)

Argument Explanations

folders: Specifies the folder or folders you want to test, test multiple with a comma separator

test_type: Specifies if you'd like to test all scrapers in the folder or just a specific one

test_mode: Specifies the type of test you'd like to run such as testing scrapers against a set of movies or episodes

timeout_mode: Specifies whether to use a 60-second timeout. If set to true, True, or y, it will force number_of_tests to 1

number_of_tests: Specifies the number of titles you'd like to test against the scrapers, such as 10 movies from Trakt's popular list

Example Scraper Command

Requires python to run

scrape-test.py folders=en,en_DebridOnly test_type=1 test_mode=movie timeout_mode=false number_of_tests=10

This will test all scrapers in en and en_DebridOnly against 10 movies from Trakt's popular movie list, continuing until the scrape finishes.

Adding Scrapers from other addons

First, copy the scraper folder (usually called something like "en") from an addon such as EggScrapers, and rename it to something that isn't already in lib\openscrapers\sources_openscrapers; for EggScrapers, for instance, call the folder scrapertest-egg.

Then copy the __init__.py file from any other folder, such as en, and add it to the new one.

Then open all the scrapers in something like Notepad++ and replace

from resources.lib.modules

with

from openscrapers.modules

in all open files.

This is needed because the scrapers must use the modules from OpenScrapers instead of the external addon's.
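The same find-and-replace can be scripted. This is a convenience sketch (the function name is mine, not part of the project; Notepad++ does the same job):

```python
import os

def retarget_imports(folder):
    """Point every 'from resources.lib.modules' import in the .py
    scrapers inside folder at openscrapers.modules instead."""
    for name in os.listdir(folder):
        if not name.endswith('.py'):
            continue
        path = os.path.join(folder, name)
        with open(path) as f:
            src = f.read()
        patched = src.replace('from resources.lib.modules',
                              'from openscrapers.modules')
        # Only rewrite files that actually contained the old import.
        if patched != src:
            with open(path, 'w') as f:
                f.write(patched)
```

Run it once against the new scraper folder, e.g. retarget_imports('scrapertest-egg').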

Now you're ready to run your command; set the folder argument to folders=scrapertest-egg.

Scraper Tester Batch

I made an easy-to-use batch file pre-configured for OpenScrapers, EggScrapers, Yoda, and Scrubs.

Once you open it, you will get options to test the different addons; pretty easy to follow along :)

Preset folder names in the batch file for external addons are listed below, so please follow the last section and use these folder names when testing external scrapers from the preset addons:

scrapertest-egg, scrapertest-yoda, scrapertest-scrubs

One thing you should know: if a scraper hangs, it will stall the whole test. You can check by opening the test-results folder and reading the txt results file for the folder you're testing. If you see the same set of scrapers repeating over and over, there's an issue with those scrapers. Close the window or press Ctrl+C to terminate the batch so you can move those scrapers out and try again!

Enjoy!

script.module.openscrapers's People

Contributors

123venom, a4k-official, doko-desuka, drinfernoo, gateofgator, host505, i-a-c, jabaxtor, kodiultimate, nazegnl, reddit-reaper, rickdoesxmas, sraedler, thedevfreak, tikipeter


script.module.openscrapers's Issues

Unresolved reference - Anilist

The following scrapers all reference the missing module anilist:

Animeloads
Proxer
Foxx
Pureanime

This module will either need to be found or the providers removed.

Using proxy sites

Being in the UK, I have to use a lot of proxy sites to get torrents to scrape. I used https://limetorrents.unblockit.me/ with OpenScrapers and others, and it used to work, but it stopped working some time ago. Is this because of the Cloudflare v2 change?

Torrent scraping broke after c06e458

c06e458
What is this debrid.tor_enabled() supposed to check? It breaks torrent scraping on my add-on, and on Exodus Redux as well.
Removing this check from torrent scrapers makes them work again.
Are we supposed to implement an extra setting or something?
On a side note, this commit adds some lines with tabs, unlike the rest of the file's spaces indentation.

settings... torrents

The settings for toggling torrents on or off still refer to script.module.civitasscrapers.

user_agents.py

v196 seems to be having trouble with user_agents.py (may just be me); I had to revert to the last cfscrape etc., as scrapers can't load user_agents.py.

Adjust files "default.py" and "addon.xml"

Please excuse my poor English.

Hello,

is there a special reason why in the 'addon.xml' the line
<extension point="xbmc.python.pluginsource" library="lib/default.py">
is not
<extension point="xbmc.python.script" library="lib/default.py">
like other script modules?

I changed this for me and also changed the 'default.py' a little bit. The advantage is that after changing the settings, the settings are always saved when you click on the "OK" button.

As an example my "default.py":

default.py.txt

New function "get_titles_for_search()" for "source_utils.py"

Please transfer the following function get_titles_for_search() to source_utils.py:

def get_titles_for_search(title, localtitle, aliases):
    try:
        titles = []
        if "country':" in str(aliases): aliases = aliases_to_array(aliases)
        if localtitle != '': titles.append(localtitle)
        if title != '' and title != localtitle: titles.append(title)
        [titles.append(i) for i in aliases if i.lower() != title.lower() and i.lower() != localtitle.lower() and i != '']
        titles = [str(i) for i in titles if all(ord(c) < 128 for c in i)]
        return titles
    except:
        return []

This function simplifies the writing of scrapers.
It builds a deduplicated list of titles from the values passed in.

As an example, here are some code lines from a scraper:

old:

def movie(self, imdb, title, localtitle, aliases, year):
    try:
        url = self.__search([localtitle] + source_utils.aliases_to_array(aliases))
        if not url and title != localtitle: url = self.__search([title] + source_utils.aliases_to_array(aliases))
        return url
    except:
        return

new with "get_titles_for_search()"

def movie(self, imdb, title, localtitle, aliases, year):
    try:
        return self.__search(source_utils.get_titles_for_search(title, localtitle, aliases))
    except:
        return
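A standalone, simplified copy of the proposed helper (with the aliases_to_array branch dropped, assuming aliases arrive as a plain list of strings) shows the deduplication in action:

```python
def get_titles_for_search(title, localtitle, aliases):
    # Simplified stand-in for the proposed source_utils helper:
    # collect localtitle, title, and any aliases, skipping empty strings
    # and case-insensitive duplicates, then keep only ASCII titles.
    try:
        titles = []
        if localtitle != '':
            titles.append(localtitle)
        if title != '' and title != localtitle:
            titles.append(title)
        for alias in aliases:
            if alias and alias.lower() not in (title.lower(), localtitle.lower()):
                titles.append(alias)
        return [str(t) for t in titles if all(ord(c) < 128 for c in t)]
    except Exception:
        return []

print(get_titles_for_search('The Matrix', 'Matrix', ['The Matrix', 'Matrix 4']))
# ['Matrix', 'The Matrix', 'Matrix 4']
```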

many thanks

anilist.py

should anilist.py have this changed:
from resources.lib.modules import
to this:
from openscrapers.modules import

dead PL providers

Please remove the following, as the websites are dead and no longer exist:

<setting id="provider.openkatalog" type="bool" label="OPENKATALOG" default="false" />
<setting id="provider.paczamy" type="bool" label="PACZAMY" default="false" />
<setting id="provider.trt" type="bool" label="TRT" default="false" />

show Easynews & Furk links as 'Premium'

Currently Easynews & Furk results are treated as 'direct' sources by addons, this is a problem if using the 'use debrid only' filter (and possibly other sorting/filtering options).

The solution would be to treat these links as 'premium', alongside torrent/debrid, rather than as 'free' links.

v197 en onlineseries

Sorry, I forgot to mention that there is a copy of onlineseries in en that still calls for dom_parser2; the one in en_DebridOnly is correct, and I guess the one in en should be removed.
And just a thank you for creating and maintaining OpenScrapers ;)

the way to add openscrapers to a addon

Hi, I am having real trouble getting this to work. I have followed each step four times now and I'm still getting the same issue. If you have a Telegram group, can you send me an invite, please, so maybe someone can help me get it working?

v.195, 2ddl and rapidmoviez

2ddl gives this error: http://2ddl.vg/ returned an error. Could not collect tokens.

And rapidmoviez requires "from openscrapers.modules import dom_parser2", but dom_parser2.py is not in modules. I was able to add my own dom_parser2, but I thought I would let you know.

Foreign GERMAN providers

Hi.
Can you check the German providers list, please?
I use Exodus Redux and it loads no links when I set OpenScrapers to use the German providers.
BTW, when I use LambdaScrapers everything works.
I have noticed that in LambdaScrapers the list of German providers is completely different.

OpenSSL error with cfscrape

Hello. Using openscrapers on Kodi 18.6 under windows 10.

When trying to scrape the French yggtorrent website, I get this error:

DEPRECATION: The OpenSSL being used by this python install (OpenSSL 1.0.2j 26 Sep 2016) does not meet the minimum supported version (>= OpenSSL 1.1.1) in order to support TLS 1.3 required by Cloudflare, You may encounter an unexpected reCaptcha or cloudflare 1020 blocks

And I can't bypass the Cloudflare protection.

Any ideas?

Series9

@nazegnl
Series9 giving me this error on latest dev branch

Traceback (most recent call last):
File "C:\Users*\Documents\GitHub\script.module.openscrapers\lib\openscrapers\sources_openscrapers\en\series9.py", line 111, in sources
url = self.searchMovie(data['title'], data['year'])
File "C:\Users*\Documents\GitHub\script.module.openscrapers\lib\openscrapers\sources_openscrapers\en\series9.py", line 93, in searchMovie
url = [i[0] for i in results if cleantitle.get(i[1]) == cleantitle.get(title)][0]
IndexError: list index out of range
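The crash comes from indexing [0] on a list comprehension that can be empty when no title matches. A generic defensive pattern (not the project's actual fix, just an illustration) is to take the first match or None:

```python
def first_or_none(matches):
    # Return the first element, or None when the search found nothing,
    # instead of letting an empty result raise IndexError via [0].
    return next(iter(matches), None)

results = [('url-a', 'Title A'), ('url-b', 'Title B')]
url = first_or_none(u for u, t in results if t == 'Title C')
# url is None rather than an IndexError crash
```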

Vidics is down

Hi,
Just wanted to pass along that vidics is down. Could someone take a look at it?

Thanks.

Control module does not access Openscrapers settings.

Hi there.
I've written an Easynews scraper for Openscrapers, but there is a problem with the control.py file that should be used to access the settings of openscrapers.

The 'addon' variable (accessing xbmcaddon.Addon) needs to explicitly state the openscrapers id as its 'id' arg...
addon = xbmcaddon.Addon(id='script.module.openscrapers')
at the moment it is like this...
addon = xbmcaddon.Addon()

This has implications for other variables in the code, such as "setting", which is assigned "addon.getSetting". If 'addon' is not set to openscrapers, this 'setting' call will read the settings of whichever addon is accessing the scraper. So, for example, if Venom calls the new Easynews scraper, the 'settings' calls in the Easynews scraper will check Venom's settings instead of OpenScrapers' settings.

I can fix this with a pull request, I just don't know whether it will affect the scrapers test code incorporated into Openscrapers.

myvideolink

myvideolink may be broken; is anyone else getting errors from it? Thanks.

[Suggestion]Adding Headers on requests

As the title says, please add headers to the cfscrape requests to hide Kodi's headers.
In the case of client.request(url), headers are created inside the request function of the client module, so you don't need to set a User-Agent; but on normal requests and cfscrape requests you need to set a User-Agent, and possibly the scraper's base URL as Referer, to hide that the requests come from Kodi.
For example:

scraper = cfscrape.create_scraper()
headers = {'User-Agent': client.agent(), 'Referer': self.base_url}
html = scraper.get(url, headers=headers).text

documentation for developing new scrapers

Hello, I have been developing scrapers using BeautifulSoup for a while. I am interested in developing scrapers that can be integrated with OpenScrapers. Is there documentation with clear instructions on how to develop scrapers to your pattern?

problems with many german scraper sites

Really, many German scrapers seem to be broken.
I used Venom with only the foreign scrapers enabled and set to German indexers within Venom.

Right now I have only found sources at iload, ddl(.me?) and streamto.
I know the searched series is at least on serienstream (s.to), freikino, hdfilme, and kinox.to.

Could someone look into this?

Or if anyone has an "easy" guide for making scrapers, I could try it myself.
(I haven't done this before, nor used Python much.)

Furk scraper does not return any values

I did a fresh install of the latest Exodus Redux. Disabled all other providers except Furk. Set up my login credentials and API key. Tried a search and got the notification that there was no stream found. My search on Furk.net itself returns plenty results. Did a test with the default providers and got back plenty of results. Looks like the Furk scraper is broken. Can you please look into this?

Digbt.py Crashes Dialog Box with Results

I am having an issue with digbt.py causing the dialog window to not show other links. I tested it with one known-working scraper and digbt.py alone in the folder, and it prevents the dialog box from coming up. If I comment out the self variables, it works correctly (with no links from digbt). I noticed it is CF-based, and those are always hard to fix. I just wanted to see if you can confirm as well.

Thanks

How do I add some scrapers from an addon

As the title mentions, I would like to add some scrapers to OpenScrapers. They are from an addon I have; I have tested them to make sure they work, checked for duplicates, and removed the duplicates as well.

v 0.0.0.7 cfscrape

In v 0.0.0.7 the updated cfscrape seems to have broken rlsbb: way fewer premium links, with rlsbb not working. I put the cfscrape from v 0.0.0.5 into v 0.0.0.7, and then rlsbb links are scraped and work.

openscrapers settings revert on "ok"

Here is a strange one. For example, if I open the scraper settings and choose to disable all torrent providers, it flashes and toggles them all off. Then I click "OK" and it exits, but when I go back in, the torrents are all toggled back on. However, if I disable all and then hit Cancel to close settings, then go back in, my changes have been saved. It's as if OK and Cancel are acting in reverse? I'm using Kodi 18.1.

CSV Export Separator

Hey @nazegnl, I just merged your PR and tested scrape test. All CSV outputs are using ; instead of ,, so I have to go into all the files and change ; to ,. Can you please look at it again?

vidics & xwatchseries

Just curious why vidics and xwatchseries are not being used, because I'm getting a lot of links with them for episodes?

re v199

Awesome job! Thanks to all involved; thumbs up to the new additions in the credits ;)

re v1.106 Some scrapers not working

v1.106
I don't use many free scrapers; I do use most of the debrid scrapers. Of the scrapers I use, I have had these issues (P.S. thank you for all your efforts, I'm just trying to contribute):
[2020-03-17 05:37:04] [COLOR red][ OPENSCRAPERS DEBUG ][/COLOR]: Error: Loading module: "projectfreetv": cannot import name cfScraper
[2020-03-17 05:37:06] [COLOR red][ OPENSCRAPERS DEBUG ][/COLOR]: Request-Error (500): http://www.sceneddl.me/?s=Riviera+S02E10
[2020-03-17 05:37:06] [COLOR red][ OPENSCRAPERS DEBUG ][/COLOR]: Request-Error: (unknown url type: Riviera) => Riviera
[2020-03-17 05:37:06] [COLOR red][ OPENSCRAPERS DEBUG ][/COLOR]: Request-Error: (unknown url type: Riviera) => Riviera
[2020-03-17 05:37:06] [COLOR red][ OPENSCRAPERS DEBUG ][/COLOR]: Request-Error: (unknown url type: Riviera) => Riviera
[2020-03-17 05:37:07] [COLOR red][ OPENSCRAPERS DEBUG ][/COLOR]: MYVIDEOLINK - Exception:
Traceback (most recent call last):
File "/Users/xxxxxxx/Library/Application Support/Kodi/addons/script.module.openscrapers/lib/openscrapers/sources_openscrapers/en_DebridOnly/myvideolink.py", line 107, in sources
posts = zip(client.parseDOM(r1, 'a', ret='href'), client.parseDOM(r1, 'a'), re.findall('((?:\d+.\d+|\d+,\d+|\d+)\s*(?:GB|GiB|MB|MiB))', r2[0]))
[2020-03-17 05:38:03] [COLOR red][ OPENSCRAPERS DEBUG ][/COLOR]: RAPIDMOVIEZ - Exception:
Traceback (most recent call last):
Cloudflare_reCaptcha_Provider: Cloudflare reCaptcha detected, unfortunately you haven't loaded an anti reCaptcha provider correctly via the 'recaptcha' parameter.
[2020-03-17 05:38:03] [COLOR red][ OPENSCRAPERS DEBUG ][/COLOR]: RAPIDMOVIEZ - Exception:
Traceback (most recent call last):
MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?
[2020-03-17 05:39:10] [COLOR red][ OPENSCRAPERS DEBUG ][/COLOR]: RAPIDMOVIEZ - Exception:
Traceback (most recent call last):
Cloudflare_Loop_Protection: !!Loop Protection!! We have tried to solve 3 time(s) in a row.
[2020-03-17 05:39:10] [COLOR red][ OPENSCRAPERS DEBUG ][/COLOR]: RAPIDMOVIEZ - Exception:
Traceback (most recent call last):
Cloudflare_Loop_Protection: !!Loop Protection!! We have tried to solve 3 time(s) in a row.
[2020-03-17 05:39:10] [COLOR red][ OPENSCRAPERS DEBUG ][/COLOR]: RAPIDMOVIEZ - Exception:
Traceback (most recent call last):
Cloudflare_Loop_Protection: !!Loop Protection!! We have tried to solve 3 time(s) in a row.
[2020-03-17 05:39:10] [COLOR red][ OPENSCRAPERS DEBUG ][/COLOR]: RAPIDMOVIEZ - Exception:
Traceback (most recent call last):
Cloudflare_Loop_Protection: !!Loop Protection!! We have tried to solve 3 time(s) in a row.
[2020-03-17 05:39:14] [COLOR red][ OPENSCRAPERS DEBUG ][/COLOR]: RAPIDMOVIEZ - Exception:
Traceback (most recent call last):
Cloudflare_Loop_Protection: !!Loop Protection!! We have tried to solve 3 time(s) in a row.

Seems Rapidmoviez rarely passes CF...

German modules partially broken and incomplete

HD-Streams.org, and probably the rest, does not give 1080p results, and I also think the lower resolutions come from different pages.
In general I think they are largely outdated, and we would also benefit from modules for:

- streamkiste.tv
- kinoz.to

Proposal for next update

I'm thinking of adding the hash to our torrent sources dict in the next update. It would make things a little easier for devs doing torrent cached/uncached checking and/or removal.
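A hypothetical shape for such an entry (every field name here besides the proposed 'hash' key follows the common multi-scraper sources-dict convention; all values are invented for illustration, not the project's actual schema):

```python
# Illustrative torrent source entry with the proposed 'hash' key added.
example_source = {
    'source': 'torrent',
    'quality': '1080p',
    'language': 'en',
    'url': 'magnet:?xt=urn:btih:0000000000000000000000000000000000000000',
    'info': '2.1 GB',
    'direct': False,
    'debridonly': True,
    'hash': '0000000000000000000000000000000000000000',  # 40-char hex infohash
}

# With the hash exposed, devs can collect infohashes for a debrid
# cached/uncached lookup without parsing them out of magnet URLs:
hashes = [s['hash'] for s in [example_source] if s['source'] == 'torrent']
```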

PubFilmOnline

PubfilmOnline gives this error

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 56 of the file C:\Users\Grim\Documents\GitHub\script.module.openscrapers\lib\openscrapers\sources_openscrapers\en\pubfilmonline.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.

Some scrapers not working in 0.0.2.003

Like RLSbb, SceneRLS, Zoogle (just to name a few).
If I roll back to the previous version, they work fine.

If you need a log, guide me.

Maybe you can replicate this issue also?

No result from "https://movietown.org" on branch "develop"

url: https://movietown.org

Using branch "develop", Kodi on a Fire TV Stick: the module "cfscrape.py" does NOT return a result.

Using branch "develop", Kodi on Windows: the module "cfscrape.py" does return a result.

Using branch "master": the same module "cfscrape.py" returns a result in Kodi on both the Fire TV Stick and Windows.

Sorry for the short text; my English is bad.
Thank you

url = https://movietown.org
import openscrapers
from openscrapers.modules import cfscrape
scraper = cfscrape.create_scraper()
sHtmlContent = scraper.get(url).content
print sHtmlContent

OpenScrapers v 0.0.1.109

Thanks for the update. Was this update meant to fix the Cloudflare/cfscrape error? For me, I'm still not getting any rlsbb links for some reason.
