Giter Site home page Giter Site logo

Comments (46)

theScrabi avatar theScrabi commented on May 3, 2024

All right, i'll keep that in mind :)

from newpipe.

medavox avatar medavox commented on May 3, 2024

I'll try implementing this tomorrow, using the URL parameter chschtsch pointed out. theScrabi, this will probably mean adding a field to YoutubeSearchEngine (probably in the Uri.Builder at the start of search(String, int):Result). Not sure yet if the StreamingService interface will need an update.

from newpipe.

medavox avatar medavox commented on May 3, 2024

OK, I've implemented the country selection setting as suggested; n the Settings menu, there is a new option: "Content Country". When this is any option other than null, the following is added to the search query:

&gl=<code>&persist_gl=1

where <code> is a standard 2-letter country code, currently stored in values/settings_keys.xml

The only problem is, I can't find a search string which produces noticeably different results for different country codes.

Any ideas?

from newpipe.

theScrabi avatar theScrabi commented on May 3, 2024

Hm, shekhar12020 is doing changes on that part of the code. You need to ask him if it's OK to modify it. Anyway please add a language parameter to that function. Like this:
Result search(String query, String lang, int serviceId);

You also have to add this parameter in the SearchEngine interface.

from newpipe.

theScrabi avatar theScrabi commented on May 3, 2024

@medavox I noticed that it doesn't matter witch country code you add to the url. It will always show the same results.

from newpipe.

medavox avatar medavox commented on May 3, 2024

Which class does that search() method signature exist to? In SearchEngine I have the version before my edits Result search(String query, int page), and this after: Result search(String query, int page, String contentCountry)

I don't have Result search(String query, String lang, int serviceId) in my copy of the project, and I'm up-to-date with the repo!

and OK, I will ask him.

from newpipe.

medavox avatar medavox commented on May 3, 2024

@theScrabi Yes I've noticed this too.

from newpipe.

theScrabi avatar theScrabi commented on May 3, 2024

Oh I'm sorry, I just said you should add such a parameter. It does not matter where you put it at. :)

from newpipe.

medavox avatar medavox commented on May 3, 2024

OK I will add int serviceId; but what about int page?

from newpipe.

medavox avatar medavox commented on May 3, 2024

You know, if this commit has no value, we can always revert it.
If this method doesn't work, then there's no point keeping it. Using the gl=<code> method on YouTube in a desktop browser DOES change the frontpage videos, but nothing else that I can see.

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

Welp, I just played with languages on YouTube and it seems that changing only country code will affect only main YouTube page, but searching videos will work the same with any country selected. (Just as @medavox pointed it out)

I just realized, that my browser is in English, and therefore it requests English content in http headers every time. So in desktop I actually get Russian page of Youtube, but with English user interface.

But since the app might use simplified methods to make http requests, when using the app, I get a page like if both values was set to RU. (Although, my Android is in English as well) So, to make my idea work, we need to either implement requested language selection alongside with country code, or use system language. Or also, we could do the latter, while keeping in mind to add manual language selection of the app (other than system language) later on. (I guess, this decision affects structure of code, and it's always good to make things extendable). Hope you got what i mean?

BTW, guys, good work anyway!

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

So now I'm trying to find out, how to set language rather than country. It is a bit more complex and hl= won't work (YouTube uses ajax and cookies there).

Now I see that country code is not needed for my goal, but if I'm not mistaken, I saw out there some enhancement request to fill the app's start page with one from YouTube. Then, if we'll need this feature (country selection) anyway, let it just be?

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

Check this out!
When I request a search page using wget or curl, by default I get page in Russian with Russian results (because of my ip address), but if I set --header 'Accept-Language: en', then I get page in English, also with proper English results. It works! The only thing we need is this http header in the request.

from newpipe.

medavox avatar medavox commented on May 3, 2024

It sounds like you've found a method that works! Now we just need to translate this into Java.

So far, it looks like we'll need to add the header to the HTTP request, which will be set on the HttpURLConnection in Downloader.download(), using java.net.URLConnection.setRequestProperty(String, String).

I'll take another look tomorrow, although I'll have to be more careful not to upset everyone else's working sets. I don't want to collide with anyone else's changes.

Unless you'd like to try this yourself @chschtsch ?

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

Yes, why not. I'll give it a shot.

from newpipe.

theScrabi avatar theScrabi commented on May 3, 2024

@medavox I'm sorry (again). I wasn't aware that the integer in the search() function was actually the page parameter. I thought it was the serviceId. search() does not need to know its own id, please don't add that parameter. Result search(String query, int page, String contentCountry) is actually the way I wanted it to be, and you already did it that way xD.

from newpipe.

medavox avatar medavox commented on May 3, 2024

@theScrabi Oh good :) I'm glad.
@chschtsch, let me know how you get on. One reason I didn't implement last night (midnight, local time) was because yesterday getting the countryCode (or now language code, I suppose) from the preferences system to the SearchEngine class was not as easy as I'd thought, due to SearchEngine not having access to a Context (to my knowledge) which it could pass to PreferenceManager.getDefaultSharedPreferences(Context) to get the SharedPreferences object.

Obviously I ended up just passing the string itself from higher up the call stack (in VideoItemListFragment.run(), where I did have access to a Context,

from newpipe.

theScrabi avatar theScrabi commented on May 3, 2024

Yes, please try to keep as much android related stuff as possible away from the extractors, and try to abstract such things. Because it will ensure that we could easily port extractors over to other platforms, if it was necessary one day. So the way medavox implemented it (through a parameter in the search() function) is the right one.

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

I tried manually adding this only line con.setRequestProperty("Accept-Language", "en"); to Downloader.download and it worked.

But I probably don't yet understand architecture of the app enough to create a new setting for language code (not touching the existing countryCode setting, as I suggested earlier) and to bind it up with the rest of code.

I don't know about country code, but I think that language code parameter should be assigned right in Downloader class, because this is a low-level stuff and preferred language should be sent in http headers every time the app makes a request. Just as it works in a web browser.

But talking about country code, it could be set in Downloader class as well, since every page have set of url parameters. So it could be like siteUrl + "gl=" and we'll end up always getting a valid address. But it's a bad way to do that and there should be another solution.

What do you think?

from newpipe.

theScrabi avatar theScrabi commented on May 3, 2024

It seems like shekhar12020 has finished his work on SearchEngine and YoutubeSearchEngine: #73

@chschtsch you could add a function Donwload.donwload, with the String contentCountry parameter in addition. So we can call this function from YoutubeSearchEngine.search().

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

@theScrabi, sorry, I understand what you mean, but I still don't quite understand how it all works. To do that, I need some time to learn.

If anybody wants to add language selection before me, this may be useful gist:b0fd4e8b45ad663d1c8a (languages I scraped from YouTube and formatted to XML)

from newpipe.

theScrabi avatar theScrabi commented on May 3, 2024

Also i'm being silly again. Do not add a language parameter to Download.downlaod(), since this function simply loads a page requested by the url given as its parameter. If something should handle language than it is SearchEngine.search().

from newpipe.

medavox avatar medavox commented on May 3, 2024

@theScrabi you're right, a language parameter is outside the scope of Downloader.download(), because it only downloads arbitrary data from a given URL. However, in order to set the HTTP header field Accept-Language, access to the HttpURLConnection object which is instantiated and used in Downloader.download() is needed.

So we need to do something differently:

  • Create a new version of download() with a different method signature? (my preferred choice)
    • eg download(String siteUrl, String languageCode)
  • Refactor Downloader so that the HTTPURLConnection object is externally available?
  • something else...?

You're right about keeping Android and YouTube stuff self-contained. Even if porting to another platform doesn't happen, Android APIs seem to change every year.

from newpipe.

medavox avatar medavox commented on May 3, 2024

OK, I've tweaked the existing code, and (as far as I can tell) you can now search YouTube in different languages!

@chschtsch, I used your technique for adding the accepted languages field to the HTTP header.
@theScrabi, I created another version of Download with a different method signature: Downloader.download(String, String) : String. Common code between the normal and the language-parameter versions of download() has been moved to a private method, dl().

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

Nicely done, mate! It works as it should.

But is it possible to set language globally? And what do you think about this idea? I just upgraded regexes for extracting date and views count, but it's working only with English, since every language has it's own formatting.

English could be used by default for pages where date and views count are extracted. Then it could be converted to date and int objects respectively and then formated again based on user device's language.

from newpipe.

theScrabi avatar theScrabi commented on May 3, 2024

@chschtsch I recognized that regex for date didn't work that proper. So i changed it here. May still not be working.

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

Yeah, that's the point, you made it work with German format :)

from newpipe.

theScrabi avatar theScrabi commented on May 3, 2024

Hm. At first i thought about putting the regex pattern into a language specific string file. But i don't think that's not such a good solution. What do you think?

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

As I mentioned, English could be used by default for extracting.
This way, you always parse page in English language with English-specific pattern, and then you transform it to date object (I think there is some function in Java for that). And then you transform this date object to string based on device language with another Java function.
Same with views (which also vary in fomat over different languages).

from newpipe.

medavox avatar medavox commented on May 3, 2024

Although it would require downloading a video page up to 2 times (once for the english page, and for the preferred locale), I personally think that solution would be better than language-specific regexes, which would be horrible to maintain (if/when YouTube changes the page designs somehow).

That's just my opinion though.
@chschtsch and thanks :) but what do you mean by set language globally? Default it to the device default setting? Because I had considered that.

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

Here is what I have in mind.

English is always default language for downloading pages. Thus, it's global for requests (but not for presenting the content). When page is downloaded, it's parsed with English-specific regexes. This way, we always get the same result, no matter in which country a user lives, or which language he preferred.

Then, all language/country-specific items like date are converted to objects. This is done to be able to convert them to user's country/language system he set as preferred.

Then these items are actually converted to language of user's system. So this way, you don't even need to download page two times, but only one time, because all preferred locale content is generated in the app.


Here is some example to make it clear:
First of all, the app downloads page. Then it scrapes content:

Title: Justin Bieber - Baby ft. Ludacris
Channel name: JustinBieberVEVO
Likes: 3,498,222
Dislikes: 5,259,228
Views: 1,234,539,113 views
Date: Uploaded on Feb 19, 2010
Description: Music video by Justin Bieber performing Baby feat. Ludacris.
#VEVOCertified on April 25, 2010. http://www.youtube.com/vevocertified

Then the app transforms it to another locale. Let's say, German. In most cases, title, channel name and description stay unchanged. But date and all numbers have different format. So at the first stage the app transforms it to objects (int for numbers and date for date) and we have these unformatted:

Likes: 3498222
Dislikes: 5259228
Views: 1234539113
Date: Date(2010 02 19)

And here comes another Java function for formatting. For date I think this would be something like DateFormat df = DateFormat.getDateInstance(DateFormat.LONG, Locale.GERMANY);:

Likes: 3.498.222
Dislikes: 5.259.228
Views: 1.234.539.113
Date: 19.02.2010

And then 'views' and 'published on' in German are added:

Views: 1.234.539.113 Aufrufe
Date: Veröffentlicht am 19.02.2010

And here we go, with single page downloading we have completely formatted content for any locale. This method works for any YouTube page, and later on, when the app will have other services like Vimeo, it won't need changes, since English is de-facto default (if not the only) language for all major online services. That's why I stick so much to using English by default.

from newpipe.

theScrabi avatar theScrabi commented on May 3, 2024

@chschtsch I,m not sure but I think I've seen, that the date is also delivered as json value inside the player arguments. And i think its an integer value rather than a string. If so it would not be a problem to use this instead.

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

Oh, I see. It would be great, if you could extract these from player meta tags rather than parsing other, preformatted div tags in box below the video. Although, functions for formatting these values into user's locale would be needed then anyway.

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

Now, I could implement extracting and formatting, if you would show me how to access locale setting from YoutubeExtractor class.

from newpipe.

medavox avatar medavox commented on May 3, 2024

Currently working on extracting video data from meta tags and json player arguments. The thing slowing me down is converting this info to locale-specific strings.

So far, I've got the meta field interactionCount (which can also be extracted from the player arg JSON field view_count), and datePublished. Both of these need converting to more human-friendly, locale-specific values (adding commas/dots/spaces etc).

Should I commit these changes so you can all see them?

from newpipe.

medavox avatar medavox commented on May 3, 2024

Also API-wise, I think it would be a good idea to split up Extractor.getVideoInfo() into a set of smaller interface methods, such as getVideoTitle() and getDescription(), so that implementing support for a new streaming service is more guided. Implementers would just implement the individual extraction methods for each VideoInfo field.

That way, it's harder to forget to initialise a VideoInfo field when implementing new streaming services. It also allows each method to use fall-back scraping techniques, and abstract this away into a called method.
For instance, as said above, the view count on a Youtube video can be extracted from either the <meta> tag, or the view_count field in the JSON player args. If one fails, the other technique could be silently used instead.

Any thoughts?

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

Concerning converting these values, check out the folowing StackOverflow answers:
http://stackoverflow.com/a/1951875
http://stackoverflow.com/a/4375470

from newpipe.

medavox avatar medavox commented on May 3, 2024

Thanks for these links; I was in the process of writing a long clarifying question to you, when I realised my question was answered in the Locale doc :)

I'll implement this locale-specific formatting for View Count and Publication Date (the two fields I've changed already), then push these changes to origin.

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

Alright! Could you also do this for likes/dislikes?

from newpipe.

medavox avatar medavox commented on May 3, 2024

Certainly; Right now, I'm just deciding where to put the functionality for converting Youtube language codes to java.util.Locales.

from newpipe.

medavox avatar medavox commented on May 3, 2024

OK, I've implemented locale-specific formatting of views, likes and dislikes, and the video publish date;

but it looks kind of weird to have the search language locale number formatting next to the device language strings, eg "123.456 Views".

Should we display the interface strings in the chosen language, or format the numbers and dates as the device language?

"123,456 Views" or
"123.456 Aufrufe"?
or even keep "123.456 Views"?

@theScrabi ?

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

I built the latest version and everything except likes works perfectly! So we can close the issue now.
About likes, it just needs another way to parse them. I'm already trying to solve this.
Log:
11-12 11:12:24.659 2755-10791/org.schabi.newpipe E/class org.schabi.newpipe.youtube.YoutubeExtractor: failed to parse likesString "3 502 129" and dislikesString "" as integers
The problem here, as we can see, is that likes are not in player meta tags, so they are preformatted for my country once again.

from newpipe.

gregarkhipov avatar gregarkhipov commented on May 3, 2024

Problem solved. It just should be replaceAll("[^\\d]", "").

from newpipe.

theScrabi avatar theScrabi commented on May 3, 2024

@medavox about supporting other services. My intention was that VideoItem detail is filled with invalid values at the beginning. Than each individual service could fill in the information it can provide, and leave out the rest. The UI should later on recognize that, and only display the given information.

I don't know if tha's a good way. Your suggestion however would guarantee that programmers implementing a service know what they should provide, and what they can't.

from newpipe.

medavox avatar medavox commented on May 3, 2024

@chschtsch thanks for the fix! The way I implemented this, I thought that the English page would always be downloaded, but apparently not. I couldn't test that, though. That is a more robust replace regex :)

@theScrabi I think that methods in an interface would make the requirements for implementing support for a new site more explicit. It could also help to split up issues with scraping into smaller sub-problems.

Now that this issue is fixed (hooray!), shall we move this refactoring discussion to a new issue?

from newpipe.

theScrabi avatar theScrabi commented on May 3, 2024

Finally :) good work.

from newpipe.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.