issung / gchan
Scrape boards & threads from 4chan. Download images, videos and HTML if desired.
License: GNU General Public License v3.0
Couple of things:
Write a test first to check the performance difference.
GenerateNewFilename in ImageLink.cs looks a bit scuffed: it has a switch block assigning to result, and then right below a big chunk of if/elses assigning to result as well, then some commented-out code. It looks like the switch is missing a case too. Update this to use the newer, preferred switch expression pattern-matching C# feature, which is much shorter and easier to read.
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/operators/switch-expression
Example of switch expression being used in GChan here:
GChan/GChan/Controllers/MainController.cs
Line 319 in 8345b0c
Line 239 in 8345b0c
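A rough sketch of what the switch expression version could look like. The FilenameFormat enum, its members and the method signature are made up for illustration; the real options in ImageLink.cs will differ.

```csharp
using System;

// Hypothetical stand-in for the filename format setting; the real enum may differ.
public enum FilenameFormat
{
    IdOnly,
    OriginalOnly,
    IdThenOriginal,
    OriginalThenId,
}

public static class FilenameSketch
{
    // One switch expression replaces the switch block plus the if/else chain.
    // The discard arm makes an unhandled case throw instead of passing silently.
    public static string GenerateFilename(FilenameFormat format, long tim, string originalName, string ext)
    {
        string name = format switch
        {
            FilenameFormat.IdOnly => tim.ToString(),
            FilenameFormat.OriginalOnly => originalName,
            FilenameFormat.IdThenOriginal => $"{tim} - {originalName}",
            FilenameFormat.OriginalThenId => $"{originalName} - {tim}",
            _ => throw new ArgumentOutOfRangeException(nameof(format), format, "Unhandled filename format"),
        };

        return name + ext;
    }
}
```

Because every arm is an expression, there is only one assignment to the result, which avoids the double-assignment problem described above.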
Couple of things:
Give the Tim property some documentation; explain what it is by reading the documentation here: https://github.com/4chan/4chan-API/blob/master/pages/Threads.md
Accept "no" as a parameter and save it in a new public property (give the new property some good documentation too).
Rename the GenerateNewFilename method to GenerateFilename.
Rename the URL property to Url (don't forget to update the ToString too).
Any chance you could add support for that site?
I have a load of threads being scraped, but none of the threads I'm scraping from /b/ appear to be downloading. The threads themselves appear, but FileCount never updates and remains at zero, and the designated folder stays empty.
I tested it with out-of-the-box settings (+ Download HTML) and https://boards.4channel.org/v/
It doesn't give an error message; the program just crashes. I can't change any settings, because it crashes as soon as it starts.
In GChan/Utils.cs, CreateNewTracker(LoadedData data) makes a tracker by creating one from the URL stored in the DB and then setting its properties one by one. This looks nasty and will also spam NotifyPropertyChanged events to the UI one at a time, which will hurt startup time badly when the user has a lot of threads saved.
There is a TODO in the code:
// TODO: Should be making trackers based on the LoadedData (pass loadeddata to constructor).
// Rather than making them and then loading them with more data.
// This would help app-startup ui responsiveness as it would reduce the notify property changed spam greatly.
Will require some refactoring of how trackers get made, not much.
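Roughly what the TODO is asking for, as a sketch only. The LoadedData shape and the TrackerSketch class here are simplified stand-ins, not the actual GChan classes.

```csharp
using System.ComponentModel;

// Hypothetical shape of the saved data; the real LoadedData class may differ.
public class LoadedData
{
    public string Url { get; set; }
    public string Subject { get; set; }
    public int FileCount { get; set; }
}

// Simplified stand-in for a tracker that the UI binds to.
public class TrackerSketch : INotifyPropertyChanged
{
    private string subject;
    private int fileCount;

    public event PropertyChangedEventHandler PropertyChanged;

    public string Url { get; }

    public string Subject
    {
        get => subject;
        set { subject = value; OnPropertyChanged(nameof(Subject)); }
    }

    public int FileCount
    {
        get => fileCount;
        set { fileCount = value; OnPropertyChanged(nameof(FileCount)); }
    }

    // Backing fields are set directly, so construction raises no PropertyChanged events.
    public TrackerSketch(LoadedData data)
    {
        Url = data.Url;
        subject = data.Subject;
        fileCount = data.FileCount;
    }

    private void OnPropertyChanged(string name) =>
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(name));
}
```

The property setters still notify as normal once the tracker is live; only the initial load skips the events.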
GChan freezes, becomes unresponsive and stops downloading when multiple boards from 4chan are added, and has to be closed manually using Task Manager. If "Save URLs on exit" is enabled, the next launch carries over the saved URLs and freezes again; clearing the saved files fixes it temporarily.
Add a board, e.g. /w/. It will work and start scraping. Add a second board, e.g. /wg/, and GChan will freeze and not respond to any further commands, so it has to be force-closed with Task Manager. GChan will then not open again if "Save URLs on exit" is enabled.
Delete "boards.dat" and "threads.dat" from the ProgramData folder. This restores normal use until the next attempt to add multiple boards.
System:
Edition Windows 10 Enterprise
Version 20H2
OS build 19042.928
This is a headache from YChan (which got forked into GChan by me).
The 4chan API returns JSON, and then it is converted into XML so that XPath can be used on it, which is kind of like querying it.
Apparently there is a JSON alternative in Newtonsoft, which we already have as a dependency: https://www.newtonsoft.com/json/help/html/QueryJsonSelectTokenJsonPath.htm
This will improve performance and also make the code less convoluted.
NOTE: This is less complicated for board searching and thread imagelink searching, the html page scraping is a bit more tricky. Try the prior options first.
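A minimal sketch of the thread imagelink case using SelectTokens, skipping the XML step entirely. The class and method names are made up; the posts, tim and ext fields and the i.4cdn.org URL shape are taken from the 4chan API documentation, but check them against the real responses before relying on this.

```csharp
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

public static class JsonPathSketch
{
    // Returns image URLs for every post in a thread JSON document that has an attachment.
    public static List<string> GetImageUrls(string threadJson, string board)
    {
        JObject root = JObject.Parse(threadJson);

        return root.SelectTokens("$.posts[*]")
            // Posts without an attachment have no "tim" field.
            .Where(post => post["tim"] != null)
            .Select(post =>
            {
                long tim = post.Value<long>("tim");
                string ext = post.Value<string>("ext");
                return $"https://i.4cdn.org/{board}/{tim}{ext}";
            })
            .ToList();
    }
}
```

For example, passing a thread with one image post and one text-only post should give back a single URL.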
I am not good with programming, so the complexity of implementing the following ideas is not something I can judge. However, I have some ideas that I think can improve the functionality and usability of your software:
Have a small box (or another "cell") to the left of each thread, filled with green for "thread is alive" and yellow/red if a thread has 404'd (I guess this is easy to check since 4chan redirects you when a thread has 404'd) or has been archived (I guess one could check whether the word "Archived" is present in the relevant HTML element of the .html file).
Why: So you can tell when a thread is ready to be deleted from the list, instead of checking the thread in a web browser.
An option to "append" new posts to the HTML file, instead of "rewriting" the HTML file for each new post. I think this is a good addition to the program because some users delete their posts, and when the HTML file is "rewritten" after a post is deleted, that post is gone. This can cause some confusion when reading the locally saved thread.
In short: Persistence of deleted posts.
Also, I noticed a minor bug:
If you rename a thread that has a title such as "no subject" to "Cool thread" and click OK, it is saved.
If you then rename "Cool thread" to "Magic thread" but click Cancel, it does not revert to "Cool thread"; it reverts to "no subject" or whatever the thread's original name was from 4chan.
In short: When clicking Cancel in the "rename thread" dialog box, it reverts to the original thread name from 4chan instead of the last name you gave it.
Instinctively, in order to revert the thread name back to the original, I would delete the name I gave the thread (leaving the text box empty) and click OK. Cancel would simply perform the action "no change to thread name > close dialog box".
I want to say again that you have made a very good program, I use it a lot! Thank you for making this.
Give the user the choice of whether or not to save thumbnails for threads when the Save HTML setting is in use.
This application saves data the user inputs, like threads/boards to scrape, across application closes/opens.
Originally it was done in a .txt file, very nasty.
Sometime in mid 2021 I rewrote that to store the data in an SQLite database in a new class, DataController.cs. This is okay, but the implementation is a bit shit and there's no easy way to do migrations (changes to the database schema), which I would like to have for the future.
I want to stick with SQLite but use EFCore over the top, it gives us migration ease "for free" and makes working with the database easier. https://learn.microsoft.com/en-us/ef/core/get-started/overview/first-app?tabs=netcore-cli
Need to decide if we try putting the existing Board/Thread classes into the DB as is, or if we have separate classes for database storage and map them back and forth. Needs discussion.
Is 8kun functionality available yet?
At the moment the directory select setting dialog uses this stupid little window:
I hate this version of the windows directory select dialog, I much prefer the "full" one, you know the one that looks like this:
I did it once before in my other project SorterExpress, here's a link to the source code of the method that opened it: https://github.com/Issung/SorterExpress/blob/develop/src/SorterExpress/Utilities.cs#L171
From memory, it required installing a NuGet package because for some reason it's a direct Windows API reference or something.
We don't have any tests at the moment, but at the very least we want to make sure that any work compiles OK before it gets merged. This is a good intro lesson to the concept of CI/CD.
Note: because GChan is .NET Framework and not .NET Core, we will need to use a Windows builder, not Linux.
https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-net
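A possible starting point for the workflow file, as a sketch only. The solution file name GChan.sln and the master branch name are assumptions, and the action versions should be checked against their current releases.

```yaml
# .github/workflows/build.yml (hypothetical path and name)
name: build

on:
  pull_request:
  push:
    branches: [master]  # assumed default branch

jobs:
  build:
    # .NET Framework projects need a Windows runner.
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v4
      - uses: microsoft/setup-msbuild@v2
      - uses: NuGet/setup-nuget@v2
      # GChan.sln is an assumed solution file name.
      - run: nuget restore GChan.sln
      - run: msbuild GChan.sln /p:Configuration=Release
```

This only checks that the solution compiles; a test step could be added to the same job later.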
New download managers from the bugfix/imagelink-download-duplicates-raceconditions-threadsafety branch will allow cancellation of downloads in progress, and stopping future downloads too.
A few things are needed:
Search code for comments labeled with this issue's number.
Not sure if it is intentional, but clearing the list with the "clear" button does not rename folders according to the options set.
Maybe that is intentional, but I thought it was odd, given the redundancy of right-clicking and choosing "remove" for each item.
Right-clicking and choosing remove does rename folders according to options.
Great program though, good work!
Nowadays writing code without tests makes me nervous.
Tests are a great way to:
My preferred testing framework is xunit.
GChan.Test
.CICD:
Been wanting to do this for a while, any objections from users/contributors?
I got this in the logs
[29/12/2020 00:07:30] - AppDomain_UnhandledException - FormatException - The input string is not in the correct format.
in System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal)
in System.Number.ParseInt64(String value, NumberStyles options, NumberFormatInfo numfmt)
in GChan.Trackers.Thread_8Kun.GetImageLinks()
in GChan.Trackers.Thread_8Kun.Download()
in GChan.Trackers.Thread.Download(Object callback)
in System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
in System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
in System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
in System.Threading.ThreadPoolWorkQueue.Dispatch()
GChan doesn't download any image, it marks everything as gone.
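The trace shows an Int64 parse throwing inside Thread_8Kun.GetImageLinks(). Which string is being parsed is not visible from the log, so the following is only a guess at a defensive pattern: use long.TryParse and skip the entry instead of letting the exception take down the app. The ParseSketch class and TryParseId method are hypothetical names.

```csharp
using System.Globalization;

public static class ParseSketch
{
    // Returns false instead of throwing FormatException when the input is not a plain integer.
    public static bool TryParseId(string raw, out long id) =>
        long.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out id);
}
```

A caller could log and skip any value for which TryParseId returns false, so one unexpected value does not stop the whole thread from downloading.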
Use var where possible.
Use using on objects where possible to improve memory usage, e.g. WebClient occurrences.
GetThreadSubject() is terrible and is copied in multiple places, see if it can be improved.
Title is self-explanatory: the URLs aren't saved when GChan exits with the appropriate option ticked.
Also, the thread list doesn't update itself until I click each thread. So if I add several threads separated by commas, they are all added to the queue but all stay at FileCount 0 until I click them, then it reloads.
Edit: Something more I was forgetting about, when the queue is long enough there is no scrollbar and it can't be scrolled with the mouse wheel either. It can be scrolled with the keyboard arrows though.
First time using this app so sorry if these are known problems, and thanks, it works great.