gaurav-s-thakur / webscraper_labelled_dataset_collection
Web scraper for collecting labelled datasets from websites. Given a file of seed links, each associated with multiple labels, the scraper collects the text on those pages, follows their links one level further, and saves the text of every page as a separate record in the target file, each carrying the same labels as its seed link.
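
The collection step described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the repository's actual code: the function names (`collect_labelled_records`, `parse_page`, `fetch_html`), the in-memory `(url, labels)` seed pairs, and the record layout are all assumptions, since the description does not specify the seed file or target file formats.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class _PageParser(HTMLParser):
    """Collects visible text and href targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def fetch_html(url):
    """Hypothetical default fetcher: a plain HTTP GET decoded as UTF-8."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_page(html, base_url):
    """Return (text, absolute_links) for one page."""
    parser = _PageParser()
    parser.feed(html)
    links = [urljoin(base_url, href) for href in parser.links]
    return " ".join(parser.text_parts), links


def collect_labelled_records(seeds, fetch=fetch_html):
    """seeds: iterable of (url, labels) pairs (assumed input shape).

    Returns one record per seed page and per page linked directly
    from it; each record inherits the labels of its seed link.
    """
    records = []
    for seed_url, labels in seeds:
        try:
            seed_text, links = parse_page(fetch(seed_url), seed_url)
        except OSError:
            continue  # unreachable seed: skip it
        records.append({"url": seed_url, "labels": list(labels), "text": seed_text})
        seen = {seed_url}
        for link in links:
            # Only follow web links, and visit each page once per seed.
            if link in seen or not link.startswith(("http://", "https://")):
                continue
            seen.add(link)
            try:
                text, _ = parse_page(fetch(link), link)  # links found here are not followed
            except OSError:
                continue
            records.append({"url": link, "labels": list(labels), "text": text})
    return records
```

Passing the fetcher as a parameter keeps the crawl logic testable without network access. Writing the returned records to a target file (for example as CSV or JSON lines) is left out because the output format is not stated.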