The project's goal is to perform an analysis of posts made in TibiaBR forum. In order to accomplish that it was developed a robot that, through GET requests, extracts the page content using XPaths to read the tags of the HTML pages.
- Scalable process. Run as many instances as you like to speed up the process.
- Available in the languages Brazilian Portuguese and English.
- The solution is structured in smaller projects to allow easier maintenance.
- Customizable configuration file.
- Microsoft Message Queuing (MSMQ)
- MongoDB (NoSQL Database)
ย
This is the first step that should be executed. It is responsible for reading the configuration file and fetching the URLs for each requested section. When running this search, the process also captures all information about the Section.
Input: Configuration File
Output: Records saved in the "Sections" Collection and in the "forumtibiabr_sections" Queue
This step captures all information related to the Topic class.
Input: "forumtibiabr_sections" Queue
Output: Records saved in the "Topics" Collection and in the "forumtibiabr_topics" Queue
Captures information about comments made on topics.
Input: "forumtibiabr_topics" Queue
Output: Records saved in the "Comments" Collection
A class library to assist the classes used in all the steps above.