thesw4rm / amazon-books-scraper

A scraper I've written in C/C++ for the purpose of learning the language. I'm building it both from scratch and with libraries so I can learn by doing.

CMake 1.46% C 98.54%

amazon-books-scraper's People

Contributors

thesw4rm

amazon-books-scraper's Issues

noSSLHttpRequest prints out output asynchronously

When a successful HTTP request is sent to http://mirror.vcu.edu/, the response is printed even though no code explicitly prints it. The colour change for logs also does not appear to reset before this happens, which hints at an asynchronous print.

The output follows; everything after the first LOG: is green.

Hi

Extracting request header data...

	Extracted host: Host = mirror.vcu.edu	Extracted path: Path = /	Extracted ssl: ssl = false
LOG: Socket file descriptor is 3
LOG: Connected socket at descriptor 3 to IP 128.172.15.65 and port 80
LOG: Request header is GET / HTTP/1.1

Host: mirror.vcu.edu




LOG: Sent HTTP request from socket at descriptor 3 to IP 128.172.15.65 and port 80.
LOG: Starting receive operation
LOG: Received HTTP response from socket at descriptor 3 to IP 128.172.15.65dual.  Anyone using this system expressly consents to such
monitoring.<p>

LOG OFF IMMEDIATELY if you do not agree to the conditions stated
in this warning.
<hr>
<b>CRYPTOGRAPHIC SOFTWARE</b><p>

Due to U.S. Exports Regulations, all cryptographic software on this site is subject to the following legal notice:<p>

This site includes publicly available encryption source code which, together with object code resulting from the compiling of publicly available source code, 
may be exported from the United States under License Exception "TSU" pursuant to 15 C.F.R. Section 740.13(e).<p> 

This legal notice applies to cryptographic software only. Please see the <a href="http://www.bis.doc.gov">Bureau of Industry and Security</a> for more informa
tion about current U.S. regulations.<p>

This server is located in Richmond, Virginia, USA. Use in violation of any applicable laws is prohibited.
<hr>
</body></html>
 from it are
for official University business use as authorized by the 
<a href="https://policy.vcu.edu/sites/default/files/Computer%20and%20Network%20Resources%20Use.pdf"> 
Virginia Commonwealth University Computer and Network Resources Use Policy.</a><p>

Monitoring and recording of users' activities may occur
when there is reasonable suspicion of unauthorized activity and
may be used in administrative, civil, and criminal action against an
indivi and port 80.





LOG: Closed socket at descriptor 3
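
One possible reading of this log, offered as an inference from the output rather than a confirmed diagnosis: the stray response text begins exactly where the IP address is printed and ends right before " and port 80.", which is what a %s argument without a terminating NUL would produce even in a single-threaded program. A minimal sketch of a log call that bounds the string and always resets the colour is shown here. The name formatLogLine, the colour macros, and the message wording are hypothetical and not taken from the project.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ANSI escape sequences; assumed, the project's own colour codes are not shown. */
#define LOG_COLOR_GREEN "\x1b[32m"
#define LOG_COLOR_RESET "\x1b[0m"

/*
 * Hypothetical helper: formats one log line into out.
 * ip does not need to be NUL-terminated because the %.*s precision
 * limits how many bytes are read from it to ipLen.
 * Returns the value of snprintf (number of characters that would be written).
 */
int formatLogLine(char *out, size_t outSize,
                  const char *ip, size_t ipLen, int port) {
    assert(out != NULL && ip != NULL);
    return snprintf(out, outSize,
                    LOG_COLOR_GREEN
                    "LOG: Received HTTP response from IP %.*s and port %d."
                    LOG_COLOR_RESET "\n",
                    (int)ipLen, ip, port);
}
```

Because the reset sequence is part of the same format string, the colour cannot leak into whatever is printed next, and the precision keeps the read inside the IP buffer even if adjacent memory holds response data.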

noSSLHttpRequest reallocs the size of the response pointer instead of response itself

Need to change

while (bytesReceived < (RESPONSE_MAX_LEN * sizeof(char)) && bytesReceived > bytesReceivedPrevious) {
    bytesReceivedPrevious = bytesReceived;
    bytesReceived = recv(sockFD, buffer, RESPONSE_BUFFER_SIZE, 0);
    response = realloc(response, sizeof(response) + RESPONSE_BUFFER_SIZE);
    strcat(response, buffer); //Append to the end, safe because recv takes care of limiting buffer size
}
response = realloc(response, sizeof(response) + sizeof(char));
response[strlen(response)] = '\0';

to

while (bytesReceived < (RESPONSE_MAX_LEN * sizeof(char)) && bytesReceived > bytesReceivedPrevious) {
    bytesReceivedPrevious = bytesReceived;
    bytesReceived = recv(sockFD, buffer, RESPONSE_BUFFER_SIZE, 0);
    response = realloc(response, sizeof(*response) + RESPONSE_BUFFER_SIZE);
    strcat(response, buffer); //Append to the end, safe because recv takes care of limiting buffer size
}
response = realloc(response, sizeof(*response) + sizeof(char));
response[strlen(response)] = '\0';
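
A caveat on both versions, assuming response is a char * (it is passed to strcat): sizeof(response) is the size of the pointer and sizeof(*response) is sizeof(char), which is 1, so neither expression reflects how many bytes have been stored so far. Each realloc therefore requests roughly the same size and the buffer does not grow with the data. In addition, recv does not NUL-terminate buffer, so strcat can read past the received bytes. A minimal sketch of one alternative is to track the running length explicitly. The function name receiveAll, its signature, and the value of RESPONSE_BUFFER_SIZE are assumptions for illustration and are not the project's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define RESPONSE_BUFFER_SIZE 512 /* assumed value; the real constant is not shown */

/*
 * Hypothetical helper, not the project's noSSLHttpRequest.
 * Reads from sockFD until the peer closes, recv fails, or maxLen bytes
 * have been stored. Returns a malloc'd, NUL-terminated string (possibly
 * partial if recv failed) and stores its length in *outLen, or returns
 * NULL if memory allocation fails. The caller frees the result.
 */
char *receiveAll(int sockFD, size_t maxLen, size_t *outLen) {
    assert(outLen != NULL);
    char buffer[RESPONSE_BUFFER_SIZE];
    size_t total = 0;
    char *response = malloc(1); /* room for the terminating NUL */
    if (response == NULL) return NULL;

    while (total < maxLen) {
        size_t want = maxLen - total;
        if (want > sizeof(buffer)) want = sizeof(buffer);

        ssize_t n = recv(sockFD, buffer, want, 0);
        if (n <= 0) break; /* 0: peer closed the connection, <0: error */

        /* Grow by the bytes actually received, plus one for the NUL. */
        char *grown = realloc(response, total + (size_t)n + 1);
        if (grown == NULL) {
            free(response);
            return NULL;
        }
        response = grown;

        /* memcpy instead of strcat: buffer is not NUL-terminated. */
        memcpy(response + total, buffer, (size_t)n);
        total += (size_t)n;
    }

    response[total] = '\0';
    *outLen = total;
    return response;
}
```

A call might look like receiveAll(sockFD, RESPONSE_MAX_LEN, &len), with the caller checking for NULL and freeing the result when done. Assigning realloc's result to a temporary first avoids losing the original pointer if the reallocation fails.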
