htmlSQL - Version 0.5

htmlSQL is an experimental PHP library that lets you access HTML values through an SQL-like syntax. This means you don't have to write complex functions or regular expressions to extract specific values.

htmlSQL queries look like this:

SELECT href,title FROM a WHERE $class == "list"
       ^ Attributes    ^       ^ search query (can be empty)
         to return     ^
                       ^ HTML tag to search in
                         "*" is possible = all tags

This query should return an array with all links that contain the attribute class="list".
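To make the semantics of that query concrete, here is a minimal sketch of roughly the same selection written with PHP's bundled DOM extension instead of htmlSQL. The function name select_links_by_class and the shape of the returned rows are assumptions made for illustration; htmlSQL's own result format may differ.

```php
<?php
// Rough equivalent of: SELECT href,title FROM a WHERE $class == "list"
// Uses PHP's DOM extension (PHP 5+), not htmlSQL itself.
// select_links_by_class() is a hypothetical helper, not part of the library.
function select_links_by_class($html, $class)
{
    $doc = new DOMDocument();
    // Real-world HTML is often invalid; suppress the parser warnings.
    @$doc->loadHTML($html);

    $rows = array();
    foreach ($doc->getElementsByTagName('a') as $link) {
        // Keep only <a> tags whose class attribute equals $class exactly.
        if ($link->getAttribute('class') === $class) {
            $rows[] = array(
                'href'  => $link->getAttribute('href'),
                'title' => $link->getAttribute('title'),
            );
        }
    }
    return $rows;
}

$html = '<a class="list" href="/one" title="One">1</a>'
      . '<a class="nav" href="/two" title="Two">2</a>';
print_r(select_links_by_class($html, 'list'));
```

Note that this sketch compares the whole class attribute, so a tag with class="list big" would not match.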

The project has been abandoned

htmlSQL was an experiment I made in 2006. I'm no longer supporting or extending the library; this repository exists only for historical purposes. Feel free to fork, modify and study the source code. If you need a reliable library for data scraping, I recommend using other modules.

Related projects:

Related links:

Requirements

  • Any flavor of PHP4+ should do
  • Snoopy PHP class - Version 1.2.3 (optional - required for web transfers)
    You can find all Snoopy-related documents (copyright, readme, etc.) in the snoopy_data/ subdirectory.

Usage

Just include the "snoopy.class.php" and "htmlsql.class.php" files in your PHP scripts and look at the examples to get an idea of how to use the htmlSQL library. It should be very simple :-)
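A typical script might look like the sketch below. The class name htmlsql and the methods connect(), query() and fetch_array() are written from memory of the bundled examples and should be treated as assumptions; check them against the examples shipped with the library. The wrapper run_htmlsql_query() and the file locations are purely illustrative.

```php
<?php
// Assumption: both files sit next to this script; adjust the paths as needed.
foreach (array('snoopy.class.php', 'htmlsql.class.php') as $file) {
    if (file_exists($file)) {
        include_once $file;
    }
}

// Hypothetical wrapper: runs one query against an HTML string.
// Returns an array of result rows, or null if the library is not
// loaded or one of the steps fails.
function run_htmlsql_query($html, $query)
{
    if (!class_exists('htmlsql')) {
        return null; // library not available
    }
    $wsql = new htmlsql();
    // Assumed API: connect('string', ...) parses HTML passed in directly;
    // 'url' and 'file' are believed to be the other accepted source types.
    if (!$wsql->connect('string', $html)) {
        return null;
    }
    if (!$wsql->query($query)) {
        return null;
    }
    return $wsql->fetch_array();
}

$html = '<a class="list" href="/one" title="One">1</a>';
$rows = run_htmlsql_query($html, 'SELECT href,title FROM a WHERE $class == "list"');
var_dump($rows);
```

Because the library targets PHP 4, it may need adjustments before it runs cleanly on current PHP versions.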

Background / idea

I had this idea while extracting some data from a website. I realized that the algorithms and functions for extracting links and other tags are often the same, so I decided to combine them into a universally usable library. While drinking a coffee and thinking about that, I thought it would be cool to access HTML elements by using SQL. So I started creating this library...

Warning

The eval() function is used for the WHERE statement. Make sure that all user data is checked and filtered against malicious PHP code. Never trust any user input!
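One conservative way to follow this advice is to never pass raw user input into the query, and instead build the WHERE clause from values that pass a strict whitelist. The helper build_where_clause() below is a hypothetical sketch, not part of htmlSQL, and the allowed character sets are assumptions you should tighten or loosen for your own data.

```php
<?php
// Hypothetical helper, not part of htmlSQL.
// Builds a clause such as: $class == "list"
// Returns null when either input contains characters outside the whitelist.
function build_where_clause($attribute, $value)
{
    // Attribute names become PHP variable names inside eval(), so allow
    // only letters, digits and underscores, starting with a letter.
    if (!preg_match('/\A[A-Za-z][A-Za-z0-9_]*\z/', $attribute)) {
        return null;
    }
    // Values end up inside a double-quoted PHP string, so reject quotes,
    // backslashes, "$", braces and anything else not listed here.
    if (!preg_match('/\A[A-Za-z0-9 _.-]*\z/', $value)) {
        return null;
    }
    return '$' . $attribute . ' == "' . $value . '"';
}

$where = build_where_clause('class', 'list');
if ($where !== null) {
    echo 'SELECT href,title FROM a WHERE ' . $where, "\n";
}
```

Rejecting input outright is safer here than trying to escape it, because the escaping rules would have to match exactly how the library assembles the string it evaluates.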

Todo

  • Enhance the HTML parser
  • Test htmlSQL with invalid and bad HTML files
  • Replace the ugly eval() method for the WHERE statement with a custom method
  • Add more error checks
  • Add unit tests
  • Add a LIMIT function like in SQL
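As an illustration of the eval() item above, a replacement could start with a tiny matcher that understands only the simplest conditions. The sketch below is hypothetical and not how htmlSQL works today: match_condition() handles just `$attr == "value"` and `$attr != "value"`, and treats a missing attribute as an empty string.

```php
<?php
// Hypothetical sketch of an eval()-free WHERE check.
// $condition:  e.g. '$class == "list"'
// $attributes: associative array of one tag's attributes
// Returns true/false for a match, or null if the condition is unsupported.
function match_condition($condition, $attributes)
{
    $pattern = '/\A\s*\$([A-Za-z_][A-Za-z0-9_]*)\s*(==|!=)\s*"([^"]*)"\s*\z/';
    if (!preg_match($pattern, $condition, $m)) {
        return null; // anything more complex is rejected, never executed
    }
    $actual = isset($attributes[$m[1]]) ? $attributes[$m[1]] : '';
    $equal  = ($actual === $m[3]);
    return $m[2] === '==' ? $equal : !$equal;
}

var_dump(match_condition('$class == "list"', array('class' => 'list')));
var_dump(match_condition('$class != "list"', array('class' => 'nav')));
```

A LIMIT could be layered on top in a similar spirit, for example by applying array_slice() to the result rows.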

Author

License

htmlSQL uses a modified BSD license; you can find the full license text in "htmlsql.class.php".
