
cssq

Filter HTML with a CSS query. Because running regular expressions on HTML output is never a good idea. And one-liner scraping is awesome.

You can pass HTML on standard input, or you can give cssq a URI as an argument. Note, however, that in most cases it is probably better to copy a cURL (or similar) command and pipe its output to cssq, letting cURL handle all the heavy lifting (cookies, request data, request method, etc.).

If you pass a URI to cssq, the request is made with the GET method. Nothing beyond that, such as authentication or cookies, is supported.
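Below is a minimal Python sketch of that input rule. It assumes the URI, when present, is the first argument, as in the examples below. cssq itself depends on requests, but this sketch swaps in the standard library's `urllib.request` so it runs without extra packages. `looks_like_uri` and `read_html` are hypothetical names. Treating only `http`/`https` arguments as URIs is also an assumption.

```python
import io
import sys
import urllib.parse
import urllib.request
from typing import TextIO


def looks_like_uri(arg: str) -> bool:
    # Assumption: an argument with an http or https scheme is a URI;
    # anything else (e.g. "style" or "li") is treated as the CSS query.
    return urllib.parse.urlparse(arg).scheme in ("http", "https")


def read_html(args: list[str], stdin: TextIO = sys.stdin) -> tuple[str, list[str]]:
    """Return the HTML to filter and the arguments left after consuming a URI."""
    if args and looks_like_uri(args[0]):
        # A bare GET request: no cookies, authentication or custom headers,
        # matching the limitation described above.
        with urllib.request.urlopen(args[0]) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace"), args[1:]
    # No URI given: read the document from standard input (e.g. piped from curl).
    return stdin.read(), args


# Simulate `echo '<style>p{}</style>' | cssq 'style'` without touching the network.
html, rest = read_html(["style"], stdin=io.StringIO("<style>p{}</style>"))
# html == "<style>p{}</style>", rest == ["style"]
```

Anything beyond a plain GET, such as cookies or extra headers, is exactly what the cURL route is for.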

Installation

Pip

pip install cssq

Clone the repository

Note: please use a virtualenv.

git clone https://github.com/Tatsh/cssq.git
cd cssq
# Install the dependencies
pip install 'beautifulsoup4>=4.3.2' 'requests>=2.4.1' 'html5lib>=0.999'
python setup.py install

Alternatively, to have pip install the package from the clone (in editable mode):

pip install -e .

Examples

Use a URL

cssq 'http://www.google.com' 'style'

Output (truncated):

<style>#gbar,#guser{font-size:13px;padding-top:1px !important;}#gbar{height:22px}#guser{p...

Note that if only one element matches, that element is printed as-is (index 0). If more than one element matches, a JSON array of strings is printed.
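That output rule can be sketched as a small Python function. `format_matches` is a hypothetical name. The behaviour when nothing matches is an assumption, since the README does not say.

```python
import json


def format_matches(matches: list[str]) -> str:
    """Render matched elements following the output rule described above."""
    if len(matches) == 1:
        # Exactly one match: print the element itself (index 0), not JSON.
        return matches[0]
    # More than one match: a JSON array of strings, which jq can index.
    # Assumption: zero matches also yields JSON ("[]"); the README does not
    # say what cssq prints in that case.
    return json.dumps(matches)


print(format_matches(["<style>p{}</style>"]))          # prints: <style>p{}</style>
print(format_matches(["<li>a</li>", "<li>b</li>"]))    # prints: ["<li>a</li>", "<li>b</li>"]
```

Printing a lone element without JSON quoting keeps the common single-match case readable. The cost is that the output shape changes with the number of matches.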

When JSON is printed, consider piping it to jq for more complex filtering. For example, to get the first element:

cssq 'https://stackoverflow.com/' 'li' | jq '.[0]'

Output (truncated):

"<li>\n                        <div class=\"related-links\">..."
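The shape of the output depends on how many elements matched, so a script consuming cssq has to handle both cases. jq would reject a single, non-JSON element. Here is a minimal Python sketch (`first_match` is a hypothetical name). It behaves like `jq '.[0]'` on a JSON array and passes a single element through unchanged.

```python
import json


def first_match(output: str) -> str:
    """Return the first element from cssq's output, whichever shape it has."""
    # Drop the trailing newline that printing to a terminal usually adds.
    text = output.rstrip("\n")
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Not JSON: a single matching element was printed verbatim.
        return text
    if isinstance(parsed, list):
        if not parsed:
            raise ValueError("no elements matched")
        # JSON array: the equivalent of `jq '.[0]'`.
        return parsed[0]
    return text
```

In a script you would pass it the captured standard output, for example `subprocess.run(["cssq", url, "li"], capture_output=True, text=True).stdout`.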

Piping from cURL

You may want to do this to get content as your logged-in account sees it, rather than as an anonymous user would. For example, your Favourites list on YouTube.

To do this easily in Chrome, open the Developer Tools and view the Network tab. Go to your Favourites page; in the list of downloaded items, the first item should be the page itself. Right-click this item and choose Copy as cURL (note that the command may be long). Then paste it into your terminal and pipe it to cssq. You will see a JSON-encoded array of <a> tags.

curl 'https://www.youtube.com/playlist?list=****' \
    -H 'pragma: no-cache' \
    -H 'dnt: 1' \
    ...  | cssq '#pl-video-table .pl-video-title-link'
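The selection step on piped-in HTML can be sketched with nothing but Python's standard library. This is a deliberately simplified stand-in, not cssq's implementation. cssq depends on beautifulsoup4 and html5lib, which suggests it uses those to parse the page and print whole elements. This sketch instead uses `html.parser.HTMLParser`, understands only compound selectors (an optional tag name plus `#id`/`.class` parts), and records just each matching element's opening tag. `OpeningTagCollector`, `select_opening_tags` and the sample markup are hypothetical. Because descendant combinators are not supported, the YouTube query above would have to be approximated as `a.pl-video-title-link`, which drops the `#pl-video-table` scope.

```python
import json
import re
from html.parser import HTMLParser

# Only compound selectors are supported: "a", ".pl-video-title-link",
# "#pl-video-table" or "a.pl-video-title-link". Descendant combinators
# (the space in "#pl-video-table .pl-video-title-link") are not.
COMPOUND_SELECTOR = re.compile(r"([a-zA-Z][\w-]*)?((?:[.#][\w-]+)*)")


class OpeningTagCollector(HTMLParser):
    """Collect the opening tag of every element matching a compound selector."""

    def __init__(self, selector: str) -> None:
        super().__init__()
        match = COMPOUND_SELECTOR.fullmatch(selector)
        if not selector or match is None:
            raise ValueError(f"unsupported selector: {selector!r}")
        # html.parser lower-cases tag names, so lower-case the selector's tag too.
        self.want_tag = match.group(1).lower() if match.group(1) else None
        parts = re.findall(r"([.#])([\w-]+)", match.group(2))
        self.want_ids = [name for kind, name in parts if kind == "#"]
        self.want_classes = [name for kind, name in parts if kind == "."]
        self.found: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if self.want_tag is not None and tag != self.want_tag:
            return
        if any(attributes.get("id") != wanted for wanted in self.want_ids):
            return
        classes = (attributes.get("class") or "").split()
        if all(name in classes for name in self.want_classes):
            # Keep the start tag exactly as written in the source document.
            self.found.append(self.get_starttag_text())


def select_opening_tags(html: str, selector: str) -> list[str]:
    collector = OpeningTagCollector(selector)
    collector.feed(html)
    collector.close()
    return collector.found


# Made-up sample markup standing in for the downloaded playlist page.
page = (
    '<table id="pl-video-table"><tr><td>'
    '<a class="pl-video-title-link" href="/watch?v=abc">First video</a>'
    '</td></tr></table>'
    '<a href="/other">Not in the playlist</a>'
)
print(json.dumps(select_opening_tags(page, "a.pl-video-title-link")))
```

To use it as a filter at the end of the cURL pipeline above, a small wrapper script would call `select_opening_tags(sys.stdin.read(), sys.argv[1])` and print the result with `json.dumps`.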
