soupselect's Introduction

A single function, select(soup, selector), that can be used to select items
from a BeautifulSoup instance using CSS selector syntax.

Currently supports type selectors, class selectors, id selectors, attribute
selectors and the descendant combinator.
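
As a rough illustration of the selector kinds listed above, the sketch below breaks a selector string into steps. It is not soupselect's own parsing code: the names `STEP_RE` and `parse_selector`, and the exact pattern, are assumptions made for this example, and splitting on whitespace means attribute values containing spaces are not handled.

```python
import re

# Hypothetical pattern for one selector step such as "div.title",
# "#main", or 'a[href^="http"]'. Illustration only; this is not
# taken from soupselect.
STEP_RE = re.compile(
    r'^(?P<tag>\w+)?'                      # optional type selector
    r'(?:#(?P<id>[\w-]+))?'                # optional id selector
    r'(?:\.(?P<cls>[\w-]+))?'              # optional class selector
    r'(?:\[(?P<attr>[\w-]+)'               # optional attribute selector
    r'(?:(?P<op>[~|^$*]?=)"?(?P<val>[^\]"]*)"?)?\])?$'
)

def parse_selector(selector):
    """Split on whitespace (the descendant combinator) and describe
    each step as a dict holding only the parts that were present."""
    steps = []
    for token in selector.split():
        match = STEP_RE.match(token)
        if match is None:
            raise ValueError('unsupported selector step: %r' % token)
        parts = match.groupdict()
        steps.append({k: v for k, v in parts.items() if v is not None})
    return steps

steps = parse_selector('div.title h3')
```

Here `steps` holds one dict per step, so `'div.title h3'` reads as "an `h3` somewhere inside a `div` with class `title`".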

soupselect requires BeautifulSoup v3.0.3 or above; it will not work with v2.x.

Example usage:

    >>> from BeautifulSoup import BeautifulSoup as Soup
    >>> from soupselect import select
    >>> import urllib
    >>> soup = Soup(urllib.urlopen('http://slashdot.org/'))
    >>> select(soup, 'div.title h3')
    [<h3>
    <span><a href='//science.slashdot.org/'>Science</a>:</span> ...
    </h3>, <h3>
    <a href='//slashdot.org/articles/07/02/28/0120220.shtml'>Star Trek To ...
    </h3>
    ... ]

You can also monkey-patch the BeautifulSoup class itself:

    >>> from BeautifulSoup import BeautifulSoup as Soup
    >>> import soupselect; soupselect.monkeypatch()
    >>> import urllib
    >>> soup = Soup(urllib.urlopen('http://slashdot.org/'))
    >>> soup.findSelect('div.title h3')
    [<h3>
    ...
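
The general mechanism behind this kind of monkey-patching can be sketched without BeautifulSoup installed. Everything below is a stand-in: `FakeSoup` and this simplified `select` are hypothetical, and this `monkeypatch` takes the class as an argument purely for illustration, whereas `soupselect.monkeypatch()` is called without arguments in the example above.

```python
def select(soup, selector):
    """Stand-in for soupselect.select: returns the tag names equal to
    the selector. The real function understands CSS selector syntax."""
    return [tag for tag in soup.tags if tag == selector]

class FakeSoup(object):
    """Stand-in for the BeautifulSoup class; holds a flat list of tag names."""
    def __init__(self, tags):
        self.tags = tags

def monkeypatch(soup_class):
    # Assigning a plain function to a class attribute makes it a method,
    # so each instance is passed in as the first argument (soup).
    soup_class.findSelect = select

monkeypatch(FakeSoup)
soup = FakeSoup(['div', 'h3', 'h3'])
result = soup.findSelect('h3')
```

After patching, `soup.findSelect('h3')` is equivalent to calling `select(soup, 'h3')` directly.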

soupselect's Issues

A method to return a tag or none might be useful.

Currently the only method you add is findSelect, which returns a list. If I only want one tag, I have to check the length and take the first element, or add IndexError exception handling. I think it would be cleaner to offer a second method that returns a tag or None instead of a list.

I use the following to patch soupselect (after it patches BeautifulSoup). Maybe you would like to add it to your code?

(If I had designed it, I would have had soup.findSelect return only one element and added .findSelectAll to return the list, to match BeautifulSoup's find and findAll methods.)

    def monkeyPatchSoupSelector(BeautifulSoupClass=None):
        """Add a findSelectOne method to the (already patched) BeautifulSoup class."""
        def findSelectOne(soup, selector):
            """
            Soup selector method to return a tag or None, so you don't have to use index exceptions.
            """
            tags = soup.findSelect(selector)
            if not tags:
                return None
            return tags[0]
        # Fall back to the BeautifulSoup v3 class when no class is passed in.
        if BeautifulSoupClass is None:
            from BeautifulSoup import BeautifulSoup as BeautifulSoupClass
        BeautifulSoupClass.findSelectOne = findSelectOne
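
The two workarounds mentioned above (checking for an empty list, or catching IndexError) can be compared in isolation. This is a minimal sketch on plain lists; the helper names `first_or_none` and `first_or_none_try` are made up for this example and are not part of soupselect.

```python
def first_or_none(items):
    """Return the first element, or None when the list is empty."""
    return items[0] if items else None

def first_or_none_try(items):
    """Same result, written with the IndexError handling the issue describes."""
    try:
        return items[0]
    except IndexError:
        return None

found = first_or_none(['<h3>', '<h3>'])
missing = first_or_none([])
```

Both helpers behave the same on lists; the conditional form avoids using an exception for the common "no match" case.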

License a bit unclear

Hi Simon,

I love soupselect, it's awesome! Because it's not available on PyPI (hint), I'm bundling it with a Python module I wrote to convert HTML files to Vim help files (html2vimdoc). It took me a while to determine the license of soupselect: I found the module here on GitHub and only later found the Google Code page, which mentions the MIT license. You might consider adding a LICENSE file or a mention in the README.

Thanks for creating & publishing soupselect!

- Peter
