

spider-5

pyquery also selects elements with CSS selectors, where # matches an id and . matches a class name.

Initialization

Initializing from a string

html = '''

'''
from pyquery import PyQuery as pq
doc = pq(html)
print(doc('li'))

Initializing from a URL

from pyquery import PyQuery as pq
doc = pq(url='http://www.baidu.com')
print(doc('head'))

Initializing from a file

from pyquery import PyQuery as pq
doc = pq(filename='demo.html')
print(doc('li'))

Basic CSS selectors

from pyquery import PyQuery as pq
doc = pq(html)
print(doc('#container .list li'))  # find the element with id "container", then elements with class "list" inside it, then the li elements inside those

Finding elements

Child elements

from pyquery import PyQuery as pq
doc = pq(html)
items = doc('.list')
print(items)
lis = items.find('li')
print(lis)

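items.find('li') searches all descendants of the matched elements, while items.children() returns only their direct children.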

Parent element: items.parent() returns only the direct parent, and every element has exactly one.

from pyquery import PyQuery as pq
doc = pq(html)
items = doc('.list')
container = items.parent()
print(container)

Ancestor elements: items.parents() returns every ancestor, not just the direct parent.

from pyquery import PyQuery as pq
doc = pq(html)
items = doc('.list')
ancestors = items.parents()
print(ancestors)

Sibling elements

from pyquery import PyQuery as pq
doc = pq(html)
li = doc('.list .item-0.active')
print(li.siblings('.active'))  # siblings of li that also have class "active"

Iteration

from pyquery import PyQuery as pq
doc = pq(html)
lis = doc('li').items()  # items() yields each match as a separate PyQuery object
for li in lis:
    print(li)

Getting attributes

from pyquery import PyQuery as pq
doc = pq(html)
a = doc('.list .active a')
print(a)
print(a.attr('href'))  # value of the href attribute
print(a.attr.href)     # attribute-style shorthand for the same value

Getting text

from pyquery import PyQuery as pq
doc = pq(html)
a = doc('.list .active a')
print(a.text())  # text content without the tags

Getting HTML

from pyquery import PyQuery as pq
doc = pq(html)
li = doc('.list .active')
print(li)         # outer HTML of the matched elements
print(li.html())  # inner HTML of the first matched element

DOM manipulation

addClass and removeClass

from pyquery import PyQuery as pq
doc = pq(html)
li = doc('.list .active')
print(li)
li.removeClass('active')
print(li)
li.addClass('active')
print(li)

attr and css

from pyquery import PyQuery as pq
doc = pq(html)
li = doc('.list .active')
print(li)
li.attr('name', 'link')       # add a name attribute
print(li)
li.css('font-size', '14px')   # written into the style attribute
print(li)

remove

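html = '''
<div class="wrap">
    hello,world
    <p>sjdfjkhsadkfhdjkf</p>
</div>
'''
from pyquery import PyQuery as pq
doc = pq(html)
wrap = doc('.wrap')
print(wrap.text())
wrap.find('p').remove()   # delete the p element from the tree
print(wrap.text())        # only "hello,world" remains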

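Pseudo-class selectors

from pyquery import PyQuery as pq
doc = pq(html)
li = doc('li:first-child')        # li elements that are the first child of their parent
print(li)
li = doc('li:last-child')         # li elements that are the last child of their parent
print(li)
li = doc('li:nth-child(2)')       # li elements in the second position
print(li)
li = doc('li:gt(2)')              # li elements whose 0-based index is greater than 2 (the fourth onward)
print(li)
li = doc('li:nth-child(2n)')      # li elements in even positions
print(li)
li = doc('li:contains(second)')   # li elements whose text contains "second"
print(li)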
