Parsel

Parsel is a library for extracting data from HTML and XML documents using XPath and CSS selectors.

Free software: BSD license
Documentation: https://parsel.readthedocs.org.

Features

Extract text using CSS or XPath selectors
Regular expression helper methods

Example:

>>> from parsel import Selector
>>> sel = Selector(text=u"""<html>
...         <body>
...             <h1>Hello, Parsel!</h1>
...             <ul>
...                 <li><a href="http://example.com">Link 1</a></li>
...                 <li><a href="http://scrapy.org">Link 2</a></li>
...             </ul>
...         </body>
...         </html>""")
>>>
>>> sel.css('h1::text').get()
'Hello, Parsel!'
>>>
>>> sel.css('h1::text').re(r'\w+')
['Hello', 'Parsel']
>>>
>>> for e in sel.css('ul > li'):
...     print(e.xpath('.//a/@href').get())
http://example.com
http://scrapy.org

Parsel Documentation Contents

Contents:

Indices and tables

Documentation version: v1.5.2
