Python
Offers HTML, a fairly good HTML parser. Input is parsed into a Genshi stream, which supports basic XPath searching but not traversal of the tree.
Offers good HTML parsing into a DOM tree. Based on libxml2, so it is pretty robust.
Also handles HTML.
ovTagSpan
My own. Based more on regex than anything.
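To give a rough idea of what a regex-based approach looks like, here is a minimal sketch. It is not the actual ovTagSpan code (which is not shown on this page); the function name `extract_bold_text` and the sample markup are made up for illustration, with the class name borrowed from the Genshi example on this page.

```python
import re

# Hypothetical sample markup; the class name is borrowed from the Genshi
# example on this page, the cell contents are invented.
SAMPLE_HTML = '''
<table>
  <tr><td class="yfnc_modtitlew2"><b>Example Corp</b></td></tr>
  <tr><td class="other"><b>ignored</b></td></tr>
</table>
'''

def extract_bold_text(html, css_class):
    """Return the text of each <b> that directly opens a <td> with the given class."""
    pattern = re.compile(
        r'<td[^>]*\bclass="%s"[^>]*>\s*<b>(.*?)</b>' % re.escape(css_class),
        re.IGNORECASE | re.DOTALL,
    )
    return [match.strip() for match in pattern.findall(html)]

titles = extract_bold_text(SAMPLE_HTML, "yfnc_modtitlew2")
print(titles)  # prints ['Example Corp']
```

A pattern like this only works while the markup keeps this exact shape (attribute quoting, tag nesting), which is the usual trade-off of regex scraping compared with a real parser.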
Genshi
    from urllib2 import urlopen
    from genshi import HTML

    page = urlopen('http://finance.yahoo.com/q/pr?s=JAVA').read()
    gs = HTML(page)
    print gs.select('//td[@class="yfnc_modtitlew2"]/b/text()')
Javascript
Uses Firefox to do the DOM parsing.
Solvent
Solvent is based on Piggybank from SIMILE.
Crowbar
Crowbar runs the screen scrapers generated by Solvent, either from the command line or as a web service.
