I love using Python's BeautifulSoup library. It's a front end to markup parsers that makes the kind of hack jobs I enjoy quite simple. My website and a few other things I've been working on use it.
For example, on this blog I use it to smash together the markup generated by Markdown and the template I like. With a bit more Python, I also use it to fill in the list of previous entries that you'll see in the link sections.

    from bs4 import BeautifulSoup

    def add_previous_entries(view, prev_entries):
        for prev_entry in prev_entries:
            with prev_entry.open() as prev_entry_fd:
                # Naming a parser avoids the warning bs4 emits when it has to guess
                prev_soup = BeautifulSoup(prev_entry_fd.read(), 'html.parser')
            title = prev_soup.find('div', class_='article-header').string
            # Build <li><a href="...">title</a></li> and attach it to the view
            link = view.new_tag('a', href=prev_entry.name)
            link.string = title
            li = view.new_tag('li')
            li.append(link)
            view.find('ul', class_='previous-entries').append(li)
        return view

In the code above, the `view` parameter is just the entry being modified, and `prev_entries` is a set of `Path` objects pointing to the previous entries to link to. For each one, we create a new bowl-o-soup (known as `prev_soup` in this case) and find the title of the entry, which is listed under the div with a class of `article-header`. We then hook it into the view's list of previous entries with a new link and li, and move on.
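
To see the whole round trip in one place, here is a minimal sketch that runs the same idea against throwaway data. The file names, titles, and the bare `<ul>` standing in for the view are all made up for illustration, and it assumes the built-in `html.parser` is good enough for the job.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def add_previous_entries(view, prev_entries):
    """Append one <li><a>title</a></li> per previous entry to the view's list."""
    entry_list = view.find('ul', class_='previous-entries')
    for prev_entry in prev_entries:
        prev_soup = BeautifulSoup(prev_entry.read_text(), 'html.parser')
        title = prev_soup.find('div', class_='article-header').string
        link = view.new_tag('a', href=prev_entry.name)
        link.string = title
        li = view.new_tag('li')
        li.append(link)
        entry_list.append(li)
    return view


# Hypothetical sample data: two tiny "previous entries" written to a temp dir.
with tempfile.TemporaryDirectory() as tmp:
    entries = []
    for name, title in [('first.html', 'First post'), ('second.html', 'Second post')]:
        path = Path(tmp) / name
        path.write_text(f'<div class="article-header">{title}</div>')
        entries.append(path)

    # A stand-in for the real view: just the list that receives the links.
    view = BeautifulSoup('<ul class="previous-entries"></ul>', 'html.parser')
    add_previous_entries(view, entries)

# The view now holds one linked list item per previous entry.
print(view.prettify())
```

One thing worth knowing: `find` returns `None` when nothing matches, so an entry without an `article-header` div would raise an `AttributeError` here. For a hack job that is arguably the right failure mode.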
As the next example shows, we can also use it for XML quite easily.

    from urllib import request
    from bs4 import BeautifulSoup

    def get_show_ids():
        def identifier_strings(tag):
            # Match any tag whose name attribute is exactly 'identifier'
            return tag.has_attr('name') and tag['name'] == 'identifier'

        soup = BeautifulSoup(request.urlopen(SHOW_IDS_URL).read(), "xml")
        show_ids = [x.string for x in soup.find_all(identifier_strings)]
        return show_ids

This demonstrates telling BeautifulSoup what counts as a valid tag when searching via `find` or `find_all`. In this case, I provided a function that checks that the tag has a `name` attribute and that its value is `identifier`. I then use the `Tag`'s `string` attribute to access the text value contained within that node.
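
Here is a small sketch of that filter running against an inline document, so it can be tried without a network call. The sample markup is invented, and it is parsed with the built-in `html.parser` only so the snippet runs without lxml installed; the filter itself does not care which parser built the tree. It also shows an alternative spelling, the `attrs` dictionary. A plain `name='identifier'` keyword will not do here, because `find_all` treats `name` as the tag name to search for.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; the element and attribute names are made up.
SAMPLE = """
<result>
  <doc>
    <str name="identifier">show-one</str>
    <str name="title">Show One</str>
  </doc>
  <doc>
    <str name="identifier">show-two</str>
    <str name="title">Show <b>Two</b></str>
  </doc>
</result>
"""

soup = BeautifulSoup(SAMPLE, 'html.parser')


def identifier_strings(tag):
    """Accept only tags whose name attribute is exactly 'identifier'."""
    return tag.has_attr('name') and tag['name'] == 'identifier'


# Two ways to express the same search: a filter function, or an attrs dict.
by_function = [t.string for t in soup.find_all(identifier_strings)]
by_attrs = [t.string for t in soup.find_all(attrs={'name': 'identifier'})]
print(by_function, by_attrs)

# .string is None when a tag has more than one child; get_text() joins
# all of the text underneath the tag instead.
mixed = soup.find_all(attrs={'name': 'title'})[1]
print(mixed.string, mixed.get_text())
```

The function form earns its keep once the test gets more involved than a single attribute comparison; for a simple match like this one, the `attrs` dictionary is the shorter option.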
Of course, there is plenty more you can do with BeautifulSoup. I recommend you read the pretty solid set of documentation the library includes.
Now go forth, and commit crimes against markup everywhere!