I love using Python's BeautifulSoup library. It's a front end to markup parsers that makes the kind of hack jobs I enjoy quite simple. My website and several other projects I've been working on use it.
For example, on this blog I use it to smash the markup that Markdown generates into the template I like. I also use it to fill in the list of previous entries that you'll see in the link sections.
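To make the "smash Markdown output into a template" idea concrete, here is a minimal sketch. It is not the blog's actual build code: the template, the 'article-body' div, and the Markdown output are all hypothetical stand-ins, and the converter step is replaced by a hard-coded HTML string.

```python
from bs4 import BeautifulSoup

# Hypothetical page template; the "article-body" div is an assumption for this sketch.
TEMPLATE = """<html><body>
<div class="article-body"></div>
</body></html>"""

# Hypothetical stand-in for HTML that a Markdown converter would produce.
MARKDOWN_HTML = "<h1>Hello</h1><p>Some <em>text</em>.</p>"


def fill_template(template: str, fragment: str) -> BeautifulSoup:
    """Parse both documents and move the fragment's nodes into the template."""
    page = BeautifulSoup(template, "html.parser")
    body = BeautifulSoup(fragment, "html.parser")
    target = page.find("div", class_="article-body")
    # Copy the list first: appending a node to another tree removes it from `body`.
    for node in list(body.contents):
        target.append(node)
    return page


page = fill_template(TEMPLATE, MARKDOWN_HTML)
print(page.find("div", class_="article-body"))
```

Assigning the raw HTML string to the div's '.string' would escape it as plain text, which is why the fragment is parsed into its own soup and its nodes are moved across instead.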
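```python
from bs4 import BeautifulSoup


def add_previous_entries(view, prev_entries):
    """Append a link to each previous entry onto the view's 'previous-entries' list."""
    for prev_entry in prev_entries:
        with prev_entry.open() as prev_entry_fd:
            # Name the parser explicitly; omitting it makes BeautifulSoup warn.
            prev_soup = BeautifulSoup(prev_entry_fd.read(), "html.parser")
            # The entry's title lives in <div class="article-header">.
            title = prev_soup.find('div', class_='article-header').string
            # Build <li><a href="...">title</a></li> using the view's soup.
            link = view.new_tag('a', href=prev_entry.name)
            link.string = title
            li = view.new_tag('li')
            li.append(link)
            view.find('ul', class_='previous-entries').append(li)
    return view
```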
In the code above, the 'view' parameter is just the entry being modified, and 'prev_entries' holds objects pointing to the set of previous entries to link to. For each one, we create a new bowl-o-soup (known as 'prev_soup' in this case) and find the title of the entry, which is listed under the div with a class of 'article-header'. We then hook it into the view's list of previous entries with a new link and li, and move on.
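The link-building half of that function can be tried on its own. This sketch assumes a hypothetical page containing an empty 'previous-entries' list and uses made-up (filename, title) pairs in place of the parsed entries; it only shows how 'new_tag', '.string', and 'append' fit together.

```python
from bs4 import BeautifulSoup

# Hypothetical page with an empty list of previous entries.
view = BeautifulSoup('<ul class="previous-entries"></ul>', "html.parser")

# Hypothetical (filename, title) pairs standing in for the parsed entries.
entries = [("first-post.html", "First Post"), ("second-post.html", "Second Post")]

ul = view.find("ul", class_="previous-entries")
for href, title in entries:
    link = view.new_tag("a", href=href)  # keyword arguments become attributes
    link.string = title                  # sets the tag's text content
    li = view.new_tag("li")
    li.append(link)
    ul.append(li)

print(ul)
```

Note that 'new_tag' is a method of the BeautifulSoup object itself, so the tags are created by the soup that will end up owning them.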
It handles XML just as easily, as the next example shows.
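```python
from urllib import request

from bs4 import BeautifulSoup


def get_show_ids():
    """Return the text of every tag whose name attribute is 'identifier'."""
    def identifier_strings(tag):
        # Filter for find_all(): keep only tags with name="identifier".
        return tag.has_attr('name') and tag['name'] == 'identifier'

    # SHOW_IDS_URL is defined elsewhere; the "xml" parser requires lxml.
    with request.urlopen(SHOW_IDS_URL) as response:
        soup = BeautifulSoup(response.read(), "xml")
    show_ids = [x.string for x in soup.find_all(identifier_strings)]
    return show_ids
```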
This demonstrates how BeautifulSoup lets you specify what counts as a valid tag by passing a function to 'find_all'. In this case, the function looks at each tag and checks that it has a 'name' attribute equal to 'identifier'. I then use the 'string' attribute to access the text value contained within each matching node.
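Here is the same filtering idea run against a small inline document, so it works without a network call. The feed layout is hypothetical (the real SHOW_IDS_URL response isn't shown), and it uses the bundled "html.parser" rather than "xml" so lxml isn't required. It also compares the function filter with an equivalent attribute filter and shows when '.string' comes back empty.

```python
from bs4 import BeautifulSoup

# Hypothetical feed; the real SHOW_IDS_URL document is not shown in the post.
FEED = """
<shows>
  <show><field name="identifier">abc123</field><field name="title">Show A</field></show>
  <show><field name="identifier">def456</field><field name="title">Show B</field></show>
</shows>
"""

# "html.parser" ships with Python; the post's "xml" parser needs lxml installed.
soup = BeautifulSoup(FEED, "html.parser")


def identifier_strings(tag):
    # Called once per tag; returning True keeps the tag in the results.
    return tag.has_attr("name") and tag["name"] == "identifier"


by_function = [t.string for t in soup.find_all(identifier_strings)]
# Same match as an attribute filter. attrs= is needed because a name= keyword
# would be read as the *tag* name, not the attribute.
by_attrs = [t.string for t in soup.find_all(attrs={"name": "identifier"})]

# .string only works when a tag has a single child; mixed content gives None.
mixed = BeautifulSoup('<field name="identifier">abc<b>1</b></field>', "html.parser").field

print(by_function, by_attrs)
print(mixed.string, mixed.get_text())
```

The function form is the more flexible of the two, since it can test anything about a tag, while the attrs form is shorter for a plain equality check like this one.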
Of course, there is plenty more you can do with BeautifulSoup. I recommend reading the library's pretty solid documentation.
Now go forth, and commit crimes against markup everywhere!