Monday, July 13, 2015

lxml

Brief update: I've finished a few of my data parsers that will be used to format the raw data into something easily imported into my database.  I've run into a couple of snags when it comes to the final output, but I'm definitely on the right track.

That said, none of it would have been possible without the lxml library:

http://lxml.de/

The lxml library allows you to do cool things with XML in Python.  For my purposes, I use it to do the following:


  • build an XML tree object in Python
  • parse and step through the tree to find specific data
  • add this data to lists
Once the data is in lists, I can convert it to CSV format.  However, I've run into some issues with this and been forced to re-examine my approach.  Maybe I don't need .csv files.  Maybe I don't even need lists.
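
For illustration, the three steps above plus the CSV conversion could be sketched roughly like this.  The sample XML, the element names (records, record, name, value) and the helper names xml_to_rows and rows_to_csv are made up for the example and are not the actual parsers.  The fallback import is only there so the sketch still runs where lxml is not installed.

```python
import csv
import io

try:
    from lxml import etree  # the library discussed in this post
except ImportError:
    # Standard-library fallback; it supports the few calls used below.
    import xml.etree.ElementTree as etree

# Hypothetical raw data; real element names will differ.
RAW_XML = b"""<records>
  <record id="1"><name>Alpha</name><value>10</value></record>
  <record id="2"><name>Beta</name><value>20</value></record>
</records>"""


def xml_to_rows(xml_bytes):
    """Build a tree, step through it, and collect one list per record."""
    root = etree.fromstring(xml_bytes)
    rows = []
    for record in root.iter("record"):
        rows.append([
            record.get("id"),            # attribute on <record>
            record.findtext("name"),     # text of the <name> child
            record.findtext("value"),    # text of the <value> child
        ])
    return rows


def rows_to_csv(rows, header):
    """Serialize the collected lists as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


rows = xml_to_rows(RAW_XML)
csv_text = rows_to_csv(rows, ["id", "name", "value"])
print(csv_text)
```

Note that everything comes out of the tree as a string, so any type conversion (numbers, dates) would still have to happen before or during the database import.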

I'm going to put more work in and see how easily I could put the data back into an XML format, which can also be uploaded into a database.  I like the idea of putting it back into XML because it does not rely so heavily on this tabular, structured paradigm.  Not only that, but there's more brains behind the XML: the element names and nesting travel with the data.  If I need to go back and troubleshoot my parsing, the same tools that helped me create the XML can help me step through it again.  .csv files leave me with fewer options after the fact.
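
As a rough sketch of what writing parsed data back out as XML might look like, assuming the data is already sitting in lists of strings: the element names, the sample rows and the helper name rows_to_xml are hypothetical, and the fallback import only exists so the example runs without lxml installed.

```python
try:
    from lxml import etree  # the library discussed in this post
except ImportError:
    # Standard-library fallback; it supports the few calls used below.
    import xml.etree.ElementTree as etree


def rows_to_xml(rows, fields):
    """Build a <records> tree with one <record> element per row."""
    root = etree.Element("records")
    for row in rows:
        record = etree.SubElement(root, "record")
        for field, value in zip(fields, row):
            # Each field becomes a child element holding the value as text.
            etree.SubElement(record, field).text = value
    return root


# Hypothetical parsed data.
parsed = [["1", "Alpha", "10"], ["2", "Beta", "20"]]

root = rows_to_xml(parsed, ["id", "name", "value"])
xml_text = etree.tostring(root, encoding="unicode")
print(xml_text)

# The output can be re-read and stepped through with the same calls
# that were used to parse the raw data in the first place.
reparsed = etree.fromstring(xml_text)
names = [record.findtext("name") for record in reparsed.iter("record")]
print(names)
```

The second half is the troubleshooting angle: the generated XML goes straight back through the parser, which is harder to do in a meaningful way with a flat .csv file.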

Ultimately, I just need to get through the data as quickly as possible and get it into my database, but my own personal quirks are holding up the process.  This is because my code has to be as pretty as possible.  If I feel like I'm using too many (or just misusing) control structures because I've not thought everything through, it starts to look sloppy in my eyes.  The one thing about Python is that it really makes me want to write as few lines as humanly possible to achieve my goal.  That is, after all, what libraries like lxml are intended to help me do.  Why reinvent the wheel every time I need to complete a task?  Unfortunately, I do try to reinvent the wheel sometimes, and obsess way too much over making it as perfectly round as possible.

