Skip to main content

Posts

Showing posts from November, 2014

Screen Scraping in Ruby with Watir and Nokogiri

I was given an interesting challenge to scrape some data from a specific site.  Not to write a completed, packaged solution, but rather just to scrape the data.  The rub being, the site uses Javascript paging, so one couldn't simply use something like Mechanize.  While a self-contained product would require inclusion of V8 (as the Javascript would need to be run and evaluated), to just scrape the data allows making use of whatever is easy and available.  Enter Watir . Watir allows "mechanized/automated" browser control.  Essentially, we can script a browser to go to pages, click links, fill out forms, and what have you.  It's mainstay is in testing, but it's also pretty damned handy in cases where we need some Javascript on a page processed... like in this case.  Keep in mind though, it is literally automating a browser, so you'll see your browser open and navigate to pages, etc. when the script runs.  But, there is also a headless browser option.  This is

Install current SBCL on OS X

You must have Command Line Tools installed. If you don't , this tutorial is not for you. Google: installation of XCode and Command Line Tools. Normally, I use brew to install things (when it offers a solution), but in this case the keg version was a couple minor version's off. And, there had been sufficient addition's that motivated me to want the current release. So, building from source was the path of least resistance. First, what not to do : The note's caution against using OS X's Terminal , as their make.sh script pukes a shit-ton of text during the build, and according to them, it can slow the build. I did not experience an issue with this, compared to other builds I've done in the past.   BUT , they also say build can be accomplished with other LISP's installed (you must have a lisp installed prior to building). OMFG , unless you want to wait a month of Sunday's, my experience building with CLISP was slower than the Molasses in January.  D