Getting Tidy To Work On 64-Bit Linux Systems

For one of the tools I’m making for Intigril’s new site, I need to do some scraping and parsing. There’s a great Rails plug-in for this called scrAPI, which relies on a separate plug-in called Tidy to clean up bad HTML so that it’s easier to parse. A while ago, I wrote an application that used both of these plug-ins to do some extensive scraping and parsing. Everything was working fine until my shared host decided to upgrade their system (isn’t it funny how upgrades tend to break everything?).

Well, they ended up going to a 64-bit server, which caused a whole lot of issues for me. It turns out that the Tidy plug-in is only a wrapper: in order to use it, you have to have the compiled libtidy binary on your system. scrAPI comes with that binary when you install it, but the bundled copy is compiled for 32-bit systems. When you try to use it on a 64-bit system, you’ll get an error that says the file can’t be loaded. It’ll read something like this:

Scraper::Reader::HTMLParseError: Scraper::Reader::HTMLParseError: Unable to load /var/lib/gems/1.8/gems/scrapi-1.2.0/lib/scraper/../tidy/libtidy.dylib

It took me a while to figure out what this really meant, but ultimately, it means you’ll need to get a different version of the file “libtidy.so” – one that’s compiled for 64-bit architectures.  This seemed easy enough – after all, all I had to do was find a file and toss it onto the server.

It’s never really that simple, though, is it?

It turned out that my host already had a copy of this file on their server (in fact, they had several copies, which I found by running the command “find / -name 'libtidy.so'”). Once I found the right file, I thought that all I’d have to do next would be to tell Tidy to use that file by setting the path in my production.rb config file, like so:

Tidy.path = "/path/to/libtidy.so"
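With several copies of libtidy.so on the server, one way to work out which is “the right file” is to read the first few bytes of each one. An ELF shared library starts with the magic bytes \x7FELF, and the fifth byte is 1 for a 32-bit build or 2 for a 64-bit build. Here is a minimal sketch of that check in Ruby. The helper names elf_class and first_64bit_library are my own, made up for illustration; they are not part of Tidy or scrAPI.

```ruby
# Returns :elf32, :elf64, or :unknown by inspecting the ELF header.
# Bytes 0-3 are the magic "\x7FELF"; byte 4 (EI_CLASS) is 1 for 32-bit
# and 2 for 64-bit objects.
def elf_class(path)
  header = File.binread(path, 5)
  return :unknown unless header && header.bytesize == 5
  return :unknown unless header[0, 4] == "\x7FELF".b

  case header.getbyte(4)
  when 1 then :elf32
  when 2 then :elf64
  else :unknown
  end
end

# Hypothetical helper: picks the first candidate path that exists
# and is a 64-bit ELF file, or nil if none qualifies.
def first_64bit_library(candidates)
  candidates.find { |path| File.file?(path) && elf_class(path) == :elf64 }
end
```

In production.rb, the result could then be handed to Tidy, for example Tidy.path = first_64bit_library(candidates), where candidates is whatever list of paths the find command turned up.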

When I did that, though, I got the following error the next time I tried to run my code:

/path/to/gems/tidy-1.1.2/lib/tidy/tidybuf.rb:40: [BUG] Segmentation fault

Things went from bad to worse.  It turns out that this was a known bug, and there was a patch for it.  You can see the details on this page.  I read all of the posts on that page and made the following changes, as suggested:

  1. Added the line "extern void tidyBufInit(void*)" to the ‘load’ method in the file tidy-1.1.2/lib/tidy/tidylib.rb
  2. Added the following method to the same file (Tidylib):

    # tidyBufInit, using default allocator
    def buf_init(buf)
      tidyBufInit(buf)
    end

  3. Added the following line to the initialize method in the tidybuf.rb file:

    Tidylib.buf_init(@struct)

  4. Added the following field to the TidyBuffer struct:

    "TidyAllocator* allocator"

I saved those changes into the version of Tidy that I had frozen to my Rails app.  Once I made those changes and re-deployed my code, everything was working perfectly.
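As for why the patch helps, my best guess is a struct layout mismatch. The sketch below assumes (I have not verified this against the exact library on my host) that the Tidy plug-in originally declared TidyBuffer with four fields (bp, size, allocated, next), and that the newer libtidy puts the TidyAllocator pointer at the front of the struct. It uses plain Ruby arithmetic to show how the field offsets would shift on a 64-bit system; struct_layout is a hypothetical helper, not something from Tidy.

```ruby
# Computes field offsets and the total size of a C struct, assuming each
# field is aligned to its own size (true for the pointer and uint fields
# used here). Each field is given as [name, size_in_bytes].
def struct_layout(fields)
  offset = 0
  largest = 1
  offsets = {}
  fields.each do |name, size|
    offset += (-offset) % size # pad up to the next multiple of size
    offsets[name] = offset
    offset += size
    largest = [largest, size].max
  end
  # The whole struct is padded to a multiple of its largest field.
  [offsets, offset + (-offset) % largest]
end

pointer = 8 # bytes, assuming a 64-bit Linux system
uint = 4    # bytes

# Assumed layouts: the old one as declared by the plug-in, the new one
# with the allocator pointer added at the front.
old_fields = [["bp", pointer], ["size", uint], ["allocated", uint], ["next", uint]]
new_fields = [["allocator", pointer]] + old_fields

old_offsets, old_size = struct_layout(old_fields)
new_offsets, new_size = struct_layout(new_fields)

puts "old: bp at #{old_offsets["bp"]}, #{old_size} bytes total" # old: bp at 0, 24 bytes total
puts "new: bp at #{new_offsets["bp"]}, #{new_size} bytes total" # new: bp at 8, 32 bytes total
```

If that assumption holds, the unpatched plug-in would hand libtidy a buffer that is too small and has every field in the wrong place, which would fit a segmentation fault in tidybuf.rb. Declaring the extra field and calling tidyBufInit lines the two sides up again.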
