On a recent project, I had to implement a CSV parser that would gracefully handle malformed files. I’m talking about files with unescaped quotes, wacky UTF-8 chars, and various other abominations of nature.

I originally assumed FasterCSV would handle this automagically, but it turns out that the library's most commonly used methods are pretty strict: they raise an exception on malformed input rather than skipping the offending row.
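To see that strictness concretely, here is a small sketch. It assumes Ruby 1.9 or later, where FasterCSV was folded into the standard library as the `CSV` class (so `CSV::MalformedCSVError` plays the role of `FasterCSV::MalformedCSVError`). On Ruby 1.8 with the fastercsv gem, swap the names. The `strict_parse` helper is hypothetical and exists only for this demo.

```ruby
require "csv"

# Assumption: Ruby 1.9+, where the standard CSV class is FasterCSV under a new name.
# strict_parse is a hypothetical helper: it returns the parsed fields, or
# :rejected if the parser refuses the line.
def strict_parse(line)
  CSV.parse_line(line)
rescue CSV::MalformedCSVError
  :rejected
end

# Quotes inside a field, escaped the way the CSV format requires ("" for ").
well_formed = %Q{1,"say ""hi""",3}
# The same data with bare, unescaped quotes inside an unquoted field.
malformed = %Q{1,say "hi",3}

p strict_parse(well_formed) # => ["1", "say \"hi\"", "3"]
p strict_parse(malformed)   # => :rejected
```

There is no partial result for the bad line: the parser gives up on it entirely instead of guessing what the quotes meant.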

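For example, iterating over a malformed file with FasterCSV.foreach raises a FasterCSV::MalformedCSVError as soon as the parser reaches a bad row. The exception aborts the entire loop, so no row after the bad one ever reaches the block, and if the problem is near the top of the file, the block may never run at all: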

FasterCSV.foreach("malformed.csv") do |row|
  # use row here...
end
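Here is a rough, runnable illustration of that failure mode. It again assumes Ruby 1.9+'s standard `CSV` class as a stand-in for FasterCSV, and it uses `CSV.parse` on an in-memory string instead of `foreach` on a file. The sample data and the `rows_seen` helper are made up for the demo.

```ruby
require "csv"

# Made-up sample: the third line has a stray quote inside an unquoted field.
SAMPLE = "id,name\n1,alice\n2,b\"ob\n3,carol\n"

# Hypothetical helper: collect whatever rows reach the block before parsing stops.
def rows_seen(text)
  seen = []
  begin
    CSV.parse(text) { |row| seen << row }
  rescue CSV::MalformedCSVError => e
    # The exception ends the whole parse, not just the bad row.
    puts "parse aborted: #{e.message}"
  end
  seen
end

seen = rows_seen(SAMPLE)
# Whatever the parser managed before the bad row, nothing after it was yielded.
puts seen.include?(["3", "carol"]) # => false
```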

Not cool! I got around this by calling shift manually in a loop and rescuing FasterCSV::MalformedCSVError for each bad row, so a single broken row no longer takes down the rest of the file:

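FasterCSV.open("malformed.csv", "rb") do |csv|
  loop do
    begin
      # shift returns the next parsed row, or nil at end of file
      break unless (row = csv.shift)
      # use row here...
    rescue FasterCSV::MalformedCSVError => e
      # handle malformed row here (e.message describes the problem),
      # then keep looping with the next row
    end
  end
end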
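One alternative worth weighing, sketched here as a different trade-off rather than a fix for the loop above: read the file one physical line at a time and hand each line to `CSV.parse_line` on its own, so a bad line costs only that line and cannot leave the parser in a confused state for the rest of the file. This sketch assumes that no quoted field legitimately spans multiple lines (such records would be split up and reported as malformed), Ruby 2.1+ for `String#scrub`, and the standard `CSV` class in place of FasterCSV. `parse_leniently` and the sample data are hypothetical.

```ruby
require "csv"
require "stringio"

# Hypothetical helper: parse one physical line at a time so a malformed line
# only costs that line. Assumes no quoted field spans more than one line.
def parse_leniently(io)
  good = []
  bad = []
  io.each_line.with_index(1) do |line, lineno|
    # Replace invalid UTF-8 byte sequences with "?" instead of failing on them.
    line = line.scrub("?")
    begin
      row = CSV.parse_line(line)
      good << row unless row.nil? || row.empty? # skip blank lines
    rescue CSV::MalformedCSVError => e
      # Record where and why the line was rejected, then move on.
      bad << [lineno, e.message]
    end
  end
  [good, bad]
end

# Made-up input: a broken UTF-8 byte on line 2, unescaped quotes on line 3.
sample = StringIO.new("name,qty\ncaf\xFF,1\nsay \"hi\",2\n\"ok \"\"quoted\"\"\",3\n")
good, bad = parse_leniently(sample)
p good
p bad.map(&:first) # => [3]
```

For a real file, something like `File.open("malformed.csv", "r:UTF-8") { |f| parse_leniently(f) }` should work. Opening in "rb" mode instead would give binary strings, on which `scrub` has nothing to replace.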

Anyone have a better way to do this?
