Using the 'ruby-readability' gem on Rails
July 4, 2024
In the last post we looked at a solution for parsing the text content of a webpage, such as a news article, using the mozilla/readability.js Node package on a Rails backend.
This time I’ll introduce the ‘ruby-readability’ gem, which does the same task in a more Ruby way.
A caveat up front: the gem failed to parse many of the websites I tried, and I ended up going with the solution presented in the previous post.
Install the gem
Run the following in your Rails project folder:
bundle add ruby-readability
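One thing to watch for: as far as I can tell from the gem's README, its library file is named readability rather than ruby-readability, so Bundler may not load it automatically. Assuming that holds for the version you installed, you can edit the line that bundle add created in your Gemfile so that it points at the right file:

```ruby
# Gemfile
# Assumption: the gem's entry point is lib/readability.rb, as its README suggests.
gem "ruby-readability", require: "readability"
```

If you would rather leave the Gemfile alone, calling require "readability" in the file that uses the gem should have the same effect.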
I’ll be using Faraday for my HTTP requests, but you could use another HTTP client. To install Faraday, run:
bundle add faraday
Using ‘ruby-readability’
Get the web page with Faraday:
response = Faraday.get('https://example.com/article')
Parse the text content you want from the body of the page:
content = Readability::Document.new(response.body).content
Readability::Document instance attributes
A Readability::Document instance gives you the following methods:
.images
.author
.title
.content
In my experience, however, the parser doesn’t always manage to identify these fields on a given page.
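Because of that, it is worth treating every field as possibly blank or missing. Below is a small sketch of how you might do that. The helper article_fields is hypothetical (my own name, not part of the gem), and it works on any object that responds to title, author and content. I left images out to keep it short:

```ruby
# Hypothetical helper: collect the fields into a hash and turn
# blank or missing values into nil, so callers only need one check.
def article_fields(doc)
  {
    title: doc.title,
    author: doc.author,
    content: doc.content
  }.transform_values { |value| value.to_s.strip.empty? ? nil : value }
end

# Usage, assuming `response` comes from the Faraday call shown earlier:
#   doc = Readability::Document.new(response.body)
#   fields = article_fields(doc)
#   puts "No title found" if fields[:title].nil?
```

With this in place you can decide per field whether to fall back to another parser, such as the one from the previous post.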