I recently started up FreshRSS in my docker environment. I was super excited about the web scraping feature.
Now that I’m setting it up, it looks like it can scrape single web pages, but I can't figure out how to get it to crawl into the actual article and scrape the full content.
Is anyone aware of how to do this? For example, take runescape.com/m=news/. This page has a list of articles, each with a thumbnail, title, category, date, and a short description. Would it be possible for FreshRSS to follow each article link and scrape the contents within?
I haven’t tried it yet and it depends on an outside service, but maybe https://morss.it/ can help?
Note: it can be self-hosted.
I didn’t know about morss.it. It’s amazing, thank you.
Thanks for that link! That site is able to pull the full article so it makes me think it is possible! I will try to adapt it to FreshRSS.
Thanks again!
It is definitely possible, as RSS readers like ReadYou can do it. Maybe try FreshRSS in conjunction with an RSS reader?
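For anyone curious what "crawling into the article" amounts to, here is a minimal Python sketch of the two-step idea discussed above: read the listing page, collect the article links, then fetch each article and pull out its body text. It is not how FreshRSS or morss.it do it internally. The `article-link` class name and the `<article>` tag are hypothetical placeholders, not the real markup of runescape.com, so they would need to be adapted to the actual site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags whose class list contains link_class."""

    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.link_class in classes and attrs.get("href"):
            self.links.append(attrs["href"])


class TextInTag(HTMLParser):
    """Collect the text inside the first element with the given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # > 0 while we are inside the target element
        self.done = False   # set once the first target element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self.done:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def fetch(url):
    """Download a page and return it as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(index_url, link_class="article-link", body_tag="article", fetch=fetch):
    """Return {article_url: full_text} for every matching link on index_url.

    link_class and body_tag are assumptions; inspect the real site to pick them.
    """
    collector = LinkCollector(link_class)
    collector.feed(fetch(index_url))
    results = {}
    for href in collector.links:
        url = urljoin(index_url, href)  # resolve relative links
        parser = TextInTag(body_tag)
        parser.feed(fetch(url))
        results[url] = " ".join(parser.chunks)
    return results
```

Calling `crawl("https://example.test/news/")` would return a dictionary mapping each article URL to its extracted text. The `fetch` parameter is only there so the download step can be swapped out, for example with a cached or rate-limited version.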