Links in twitter feeds in Liferea

I use Liferea to consume feeds. In turn, I consume Twitter by RSS. However, Twitter’s RSS feeds suck. URLs are not clickable, user names are not links, nothing. It’s flat text.

Using Liferea’s ability to locally parse feeds and a little inspiration, I hacked up a sed script to make my twitter feed all pretty. It works great for me, YMMV.

I published the script here, under the GPL. To use it, save the source into a file somewhere, make that file executable, then choose “Use conversion filter” in Liferea and select the file you just created. If you have problems, leave a comment here and I might be able to help.
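The install steps can be smoke-tested outside Liferea first. This is only a rough sketch: the temporary directory and the file name twitter_filter.sh are made up for illustration, and the filter body is a pass-through placeholder rather than the published sed script. It relies on the filter reading the feed on stdin and writing the result to stdout, which is how the script is used in the comments below.

```shell
#!/bin/sh
# Sketch: install a conversion filter and smoke-test it outside Liferea.
# The directory and file name are hypothetical; the filter body is a
# pass-through placeholder, not the published sed script.
filter_dir=$(mktemp -d)
filter="$filter_dir/twitter_filter.sh"

# A filter reads the feed on stdin and writes the result to stdout.
cat > "$filter" <<'EOF'
#!/bin/sh
# Placeholder: replace `cat` with the real sed command.
cat
EOF

# Liferea can only run the filter if it is executable.
chmod +x "$filter"

# Feed it a tiny sample and confirm it runs and exits cleanly.
sample='<item><title>chmac: hello</title></item>'
output=$(printf '%s\n' "$sample" | "$filter")
status=$?
echo "exit status: $status"
echo "$output"
```

If the exit status is anything other than 0 here, Liferea will most likely fail in the same way.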

16 thoughts on “Links in twitter feeds in Liferea”

  1. Very nice idea, but it didn’t work for me… (?). Liferea says this:
    /home/winkler/bin/twitter_liferea.sh exited with status 127
    I’m using Ubuntu 9.10…

    1. Wow, I’m amazed anybody picked up on this!

      Hmm. Try doing this, see what happens:
      wget -q -O - http://twitter.com/statuses/user_timeline/3038571.rss > twitter-feed-raw.xml
      wget -q -O - http://twitter.com/statuses/user_timeline/3038571.rss | liferea_add_links > twitter-feed-parsed.xml

      Now compare the two files and see what’s going on. Or, if you get some kind of error from the liferea_add_links script, maybe that’ll help explain what’s going wrong. You can post the output of this on pastebin if you like, that might help me debug.

      1. Hi Callum, thanks.
        The internet is big; I was surprised someone had the same issue I was having! Your wget suggestion helped me find the issue: I had saved the script directly from the browser, and the end-of-line characters were in DOS format (^M). I just ran dos2unix and that solved it.
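For anyone hitting the same status 127: a carriage return at the end of the #! line most likely makes the system look for an interpreter whose name ends in ^M, which does not exist. The sketch below reproduces the DOS line endings on a throwaway file (the file names are hypothetical) and strips them with tr, which does the same job as dos2unix where that tool is not installed.

```shell
#!/bin/sh
# Sketch: reproduce and repair DOS line endings in a script.
# File names are hypothetical.
work=$(mktemp -d)
script="$work/filter.sh"

# Write a two-line script with CRLF endings, as a browser save might.
printf '#!/bin/sh\r\ncat\r\n' > "$script"

# Count lines containing a carriage return (0 means clean).
cr=$(printf '\r')
before=$(grep -c "$cr" "$script")

# Strip every carriage return, then replace the original file.
tr -d '\r' < "$script" > "$script.unix" && mv "$script.unix" "$script"

after=$(grep -c "$cr" "$script")
echo "lines with CR before: $before, after: $after"
```

Note that mv replaces the file, so the executable bit has to be set again with chmod +x afterwards.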

        Now there is a second problem: Liferea says:
        There were errors while parsing this feed!
        Details
        Could not detect the type of this feed! Please check if the source really points to a resource provided in one of the supported syndication formats!

        XML Parser Output:
        The URL you want Liferea to subscribe to points to a webpage and the auto discovery found no feeds on this page. Maybe this webpage just does not support feed auto discovery.Could not determine the feed type.

        You may want to validate the feed using FeedValidator.

        The diff output, as you suggested — thanks! — is below (just the first lines; should be enough, I think):
        ander@vertigo:~/bin $ diff orig.txt modf.txt
        11,12c11,12
        < chmac: @photomatt That's a shame. After pushing the GPL so hard on #thesiswp I was optimistic. Can I do anything to help the process?
        < chmac: @photomatt That's a shame. After pushing the GPL so hard on #thesiswp I was optimistic. Can I do anything to help the process?
        ---
        > chmac: @photomatt That's a shame. After pushing the GPL so hard on <a href="http://search.twitter.com/search?q=%23thesiswp">#thesiswp</a> I was optimistic. Can I do anything to help the process?
        > chmac: <a href="http://twitter.com/photomatt">@photomatt</a> That's a shame. After pushing the GPL so hard on <a href="http://search.twitter.com/search?q=%23thesiswp">#thesiswp</a> I was optimistic. Can I do anything to help the process?
        21c21
        < chmac: @photomatt You see my last? http://twitter.com/chmac/statuses/19950395529 Any idea when email subscribe might be released?
        ---
        > chmac: <a href="http://twitter.com/photomatt">@photomatt</a> You see my last? <a href="http://twitter.com/chmac/statuses/19950395529">http://twitter.com/chmac/statuses/19950395529</a> Any idea when email subscribe might be released?
        39c39
        < chmac: @hostroute Having real difficulty becoming a customer. Slow support desk response, .ac registration doesn't work, UK / US billing issues. :(
        ---
        > chmac: <a href="http://twitter.com/hostroute">@hostroute</a> Having real difficulty becoming a customer. Slow support desk response, .ac registration doesn't work, UK / US billing issues. :(

        Thanks for the help!

        1. Ok, so it looks like the parsing is working correctly. I’m not sure what’s going wrong with Liferea otherwise. You could try using the same wget commands but with your actual feed url from Liferea. Then tell Liferea to get the feed from a local file, and see what happens. I played around like that until I had it all working.

        2. Thank you Callum! I’ll futz around a bit more and post here once I discover what went wrong…

      2. Ok, discovered the issue: many of my friends, and I myself, often tweet in Portuguese, which may contain special characters such as á à ó é í ú ç, etc. These characters are encoded as an ampersand plus a hash (&#), followed by a number and a semicolon, e.g. ç is &#231; and á is &#225;.

        The sed script replaces anything that starts with # with a link to Twitter’s search feature (which is cool). However, when these characters are present, this leaves two consecutive ampersands (&&), which seems to be illegal in RSS, so Liferea complains and nothing loads.

        So, as a quick fix, I’m removing that part of the script 🙁 which loses the search feature, but that’s a minor issue compared with the big plus of having the HTTP links working!

        Thanks a lot again!

        1. Aha, great, thanks for chasing down that bug. Ok, maybe there’s a simpler solution. I’ll only match a hashtag with a space before it, so anything#tag is left alone and only #tag triggers. I’ve updated the script here, can you test with your feeds and let me know if it works?
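To make the difference concrete, here is a rough sketch of the two rules side by side. These expressions are illustrative assumptions, not the ones in the published script; the function names link_naive and link_spaced are made up, and the search URL is the one visible in the diff output above.

```shell
#!/bin/sh
# Sketch only: these are not the published script's expressions.

# Naive rule: any # followed by word characters becomes a search link,
# so the "#225" inside the entity "&#225;" is linked as well.
link_naive() {
    sed -E 's,#([A-Za-z0-9_]+),<a href="http://search.twitter.com/search?q=%23\1">#\1</a>,g'
}

# Stricter rule: the # must start the line or follow whitespace.
link_spaced() {
    sed -E 's,(^|[[:space:]])#([A-Za-z0-9_]+),\1<a href="http://search.twitter.com/search?q=%23\2">#\2</a>,g'
}

sample='Ol&#225; from #thesiswp'
printf '%s\n' "$sample" | link_naive
printf '%s\n' "$sample" | link_spaced
```

The first line of output has the entity torn apart by a link, while the second leaves &#225; intact and links only #thesiswp.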

  2. Thank you! It works nicely now! Very simple and smart solution, indeed! There is a rare case in which the word to be searched itself contains a special character, e.g. #sagitário, as a friend has just tweeted, and then the link becomes #sagit. Whatever, these cases are rare and aren’t worth pursuing with a rule such as “replace all # except those preceded by &”. The vast majority are working perfectly. Your script is short, elegant and fast, all with just one sed line.

    Btw, you seem to have picked the Calluna font for your blog, which seems a wise choice, not only because it’s free to use, has some advanced OpenType features and is pretty, but also because it’s similar to your name. Very cool!

    Best,

    1. Hmm, good catch. Ok, I think a simple solution is to add &#; to the list of characters to be included in the hashtag. Just ran a quick test, it works with #sagitário. 🙂 Would you like to try the latest code and see if it works for you?

        1. Thanks to you for helping improve the script.

          It’s not quite perfect (yet!). It will think #behave; is a hashtag including the ;. I think most people usually end a hashtag with a space or a colon, maybe a full stop (period), so I think the script will work fine 99% of the time. Good enough for now. 🙂

          1. I see. Not sure how to add this to a single-line command, but maybe a search/replace could swap the semicolon for nothing, and then the remainder is replaced by the actual link, as it is now. If there is no semicolon, then it doesn’t change the results. Just thoughts…
            Anyway, it’s so rare that I think the script will work in virtually 100% of the cases.
            Nice!

            1. There’s probably an elegant option around allowing &#[0-9]*; to be part of the hashtag, but it’s more taxing than my brain can handle right now. 🙂
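One possible shape for that idea, as an untested-against-real-feeds sketch rather than the published script: treat a whole numeric entity (&#, digits, ;) as a single hashtag character, so a bare ; is never swallowed. The function name link_entities is hypothetical.

```shell
#!/bin/sh
# Sketch only: one way to let numeric entities be part of a hashtag.
# A hashtag character is either a word character or a complete
# "&#<digits>;" entity, so a lone ";" still ends the hashtag.
link_entities() {
    sed -E 's,(^|[[:space:]])#(([A-Za-z0-9_]|&#[0-9]+;)+),\1<a href="http://search.twitter.com/search?q=%23\2">#\2</a>,g'
}

printf '%s\n' 'new moon in #sagit&#225;rio today' | link_entities
printf '%s\n' 'please #behave; thanks' | link_entities
```

One caveat: the entity is copied into the href as-is rather than URL-encoded, so whether the search link resolves correctly depends on how the reader decodes it.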
