This small function receives text as input and returns HTML with links wherever the source text contains URLs (http://www… but also ftp://… and any other protocol), email addresses, Twitter usernames (with @ at the beginning) and Twitter hashtags (with # at the beginning).
These replacements are done with the PHP preg_replace function:
function parse_twitter($t) {
    // link URLs
    $t = " ".preg_replace(
        "/(([[:alnum:]]+:\/\/)|www\.)([^[:space:]]*)".
        "([[:alnum:]#?\/&=])/i",
        "<a href=\"\\1\\3\\4\" target=\"_blank\">".
        "\\1\\3\\4</a>", $t);

    // link mailtos
    $t = preg_replace(
        "/(([a-z0-9_]|\\-|\\.)+@([^[:space:]]*)".
        "([[:alnum:]-]))/i",
        "<a href=\"mailto:\\1\">\\1</a>", $t);

    // link twitter users
    $t = preg_replace(
        "/ +@([a-z0-9_]*) ?/i",
        " <a href=\"http://twitter.com/\\1\" target=\"_blank\">@\\1</a> ", $t);

    // link twitter arguments
    $t = preg_replace(
        "/ +#([a-z0-9_]*) ?/i",
        " <a href=\"http://twitter.com/search?q=%23\\1\" target=\"_blank\">#\\1</a> ", $t);

    // truncates long urls that can cause display problems (optional)
    $t = preg_replace("/>(([[:alnum:]]+:\/\/)|www\.)([^[:space:]]".
        "{30,40})([^[:space:]]*)([^[:space:]]{10,20})([[:alnum:]#?\/&=])".
        "</", ">\\3...\\5\\6<", $t);

    return trim($t);
}
Another great script, Giulio. I'm having a small “problem”, though, and regular expressions always confuse me, so I don’t know how to fix it.
I made a test page at http://blog.atgp.nl/parsetest.php so you can see what I mean. If I put two or more hash tags or twitter usernames right after each other the script seems to skip every second one.
Maybe it’s just a small issue?
Ah, never mind, I found a solution that works. Change
"/ +@([a-z0-9_]*) ?/i"
by removing the space before the ?, so that it becomes
"/ +@([a-z0-9_]*)?/i"
Now everything works correctly. I've removed the test page.
Mmm, are you sure? I haven’t tested it. That space is followed by a ?, which means the expression still matches when the space isn’t there. If you remove the space, the ? applies to the group instead, so the expression still matches when the block ([a-z0-9_]*) isn’t there. Mmm.
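To make the difference between the two patterns concrete, here is a minimal sketch. The helper link_users() and the sample usernames are hypothetical, invented for this illustration; only the two patterns come from the discussion above. In the original pattern the trailing " ?" consumes the space that separates two names, so the next name no longer has the leading space that " +" requires and is skipped.

```php
<?php
// Hypothetical helper for illustration: applies one user-linking pattern
// the same way parse_twitter() does (leading space added, result trimmed).
function link_users(string $pattern, string $text): string
{
    $replacement = " <a href=\"http://twitter.com/\\1\">@\\1</a> ";
    return trim(preg_replace($pattern, $replacement, " " . $text));
}

$original = "/ +@([a-z0-9_]*) ?/i"; // trailing " ?" eats the separating space
$modified = "/ +@([a-z0-9_]*)?/i";  // the change proposed in the comment

$text = "@alice @bob @carol";       // made-up sample input

// Count how many names each pattern turns into links.
echo substr_count(link_users($original, $text), "<a "), "\n"; // 2 (@bob is skipped)
echo substr_count(link_users($modified, $text), "<a "), "\n"; // 3
```

The ? after the group is harmless here: * already allows the group to match nothing, so the modified pattern should behave like "/ +@([a-z0-9_]*)/i".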
As I said: regular expressions are not my thing :-) so I don’t really understand what I’m doing – it’s mainly a case of trial and error.
But it seems to work with the change I made. I’ve put the test page back online so you can see for yourself:
http://blog.atgp.nl/parsetest.php
I don’t mind an alternative solution :-)
Ok, it works! That’s enough! :-)
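For anyone who would still like an alternative, one possible approach is a lookbehind, which checks the character before the @ without consuming it. This is only a sketch and not the code from the post: the function name link_twitter_users() is hypothetical, and it requires at least one character after the @ (+ instead of *).

```php
<?php
// Hypothetical alternative: (?<!\S) asserts that the character before "@"
// is whitespace or the start of the string, without consuming it, so
// names that follow each other are all matched.
function link_twitter_users(string $t): string
{
    return preg_replace(
        '/(?<!\S)@([a-z0-9_]+)/i',
        '<a href="http://twitter.com/$1" target="_blank">@$1</a>',
        $t
    );
}

echo link_twitter_users("@alice @bob @carol"), "\n";
```

Because the pattern no longer consumes surrounding spaces, the replacement does not need to add them back, and an address such as me@example.com is left alone since the @ is preceded by a non-space character.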