Posted on: 02/19/04 06:11pm
By: asmaloney
I noticed a problem with the "What's Related" box - it would only add links up to the first image. So I took a look at the function, made it more efficient, and fixed the problem.
The main difference in the match is that this one doesn't recognize any tags in between <a href=""> and </a>. So, for example, <a href="..."><b>foo</b></a> will not be matched. Maybe I'll futz around with it to handle this if anyone's interested.
function COM_extractLinks( $fulltext, $maxlength = 26 )
{
    $rel = array();

    // $matches[1] = opening <a> tag, [2] = link text, [3] = closing </a>
    preg_match_all( "/(<a href=[^>]+>)([^<]*)(<\/a>)/i", $fulltext, $matches );
    for ( $i = 0; $i < count( $matches[0] ); $i++ )
    {
        // if link is too long, shorten it and add ... at the end
        if ( ( $maxlength > 0 ) && ( strlen( $matches[2][$i] ) > $maxlength ) )
        {
            $matches[2][$i] = substr( $matches[2][$i], 0, $maxlength - 3 ) . '...';
            $matches[0][$i] = $matches[1][$i] . $matches[2][$i] . $matches[3][$i];
        }
        $rel[] = COM_checkHTML( $matches[0][$i] );
    }
    return $rel;
}
[Note I did not post this in HTML mode because it changed some of the code into smilies...]
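A minimal sketch of the limitation described in this post, for anyone who wants to see it in isolation. The pattern is the one from the function above with the parentheses the smilies swallowed put back (an assumption based on how $matches[1], $matches[2] and $matches[3] are used); the sample text and URLs are made up for illustration.

```php
<?php
// Pattern from the post, with the three capture groups closed again
// (assumed reconstruction; the forum turned some characters into smilies).
$pattern = "/(<a href=[^>]+>)([^<]*)(<\/a>)/i";

// Hypothetical sample: one plain link and one link wrapping a <b> tag.
$sample = '<a href="http://example.com/one">plain</a> and '
        . '<a href="http://example.com/two"><b>bold</b></a>';

$count = preg_match_all( $pattern, $sample, $matches );

// [^<]* cannot cross the "<" of <b>, so the second link is skipped.
echo $count, "\n";          // number of links found
print_r( $matches[2] );     // link texts that were captured
```

Only the plain link is picked up here; the link containing <b> is silently dropped, which is the trade-off mentioned above.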
A better COM_extractLinks
Posted on: 02/28/04 04:09pm
By: vinny
In case you're curious, this is what we went with:
function COM_extractLinks( $fulltext, $maxlength = 26 )
{
    $rel = array();

    // $matches[1] = opening <a> tag, [2] = href value,
    // [3] = link text (may contain tags), [4] = closing </a>
    preg_match_all( "/(<a.*?href=\"(.*?)\".*?>)(.*?)(<\/a>)/", $fulltext, $matches );
    for ( $i = 0; $i < count( $matches[0] ); $i++ )
    {
        $matches[3][$i] = strip_tags( $matches[3][$i] );
        // if nothing is left after stripping tags, use the URL as the text
        if ( !strlen( trim( $matches[3][$i] ) ) ) {
            $matches[3][$i] = strip_tags( $matches[2][$i] );
        }
        // if link is too long, shorten it and add ... at the end
        if ( ( $maxlength > 0 ) && ( strlen( $matches[3][$i] ) > $maxlength ) )
        {
            $matches[3][$i] = substr( $matches[3][$i], 0, $maxlength - 3 ) . '...';
        }
        $rel[] = $matches[1][$i] . $matches[3][$i] . $matches[4][$i];
    }
    return( $rel );
}
Or you can take a look at it in lib-common.php (without the smiley faces) at: lib-common.php[*1].
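A rough walk-through of what this version does with a link whose only content is a tag, assuming the inner quotes and the slash in the pattern are escaped as shown (the forum software may have dropped characters from the posted line). The <img> link and its URL are hypothetical, and this is only a sketch of the fallback and truncation steps, not the full function.

```php
<?php
// Pattern from the post; the escaping is an assumed reconstruction.
$pattern   = "/(<a.*?href=\"(.*?)\".*?>)(.*?)(<\/a>)/";
$maxlength = 26;

// Hypothetical link that wraps an image instead of text.
$sample = '<a href="http://example.com/gallery/full-size.png">'
        . '<img src="thumb.png"></a>';

preg_match_all( $pattern, $sample, $matches );

// Stripping tags leaves nothing, because the link text was only <img>.
$text = strip_tags( $matches[3][0] );
if ( !strlen( trim( $text ) ) ) {
    // Fall back to the href value so the link still has visible text.
    $text = strip_tags( $matches[2][0] );
}

// Shorten long text to $maxlength characters, ending in '...'.
if ( ( $maxlength > 0 ) && ( strlen( $text ) > $maxlength ) ) {
    $text = substr( $text, 0, $maxlength - 3 ) . '...';
}

$link = $matches[1][0] . $text . $matches[4][0];
echo $link, "\n";
```

The rebuilt link keeps the original opening tag, so the full URL is still in the href even though the visible text is cut to 26 characters.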
A better COM_extractLinks
Posted on: 02/28/04 04:36pm
By: Blaine
I'm just curious: is there any reason you are not using the [ code ] bb tags when posting code in the forum?
If it is not working well, I'd like to know.
A better COM_extractLinks
Posted on: 02/28/04 05:34pm
By: asmaloney
Vinny - thanks for posting that. Don't we want a case-insensitive match though?
Blaine - I did that because posting with the CODE tags turns parts of the code into smilies, e.g.:
preg_match_all( "/(<a href=[^>]+>([^<]*)(</a>/i", $fulltext, $matches );
A better COM_extractLinks
Posted on: 02/28/04 05:36pm
By: Blaine
Yeah, it appears that recent updates to GL have affected this feature.
A better COM_extractLinks
Posted on: 02/28/04 10:44pm
By: vinny
Blaine,
I put my code snippet in the code tags, but it put the smileys in there anyway.
Also, I'm sure you noticed but the QUOTE tags are acting funny as well.
-Vinny
A better COM_extractLinks
Posted on: 02/28/04 10:49pm
By: Blaine
[QUOTE BY= vinny] I put my code snippet in the code tags, but it put the smileys in there anyway.
Also, I'm sure you noticed but the QUOTE tags are acting funny as well.[/QUOTE]
With the geeklog.net upgrade, the allowable HTML was changed. I need the pre tags for the code block formatting. Dirk fixed it a few hours ago. Let's see if that fixed both the quotes and code formatting.
A better COM_extractLinks
Posted on: 02/28/04 11:17pm
By: vinny
I added the case-insensitive flag to the regex for COM_extractLinks. (Good catch, asmaloney.) It should show up in -rc2, and if not there then in the final release of 1.3.9.
-Vinny
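A small sketch of why the flag matters, assuming the same pattern as posted earlier in the thread with its escaping written out; the uppercase sample link is made up. Without the i modifier, markup written as <A HREF=...> is not matched at all.

```php
<?php
// Hypothetical link written with uppercase tag and attribute names.
$sample = '<A HREF="http://example.com/">Example</A>';

// Pattern as posted earlier in the thread (escaping assumed), no modifier.
$sensitive = preg_match_all(
    "/(<a.*?href=\"(.*?)\".*?>)(.*?)(<\/a>)/", $sample, $m1 );

// Same pattern with the case-insensitive modifier added.
$insensitive = preg_match_all(
    "/(<a.*?href=\"(.*?)\".*?>)(.*?)(<\/a>)/i", $sample, $m2 );

echo "without i: $sensitive match(es)\n";
echo "with i:    $insensitive match(es)\n";
```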
A better COM_extractLinks
Posted on: 07/19/04 10:05am
By: Anonymous
GL 1.3.9sr1 still has the problem that links are only added up to the first image. Perhaps someone could finally get this messy preg_match_all() sorted out?
A better COM_extractLinks
Posted on: 07/19/04 11:12am
By: vinny
I've just tested this in GL 1.3.9sr1 and it works, with the exception of a link like this:
<a href="link1">[image1]</a>
which just won't show up in the What's Related field, though links after this image still will. I'll work on this last little bug related to COM_extractLinks(). If you can demonstrate another bug, please post a URL so I can see it.
Thanks,
Vinny
A better COM_extractLinks
Posted on: 07/20/04 05:12am
By: asmaloney
I still have this problem too. I have a story with images [which themselves are links to unscaled versions] and none of the links on the page show up in What's Related.
Here's an example[*2] from my site.
A better COM_extractLinks
Posted on: 07/20/04 04:29pm
By: vinny
Your problem has nothing to do with images; you use single quotes instead of double quotes in your links, i.e.
<a href='link1'>link1</a>
--instead of--
<a href="link1">link1</a>
The HTML spec calls for the use of double quotes. I'll see about accepting both when 1.3.10 is released though.
-Vinny
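For what it's worth, one way a pattern could accept either quote style is a backreference to whichever quote opened the attribute. This is only a sketch under that assumption, not the code that ships in any Geeklog release; the pattern, the ~ delimiter and the sample links are all hypothetical.

```php
<?php
// Hypothetical pattern: group 1 = opening quote (" or '), group 2 = href
// value, group 3 = link text. \1 requires the same quote to close the value.
$pattern = '~<a\b[^>]*?href=(["\'])(.*?)\1[^>]*>(.*?)</a>~is';

// One link with single quotes, one with double quotes.
$sample = "<a href='link1'>first</a> --and-- " . '<a href="link2">second</a>';

$count = preg_match_all( $pattern, $sample, $matches );

print_r( $matches[2] );   // href values
print_r( $matches[3] );   // link texts
```

Using ~ as the delimiter avoids having to escape the slash in </a>, and the single-quoted PHP string keeps the backreference readable.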
A better COM_extractLinks
Posted on: 07/20/04 05:16pm
By: asmaloney
[QUOTE BY= vinny] Your problem has nothing to do with images; you use single quotes instead of double quotes in your links, i.e.
[/QUOTE]
Heh. That was one of the first things I checked. In Firefox, if you select some text on the page and use the context menu to 'View Selection Source', it shows double quotes even though the page source shows single quotes... I guess I out-Foxed myself.
[QUOTE BY= vinny]
The HTML spec calls for the use of double quotes. I'll see about accepting both when 1.3.10 is released though.
[/QUOTE]
Yet the W3C validator validates them alright.
Thanks for catching that for me.
A better COM_extractLinks
Posted on: 07/21/04 03:38pm
By: vinny
The next version of Geeklog (1.3.10) will have a COM_extractLinks that supports single quotes and also nested HTML tags (including images, i.e. [imageX]).
-Vinny