Highlighting search terms
An article at A List Apart from August 2004, “Enhance Usability by Highlighting Search Terms,” suggests that websites highlight the search terms found in the referrer information. For example, if someone Googles “widget” then clicks a resulting link to your site, you might want to highlight every instance of “widget” on your page. The script promoted by authors Brian Suda and Matt Riggott uses PHP. I’m more comfortable with Perl, and since I couldn’t find a robust, easy-to-use Perl version, I wrote my own.
Controversy
Some commenters on the article questioned the purpose and method of the PHP highlighting script. In my opinion, the principal objections are:
- Highlighting is an unnecessary distraction
- Highlighting should be done on the client side (with JavaScript) to save server resources
- The script breaks, for example when the greater-than symbol “>” appears in an image’s alt attribute
Highlighting is an unnecessary distraction
This is probably the strongest objection, and my response is pretty weak: I happen to like it. A better response is that you can add a brief explanation of the highlighting to the top of the page, along with a link to the same page. A visitor who clicks that link arrives with your own page as the referrer instead of the search results page, so the script normally finds no search terms and the highlighting disappears.
Highlighting should be done on the client side (with JavaScript) to save server resources
Another good objection. However, as the authors point out, this approach works only in JavaScript-enabled browsers. More importantly, JavaScript doesn’t have the thorough HTML parsing capability I’d like to see (although it might approach the regular expression ability of the article’s PHP script). This pertains to the next objection.
The script breaks, for example when the greater-than symbol “>” appears in an image’s alt attribute
The problem stems from how the PHP script finds the text to highlight: it relies on regular expressions instead of a real HTML parser. As the authors admit, “Implementing a full SGML/XML parser was well beyond the scope of our project.” I didn’t find that very satisfying. In fact, my early Perl port of their script did wacky things to the text within alt and title attributes. Here’s where Perl modules come in handy.
Perl HTML::Parser class
I’m not presuming to weigh in on the Perl vs. PHP dispute; my own opinion is that each has its value for different applications. The many Perl modules available for free download from CPAN make Perl a pleasure to use. The HTML::Parser module does all the heavy lifting for my script, and for that I’m thankful: I doubt my ability to come up with an efficient yet thorough regular expression to deal with HTML. I can let the minds behind HTML::Parser worry about that, and as circumstances change, I only have to upgrade the module. Perl modules are easy to install. If your web host doesn’t have HTML::Parser (it probably does) and won’t install it when you ask, you should switch hosts without a doubt; even the cheapest hosts offer this.
The script: a subroutine
I’ll discuss it more thoroughly at the bottom, but here’s what the script does in general: it is a subroutine that takes the HTML to be parsed as its sole argument and returns that HTML with any relevant search terms highlighted. For example, if your Perl variable $myhtml contains the page you want parsed, you could include the following line in your code, which replaces the contents of $myhtml with a copy in which the search terms are highlighted. I do something similar just before printing the HTML to the browser.
$myhtml = decide_texthighlight($myhtml);
#------------------------------------------
#determine if search terms should be highlighted
# script with explanations and updates at https://austinmatzko.com/blog/2005/02/19/perl-text-highlighting/
sub decide_texthighlight {
#argument: text to highlight if applicable
#uses HTML::Parser
#returns text with highlighting
#------------------------------------------
#--------------------------------
# Variables to set
#--------------------------------
my $highlightstarttag = '<span class="texthighlight">';
my $highlightendtag = '</span>';
# tags containing text that should not be highlighted
my @ignoretags = (
'title',
'script'
);
# A list of search query keys used by various search engines. You probably don't
# need to change these unless you want to add your own site's unique key.
# Google uses 'q' and Yahoo, 'p', for example.
my @querykeys = (
'q',
'p',
'ask',
'searchfor',
'key',
'query',
'search',
'keyword',
'keywords',
'qry',
'searchitem',
'kwd',
'recherche',
'search_text',
'search_term',
'term',
'terms',
'qq',
'qry_str',
'qu',
's',
'k',
't',
'va'
);
#-------------------------------------
# end variables you need to set
#-------------------------------------
my $content = $_[0];
my %form;
my $num_ignoretags = 0;
# look for search terms if there is a referrer and it contains '?'
if (defined $ENV{'HTTP_REFERER'} && $ENV{'HTTP_REFERER'} =~ m/\?/) {
my $buffer = $ENV{'HTTP_REFERER'};
#remove everything leading up to and including '?'
$buffer =~ s/^.*?\?//;
my @pairs = split(/&/, $buffer);
foreach my $pair (@pairs) {
# split at the first '=' only, and skip pairs that have no value
my ($name, $value) = split(/=/, $pair, 2);
next unless defined $value;
$value =~ tr/+/ /;
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
$form{$name} = $value;
}
my $searchtext;
foreach (@querykeys) {
if (exists $form{$_}) {
$searchtext = $form{$_};
}
}
#take 'and's and 'or's out of $searchtext
$searchtext =~ s/(?: and | or )/ /gi;
my @words = split /\W+/, $searchtext;
#-------------------------------------
# set up the package for parsing text
#-------------------------------------
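my $html;
# NOTE: the named subs below capture this sub's lexicals ($html, @words,
# $num_ignoretags, ...) from its first call only. That is fine for one call
# per process (plain CGI) but breaks on repeated calls, e.g. under mod_perl.
package HTMLStrip;
use base "HTML::Parser";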
sub start {
my ($self, $tag, $attr, $attrseq, $origtext) = @_;
# add in original start tags
$html .= $origtext;
# determine if the tag is one to ignore
$num_ignoretags = grep(/^$tag$/i, @ignoretags);
}
sub text {
my ($self, $text) = @_;
#if not within a tag to ignore
if ($num_ignoretags < 1) {
#replace all the search terms in the content with highlighted search terms
foreach (@words) {
#make sure the search 'word' isn't some garbage or blank space
if ($_ =~ m/\w/) {
$text =~ s/($_)/$highlightstarttag$1$highlightendtag/gi;
}
}
}
$html .= $text;
}
sub end {
my ($self, $tag, $origtext) = @_;
# add in original end tags
$html .= $origtext;
$num_ignoretags = 0;
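}
# HTML::Parser's version 2 method interface drops comments, declarations
# (such as the DOCTYPE) and processing instructions unless these put them back
sub comment {
my ($self, $comment) = @_;
$html .= "<!--$comment-->";
}
sub declaration {
my ($self, $decl) = @_;
$html .= "<!$decl>";
}
sub process {
my ($self, $token, $origtext) = @_;
$html .= $origtext;
}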
#invoke the package
my $p = new HTMLStrip;
$p->parse($content);
$p->eof;
$content = $html;
}
return $content;
} #end sub decide_texthighlight
Explanations
#--------------------------------
# Variables to set
#--------------------------------
my $highlightstarttag = '<span class="texthighlight">';
my $highlightendtag = '</span>';
The texthighlight class can then be styled in your stylesheet, for example:
.texthighlight {
color: black;
font-weight: bold;
background-color: #ffff66;
}
# tags containing text that should not be highlighted
my @ignoretags = (
'title',
'script'
);
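The start handler checks each tag against @ignoretags with a case-insensitive grep. If you extend the list, a hash lookup does the same check in one step. This is a minimal sketch; %ignore and is_ignored_tag are hypothetical names, and adding 'style' is an assumption (highlighting inside a stylesheet would corrupt the CSS), not something the script does.

```perl
use strict;
use warnings;

# Same list as the script, plus 'style' (an assumed addition).
my @ignoretags = ('title', 'script', 'style');

# Hypothetical lookup table: lowercase tag name => 1.
my %ignore = map { lc($_) => 1 } @ignoretags;

# Hypothetical helper: returns 1 if text inside $tag should not be highlighted.
sub is_ignored_tag {
    my ($tag) = @_;
    return exists $ignore{ lc $tag } ? 1 : 0;
}

print is_ignored_tag('SCRIPT'), "\n";   # 1
print is_ignored_tag('p'), "\n";        # 0
```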
my @querykeys = (
'q',
'p',
'ask',
'searchfor',
'key',
'query',
'search',
'keyword',
'keywords',
'qry',
'searchitem',
'kwd',
'recherche',
'search_text',
'search_term',
'term',
'terms',
'qq',
'qry_str',
'qu',
's',
'k',
't',
'va'
);
my $content = $_[0];
my %form;
my $num_ignoretags = 0;
if (defined $ENV{'HTTP_REFERER'} && $ENV{'HTTP_REFERER'} =~ m/\?/) {
my $buffer = $ENV{'HTTP_REFERER'};
#remove everything leading up to and including '?'
$buffer =~ s/^.*?\?//;
my @pairs = split(/&/, $buffer);
foreach my $pair (@pairs) {
# split at the first '=' only, and skip pairs that have no value
my ($name, $value) = split(/=/, $pair, 2);
next unless defined $value;
$value =~ tr/+/ /;
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
$form{$name} = $value;
}
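The referrer parsing in this excerpt can be pulled out into a standalone helper that is easy to test. This sketch keeps the script's decoding of “+” and %XX escapes and adds three assumed changes: each pair is split at the first “=” only, pairs without a value are skipped, and any #fragment is dropped. parse_referrer_query and the Google-style URL are illustrative, not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a referrer's query string into a hash, decoding
# '+' and %XX escapes the same way the script does.
sub parse_referrer_query {
    my ($referrer) = @_;
    my %form;
    return %form unless defined $referrer && $referrer =~ m/\?/;

    my $buffer = $referrer;
    $buffer =~ s/^.*?\?//;   # drop everything up to and including the first '?'
    $buffer =~ s/#.*//s;     # assumed addition: drop any #fragment

    foreach my $pair (split /&/, $buffer) {
        my ($name, $value) = split /=/, $pair, 2;   # split at the first '=' only
        next unless defined $value;                 # skip pairs with no value
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9]{2})/pack("C", hex($1))/eg;
        $form{$name} = $value;
    }
    return %form;
}

# Illustrative Google-style referrer
my %form = parse_referrer_query('http://www.google.com/search?hl=en&q=widget+sprocket');
print "$form{q}\n";   # widget sprocket
```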
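The substitution s/^.*?\?// strips everything up to and including the “?”, leaving only the query string, which is then split into name=value pairs and URL-decoded. So if someone Googles “widget”, the key q is now associated with the value “widget” in the %form hash.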
my $searchtext;
foreach (@querykeys) {
if (exists $form{$_}) {
$searchtext = $form{$_};
}
}
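Because this loop has no last, when a referrer carries more than one recognized key, $searchtext ends up holding the value of whichever key comes latest in @querykeys. With both q and s present, for instance, the script uses s. If you would rather have the keys near the top of the list win, here is a sketch of a first-match version; find_search_text is a hypothetical name and the key list is shortened.

```perl
use strict;
use warnings;

# Shortened from the script's @querykeys, same order.
my @querykeys = ('q', 'p', 'query', 's');

# Hypothetical helper: value of the first key in @querykeys present in the
# parsed query (a hash reference), or undef if none is present.
sub find_search_text {
    my ($form) = @_;
    foreach my $key (@querykeys) {
        return $form->{$key} if exists $form->{$key};
    }
    return;
}

my $searchtext = find_search_text({ hl => 'en', s => 'gadget', q => 'widget' });
print "$searchtext\n";   # widget
```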
$searchtext =~ s/(?: and | or )/ /gi;
my @words = split /\W+/, $searchtext;
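Splitting on \W+ leaves an empty first field whenever the search text begins with a non-word character such as a quotation mark, which is why the text handler later checks each word with m/\w/. A sketch of the two steps as one hypothetical helper, search_words, that discards the empty fields up front:

```perl
use strict;
use warnings;

# Hypothetical helper combining the script's two steps: drop ' and ' / ' or ',
# split on runs of non-word characters, then discard empty fields.
sub search_words {
    my ($searchtext) = @_;
    return () unless defined $searchtext;
    $searchtext =~ s/(?: and | or )/ /gi;
    return grep { /\w/ } split /\W+/, $searchtext;
}

print join('|', search_words('"blue widgets" OR gadgets')), "\n";   # blue|widgets|gadgets
```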
my $html;
package HTMLStrip;
use base "HTML::Parser";
sub start {
my ($self, $tag, $attr, $attrseq, $origtext) = @_;
# add in original start tags
$html .= $origtext;
# determine if the tag is one to ignore
$num_ignoretags = grep(/^$tag$/i, @ignoretags);
}
sub text {
my ($self, $text) = @_;
#if not within a tag to ignore
if ($num_ignoretags < 1) {
#replace all the search terms in the content with highlighted search terms
foreach (@words) {
#make sure the search 'word' isn't some garbage or blank space
if ($_ =~ m/\w/) {
$text =~ s/($_)/$highlightstarttag$1$highlightendtag/gi;
}
}
}
$html .= $text;
}
For example, if the search term is “widget”, the text
Oh, how I love widgets!
would become
Oh, how I love <span class="texthighlight">widget</span>s!
Then we add the text to our $html variable.
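Substituting one word at a time has a catch: a later word can match inside markup inserted for an earlier one (a search for “love span” would find “span” inside the tag just added around “love”). A sketch using one combined, quotemeta-escaped pattern avoids that; highlight_text is a hypothetical name. Like the script, it works on raw text, so a search for a word such as “amp” could still split an entity like &amp;.

```perl
use strict;
use warnings;

my $highlightstarttag = '<span class="texthighlight">';
my $highlightendtag   = '</span>';

# Hypothetical helper: wrap every case-insensitive occurrence of any search
# word in $text. A single pass means no word can match inside a span that
# was inserted for another word.
sub highlight_text {
    my ($text, @words) = @_;
    my @usable = grep { /\w/ } @words;
    return $text unless @usable;
    # longest words first, so 'widgets' wins over 'widget' where both apply
    my $pattern = join '|', map { quotemeta } sort { length($b) <=> length($a) } @usable;
    $text =~ s/($pattern)/$highlightstarttag$1$highlightendtag/gi;
    return $text;
}

print highlight_text('Oh, how I love widgets!', 'widget'), "\n";
# Oh, how I love <span class="texthighlight">widget</span>s!
```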
sub end {
my ($self, $tag, $origtext) = @_;
# add in original end tags
$html .= $origtext;
$num_ignoretags = 0;
}
#invoke the package
my $p = new HTMLStrip;
$p->parse($content);
$p->eof;
$content = $html;
}
return $content;
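The script subclasses HTML::Parser through its older method interface, with the handlers written as named subs inside decide_texthighlight. As an alternative sketch, HTML::Parser's version 3 handler interface lets the handlers be closures, so every call gets fresh state, and a default handler passes comments and the DOCTYPE through untouched. highlight_html is a hypothetical name; it assumes the search words have already been extracted from the referrer and are passed in as an array reference.

```perl
use strict;
use warnings;
use HTML::Parser;

# Sketch of the same pass using HTML::Parser's version 3 handler interface.
# highlight_html is hypothetical; $words is an array ref of search words.
sub highlight_html {
    my ($content, $words) = @_;
    my %ignore = map { $_ => 1 } qw(title script);   # tag names arrive lowercased
    my @usable = grep { /\w/ } @$words;
    return $content unless @usable;
    my $pattern = join '|', map { quotemeta } @usable;

    my $html    = '';   # output is rebuilt here, as in the script
    my $ignored = 0;    # true while inside an ignored tag
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $orig) = @_;
            $html .= $orig;
            $ignored = $ignore{$tag} ? 1 : 0;
        }, 'tagname, text' ],
        end_h => [ sub {
            $html .= $_[0];
            $ignored = 0;
        }, 'text' ],
        text_h => [ sub {
            my ($text) = @_;
            $text =~ s{($pattern)}{<span class="texthighlight">$1</span>}gi unless $ignored;
            $html .= $text;
        }, 'text' ],
        # comments, the DOCTYPE and other events without a handler pass through
        default_h => [ sub { $html .= $_[0] if defined $_[0]; }, 'text' ],
    );
    $p->parse($content);
    $p->eof;
    return $html;
}

my $page = '<!DOCTYPE html><title>widget</title><p><img alt="a > b" src="x.png">My widgets</p>';
print highlight_html($page, ['widget']), "\n";
```

Note that the “>” inside the alt attribute is handled by the parser, which is exactly the case the regular-expression approach struggles with.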