
View Full Version : Tutorial [PHP] : How to make a Word Filter and Smiley Parser



Mentor
13-05-2009, 08:41 PM
Intro: Iszak recently commented to me on how old and out of date most of the tutorials I have posted around are, and he had quite a point. The majority were written a very long time ago and include some shameful abuse of the PHP language. To fix this I've decided to start rewriting a few of the old ones to bring them a little more up to date.
This tutorial is a rewrite of: http://thybag.co.uk/?p=Tutorials&ind=40. In it I aim to show a much quicker and more efficient way of providing the same functionality.

------------------------------------------------
Introduction
Hello, in this tutorial I hope to show how to write a simple but effective bad word filter and smiley parser in PHP 5. So before we begin, it's probably a good idea to clarify what exactly a smiley parser or a bad word filter is.

A smiley parser is a function which, when given a line of text, will replace smiley or emoticon symbols (such as :) :( :D and all the rest of them) with representative images. This is useful, for example, if you want to show smileys within a guestbook, forum post, or comment on your site.

A badword filter, on the other hand, is much more self-explanatory: its main use is simply to remove bad or objectionable language from some text. This is again useful for handling comments, posts or guestbook entries.

To do this we are going to create a class called SimpleParser, which is used by calling two of its methods: parseText and unParseText.

Functions at work

Before we create a class, we should probably understand the basic principles the smiley parser and badword filter will work upon. The main function at work here is str_replace.


<?php
//Will print out "I'm going to eat some bacon"
echo str_replace('tomato','bacon',"I'm going to eat some tomato");
?>

str_replace is a simple string replace function. It finds every instance of the first value (tomato in this case) within the string provided as the third value and replaces it with the second value (bacon).

A slight variation on this function is str_ireplace which does exactly the same thing but is case insensitive.

An important ability of str_replace and str_ireplace is that instead of a string (such as bacon or tomato) they can be passed an array as well. This means we can pass an entire list of replacements to a single str_replace function.
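As a quick sketch of the points above (the sandwich strings and variable names are made up for illustration and are not from the original tutorial), this shows a single replacement applied to an array of search words, the case-insensitive variant, and two arrays paired up by position:

```php
<?php
// One replacement string is applied to every entry in the search array.
$plain = str_replace(array('tomato', 'lettuce'), 'bacon', 'A tomato and lettuce sandwich');

// str_ireplace ignores case, so TOMATO and Tomato are both matched.
$mixed = str_ireplace('tomato', 'bacon', 'I said TOMATO, not Tomato');

// Two arrays: each search entry is paired with the replacement at the same position.
$paired = str_replace(
    array('tomato', 'lettuce'),
    array('bacon', 'sausage'),
    'A tomato and lettuce sandwich'
);

echo $plain, "\n";  // A bacon and bacon sandwich
echo $mixed, "\n";  // I said bacon, not bacon
echo $paired, "\n"; // A bacon and sausage sandwich
```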

Basic badword filter

The usefulness of being able to pass an array can be seen in this basic badword filter:

<?php
function parseBadWords($text){
//List of words to replace
$badWordList = array('orange','apple','carrot','grape','pea');
//replace each word above with the word bacon
return str_replace($badWordList,'bacon',$text);
}
echo parseBadWords("i like to eat lots of applepie, but my favourite thing to do is simply to chew on raw carrot");
?>

If you run the example you'll see the result is
"i like to eat lots of baconpie, but my favourite thing to do is simply to chew on raw bacon"
showing that each word in the badword list that was found in the text has been replaced with the good word bacon. Note that the match is on substrings, not whole words, which is why applepie became baconpie. Rather than just replacing fruit and veg with bacon, this could easily be changed to replace swearwords with some stars. There's no limit to how many words can be added to the badword array.
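For the "replace with stars" idea, one possible sketch is below. The function name starOutBadWords and its second parameter are hypothetical and not part of the tutorial's class. It builds a star string of matching length for each bad word and uses str_ireplace so capitalised words are caught too:

```php
<?php
// Hypothetical helper: masks each listed word with a run of stars of the same length.
function starOutBadWords($text, array $badWordList){
    $stars = array();
    foreach ($badWordList as $word) {
        // e.g. 'apple' (5 letters) becomes '*****'
        $stars[] = str_repeat('*', strlen($word));
    }
    // Search words and star strings are paired up by position.
    return str_ireplace($badWordList, $stars, $text);
}

$masked = starOutBadWords('Raw Carrot and applepie', array('apple', 'carrot'));
echo $masked; // Raw ****** and *****pie
```

As with the bacon version, this matches substrings, so harmless words that happen to contain a listed word will be starred as well.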

Basic smiley parser
We've now seen that we can feed an array directly into the str_replace function and have it perform a replacement for each word within it. In the smiley parser we can make even better use of this functionality by using two arrays at once: the first being the smiley symbols we want to replace, and the second being the list of replacements.

<?php
function parseSmileys($text){
//List of smileys to replace, and the image code for each
$smileys = array(':)',':(',':p',':D');
$replacements = array('<img src="smileyfolder/smile.gif" alt=":)" />','<img src="smileyfolder/sad.gif" alt=":(" />','<img src="smileyfolder/heh.gif" alt=":p" />','<img src="smileyfolder/biggrin.gif" alt=":D" />');

return str_ireplace($smileys,$replacements,$text);
}
echo parseSmileys("Happy :) Very Happy :D sad :( ... :p ");
?>
The resulting output from the above function will contain the HTML image tags instead of the two characters making up each smiley symbol. Additionally, str_ireplace was used here instead of a simple str_replace. This is simply because it doesn't really matter whether a user typed :P or :p, so we may as well allow both to work.

This may work, but it's hardly perfect; having two arrays is, after all, a little bit messy. The solution to this is simply to combine all the values into one array.

<?php
function parseSmileys($text){
//List of smileys to replace
$smileyList = array(
//'smileySymbol' => 'smiley image code';
':)' => '<img src="smileyfolder/smile.gif" alt=":)" />',
':(' => '<img src="smileyfolder/sad.gif" alt=":(" />',
':D' => '<img src="smileyfolder/biggrin.gif" alt=":D" />',
':p' => '<img src="smileyfolder/heh.gif" alt=":p" />',
);
return str_ireplace(array_keys($smileyList),$smileyList,$text);
}
echo parseSmileys("Happy :) Very Happy :D sad :( ... :p ");
?>
You'll see the above code produces the same output as before, meaning the only real change here is under the hood.
Rather than using two arrays, the above code simply uses one, with the array key being the smiley symbol and the array value being the code to replace it with. array_keys() is then used on the smiley array to return an array of just the keys (the smiley symbols in this example), which is passed as the search list, while the array itself supplies the replacements.
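To make it clearer what array_keys() hands to the replace call, here is a small sketch. The bracketed placeholders such as [smile] are stand-ins used only to keep the output short; they are not what the tutorial's code produces:

```php
<?php
$smileyList = array(
    ':)' => '[smile]',
    ':(' => '[sad]',
);

// The keys become the search list...
$symbols = array_keys($smileyList);   // array(':)', ':(')
// ...and the values are used, in the same order, as the replacements.
$images  = array_values($smileyList); // array('[smile]', '[sad]')

$result = str_replace($symbols, $smileyList, 'up :) down :(');
echo $result; // up [smile] down [sad]
```

Passing $smileyList itself as the replacement list works because str_replace only looks at the values in order, so array_values() is shown here purely for illustration.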


Going Object Oriented
Now that we have all the basic functions worked out, we can finally start putting them together to create our SimpleParser class. The result is this.

<?php
class SimpleParser
{

## First off, the data this class needs in order to work.

//List of Smileys
private $smileyList = array(
//'smileySymbol' => 'smiley image code';
':)' => '<img src="smileyfolder/smile.gif" alt=":)" />',
':(' => '<img src="smileyfolder/sad.gif" alt=":(" />',
':D' => '<img src="smileyfolder/biggrin.gif" alt=":D" />',
';)' => '<img src="smileyfolder/wink.gif" alt=";)" />',
':o' => '<img src="smileyfolder/ohmy.gif" alt=":o" />',
':p' => '<img src="smileyfolder/heh.gif" alt=":p" />',
':/' => '<img src="smileyfolder/hmm.gif" alt=":/" />',
':[' => '<img src="smileyfolder/ouoh.gif" alt=":[" />',
);
//Bad word list
private $badWordList = array('orange','apple','carrot','grape','pea');
//Word to Replace with
private $goodWord = 'bacon';

## Main Functions to interact with class
public function parseText($text,$smileys=1,$badwords=1){
$text = str_replace('://','#link#',$text);
//run
if($smileys==1){
$text = $this->parseSmiley($text);
}
if($badwords==1){
$text = $this->parseBadWords($text);
}
//fix
return str_replace('#link#','://',$text);
}
//get back original string (smileys only)
public function unParseText($text){
return $this->unParseSmiley($text);
}

## Functions to perform actions

private function parseSmiley($text){
return str_ireplace(array_keys($this->smileyList),$this->smileyList,$text);
}
private function unParseSmiley($text){
return str_replace($this->smileyList,array_keys($this->smileyList),$text);
}
private function parseBadWords($text){
return str_replace($this->badWordList,$this->goodWord,$text);
}
}
?>

To test that it all works, we then need to create a new instance of this parser and call its methods. (Note that the file you run this in either needs to "include" the file the SimpleParser class is in, or have the SimpleParser class within it.)

<?php
//Create an instance of simpleParser
$parser = new SimpleParser();
//Parse the text and store
$text = "Hello appleface :D This code should test the above example for removing bad words such as apple and carrot as well as generating smileys like :p :( and :/, cool huh? http://thybag.co.uk";
//Run tests
echo "parse everything:<br/>";
$example = $parser->parseText($text);
echo $example;
echo "<br/>Just badwords:<br/>";
echo $parser->parseText($text,0,1);
echo "<br/>Just smileys:<br/>";
echo $parser->parseText($text,1,0);
echo '<br/>And finally turn the smileys back to text<br/>';
echo $parser->unParseText($example);//Note this will not unfilter bad words
?>

Now, in addition to having the interactions go through the parseText and unParseText methods, the more observant among you may have noticed a few other significant changes.

The parseText function for example

<?php
public function parseText($text,$smileys=1,$badwords=1){
$text = str_replace('://','#link#',$text);
//run
if($smileys==1){
$text = $this->parseSmiley($text);
}
if($badwords==1){
$text = $this->parseBadWords($text);
}
//fix
return str_replace('#link#','://',$text);
}
?>
contains two additional string replacements. Why? Well, let's say we want our smiley parser to work with the smiley :/. A simple string replace sounds fine, right? But what if we had a link in the text? An image being generated in the middle of a URL is not the result people are going to want. To avoid this, the function makes use of a "slightly hacky" workaround: the first replacement swaps out every :// for the text #link#, then, once all the smileys have been added, the last one swaps #link# back to the original ://, thus avoiding it getting partly replaced by a smiley. One side effect is that any literal #link# a user types will also come back as ://.

It should be noted that there are much better ways of getting around this. Regex (preg_replace) for one would make this easy to solve, but for the purpose of this tutorial I wanted to avoid that and keep to just string replacement.
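For anyone curious what the regex route might look like, here is a rough sketch for the :/ smiley only. The function name parseHmmSmiley is hypothetical and this is not how the class above works. It uses a negative lookahead so that :/ is only matched when it is not followed by a second slash:

```php
<?php
// Hypothetical function: replaces ":/" with its image, but leaves "://" alone.
function parseHmmSmiley($text){
    $image = '<img src="smileyfolder/hmm.gif" alt=":/" />';
    // (?!/) is a negative lookahead: the match fails if the next character is "/".
    return preg_replace('#:/(?!/)#', $image, $text);
}

echo parseHmmSmiley('hmm :/ http://thybag.co.uk');
```

This only protects the "://" part of a link, the same as the #link# trick does; it makes no attempt to recognise URLs in general.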
Another minor modification is the use of the two extra parameters on the parseText function. By setting these in the function call, it's possible to just parse smileys or just filter badwords, rather than doing both. The "$smileys=1" in the parameter list just means that if a parameter isn't provided, it should use 1 as its value.
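The default values can be seen in isolation with a small stand-in function. describeParse is a made-up name for illustration; it does no parsing and simply reports which steps would run:

```php
<?php
// Hypothetical stand-in for parseText: returns the names of the steps that would run.
function describeParse($smileys = 1, $badwords = 1){
    $steps = array();
    if ($smileys == 1) {
        $steps[] = 'smileys';
    }
    if ($badwords == 1) {
        $steps[] = 'badwords';
    }
    return implode('+', $steps);
}

echo describeParse(), "\n";     // smileys+badwords (both defaults used)
echo describeParse(0), "\n";    // badwords ($badwords falls back to 1)
echo describeParse(1, 0), "\n"; // smileys
```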

An unParseText method has also been added. This seemed like it may as well be put in because it needed so little extra code. Because the smiley array is now separate from the function and is instead part of the class, it's quite easy for another function in the class to make use of its data.

The unParseText function works simply by doing the opposite of what the parseSmiley function does: it takes the image code as the values to search for and uses the keys as the values to replace them with.
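Here is the round trip in miniature. TinyParser is a cut-down, hypothetical class with just two smileys, written only to show the keys and values swapping roles:

```php
<?php
// Cut-down illustration class, not the tutorial's SimpleParser.
class TinyParser
{
    private $smileyList = array(
        ':)' => '<img src="smileyfolder/smile.gif" alt=":)" />',
        ':(' => '<img src="smileyfolder/sad.gif" alt=":(" />',
    );

    // Symbols (keys) are searched for, image code (values) replaces them.
    public function parse($text){
        return str_replace(array_keys($this->smileyList), $this->smileyList, $text);
    }

    // The reverse: image code is searched for, symbols replace it.
    public function unParse($text){
        return str_replace($this->smileyList, array_keys($this->smileyList), $text);
    }
}

$parser   = new TinyParser();
$original = 'good :) bad :(';
$parsed   = $parser->parse($original);
$restored = $parser->unParse($parsed);

echo $restored; // good :) bad :(
```

One thing to be aware of with the full class: because its parseSmiley is case insensitive, a :P typed by the user comes back from unParseText as :p, since that is the key stored in the array.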

Since the data is within a class, public and private visibility has been added to the functions and variables. Private simply means that only functions within the class can use, change or work with the function or value. Public, on the other hand, means everything can access the function, both inside and outside the class.

This concludes the "How to make a Word Filter and Smiley Parser" tutorial. I hope this was informative and helpful to at least some of you. If you have any questions, feel free to ask.
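A small sketch of the difference, using a made-up class called Visibility. is_callable() is used to check access from outside the class, because actually calling a private method from outside would stop the script with an error:

```php
<?php
// Hypothetical example class, unrelated to SimpleParser.
class Visibility
{
    private $secret = 'hidden';

    // Public: can be called from anywhere.
    public function reveal(){
        // Inside the class, private members are freely usable.
        return $this->shout($this->secret);
    }

    // Private: only code inside this class can call it.
    private function shout($text){
        return strtoupper($text);
    }
}

$demo = new Visibility();
echo $demo->reveal(), "\n"; // HIDDEN

// From out here, the public method is callable and the private one is not.
$canCallReveal = is_callable(array($demo, 'reveal'));
$canCallShout  = is_callable(array($demo, 'shout'));
```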


-----------------------------------------------
This is the first draft of the tutorial, so please point out any errors, mistakes or just chunks of pointless waffle if you see them. Any and all feedback is welcome and will hopefully go towards improving both this and other future tutorials.
Mentor

Meti
14-05-2009, 11:54 AM
I suck at PHP, and don't know whether this is good or not, but I guess it is :P
Good job.

Jam-ez
14-05-2009, 03:05 PM
Well laid out, clean and beautiful. I can't see anything wrong with any of the code and you've commented it out really well!

+repadar. :)

Source
14-05-2009, 03:25 PM
I only skimmed over it, as I get bored of reading :P However looks like a really great tutorial to help people out.

I did notice one thing; on this code:


<?php
//Create an instance of simpleParser
$parser = new SimpleParser();
//Parse the text and store
$text = "Hello appleface :D This code should test the above example for removing bad words such as apple and carrot as well as generating smileys like :p :( and :/, cool huh? http://thybag.co.uk";
//Run tests
echo "parse everything:<br/>";
$example = $parser->parseText($text);
echo $example;
echo "<br/>Just badwords:<br/>";
echo $parser->parseText($text,0,1);
echo "<br/>Just smileys:<br/>";
echo $parser->parseText($text,1,0);
echo '<br/>And finally turn the smileys back to text<br/>';
echo $parser->unParseText($example);//Note this will not unfilter bad words
?>
surely you need to require/include the class file before initiating an instance of it; like so:


<?php
//require the class
require_once "path/to/class.php";

//Create an instance of simpleParser
$parser = new SimpleParser();

//Parse the text and store
$text = "Hello appleface :D This code should test the above example for removing bad words such as apple and carrot as well as generating smileys like :p :( and :/, cool huh? http://thybag.co.uk";

//Run tests
echo "parse everything:<br/>";
$example = $parser->parseText($text);
echo $example;

echo "<br/>Just badwords:<br/>";
echo $parser->parseText($text,0,1);

echo "<br/>Just smileys:<br/>";
echo $parser->parseText($text,1,0);

echo '<br/>And finally turn the smileys back to text<br/>';
echo $parser->unParseText($example);//Note this will not unfilter bad words

?>

EDIT: Woops, just saw the text related to that issue. Damn my laziness.

Mentor
14-05-2009, 04:07 PM
Well, you do still have a point. I think I'll change it to include saving the class and then including it in the test, since that just makes it that bit more obvious.
Pity Habbox doesn't let you edit posts within a reasonable time frame anymore <this gripe's a good few years past its heyday, but meh>



@everyone: Thanks for the great feedback :)

Dentafrice
16-05-2009, 01:00 PM
Great tutorial :) Gave it a read over this evening. Nice to see some quality posts again.

Nice idea on the #link# concept too, for an easy fix :) +REP.

Mentor
18-05-2009, 01:57 PM
Thanks. Anyone got a preference on which thybag tutorial they'd like me to have a go at redoing next, or a new tutorial that'd be of use?

VirtualG
28-06-2009, 08:34 AM
intresting... Good-ish tut

Dentafrice
28-06-2009, 12:55 PM
intresting... Good-ish tut
What was the point in bumping this thread, and you consider his tutorial, "good-ish".. I would love to see you do better, or even attempt to have the knowledge Mentor has.

Jam-ez
28-06-2009, 01:45 PM
What was the point in bumping this thread, and you consider his tutorial, "good-ish".. I would love to see you do better, or even attempt to have the knowledge Mentor has.

The fact that he posted in the top six tutorials makes this even worse. Oh by the way, I made my own class taking reference from this guide, thanks a lot Mentor. :)

Chippiewill
06-07-2009, 06:18 PM
What was the point in bumping this thread, and you consider his tutorial, "good-ish".. I would love to see you do better, or even attempt to have the knowledge Mentor has.

It be hardly bumping :P. Wow, a nice tutorial Mentor :D
