&#$*@%$# Profanity Filtering
Our online kids' game has recently started to pick up a good number of users, and yesterday we got our first email from a mother complaining about profanity in the in-game chat. I feel like this is a milestone of some sort, but now we need to do something about it.
An obvious first pass would store a dictionary of dirty words in a hashtable and strike out matching words in the chat text (possibly a word at a time, or possibly using a "sliding window"). However, this makes it easy to get around the filter by adding spurious letters, etc.
Recognizing that there is no way to analyze chat strings for meaning, what is the best approach to solving this problem?
Shedletsky's Bits: A Blog | ROBLOX | Twitter
Time held me green and dying
Though I sang in my chains like the sea...
The best profanity excluders I have seen appear to use the lookup table method, only they also put twisted spellings of the common words in the table, like:
phuck
$hit
etc
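The lookup-table approach above (a hash set of words plus their twisted spellings, checked one word at a time) can be sketched roughly like this. It is only an illustration under assumptions: the `BLOCKLIST` entries are mild placeholders standing in for real profanity, and the tokenizing regex and the `censor_words` name are hypothetical choices, not anything from this thread.

```python
import re

# Hypothetical blocklist: base words plus "twisted" spellings of them.
# Mild placeholder words stand in for the real entries.
BLOCKLIST = {"darn", "d4rn", "d@rn", "heck", "h3ck", "hekk"}

def censor_words(text: str) -> str:
    """Mask every token whose lowercase form is in BLOCKLIST."""
    def mask(match: re.Match) -> str:
        word = match.group(0)
        return "*" * len(word) if word.lower() in BLOCKLIST else word
    # A token is a run of word characters plus "$" and "@", so look-alike
    # spellings such as "d@rn" stay in one piece; "," or "!" ends a token.
    return re.sub(r"[\w$@]+", mask, text)
```

`censor_words("Oh darn, that H3CK of a level!")` masks both words, but `"d a r n"` or `"daaarn"` slip straight through, which is exactly the weakness described in the original question.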
Dave
The best thing you can do is probably run some regular expressions over the chat to clean up profanity and common letter substitutions, "[s5\$]h[i1!]t" for example.
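One way to generalize a pattern like "[s5\$]h[i1!]t" is to generate it from a plain word and a table of look-alike characters. This is a sketch under assumptions: the `LOOKALIKES` table, the placeholder words, and the choice to allow repeated letters and require non-alphanumeric neighbours are all illustrative, not a tested production filter.

```python
import re

# Hypothetical look-alike table: each letter maps to a character class.
LOOKALIKES = {
    "a": "[a4@]",
    "e": "[e3]",
    "i": "[i1!l|]",
    "o": "[o0]",
    "s": "[s5$]",
    "t": "[t7+]",
}

def build_pattern(word: str) -> re.Pattern:
    # Each letter becomes its look-alike class (or itself, escaped), and "+"
    # also catches stretched spellings such as "heeeck".
    parts = [f"(?:{LOOKALIKES.get(ch, re.escape(ch))})+" for ch in word.lower()]
    # Lookarounds instead of \b, because \b misbehaves next to "$" or "!".
    return re.compile(r"(?<![A-Za-z0-9])" + "".join(parts) + r"(?![A-Za-z0-9])",
                      re.IGNORECASE)

# Mild placeholders stand in for a real word list.
PATTERNS = [build_pattern(w) for w in ("darn", "heck")]

def censor(text: str) -> str:
    for pattern in PATTERNS:
        text = pattern.sub(lambda m: "*" * len(m.group(0)), text)
    return text
```

For example, `censor("what the h3ck, d@rn it, heeeck")` should give `"what the ****, **** it, ******"`, while a longer word such as "heckle" is left alone because of the lookahead.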
If you plan on implementing any form of censorship, be sure to keep it client-side and optional, of course.
Ra
You might be able to check the Levenshtein distance between a word entered and the words in your table. Of course, you would also have to try combining small words, so people can't type "s h i t", "shi t", etc. You also need to make sure normal words won't get filtered (maybe you should also have an always-allowed table for words that are too close to "forbidden" words; "hit", for instance, should be in that one so it won't be mistaken for "shit" by the algorithm).
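A minimal sketch of that idea, assuming a maximum edit distance of 1, placeholder blocked words, and a hand-made allow list (all hypothetical). It also glues adjacent tokens and runs of single letters back together before checking, as suggested above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions turning a into b (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

# Hypothetical word lists; placeholders stand in for real profanity.
BLOCKLIST = {"darn", "heck"}
# Real words within distance 1 of a blocked word (the "hit" problem above).
ALLOWLIST = {"dare", "dark", "dart", "barn", "earn", "warn", "yarn",
             "deck", "neck", "peck", "check", "hack"}

def is_suspicious(word: str, max_distance: int = 1) -> bool:
    w = word.lower()
    if w in ALLOWLIST:
        return False
    return any(levenshtein(w, bad) <= max_distance for bad in BLOCKLIST)

def flag_message(text: str, max_distance: int = 1) -> list:
    tokens = text.split()
    # Candidates: each token, each adjacent pair glued together ("hec k"),
    # and each run of single letters glued together ("d a r n").
    candidates = tokens + [a + b for a, b in zip(tokens, tokens[1:])]
    run = []
    for tok in tokens + [""]:  # "" is a sentinel that flushes the last run
        if len(tok) == 1:
            run.append(tok)
        else:
            if len(run) > 1:
                candidates.append("".join(run))
            run = []
    return [c for c in candidates if is_suspicious(c, max_distance)]
```

`flag_message("d a r n")` returns `["darn"]` and `flag_message("nice deck")` returns `[]`. The allow list will never be complete, so in practice it has to grow as false positives are reported.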
I'd go with the simple table route as well. Once you throw in foreign languages, evolving slang, and the ability to type words in creative ways, it becomes a massive problem to take on.
An additional possibility is to give people a way to 'suggest' new words to add to the filter. You could then review the list and add words as you see fit (assuming you can put out patches, etc.). In addition to filtering blatant profanity, this would put the burden of making the filter more comprehensive on the users. It would also give frustrated players an outlet to combat the problem.
Be careful what you filter, too, because I don't want to be going to a MrWinkytail party. You might need to make a whitelist of words that aren't dirty but contain dirty words :P
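A rough sketch of a substring filter with such a whitelist, assuming placeholder blocked substrings and a hypothetical `ALLOWED_WORDS` set. Any word on the whitelist passes untouched; every other word has blocked substrings masked wherever they occur.

```python
import re

# Hypothetical lists; mild placeholders stand in for real profanity.
BLOCKED_SUBSTRINGS = ["heck", "darn"]
ALLOWED_WORDS = {"check", "checks", "checked", "checker", "checkers", "checkpoint"}

def censor_substrings(text: str) -> str:
    """Mask blocked substrings inside any word that is not whitelisted."""
    def fix(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() in ALLOWED_WORDS:
            return word  # innocent word that merely contains a blocked substring
        for bad in BLOCKED_SUBSTRINGS:
            word = re.sub(re.escape(bad), "*" * len(bad), word, flags=re.IGNORECASE)
        return word
    return re.sub(r"[A-Za-z]+", fix, text)
```

`censor_substrings("Play checkers, not heckin darnit")` gives `"Play checkers, not ****in ****it"`, but `"rechecked"` still comes out as `"rec****ed"` because nobody whitelisted it, which shows how long such a whitelist has to get.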
AMP Minibowling - Free asynchronous multiplayer mobile minigolf+bowling
Twitter: eedok
Make sure that you allow Scunthorpe (town in the UK), unlike some email filters.
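One common way to let Scunthorpe-style names through is to match blocked entries only as whole words rather than as substrings. A small comparison, assuming a placeholder blocklist and a made-up town name ("Checkerton") in the Scunthorpe mold:

```python
import re

BLOCKED = ["heck"]  # placeholder for a real blocklist

def substring_hits(text: str) -> list:
    """Naive check: flags a blocked entry anywhere, even inside a longer word."""
    return [b for b in BLOCKED if b in text.lower()]

def whole_word_hits(text: str) -> list:
    """Flags a blocked entry only when it stands alone between word boundaries."""
    return [b for b in BLOCKED
            if re.search(rf"\b{re.escape(b)}\b", text, re.IGNORECASE)]
```

`substring_hits("See you in Checkerton")` flags the town, while `whole_word_hits` does not. The trade-off is that whole-word matching also misses "heckin", so it is usually paired with a list of known variants.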
"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley
don't fgreot taht meessd up wrods are radelbae as long as the first and last letters are correct and all the letters are present :-)
Fukcing, btich and such are easily understandable and can be tricky to catch
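Scrambles that keep the first and last letters can be caught by comparing a normalized key: first letter, sorted middle letters, last letter. A sketch assuming placeholder blocked words; the `scramble_key` name is hypothetical.

```python
def scramble_key(word: str) -> str:
    """First letter + sorted middle letters + last letter, lowercased."""
    w = word.lower()
    if len(w) <= 3:
        return w  # nothing to scramble
    return w[0] + "".join(sorted(w[1:-1])) + w[-1]

BLOCKLIST = {"darn", "heck", "blast"}  # mild placeholders
BLOCKED_KEYS = {scramble_key(w) for w in BLOCKLIST}

def is_scrambled_match(word: str) -> bool:
    return scramble_key(word) in BLOCKED_KEYS
```

`is_scrambled_match("dran")` and `is_scrambled_match("bslat")` are both true. Real words can share a key (for example "form" and "from"), so this check needs the same kind of whitelist as the other approaches.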
Quote: Original post by Yvanhoe
don't fgreot taht meessd up wrods are radelbae as long as the first and last letters are correct and all the letters are present :-)
Fukcing, btich and such are easily understandable and can be tricky to catch
"taht is idened vrey intnritseeg, but uopn fheturr ietiignsatovn by me, it was dcisreeovd taht tihs olny wkors wehn the wdros uesd are eeecxtpd by the rdeear, or snoud salmiir to the oiioanrgl wrods dtpsiee ctaaechrr rrrgnneiaag. As you'll ntcioe you wree ublnae to ilmtdmaieey rgcznioee smoe of the wdros I uesd in my rlpey bsucaee tehy wree upctnxeeed, aglhtuoh I am qtuie pstvsiioe it is not bsacuee you wluod not rczngioee tehm nlrloamy." - http://www.albinoblacksheep.com/text/order.php
Quote: Original post by Ultimape
"taht is idened vrey intnritseeg, but uopn fheturr ietiignsatovn by me, it was dcisreeovd taht tihs olny wkors wehn the wdros uesd are eeecxtpd by the rdeear.
Agreed. This is not a golden rule, but the previous posts were only pointing out ways to circumvent sub$titutions. Levenshtein distance may help, but I wanted to point out that letter scrambling can be used for this too.
Some have pointed out that it may be easier to use a whitelist; the problem is that many of the young folks you want to protect often have very random grammr skilz.
Maybe an automatic spellchecker could make this possible?
But remember: a truly effective system of censorship to "protect" children is very hard, if not impossible, to build. Its real clients are parents who need to believe that software can protect their kids without them watching over their shoulders.
If you are really interested in protecting children from abusive behavior, make a simple profanity filter as a facade and put your effort into detecting predatory behavior. Predators often use correct grammar and little profanity, contact lots of different children, and ask them specific questions, including where they live in real life. Plus, they will probably spend more time in the chat than in the game. This is the real danger; don't get too worked up over profanity. Kids will eventually learn from their schoolmates what a dick or an anal probe is. Thing is, they had better learn this from other kids than from lonely old pedophiles.
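The signals listed above could feed a simple score that puts accounts in a queue for human moderators to review, rather than triggering any automatic action. This is only a sketch: the phrase list, the thresholds, and the weights are hypothetical and would need tuning against real moderation data.

```python
from dataclasses import dataclass, field

# Hypothetical phrases suggesting a real-world location question.
LOCATION_HINTS = ("where do you live", "what school", "your address", "what town")

@dataclass
class ChatStats:
    recipients: set = field(default_factory=set)  # distinct players messaged
    location_questions: int = 0
    chat_seconds: float = 0.0
    play_seconds: float = 0.0

    def record_message(self, to_player: str, text: str) -> None:
        self.recipients.add(to_player)
        lowered = text.lower()
        if any(hint in lowered for hint in LOCATION_HINTS):
            self.location_questions += 1

def review_score(s: ChatStats) -> int:
    """Higher score = review sooner. Thresholds are illustrative only."""
    score = 0
    if len(s.recipients) >= 20:           # contacts lots of different children
        score += 1
    if s.location_questions >= 2:         # repeatedly asks where someone lives
        score += 2
    if s.chat_seconds > s.play_seconds:   # chats more than plays
        score += 1
    return score
```

An account that messaged 25 different players asking "where do you live?" and chatted far more than it played would score 4; a fresh account scores 0.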
This topic is closed to new replies.