
&#$*@%$# Profanity Filtering

Started by August 02, 2006 10:25 AM
30 comments, last by Limitz 18 years, 3 months ago
Aren't Internet Predators just an urban legend? A mass hysteria?

I've always known kids to be a lot smarter than the TV commercials try to portray them, and find it highly implausible that any would fall for that kind of thing... even if such predators do exist, they aren't actually a threat...
Quote: Original post by Anonymous Poster
Aren't Internet Predators just an urban legend? A mass hysteria?


Well, I started running a few searches to prove my point, as I thought I knew this was a valid concern. And from what I see on US websites, this IS a mass hysteria! I mean, in my opinion, maybe a hundred children a year are "predated" via the internet, which is enough to raise concern when you run a kid-targeted website, but US media claim that 20% of children receive sexual propositions via the internet. Further debunking lowers this stat to 3%, which still seems like a hell of a lot to me.

So I would say yes, there is a mass hysteria, but no, this is not an urban legend. Just an issue which has been over-hyped by the media. Definitely an issue of concern if I were to run a kid-targeted website.
I don't think allowing users to add censored words is safe; griefers would submit common words.
You could manually inspect lists of candidate bad words to blacklist or whitelist, coming from
a) user submissions
b) automatically searching chat transcripts for words that are similar to existing censored words.

Omae Wa Mou Shindeiru

One thing that might help would be a very simple client-side trick: when somebody types a banned word, don't filter it on their side. This way, you see whatever you type, and unless you're talking with a friend via another medium (voice chat, etc.), you're not likely to even know you're being filtered, so people won't try to get around the filter (and won't know without help or multiple accounts whether they were successful).
Then, instead of removing the word for other people, simply replace it with a random unlikely-to-ever-be-inappropriate word. For added humor, select words like "fluffy", "shiny", "pretty", "clown", "flower", etc.
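
A rough sketch of that idea in Python, for illustration only. The function name `render_for`, the placeholder banned words, and the word-matching regex are all assumptions, not anything from an actual game:

```python
import random
import re

# Placeholder stand-ins for a real banned-word list (assumption).
BANNED = {"darn", "heck"}
# Harmless substitutes, taken from the examples in the post.
REPLACEMENTS = ["fluffy", "shiny", "pretty", "clown", "flower"]

WORD_RE = re.compile(r"[A-Za-z']+")


def render_for(viewer, sender, message, rng=random):
    """Return the text that `viewer` should see for a message from `sender`."""
    if viewer == sender:
        # The sender always sees exactly what they typed.
        return message

    def swap(match):
        word = match.group(0)
        if word.lower() in BANNED:
            return rng.choice(REPLACEMENTS)
        return word

    # Everyone else sees banned words replaced by a random harmless word.
    return WORD_RE.sub(swap, message)
```

Calling `render_for("bob", "alice", "well darn it")` would give Bob something like "well clown it", while Alice still sees her original text.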

As for the method to use, I would use a three-table approach: First is a table of character transformations that turns "$" into "S", "@" into "a" etc; Second is a table that performs letter-group replacements in an attempt to account for typos, misspellings, etc; Third is a list of banned words.

Using the first two tables, you can generate a list of possibilities for each word, and you can then check the banned-word list (the third table) to see if any of those possibilities is present.
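
Here is a minimal sketch of the three-table approach, assuming tiny made-up tables (the entries in `CHAR_TABLE`, `GROUP_TABLE` and `BANNED` are placeholders, and a real filter would need far larger ones). Each character may map to several letters, so the candidates are built with a Cartesian product:

```python
from itertools import product

# Table 1: character transformations (symbol -> possible letters).
CHAR_TABLE = {"$": ["s"], "@": ["a"], "0": ["o"], "3": ["e"], "1": ["i", "l"]}
# Table 2: letter-group replacements for typos and misspellings.
GROUP_TABLE = {"ph": "f", "kk": "ck", "z": "s"}
# Table 3: banned words (mild placeholders here).
BANNED = {"heck", "darn"}


def candidates(word):
    """Generate the possible readings of `word` using tables 1 and 2."""
    word = word.lower()
    options = [CHAR_TABLE.get(ch, [ch]) for ch in word]
    results = set()
    for combo in product(*options):
        base = "".join(combo)
        results.add(base)
        # Apply each letter-group replacement on its own to the base form.
        for group, repl in GROUP_TABLE.items():
            if group in base:
                results.add(base.replace(group, repl))
    return results


def is_banned(word):
    """True if any candidate reading of `word` is on the banned list."""
    return not candidates(word).isdisjoint(BANNED)
```

With these tables, `is_banned("h3kk")` and `is_banned("D@rn")` are true and `is_banned("hello")` is false. Note that the number of candidates grows quickly for long words with many ambiguous characters, so a real implementation would probably want a length cap.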

You might want to employ phonetic algorithms, such as Double Metaphone, as another dictionary to detect misspellings that allow the same or similar pronunciation.

A more foolproof method would be to simply have a 'white list' instead of a 'blacklist': a filter based on allowing only specific words. If you can find a fairly complete wordlist, you can combine it with a name list based on census data, and then the only thing missing is fictional names / places / etc. that you can add manually as desired.
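
The whitelist idea could look something like this sketch. The word sets are tiny hypothetical stand-ins for a real wordlist and census name list, and masking with asterisks is just one assumed way to handle rejected words:

```python
# Hypothetical stand-ins for a full wordlist and a census-based name list.
ALLOWED_WORDS = {"hello", "want", "to", "trade", "a", "sword"}
ALLOWED_NAMES = {"alice"}


def filter_message(message, allowed=ALLOWED_WORDS | ALLOWED_NAMES):
    """Mask every token whose letters are not on the whitelist."""
    out = []
    for token in message.split():
        # Ignore surrounding punctuation and case when looking the word up.
        core = token.strip(".,!?").lower()
        if core == "" or core in allowed:
            out.append(token)
        else:
            out.append("*" * len(token))
    return " ".join(out)
```

For example, `filter_message("hello xyzzy")` returns `"hello *****"`, while a message made only of whitelisted words passes through unchanged.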
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Here is my suggestion:

Start with a "hard" list. Every word on that list is censored. If a word is "close" to a word on the hard list, log it somewhere and review the log once a week to add new words to the list.
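
One possible way to decide what counts as "close" is the standard library's `difflib`. This is only a sketch: the 0.8 cutoff, the placeholder hard list and the `check_word` name are assumptions to tune, not part of the suggestion above.

```python
import difflib

# Placeholder hard list (assumption); every word on it is censored outright.
HARD_LIST = ["heck", "darn"]


def check_word(word, review_log, cutoff=0.8):
    """Return "censor", "logged" or "ok" for a single chat word."""
    w = word.lower()
    if w in HARD_LIST:
        return "censor"
    # Near misses are not censored, only recorded for the weekly review.
    close = difflib.get_close_matches(w, HARD_LIST, n=1, cutoff=cutoff)
    if close:
        review_log.append((w, close[0]))
        return "logged"
    return "ok"
```

Here `review_log` is just a list of (word, nearest hard-list word) pairs; in practice it would more likely be a file or database table that a moderator reads through.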

Don't go too nazi on the filter; people (especially kids) will always find a way to go around it anyway.
Went to a conference on chat issues, starring some industry heavy hitters who had online games (specifically ESPN and Disney). Their conclusion was that there are only two things that are foolproof for keeping chat clean for kids:

1. Moderators to approve everything
2. Canned chat (i.e. You have a combobox with canned phrases rather than an edit field).

Disney actually used canned chat to their advantage. In their online games, one of the rewards was to give users more canned phrases.
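
As a rough illustration of canned chat with unlockable phrases (not how Disney actually implemented it; the phrase table and class below are invented for the example), the server only ever accepts a phrase ID that the player has unlocked, so free text never enters the chat:

```python
# Hypothetical phrase table; IDs and text are made up for this example.
CANNED_PHRASES = {
    1: "Hello!",
    2: "Good game!",
    3: "Want to team up?",
    4: "Nice shot!",
}


class Player:
    def __init__(self, name, unlocked=(1, 2)):
        self.name = name
        self.unlocked = set(unlocked)  # phrase IDs this player may use

    def unlock(self, phrase_id):
        """Grant a new phrase as a reward; unknown IDs are ignored."""
        if phrase_id in CANNED_PHRASES:
            self.unlocked.add(phrase_id)


def say(player, phrase_id):
    """Return the chat text, or None if the player cannot use that phrase."""
    if phrase_id not in player.unlocked:
        return None
    return CANNED_PHRASES[phrase_id]
```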

(my byline from the Gamedev Collection series, which I co-edited) John Hattan has been working steadily in the casual game-space since the TRS-80 days and professionally since 1990. After seeing his small-format games turned down for what turned out to be Tandy's last PC release, he took them independent, eventually releasing them as several discount game-packs through a couple of publishers. The packs are actually still available on store-shelves, although you'll need a keen eye to find them nowadays. He continues to work in the casual game-space as an independent developer, largely working on games in Flash for his website, The Code Zone (www.thecodezone.com). His current scheme is to distribute his games virally on various web-portals and widget platforms. In addition, John writes weekly product reviews and blogs (over ten years old) for www.gamedev.net from his home office where he lives with his wife and daughter in their home in the woods near Lake Grapevine in Texas.

I'm going to do the straight word list approach. I have been looking around for a dirty word list to use as a black list, but it's been surprisingly difficult to find one on the internet (I have one now with maybe 500 words). A word list meant for filtering (i.e. with common misspellings and leetspeak) would be nice.

Actually I think I will have a separate filter for leetspeak, which when triggered will send your character to the n00b server.

Shedletsky's Bits: A Blog | ROBLOX | Twitter
Time held me green and dying
Though I sang in my chains like the sea...

http://www.georgecarlin.com/dirty/2443.html

George Carlin's 2443 dirty words!

might be useful ;)
Note: This post does not answer your question, but it does give you something to think about.


A game that I worked on (E rated) had a profanity filter that was a huge .txt file full of the "words you cannot say"... "dirtywords.txt". Open up this file and you'll learn every possible way to 1337speek the word "fuck".

Tiger Woods Golf, Grand Theft Auto: San Andreas and Elder Scrolls: Oblivion have all been reprimanded for data that was "hidden" on the disc that was not submitted to the ESRB.

The ESA have now said there is something like an $11,000 fine per unit sold if any hidden "graphic" content is found again. It's probably a good idea to encrypt these dirty-word files to make sure no kid puts the disc in their PC, finds the file, and then shows it to mommy.
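
One alternative to encryption, sketched here as an assumption and not as what any shipped game did, is to store only salted hashes of the banned words so the readable list never appears on the disc. The salt value and function names are hypothetical:

```python
import hashlib

# Hypothetical salt; it only stops the file being a list of plain hashes.
SALT = b"example-salt"


def digest(word):
    """Salted SHA-256 hex digest of a lowercased word."""
    return hashlib.sha256(SALT + word.lower().encode("utf-8")).hexdigest()


def build_hashed_list(words):
    """Run at build time: turn the plaintext list into a set of digests."""
    return {digest(w) for w in words}


def is_banned(word, hashed_list):
    """Run in the game: compare the digest of a typed word to the set."""
    return digest(word) in hashed_list
```

This only supports exact matches after normalization, and a determined person could still recover short words by hashing guesses, but it does keep a curious kid from opening a text file full of profanity.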

That's all. Now back to our regularly scheduled censoring. :)

Check out my new game Smash and Dash at:

http://www.smashanddashgame.com/

On a humorous side note...

I think it would be interesting to email back the mother who complained and claim: "We have examined the chat logs and discovered that your child was the one who first used that word in our game. In fact, another mother is complaining that her son has learned dirty language from speaking with yours. You have been banned. When you teach your children about good language, you may submit a request to be unbanned. Have a nice day."

Just to see what kind of response you get.

This topic is closed to new replies.
