&#$*@%$# Profanity Filtering

Started by August 02, 2006 10:25 AM
30 comments, last by Limitz 18 years, 6 months ago
Quote:
Original post by Anonymous Poster
Aren't Internet Predators just an urban legend? A mass hysteria?


Well, I started running a few searches to prove my point, since I believe this is a valid concern. And from what I see on US websites, this IS a mass hysteria! In my opinion, maybe a hundred children a year are preyed upon via the internet, which is enough to raise concern when you run a kid-targeted website, but US media claim that 20% of children receive sexual propositions via the internet. Further debunking lowers this stat to 3%, which still seems like a hell of a lot to me.

So I would say yes, there is a mass hysteria, but no, this is not an urban legend. It is just an issue that has been over-hyped by the media. Definitely an issue of concern if I were to run a kid-targeted website.
I don't think allowing users to add censored words is safe; griefers would submit common words.
You could manually inspect lists of candidate bad words to blacklist or whitelist, coming from
a) user submissions
b) automatically searching chat transcripts for words that are similar to existing censored words.

Omae Wa Mou Shindeiru

One thing that might help would be a very simple client-side trick: when somebody types a banned word, don't filter it on their side. This way, you see whatever you type, and unless you're talking with a friend via another medium (voice chat, etc.), you're not likely to even know you're being filtered, so people won't try to get around the filter (and won't know without help or multiple accounts whether they were successful).
Then, instead of removing the word for other people, simply replace it with a random unlikely-to-ever-be-inappropriate word. For added humor, select words like "fluffy", "shiny", "pretty", "clown", "flower", etc.
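
A rough sketch of that idea in Python, in case it helps. The banned list, the replacement pool, and the function name render_for are all made up for illustration, and the mild placeholder words stand in for a real list:

```python
import random
import re

# Hypothetical lists; a real banned list would be far longer.
BANNED = {"heck", "darn"}
FRIENDLY = ["fluffy", "shiny", "pretty", "clown", "flower"]

WORD_RE = re.compile(r"[A-Za-z']+")

def render_for(viewer, sender, message, rng=random):
    """Return the text that `viewer` should see for `sender`'s message."""
    if viewer == sender:
        # The sender always sees exactly what they typed.
        return message

    def swap(match):
        word = match.group(0)
        # Everyone else sees a random harmless word in place of a banned one.
        return rng.choice(FRIENDLY) if word.lower() in BANNED else word

    return WORD_RE.sub(swap, message)

print(render_for("alice", "alice", "what the heck"))  # unchanged for the sender
print(render_for("bob", "alice", "what the heck"))    # last word is swapped
```

The swap has to happen wherever the message is rendered for other players, so the sender's own copy is never touched.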

As for the method to use, I would use a three-table approach: First is a table of character transformations that turns "$" into "S", "@" into "a" etc; Second is a table that performs letter-group replacements in an attempt to account for typos, misspellings, etc; Third is a list of banned words.

Using the first two tables, you can generate a list of possibilities for each word, and you can then check the dictionary to see if the word is present or not.
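
Here is one way the three tables could fit together, as a minimal sketch. The table contents, the mild placeholder words, and the names candidates and is_banned are assumptions for the example, not a tested production filter:

```python
# Table 1: single-character transformations (assumed entries).
CHAR_TABLE = {"$": "s", "@": "a", "0": "o", "1": "i", "3": "e"}

# Table 2: letter-group replacements for typos and misspellings (assumed entries).
GROUP_TABLE = {"kk": "ck", "ph": "f"}

# Table 3: the banned words themselves (mild placeholders).
BANNED = {"heck", "darn"}

def normalize_chars(word):
    """Apply table 1 to every character of the lowercased word."""
    return "".join(CHAR_TABLE.get(ch, ch) for ch in word.lower())

def candidates(word):
    """Generate the possible readings of a word using tables 1 and 2."""
    results = {normalize_chars(word)}
    for group, replacement in GROUP_TABLE.items():
        # Keep both the original and the replaced form as possibilities.
        for existing in list(results):
            if group in existing:
                results.add(existing.replace(group, replacement))
    return results

def is_banned(word):
    """True if any candidate reading appears in table 3."""
    return any(c in BANNED for c in candidates(word))

print(is_banned("h3kk"))   # True
print(is_banned("D@rn"))   # True
print(is_banned("hello"))  # False
```

The candidate set grows with every group rule that matches, so with large tables you would probably want to cap it.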

You might want to employ phonetic algorithms, such as Double Metaphone, as another dictionary to detect misspellings that allow the same or similar pronunciation.
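
Double Metaphone is not in the Python standard library, so the sketch below swaps in plain Soundex, a much cruder phonetic key, just to show the shape of the lookup. The function name soundex and the placeholder word are my own choices for the example:

```python
# Consonant groups used by American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the 4-character Soundex key of a word ("" if it has no letters)."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of identical codes
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # vowels reset prev, so repeated codes around them count
    return (key + "000")[:4]

# Phonetic dictionary built from a (placeholder) banned list.
BANNED_KEYS = {soundex(w) for w in ["heck"]}

print(soundex("Robert"))              # R163
print(soundex("hek") in BANNED_KEYS)  # True
```

Soundex collides a lot (plenty of innocent words share a key with a banned one), so a hit is better treated as a reason to look closer than as grounds to censor.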

A more foolproof method would be to simply have a 'white list' instead of a 'blacklist': a filter based on allowing only specific words. If you can find a fairly complete wordlist, you can combine it with a name list based on census data, and then the only thing missing is fictional names / places / etc. that you can add manually as desired.
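
A whitelist filter is short to sketch. The tiny word and name sets below are stand-ins for a real wordlist and census-based name list, and filter_whitelist is a hypothetical name:

```python
import re

# Stand-ins for a full wordlist and a census-derived name list.
ALLOWED_WORDS = {"hello", "want", "to", "trade", "a", "sword"}
NAMES = {"alice", "bob"}

# A token is a run of letters, digits, underscores, or apostrophes.
TOKEN_RE = re.compile(r"[\w']+")

def filter_whitelist(message, allowed, mask="***"):
    """Mask every token that is not on the allowed list."""
    def check(match):
        token = match.group(0)
        return token if token.lower() in allowed else mask
    return TOKEN_RE.sub(check, message)

print(filter_whitelist("Hello Bob, want to trade a sw0rd?", ALLOWED_WORDS | NAMES))
# Hello Bob, want to trade a ***?
```

Anything not on the list gets masked, including harmless typos, which is the price of this approach.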
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Here is my suggestion:

Start with a "hard" list. Every word on that list is censored. If a word is "close" to a word on the hard list, log it somewhere and review the log once a week to add new words to the list.
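
Something like this, as a sketch. "Close" is measured here with difflib's similarity ratio, which is only one possible choice, and the cutoff of 0.75, the placeholder words, and the name check_word are assumptions:

```python
import difflib

# Placeholder hard list; every word on it is censored outright.
HARD_LIST = {"heck", "darn"}

def check_word(word, review_log, cutoff=0.75):
    """Return True if the word must be censored.

    Words that are merely close to a hard-list entry are not censored;
    they are appended to review_log for a human to look at later.
    """
    w = word.lower()
    if w in HARD_LIST:
        return True
    near = difflib.get_close_matches(w, sorted(HARD_LIST), n=1, cutoff=cutoff)
    if near:
        review_log.append((w, near[0]))
    return False

log = []
print(check_word("heck", log))   # True
print(check_word("hecck", log))  # False, but logged for review
print(log)                       # [('hecck', 'heck')]
```

In a real game the log would go to a file or database rather than a list in memory.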

Don't go too nazi on the filter; people (especially kids) will always find a way to go around it anyway.
Went to a conference on chat issues, starring some industry heavy hitters who had online games (specifically ESPN and Disney). Their conclusion was that there are only two things that are foolproof for keeping chat clean for kids. . .

1. Moderators to approve everything
2. Canned chat (i.e. You have a combobox with canned phrases rather than an edit field).

Disney actually used canned chat to their advantage. In their online games, one of the rewards was unlocking more canned phrases for the user.
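
For what it's worth, canned chat with unlockable phrases is simple to model. This is only a guess at the general shape, not how Disney or ESPN built theirs; the phrases, IDs, and names are invented:

```python
# Hypothetical phrase table; clients only ever send a phrase ID, never text.
CANNED = {
    1: "Hello!",
    2: "Good game!",
    3: "Want to team up?",
    4: "Nice move!",
}
STARTER_IDS = {1, 2}

class Player:
    def __init__(self, name):
        self.name = name
        self.unlocked = set(STARTER_IDS)

    def unlock(self, phrase_id):
        """Grant a new phrase as a reward (unknown IDs are ignored)."""
        if phrase_id in CANNED:
            self.unlocked.add(phrase_id)

def say(player, phrase_id):
    """Return the chat line, or None if the player has not unlocked the phrase."""
    if phrase_id not in player.unlocked:
        return None
    return f"{player.name}: {CANNED[phrase_id]}"

p = Player("Alice")
print(say(p, 3))  # None
p.unlock(3)
print(say(p, 3))  # Alice: Want to team up?
```

Because only IDs cross the wire, there is no free text to filter at all.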

(my byline from the Gamedev Collection series, which I co-edited) John Hattan has been working steadily in the casual game-space since the TRS-80 days and professionally since 1990. After seeing his small-format games turned down for what turned out to be Tandy's last PC release, he took them independent, eventually releasing them as several discount game-packs through a couple of publishers. The packs are actually still available on store-shelves, although you'll need a keen eye to find them nowadays. He continues to work in the casual game-space as an independent developer, largely working on games in Flash for his website, The Code Zone (www.thecodezone.com). His current scheme is to distribute his games virally on various web-portals and widget platforms. In addition, John writes weekly product reviews and blogs (over ten years old) for www.gamedev.net from his home office where he lives with his wife and daughter in their home in the woods near Lake Grapevine in Texas.

I'm going to do the straight word list approach. I have been looking around for a dirty word list to use as a black list, but it's been surprisingly difficult to find one on the internet (I have one now with maybe 500 words). A word list meant for filtering (i.e. with common misspellings and leetspeak) would be nice.

Actually I think I will have a separate filter for leetspeak, which when triggered will send your character to the n00b server.

Shedletsky's Bits: A Blog | ROBLOX | Twitter
Time held me green and dying
Though I sang in my chains like the sea...

http://www.georgecarlin.com/dirty/2443.html

George Carlin's 2443 dirty words!

might be useful ;)

Note: This post does not answer your question, but it does give you something to think about.


A game that I worked on (E rated) had a profanity filter that was a huge .txt file full of the "words you cannot say"... "dirtywords.txt". Open up this file and you'll learn every possible way to 1337speek the word "fuck".

Tiger Woods Golf, Grand Theft Auto: San Andreas and Elder Scrolls: Oblivion have all been reprimanded for data that was "hidden" on the disc that was not submitted to the ESRB.

The ESA have now said there is something like an $11,000 fine per-unit-sold if any hidden "graphic" content is found again. It's probably a good idea to encrypt these dirty words files to make sure no kid puts the disc in their PC, finds the file, then shows it to mommy.
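
One cheap way to keep the plain words off the disc, sketched below, is to ship salted hashes of the words instead of the words themselves. Note that this is hashing, not encryption, and it is my own substitution: it only supports exact matches after normalization, and someone determined could still test guesses against it. The salt value and function names are made up:

```python
import hashlib

SALT = b"example-salt"  # hypothetical; any fixed value baked into the build

def digest(word):
    """Salted SHA-256 of the lowercased word."""
    return hashlib.sha256(SALT + word.lower().encode("utf-8")).hexdigest()

def build_hashed_list(words):
    """Run at build time; only the resulting hashes are shipped."""
    return {digest(w) for w in words}

def is_banned(word, hashed_list):
    """Run at chat time; compares hashes, never plain text."""
    return digest(word) in hashed_list

hashed = build_hashed_list(["heck", "darn"])  # placeholder words
print(is_banned("HECK", hashed))   # True
print(is_banned("hello", hashed))  # False
```

Opening the shipped file then shows only hex strings, which is probably enough for the kid-shows-mommy scenario.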

That's all. Now back to our regularly scheduled censoring. :)

Check out my new game Smash and Dash at:

http://www.smashanddashgame.com/

Quote:
Original post by JBourrie
Note: This post does not answer your question, but it does give you something to think about.


A game that I worked on (E rated) had a profanity filter that was a huge .txt file full of the "words you cannot say"... "dirtywords.txt". Open up this file and you'll learn every possible way to 1337speek the word "fuck".
You wouldn't still have that file, would you? That'd save some work.

If the parent thinks it is inappropriate, tell her to take the kid off the game. Fact is that the censors do nothing but upset people. Either that, or you could just have a button that disables chat altogether, with password protection that the mother sets.

The words are really a small problem, and it is quite useless to put any energy into trying to stop them. There are far worse problems to attend to with the chat functions of games.

I've played an above-average number of MMORPGs since 1997, and the "problem" with dirty words seems rather trivial when you compare it to the actual content of the chats. I can't even count the number of discussions that have been inappropriate, or the rising number of people I have heard about who have been fooled into webcam chats where their tops or whatnot have been off. Over the last couple of years that problem has been rising a lot, if I listen to what people in different guilds etc. say.

So instead of worrying over what words are being used, parents should be more worried about what's actually being said in between the words.

And on top of that, all the filters I have seen in all the games I have played have been extremely annoying, not because you cannot type bad words, but because they filter the wrong words, mess up chats, or make lines simply disappear.
Domine non secundum peccata nostra facias nobis

This topic is closed to new replies.
