text classification

Started by August 17, 2008 12:54 PM
-1 comments, last by kavelot 16 years, 3 months ago
Hello,

I need to do text classification on thousands of helpdesk questions and answers. I have already "summarized" the questions based on the answer given (since the answers are pretty standard), so what I currently have is something like:

"I already paid this invoice" -> [paid invoice]
"you're charging me for that, but I already paid that" -> [paid invoice]
"this isn't due anymore" -> [paid invoice]
etc.

Now I "just" need to relate the words to their category. For example, "I already paid" in a sentence would probably mean the [paid invoice] category.

I tried doing that with naive Bayes, but it has several problems:

1) It doesn't know that some words in a sentence are more important than others. For example, "I *DIDN'T* pay" is very different from "I paid", but naive Bayes doesn't handle that correctly.
2) Texts usually have 2 or 3 statistically linked "keywords", and naive Bayes basically considers only one word at a time.

I can see some modifications that could be made:

a) For each phrase, check whether it is affirmative or negative.
b) Make it less naive. There are thousands of texts and words, so I can't consider all of them, but I could concatenate 2 words and then analyze those concatenated words. In the first example above, I would concatenate "Ialready", "Ipaid", "alreadypaid", etc. But that would require a lot of computing power, and I'm not sure about the results.

Any other ideas for solving this?
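For what it's worth, modifications a) and b) can be tried together fairly cheaply. Below is a rough Python sketch (standard library only) of a naive Bayes classifier whose features are single words plus concatenated adjacent word pairs, with words that follow a negation in the same clause prefixed by NOT_. Several things here are assumptions for illustration and not part of the original data: the NEGATIONS word list, the NOT_ prefix convention, the add-one smoothing, and the [unpaid invoice] category with its two training sentences.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: a small hand-picked negation list; extend it for real data.
NEGATIONS = {"not", "no", "never", "didn't", "don't", "isn't", "wasn't", "haven't"}


def features(text):
    """Return single-word and adjacent-pair features, marking negated words."""
    feats = []
    # Split into rough clauses so a negation only affects its own clause.
    for clause in re.split(r"[.,;!?]", text.lower()):
        tokens = []
        negate = False
        for tok in re.findall(r"[a-z']+", clause):
            if tok in NEGATIONS:
                negate = True  # the NOT_ prefix on later words carries the negation
                continue
            tokens.append("NOT_" + tok if negate else tok)
        feats.extend(tokens)
        # Only adjacent pairs are concatenated, not every pair in the sentence.
        feats.extend(a + "_" + b for a, b in zip(tokens, tokens[1:]))
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, text, label):
        self.class_counts[label] += 1
        for f in features(text):
            self.feature_counts[label][f] += 1
            self.vocab.add(f)

    def classify(self, text):
        feats = features(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for f in feats:
                score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = NaiveBayes()
# These three examples come from the post above.
clf.train("I already paid this invoice", "paid invoice")
clf.train("you're charging me for that, but I already paid that", "paid invoice")
clf.train("this isn't due anymore", "paid invoice")
# Hypothetical second category, made up so there is something to contrast with.
clf.train("I didn't pay this invoice yet", "unpaid invoice")
clf.train("I haven't paid that", "unpaid invoice")

print(clf.classify("I already paid"))
print(clf.classify("I didn't pay"))
```

Because only adjacent words are paired, a sentence of n words adds at most n-1 pair features, so the cost grows linearly with the text length. Non-adjacent pairs such as "Ipaid" are left out on purpose; whether that loses too much signal would have to be checked on the real helpdesk data.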

