Text classification
Hello,
I need to do text classification on thousands of helpdesk question/answer pairs.
I have already "summarized" each question by the answer it received (the answers are pretty standard), so what I currently have is something like:
"I already paid this invoice" -> [paid invoice]
"you're charging me for that, but I already paid that" -> [paid invoice]
"this isn't due anymore" -> [paid invoice]
etc.
Now I "just" need to relate the words to their category.
For example, "I already paid" in a sentence would probably indicate the [paid invoice] category.
I tried doing that with naive Bayes, but it has several problems:
1) It doesn't know that some words in a sentence are more important than others. For example, "I *DIDN'T* pay" is very different from "I paid", but naive Bayes doesn't handle that correctly.
2) Texts usually contain 2 or 3 statistically linked "keywords", but naive Bayes basically considers each word on its own.
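For reference, here is a minimal sketch of the kind of classifier described above: a multinomial naive Bayes written from scratch with add-one smoothing. The tokenizer, the `NaiveBayes` class, and the second label `missing invoice` (with its two texts) are hypothetical, chosen only for illustration. The code shows where both problems come from: every token adds its own independent log-probability term, so "didn't" is just one more term in the sum, and word order and word combinations are never looked at.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes, so "didn't" stays one token.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def __init__(self, tokenizer=tokenize):
        self.tokenizer = tokenizer
        self.class_counts = Counter()            # training texts per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in self.tokenizer(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for tok in self.tokenizer(text):
                # Every token adds its own independent term: the "naive" assumption.
                score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy data: the first three texts come from the post; the last two are hypothetical.
texts = [
    "I already paid this invoice",
    "you're charging me for that, but I already paid that",
    "this isn't due anymore",
    "I didn't receive the invoice",
    "where is my invoice",
]
labels = ["paid invoice"] * 3 + ["missing invoice"] * 2

model = NaiveBayes().fit(texts, labels)
print(model.predict("I already paid"))  # paid invoice
```

Because `log_scores` only sums per-token terms, "I already paid" and "paid already I" get the same scores (up to floating-point rounding), which is the word-order blindness behind problem 2.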
I can think of a few modifications:
a) For each phrase, check whether it's affirmative or negative.
b) Make it less naive. There are thousands of texts and words, so I can't consider every combination, but I could, for example, concatenate pairs of words and then analyze those concatenated pairs. For the first example above, I would concatenate "Ialready", "Ipaid", "alreadypaid", etc. However, that would require a lot of computing power, and I'm not sure about the results.
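Both modifications can be combined into one feature extractor. This is a sketch under assumptions: the `features()` helper and the negation word list are hypothetical. The negation step is a common heuristic from sentiment analysis (prefix every word after a negation cue with `NOT_` until the next punctuation mark), and the word pairs are adjacent bigrams joined with `_`. Note that the post's "Ipaid" skips a word; this sketch only pairs adjacent words, though skip-grams could be generated the same way.

```python
import re

# Hypothetical, deliberately incomplete list of negation cues.
NEGATIONS = {"not", "no", "never", "didn't", "don't", "doesn't",
             "isn't", "wasn't", "haven't", "can't", "won't"}
CLAUSE_END = {".", ",", "!", "?", ";"}


def tokenize(text):
    # Lowercase words (apostrophes kept) plus clause-ending punctuation as separate tokens.
    return re.findall(r"[a-z']+|[.,!?;]", text.lower())


def mark_negation(tokens):
    """Prefix words after a negation cue with 'NOT_' until the next clause-ending mark."""
    out, negated = [], False
    for tok in tokens:
        if tok in CLAUSE_END:
            negated = False  # punctuation closes the negated scope, then is dropped
            continue
        if tok in NEGATIONS:
            negated = True
            out.append(tok)
            continue
        out.append("NOT_" + tok if negated else tok)
    return out


def features(text):
    """Negation-marked unigrams plus adjacent-word bigrams such as 'already_paid'."""
    words = mark_negation(tokenize(text))
    # Bigrams are built after punctuation is dropped, so they may span a comma.
    bigrams = [a + "_" + b for a, b in zip(words, words[1:])]
    return words + bigrams


print(features("I already paid"))
# ['i', 'already', 'paid', 'i_already', 'already_paid']
print(features("I *DIDN'T* pay"))
# ['i', "didn't", 'NOT_pay', "i_didn't", "didn't_NOT_pay"]
```

On cost: a text of n words yields at most n - 1 adjacent pairs, so counting them in a dictionary grows linearly with the corpus, and only pairs that actually occur are ever stored. The returned list can be used in place of the plain word list as a naive Bayes classifier's input tokens. The `_` separator keeps pairs like "a"+"bc" and "ab"+"c" distinct, which plain concatenation would not. Real helpdesk text may use curly apostrophes (’), which this regex would split, so normalizing them first is worth considering.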
Any other ideas for solving this?