text classification
Hello,

I need to do text classification on thousands of helpdesk questions and answers. I have already "summarized" the questions based on the answer given (since the answers are pretty standard), so what I currently have is something like:

"I already paid this invoice" -> [paid invoice]
"you're charging me for that, but I already paid that" -> [paid invoice]
"this isn't due anymore" -> [paid invoice]
etc.

Now I "just" need to relate the words to their category. For example, "I already paid" in a sentence would probably mean the [paid invoice] category.

I tried doing that with naive Bayes, but it has several problems:

1) It doesn't know that some words in a sentence are more important than others. For example, "I *DIDN'T* pay" is very different from "I paid", but naive Bayes doesn't handle that correctly.
2) Texts usually have 2 or 3 statistically linked "keywords", and naive Bayes basically considers only one at a time.

I can see some modifications that could be made:

a) For each phrase, check whether it is affirmative or negative.
b) Make it less naive. There are thousands of texts and words, so I can't consider all combinations, but I could concatenate pairs of words and then analyze those concatenated words. In the first example above, I would generate "Ialready", "Ipaid", "alreadypaid", etc. But that would require a lot of computing power, and I'm not sure about the results.

Any other ideas for solving this?
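To make ideas (a) and (b) concrete, here is a rough sketch in plain Python. It is only an illustration under some assumptions, not a tested solution: the [unpaid invoice] category and its two training sentences are made up, and the names NEGATIONS, features and NaiveBayes are hypothetical. For (a), every word after a negation word gets a "NOT_" prefix until the next punctuation mark. For (b), only adjacent word pairs that actually occur in the texts are counted, so the number of extra features grows with the size of the corpus rather than with every possible word combination.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed, incomplete list of negation words; extend it for real data.
NEGATIONS = {"not", "no", "never", "didn't", "don't", "isn't", "haven't", "wasn't"}
PUNCTUATION = set(".,!?;")


def features(text):
    """Return unigrams plus adjacent-pair bigrams, with negation marking."""
    tokens = re.findall(r"[a-z']+|[.,!?;]", text.lower())
    marked = []
    negate = False
    for tok in tokens:
        if tok in PUNCTUATION:
            negate = False          # negation scope ends at punctuation
        elif tok in NEGATIONS:
            negate = True
            marked.append(tok)
        else:
            marked.append("NOT_" + tok if negate else tok)
    # Only pairs of neighbouring words, not every possible combination.
    bigrams = [a + "_" + b for a, b in zip(marked, marked[1:])]
    return marked + bigrams


class NaiveBayes:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for f in features(text):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        # Features never seen in training are ignored.
        feats = [f for f in features(text) if f in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for f in feats:
                score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = [
    ("I already paid this invoice", "paid invoice"),
    ("you're charging me for that, but I already paid that", "paid invoice"),
    ("this isn't due anymore", "paid invoice"),
    # The two examples below are invented for illustration only.
    ("I didn't pay this invoice yet", "unpaid invoice"),
    ("I have not paid that", "unpaid invoice"),
]

model = NaiveBayes().fit([t for t, _ in train], [c for _, c in train])

print(model.predict("I already paid"))   # paid invoice
print(model.predict("I have not paid"))  # unpaid invoice
```

With only five sentences this proves nothing about accuracy, but it shows the mechanics: "paid" and "NOT_paid" become different features, and pairs such as "already_paid" are counted like any other word.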