NLP questions and statements.

Started by July 10, 2006 08:08 AM
3 comments, last by ErUs 18 years, 4 months ago

def IsQuestion( message as string ) as bool:
  # Split the message into words; a question needs at least two of them.
  strlist = self.Tokenize(message)
  if( strlist.Length < 2 ):
    return false

  # "whats X" and its misspellings
  word = strlist.GetValue(0) as string
  word = word.ToLower()
  if( word == "whats" or word == "what's" or word == "wats" ):
    return true

  # "what is X" / "what are X"
  if( word == "what" or word == "wat" ):
    word = strlist.GetValue(1) as string
    word = word.ToLower()
    if( word == "is" or word == "are" ):
      return true

  return false


def IndexOfSubject( strings as (string) ) as int:
  # Find a "wh"/"wa" word, skip an optional "is"/"are" and an optional
  # article, and return the index of the word that follows (-1 if none).
  i = 0
  while( i < strings.Length ):
    tstr = strings.GetValue(i) as string
    tstr = tstr.ToLower()
    if( tstr.Length < 2 ):
      i++ # too short to be a "wh" word, and Substring(0,2) would throw
      continue
    if( tstr.Substring(0,2) == "wh" or tstr.Substring(0,2) == "wa" ):
      j = 1
      if( (i+j) == strings.Length ):
        return -1
      tstr = strings.GetValue(i+j) as string
      tstr = tstr.ToLower()
      # IsQuestion accepts both "is" and "are", so step over either one
      if( tstr == "is" or tstr == "are" ):
        j++
        if( (i+j) == strings.Length ):
          return -1
        tstr = strings.GetValue(i+j) as string
        tstr = tstr.ToLower()
      # step over an article (including the common typo "teh")
      if( tstr == "a" or tstr == "an" or tstr == "the" or tstr == "teh" ):
        j++
        if( (i+j) == strings.Length ):
          return -1
      return (i+j)
    i++
  return -1

I am trying to build a system that can check whether an array of words is a question and, if so, get the subject of the question. My current method is way too hacky. I am looking for some documentation on the subject, but I cannot find anything that isn't really broad and about the whole of NLP. Please help out :) [Edited by - ErUs on July 10, 2006 8:35:53 AM]
-www.freewebs.com/tm1rbrt -> check out my gameboy emulator ( worklog updated regularly )
Made the two earlier functions a bit better:
def IsQuestion( message as string ) as bool:
  # Based on the idea that a question is "* wh* [is [the]] %s"
  strlist = self.Tokenize(message)

  i = 0
  while( i < (strlist.Length - 1) ): # -1 because we need more than "whats"
    word = strlist.GetValue(i) as string
    word = word.ToLower()
    if( word.Length < 2 ):
      i++
      continue # "what" and variants always have at least 2 chars
    if( word.Substring(0,2) == "wh" or word.Substring(0,2) == "wa" ):
      # compare against the word count, not the length of the current word
      if( (i+1) < strlist.Length ): # make sure there is space for a subject
        return true # it still may not be a question, but IndexOfSubject will return -1 if so
      else:
        return false
    i++
  return false

def IndexOfSubject( strings as (string) ) as int:
  # Based on the idea that a question is "* wh* [is [the]] %s"
  i = 0
  while( i < (strings.Length - 1) ): # -1 because we need more than "whats"
    word = strings.GetValue(i) as string
    word = word.ToLower()
    if( word.Length < 2 ):
      i++
      continue # "what" and variants always have at least 2 chars
    if( word.Substring(0,2) == "wh" or word.Substring(0,2) == "wa" ):
      j = 1
      # look at up to three following words, staying inside the array
      while( (i+j) < strings.Length and j < 4 ):
        word = strings.GetValue(i+j) as string
        word = word.ToLower()
        # IsRestrictedWord is assumed to match fillers such as "is" and "the"
        if( IsRestrictedWord( word ) or word == "" ):
          j++ # skip the filler and try the next word
          continue
        return (i+j)
    i++
  return -1

But I still don't know how to check for a statement, like "tom is human".
-www.freewebs.com/tm1rbrt -> check out my gameboy emulator ( worklog updated regularly )
A context-free grammar represents simple sentences in natural language fairly well.

A context-free grammar is a set of rules like these:

Statement ::= Subject NominativeVerb Attribute | Subject TransitiveVerb DirectObject
Subject ::= NominalGroup
NominalGroup ::= Article NounWithAdjectives | NounWithAdjectives
NounWithAdjectives ::= Noun | Adjective NounWithAdjectives
Attribute ::= Adjective
DirectObject ::= NominalGroup
...

You need a dictionary that tells you which roles each word can take (Noun, Adjective, Article, NominativeVerb...). Then there is a straightforward algorithm to determine whether a sentence conforms to the definition of Statement, or Question (to be defined). The general-purpose algorithm (the CYK chart parser is the usual example) might be sort of slow, since its running time is O(n^3), where n is the number of words.
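
To make the dictionary-plus-grammar idea a bit more concrete, here is a rough sketch in Boo (the language used earlier in the thread). It is not the general O(n^3) parser mentioned above, just a naive backtracking recognizer hard-wired to the six rules listed. The function names (HasRole, NounWithAdjectivesEnds, NominalGroupEnds, IsStatement) and the tiny lexicon are made up for illustration.

```boo
# Sketch only: a naive backtracking recognizer for the toy grammar above.
# All names and the lexicon contents are hypothetical.

def HasRole( lexicon as Hash, word as string, role as string ) as bool:
  # True if the lexicon lists `role` as one of the roles `word` can take.
  key = word.ToLower()
  if( not lexicon.ContainsKey(key) ):
    return false
  roles = lexicon[key] as List
  return roles.Contains(role)

def NounWithAdjectivesEnds( lexicon as Hash, words as (string), start as int ) as List:
  # NounWithAdjectives ::= Noun | Adjective NounWithAdjectives
  # Returns every index at which a match starting at `start` can end.
  ends = []
  if( start >= words.Length ):
    return ends
  if( HasRole(lexicon, words[start], "Noun") ):
    ends.Add(start + 1)
  if( HasRole(lexicon, words[start], "Adjective") ):
    ends.Extend(NounWithAdjectivesEnds(lexicon, words, start + 1))
  return ends

def NominalGroupEnds( lexicon as Hash, words as (string), start as int ) as List:
  # NominalGroup ::= Article NounWithAdjectives | NounWithAdjectives
  ends = NounWithAdjectivesEnds(lexicon, words, start)
  if( start < words.Length and HasRole(lexicon, words[start], "Article") ):
    ends.Extend(NounWithAdjectivesEnds(lexicon, words, start + 1))
  return ends

def IsStatement( lexicon as Hash, words as (string) ) as bool:
  # Statement ::= Subject NominativeVerb Attribute | Subject TransitiveVerb DirectObject
  for subjectEnd as int in NominalGroupEnds(lexicon, words, 0):
    if( subjectEnd >= words.Length ):
      continue
    verb = words[subjectEnd]
    afterVerb = subjectEnd + 1
    # Attribute ::= Adjective, so exactly one adjective must remain.
    if( HasRole(lexicon, verb, "NominativeVerb") and afterVerb + 1 == words.Length ):
      if( HasRole(lexicon, words[afterVerb], "Adjective") ):
        return true
    # DirectObject ::= NominalGroup, which must consume the rest of the sentence.
    if( HasRole(lexicon, verb, "TransitiveVerb") ):
      if( NominalGroupEnds(lexicon, words, afterVerb).Contains(words.Length) ):
        return true
  return false

# Made-up toy lexicon: word -> roles it can take.
lexicon = {}
lexicon["tom"] = ["Noun"]
lexicon["the"] = ["Article"]
lexicon["cat"] = ["Noun"]
lexicon["black"] = ["Adjective"]
lexicon["human"] = ["Noun", "Adjective"]
lexicon["is"] = ["NominativeVerb"]
lexicon["likes"] = ["TransitiveVerb"]

print IsStatement(lexicon, ("tom", "is", "human"))
print IsStatement(lexicon, ("tom", "likes", "the", "black", "cat"))
print IsStatement(lexicon, ("is", "tom", "human"))
```

With this lexicon the first two calls should be accepted as statements and the third rejected. Note that "human" is listed as both Noun and Adjective, which is why each helper returns a list of possible end positions instead of a single one. A Question rule would need its own function along the same lines.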

Getting the grammar right might be tricky (enroll a linguist in your project :) ). Many sentences will be ambiguous (there will be several ways of describing them using the rules). You can improve your parser by knowing about number and gender of nouns and verbs... And I am sure that you can spend several lifetimes trying to get everything right, especially if you want your program to know what pronouns refer to.

I don't have good links, but Wikipedia seems to have some info. Good luck.

There isn't a simple way to do what you want as long as you're sticking to these special-case hacks. The only way you can guarantee a question is a question is if it's punctuated as such. Otherwise, how could you distinguish "tom is human." from "tom is human?"
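
As a minimal sketch of the punctuation check, assuming you still have the raw message before tokenizing (the function name EndsWithQuestionMark is made up):

```boo
# Sketch only: treat a message as a question when it ends in "?".
# EndsWithQuestionMark is a hypothetical helper name.
def EndsWithQuestionMark( message as string ) as bool:
  # Trim() drops trailing whitespace so "tom is human? " still counts.
  return message.Trim().EndsWith("?")

print EndsWithQuestionMark("tom is human?")
print EndsWithQuestionMark("tom is human.")
```

This only works if the tokenizer has not already thrown the punctuation away, so it would have to run on the original string.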

However, if you really want to persevere despite that rather large problem, and aren't willing to go down the 'proper' NLP route, there are a few further hacks you can employ. In English, a statement is generally subject/verb/object, whereas a question is verb/subject/object, with the verb generally prefixed by one of the pronouns or adverbs that signify a question (e.g. who, where, why, what, how). The verb is nearly always a form of 'to be', e.g. 'is' or 'are'. Then the rest of the sentence is generally the object of the question (the 'subject' is the 'who', 'where', etc... just to clarify the terminology). The object is usually a noun phrase, which is likely to be an article/determiner such as "a", "the", "some", followed by the noun you want.
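
Roughly, that pattern (question word, then 'is'/'are', then an optional determiner, then the noun) could be coded like this in Boo. It is only a sketch: the word lists are just the examples from this post, and the names IsOneOf and IndexOfQuestionObject are made up.

```boo
# Sketch only: question word + "to be" + optional determiner + noun.
# IsOneOf and IndexOfQuestionObject are hypothetical names.

def IsOneOf( word as string, candidates as (string) ) as bool:
  # True if `word` equals any entry of `candidates`.
  for candidate in candidates:
    if( word == candidate ):
      return true
  return false

def IndexOfQuestionObject( words as (string) ) as int:
  # Returns the index of the noun being asked about, or -1 if the
  # words do not fit the pattern.
  questionWords = ("who", "where", "why", "what", "how")
  toBe = ("is", "are")
  determiners = ("a", "an", "the", "some")
  if( words.Length < 3 ):
    return -1
  if( not IsOneOf(words[0].ToLower(), questionWords) ):
    return -1
  if( not IsOneOf(words[1].ToLower(), toBe) ):
    return -1
  i = 2
  if( IsOneOf(words[i].ToLower(), determiners) ):
    i += 1 # step over the determiner
  if( i >= words.Length ):
    return -1
  return i

print IndexOfQuestionObject(("what", "is", "a", "cat"))
print IndexOfQuestionObject(("tom", "is", "human"))
```

Anything that does not start with one of the listed question words falls through to -1, so "tom is human" is simply not treated as a question here; telling it apart from a statement still needs the punctuation or a grammar.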

By the way, what language is this? VB.Net?
Thanks Kylotan,

The language is 'Boo'.

I don't really want to go down the 'proper' NLP route because I think it's a bit overkill for the simple question/statement info-bot I am trying to write.

Some good tips here, thanks :)
-www.freewebs.com/tm1rbrt -> check out my gameboy emulator ( worklog updated regularly )
