ANNs and gene analysis

Started by August 14, 2002 05:12 PM
3 comments, last by NewDeal 22 years, 5 months ago
I have no idea whether this is a new idea or general practice. My thought is this: we know that ANNs are quite good at spotting patterns and discerning relations between input and output. What if one were to combine genetic information with a database of, for example, hair color? My guess is that a sophisticated NN would quite quickly be able to tell people's hair color from their genes. Replace hair color with genetic diseases and we might have something useful. I realise that this network would have to be quite large, but it would be possible, no? Anyway, just thought I'd share my idea.
Interesting idea, though I don't think it is very practical.

The size of the net would be daunting, both in terms of processing and in terms of getting correct pattern recognition. With ANNs, an appropriate number of training patterns must be provided to the network - a general rule of thumb is P = W/e, where P is the number of training pairs, W is the number of weights in the net, and e is the expected error.* The size of the genome would call for a training set far exceeding current sample sizes, and of course push the computational needs even higher.
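To put rough numbers on the P = W/e rule of thumb, here is a minimal Python sketch. The layer sizes are purely hypothetical (one input per gene, using the low-end figure of 30,000 genes mentioned elsewhere in this thread, plus a single hidden layer of 100 units that I made up for illustration), and the helper names are mine, not from any library.

```python
def count_weights(layer_sizes):
    """Count weights in a fully connected feed-forward net.

    Each unit gets one weight per unit in the previous layer plus a bias.
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def patterns_needed(n_weights, expected_error):
    """Rule of thumb from the post: P = W / e."""
    return n_weights / expected_error


# Hypothetical architecture: 30,000 inputs, 100 hidden units, 1 output.
layers = [30_000, 100, 1]
W = count_weights(layers)
P = patterns_needed(W, expected_error=0.1)

print(f"weights: {W:,}")
print(f"training pairs suggested for 10% error: {P:,.0f}")
```

Even this small, assumed architecture has about 3 million weights, so the rule of thumb asks for roughly 30 million training pairs at 10% expected error, which is the sample size problem described above.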

A better way, considering the lack of samples (compared to the number the net would need) and the computational limitations, would be a brute force algorithm. Simply search for characteristics shared within a subset of the population (say, people with red hair), then eliminate all the matches that are also common in the general population (those without red hair).
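The brute force idea can be sketched in a few lines of Python. This is only an illustration under strong assumptions: each person's genome is reduced to a set of made-up variant IDs, the trait is a simple yes/no, and `candidate_markers` and `max_background` are names I invented for the example.

```python
def candidate_markers(genomes, has_trait, max_background=0.0):
    """Return variants shared by every trait carrier and rare elsewhere.

    genomes: dict mapping person -> set of variant IDs (toy representation).
    has_trait: set of people who show the trait (e.g. red hair).
    max_background: highest allowed frequency among people without the trait.
    """
    carriers = [g for person, g in genomes.items() if person in has_trait]
    others = [g for person, g in genomes.items() if person not in has_trait]
    if not carriers:
        return set()

    # Step 1: characteristics shared within the trait subset.
    shared = set.intersection(*carriers)

    # Step 2: eliminate matches that are also common in everyone else.
    result = set()
    for variant in shared:
        freq = sum(variant in g for g in others) / len(others) if others else 0.0
        if freq <= max_background:
            result.add(variant)
    return result


# Toy data: v1 is carried by everyone, v2 only by the red-haired people.
genomes = {
    "ann": {"v1", "v2", "v7"},
    "bob": {"v1", "v2", "v5"},
    "cat": {"v1", "v3"},
    "dan": {"v1", "v4", "v5"},
}
red_haired = {"ann", "bob"}

print(candidate_markers(genomes, red_haired))  # {'v2'}
```

Raising `max_background` above zero loosens the elimination step, which would matter for real data where a variant is rarely absent from the whole comparison group.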

There are complicating issues at play (such as sex-determined traits) that would make it more difficult for both the neural net and a brute force algorithm to work - though I believe that brute force would normally perform better in these cases. A neural net may converge to a local minimum that ignores some trait, while a brute force algorithm would end up leaving (at least) the sex gene and the actual decisive gene - a better response.

Of course, neural nets are supposed to be a form of optimization - something to avoid the brute force method - but I feel that the input space is just too large to allow fast or correct results. This is one of those times when the dimensionality of the problem is just too high, and the net can't compare to even a simplistic method.

- mongrelProgrammer

*This is a rule of thumb with some theoretical basis (Baum & Haussler, 1989)
- I hate these user ratings. Please rate me down. (Seriously) -
Oh well. Did a search and apparently entire books have been written on the subject.

Still interesting though (some day I might even think up something original).

Other confounding issues also arise. You mentioned hair color as an example - suppose you and I have the same hair color and we also happen to have the exact same eye color. From the two of us alone, the net won't be able to tell which gene is for hair and which is for eye color. Now, let's throw another person into the mix with the same hair color as ours but a different eye color. This gives the net the ability to recognize the difference, but consider that the genome is composed of 3 billion nucleotide pairs representing 30,000 to 100,000 genes. We effectively need to be able to separate each of these to avoid false classifications.

The take-home message is that there will be a lot of random correlations and hence a huge amount of noise in the system. The signal we're trying to detect is tiny and only slightly above the noise. We would need an extremely sensitive classifier to get around these problems, and such a classifier may not exist.
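A small simulation gives a feel for how easily random correlations show up when there are far more genes than people. This is a sketch with made-up numbers (20 people, 2,000 purely random binary "genes", none of which has anything to do with the trait); the function names are mine.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def max_spurious_correlation(n_people=20, n_genes=2000, seed=0):
    """Strongest |correlation| between a trait and genes that are pure noise."""
    rng = random.Random(seed)
    # Half of the people have the trait, half do not.
    trait = [1] * (n_people // 2) + [0] * (n_people - n_people // 2)
    best = 0.0
    for _ in range(n_genes):
        # Each "gene" is a coin flip per person, unrelated to the trait.
        gene = [rng.randint(0, 1) for _ in range(n_people)]
        best = max(best, abs(pearson(gene, trait)))
    return best


print(f"strongest correlation found in pure noise: {max_spurious_correlation():.2f}")
```

With so few people and so many candidate genes, some noise gene will almost always line up with the trait fairly well by chance, and a classifier has no way to tell that match apart from a real but weak signal.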

-Kirk
What you're talking about is finding correlations in gene expression data. Pattern recognition (classification) is used on this sort of problem, but neural networks are not generally the tool of choice because of their limitations: the size of the network scales with the size of the data set.

If you're interested in finding out more about this area, look for publications on clustering, classification and gene expression.

Also look up The Human Genome Project, which uses a direct implementation of these sorts of ideas.

Cheers,

Timkin
