Quote: Right now, you have 35 input variables (the 5x7 raster array). Using the horizontal and vertical projection profiles (row and column averages), though, would get you down to 12 input variables (5 vertical averages + 7 horizontal ones). Even throwing in a few more features, you could probably cut the number of input variables in half and still retain the most important information in the images.
Quote: I had not thought of doing something like this. I am wondering how I could define each section differently. How would I add up the values so that if we are looking at a horizontal line it is 01001 it would look different from 10010. As well my actual goal was to make it have 112 or 138 inputs with 28 outputs, would what you are saying still work?
The row sums of 01001 and 10010 would indeed be identical (both are 2): summaries always discard some information. The idea is that, given enough different features, you can still tell the classes apart. Consider the following letter 'F':
11111
10000
10000
11100
10000
10000
10000
Row sums would not be sufficient to distinguish this from a 'reverse F':
11111
00001
00001
00111
00001
00001
00001
Column sums, however, would be very different (especially those on the extreme left and right).
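To make the comparison concrete, here is a minimal Python sketch that computes both sets of sums for the two glyphs above. The helper names (to_grid, mirror, row_sums, col_sums) are made up for this illustration and are not from any library.

```python
# The 5x7 'F' glyph from the post, one string per raster row.
F_ROWS = [
    "11111",
    "10000",
    "10000",
    "11100",
    "10000",
    "10000",
    "10000",
]

def to_grid(rows):
    """Convert strings of '0'/'1' into a list of integer rows."""
    return [[int(ch) for ch in row] for row in rows]

def mirror(grid):
    """Flip the glyph left to right, giving the 'reverse F'."""
    return [row[::-1] for row in grid]

def row_sums(grid):
    """Number of '1' pixels in each row (7 values for a 5x7 glyph)."""
    return [sum(row) for row in grid]

def col_sums(grid):
    """Number of '1' pixels in each column (5 values for a 5x7 glyph)."""
    return [sum(col) for col in zip(*grid)]

f_grid = to_grid(F_ROWS)
rev_grid = mirror(f_grid)

print(row_sums(f_grid))    # [5, 1, 1, 3, 1, 1, 1]
print(row_sums(rev_grid))  # [5, 1, 1, 3, 1, 1, 1]  (identical)
print(col_sums(f_grid))    # [7, 2, 2, 1, 1]
print(col_sums(rev_grid))  # [1, 1, 2, 2, 7]

# The 12-value feature vector mentioned in the quote: 7 row sums + 5 column sums.
features = row_sums(f_grid) + col_sums(f_grid)
```

Dividing each sum by the row or column length would give the averages mentioned in the first quote; the ordering of the values, and so the ability to separate the two glyphs, is unchanged.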
Naturally, other summaries could be used, such as the center of gravity of the '1' pixels, sums of other pixel regions, counts of 0-1 transitions, etc.
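Two of these summaries can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are hypothetical, and "0-1 transitions" is read here as the number of places in a row where a 0 is immediately followed by a 1; other definitions (for example, counting changes in either direction) are equally reasonable.

```python
F_GRID = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
REV_GRID = [row[::-1] for row in F_GRID]  # the 'reverse F'

def center_of_gravity(grid):
    """Mean (row, column) position of the '1' pixels, or None if there are none."""
    coords = [(r, c)
              for r, row in enumerate(grid)
              for c, value in enumerate(row)
              if value == 1]
    if not coords:
        return None
    n = len(coords)
    return (sum(r for r, _ in coords) / n, sum(c for _, c in coords) / n)

def rising_transitions(row):
    """Count positions where a 0 is immediately followed by a 1 (assumed definition)."""
    return sum(1 for a, b in zip(row, row[1:]) if a == 0 and b == 1)

print(center_of_gravity(F_GRID)[1])    # 1.0  (mass sits toward the left)
print(center_of_gravity(REV_GRID)[1])  # 3.0  (mass sits toward the right)

print(rising_transitions([0, 1, 0, 0, 1]))  # 2
print(rising_transitions([1, 0, 0, 1, 0]))  # 1
```

Note that, under this definition, the transition count separates the 01001 and 10010 rows from the question even though their sums are equal.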
Another route to consider is simple pixel selection. Some pixels likely provide more information about the class than others. In my experience with character and shape recognition in raster images, some pixel locations often provide no information at all!
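One very simple way to act on this, sketched below in Python, is to drop every pixel position whose value never changes across the training images, since such a pixel cannot help separate classes. This is my own illustrative criterion with hypothetical function names, and the two-glyph "training set" is only a toy; on real data you would run it over all labelled examples and might prefer a measure that takes the class labels into account.

```python
F_GRID = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
REV_GRID = [row[::-1] for row in F_GRID]  # the 'reverse F'

def flatten(grid):
    """Turn a 5x7 grid into a flat list of 35 pixel values, row by row."""
    return [value for row in grid for value in row]

def varying_pixels(images):
    """Indices of pixel positions that take more than one value across the images."""
    n_pixels = len(images[0])
    return [i for i in range(n_pixels)
            if len({image[i] for image in images}) > 1]

def select(image, keep):
    """Reduce an image to the chosen pixel positions only."""
    return [image[i] for i in keep]

images = [flatten(F_GRID), flatten(REV_GRID)]  # toy data set, for illustration only
keep = varying_pixels(images)

print(len(images[0]))  # 35 pixels originally
print(len(keep))       # 14 pixels differ between the two glyphs
reduced = [select(image, keep) for image in images]
```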
-Will Dwinnell
Data Mining in MATLAB