
If a neural net can abstract an input from other inputs, is it better to calculate that input for it so it doesn't have to abstract it? Does making the input layer wider reduce the depth needed?

Started May 20, 2023, 03:54 PM
6 comments, last by alvaro

Hello, just diving into more understandings about neural nets and concepts.

Probably a little premature to think about optimization and compute times, but I guess I was looking for pointers regarding efficient architecture of neural nets.

If I'm approaching the classic "teach a rigged skeleton to walk from scratch" problem, there are a lot of inputs per joint: position, velocity, momentum, etc. But there are also a lot of inferable details. Giving it the COG of individual limbs might allow it to infer the overall COG, but it's easy enough to just hand it the overall COG as an input. That got me thinking about many other combinations, like whether I should also give individual groups of joints their own COG as inputs. It might be important information that would save some of the abstract learning it has to do with depth. But then I realized one could get almost infinitely granular with every permutation of inputs to combine.

If I only give it inputs about its joints and which direction it's facing, it could infer how far its feet are from the ground, but is it better to tell it? Then why not tell it how far every joint is above the ground? And also relative to every other joint, and so on? So is it better to give the neural net inputs it may never fully use than to have it abstract them through more layers/neurons?
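For context, here's a minimal sketch of what I mean by handing it precomputed extras alongside the raw joint state; the names and array shapes are just placeholders, not tied to any particular physics engine:

```python
import numpy as np

def build_observation(joint_pos, joint_vel, joint_mass, facing_dir):
    """Raw per-joint state plus a few cheap derived features.

    joint_pos, joint_vel: (N, 3) world-space positions and velocities.
    joint_mass: (N,) per-joint masses.
    facing_dir: (3,) unit vector the character is facing.
    """
    total_mass = joint_mass.sum()
    # Derived feature the net could learn on its own, but is trivial to precompute:
    overall_cog = (joint_mass[:, None] * joint_pos).sum(axis=0) / total_mass
    # Another cheap extra: height of every joint above the ground plane (z-up assumed).
    joint_heights = joint_pos[:, 2]

    return np.concatenate([
        joint_pos.ravel(),
        joint_vel.ravel(),
        joint_heights,
        overall_cog,
        facing_dir,
    ])
```

Each extra derived quantity only widens the observation by a handful of values, so adding or removing one is a cheap experiment to run.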

The overall intent is to eventually teach it to box other AIs and use an RL tournament to continuously refine the fighters, fwiw. So I'm hoping to eventually incorporate collisions, forces applied, how weight, momentum, and different body shapes contribute to energy efficiency, and to give it all of its opponent's inputs as its "sight," which adds even more complexity to its inputs if it can also see how fast its opponent's hands are moving, where they are relative to their face, and so on.

If anybody has recommended tutorials/walk-throughs for any of this, I'd appreciate the links, thanks.

What works and what doesn't work in neural networks is not completely understood, so you just have to try.

I have just started playing around with RL, but I have some experience with more traditional supervised learning, mainly with CNNs. In my experience, adding a richer set of inputs is inexpensive in terms of computation, because the intermediate layers of the network tend to have many more channels than the input. If your network is large enough that you don't have trouble keeping the GPU busy, a richer set of inputs won't hurt, and it can probably help in the early stages of learning. If computing the extra inputs becomes a bottleneck, it may be better to leave them out and keep the input lean.

My instinct would be to provide orientations and angular velocities for every bone, plus positions and velocities for every bone and/or joint. But, as I said, it's very hard to predict what works.
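To put rough numbers on the "richer inputs are cheap" point, here's a quick parameter count for a small hypothetical fully connected policy; the layer widths are made up purely for illustration:

```python
def mlp_param_count(layer_sizes):
    """Weights + biases for a plain fully connected net with these layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

lean = mlp_param_count([60, 256, 256, 12])   # raw joint state only
rich = mlp_param_count([90, 256, 256, 12])   # plus 30 precomputed features

print(lean, rich, rich - lean)  # 84492 92172 7680: the hidden layers dominate the cost
```

The 30 extra inputs add well under 10% to the parameter count here, which is the sense in which a richer input is nearly free.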


What you described is part of the reason machine learning isn't normally used for this type of thing.

We don't want machine learning to have to re-invent the walk cycle. Creating fun means that animators make something that is fun and unique: this character walks with a spring in their step, that character wobbles, and so on.

It is certainly possible to help encourage something like that to evolve, but doing it through reinforcement learning is going to be a long, slow, painful process where the learning algorithm will probably figure out thousands of ways to create locomotion, likely many thousands of variations of inchworms and snakes and tumblers. Arriving at human-like or animal-like bipedal or quadrupedal motion through reinforcement alone is extremely unlikely.

That said, when it comes to which input sets to provide: for other systems I've worked on, mostly gesture-recognition work, we try to provide sanitized and normalized direct inputs like linear and angular velocity and linear and angular displacement from the start point, plus a few computed values like the sum of the angles and the sum of the angles squared, which would otherwise be difficult or impossible to infer from the input set. The curse of dimensionality is real: every input you add makes learning an order of magnitude more difficult, until you hit obscenely high numbers and then it's just a huge training problem.
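As a rough sketch of that kind of feature set, assuming a gesture recorded as 2D points at a fixed sample rate (the function and names are hypothetical):

```python
import numpy as np

def gesture_features(points, dt):
    """A small, normalized feature vector from a gesture trajectory.

    points: (T, 2) positions sampled every dt seconds, T >= 3.
    """
    deltas = np.diff(points, axis=0)              # per-step displacement
    speeds = np.linalg.norm(deltas, axis=1) / dt  # linear speed per step
    headings = np.unwrap(np.arctan2(deltas[:, 1], deltas[:, 0]))
    turns = np.diff(headings)                     # signed turning angle per step

    return np.array([
        np.linalg.norm(points[-1] - points[0]),   # displacement from the start point
        speeds.mean(),                            # average linear speed
        turns.sum(),                              # net rotation (sum of the angles)
        np.abs(turns).sum(),                      # total turning
        (turns ** 2).sum(),                       # sum of the angles squared
    ])
```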

Adding more inputs lets you compute or measure cheaply and exactly something that the neural network would otherwise estimate only approximately, and less cheaply, after spending significant resources learning how to.

"Saving" on inputs can hurt the learning and representation of genuinely necessary quantities, because a value that could be immediately available only appears at a later layer, tying up weights to compute it that could otherwise represent something useful.

At worst, unnecessary inputs can be recognized automatically and pruned as redundant in the “release” version of the neural network, possibly after transforming the network to ignore them.
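A minimal sketch of that pruning idea, assuming a plain PyTorch MLP; scoring inputs purely by first-layer weight magnitude is a crude heuristic, but it shows the shape of the thing:

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(90, 256), nn.ReLU(),
                       nn.Linear(256, 256), nn.ReLU(),
                       nn.Linear(256, 12))

# After training, score each input column by how strongly the first layer reads it.
with torch.no_grad():
    first_layer = policy[0].weight                # shape (256, 90)
    input_scores = first_layer.abs().sum(dim=0)   # one score per input feature
    redundant = input_scores < 1e-3               # inputs the network barely uses

print(f"{int(redundant.sum())} inputs look redundant and could be dropped "
      f"from the release build's feature computation.")
```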

Omae Wa Mou Shindeiru

@frob Have you seen all those recent papers (Two Minute Papers covers a lot of them) where neural nets/auto-encoders are primed first with mocap data? It's a little above my ability to grok just yet, and I'm not sure how far these are from base vanilla neural nets, but it seems like just a dash of that data to seed the NN lets it learn the motions as well as synthesize combinations or transitions.

Are these ways to “jump-start” traditional feed-forward NNs or are they something much more complex?


Pretraining is increasingly pretty standard. That's the P in GPT.
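In the simplest framing it really is a jump-start: pretrain an encoder on mocap frames with a reconstruction loss, then reuse it as the first layers of the policy instead of starting from random weights. A hedged PyTorch sketch, with the mocap tensor and all sizes as placeholders:

```python
import torch
import torch.nn as nn

pose_dim, latent_dim, action_dim = 60, 16, 12
encoder = nn.Sequential(nn.Linear(pose_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, pose_dim))

mocap = torch.randn(10_000, pose_dim)  # stand-in for real mocap poses
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

# Pretraining: the autoencoder learns a compact representation of plausible poses.
for _ in range(200):
    batch = mocap[torch.randint(0, len(mocap), (256,))]
    loss = nn.functional.mse_loss(decoder(encoder(batch)), batch)
    opt.zero_grad(); loss.backward(); opt.step()

# The RL policy then starts from the pretrained encoder rather than random weights.
policy = nn.Sequential(encoder, nn.Linear(latent_dim, action_dim))
```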

If you can afford to do it and have the experience and the budget for it, go for it. EA and Nvidia are multi-billion-dollar companies with a global presence and a lot of skilled, experienced people. But it isn't commonly done, as mentioned, because first and foremost it removes control over the fun. You'll notice in the second video: "Simulation time: 10 years, real time 10 days, samples: 10 billion". If you as an individual are letting an AI training system run continuously for nearly two weeks, and working on the general logic and planning for months on end, you're not building your game; you're building an AI system.

Most developers don't have the ability to make ANY system that approaches the goal, but if you have access to a pretrained model, piles of mocap data, and the experience needed (none of which were evident in the early posts), that is a different story.


I totally agree with frob. But I think the original question was from someone who is just trying to understand these systems better. Playing around with this type of problem can give you a deeper understanding of the techniques and a lot of intuition, much more than just reading the papers. But if you are trying to use this in a game, training it yourself is not feasible. One of these large companies could package the trained controller into a library, but as far as I know they haven't.

