If I implement move & remove as a single move unit, I could have for example 9x9 moves that I need to generate. Has this performance issues instead of first generating 9 moves, make the move, and then generate another 9 moves for the remove?
9x9 moves? Can you have that many mill-completing moves available? Since you are worrying about performance, you should primarily be concerned with the average performance of the program in positions where the game hasn't been decided yet. So I expect you typically will have very few mill-completing moves and the actual list of moves will be short.
If generating a lot of useless moves (because a beta cut-off will prove them useless) becomes an issue, you could use an object that generates the moves in stages, using a pattern that old geese like me call a "co-routine". That way your code just asks for a next move from the object, and the object can internally only generate the removals as they are needed. But it's almost certainly not worth the effort in this case.
If you are worried about performance, the alpha-beta algorithm performs much much better if you are clever about how you sort the moves. Having all the moves available in a list will allow you more flexibility as to the types of heuristics you can use for move ordering, so that's another argument for generating the complete list.