Python is not the optimal language for CPU-intensive operations because it intrinsically has layers of indirection between the language and the processor - the bytecode interpreter, the C objects representing your variables, the C function calls that operate on those objects, etc.
Therefore optimising Python is often a case of reducing the number of operations executed by the interpreter and letting C code do the work instead. This can be done in a few ways:
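1) Replacing algorithms implemented in Python with standard library or built-in functions. For example, sometimes an explicit loop can be replaced with a call to the map() function - this moves the looping part into C code, and therefore is sometimes faster. The gain is largest when the function being mapped is itself a built-in; if you pass a lambda, each call still goes back through the interpreter and most of the benefit is lost.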
2) Using third-party C libraries like numpy (see above). For heavy mathematical lifting this is usually the first choice.
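3) Replacing the standard Python interpreter with one that compiles down to machine code. PyPy is the main choice here: it uses a just-in-time compiler, so long-running loops benefit the most and short scripts may see little gain. Note that some C extensions won't work with it, or will run slower than they do on the standard interpreter.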
4) Creating extensions in C that your Python program can use. You can make a module in C that Python can import and it'll usually run a lot faster than the equivalent Python code. Cython can help you a lot here.
5) Ditching Python entirely in favour of a faster language. Python is great but sometimes it's the wrong tool for the job, so this might be a good time to consider whether you're using the right language. Consider also the target platforms and deployment method - Python is not great for mobile, doesn't run natively in web browsers, and packaging it up for deployment to end users can be awkward.
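To illustrate option 1 above, here is a minimal sketch that compares an explicit loop with the same calculation done through built-ins. The function names and the sample data are made up for the example, and the timings will vary by machine and Python version, so treat it as a way to measure your own case rather than a guaranteed speedup.

```python
import timeit


def total_length_loop(words):
    # Explicit Python-level loop: every iteration is executed
    # by the bytecode interpreter.
    total = 0
    for word in words:
        total += len(word)
    return total


def total_length_builtin(words):
    # sum(), map() and len() are all implemented in C in CPython,
    # so the looping happens inside C code.
    return sum(map(len, words))


if __name__ == "__main__":
    # Hypothetical sample data, just large enough to give measurable timings.
    words = ["spam", "eggs", "ham"] * 100_000

    # Both versions must agree before the timings mean anything.
    assert total_length_loop(words) == total_length_builtin(words)

    for fn in (total_length_loop, total_length_builtin):
        seconds = timeit.timeit(lambda: fn(words), number=10)
        print(f"{fn.__name__}: {seconds:.3f}s for 10 runs")
```

If the two timings come out close for your real workload, the readability of the plain loop may be worth more than the difference.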