Machine Learning is a large field, and it often gets conflated with Deep Learning (which is a subset of the field). Machine Learning is really open to anyone who can code and has a basic understanding of linear algebra (though a deeper understanding doesn't hurt).
It all comes down to the problem you are trying to solve, the model that best represents that problem and its solution, and how much visibility you want into how the algorithm works.
There are two general classes that Machine Learning algorithms typically fall into: supervised learning and unsupervised learning. Supervised learning requires that you present the algorithm with labeled training examples for feature extraction and processing. Unsupervised learning involves no labeled training data; instead, you hand the algorithm an unlabeled dataset and let it attempt to cluster similar items in the dataset together.
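To make that concrete, here's a tiny sketch of the difference in Python (assuming scikit-learn is installed; the toy points, labels, and parameter choices are just made up for illustration): a supervised classifier is fit on features plus labels, while a clustering algorithm only ever sees the features.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X = [[1, 1], [1, 2], [8, 8], [9, 8]]      # feature vectors
y = ["small", "small", "large", "large"]  # labels (used by supervised learning only)

# Supervised: the algorithm learns from labeled examples.
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[2, 1]]))              # -> ['small']

# Unsupervised: no labels, the algorithm just groups similar points.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                         # e.g. [0 0 1 1]
```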
Two common algorithms that are used heavily in Machine Learning (under the context of supervised learning) are Naive Bayes and ID3/C4.5. The former is quite good at giving classification results with minimal training data, excels at generalization, and doesn't suffer as much from over-fitting (becoming so correlated with your training data that you lose the ability to generalize effectively to new data). One drawback is the "naive" part: the algorithm assumes the variables in a feature set are independent of one another. This normally shows up in datasets where features are highly correlated by nature, though Naive Bayes can still give good results in such cases.
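Here's a rough, from-scratch sketch of what a multinomial Naive Bayes text classifier looks like in Python; the tokenization, toy dataset, and helper names are all invented for illustration, not taken from any particular library:

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (list_of_words, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)   # label -> word -> count
    vocab = set()
    for words, label in examples:
        class_counts[label] += 1
        for w in words:
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict(words, class_counts, word_counts, vocab):
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Work in log space so many small probabilities don't underflow.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in words:
            # Laplace (add-one) smoothing handles words unseen in this class.
            count = word_counts[label][w] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy usage
data = [
    ("great fun loved it".split(), "pos"),
    ("terrible boring waste".split(), "neg"),
    ("loved the acting great film".split(), "pos"),
    ("boring plot terrible pacing".split(), "neg"),
]
model = train(data)
print(predict("great film loved it".split(), *model))   # -> "pos"
```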
ID3/C4.5 are decision trees. You know the kind management brings up in conferences? These are admittedly a bit different than that, but they follow the same general logic: if x then this, if y then this. They can also be considered under the machine learning umbrella because of how they are built: labeled training data is distilled with entropy formulas to find optimal splits in the tree. One common issue for most decision trees, however, is over-fitting your training data (your tree becomes perfect at classifying the training data, but is unable to classify or generalize new data, because the logic of stepping down the tree's nodes/selected features has become too correlated with the initial training data). This problem is normally combated by keeping a subset of the training data on hand and running it against the tree as if it were unseen data, to find nodes that you can potentially prune for better generalization.
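Here's a small sketch of the entropy/information-gain calculation that ID3 uses to pick which feature to split on; the toy dataset and feature names are made up for the example:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Gain = entropy(parent) - weighted entropy of children after splitting."""
    parent = entropy(labels)
    splits = {}
    for row, label in zip(rows, labels):
        splits.setdefault(row[feature_index], []).append(label)
    weighted = sum(len(child) / len(labels) * entropy(child) for child in splits.values())
    return parent - weighted

# Toy data: [outlook, windy] -> play?
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes"), ("overcast", "no")]
labels = ["no", "no", "yes", "no", "yes"]
for i, name in enumerate(["outlook", "windy"]):
    print(name, round(information_gain(rows, labels, i), 3))
# ID3 splits on whichever feature gives the highest gain, then recurses on each branch.
```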
The cool thing about both of these algorithms is that I feel even a motivated beginner could homegrow/code them without substantial help. However, there are all sorts of other algorithms: random forests, SVMs, linear regression, and the list goes on and on.
ANNs usually fall into the deep-learning category and excel at certain tasks (image recognition, for example). However, using them for sentiment analysis or text classification is usually overkill when a Bag of Words/Bayesian approach can give similar results with similar accuracy, less probability of over-fitting, and less of a black box (though arguably multiplying thousands of probabilities together when running the algorithm isn't much better; in practice you sum log probabilities instead to avoid underflow), and it's easier in general to code and tweak.
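For comparison, the whole Bag of Words + Naive Bayes pipeline for text classification fits in a handful of lines with scikit-learn (assuming it's available; the tiny dataset here is invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["loved this movie", "great acting and plot",
         "boring and predictable", "what a waste of time"]
labels = ["pos", "pos", "neg", "neg"]

# CountVectorizer builds the bag-of-words matrix; MultinomialNB does the Bayesian part.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["what a great movie"]))   # likely ['pos']
```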
I'd also recommend checking this out. It gives a good, non-mathematical perspective on a typical Bayesian classifier. It's beginner-oriented, and the author articulates himself well enough that what he covers is easily transferable to code.
https://monkeylearn.com/blog/practical-explanation-naive-bayes-classifier/