I use scikit-learn with Python for all of my ML projects. It's really easy and straightforward to use and to get off the ground.
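Here's roughly what that looks like in practice. This is a minimal sketch using scikit-learn's bundled iris dataset and a k-nearest-neighbours classifier, both picked just for illustration. The point is that nearly every estimator follows the same `fit` / `predict` / `score` pattern.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small built-in dataset: 150 iris flowers, 4 features, 3 classes
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows for evaluation; fix the seed so the split is repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Every scikit-learn estimator follows the same fit / predict / score pattern
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Swapping in a different model is usually a one-line change, which is a big part of why it's so quick to get started with.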
I've done quite a bit of text mining work. If you're interested in tweets and sentiment, this was a good dataset:
http://thinknook.com/twitter-sentiment-analysis-training-corpus-dataset-2012-09-22/
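If you want to try that kind of text mining, a common baseline is TF-IDF features plus a linear classifier. Here's a minimal sketch with scikit-learn. The eight tweets are made up and only stand in for the real corpus, which you'd load from the downloaded file instead (I'm not assuming anything about its exact column layout).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy tweets standing in for a real labelled corpus (1 = positive, 0 = negative)
tweets = [
    "I love this phone, it is great",
    "what a great day, so happy",
    "happy to see you, love it",
    "this is awesome and great",
    "I hate waiting, this is terrible",
    "worst service ever, so sad",
    "terrible movie, I hate it",
    "so sad and angry today",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TF-IDF turns each tweet into a sparse vector of weighted word counts;
# logistic regression then learns one weight per word
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(tweets, labels)

predictions = clf.predict(["great phone, love it", "terrible, I hate this"])
print(predictions)
```

With a real dataset you'd also hold out a test set, and things like emoticons and hashtags often need their own preprocessing.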
I've done a lot of random stuff in machine learning. What specifically are you interested in?
Ooh, this is definitely the sort of thing I'm looking for. I'm not sure if I'm looking for any particular type of dataset at this point, other than something that'll help me maintain interest. I guess if I had to choose, I'd pick stock prices and brain scan data. Brain scans just seem inherently interesting, and stock prices because it seems like being able to predict even incrementally better than what I can do without machine learning would have some tangible value.
Hmm, well, stock prices have tons of datasets out there, since stock prediction is one of the most popular applications of ML. A Google search turned up these, which look pretty good:
https://archive.ics.uci.edu/ml/datasets/dow+jones+index
http://pages.swcp.com/stocks/
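Whichever dataset you pick, you still have to turn the price history into a supervised learning problem. One simple (and by no means the only) framing: use the last few daily returns as features and predict whether the next day's return is up or down. Here's a sketch, assuming synthetic random-walk prices in place of real data and logistic regression as an arbitrary choice of model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic closing prices (a random walk); swap in a real series from one of the datasets above
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=500)))

# Daily returns: returns[t] = prices[t + 1] / prices[t] - 1
returns = prices[1:] / prices[:-1] - 1

# Features: the previous n_lags returns; target: whether the next return is positive
n_lags = 5
X = np.array([returns[t - n_lags:t] for t in range(n_lags, len(returns))])
y = (returns[n_lags:] > 0).astype(int)

# Split by time, never shuffle: train on the past, test on the future
split = int(0.8 * len(X))
model = LogisticRegression().fit(X[:split], y[:split])
test_accuracy = model.score(X[split:], y[split:])
print(f"Directional accuracy on held-out days: {test_accuracy:.2f}")
```

On a pure random walk you should expect roughly coin-flip accuracy, which makes a useful sanity baseline. The important design choice is splitting by time rather than shuffling, so the model never trains on data from the future.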
There's a ton more out there for stocks. Brain scans also seem to have sources:
http://www.oasis-brains.org
http://brain-development.org/ixi-dataset/
A note on brain scan data: you will probably have to do some image processing on those, depending on which method you use. Methods like Principal Component Analysis (PCA) and cascade classifiers are really popular. For image-based applications, definitely also have a look at OpenCV (available in many different languages, including Python and C++) in conjunction with an ML library.
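As a rough illustration of the PCA part, here's a sketch that flattens a stack of images and reduces them to 20 components with scikit-learn. The random 64x64 arrays are placeholders; real scans would first need to be loaded and resampled to a common shape, and 20 components is an arbitrary choice.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for brain-scan slices: 100 grayscale 64x64 images
rng = np.random.default_rng(0)
images = rng.random((100, 64, 64))

# PCA expects a 2-D (n_samples, n_features) array, so flatten each image into one row
X = images.reshape(len(images), -1)  # shape (100, 4096)

# Keep the 20 directions of largest variance
pca = PCA(n_components=20)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)  # (100, 20)
print(f"Variance explained: {pca.explained_variance_ratio_.sum():.2%}")
```

The reduced 20-column matrix can then be fed to any classifier in place of the raw 4,096 pixel values.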
Generally, the more obscure the problem, the less data there is. I recently did some work on bank strength testing with machine learning techniques. The idea was to predict whether a bank would fail based on historical data, given a set of variables. We used a support vector machine model to classify banks as safe or unsafe. The problem is that there isn't a ton of data out there, especially for banks after the financial crisis, and especially for banks that failed. The results were pretty decent all things considered, but it's a good example of where data shortages can occur.
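For a case like that, a minimal sketch of an SVM classifier with scarce, imbalanced positives might look like this. The data is synthetic (`make_classification` with roughly 10% "failed" banks), not the data from that project, and `class_weight="balanced"` is one common way to handle the imbalance, not necessarily what we did.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical synthetic "bank" data: 300 banks, 6 variables, roughly 10% labelled as failed (class 1)
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    weights=[0.9, 0.1], random_state=0,
)

# Stratify so the few failed banks are split proportionally between train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# SVMs are sensitive to feature scale, so standardize first.
# class_weight="balanced" up-weights the rare "failed" class so the model
# is not rewarded for simply predicting "safe" for every bank.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred, target_names=["safe", "failed"]))
```

With so few failures, look at per-class precision and recall in the report rather than overall accuracy; a model that calls every bank safe would still score around 90%.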
A lot of ML, though, is just about formulating the problem in a way that makes it solvable with ML techniques. Things like selecting variables and finding good data with those variables are where a lot of the work is, followed by comparing the different methods. ML is really hit or miss, though: for any given application, it either works really well or is just a terrible idea. Still really interesting, one way or another!
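On comparing methods: a simple, fair approach is to run each candidate through the same cross-validation folds. Here's a sketch using scikit-learn's bundled breast cancer dataset, with three models picked arbitrarily for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Scale-sensitive models get a StandardScaler inside the pipeline,
# so scaling is refit on each training fold (no leakage into the test fold)
candidates = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC()),
    "random forest": RandomForestClassifier(random_state=0),
}

# cv=5 on a classifier uses unshuffled stratified 5-fold, so every model sees the same folds
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name:20s} {scores.mean():.3f} ± {scores.std():.3f}")
```

The spread across folds matters as much as the mean: if two models' ranges overlap heavily, the "winner" may just be noise.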