Here is my understanding of machine learning, in beginner's words:
- Create an algorithm.
- Create a program to exercise your algorithm.
- Tell the program which of the algorithm's parameters it can change.
- Feed the program some samples to run against the algorithm, with the points of interest marked in them.
- Next, let the program loose on your algorithm with more sample sets, letting it tweak the algorithm's parameters until it finds the values that best produce the expected outcome for each sample set.
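Here is a rough sketch of that loop in Python, purely for illustration. It is not Paddle Mate's real detector: the threshold-crossing "stroke detector", its two tweakable parameters (`threshold` and `min_gap`), and the toy signals are all hypothetical stand-ins. The "machine" doing the tweaking is the bluntest option available, an exhaustive grid search that tries every parameter combination and keeps the one whose stroke counts best match the hand-marked samples.

```python
import itertools

# Hypothetical toy data: a "stroke" is a big spike, a "bump" is small noise.
STROKE = [0.0, 0.5, 1.0, 0.5, 0.0]
BUMP = [0.0, 0.3, 0.0]

# Hand-marked samples: (sensor signal, number of strokes a human counted).
SAMPLES = [
    (STROKE + BUMP + STROKE, 2),
    (STROKE * 3, 3),
    (BUMP + STROKE + BUMP, 1),
]


def detect_strokes(signal, threshold, min_gap):
    """Hypothetical detector: return the indices where the signal rises
    through `threshold`, skipping any rise that comes within `min_gap`
    samples of the previously detected stroke."""
    strokes = []
    last = -min_gap - 1  # guarantees the first rise is always accepted
    for i in range(1, len(signal)):
        rose = signal[i - 1] < threshold <= signal[i]
        if rose and i - last > min_gap:
            strokes.append(i)
            last = i
    return strokes


def total_error(params, samples):
    """Sum of |detected - expected| stroke counts over all samples."""
    threshold, min_gap = params
    return sum(
        abs(len(detect_strokes(signal, threshold, min_gap)) - expected)
        for signal, expected in samples
    )


def tune(samples, thresholds, min_gaps):
    """Exhaustive grid search: try every (threshold, min_gap) pair and
    keep the one with the lowest error; ties go to the first pair tried."""
    return min(
        itertools.product(thresholds, min_gaps),
        key=lambda params: total_error(params, samples),
    )


if __name__ == "__main__":
    best = tune(SAMPLES, thresholds=[0.2, 0.4, 0.6, 0.8], min_gaps=[0, 2, 4, 6])
    print("best (threshold, min_gap):", best, "error:", total_error(best, SAMPLES))
```

Running it picks `(0.4, 0)` with an error of 0. Every candidate with a 0.2 threshold is rejected because that threshold counts the small bumps as strokes. Grid search is easy to understand, but the number of combinations multiplies with every parameter you add, so it gets slow quickly.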
Ok, but how the hell do you do that?
At first, as I've covered in past posts, I had to sit and play with the sample sets to find patterns that I could boil down into an algorithm that would detect strokes. Next, I had to figure out which values I could tweak to make the stroke detection more accurate. Then I tweaked those values by hand until I found ones that worked well for my stroke detection algorithm.
As you can see, at the moment I am the machine. Next, it will be time to replace my manual work with the machine part of machine learning.
In that vein, I have done some research into what I need to get the machine learning side running once I finish prepping Paddle Mate for release.
Why get ready to ship Paddle Mate if it is not perfect? Because I want to collect more samples from users who opt in to sending me their kayaking sensor data. I can then start to improve Paddle Mate with a wider set of samples, which will really let me exercise the machine learning side to improve my algorithm.
Ok, here are some things I have found in my research. I am putting them here so I can refer back to them later.
- Learn R: "R is an integrated suite of software facilities for data manipulation, calculation, and graphical display." (From An Introduction to R, section 1.1, "The R environment".)
- R for Mac OS X Developer's Page
- R Manuals
- Not sure if I will need this, but here is Rcpp, for optimizing sections of an R program with C++.
- R-Bloggers: R News and Tutorials, where I found:
  - In-depth introduction to machine learning in 15 hours of expert videos, which points to:
    - An Introduction to Statistical Learning with Applications in R, the book behind a Stanford course that is available free online.
Now, why did I choose R?
- It is free.
- I know how to program, so learning another programming language does not daunt me.
- It has good source material that I can learn from (the book An Introduction to Statistical Learning and its videos).
- Did I mention it is free? Have you looked into the cost of MATLAB or Mathematica? Screw that, I don't need a second mortgage to learn this stuff.
Ok, back to productizing Paddle Mate.