Saturday, August 15, 2015

Machine Learning Research for Paddling Stroke Detection

Part of the rationale behind creating Paddle Mate has been to get a better understanding of what Machine Learning really is. The best way to understand something is to do it, so I came up with Paddle Mate as a platform to build on as I teach it to detect paddle strokes.

Here is my understanding of Machine Learning, in beginner's words.

  • Create an algorithm.
  • Create a program to exercise your algorithm.
  • Tell the program which of the algorithm's parameters it can change.
  • Feed the program some samples to run against the algorithm and mark the points of interest.
  • Next, let the program loose on your algorithm, with more sample sets, letting the program tweak the parameters of the algorithm to find the optimal values to derive the expected outcome of the sample sets.
Ok, but how the hell do you do that?

At first, as I've covered in past posts, I had to sit and play with the sample sets to find patterns that I could boil down into an algorithm that would detect strokes. Next, I had to figure out what values I could tweak to make the stroke detection more accurate. Then I manually tweaked those values until I found some optimal values for my stroke detection algorithm.

As we can see, at the moment I am the machine. Next it will be time to replace my manual work with the machine part of machine learning.
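
To make the "machine part" concrete, here is a minimal sketch of the loop described above, written in Python to keep it short. It is not Paddle Mate's actual algorithm: the detector (`detect_strokes`), its two tunable parameters (`threshold` and `min_gap`), the scoring function and the sample numbers are all made up for illustration, and the search is a plain brute-force grid search rather than anything clever.

```python
from itertools import product


def detect_strokes(samples, threshold, min_gap):
    """Hypothetical detector: a stroke is a local peak above `threshold`
    that lies at least `min_gap` samples after the previous stroke."""
    strokes = []
    for i in range(1, len(samples) - 1):
        is_peak = samples[i - 1] < samples[i] >= samples[i + 1]
        far_enough = not strokes or i - strokes[-1] >= min_gap
        if is_peak and samples[i] > threshold and far_enough:
            strokes.append(i)
    return strokes


def score(detected, expected, tolerance=1):
    """Error count: missed strokes plus false detections (lower is better)."""
    matched = set()
    hits = 0
    for d in detected:
        for e in expected:
            if e not in matched and abs(d - e) <= tolerance:
                matched.add(e)
                hits += 1
                break
    return (len(expected) - hits) + (len(detected) - hits)


def tune(labelled_sets, thresholds, min_gaps):
    """Try every parameter combination and keep the one with the fewest errors."""
    best = None
    for threshold, min_gap in product(thresholds, min_gaps):
        errors = sum(
            score(detect_strokes(samples, threshold, min_gap), marks)
            for samples, marks in labelled_sets
        )
        if best is None or errors < best[0]:
            best = (errors, threshold, min_gap)
    return best


# Made-up sensor readings, each paired with hand-marked stroke positions.
labelled_sets = [
    ([0.0, 0.2, 1.0, 0.3, 0.1, 0.4, 0.2, 1.1, 0.2, 0.0], [2, 7]),
    ([0.1, 0.9, 0.2, 0.3, 0.2, 1.2, 0.4, 0.1], [1, 5]),
]

best = tune(labelled_sets, thresholds=[0.1, 0.3, 0.5, 0.7, 1.0], min_gaps=[1, 2, 3])
print("errors, threshold, min_gap:", best)
```

The marked sample sets play the role of the "points of interest", and `tune` does what I have been doing by hand: change the values, rerun the detector, and keep whatever gives the fewest misses and false detections.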

In that vein, I have done some research into what I need to do to get the machine learning aspect running after I finish prepping Paddle Mate for release.

Why get ready to ship Paddle Mate if it is not perfect? I want to collect more samples from users who opt in and send their kayaking sensor data to me. I can then start to improve Paddle Mate with a wider set of samples. This will really let me exercise the machine learning aspect to improve my algorithm.

Ok, here are some things I have found in my research. I am putting them here so I can refer back to them later.
Now why did I choose R?
  • It is free.
  • I know how to program, so learning another programming language does not daunt me.
  • It has good source material (the Introduction to Elements of Statistical Learning book and videos) which I can use to learn this.
  • Did I mention it is free? Have you looked into the cost of Matlab or Mathematica? Screw that, I don't need a second mortgage to learn this stuff.
Ok, back to productizing Paddle Mate.
