Machine Learning Experiment With a Limited Amount of Rap Data
A. Background
While taking a machine learning course, I wanted to build a model that can generate rap lyrics. For that, I needed lyrics data from a particular rapper. I picked Eminem because of the article “Eminem Has the Largest Vocabulary in the Music Industry, According to Study.”
B. Data
I modified the code from https://github.com/FrancescoGuarneri/AzLyricsAPI to get Eminem’s lyrics. I first made a text file listing the songs, then ran the code over that list in a for loop, saving each song's lyrics to a text file.
After this, I combined all the text files into a single file with the cat command.
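The same combining step can also be sketched in Python, in case cat is not available. This is only an illustration: the function name combine_lyrics and the assumption that the per-song files are .txt files sitting in one folder are mine, not part of the original workflow.

```python
from pathlib import Path


def combine_lyrics(input_dir, output_file):
    """Concatenate every .txt file in input_dir into output_file.

    Hypothetical helper: the folder layout and file names are assumptions.
    Returns the number of files that were combined.
    """
    input_dir = Path(input_dir)
    output_file = Path(output_file)
    # Sort for a reproducible order, and skip the output file itself
    # in case it lives in the same folder as the inputs.
    paths = sorted(
        p for p in input_dir.glob("*.txt")
        if p.resolve() != output_file.resolve()
    )
    with output_file.open("w", encoding="utf-8") as out:
        for path in paths:
            text = path.read_text(encoding="utf-8")
            out.write(text)
            # Unlike cat, add a newline when a file does not end with one,
            # so the last line of one song does not merge with the next.
            if text and not text.endswith("\n"):
                out.write("\n")
    return len(paths)
```

For example, combine_lyrics("lyrics", "eminem_all.txt") would write one combined file, where both names are placeholders.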
The combined file had about 199,800 words and 26,000 lines of lyrics.
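Counts like these can be reproduced with a few lines of Python. The sketch below is an assumption about how to count, not the method actually used: it treats any whitespace-separated token as a word and ignores blank lines, so its line count can differ from a tool that counts every newline. The name corpus_stats is hypothetical.

```python
def corpus_stats(text):
    """Return the number of non-blank lines and whitespace-separated words."""
    # Blank lines (for example, gaps between verses) are not counted as lyric lines.
    lines = [line for line in text.splitlines() if line.strip()]
    # str.split() with no arguments splits on any run of whitespace.
    words = text.split()
    return {"lines": len(lines), "words": len(words)}


if __name__ == "__main__":
    sample = "first line of lyrics\n\nsecond line\n"
    print(corpus_stats(sample))
```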
C. TensorFlow
After this, I used char-rnn-tensorflow to train a model on the data. The result was my first machine-generated Eminem lyrics. I posted a picture of them on Facebook, and one of my friends insisted that this was not Eminem. The reason was the N-word: Eminem never uses it. So, I realized it had come from the featured rappers’ lyrics in the data.
D. Over-Fitting
Next, I wanted to test over-fitting. A car crash test uses only one to four cars to estimate how the model of car behaves in a crash. For an expensive car such as a Lamborghini, they would probably use just one car. I thought of applying this sampling idea to machine learning. Since the model cannot recreate Eminem from a limited amount of data, I wanted to test training on multiple copies of the same lyrics.
Since the model cannot reproduce a Lamborghini from a single example, I am giving it hundreds of copies of the exact same Lamborghini. The model will then be able to reproduce the Lamborghini, but we might not be able to call that reproducing, because it may simply be memorizing the copies.
So, I fed 10 copies of the same lyrics sample to TensorFlow, and these are the lyrics it produced. Here I can see that the sentence patterns are better than with a single copy.
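Building the repeated training file is simple. Here is a minimal sketch, assuming the lyrics are already loaded as one string; the function name repeat_corpus is hypothetical, and the default of 10 copies just mirrors the experiment above.

```python
def repeat_corpus(text, copies=10):
    """Return the text repeated `copies` times, one copy after another."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # Make sure each copy ends with a newline so the last line of one copy
    # does not run into the first line of the next.
    if not text.endswith("\n"):
        text += "\n"
    return text * copies


if __name__ == "__main__":
    print(repeat_corpus("same lyrics line", copies=3), end="")
```

Repeating the data adds no new information. The model just sees the same text more often, which is why this setup encourages over-fitting.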
After testing this over-fitting, I realized that it made the lyrics somewhat boring and less creative. So, I decided to stick with the original sample. In other words, the over-fitted model could draw the Lamborghini but could not creatively design a new sports car.