A Machine Learning Experiment With a Limited Amount of Rap Data

 

A. Background

Because I am taking a machine learning course, I wanted to build a model that can generate rap lyrics. For that, I needed the lyrics of one particular rapper. I picked Eminem because of the article “Eminem Has the Largest Vocabulary in the Music Industry, According to Study.”

B. Data

I modified the code from https://github.com/FrancescoGuarneri/AzLyricsAPI to download Eminem’s lyrics. I first made a text file listing the songs, then ran the script over that list with a for loop.
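
The loop over the song list can be sketched roughly as follows. This is only an illustration, not the modified AzLyricsAPI code: `download_lyrics`, the `fetch_lyrics` callable, and the file-naming rule are hypothetical names introduced here, and the actual download step is passed in from outside so that no particular API is assumed.

```python
from pathlib import Path
from typing import Callable, List


def download_lyrics(song_list: Path, out_dir: Path,
                    fetch_lyrics: Callable[[str, str], str],
                    artist: str = "eminem") -> List[Path]:
    """Fetch lyrics for every title in song_list and save one text file per song.

    fetch_lyrics is a hypothetical callable (artist, title) -> lyrics text;
    plug in whatever scraper or API wrapper you actually use.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for line in song_list.read_text(encoding="utf-8").splitlines():
        title = line.strip()
        if not title:  # skip blank lines in the song list
            continue
        lyrics = fetch_lyrics(artist, title)
        # Assumed naming rule: keep only letters and digits of the title.
        name = "".join(ch for ch in title.lower() if ch.isalnum()) + ".txt"
        path = out_dir / name
        path.write_text(lyrics, encoding="utf-8")
        saved.append(path)
    return saved
```

Passing the fetch function in as an argument also makes it easy to try the loop with a dummy function before hitting a real site.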

 

After this, I combined all the text files into one file with the cat command.
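
For anyone who prefers to stay in Python, the same combining step might look like the sketch below. The function name `combine_lyrics` and the one-`.txt`-file-per-song layout are assumptions for illustration. Unlike plain cat, it adds a newline after a file that does not end with one, so the last line of one song does not run into the first line of the next.

```python
from pathlib import Path


def combine_lyrics(lyrics_dir: Path, combined: Path) -> int:
    """Concatenate every .txt file in lyrics_dir into combined; return the file count."""
    # Collect the inputs first, leaving out the output file if it lives in the same folder.
    files = sorted(p for p in lyrics_dir.glob("*.txt")
                   if p.resolve() != combined.resolve())
    with combined.open("w", encoding="utf-8") as out:
        for path in files:
            text = path.read_text(encoding="utf-8")
            out.write(text)
            if not text.endswith("\n"):
                out.write("\n")  # keep songs separated by a line break
    return len(files)
```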

There were 199,800 words and 26,000 lines of lyrics.
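
A quick way to get such counts in Python is sketched below. The function name `count_words_and_lines` is made up for this example, and a "word" is assumed to mean any whitespace-separated token, so the numbers can differ slightly from other counting tools.

```python
from typing import Tuple


def count_words_and_lines(text: str) -> Tuple[int, int]:
    """Return (word_count, line_count) for a block of lyrics."""
    words = text.split()       # split on any run of whitespace
    lines = text.splitlines()  # one entry per line of text
    return len(words), len(lines)
```

Reading the combined lyrics file into a string and passing it to this function gives both numbers at once.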

C. TensorFlow

Next, I trained char-rnn-tensorflow on this data. The output was my first set of machine-generated Eminem lyrics. I posted a picture of it on Facebook, and one of my friends insisted that it was not Eminem, because it contained the N-word, which Eminem never uses. I realized the word came from the verses of the featured rappers on his songs.

D. Over-Fitting

Next, I wanted to test over-fitting. A car crash test uses only one to four cars to estimate the outcome. For a Lamborghini, for example, they would probably have used just one car. I thought about applying this kind of sampling to machine learning. Since the model cannot recreate Eminem from a limited amount of data, I wanted to see what happens when the training sample contains multiple copies of the same lyrics.

Since the model cannot reproduce a Lamborghini from a single example, I am feeding it hundreds of copies of the exact same Lamborghini. The model will then be able to reproduce the Lamborghini, but at that point we might not be able to call it reproducing, because it has only memorized the one car it was shown.

So, I gave TensorFlow a training sample made of 10 copies of the same lyrics and generated lyrics from it. In the result, I can see that the sentence patterns are better than with the single-copy sample.
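
Building such a repeated training sample is simple. Here is a minimal sketch, where `repeat_corpus` is a name chosen for this example and the lyrics are assumed to fit in memory as one string.

```python
def repeat_corpus(text: str, copies: int) -> str:
    """Return the lyrics repeated `copies` times, one copy after another."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if not text.endswith("\n"):
        text += "\n"  # make sure each copy starts on a fresh line
    return text * copies
```

Writing the returned string to the training input file gives a sample with the same vocabulary and lines as before, just seen 10 times as often.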

[Figure: the 5 conventional English language patterns]

After this test, I realized that over-fitting made the lyrics somewhat boring and less creative, so I decided to stick with the original single-copy sample. In terms of the analogy, the model was drawing the Lamborghini but could not creatively design a new sports car.
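
One rough way to put a number on "drawing the same Lamborghini" would be to check how many generated lines are exact copies of training lines. This is not something I measured in the experiment; the sketch below only illustrates the idea, and `copied_line_fraction` and its case-insensitive, whitespace-trimmed matching rule are assumptions.

```python
def copied_line_fraction(generated: str, training: str) -> float:
    """Fraction of non-empty generated lines that also appear in the training text."""
    # Normalize lines so that case and surrounding spaces do not matter.
    train_lines = {ln.strip().lower() for ln in training.splitlines() if ln.strip()}
    gen_lines = [ln.strip().lower() for ln in generated.splitlines() if ln.strip()]
    if not gen_lines:
        return 0.0
    copied = sum(1 for ln in gen_lines if ln in train_lines)
    return copied / len(gen_lines)
```

A value close to 1.0 would suggest the model is mostly replaying lyrics it has memorized, while a lower value would suggest it is producing new lines.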
