I spent a little time today refining my code to clean up the resultant data. For one thing, I found that none of my cleaning was actually being applied. I had been doing all of the cleaning before making a corpus of the words, so it had no effect. It still looked cool though. I've also worked out removing standard and custom stop words, and cleaned out the URLs using a little function with a gsub. Above is the current result using the #BernieSanders hashtag. I'm not necessarily all that into one candidate or the other; it's just that this one is in the news lately and seems to have enough interesting activity. I increased the number of tweets pulled to 250. Right now I can't get stemCompletion to work, so many of the words here are truncated a little. It is interesting to study though. It seems like quite a cross section. Something I find odd is that the word "amp" is a pretty common one, and I'm not sure why that is. It is a point for further research.
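For anyone following along, here is a rough sketch of the kind of URL and stop word cleanup described above. It is not my actual script: it uses base R only, the helper names (remove_urls, remove_stop_words, squish) and the sample tweets are made up for illustration, and the URL pattern is a simple assumption that will not catch every link. The last line also hints at one possible source of "amp": if the tweet text contains the HTML entity `&amp;`, stripping punctuation would leave "amp" behind. I have not confirmed that this is what is happening in my data.

```r
# Hypothetical helper: strip http/https URLs from a character vector.
remove_urls <- function(x) {
  gsub("https?://\\S+", "", x, perl = TRUE)
}

# Hypothetical helper: drop whole-word matches of the given stop words.
remove_stop_words <- function(x, stop_words) {
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  gsub(pattern, "", x, ignore.case = TRUE, perl = TRUE)
}

# Hypothetical helper: collapse repeated whitespace and trim the ends.
squish <- function(x) {
  trimws(gsub("\\s+", " ", x))
}

# Made-up sample tweets, not real data.
tweets <- c("Rally tonight https://t.co/abc123 for the win",
            "Jobs &amp; wages matter")
custom_stop_words <- c("the", "for")

cleaned <- squish(remove_stop_words(remove_urls(tolower(tweets)),
                                    custom_stop_words))
print(cleaned)

# Stripping punctuation turns the entity "&amp;" into the bare word "amp".
print(gsub("[[:punct:]]", "", cleaned))
```

The order matters here: URLs are removed before punctuation, since stripping punctuation first would break the URL into fragments that the pattern no longer matches.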

