WhatsApp Sentiment Analysis using R

WhatsApp is becoming increasingly dominant, not just as a messaging service but also as a social network, thanks to its group chats. So as a data analyst, I thought: why not get my hands dirty with WhatsApp chats?

For WhatsApp chat analysis, we first have to download the chats. Retrieving chat logs from the Android or iOS app is straightforward: choose More in the menu of a chat, then Export chat, and export the history to a .txt file (without media).

So let’s get started.


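library(rwhatsapp)  # rwa_read()
library(tidyverse)  # dplyr, ggplot2, stringr, tidyr
library(tidytext)   # unnest_tokens(), reorder_within(), scale_x_reordered()
library(syuzhet)    # get_nrc_sentiment()

chat <- rwa_read("yourchat.txt")
chat <- chat[!is.na(chat$author), ]              # remove rows that have NA as author
chat <- subset(chat, text != "<Media omitted>")  # remove the placeholder rows left by media messages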
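
To make the two filters easier to follow, here is a minimal sketch on a made-up data frame. The names toy_chat and toy_clean and all rows are hypothetical; only the column names author and text are taken from the code above. It uses base R, so it runs without any extra packages.

```r
# Hypothetical stand-in for the data frame returned by rwa_read():
# only the two columns used in this post, with invented rows.
toy_chat <- data.frame(
  author = c("Ali", NA, "Sara", "Ali"),
  text   = c("hello everyone", "group was created", "<Media omitted>", "see you"),
  stringsAsFactors = FALSE
)

# Filter 1: drop rows without an author
toy_clean <- toy_chat[!is.na(toy_chat$author), ]

# Filter 2: drop the placeholder text that stands in for media
toy_clean <- subset(toy_clean, text != "<Media omitted>")

print(toy_clean)
```

Only the two real text messages from Ali survive both filters.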

After importing the chat into RStudio, I will first see who the most frequent user in the group chat is. This is done by simply counting the total number of messages sent by each user.

chat %>%
  count(author) %>%
  ggplot(aes(x = reorder(author, n), y = n)) +
  geom_bar(stat="identity", color='skyblue',fill='steelblue') +
  geom_text(aes(label = scales::comma(n)), hjust = -0.1)+
  ylab("") + xlab("Mailay Boys") +
  coord_flip() +
  ggtitle("Number of messages")+ theme_bw()
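
If you want to see the counting step without the plot, the following base R sketch does roughly what count(author) does on a made-up vector of authors. The names and values are hypothetical.

```r
# Hypothetical authors, one entry per message sent
authors <- c("Ali", "Sara", "Ali", "Bilal", "Ali", "Sara")

# Count messages per author and put the most active author first
msg_counts <- sort(table(authors), decreasing = TRUE)

print(msg_counts)
```

The bar chart above plots exactly this kind of per-author tally, reordered by n.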


How does it look if we compare favorite words? I use the excellent tidytext package for this task.

Let’s analyse the most commonly used words for each user.

chat %>%
  unnest_tokens(input = text,
                output = word) %>%
  count(author, word, sort = TRUE) %>%
  group_by(author) %>%
  top_n(n = 10, n) %>%
  ggplot(aes(x = reorder_within(word, n, author), y = n, fill = author)) +
  geom_col(show.legend = FALSE) +
  ylab("") +
  xlab("") +
  coord_flip() +
  facet_wrap(~author, ncol = 8, scales = "free_y") +
  scale_x_reordered() +
  ggtitle("Most often used words")
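
The pipeline above first splits each message into one word per row and then counts words per author. Below is a rough base R sketch of those two steps on invented messages. The tokenize() helper is a simplified, hypothetical stand-in for unnest_tokens(), not its actual implementation.

```r
# Hypothetical messages
toy_chat <- data.frame(
  author = c("Ali", "Ali", "Sara"),
  text   = c("Good morning all", "good game today", "Game on, game on!"),
  stringsAsFactors = FALSE
)

# Simplified tokenizer: lowercase, strip punctuation, split on whitespace
tokenize <- function(x) {
  x <- tolower(gsub("[[:punct:]]", "", x))
  strsplit(x, "[[:space:]]+")
}

# One row per (author, word) occurrence
tokens <- tokenize(toy_chat$text)
words <- data.frame(
  author = rep(toy_chat$author, lengths(tokens)),
  word   = unlist(tokens),
  stringsAsFactors = FALSE
)

# Count each word per author, keep the pairs that occur, most frequent first
word_counts <- as.data.frame(table(author = words$author, word = words$word),
                             stringsAsFactors = FALSE)
word_counts <- word_counts[word_counts$Freq > 0, ]
word_counts <- word_counts[order(word_counts$author, -word_counts$Freq), ]

print(word_counts)
```

Because everything is lowercased before counting, "Good" and "good" are treated as the same word.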


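Now let’s count the occurrences of a specific word.

Here I am checking how often each user wrote the word ‘BC’. Since unnest_tokens() lowercases the text by default, the code matches "bc".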

chat %>%
  unnest_tokens(input = text, output = word) %>%
  filter(word == "bc") %>%  # exact match on the lowercased token
  count(author, sort = TRUE) %>%
  ggplot(aes(x = reorder(author, n), y = n)) +
  geom_bar(stat = "identity", fill = "#d9534f") +
  geom_text(aes(label = scales::comma(n)), hjust = -0.1) +
  ggtitle("BC used") + theme_bw()
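
The count above depends on matching the token exactly. A substring match alone would also pick up longer words that merely contain the same letters, as this small base R sketch on made-up tokens suggests.

```r
# Hypothetical tokens, already lowercased
words <- c("bc", "abc", "bcz", "bc", "ok")

# Substring match: TRUE for every token that contains "bc" anywhere
substring_hits <- grepl("bc", words, fixed = TRUE)

# Exact match: TRUE only when the whole token is "bc"
exact_hits <- words == "bc"

print(sum(substring_hits))
print(sum(exact_hits))
```

Here the substring match finds four tokens while the exact match finds only the two real occurrences.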


Now let’s see how many unique words each person used.

chat %>%
  unnest_tokens(input = text,
                output = word) %>%
  group_by(author) %>%
  summarise(Unique_Words = n_distinct(word)) %>%
  ggplot(aes(x = reorder(author, Unique_Words),
             y = Unique_Words,
             fill = author)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  geom_text(aes(label = scales::comma(Unique_Words)), hjust = -0.1) +
  ylab("unique words") +
  xlab("")
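
As a quick illustration of what n_distinct(word) measures per author, here is a base R sketch on a hypothetical word table. All names and values are invented.

```r
# Hypothetical one-word-per-row table, as produced by tokenizing the chat
words <- data.frame(
  author = c("Ali", "Ali", "Ali", "Sara", "Sara", "Sara"),
  word   = c("good", "good", "game", "game", "on", "now"),
  stringsAsFactors = FALSE
)

# Number of distinct words per author
unique_words <- tapply(words$word, words$author, function(w) length(unique(w)))

print(unique_words)
```

Both authors wrote three words, but Ali repeated one of them, so his vocabulary count is lower.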


Now let’s jump to analyzing the sentiment of the whole chat for each person. In our group chat we mostly wrote in Urdu, and get_nrc_sentiment() scores words against an English lexicon by default, so the analysis is not accurate. You can still use this approach on chats written only in English.

Sentiment <- get_nrc_sentiment(chat$text)  # one row per message, ten NRC score columns
Sentiment <- cbind(Sentiment, chat$text)
Sentiment <- cbind(Sentiment, chat$author)
# Sum the ten score columns for each author
Sentiment_Analysis <- aggregate(Sentiment[, 1:10], by = list(Players = Sentiment$`chat$author`), FUN = sum)
Sentiment_Analysis %>%
  tidyr::gather("id", "value", 2:11) %>%
  ggplot(aes(Players, value)) +
  geom_col() +  # assumption: bars, one panel per sentiment category
  facet_wrap(~id, nrow = 2)
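
To show the idea behind lexicon-based scoring, and why non-English messages score poorly, here is a toy base R sketch. The four-word toy_lexicon is entirely made up and is not the real NRC lexicon; score_message() is a hypothetical helper, not how syuzhet works internally.

```r
# Hypothetical mini-lexicon mapping English words to an emotion category
toy_lexicon <- data.frame(
  word    = c("happy", "love", "angry", "sad"),
  emotion = c("joy", "joy", "anger", "sadness"),
  stringsAsFactors = FALSE
)
emotions <- c("anger", "joy", "sadness")

# Invented messages; the last one is romanized Urdu, roughly "I am very happy"
toy_chat <- data.frame(
  author = c("Ali", "Sara", "Ali"),
  text   = c("so happy today love it", "angry and sad", "yaar bohat khush hoon"),
  stringsAsFactors = FALSE
)

# Count how many tokens of a message fall into each emotion category
score_message <- function(msg) {
  tokens <- strsplit(tolower(msg), "[^a-z]+")[[1]]
  hits <- toy_lexicon$emotion[match(tokens, toy_lexicon$word)]
  vapply(emotions, function(e) sum(hits == e, na.rm = TRUE), integer(1))
}

# One row per message, one column per emotion
scores <- t(sapply(toy_chat$text, score_message, USE.NAMES = FALSE))

# Sum the scores per author, mirroring the aggregate() call above
per_author <- aggregate(as.data.frame(scores),
                        by = list(Players = toy_chat$author), FUN = sum)

print(per_author)
```

The Urdu message contributes nothing to any category because none of its words appear in the English lexicon, which is the same reason the results above should be read with caution.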


So that is my take on WhatsApp chat sentiment analysis. I did this project to improve my R skills, and I learned several new things along the way. 😊

😃 On a lighter note: Why did the sentiment analyzer go to therapy? Because it couldn’t handle all the mixed feelings!
