News headlines text analysis
In this tutorial, I present an introductory text analysis of an ABC News headlines dataset. I will have a look at the most common words occurring therein and run a sentiment analysis on those headlines by taking advantage of the following sentiment lexicons:
The NRC sentiment lexicon from Saif Mohammad and Peter Turney, which categorizes words into categories of positive, negative, anger, anticipation, disgust, fear, joy, sadness, surprise and trust.
The Bing sentiment lexicon from Bing Liu and collaborators, which categorizes words into a positive or a negative sentiment class.
The AFINN sentiment lexicon from Finn Årup Nielsen, which assigns words a score from -5 to 5, with negative scores indicating negative sentiment and positive scores indicating positive sentiment.
For more information about these sentiment lexicons, see the references listed at the bottom.
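To get a concrete feel for their structure, we can peek at a few sample entries of each lexicon (a minimal sketch; it assumes the lexicons have already been downloaded as explained in the Note below):
suppressPackageStartupMessages(library(tidytext))
# NRC and Bing map word -> sentiment label; AFINN maps word -> integer value in [-5, 5]
head(get_sentiments("bing"), 3)
head(get_sentiments("nrc"), 3)
head(get_sentiments("afinn"), 3)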
Packages
I am going to take advantage of the following R packages.
suppressPackageStartupMessages(library(stringr))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(tidytext))
suppressPackageStartupMessages(library(tidyr))
suppressPackageStartupMessages(library(textdata))
suppressPackageStartupMessages(library(widyr))
suppressPackageStartupMessages(library(ggplot2))
The package versions are listed below.
packages <- c("stringr", "dplyr", "tidytext", "tidyr", "textdata", "widyr", "ggplot2")
version <- lapply(packages, packageVersion)
version_c <- do.call(c, version)
data.frame(packages = packages, version = as.character(version_c))
## packages version
## 1 stringr 1.4.0
## 2 dplyr 0.8.4
## 3 tidytext 0.2.2
## 4 tidyr 1.0.2
## 5 textdata 0.3.0
## 6 widyr 0.1.2
## 7 ggplot2 3.2.1
I am running the following R version on Windows 10.
R.version
## _
## platform x86_64-w64-mingw32
## arch x86_64
## os mingw32
## system x86_64, mingw32
## status
## major 3
## minor 5.3
## year 2019
## month 03
## day 11
## svn rev 76217
## language R
## version.string R version 3.5.3 (2019-03-11)
## nickname Great Truth
Note
Before running this code, make sure you have downloaded the sentiment lexicons in use by executing the following operations:
get_sentiments("nrc")
get_sentiments("bing")
get_sentiments("afinn")
and accepting all the license prompts presented by the interactive menu that shows up.
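In a non-interactive session those prompts cannot be accepted, so a quick availability check like the following may help (a minimal sketch relying only on get_sentiments()):
for (lex in c("nrc", "bing", "afinn")) {
  n <- tryCatch(nrow(get_sentiments(lex)), error = function(e) NA)
  message(lex, ": ", ifelse(is.na(n), "not available yet", paste(n, "terms")))
}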
Getting Data
I then download our news dataset, containing over a million headlines, from:
“https://www.kaggle.com/therohk/million-headlines/downloads/million-headlines.zip/7”
Uncompressing it produces the abcnews-date-text.csv file. I load it into the news_data data frame and have a look at it.
news_data <- read.csv("abcnews-date-text.csv", header = TRUE, stringsAsFactors = FALSE)
dim(news_data)
## [1] 1103663 2
head(news_data)
## publish_date headline_text
## 1 20030219 aba decides against community broadcasting licence
## 2 20030219 act fire witnesses must be aware of defamation
## 3 20030219 a g calls for infrastructure protection summit
## 4 20030219 air nz staff in aust strike for pay rise
## 5 20030219 air nz strike to affect australian travellers
## 6 20030219 ambitious olsson wins triple jump
tail(news_data)
## publish_date headline_text
## 1103658 20171231 stunning images from the sydney to hobart yacht
## 1103659 20171231 the ashes smiths warners near miss enliven boxing day test
## 1103660 20171231 timelapse: brisbanes new year fireworks
## 1103661 20171231 what 2017 meant to the kids of australia
## 1103662 20171231 what the papodopoulos meeting may mean for ausus
## 1103663 20171231 who is george papadopoulos the former trump campaign aide
Token Analysis
It is time to extract the tokens from our dataset. We select the column named headline_text and unnest the word tokens, obtaining the following.
news_df <- news_data %>% select(headline_text)
news_tokens <- news_df %>% unnest_tokens(word, headline_text)
head(news_tokens, 10)
## word
## 1 aba
## 1.1 decides
## 1.2 against
## 1.3 community
## 1.4 broadcasting
## 1.5 licence
## 2 act
## 2.1 fire
## 2.2 witnesses
## 2.3 must
tail(news_tokens, 10)
## word
## 1103662.7 ausus
## 1103663 who
## 1103663.1 is
## 1103663.2 george
## 1103663.3 papadopoulos
## 1103663.4 the
## 1103663.5 former
## 1103663.6 trump
## 1103663.7 campaign
## 1103663.8 aide
It is interesting to generate and inspect a table reporting how many times each token shows up within the headlines, together with its proportion of the total.
news_tokens_count <- news_tokens %>% count(word, sort = TRUE) %>% mutate(proportion = n / sum(n))
These are the top 10 words by frequency:
head(news_tokens_count, 10)
## # A tibble: 10 x 3
## word n proportion
## <chr> <int> <dbl>
## 1 to 214201 0.0303
## 2 in 135981 0.0192
## 3 for 130239 0.0184
## 4 of 80759 0.0114
## 5 on 73037 0.0103
## 6 over 50306 0.00711
## 7 the 49810 0.00704
## 8 police 35984 0.00509
## 9 at 31723 0.00449
## 10 with 29676 0.00420
And these are the ones which appear least frequently:
tail(news_tokens_count, 10)
## # A tibble: 10 x 3
## word n proportion
## <chr> <int> <dbl>
## 1 zweli 1 0.000000141
## 2 zwitkowsky 1 0.000000141
## 3 zydelig 1 0.000000141
## 4 zygar 1 0.000000141
## 5 zygiefs 1 0.000000141
## 6 zylvester 1 0.000000141
## 7 zynga 1 0.000000141
## 8 zyngier 1 0.000000141
## 9 zz 1 0.000000141
## 10 zzz 1 0.000000141
There is an issue with proceeding this way: many of these words have no relevant role for sentiment analysis, the so-called stop words. The stop words available for our purpose are shown below.
data(stop_words)
head(stop_words, 10)
## # A tibble: 10 x 2
## word lexicon
## <chr> <chr>
## 1 a SMART
## 2 a's SMART
## 3 able SMART
## 4 about SMART
## 5 above SMART
## 6 according SMART
## 7 accordingly SMART
## 8 across SMART
## 9 actually SMART
## 10 after SMART
To remove the stop words as required, we take advantage of the anti_join operation.
news_tokens_no_sp <- news_tokens %>% anti_join(stop_words)
head(news_tokens_no_sp, 10)
## word
## 1 aba
## 2 decides
## 3 community
## 4 broadcasting
## 5 licence
## 6 act
## 7 fire
## 8 witnesses
## 9 aware
## 10 defamation
Then, we count the tokens again after having removed the stop words.
news_tokens_count <- news_tokens_no_sp %>% count(word, sort = TRUE) %>% mutate(proportion = n / sum(n))
head(news_tokens_count, 10)
## # A tibble: 10 x 3
## word n proportion
## <chr> <int> <dbl>
## 1 police 35984 0.00673
## 2 govt 16923 0.00317
## 3 court 16380 0.00306
## 4 council 16343 0.00306
## 5 interview 15025 0.00281
## 6 fire 13910 0.00260
## 7 nsw 12912 0.00242
## 8 australia 12353 0.00231
## 9 plan 12307 0.00230
## 10 water 11874 0.00222
tail(news_tokens_count)
## # A tibble: 6 x 3
## word n proportion
## <chr> <int> <dbl>
## 1 zygiefs 1 0.000000187
## 2 zylvester 1 0.000000187
## 3 zynga 1 0.000000187
## 4 zyngier 1 0.000000187
## 5 zz 1 0.000000187
## 6 zzz 1 0.000000187
Then, I filter for tokens having more than 8,000 occurrences.
news_token_over8000 <- news_tokens_count %>% filter(n > 8000) %>% mutate(word = reorder(word, n))
nrow(news_token_over8000)
## [1] 32
head(news_token_over8000, 10)
## # A tibble: 10 x 3
## word n proportion
## <fct> <int> <dbl>
## 1 police 35984 0.00673
## 2 govt 16923 0.00317
## 3 court 16380 0.00306
## 4 council 16343 0.00306
## 5 interview 15025 0.00281
## 6 fire 13910 0.00260
## 7 nsw 12912 0.00242
## 8 australia 12353 0.00231
## 9 plan 12307 0.00230
## 10 water 11874 0.00222
tail(news_token_over8000, 10)
## # A tibble: 10 x 3
## word n proportion
## <fct> <int> <dbl>
## 1 day 8818 0.00165
## 2 hospital 8815 0.00165
## 3 car 8690 0.00163
## 4 coast 8411 0.00157
## 5 calls 8401 0.00157
## 6 win 8315 0.00156
## 7 woman 8213 0.00154
## 8 killed 8129 0.00152
## 9 accused 8094 0.00151
## 10 world 8087 0.00151
It is interesting to show the proportions, expressed as per-thousands, by means of a histogram plot.
news_token_over8000 %>%
ggplot(aes(word, proportion*1000, fill = ceiling(proportion*1000))) +
geom_col() + xlab(NULL) + coord_flip() + theme(legend.position = "none")
News Sentiment Analysis
In this section, I focus on each single headline to evaluate its specific sentiment as determined by each lexicon. Hence the output will state whether each specific headline carries a positive or a negative sentiment.
head(news_df, 10)
## headline_text
## 1 aba decides against community broadcasting licence
## 2 act fire witnesses must be aware of defamation
## 3 a g calls for infrastructure protection summit
## 4 air nz staff in aust strike for pay rise
## 5 air nz strike to affect australian travellers
## 6 ambitious olsson wins triple jump
## 7 antic delighted with record breaking barca
## 8 aussie qualifier stosur wastes four memphis match
## 9 aust addresses un security council over iraq
## 10 australia is locked into war timetable opp
I will analyse only the first 1,000 headlines, just for computational time reasons. Their token lists are obtained as follows.
news_df_subset <- news_df[1:1000,,drop=FALSE]
tkn_l <- apply(news_df_subset, 1, function(x) { data.frame(headline_text = x, stringsAsFactors = FALSE) %>% unnest_tokens(word, headline_text) })
We remove the stop words from each token list.
single_news_tokens <- lapply(tkn_l, function(x) { anti_join(x, stop_words) })
str(single_news_tokens, list.len = 5)
## List of 1000
## $ 1 :'data.frame': 5 obs. of 1 variable:
## ..$ word: chr [1:5] "aba" "decides" "community" "broadcasting" ...
## $ 2 :'data.frame': 5 obs. of 1 variable:
## ..$ word: chr [1:5] "act" "fire" "witnesses" "aware" ...
## $ 3 :'data.frame': 4 obs. of 1 variable:
## ..$ word: chr [1:4] "calls" "infrastructure" "protection" "summit"
## $ 4 :'data.frame': 7 obs. of 1 variable:
## ..$ word: chr [1:7] "air" "nz" "staff" "aust" ...
## $ 5 :'data.frame': 6 obs. of 1 variable:
## ..$ word: chr [1:6] "air" "nz" "strike" "affect" ...
## [list output truncated]
As we can see, each headline is associated with a list of tokens. The sentiment of a headline is computed based on the sum of the positive/negative scores of its tokens.
single_news_tokens[[1]]
## word
## 1 aba
## 2 decides
## 3 community
## 4 broadcasting
## 5 licence
Bing lexicon
In this section, the computation of the sentiment associated with the token lists is shown for the Bing lexicon. I first define a function named compute_sentiment() whose purpose is to output the positivity score of a specific headline.
compute_sentiment <- function(d) {
  if (nrow(d) == 0) {
    return(NA)
  }
  neg_score <- d %>% filter(sentiment == "negative") %>% nrow()
  pos_score <- d %>% filter(sentiment == "positive") %>% nrow()
  pos_score - neg_score
}
The inner join of each single headline's token list with the Bing lexicon is given as input to the compute_sentiment() function to determine the sentiment score of each specific headline.
sentiments_bing <- get_sentiments("bing")
str(sentiments_bing)
## Classes 'tbl_df', 'tbl' and 'data.frame': 6786 obs. of 2 variables:
## $ word : chr "2-faces" "abnormal" "abolish" "abominable" ...
## $ sentiment: chr "negative" "negative" "negative" "negative" ...
single_news_sentiment_bing <- sapply(single_news_tokens, function(x) { x %>% inner_join(sentiments_bing) %>% compute_sentiment() })
The result is a vector of integers; the element value at the i-th position is the sentiment score associated with the i-th news headline.
str(single_news_sentiment_bing)
## Named int [1:1000] NA -1 1 -1 -1 2 0 NA NA NA ...
## - attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...
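As a quick sanity check on a single headline (a minimal sketch; per the vector above, headline 2 should score -1):
single_news_tokens[[2]] %>% inner_join(sentiments_bing) %>% compute_sentiment()
## [1] -1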
Here is the summary; please note that:
- the median is negative
- NA's show up
summary(single_news_sentiment_bing)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -3.000 -1.000 -1.000 -0.475 1.000 2.000 520
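The NA's correspond to headlines whose tokens share no words with the Bing lexicon; we can count them directly and match the summary above:
sum(is.na(single_news_sentiment_bing))
## [1] 520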
We gather the results in a data frame as follows.
single_news_sentiment_bing_df <- data.frame(headline_text = news_df_subset$headline_text, score = single_news_sentiment_bing)
head(single_news_sentiment_bing_df, 10)
## headline_text score
## 1 aba decides against community broadcasting licence NA
## 2 act fire witnesses must be aware of defamation -1
## 3 a g calls for infrastructure protection summit 1
## 4 air nz staff in aust strike for pay rise -1
## 5 air nz strike to affect australian travellers -1
## 6 ambitious olsson wins triple jump 2
## 7 antic delighted with record breaking barca 0
## 8 aussie qualifier stosur wastes four memphis match NA
## 9 aust addresses un security council over iraq NA
## 10 australia is locked into war timetable opp NA
NRC lexicon
In this section, the computation of the sentiment associated with the token lists is shown for the NRC lexicon. With respect to the previous analysis based on the Bing lexicon, some more pre-processing is required, as explained in what follows. First we get the NRC sentiment lexicon and see what sentiments are therein present.
sentiments_nrc <- get_sentiments("nrc")
(unique_sentiments_nrc <- unique(sentiments_nrc$sentiment))
## [1] "belief" "worry" "destructive" "disappointment" "anger" "shock"
## [7] "optimistic" "disgust" "pleasure" "anticipation"
To obtain a positive/negative sentiment result, I define a mapping of the above listed sentiments to a positive/negative string as follows.
compute_pos_neg_sentiments_nrc <- function(the_sentiments_nrc) {
  s <- unique(the_sentiments_nrc$sentiment)
  # map the ten NRC sentiments (in the order given by s) onto a positive/negative scale
  df_sentiments <- data.frame(sentiment = s,
                              mapped_sentiment = c("positive", "negative", "negative", "negative",
                                                   "negative", "positive", "positive", "negative",
                                                   "positive", "positive"),
                              stringsAsFactors = FALSE)
  ss <- the_sentiments_nrc %>% inner_join(df_sentiments)
  the_sentiments_nrc$sentiment <- ss$mapped_sentiment
  the_sentiments_nrc
}
nrc_sentiments_pos_neg_scale <- compute_pos_neg_sentiments_nrc(sentiments_nrc)
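As a quick check of the mapping, we can tabulate the resulting scale (a sketch; the exact counts depend on the installed NRC lexicon version, so no output is reported here):
table(nrc_sentiments_pos_neg_scale$sentiment)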
The above function is used to produce the single headline sentiment results. Its output is given as input to the compute_sentiment() function.
single_news_sentiment_nrc <- sapply(single_news_tokens, function(x) { x %>% inner_join(nrc_sentiments_pos_neg_scale) %>% compute_sentiment() })
str(single_news_sentiment_nrc)
## Named int [1:1000] 1 -4 1 2 -2 2 4 NA 5 -2 ...
## - attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...
Here is the summary; please note that:
- the median is equal to zero
- NA's show up
summary(single_news_sentiment_nrc)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -9.0000 -2.0000 0.0000 -0.3742 2.0000 9.0000 257
single_news_sentiment_nrc_df <- data.frame(headline_text = news_df_subset$headline_text, score = single_news_sentiment_nrc)
head(single_news_sentiment_nrc_df, 10)
## headline_text score
## 1 aba decides against community broadcasting licence 1
## 2 act fire witnesses must be aware of defamation -4
## 3 a g calls for infrastructure protection summit 1
## 4 air nz staff in aust strike for pay rise 2
## 5 air nz strike to affect australian travellers -2
## 6 ambitious olsson wins triple jump 2
## 7 antic delighted with record breaking barca 4
## 8 aussie qualifier stosur wastes four memphis match NA
## 9 aust addresses un security council over iraq 5
## 10 australia is locked into war timetable opp -2
AFINN lexicon
In this section, the computation of the sentiment associated with the token lists is shown for the AFINN lexicon.
sentiments_afinn <- get_sentiments("afinn")
# rename the AFINN "value" column to "sentiment" for consistency with the other lexicons
colnames(sentiments_afinn) <- c("word", "sentiment")
str(sentiments_afinn)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 2477 obs. of 2 variables:
## $ word : chr "abandon" "abandoned" "abandons" "abducted" ...
## $ sentiment: num -2 -2 -2 -2 -2 -2 -3 -3 -3 -3 ...
## - attr(*, "spec")=
## .. cols(
## .. word = col_character(),
## .. value = col_double()
## .. )
As we can see, the AFINN lexicon provides a numeric score for each token. We just need to sum up the scores of each headline's tokens to obtain the sentiment score of the headline under analysis.
single_news_sentiment_afinn_df <- lapply(single_news_tokens, function(x) { x %>% inner_join(sentiments_afinn) })
single_news_sentiment_afinn <- sapply(single_news_sentiment_afinn_df, function(x) {
  ifelse(nrow(x) > 0, sum(x$sentiment), NA)
})
str(single_news_sentiment_afinn)
## Named num [1:1000] NA -2 NA -2 -1 6 3 NA NA -2 ...
## - attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...
Here is the summary; please note that:
- the median is negative
- NA's show up
summary(single_news_sentiment_afinn)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -9.000 -3.000 -2.000 -1.148 1.000 7.000 508
single_news_sentiment_afinn_df <- data.frame(headline_text = news_df_subset$headline_text, score = single_news_sentiment_afinn)
head(single_news_sentiment_afinn_df, 10)
## headline_text score
## 1 aba decides against community broadcasting licence NA
## 2 act fire witnesses must be aware of defamation -2
## 3 a g calls for infrastructure protection summit NA
## 4 air nz staff in aust strike for pay rise -2
## 5 air nz strike to affect australian travellers -1
## 6 ambitious olsson wins triple jump 6
## 7 antic delighted with record breaking barca 3
## 8 aussie qualifier stosur wastes four memphis match NA
## 9 aust addresses un security council over iraq NA
## 10 australia is locked into war timetable opp -2
Comparing results
Having obtained three sentiment evaluations for each headline, we would like to compare their congruence.
By congruence we mean that all three lexicons express the same positive or negative outcome, in other words the same score sign independently of its magnitude. If NA values are present, the congruence is still computed as long as at least two non-NA values are available; otherwise it is equal to NA.
Furthermore, we compute the final news sentiment based upon the sum of each lexicon's sentiment score.
compute_congruence <- function(x, y, z) {
  v <- c(sign(x), sign(y), sign(z))
  # if only one lexicon reports a score, we cannot check for congruence
  if (sum(is.na(v)) >= 2) {
    return(NA)
  }
  # remove NA values
  v <- na.omit(v)
  v_sum <- sum(v)
  abs(v_sum) == length(v)
}
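A couple of quick checks (a sketch; the arguments are the bing/nrc/afinn scores of headlines 2 and 4 from the tables further below):
compute_congruence(-1, -4, -2)  # all signs agree
## [1] TRUE
compute_congruence(-1, 2, -2)   # mixed signs
## [1] FALSE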
compute_final_sentiment <- function(x, y, z) {
  if (is.na(x) && is.na(y) && is.na(z)) {
    return(NA)
  }
  s <- sum(x, y, z, na.rm = TRUE)
  # positive sentiments have a score strictly greater than zero
  # negative sentiments have a score strictly less than zero
  # neutral sentiments have a score equal to zero
  ifelse(s > 0, "positive", ifelse(s < 0, "negative", "neutral"))
}
news_sentiments_results <- data.frame(headline_text = news_df_subset$headline_text,
bing_score = single_news_sentiment_bing,
nrc_score = single_news_sentiment_nrc,
afinn_score = single_news_sentiment_afinn,
stringsAsFactors = FALSE)
news_sentiments_results <- news_sentiments_results %>% rowwise() %>%
mutate(final_sentiment = compute_final_sentiment(bing_score, nrc_score, afinn_score),
congruence = compute_congruence(bing_score, nrc_score, afinn_score))
head(news_sentiments_results, 40)
## Source: local data frame [40 x 6]
## Groups: <by row>
##
## # A tibble: 40 x 6
## headline_text bing_score nrc_score afinn_score final_sentiment congruence
## <chr> <int> <int> <dbl> <chr> <lgl>
## 1 aba decides against community broadcas~ NA 1 NA positive NA
## 2 act fire witnesses must be aware of de~ -1 -4 -2 negative TRUE
## 3 a g calls for infrastructure protectio~ 1 1 NA positive TRUE
## 4 air nz staff in aust strike for pay ri~ -1 2 -2 negative FALSE
## 5 air nz strike to affect australian tra~ -1 -2 -1 negative TRUE
## 6 ambitious olsson wins triple jump 2 2 6 positive TRUE
## 7 antic delighted with record breaking b~ 0 4 3 positive FALSE
## 8 aussie qualifier stosur wastes four me~ NA NA NA <NA> NA
## 9 aust addresses un security council ove~ NA 5 NA positive NA
## 10 australia is locked into war timetable~ NA -2 -2 negative TRUE
## # ... with 30 more rows
It would also be useful to replace the numeric scores with the same {negative, neutral, positive} scale.
replace_score_with_sentiment <- function(v_score) {
  v_score[v_score > 0] <- "positive"
  v_score[v_score < 0] <- "negative"
  v_score[v_score == 0] <- "neutral"
  v_score
}
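The in-place replacement above leans on R's implicit coercion of the numeric vector to character; an equivalent and slightly more explicit formulation could use dplyr::case_when (a sketch, not the original code; NA scores stay NA since no condition matches them):
replace_score_with_sentiment <- function(v_score) {
  dplyr::case_when(
    v_score > 0 ~ "positive",
    v_score < 0 ~ "negative",
    v_score == 0 ~ "neutral"
  )
}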
news_sentiments_results$bing_score <- replace_score_with_sentiment(news_sentiments_results$bing_score)
news_sentiments_results$nrc_score <- replace_score_with_sentiment(news_sentiments_results$nrc_score)
news_sentiments_results$afinn_score <- replace_score_with_sentiment(news_sentiments_results$afinn_score)
news_sentiments_results[,2:5] <- lapply(news_sentiments_results[,2:5], as.factor)
head(news_sentiments_results, 40)
## Source: local data frame [40 x 6]
## Groups: <by row>
##
## # A tibble: 40 x 6
## headline_text bing_score nrc_score afinn_score final_sentiment congruence
## <chr> <fct> <fct> <fct> <fct> <lgl>
## 1 aba decides against community broadcas~ <NA> positive <NA> positive NA
## 2 act fire witnesses must be aware of de~ negative negative negative negative TRUE
## 3 a g calls for infrastructure protectio~ positive positive <NA> positive TRUE
## 4 air nz staff in aust strike for pay ri~ negative positive negative negative FALSE
## 5 air nz strike to affect australian tra~ negative negative negative negative TRUE
## 6 ambitious olsson wins triple jump positive positive positive positive TRUE
## 7 antic delighted with record breaking b~ neutral positive positive positive FALSE
## 8 aussie qualifier stosur wastes four me~ <NA> <NA> <NA> <NA> NA
## 9 aust addresses un security council ove~ <NA> positive <NA> positive NA
## 10 australia is locked into war timetable~ <NA> negative negative negative TRUE
## # ... with 30 more rows
Cross-tabulations of each lexicon's resulting sentiment against the final sentiment are shown below.
table(news_sentiments_results$bing_score, news_sentiments_results$final_sentiment, dnn = c("bing", "final"))
## final
## bing negative neutral positive
## negative 278 15 14
## neutral 16 6 11
## positive 6 7 127
table(news_sentiments_results$nrc_score, news_sentiments_results$final_sentiment, dnn = c("nrc", "final"))
## final
## nrc negative neutral positive
## negative 353 10 4
## neutral 18 13 6
## positive 25 16 298
table(news_sentiments_results$afinn_score, news_sentiments_results$final_sentiment, dnn = c("afinn", "final"))
## final
## afinn negative neutral positive
## negative 326 10 12
## neutral 3 1 6
## positive 4 9 121
The cross-tabulation of congruence against the final sentiment is shown below.
table(news_sentiments_results$congruence, news_sentiments_results$final_sentiment, dnn = c("congruence", "final"))
## final
## congruence negative neutral positive
## FALSE 67 33 45
## TRUE 292 0 132
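From the table above we can also derive the overall congruence rate among the headlines for which it could be computed, i.e. (292 + 0 + 132) / (292 + 0 + 132 + 67 + 33 + 45), roughly 0.75:
mean(news_sentiments_results$congruence, na.rm = TRUE)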
Conclusions
We analyzed the news headlines to determine their sentiment while taking advantage of three sentiment lexicons. We outlined some basics of the methodology for that purpose. We also had the chance to compare the results obtained by means of all three lexicons and to set forth a final sentiment evaluation. If you are interested in learning much more about text analysis, see ref. [4].
References
[1] NRC sentiment lexicon
[2] Bing sentiment lexicon
[3] AFINN sentiment lexicon
[4] Text mining with R