Make Top Words Table
Creates a table of the most frequently-occurring words (unigrams) within the data.
Usage
fst_get_top_words(
data,
number = 10,
norm = "number_words",
pos_filter = NULL,
strict = TRUE
)
Arguments
- data
A dataframe of text in CoNLL-U format.
- number
The number of top words to return, default is `10`.
- norm
The method for normalising the data. Valid settings are `"number_words"` (the number of words in the responses, default), `"number_resp"` (the number of responses), or `NULL` (raw count returned).
- pos_filter
A list of UPOS tags to include; the default, `NULL`, includes all word types.
- strict
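Whether to cut the table off strictly at `number` rows (ties are ordered alphabetically), default is `TRUE`. If `FALSE`, words that occur as often as the word at the `number` cutoff are also displayed.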
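To make the interaction of `norm`, `pos_filter`, and `strict` concrete, the following is a minimal base-R sketch of the behaviour described above. It is not the package's implementation: `sketch_top_words()` and the `toy` data are hypothetical, the column names `doc_id`, `lemma`, and `upos` are assumed to follow the usual CoNLL-U dataframe layout, and the choice to compute the normalisation denominator before applying `pos_filter` is an assumption.

```r
# Toy stand-in for a CoNLL-U dataframe: one row per token.
# Column names (doc_id, lemma, upos) are assumptions for this sketch.
toy <- data.frame(
  doc_id = c("r1", "r1", "r1", "r2", "r2", "r3", "r3", "r3"),
  lemma  = c("koira", "juosta", "nopea", "koira", "kissa", "kissa", "juosta", "ja"),
  upos   = c("NOUN", "VERB", "ADJ", "NOUN", "NOUN", "NOUN", "VERB", "CCONJ"),
  stringsAsFactors = FALSE
)

# Hypothetical helper illustrating the documented arguments;
# it is NOT part of the package.
sketch_top_words <- function(data, number = 10, norm = "number_words",
                             pos_filter = NULL, strict = TRUE) {
  # Denominator for normalisation, taken from the unfiltered data (assumption).
  denom <- if (is.null(norm)) {
    1                              # raw counts
  } else if (norm == "number_words") {
    nrow(data)                     # total number of tokens
  } else if (norm == "number_resp") {
    length(unique(data$doc_id))    # number of responses
  } else {
    stop("unknown `norm` setting")
  }

  # Keep only the requested UPOS tags, if any were given.
  if (!is.null(pos_filter)) {
    data <- data[data$upos %in% pos_filter, ]
  }

  # Count lemmas, then sort by descending count and alphabetically within ties.
  counts <- as.data.frame(table(words = data$lemma), stringsAsFactors = FALSE)
  names(counts)[2] <- "occurrence"
  counts <- counts[order(-counts$occurrence, counts$words), ]

  if (nrow(counts) > number) {
    if (strict) {
      # Hard cut-off at `number` rows.
      counts <- counts[seq_len(number), ]
    } else {
      # Keep every word that occurs as often as the word at the cut-off.
      cutoff <- counts$occurrence[number]
      counts <- counts[counts$occurrence >= cutoff, ]
    }
  }

  counts$occurrence <- counts$occurrence / denom
  rownames(counts) <- NULL
  counts
}

pf_toy <- c("NOUN", "VERB")
sketch_top_words(toy, number = 2, norm = NULL, pos_filter = pf_toy)
sketch_top_words(toy, number = 2, norm = NULL, pos_filter = pf_toy, strict = FALSE)
sketch_top_words(toy, number = 2, norm = "number_resp", pos_filter = pf_toy)
```

In the toy data three lemmas are tied at two occurrences, so the strict call returns two rows while the `strict = FALSE` call returns all three tied words.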
Examples
fst_get_top_words(conllu_dev_q11_1_nltk, number = 15, strict = FALSE)
#> Note:
#> Words with equal occurrence are presented in alphabetical order.
#> With `strict` = FALSE, words occurring equally often as the `number` cutoff word will be displayed.
#>
#> words occurrence
#> 1 ihminen 0.048
#> 2 asia 0.024
#> 3 elintaso 0.023
#> 4 köyhä 0.021
#> 5 paljon 0.020
#> 6 huono 0.019
#> 7 köyhyys 0.016
#> 8 tarvita 0.015
#> 9 kehitys 0.014
#> 10 maa 0.014
#> 11 apu 0.014
#> 12 ruoka 0.012
#> 13 alhainen 0.012
#> 14 kaikki 0.010
#> 15 kehittyä 0.010
#> 16 suuri 0.010
#> 17 taso 0.010
cb <- conllu_cb_bullying
pf <- c("NOUN", "VERB", "ADJ", "ADV")
fst_get_top_words(cb, number = 5, norm = "number_resp", pos_filter = pf)
#> Note:
#> Words with equal occurrence are presented in alphabetical order.
#> By default, words are presented in order to the `number` cutoff word.
#> This means that equally-occurring later-alphabetically words beyond the cutoff word will not be displayed.
#>
#> words occurrence
#> 1 lyödä 0.152
#> 2 lyöminen 0.128
#> 3 paha 0.104
#> 4 sanoa 0.080
#> 5 tehdä 0.075