Make Top N-grams Table 2
fst_get_top_ngrams2.Rd
Creates a table of the most frequently occurring n-grams in the data. Equivalent to `fst_get_top_ngrams()`, but does not print a message.
Usage
fst_get_top_ngrams2(
data,
number = 10,
ngrams = 1,
norm = "number_words",
pos_filter = NULL,
strict = TRUE
)
Arguments
- data
A dataframe of text in CoNLL-U format.
- number
The number of n-grams to return, default is `10`.
- ngrams
The type of n-grams to return, that is, the number of words per n-gram (`1` for single words, `2` for bigrams, and so on), default is `1`.
- norm
The method for normalising the data. Valid settings are `'number_words'` (the number of words in the responses, default), `'number_resp'` (the number of responses), or `NULL` (raw count returned).
- pos_filter
List of UPOS tags for inclusion, default is `NULL`, which means all word types are included.
- strict
Whether to cut off strictly at `number` (ties are ordered alphabetically), default is `TRUE`.
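To make the `norm` settings concrete, here is a minimal base-R sketch of the three normalisation options applied to unigram counts. It is not the package's implementation. The helper `top_unigrams_sketch()`, the toy data, and the column names `doc_id`, `lemma` and `upos` are assumptions for illustration, as are computing the denominators before any POS filtering and rounding to three decimals (chosen to resemble the example output). Only the strict cut-off is sketched.

```r
# Toy token table in a CoNLL-U-like layout: one row per token.
# Column names (doc_id, lemma, upos) are assumed for this sketch.
toy <- data.frame(
  doc_id = c("r1", "r1", "r1", "r2", "r2", "r3"),
  lemma  = c("ihminen", "maa", "ihminen", "asia", "ihminen", "asia"),
  upos   = c("NOUN", "NOUN", "NOUN", "NOUN", "NOUN", "NOUN"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, NOT part of the package: top unigrams only.
top_unigrams_sketch <- function(data, number = 10, norm = "number_words",
                                pos_filter = NULL) {
  # Denominators are taken from the unfiltered data (an assumption).
  n_words <- nrow(data)
  n_resp  <- length(unique(data$doc_id))

  # Keep only the requested UPOS tags, if any were given.
  if (!is.null(pos_filter)) {
    data <- data[data$upos %in% pos_filter, , drop = FALSE]
  }

  # Raw counts per lemma.
  counts <- table(data$lemma)
  out <- data.frame(
    words = names(counts),
    occurrence = as.numeric(counts),
    stringsAsFactors = FALSE
  )

  # Choose the denominator: 1 leaves the raw counts unchanged.
  denom <- if (is.null(norm)) {
    1
  } else {
    switch(norm,
      number_words = n_words,
      number_resp  = n_resp,
      stop("norm must be 'number_words', 'number_resp' or NULL")
    )
  }
  out$occurrence <- round(out$occurrence / denom, 3)

  # Most frequent first, ties in alphabetical order, then a strict cut-off.
  out <- out[order(-out$occurrence, out$words), , drop = FALSE]
  rownames(out) <- NULL
  head(out, number)
}

top_unigrams_sketch(toy, norm = NULL)            # raw counts
top_unigrams_sketch(toy, norm = "number_words")  # count / 6 tokens
top_unigrams_sketch(toy, norm = "number_resp")   # count / 3 responses
```

In this toy data "ihminen" occurs three times across six tokens and three responses, so the same n-gram is reported as 3, 0.5 or 1 depending on `norm`.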
Examples
fst_get_top_ngrams2(conllu_dev_q11_1_nltk)
#> words occurrence
#> 1 ihminen 0.048
#> 2 asia 0.024
#> 3 elintaso 0.023
#> 4 köyhä 0.021
#> 5 paljon 0.020
#> 6 huono 0.019
#> 7 köyhyys 0.016
#> 8 tarvita 0.015
#> 9 kehitys 0.014
#> 10 maa 0.014
fst_get_top_ngrams2(conllu_dev_q11_1_nltk, number = 10, ngrams = 1)
#> words occurrence
#> 1 ihminen 0.048
#> 2 asia 0.024
#> 3 elintaso 0.023
#> 4 köyhä 0.021
#> 5 paljon 0.020
#> 6 huono 0.019
#> 7 köyhyys 0.016
#> 8 tarvita 0.015
#> 9 kehitys 0.014
#> 10 maa 0.014
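Both examples above use the default settings. The sketch below varies the other documented arguments. It assumes that the package providing `fst_get_top_ngrams2()` and the example data `conllu_dev_q11_1_nltk` is already attached, and that `pos_filter` accepts a character vector of UPOS tags. The guard lets the snippet run without error where the package is unavailable. No output is shown because it depends on the data.

```r
# Assumes fst_get_top_ngrams2() and conllu_dev_q11_1_nltk are available,
# as in the examples above; otherwise nothing is run.
if (exists("fst_get_top_ngrams2") && exists("conllu_dev_q11_1_nltk")) {
  # Top 5 bigrams, normalised by the number of responses.
  top_bigrams <- fst_get_top_ngrams2(
    conllu_dev_q11_1_nltk,
    number = 5,
    ngrams = 2,
    norm = "number_resp"
  )
  print(top_bigrams)

  # Raw counts of nouns and adjectives only, without a strict cut-off
  # at `number`. Passing the tags as a character vector is an assumption.
  top_noun_adj <- fst_get_top_ngrams2(
    conllu_dev_q11_1_nltk,
    number = 10,
    norm = NULL,
    pos_filter = c("NOUN", "ADJ"),
    strict = FALSE
  )
  print(top_noun_adj)
}
```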