
Creates a table of the most frequently occurring n-grams in the data. Equivalent to `fst_get_top_ngrams()` but does not print a message.

Usage

fst_get_top_ngrams2(
  data,
  number = 10,
  ngrams = 1,
  norm = "number_words",
  pos_filter = NULL,
  strict = TRUE
)

Arguments

data

A dataframe of text in CoNLL-U format.

number

The number of n-grams to return, default is `10`.

ngrams

The size of the n-grams to return, i.e. the number of words per n-gram (`1` for single words, `2` for bigrams, and so on), default is `1`.

norm

The method for normalising the data. Valid settings are `'number_words'` (the number of words in the responses, default), `'number_resp'` (the number of responses), or `NULL` (raw count returned).

pos_filter

List of UPOS tags for inclusion, default is `NULL`, which means all word types are included.

strict

Whether to cut off strictly at `number` (ties are ordered alphabetically), default is `TRUE`.
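
The `norm` and `strict` arguments can be illustrated with a small base-R sketch. This is not the package's internal code: the toy data frame, its column names (`doc_id`, `lemma`) and the helper variables are assumptions made for illustration, and the behaviour shown for `strict = FALSE` (keeping every n-gram tied at the cut-off) is an assumed reading of the argument description above.

```r
# Toy token table, one row per word (hypothetical data and column names).
toy <- data.frame(
  doc_id = c("r1", "r1", "r1", "r2", "r2", "r3", "r3"),
  lemma  = c("ihminen", "asia", "ihminen", "maa", "ihminen", "asia", "maa"),
  stringsAsFactors = FALSE
)

# Raw counts per word, as with norm = NULL.
counts_tab <- table(toy$lemma)
counts <- data.frame(
  words = names(counts_tab),
  n     = as.vector(counts_tab),
  stringsAsFactors = FALSE
)

# Order by descending count; ties are ordered alphabetically.
counts <- counts[order(-counts$n, counts$words), ]

# Two possible normalisations: by total words and by number of responses.
n_words <- nrow(toy)
n_resp  <- length(unique(toy$doc_id))
counts$by_words <- round(counts$n / n_words, 3)  # like norm = "number_words"
counts$by_resp  <- round(counts$n / n_resp, 3)   # like norm = "number_resp"

# Strict cut-off: exactly `number` rows are kept.
number <- 2
strict_top <- head(counts, number)

# Non-strict cut-off (assumed behaviour): keep everything tied with the last kept row.
cutoff <- counts$n[number]
loose_top <- counts[counts$n >= cutoff, ]

print(strict_top)
print(loose_top)
```

In this toy data `asia` and `maa` are tied, so the strict table stops at `asia` while the non-strict table also keeps `maa`.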

Value

A table of the most frequently occurring n-grams in the data.

Examples

fst_get_top_ngrams2(conllu_dev_q11_1_nltk)
#>       words occurrence
#> 1   ihminen      0.048
#> 2      asia      0.024
#> 3  elintaso      0.023
#> 4     köyhä      0.021
#> 5    paljon      0.020
#> 6     huono      0.019
#> 7   köyhyys      0.016
#> 8   tarvita      0.015
#> 9   kehitys      0.014
#> 10      maa      0.014
fst_get_top_ngrams2(conllu_dev_q11_1_nltk, number = 10, ngrams = 1)
#>       words occurrence
#> 1   ihminen      0.048
#> 2      asia      0.024
#> 3  elintaso      0.023
#> 4     köyhä      0.021
#> 5    paljon      0.020
#> 6     huono      0.019
#> 7   köyhyys      0.016
#> 8   tarvita      0.015
#> 9   kehitys      0.014
#> 10      maa      0.014