
Find top and unique top n-grams for different groups of participants. Data is split based on different values in the `field` column of formatted data. Results will be shown within the plots pane.

Usage

fst_ngrams_compare(
  data,
  field,
  number = 10,
  ngrams = 1,
  norm = NULL,
  pos_filter = NULL,
  strict = TRUE,
  use_svydesign_weights = FALSE,
  use_svydesign_field = FALSE,
  id = "",
  svydesign = NULL,
  use_column_weights = FALSE,
  exclude_nulls = FALSE,
  rename_nulls = "null_data",
  unique_colour = "indianred",
  title_size = 20,
  subtitle_size = 15
)

Arguments

data

A dataframe of text in CoNLL-U format with an additional `field` column for splitting data.

field

Column in `data` used for splitting groups

number

The number of n-grams to return, default is `10`.

ngrams

The size of n-grams to return (for example `1` for single words, `2` for bigrams), default is `1`.

norm

The method for normalising the data. Valid settings are `"number_words"` (the number of words in the responses), `"number_resp"` (the number of responses), or `NULL` (raw count returned, default, also used when weights are applied).

pos_filter

List of UPOS tags for inclusion, default is `NULL`, which means all word types are included.

strict

Whether to strictly cut-off at `number` (ties are alphabetically ordered), default is `TRUE`.

use_svydesign_weights

Option to weight n-gram counts using weights from a svydesign object containing the raw data, default is `FALSE`.

use_svydesign_field

Option to get the `field` used for splitting the data from the svydesign object, default is `FALSE`.

id

ID column from raw data, required if `use_svydesign_weights = TRUE` and must match the `docid` in formatted `data`.

svydesign

A svydesign object which contains the raw data and weights.

use_column_weights

Option to weight n-gram counts using weights from formatted data which includes an additional `weight` column, default is `FALSE`.

exclude_nulls

Whether to exclude NULLs in the `field` column, default is `FALSE`.

rename_nulls

What to fill NULL values with if `exclude_nulls = FALSE`, default is `"null_data"`.

unique_colour

Colour to display unique words, default is `"indianred"`.

title_size

Size of the plot title, default is `20`.

subtitle_size

Size of the title of each individual top n-grams plot, default is `15`.
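
To illustrate how `norm` and `strict` could interact, here is a minimal base R sketch. It is not the package's implementation: the toy counts, the totals `n_words` and `n_resp`, and the helper name `top_n_sketch` are all assumptions made for illustration, as is the choice to keep ties when `strict = FALSE`.

```r
# Hypothetical raw unigram counts for one group (toy data, not package output).
counts <- c(koulu = 6, kaveri = 6, perhe = 4, koti = 4, leikki = 2)
n_words <- 40  # assumed total number of words in the group's responses
n_resp  <- 10  # assumed number of responses in the group

# Assumed helper: normalise the counts, then keep the top `number` n-grams.
top_n_sketch <- function(counts, number, norm = NULL, strict = TRUE) {
  denom <- 1  # norm = NULL leaves the raw counts unchanged
  if (!is.null(norm)) {
    denom <- switch(norm,
      number_words = n_words,
      number_resp = n_resp,
      stop("unknown norm")
    )
  }
  scores <- counts / denom

  # Sort by descending score; ties are ordered alphabetically by n-gram.
  scores <- scores[order(-scores, names(scores))]

  if (strict) {
    # Cut off at exactly `number` entries.
    head(scores, number)
  } else {
    # Assumption: keep every n-gram tied with the one in position `number`.
    cutoff <- scores[min(number, length(scores))]
    scores[scores >= cutoff]
  }
}

top_strict <- top_n_sketch(counts, number = 3, norm = "number_resp")
top_loose  <- top_n_sketch(counts, number = 3, norm = "number_resp", strict = FALSE)
```

With these toy counts, the strict call keeps three n-grams, while the non-strict call also keeps `perhe` because it ties with `koti` for third place.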

Value

Plots of top n-grams in the plots pane with unique n-grams highlighted.
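
As a rough illustration of what "unique" could mean here, the sketch below flags n-grams that appear in only one group's top list. It uses base R and toy data. The group names, the word lists, and the definition of uniqueness are assumptions, not the package's internals.

```r
# Hypothetical top n-gram lists for two groups (toy data, not package output).
top_by_group <- list(
  girl = c("kaveri", "koulu", "perhe"),
  boy  = c("koulu", "peli", "perhe")
)

# Assumption: an n-gram is unique if it occurs in exactly one group's list.
all_ngrams <- unlist(top_by_group, use.names = FALSE)
freq <- table(all_ngrams)
unique_ngrams <- names(freq)[freq == 1]

# Flag each group's n-grams so a plot could colour the unique ones differently,
# for example with the colour passed as `unique_colour`.
flagged <- lapply(top_by_group, function(x) setNames(x %in% unique_ngrams, x))
```

In this toy example `kaveri` is flagged for the first group and `peli` for the second, while `koulu` and `perhe` are shared and left unflagged.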

Examples

c <- fst_child
g <- 'gender'
fst_ngrams_compare(c, g, ngrams = 4, number = 10, norm = "number_resp")

fst_ngrams_compare(c, g, ngrams = 2, number = 10, norm = NULL)

s <- survey::svydesign(id = ~1, weights = ~paino, data = child)
c2 <- fst_child_2
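fst_ngrams_compare(c2, g,
  number = 10, ngrams = 3, norm = NULL, pos_filter = NULL, strict = TRUE,
  use_svydesign_weights = TRUE, use_svydesign_field = TRUE,
  id = 'fsd_id', svydesign = s
)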

fst_ngrams_compare(c, g, 10, 2, use_column_weights = TRUE, strict = TRUE)