Creates a dataframe in CoNLL-U format from a `svydesign` object containing Finnish text, using the [udpipe] package and a Finnish language model. The output also includes weights, if these are present in the `svydesign` object, and any columns added through `add_cols`. Stopwords and punctuation are optionally removed if the `stopword_list` argument is not `"none"`.

Usage

fst_prepare_svydesign(
  svydesign,
  question,
  id,
  model = "ftb",
  stopword_list = "nltk",
  language = "fi",
  use_weights = TRUE,
  add_cols = NULL,
  manual = FALSE,
  manual_list = ""
)

Arguments

svydesign

A `svydesign` object which contains an open-ended question.

question

The column in the dataframe which contains the open-ended question.

id

The column in the dataframe which contains the ids for the responses.

model

A language model available for [udpipe], such as `"ftb"` (default) or `"tdt"` which are available for Finnish.

stopword_list

A valid Finnish stopword list (default is `"nltk"`), or `"none"` to skip the removal of stopwords and punctuation.

language

Two-letter ISO code for the language of the stopword list; default is `"fi"`.

use_weights

Optional, whether to use the weights within the `svydesign` object; default is `TRUE`.
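
As a rough illustration of what "weights within the `svydesign`" refers to, the sketch below builds a small design object with the survey package and reads the responses and their weights back out of it. The toy data and its values are invented for illustration, and this is not necessarily how `fst_prepare_svydesign()` extracts weights internally.

```r
library(survey)

# Toy data: `paino` (weight) mirrors the column name used in the Examples;
# the rows and values are invented for illustration.
toy <- data.frame(
  fsd_id = 1:3,
  q7 = c("kiusaaminen koulussa", "yksinaisyys", "ei mitaan"),
  paino = c(0.8, 1.2, 1.0)
)

svy_toy <- svydesign(id = ~1, weights = ~paino, data = toy)

# The design object keeps the original columns in `$variables` ...
responses <- svy_toy$variables[, c("fsd_id", "q7")]
# ... and weights() returns the sampling weights, one per response.
responses$weight <- as.numeric(weights(svy_toy))

responses
```

With `use_weights = FALSE`, one would expect the weight column simply to be left out of the prepared data.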

add_cols

Optional, a column (or columns) from the dataframe containing other information you'd like to retain (for instance, dimension columns for splitting the data for comparison plots).

manual

An optional boolean indicating that a manual stopword list will be provided. Setting `stopword_list = "manual"` can be used as well or instead.

manual_list

A manual list of stopwords.

Value

A dataframe of Finnish text in CoNLL-U format.

Details

`fst_prepare_svydesign()` produces a dataframe containing Finnish survey text responses in CoNLL-U format with stopwords optionally removed.
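
To make the stopword and punctuation removal step concrete, here is a minimal base R sketch on a hand-made token table. The table uses a few CoNLL-U column names (`doc_id`, `token`, `lemma`, `upos`), the rows and the stopword vector are invented, and matching on the lowercased lemma is an assumption for illustration rather than a description of the package's implementation.

```r
# A hand-made token table with a few CoNLL-U columns;
# the rows are invented for illustration.
tokens <- data.frame(
  doc_id = c("1", "1", "1", "1", "2", "2"),
  token  = c("Koulu", "ja", "kaverit", ".", "En", "tiedä"),
  lemma  = c("koulu", "ja", "kaveri", ".", "ei", "tietää"),
  upos   = c("NOUN", "CCONJ", "NOUN", "PUNCT", "AUX", "VERB"),
  stringsAsFactors = FALSE
)

# Hypothetical manual stopword list (plays the role of `manual_list`).
manual_stopwords <- c("ja", "ei")

# Keep rows whose lemma is not a stopword and which are not punctuation.
is_stopword <- tolower(tokens$lemma) %in% manual_stopwords
is_punct <- tokens$upos == "PUNCT"
cleaned <- tokens[!is_stopword & !is_punct, ]

cleaned
```

The remaining rows keep their `doc_id`, so each surviving token can still be traced back to the response it came from.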

Examples

if (FALSE) { # \dontrun{
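# Name of the column holding the response ids
i <- "fsd_id"
# Weighted survey design built from the `child` data
svy_child <- survey::svydesign(id = ~1, weights = ~paino, data = child)
fst_prepare_svydesign(svy_child, question = "q7", id = i, use_weights = TRUE)

# Retain the `gender` column alongside the annotated text
svy_d <- survey::svydesign(id = ~1, weights = ~paino, data = dev_coop)
fst_prepare_svydesign(svy_d, question = "q11_2", id = i, add_cols = "gender")

# Positional arguments: question, id, model, stopword_list, language
fst_prepare_svydesign(svy_d, "q11_2", i, "finnish-ftb", "nltk", "fi")
# Remove the downloaded udpipe model files
unlink("finnish-ftb-ud-2.5-191206.udpipe")
unlink("finnish-tdt-ud-2.5-191206.udpipe")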
} # }