Creates a dataframe in CoNLL-U format from a dataframe containing Finnish text, using the [udpipe] package and a Finnish language model. Any additional columns that are included, such as `weights` or columns added through `add_cols`, are retained. Stopwords and punctuation are removed unless the `stopword_list` argument is `"none"`.

Usage

fst_prepare(
  data,
  question,
  id,
  model = "ftb",
  stopword_list = "nltk",
  language = "fi",
  weights = NULL,
  add_cols = NULL,
  manual = FALSE,
  manual_list = ""
)

Arguments

data

A dataframe of survey responses which contains an open-ended question.

question

The column in the dataframe which contains the open-ended question.

id

The column in the dataframe which contains the ids for the responses.

model

A language model available for [udpipe]. `"ftb"` (default) and `"tdt"` are recognised as shorthand for `"finnish-ftb"` and `"finnish-tdt"`. The full list is available in the [udpipe] documentation.

stopword_list

A valid stopword list, known as 'source' in `stopwords::stopwords`. The default is `"nltk"`. Use `"manual"` to indicate that a manual list will be provided, or `"none"` if you don't want to remove stopwords.

language

Two-letter ISO code of the language for the stopword list.

weights

Optional, the column of the dataframe which contains the respective weights for each response.

add_cols

Optional, a column (or columns) from the dataframe containing other information you'd like to retain (for instance, dimension columns for splitting the data for comparison plots).

manual

An optional boolean indicating that a manual list will be provided. `stopword_list = "manual"` can be used in addition or instead.

manual_list

A manual list of stopwords.
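
The `stopword_list` and `language` arguments are described above in terms of `stopwords::stopwords`. As a rough sketch of what those values refer to, the snippet below looks up the Finnish NLTK list directly with the stopwords package. It assumes the package is installed; the variable name `fi_nltk` is only illustrative, and this is not necessarily how `fst_prepare()` performs the lookup internally.

```r
library(stopwords)

# Sources that can be passed as `source` (the `stopword_list` argument above).
stopwords_getsources()

# Check that Finnish ("fi") is offered by the "nltk" source.
"fi" %in% stopwords_getlanguages("nltk")

# Fetch the Finnish NLTK stopword list as a character vector.
fi_nltk <- stopwords(language = "fi", source = "nltk")
head(fi_nltk)
length(fi_nltk)
```

Inspecting the list this way can help decide whether a built-in source is sufficient or whether a manual list is needed.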

Value

A dataframe of Finnish text in CoNLL-U format.
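
To give an idea of what a CoNLL-U dataframe looks like, here is a minimal sketch that annotates one sentence with the udpipe package directly. It assumes udpipe is installed and that network access is available to download the model. The example sentence and the `resp_1` id are made up for illustration, and the steps are an approximation of the kind of annotation described above, not a copy of the function's internals.

```r
library(udpipe)

# Download the Finnish FTB model into the working directory (needs network access).
model_info <- udpipe_download_model(language = "finnish-ftb")
model <- udpipe_load_model(file = model_info$file_model)

# Annotate a single made-up response; doc_id plays the role of the response id.
annotated <- udpipe_annotate(
  model,
  x = "Lapset leikkivät pihalla.",
  doc_id = "resp_1"
)

# Convert the annotation to a dataframe with one row per token.
conllu_df <- as.data.frame(annotated)
head(conllu_df[, c("doc_id", "token", "lemma", "upos")])

# Remove the downloaded model file again.
unlink(model_info$file_model)
```

Each row holds one token together with its lemma, part-of-speech tag and other CoNLL-U fields.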

Details

`fst_prepare()` produces a dataframe containing Finnish survey text responses in CoNLL-U format with stopwords optionally removed.
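
The removal step can be pictured with a small base R sketch on a toy token table. This is an illustration only: the table, the `my_stopwords` vector and the choice to match stopwords against the `lemma` column and punctuation against `upos == "PUNCT"` are assumptions made for the example, not a description of the package's actual code.

```r
# Toy token table with a few of the columns found in CoNLL-U output.
tokens <- data.frame(
  doc_id = c("1", "1", "1", "1", "2", "2", "2"),
  token  = c("Minä", "pidän", "koulusta", ".", "Ja", "ystävät", "!"),
  lemma  = c("minä", "pitää", "koulu", ".", "ja", "ystävä", "!"),
  upos   = c("PRON", "VERB", "NOUN", "PUNCT", "CCONJ", "NOUN", "PUNCT"),
  stringsAsFactors = FALSE
)

# Hypothetical hand-written stopword list (illustration only).
my_stopwords <- c("minä", "ja")

# Keep rows that are neither punctuation nor stopwords (matched on lemma).
keep <- tokens$upos != "PUNCT" & !(tolower(tokens$lemma) %in% my_stopwords)
prepared <- tokens[keep, ]

prepared[, c("doc_id", "lemma", "upos")]
```

Three content tokens remain, and the `doc_id` column still links each one back to its response.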

Examples

if (FALSE) { # \dontrun{
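i <- "fsd_id" # name of the id column
cb <- child
dev <- dev_coop
# Keep the survey weights stored in the "paino" column
fst_prepare(data = cb, question = "q7", id = "fsd_id", weights = "paino")
# Retain an additional column alongside the annotated text
fst_prepare(data = dev, question = "q11_2", id = i, add_cols = c("gender"))
fst_prepare(data = dev, question = "q11_3", id = i, add_cols = "gender")
# Use a full udpipe model name instead of the "ftb"/"tdt" shorthand
fst_prepare(data = child, question = "q7", id = i, model = "swedish-lines")
# Remove the downloaded model files
unlink("finnish-ftb-ud-2.5-191206.udpipe")
unlink("finnish-tdt-ud-2.5-191206.udpipe")
unlink("swedish-lines-ud-2.5-191206.udpipe")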
} # }