Creates a dataframe in CoNLL-U format from a dataframe containing Finnish text, using the [udpipe] package and a Finnish language model. Any additional columns that are included, such as `weights` or columns added through `add_cols`, are retained.

Usage

fst_format(data, question, id, model = "ftb", weights = NULL, add_cols = NULL)

Arguments

data

A dataframe of survey responses which contains an open-ended question.

question

The column in the dataframe which contains the open-ended question.

id

The column in the dataframe which contains the ids for the responses.

model

A language model available for [udpipe]. `"ftb"` (default) and `"tdt"` are recognised as shorthand for `"finnish-ftb"` and `"finnish-tdt"` respectively. The full list of models is available in the [udpipe] documentation.
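
The shorthand described above can be pictured with a small sketch. The helper `resolve_model()` is hypothetical and is not part of the package; it only illustrates, under the assumption that unrecognised names are passed through unchanged, how `"ftb"` and `"tdt"` might map to full model names.

```r
# Hypothetical helper (not part of the package): expand the "ftb"/"tdt"
# shorthand to a full udpipe model name and pass other names through.
resolve_model <- function(model = "ftb") {
  shorthand <- c(ftb = "finnish-ftb", tdt = "finnish-tdt")
  if (model %in% names(shorthand)) {
    # `[[` drops the element name, returning a plain string
    return(shorthand[[model]])
  }
  # Assumption: any other value is already a full model name
  model
}

resolve_model("ftb")
resolve_model("tdt")
resolve_model("swedish-talbanken")
```

Under this sketch, a full name such as `"swedish-talbanken"` is returned as given, which matches how the Examples pass a non-Finnish model directly.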

weights

Optional, the column in the dataframe which contains the weight for each response.

add_cols

Optional, a column (or columns) from the dataframe containing other information you'd like to retain (for instance, covariate columns for splitting the data for comparison plots).
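
The Examples pass `add_cols` both as a character vector and as a single comma-separated string. How the package handles this internally is not shown here; the following is a minimal sketch, with a hypothetical helper `normalise_add_cols()`, of one way both forms could be reduced to the same vector of column names.

```r
# Hypothetical helper (not part of the package): accept either a character
# vector of column names or one comma-separated string.
normalise_add_cols <- function(add_cols = NULL) {
  if (is.null(add_cols)) {
    return(NULL)
  }
  # Split every element on commas, then flatten to a single vector
  cols <- unlist(strsplit(add_cols, ",", fixed = TRUE))
  # Remove surrounding whitespace and drop empty entries
  cols <- trimws(cols)
  cols[nzchar(cols)]
}

normalise_add_cols(c("gender", "major_region"))
normalise_add_cols("gender, major_region")
```

Both calls in this sketch yield the same two column names, so later code would only need to handle a plain character vector.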

Value

Dataframe of annotated text in CoNLL-U format plus any additional columns.

Examples

if (FALSE) { # \dontrun{
i <- "fsd_id"
fst_format(data = child, question = "q7", id = i)
fst_format(data = child, question = "q7", id = i, model = "tdt")
fst_format(data = child, question = "q7", id = i, weights = "paino")
cols <- c("gender", "major_region", "daycare_before_school")
fst_format(child, question = "q7", id = i, add_cols = cols)
fst_format(child, question = "q7", id = i, add_cols = "gender, major_region")
fst_format(child, question = 'q7', id = i, model = 'swedish-talbanken')
unlink("finnish-ftb-ud-2.5-191206.udpipe")
unlink("finnish-tdt-ud-2.5-191206.udpipe")
unlink("swedish-talbanken-ud-2.5-191206.udpipe")
} # }