Annotate open-ended survey responses in Finnish into CoNLL-U format
Source: R/01_prepare.R
Creates a dataframe in CoNLL-U format from a dataframe containing Finnish text, using the [udpipe] package and a Finnish language model. Any additional columns that are requested, such as `weights` or columns added through `add_cols`, are retained in the output.
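CoNLL-U output is token-level (one row per token), while `weights` and `add_cols` are respondent-level. A minimal sketch of how such columns can be carried onto every token is a join on the response id. The two input tables below are hand-made toy data, and `attach_respondent_cols()` is a hypothetical helper; this is not the package's internal code, only an illustration assuming the annotation has a character `doc_id` column matching the survey id.

```r
# Toy token-level annotation (hypothetical): one row per token, keyed by doc_id.
# Real udpipe output has many more columns, e.g. sentence_id, token_id, lemma, feats.
tokens <- data.frame(
  doc_id = c("1", "1", "2"),
  token  = c("Lapsi", "leikkii", "Kiitos"),
  upos   = c("NOUN", "VERB", "NOUN"),
  stringsAsFactors = FALSE
)

# Toy respondent-level survey data (hypothetical): id, weight and a covariate.
survey <- data.frame(
  fsd_id = c(1, 2),
  paino  = c(0.8, 1.3),
  gender = c("F", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: copy respondent-level columns onto each of that
# respondent's tokens with a left join on the id.
attach_respondent_cols <- function(tokens, survey, id, cols) {
  lookup <- survey[, c(id, cols), drop = FALSE]
  # doc_id is assumed to be character, so coerce the survey id to match
  lookup[[id]] <- as.character(lookup[[id]])
  merge(tokens, lookup, by.x = "doc_id", by.y = id, all.x = TRUE, sort = FALSE)
}

out <- attach_respondent_cols(tokens, survey, id = "fsd_id",
                              cols = c("paino", "gender"))
out
```

A left join (`all.x = TRUE`) keeps every token even if a respondent were missing from the lookup table; their added columns would then be `NA`.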
Arguments
- data
A dataframe of survey responses which contains an open-ended question.
- question
The column in the dataframe which contains the open-ended question.
- id
The column in the dataframe which contains the ids for the responses.
- model
A language model available for [udpipe]. The shorthands `"ftb"` (default) and `"tdt"` are recognised as `"finnish-ftb"` and `"finnish-tdt"`. The full list of models is available in the [udpipe] documentation.
- weights
Optional. The column of the dataframe which contains the weight for each response.
- add_cols
Optional. A column (or columns) from the dataframe containing other information you'd like to retain (for instance, covariate columns for splitting the data for comparison plots).
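The shorthand handling described for `model` can be sketched as a simple lookup. `resolve_model()` is a hypothetical helper, not part of the package's API. It maps only the two documented shorthands and passes any other name through unchanged; that pass-through is an assumption, suggested by the `"swedish-talbanken"` example. A full name such as `"finnish-ftb"` is the kind of value `udpipe::udpipe_download_model()` takes in its `language` argument.

```r
# Hypothetical helper: expand the documented shorthands to full udpipe model
# names; any other value (assumed to already be a full name) is returned as-is.
resolve_model <- function(model = "ftb") {
  shorthand <- c(ftb = "finnish-ftb", tdt = "finnish-tdt")
  if (model %in% names(shorthand)) unname(shorthand[model]) else model
}

resolve_model()                     # "finnish-ftb"
resolve_model("tdt")                # "finnish-tdt"
resolve_model("swedish-talbanken")  # "swedish-talbanken"
```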
Examples
if (FALSE) { # \dontrun{
i <- "fsd_id"
fst_format(data = child, question = "q7", id = i)
fst_format(data = child, question = "q7", id = i, model = "tdt")
fst_format(data = child, question = "q7", id = i, weights = "paino")
cols <- c("gender", "major_region", "daycare_before_school")
fst_format(child, question = "q7", id = i, add_cols = cols)
fst_format(child, question = "q7", id = i, add_cols = "gender, major_region")
fst_format(child, question = "q7", id = i, model = "swedish-talbanken")
# Remove the udpipe model files downloaded to the working directory
unlink("finnish-ftb-ud-2.5-191206.udpipe")
unlink("finnish-tdt-ud-2.5-191206.udpipe")
unlink("swedish-talbanken-ud-2.5-191206.udpipe")
} # }
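The examples pass `add_cols` either as a character vector or as a single comma-separated string (`"gender, major_region"`). A minimal sketch of normalising both forms to the same vector of column names follows. `normalise_cols()` and `check_cols()` are hypothetical helpers, and treating a comma inside a string as a separator is an assumption based on that example, not a confirmed description of the package's parsing.

```r
# Hypothetical helper: accept a character vector or a comma-separated string
# and return a clean character vector of column names.
normalise_cols <- function(add_cols) {
  if (is.null(add_cols)) return(character(0))
  cols <- unlist(strsplit(add_cols, ","), use.names = FALSE)
  cols <- trimws(cols)       # drop spaces left around each comma
  cols[cols != ""]           # ignore empty pieces such as a trailing comma
}

# Hypothetical helper: stop early if a requested column is not in the data.
check_cols <- function(data, cols) {
  missing <- setdiff(cols, names(data))
  if (length(missing) > 0) {
    stop("Columns not found in data: ", paste(missing, collapse = ", "))
  }
  invisible(cols)
}

normalise_cols(c("gender", "major_region"))  # "gender" "major_region"
normalise_cols("gender, major_region")       # "gender" "major_region"
```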