Child Barometer 2016 bullying responses in CoNLL-U format, with NLTK stopwords removed and background variables added

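This dataset contains the responses to q7 "Kertoisitko, mitä sinun mielestäsi kiusaaminen on? (Avokysymys)" ("Could you tell us what, in your opinion, bullying is? (Open-ended question)") from the FSD3134 Lapsibarometri 2016 (Child Barometer 2016) dataset. The responses are in CoNLL-U format with NLTK stopwords and punctuation removed, and each row also carries the survey weight and background variables.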

Usage

fst_child

Format

`fst_child` is a data frame with 1580 rows and 18 columns:

doc_id: Identifier of the document.
paragraph_id: Identifier of the paragraph.
sentence_id: Identifier of the sentence.
sentence: Text of the sentence that this token belongs to.
token_id: Word index, integer starting at 1 for each new sentence; may be a range for multi-word tokens; may be a decimal number for empty nodes.
token: Word form or punctuation symbol.
lemma: Lemma or stem of word form.
upos: Universal part-of-speech tag.
xpos: Language-specific part-of-speech tag; underscore if not available.
feats: List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
head_token_id: Head of the current word, which is either a value of token_id or zero (0).
dep_rel: Universal dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.
deps: Enhanced dependency graph in the form of a list of head-deprel pairs.
misc: Any other annotation.
weight: Weight
gender: Gender
major_region: Major region
daycare_before_school: Daycare before pre-school
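
As a minimal sketch of how the token-level columns and the weight column might be combined, the example below builds a small hypothetical data frame (`toy`) with a few of the column names listed above. The rows and values are invented for illustration and are not taken from `fst_child`. It also assumes that each token row repeats the weight of its respondent.

```r
# Hypothetical toy rows mimicking a subset of the columns described above.
# All values are invented for illustration; they are NOT from fst_child.
toy <- data.frame(
  doc_id = c("1", "1", "2", "2", "3"),
  lemma  = c("kiusata", "haukkua", "kiusata", "satuttaa", "haukkua"),
  upos   = c("VERB", "VERB", "VERB", "VERB", "VERB"),
  weight = c(0.8, 0.8, 1.2, 1.2, 1.0),  # assumed: one weight per respondent, repeated per token
  stringsAsFactors = FALSE
)

# Unweighted lemma counts: number of token rows per lemma.
unweighted <- table(toy$lemma)

# Weighted lemma counts: sum of the respondent weights over the rows of each lemma.
weighted <- tapply(toy$weight, toy$lemma, sum)

# Order both from most to least frequent.
unweighted <- unweighted[order(unweighted, decreasing = TRUE)]
weighted   <- weighted[order(weighted, decreasing = TRUE)]

print(unweighted)
print(weighted)
```

The same two calls should work on `fst_child` itself by substituting it for `toy`, provided the assumption about repeated weights holds for the real data.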

Source

<https://urn.fi/urn:nbn:fi:fsd:T-FSD3134>