Child Barometer 2016 Bullying response data in CoNLL-U format with NLTK stopwords removed and background variables
Source: R/data.R
fst_child.Rd
This dataset contains the responses to q7 "Kertoisitko, mitä sinun mielestäsi kiusaaminen on? (Avokysymys)" ("Could you tell us what, in your opinion, bullying is? (Open-ended question)") in the FSD3134 Lapsibarometri 2016 dataset. The responses are in CoNLL-U format with NLTK stopwords and punctuation removed, and are accompanied by weights and background variables.
Format
`fst_child` is a data frame with 1580 rows and 18 columns:
- doc_id
the identifier of the document
- paragraph_id
the identifier of the paragraph
- sentence_id
the identifier of the sentence
- sentence
the text of the sentence that this token is part of
- token_id
Word index, integer starting at 1 for each new sentence; may be a range for multi-word tokens; may be a decimal number for empty nodes.
- token
Word form or punctuation symbol.
- lemma
Lemma or stem of word form.
- upos
Universal part-of-speech tag.
- xpos
Language-specific part-of-speech tag; underscore if not available.
- feats
List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
- head_token_id
Head of the current word, which is either a value of token_id or zero (0).
- dep_rel
Universal dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.
- deps
Enhanced dependency graph in the form of a list of head-deprel pairs.
- misc
Any other annotation.
- weight
Weight
- gender
Gender
- major_region
Major region
- daycare_before_school
Daycare before pre-school
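As a rough illustration of how the token-level columns and the survey weight might be combined, the sketch below builds a small stand-in data frame with a subset of the columns listed above and tabulates noun lemmas, both unweighted and weighted. The rows in `toy` are invented for illustration and are not taken from `fst_child`. Storing `token_id` and `head_token_id` as character is an assumption made here because CoNLL-U allows ranges and decimals in those fields.

```r
# Invented stand-in rows using a subset of the documented columns.
# These values are NOT real responses from the dataset.
toy <- data.frame(
  doc_id        = c("1", "1", "1", "2", "2"),
  sentence_id   = c(1L, 1L, 1L, 1L, 1L),
  token_id      = c("1", "2", "3", "1", "2"),
  token         = c("toisen", "satuttamista", "tahallaan",
                    "haukkumista", "satuttamista"),
  lemma         = c("toinen", "satuttaminen", "tahallaan",
                    "haukkuminen", "satuttaminen"),
  upos          = c("PRON", "NOUN", "ADV", "NOUN", "NOUN"),
  head_token_id = c("2", "0", "2", "0", "1"),
  dep_rel       = c("nmod:poss", "root", "advmod", "root", "conj"),
  weight        = c(1.2, 1.2, 1.2, 0.8, 0.8),
  stringsAsFactors = FALSE
)

# Keep only tokens tagged as nouns via the universal POS tag.
nouns <- toy[toy$upos == "NOUN", ]

# Unweighted lemma frequencies, most frequent first.
lemma_counts <- sort(table(nouns$lemma), decreasing = TRUE)

# Weighted frequencies: sum the respondent weight over each lemma.
weighted_counts <- sort(tapply(nouns$weight, nouns$lemma, sum),
                        decreasing = TRUE)

# Sentence roots are the tokens whose head is zero.
roots <- toy[toy$head_token_id == "0", c("doc_id", "token", "dep_rel")]

print(lemma_counts)
print(weighted_counts)
print(roots)
```

The same subsetting and aggregation steps should carry over to the full data frame, provided its column types match the assumptions made for `toy`.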