Creates a summary table for the input CoNLL-U data, counting the number of words with each universal part-of-speech (UPOS) tag.

Usage

fst_pos(data)

Arguments

data

A dataframe of text in CoNLL-U format, with optional additional columns.

Value

A dataframe with one row per UPOS tag, giving the tag, its full name, its count in the data, and its proportion of the total count. Tags that do not occur in the data are listed with a count of 0.

Examples

fst_pos(fst_child)
#>     UPOS                  UPOS_Name Count Proportion
#> 1    ADJ                  adjective   156      0.099
#> 2    ADP                 adposition     5      0.003
#> 3    ADV                     adverb    98      0.062
#> 4    AUX                  auxiliary    36      0.023
#> 5  CCONJ   coordinating conjunction     1      0.001
#> 6    DET                 determiner    72      0.046
#> 7   INTJ               interjection    16      0.010
#> 8   NOUN                       noun   455      0.288
#> 9    NUM                    numeral     2      0.001
#> 10  PART                   particle    38      0.024
#> 11  PRON                    pronoun   148      0.094
#> 12 PROPN                proper noun     6      0.004
#> 13 PUNCT                punctuation     0      0.000
#> 14 SCONJ  subordinating conjunction     0      0.000
#> 15   SYM                     symbol     0      0.000
#> 16  VERB                       verb   545      0.345
#> 17     X                      other     2      0.001
fst_pos(fst_dev_coop)
#>     UPOS                  UPOS_Name Count Proportion
#> 1    ADJ                  adjective   389      0.093
#> 2    ADP                 adposition    24      0.006
#> 3    ADV                     adverb    64      0.015
#> 4    AUX                  auxiliary     3      0.001
#> 5  CCONJ   coordinating conjunction     3      0.001
#> 6    DET                 determiner    28      0.007
#> 7   INTJ               interjection     2      0.000
#> 8   NOUN                       noun  3311      0.790
#> 9    NUM                    numeral     5      0.001
#> 10  PART                   particle    29      0.007
#> 11  PRON                    pronoun    12      0.003
#> 12 PROPN                proper noun    31      0.007
#> 13 PUNCT                punctuation     0      0.000
#> 14 SCONJ  subordinating conjunction     0      0.000
#> 15   SYM                     symbol     1      0.000
#> 16  VERB                       verb   278      0.066
#> 17     X                      other    12      0.003
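
The Proportion column in the outputs above is consistent with Count divided by the total count, rounded to three decimals (for example, 455 / 1580 = 0.288 for NOUN in fst_child). A minimal base R sketch of that calculation follows. It is not the package's implementation: `upos_summary` is a hypothetical helper, and it assumes the tags are stored in a column named `upos`.

```r
# Hypothetical helper for illustration; not the package's implementation.
# Assumes the UPOS tags are stored in a column named `upos`.
upos_summary <- function(data) {
  # The 17 tags listed in the example output above
  tags <- c("ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN",
            "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM",
            "VERB", "X")
  # Fixing the factor levels keeps tags with zero occurrences in the table
  counts <- as.integer(table(factor(data$upos, levels = tags)))
  data.frame(
    UPOS = tags,
    Count = counts,
    Proportion = round(counts / sum(counts), 3)
  )
}

# Toy input with four tagged words
toy <- data.frame(upos = c("NOUN", "NOUN", "VERB", "ADJ"))
result <- upos_summary(toy)
result[result$Count > 0, ]
```

Fixing the factor levels up front is what makes rows such as PUNCT appear with a count of 0 instead of being dropped. Values outside the 17 listed tags are ignored in this sketch.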