
Creates a summary table for the input CoNLL-U data, counting the number of words assigned to each universal part-of-speech (UPOS) tag.





A dataframe of text in CoNLL-U format, with optional additional columns.


A dataframe with one row per UPOS tag, giving the tag, its full name, its count in the data, and its proportion of all counted words.


#>     UPOS                  UPOS_Name Count Proportion
#> 1    ADJ                  adjective   156      0.099
#> 2    ADP                 adposition     5      0.003
#> 3    ADV                     adverb    98      0.062
#> 4    AUX                  auxiliary    36      0.023
#> 5  CCONJ   coordinating conjunction     1      0.001
#> 6    DET                 determiner    72      0.046
#> 7   INTJ               interjection    16      0.010
#> 8   NOUN                       noun   455      0.288
#> 9    NUM                    numeral     2      0.001
#> 10  PART                   particle    38      0.024
#> 11  PRON                    pronoun   148      0.094
#> 12 PROPN                proper noun     6      0.004
#> 13 PUNCT                punctuation    NA         NA
#> 14 SCONJ  subordinating conjunction    NA         NA
#> 15   SYM                     symbol    NA         NA
#> 16  VERB                       verb   545      0.345
#> 17     X                      other     2      0.001

#>     UPOS                  UPOS_Name Count Proportion
#> 1    ADJ                  adjective   389      0.093
#> 2    ADP                 adposition    24      0.006
#> 3    ADV                     adverb    64      0.015
#> 4    AUX                  auxiliary     3      0.001
#> 5  CCONJ   coordinating conjunction     3      0.001
#> 6    DET                 determiner    28      0.007
#> 7   INTJ               interjection     2      0.000
#> 8   NOUN                       noun  3311      0.790
#> 9    NUM                    numeral     5      0.001
#> 10  PART                   particle    29      0.007
#> 11  PRON                    pronoun    12      0.003
#> 12 PROPN                proper noun    31      0.007
#> 13 PUNCT                punctuation    NA         NA
#> 14 SCONJ  subordinating conjunction    NA         NA
#> 15   SYM                     symbol     1      0.000
#> 16  VERB                       verb   278      0.066
#> 17     X                      other    12      0.003
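
The tables above can be approximated with a few lines of base R. The following is a minimal sketch, not the package's own implementation: the function name `summarise_upos_sketch` is hypothetical, the input is assumed to hold its tags in a column named `UPOS`, and tags that never occur are assumed to be reported as `NA` (which is how the output above appears to behave). Proportions are computed as each count divided by the sum of all non-missing counts, rounded to three decimals, which matches the figures shown.

```r
# Hypothetical helper for illustration; not the function documented on this page.
summarise_upos_sketch <- function(conllu_df) {
  # The 17 Universal Dependencies UPOS tags and their full names
  upos_names <- c(
    ADJ = "adjective", ADP = "adposition", ADV = "adverb", AUX = "auxiliary",
    CCONJ = "coordinating conjunction", DET = "determiner",
    INTJ = "interjection", NOUN = "noun", NUM = "numeral", PART = "particle",
    PRON = "pronoun", PROPN = "proper noun", PUNCT = "punctuation",
    SCONJ = "subordinating conjunction", SYM = "symbol", VERB = "verb",
    X = "other"
  )

  # Assumption: the tag column is called "UPOS".
  # Using a factor with fixed levels keeps tags that never occur in the table.
  counts <- as.integer(table(factor(conllu_df$UPOS, levels = names(upos_names))))

  # Assumption: tags with no occurrences are shown as NA rather than 0.
  counts[counts == 0L] <- NA_integer_

  data.frame(
    UPOS = names(upos_names),
    UPOS_Name = unname(upos_names),
    Count = counts,
    # Proportion of all counted words, rounded to three decimals
    Proportion = round(counts / sum(counts, na.rm = TRUE), 3),
    stringsAsFactors = FALSE
  )
}

# Usage on a tiny made-up dataframe (illustrative data only)
toy <- data.frame(
  FORM = c("the", "dog", "barks", "loudly", "dog"),
  UPOS = c("DET", "NOUN", "VERB", "ADV", "NOUN"),
  stringsAsFactors = FALSE
)
result <- summarise_upos_sketch(toy)
print(result[!is.na(result$Count), ])
```

Fixing the factor levels up front is what guarantees a row for every tag, so summaries of different datasets line up row by row and can be compared directly.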