This function takes a string of terms (separated by commas) or a single term and uses `fst_cn_search()` to find words connected to these search terms. It then returns a dataframe of 'edges' between pairs of words that are connected in a frequently occurring n-gram containing a concept term.

Usage

fst_cn_edges(
  data,
  concepts,
  threshold = NULL,
  norm = "number_words",
  pos_filter = NULL
)

Arguments

data

A dataframe of text in CoNLL-U format, with optional additional columns.

concepts

Terms to search for, given as a single string with terms separated by commas.

threshold

The minimum number of occurrences required for an 'edge' between a searched term and another word. Default is `NULL`. Note that the threshold is applied before normalisation.

norm

The method for normalising the data. Valid settings are `"number_words"` (the number of words in the responses, default), `"number_resp"` (the number of responses), or `NULL` (raw count returned, also used when weights are applied).

pos_filter

UPOS tags to include, given as a character vector or a comma-separated string. Default is `NULL`, which includes all UPOS tags.
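
The note that `threshold` is applied before normalisation can be illustrated with a minimal base-R sketch. This is not the package's implementation: the edge counts, the `total_words` denominator, and the rule that an edge is kept when its raw count is at least the threshold are all assumptions made for illustration.

```r
# Hypothetical raw edge counts; illustrative only, not taken from fst_child
# or any other package dataset.
edges <- data.frame(
  from = c("a", "a", "b"),
  to   = c("b", "c", "c"),
  n    = c(5, 1, 3)
)

threshold   <- 2    # assumed rule: keep edges with raw count >= threshold
total_words <- 100  # assumed denominator, in the spirit of norm = "number_words"

# Step 1: filter on the raw counts, before any normalisation.
kept <- edges[edges$n >= threshold, ]

# Step 2: normalise the counts of the edges that survived the filter.
kept$co_occurrence <- kept$n / total_words
kept
```

Under these assumptions, a threshold should be chosen on the scale of raw counts (for example `2`), not on the scale of the normalised `co_occurrence` values.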

Value

A dataframe of co-occurrences between pairs of connected words, with columns `from`, `to`, and `co_occurrence`.

Examples

con <- "kiusata, lyöminen"
fst_cn_edges(fst_child, con, pos_filter = c("NOUN", "VERB", "ADJ", "ADV"))
#> # A tibble: 2 × 3
#>   from      to         co_occurrence
#>   <chr>     <chr>              <dbl>
#> 1 lyöminen  potkiminen       0.00696
#> 2 töniminen lyöminen         0.00127
fst_cn_edges(fst_child, con, pos_filter = 'VERB, NOUN')
#> # A tibble: 3 × 3
#>   from      to         co_occurrence
#>   <chr>     <chr>              <dbl>
#> 1 lyöminen  potkiminen       0.00886
#> 2 lyöminen  sanoa            0.00127
#> 3 töniminen lyöminen         0.00127
fst_cn_edges(fst_child, "lyöminen", threshold = 2, norm = "number_resp")
#> # A tibble: 2 × 3
#>   from      to         co_occurrence
#>   <chr>     <chr>              <dbl>
#> 1 lyöminen  potkiminen       0.0145 
#> 2 töniminen lyöminen         0.00484