
How to Use finnsurveytext in another language!

Despite the package’s name, finnsurveytext can be used to analyse surveys in many different languages. This vignette explains how to use finnsurveytext with another language with as little additional effort as possible.

finnsurveytext works with other languages because the packages it uses to process the raw survey data, udpipe and stopwords, support multiple languages. So we have the developers of these packages to thank!

The package includes an English survey, english_sample_survey, which we use to demonstrate the package in a language other than Finnish.

knitr::kable(head(english_sample_survey, 5))
id label label_coder1 label_coder2 text
1 proactive proactive proactive Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks.
2 proactive proactive proactive I think he should have the receptionist talk to the doctor to make sure that he gets in there at the appropriate time; find out if it actually can be two weeks or if two weeks later would be OK.
3 proactive proactive proactive Joe should talk to the doctor and make arrangements to come in in two weeks. He was pretty specific about that.
4 proactive proactive proactive I think Joe should insist on an appointment in two weeks.
5 proactive proactive proactive Joe should discuss this with the receptionist as to what the doctor told him to do. And insist on seeing him at two weeks.

1. Essential: Your language has a language model available for udpipe

The udpipe package is available from CRAN. The relevant udpipe function we use is udpipe::udpipe_download_model. You can see the list of available models in the udpipe manual.

At the time of writing this vignette, these were:

afrikaans-afribooms, ancient_greek-perseus, ancient_greek-proiel, arabic-padt, armenian-armtdp, basque-bdt, belarusian-hse, bulgarian-btb, buryat-bdt, catalan-ancora, chinese-gsd, chinese-gsdsimp, coptic-scriptorium, croatian-set, czech-cac, czech-cltt, czech-fictree, czech-pdt, danish-ddt, dutch-alpino, dutch-lassysmall, english-ewt, english-gum, english-lines, english-partut, estonian-edt, finnish-ftb, finnish-tdt, french-gsd, french-partut, french-sequoia, french-spoken, galician-ctg, galician-treegal, german-gsd, german-hdt, gothic-proiel, greek-gdt, hebrew-htb, hindi-hdtb, hungarian-szeged, indonesian-gsd, irish-idt, italian-isdt, italian-partut, italian-postwita, italian-twittiro, japanese-gsd, kazakh-ktb, korean-gsd, korean-kaist, kurmanji-mg, latin-ittb, latin-perseus, latin-proiel, latvian-lvtb, lithuanian-hse, maltese-mudt, marathi-ufal, north_sami-giella, norwegian-bokmaal, norwegian-nynorsk, norwegian-nynorsklia, old_church_slavonic-proiel, old_french-srcmf, persian-seraji, polish-lfg, polish-sz, portuguese-bosque, portuguese-br, portuguese-gsd, romanian-nonstandard, romanian-rrt, russian-gsd, russian-syntagrus, russian-taiga, sanskrit-ufal, scottish_gaelic-arcosg, serbian-set, slovak-snk, slovenian-ssj, slovenian-sst, spanish-ancora, spanish-gsd, swedish-lines, swedish-talbanken, tamil-ttb, telugu-mtg, turkish-imst, ukrainian-iu, upper_sorbian-ufal, urdu-udtb, uyghur-udt, vietnamese-vtb

Alternatively, you can list the available models by running fst_print_available_models(). If you provide a search term, the list is filtered to the models whose names contain that term:

fst_print_available_models()
#>   [1] "afrikaans-afribooms"        "ancient_greek-perseus"     
#>   [3] "ancient_greek-proiel"       "arabic-padt"               
#>   [5] "armenian-armtdp"            "basque-bdt"                
#>   [7] "belarusian-hse"             "bulgarian-btb"             
#>   [9] "buryat-bdt"                 "catalan-ancora"            
#>  [11] "chinese-gsd"                "chinese-gsdsimp"           
#>  [13] "classical_chinese-kyoto"    "coptic-scriptorium"        
#>  [15] "croatian-set"               "czech-cac"                 
#>  [17] "czech-cltt"                 "czech-fictree"             
#>  [19] "czech-pdt"                  "danish-ddt"                
#>  [21] "dutch-alpino"               "dutch-lassysmall"          
#>  [23] "english-ewt"                "english-gum"               
#>  [25] "english-lines"              "english-partut"            
#>  [27] "estonian-edt"               "estonian-ewt"              
#>  [29] "finnish-ftb"                "finnish-tdt"               
#>  [31] "french-gsd"                 "french-partut"             
#>  [33] "french-sequoia"             "french-spoken"             
#>  [35] "galician-ctg"               "galician-treegal"          
#>  [37] "german-gsd"                 "german-hdt"                
#>  [39] "gothic-proiel"              "greek-gdt"                 
#>  [41] "hebrew-htb"                 "hindi-hdtb"                
#>  [43] "hungarian-szeged"           "indonesian-gsd"            
#>  [45] "irish-idt"                  "italian-isdt"              
#>  [47] "italian-partut"             "italian-postwita"          
#>  [49] "italian-twittiro"           "italian-vit"               
#>  [51] "japanese-gsd"               "kazakh-ktb"                
#>  [53] "korean-gsd"                 "korean-kaist"              
#>  [55] "kurmanji-mg"                "latin-ittb"                
#>  [57] "latin-perseus"              "latin-proiel"              
#>  [59] "latvian-lvtb"               "lithuanian-alksnis"        
#>  [61] "lithuanian-hse"             "maltese-mudt"              
#>  [63] "marathi-ufal"               "north_sami-giella"         
#>  [65] "norwegian-bokmaal"          "norwegian-nynorsk"         
#>  [67] "norwegian-nynorsklia"       "old_church_slavonic-proiel"
#>  [69] "old_french-srcmf"           "old_russian-torot"         
#>  [71] "persian-seraji"             "polish-lfg"                
#>  [73] "polish-pdb"                 "polish-sz"                 
#>  [75] "portuguese-bosque"          "portuguese-br"             
#>  [77] "portuguese-gsd"             "romanian-nonstandard"      
#>  [79] "romanian-rrt"               "russian-gsd"               
#>  [81] "russian-syntagrus"          "russian-taiga"             
#>  [83] "sanskrit-ufal"              "scottish_gaelic-arcosg"    
#>  [85] "serbian-set"                "slovak-snk"                
#>  [87] "slovenian-ssj"              "slovenian-sst"             
#>  [89] "spanish-ancora"             "spanish-gsd"               
#>  [91] "swedish-lines"              "swedish-talbanken"         
#>  [93] "tamil-ttb"                  "telugu-mtg"                
#>  [95] "turkish-imst"               "ukrainian-iu"              
#>  [97] "upper_sorbian-ufal"         "urdu-udtb"                 
#>  [99] "uyghur-udt"                 "vietnamese-vtb"            
#> [101] "wolof-wtb"

fst_print_available_models(search = 'estonian')
#> [1] "estonian-edt" "estonian-ewt"

fst_print_available_models('sami')
#> [1] "north_sami-giella"
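
The filtering shown above can be approximated in base R. The following is a minimal sketch, assuming the search term is matched as a case-insensitive substring of the model name. filter_models() is a hypothetical helper written for this illustration, not the package's implementation, and the vector holds only a few of the model names listed above.

```r
# Hypothetical helper: keep the model names that contain the search term.
# Assumption: matching is a plain, case-insensitive substring match.
filter_models <- function(models, search = NULL) {
  if (is.null(search)) {
    return(models)
  }
  models[grepl(tolower(search), tolower(models), fixed = TRUE)]
}

# A handful of model names taken from the list above.
models <- c(
  "english-ewt", "english-gum", "estonian-edt", "estonian-ewt",
  "north_sami-giella", "swedish-talbanken"
)

filter_models(models, "estonian")
filter_models(models, "sami")
```

A substring match explains why searching for 'sami' finds "north_sami-giella" even though the name does not start with the search term.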

How to use:

The relevant model, e.g. “swedish-talbanken”, should be passed as the model input of fst_format() or fst_prepare().

Demonstration:

We find an English model and format our English data below:

fst_print_available_models("english")
#> [1] "english-ewt"    "english-gum"    "english-lines"  "english-partut"

en_df <- fst_format(data = english_sample_survey,
                    question = 'text',
                    id = 'id',
                    model = 'english-ewt')
#> Downloading udpipe model from https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.5/master/inst/udpipe-ud-2.5-191206/english-ewt-ud-2.5-191206.udpipe to /home/runner/work/finnsurveytext/finnsurveytext/vignettes/web_only/english-ewt-ud-2.5-191206.udpipe
#>  - This model has been trained on version 2.5 of data from https://universaldependencies.org
#>  - The model is distributed under the CC-BY-SA-NC license: https://creativecommons.org/licenses/by-nc-sa/4.0
#>  - Visit https://github.com/jwijffels/udpipe.models.ud.2.5 for model license details.
#>  - For a list of all models and their licenses (most models you can download with this package have either a CC-BY-SA or a CC-BY-SA-NC license) read the documentation at ?udpipe_download_model. For building your own models: visit the documentation by typing vignette('udpipe-train', package = 'udpipe')
#> Downloading finished, model stored at '/home/runner/work/finnsurveytext/finnsurveytext/vignettes/web_only/english-ewt-ud-2.5-191206.udpipe'

knitr::kable(head(en_df, 5))
doc_id paragraph_id sentence_id sentence token_id token lemma upos xpos feats head_token_id dep_rel deps misc
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 1 joe joe PROPN NNP Number=Sing 3 nsubj NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 2 should should AUX MD VerbForm=Fin 3 aux NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 3 talk talk VERB VB VerbForm=Inf 0 root NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 4 to to ADP IN NA 6 case NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 5 the the DET DT Definite=Def|PronType=Art 6 det NA NA

2a. Your language has a stopword list available in the stopwords package

The stopwords package is available from CRAN. The relevant stopwords functions are stopwords::stopwords, stopwords::stopwords_getsources and stopwords::stopwords_getlanguages. We recommend that you first identify the two-letter ISO code for the language you are using. You can see the available sources and languages in the stopwords manual or by running stopwords_getsources() and stopwords_getlanguages():

stopwords::stopwords_getsources()
#> [1] "snowball"      "stopwords-iso" "misc"          "smart"        
#> [5] "marimo"        "ancient"       "nltk"          "perseus"
stopwords::stopwords_getlanguages(source = 'nltk')
#>  [1] "ar" "az" "da" "nl" "en" "fi" "fr" "de" "el" "hu" "id" "it" "kk" "ne" "no"
#> [16] "pt" "ro" "ru" "sl" "es" "sv" "tg" "tr"
stopwords::stopwords('da', source = 'nltk')
#>  [1] "og"     "i"      "jeg"    "det"    "at"     "en"     "den"    "til"   
#>  [9] "er"     "som"    "på"     "de"     "med"    "han"    "af"     "for"   
#> [17] "ikke"   "der"    "var"    "mig"    "sig"    "men"    "et"     "har"   
#> [25] "om"     "vi"     "min"    "havde"  "ham"    "hun"    "nu"     "over"  
#> [33] "da"     "fra"    "du"     "ud"     "sin"    "dem"    "os"     "op"    
#> [41] "man"    "hans"   "hvor"   "eller"  "hvad"   "skal"   "selv"   "her"   
#> [49] "alle"   "vil"    "blev"   "kunne"  "ind"    "når"    "være"   "dog"   
#> [57] "noget"  "ville"  "jo"     "deres"  "efter"  "ned"    "skulle" "denne" 
#> [65] "end"    "dette"  "mit"    "også"   "under"  "have"   "dig"    "anden" 
#> [73] "hende"  "mine"   "alt"    "meget"  "sit"    "sine"   "vor"    "mod"   
#> [81] "disse"  "hvis"   "din"    "nogle"  "hos"    "blive"  "mange"  "ad"    
#> [89] "bliver" "hendes" "været"  "thi"    "jer"    "sådan"
stopwords::stopwords('da') # The default source is 'snowball'
#>  [1] "og"     "i"      "jeg"    "det"    "at"     "en"     "den"    "til"   
#>  [9] "er"     "som"    "på"     "de"     "med"    "han"    "af"     "for"   
#> [17] "ikke"   "der"    "var"    "mig"    "sig"    "men"    "et"     "har"   
#> [25] "om"     "vi"     "min"    "havde"  "ham"    "hun"    "nu"     "over"  
#> [33] "da"     "fra"    "du"     "ud"     "sin"    "dem"    "os"     "op"    
#> [41] "man"    "hans"   "hvor"   "eller"  "hvad"   "skal"   "selv"   "her"   
#> [49] "alle"   "vil"    "blev"   "kunne"  "ind"    "når"    "være"   "dog"   
#> [57] "noget"  "ville"  "jo"     "deres"  "efter"  "ned"    "skulle" "denne" 
#> [65] "end"    "dette"  "mit"    "også"   "under"  "have"   "dig"    "anden" 
#> [73] "hende"  "mine"   "alt"    "meget"  "sit"    "sine"   "vor"    "mod"   
#> [81] "disse"  "hvis"   "din"    "nogle"  "hos"    "blive"  "mange"  "ad"    
#> [89] "bliver" "hendes" "været"  "thi"    "jer"    "sådan"

Alternatively, our function fst_find_stopwords() simplifies this process. It returns a table of the stopword lists available through the stopwords package for a language, together with their contents and lengths, so that you can compare them if there are several options. To run it, you need the two-letter ISO language code:

knitr::kable(fst_find_stopwords(language = 'lv'))
Name Stopwords Length
stopwords-iso aiz , ap , apakš , apakšpus , ar , arī , augšpus , bet , bez , bija , biji , biju , bijām , bijāt , būs , būsi , būsiet , būsim , būt , būšu , caur , diemžēl , diezin , droši , dēļ , esam , esat , esi , esmu , gan , gar , iekam , iekams , iekām , iekāms , iekš , iekšpus , ik , ir , it , itin , iz , ja , jau , jeb , jebšu , jel , jo , jā , ka , kamēr , kaut , kolīdz , kopš , kā , kļuva , kļuvi , kļuvu , kļuvām , kļuvāt , kļūs , kļūsi , kļūsiet , kļūsim , kļūst , kļūstam , kļūstat , kļūsti , kļūstu , kļūt , kļūšu , labad , lai , lejpus , līdz , līdzko , ne , nebūt , nedz , nekā , nevis , nezin , no , nu , nē , otrpus , pa , par , pat , pie , pirms , pret , priekš , pār , pēc , starp , tad , tak , tapi , taps , tapsi , tapsiet , tapsim , tapt , tapāt , tapšu , taču , te , tiec , tiek , tiekam , tiekat , tieku , tik , tika , tikai , tiki , tikko , tiklab , tiklīdz , tiks , tiksiet , tiksim , tikt , tiku , tikvien , tikām , tikāt , tikšu , tomēr , topat , turpretim, turpretī , tā , tādēļ , tālab , tāpēc , un , uz , vai , var , varat , varēja , varēji , varēju , varējām , varējāt , varēs , varēsi , varēsiet , varēsim , varēt , varēšu , vien , virs , virspus , vis , viņpus , zem , ārpus , šaipus 161
fst_find_stopwords(language = 'no')
#> # A tibble: 3 × 3
#>   Name          Stopwords   Length   
#>   <chr>         <list>      <list>   
#> 1 nltk          <chr [172]> <int [1]>
#> 2 snowball      <chr [176]> <int [1]>
#> 3 stopwords-iso <chr [221]> <int [1]>
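
When several lists are available, it can help to see exactly where they differ rather than only comparing lengths. The following is a minimal base R sketch. compare_stopword_lists() is a hypothetical helper written for this illustration, and the two vectors are short excerpts of the English 'nltk' and 'snowball' lists, not the full lists.

```r
# Hypothetical helper: summarise how two stopword lists differ.
compare_stopword_lists <- function(a, b) {
  list(
    only_in_a = setdiff(a, b),   # words found only in the first list
    only_in_b = setdiff(b, a),   # words found only in the second list
    shared    = intersect(a, b)  # words found in both lists
  )
}

# Short excerpts of two English lists, for illustration only.
nltk_excerpt     <- c("i", "me", "my", "can", "will", "just", "now")
snowball_excerpt <- c("i", "me", "my", "would", "could", "ought", "will")

comparison <- compare_stopword_lists(nltk_excerpt, snowball_excerpt)
comparison$only_in_a
comparison$only_in_b
comparison$shared
```

With the stopwords package installed, the same helper can be given full lists, for example stopwords::stopwords('en', source = 'nltk') and stopwords::stopwords('en', source = 'snowball').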

How to use:

The relevant language and stopword list (‘source’), e.g. “sv” and “nltk”, should be used for the language and stopword_list inputs respectively in fst_prepare() (or in fst_rm_stop_punct(), which fst_prepare() calls automatically).

Demonstration:

We can find and compare English stopwords lists as below. Once we have chosen a stopwords list, we can run fst_prepare() to format the data and remove the stopwords:

knitr::kable(head(fst_find_stopwords(language = 'en'), 5))
Name Stopwords Length
marimo i , me , myself , we , ours , ourselves , you , yours , yourself , yourselves, he , him , himself , she , hers , herself , it , itself , they , them , theirs , themselves, this , that , these , those , my , our , your , his , her , its , their , what , which , who , whom , whose , when , where , why , how , i’m , you’re , he’s , she’s , it’s , we’re , they’re , i’ve , you’ve , we’ve , they’ve , i’d , you’d , he’d , she’d , we’d , they’d , i’ll , you’ll , he’ll , she’ll , we’ll , they’ll , am , is , are , was , were , be , been , being , have , has , had , having , do , does , did , doing , would , should , could , ought , will , isn’t , aren’t , wasn’t , weren’t , hasn’t , haven’t , hadn’t , doesn’t , don’t , didn’t , won’t , wouldn’t , shan’t , shouldn’t , can’t , cannot , couldn’t , mustn’t , let’s , that’s , who’s , what’s , here’s , there’s , when’s , where’s , why’s , how’s , say , says , said , tell , tells , told , report , reports , reported , a , an , the , and , but , if , or , because , so , while , nor , as , until , once , here , there , all , any , both , each , few , many , more , most , other , some , such , no , not , only , then , too , very , little , less , of , at , by , for , with , about , against , between , into , through , during , before , after , above , below , to , from , up , down , in , out , on , off , over , under , again , further , than , own , same , minute , hour , month , year , century , am , pm , january , february , march , april , may , june , july , august , september , october , november , december , jan , feb , mar , apr , may , jun , jul , aug , sep , sept , oct , nov , dec , sunday , monday , tuesday , wednesday , thursday , friday , saturday , one , two , three , four , five , six , seven , eight , nine , ten 237
nltk i , me , my , myself , we , our , ours , ourselves , you , you’re , you’ve , you’ll , you’d , your , yours , yourself , yourselves, he , him , his , himself , she , she’s , her , hers , herself , it , it’s , its , itself , they , them , their , theirs , themselves, what , which , who , whom , this , that , that’ll , these , those , am , is , are , was , were , be , been , being , have , has , had , having , do , does , did , doing , a , an , the , and , but , if , or , because , as , until , while , of , at , by , for , with , about , against , between , into , through , during , before , after , above , below , to , from , up , down , in , out , on , off , over , under , again , further , then , once , here , there , when , where , why , how , all , any , both , each , few , more , most , other , some , such , no , nor , not , only , own , same , so , than , too , very , s , t , can , will , just , don , don’t , should , should’ve , now , d , ll , m , o , re , ve , y , ain , aren , aren’t , couldn , couldn’t , didn , didn’t , doesn , doesn’t , hadn , hadn’t , hasn , hasn’t , haven , haven’t , isn , isn’t , ma , mightn , mightn’t , mustn , mustn’t , needn , needn’t , shan , shan’t , shouldn , shouldn’t , wasn , wasn’t , weren , weren’t , won , won’t , wouldn , wouldn’t 179
smart a , a’s , able , about , above , according , accordingly , across , actually , after , afterwards , again , against , ain’t , all , allow , allows , almost , alone , along , already , also , although , always , am , among , amongst , an , and , another , any , anybody , anyhow , anyone , anything , anyway , anyways , anywhere , apart , appear , appreciate , appropriate , are , aren’t , around , as , aside , ask , asking , associated , at , available , away , awfully , b , be , became , because , become , becomes , becoming , been , before , beforehand , behind , being , believe , below , beside , besides , best , better , between , beyond , both , brief , but , by , c , c’mon , c’s , came , can , can’t , cannot , cant , cause , causes , certain , certainly , changes , clearly , co , com , come , comes , concerning , consequently , consider , considering , contain , containing , contains , corresponding, could , couldn’t , course , currently , d , definitely , described , despite , did , didn’t , different , do , does , doesn’t , doing , don’t , done , down , downwards , during , e , each , edu , eg , eight , either , else , elsewhere , enough , entirely , especially , et , etc , even , ever , every , everybody , everyone , everything , everywhere , ex , exactly , example , except , f , far , few , fifth , first , five , followed , following , follows , for , former , formerly , forth , four , from , further , furthermore , g , get , gets , getting , given , gives , go , goes , going , gone , got , gotten , greetings , h , had , hadn’t , happens , hardly , has , hasn’t , have , haven’t , having , he , he’s , hello , help , hence , her , here , here’s , hereafter , hereby , herein , hereupon , hers , herself , hi , him , himself , his , hither , hopefully , how , howbeit , however , i , i’d , i’ll , i’m , i’ve , ie , if , ignored , immediate , in , inasmuch , inc , indeed , indicate , indicated , indicates , inner , insofar , instead , into , inward , is , 
isn’t , it , it’d , it’ll , it’s , its , itself , j , just , k , keep , keeps , kept , know , knows , known , l , last , lately , later , latter , latterly , least , less , lest , let , let’s , like , liked , likely , little , look , looking , looks , ltd , m , mainly , many , may , maybe , me , mean , meanwhile , merely , might , more , moreover , most , mostly , much , must , my , myself , n , name , namely , nd , near , nearly , necessary , need , needs , neither , never , nevertheless , new , next , nine , no , nobody , non , none , noone , nor , normally , not , nothing , novel , now , nowhere , o , obviously , of , off , often , oh , ok , okay , old , on , once , one , ones , only , onto , or , other , others , otherwise , ought , our , ours , ourselves , out , outside , over , overall , own , p , particular , particularly , per , perhaps , placed , please , plus , possible , presumably , probably , provides , q , que , quite , qv , r , rather , rd , re , really , reasonably , regarding , regardless , regards , relatively , respectively , right , s , said , same , saw , say , saying , says , second , secondly , see , seeing , seem , seemed , seeming , seems , seen , self , selves , sensible , sent , serious , seriously , seven , several , shall , she , should , shouldn’t , since , six , so , some , somebody , somehow , someone , something , sometime , sometimes , somewhat , somewhere , soon , sorry , specified , specify , specifying , still , sub , such , sup , sure , t , t’s , take , taken , tell , tends , th , than , thank , thanks , thanx , that , that’s , thats , the , their , theirs , them , themselves , then , thence , there , there’s , thereafter , thereby , therefore , therein , theres , thereupon , these , they , they’d , they’ll , they’re , they’ve , think , third , this , thorough , thoroughly , those , though , three , through , throughout , thru , thus , to , together , too , took , toward , towards , tried , tries , truly , try , trying , twice 
, two , u , un , under , unfortunately, unless , unlikely , until , unto , up , upon , us , use , used , useful , uses , using , usually , uucp , v , value , various , very , via , viz , vs , w , want , wants , was , wasn’t , way , we , we’d , we’ll , we’re , we’ve , welcome , well , went , were , weren’t , what , what’s , whatever , when , whence , whenever , where , where’s , whereafter , whereas , whereby , wherein , whereupon , wherever , whether , which , while , whither , who , who’s , whoever , whole , whom , whose , why , will , willing , wish , with , within , without , won’t , wonder , would , would , wouldn’t , x , y , yes , yet , you , you’d , you’ll , you’re , you’ve , your , yours , yourself , yourselves , z , zero 571
snowball i , me , my , myself , we , our , ours , ourselves , you , your , yours , yourself , yourselves, he , him , his , himself , she , her , hers , herself , it , its , itself , they , them , their , theirs , themselves, what , which , who , whom , this , that , these , those , am , is , are , was , were , be , been , being , have , has , had , having , do , does , did , doing , would , should , could , ought , i’m , you’re , he’s , she’s , it’s , we’re , they’re , i’ve , you’ve , we’ve , they’ve , i’d , you’d , he’d , she’d , we’d , they’d , i’ll , you’ll , he’ll , she’ll , we’ll , they’ll , isn’t , aren’t , wasn’t , weren’t , hasn’t , haven’t , hadn’t , doesn’t , don’t , didn’t , won’t , wouldn’t , shan’t , shouldn’t , can’t , cannot , couldn’t , mustn’t , let’s , that’s , who’s , what’s , here’s , there’s , when’s , where’s , why’s , how’s , a , an , the , and , but , if , or , because , as , until , while , of , at , by , for , with , about , against , between , into , through , during , before , after , above , below , to , from , up , down , in , out , on , off , over , under , again , further , then , once , here , there , when , where , why , how , all , any , both , each , few , more , most , other , some , such , no , nor , not , only , own , same , so , than , too , very , will 175
stopwords-iso ’ll , ’tis , ’twas , ’ve , 10 , 39 , a , a’s , able , ableabout , about , above , abroad , abst , accordance , according , accordingly , across , act , actually , ad , added , adj , adopted , ae , af , affected , affecting , affects , after , afterwards , ag , again , against , ago , ah , ahead , ai , ain’t , aint , al , all , allow , allows , almost , alone , along , alongside , already , also , although , always , am , amid , amidst , among , amongst , amoungst , amount , an , and , announce , another , any , anybody , anyhow , anymore , anyone , anything , anyway , anyways , anywhere , ao , apart , apparently , appear , appreciate , appropriate , approximately , aq , ar , are , area , areas , aren , aren’t , arent , arise , around , arpa , as , aside , ask , asked , asking , asks , associated , at , au , auth , available , aw , away , awfully , az , b , ba , back , backed , backing , backs , backward , backwards , bb , bd , be , became , because , become , becomes , becoming , been , before , beforehand , began , begin , beginning , beginnings , begins , behind , being , beings , believe , below , beside , besides , best , better , between , beyond , bf , bg , bh , bi , big , bill , billion , biol , bj , bm , bn , bo , both , bottom , br , brief , briefly , bs , bt , but , buy , bv , bw , by , bz , c , c’mon , c’s , ca , call , came , can , can’t , cannot , cant , caption , case , cases , cause , causes , cc , cd , certain , certainly , cf , cg , ch , changes , ci , ck , cl , clear , clearly , click , cm , cmon , cn , co , co. 
, com , come , comes , computer , con , concerning , consequently , consider , considering , contain , containing , contains , copy , corresponding , could , could’ve , couldn , couldn’t , couldnt , course , cr , cry , cs , cu , currently , cv , cx , cy , cz , d , dare , daren’t , darent , date , de , dear , definitely , describe , described , despite , detail , did , didn , didn’t , didnt , differ , different , differently , directly , dj , dk , dm , do , does , doesn , doesn’t , doesnt , doing , don , don’t , done , dont , doubtful , down , downed , downing , downs , downwards , due , during , dz , e , each , early , ec , ed , edu , ee , effect , eg , eh , eight , eighty , either , eleven , else , elsewhere , empty , end , ended , ending , ends , enough , entirely , er , es , especially , et , et-al , etc , even , evenly , ever , evermore , every , everybody , everyone , everything , everywhere , ex , exactly , example , except , f , face , faces , fact , facts , fairly , far , farther , felt , few , fewer , ff , fi , fifteen , fifth , fifty , fify , fill , find , finds , fire , first , five , fix , fj , fk , fm , fo , followed , following , follows , for , forever , former , formerly , forth , forty , forward , found , four , fr , free , from , front , full , fully , further , furthered , furthering , furthermore , furthers , fx , g , ga , gave , gb , gd , ge , general , generally , get , gets , getting , gf , gg , gh , gi , give , given , gives , giving , gl , gm , gmt , gn , go , goes , going , gone , good , goods , got , gotten , gov , gp , gq , gr , great , greater , greatest , greetings , group , grouped , grouping , groups , gs , gt , gu , gw , gy , h , had , hadn’t , hadnt , half , happens , hardly , has , hasn , hasn’t , hasnt , have , haven , haven’t , havent , having , he , he’d , he’ll , he’s , hed , hell , hello , help , hence , her , here , here’s , hereafter , hereby , herein , heres , hereupon , hers , herself , herse” , hes , hi , hid , high , 
higher , highest , him , himself , himse” , his , hither , hk , hm , hn , home , homepage , hopefully , how , how’d , how’ll , how’s , howbeit , however , hr , ht , htm , html , http , hu , hundred , i , i’d , i’ll , i’m , i’ve , i.e. , id , ie , if , ignored , ii , il , ill , im , immediate , immediately , importance , important , in , inasmuch , inc , inc. , indeed , index , indicate , indicated , indicates , information , inner , inside , insofar , instead , int , interest , interested , interesting , interests , into , invention , inward , io , iq , ir , is , isn , isn’t , isnt , it , it’d , it’ll , it’s , itd , itll , its , itself , itse” , ive , j , je , jm , jo , join , jp , just , k , ke , keep , keeps , kept , keys , kg , kh , ki , kind , km , kn , knew , know , known , knows , kp , kr , kw , ky , kz , l , la , large , largely , last , lately , later , latest , latter , latterly , lb , lc , least , length , less , lest , let , let’s , lets , li , like , liked , likely , likewise , line , little , lk , ll , long , longer , longest , look , looking , looks , low , lower , lr , ls , lt , ltd , lu , lv , ly , m , ma , made , mainly , make , makes , making , man , many , may , maybe , mayn’t , maynt , mc , md , me , mean , means , meantime , meanwhile , member , members , men , merely , mg , mh , microsoft , might , might’ve , mightn’t , mightnt , mil , mill , million , mine , minus , miss , mk , ml , mm , mn , mo , more , moreover , most , mostly , move , mp , mq , mr , mrs , ms , msie , mt , mu , much , mug , must , must’ve , mustn’t , mustnt , mv , mw , mx , my , myself , myse” , mz , n , na , name , namely , nay , nc , nd , ne , near , nearly , necessarily , necessary , need , needed , needing , needn’t , neednt , needs , neither , net , netscape , never , neverf , neverless , nevertheless , new , newer , newest , next , nf , ng , ni , nine , ninety , nl , no , no-one , nobody , non , none , nonetheless , noone , nor , normally , nos , not , noted , nothing 
, notwithstanding, novel , now , nowhere , np , nr , nu , null , number , numbers , nz , o , obtain , obtained , obviously , of , off , often , oh , ok , okay , old , older , oldest , om , omitted , on , once , one , one’s , ones , only , onto , open , opened , opening , opens , opposite , or , ord , order , ordered , ordering , orders , org , other , others , otherwise , ought , oughtn’t , oughtnt , our , ours , ourselves , out , outside , over , overall , owing , own , p , pa , page , pages , part , parted , particular , particularly , parting , parts , past , pe , per , perhaps , pf , pg , ph , pk , pl , place , placed , places , please , plus , pm , pmid , pn , point , pointed , pointing , points , poorly , possible , possibly , potentially , pp , pr , predominantly , present , presented , presenting , presents , presumably , previously , primarily , probably , problem , problems , promptly , proud , provided , provides , pt , put , puts , pw , py , q , qa , que , quickly , quite , qv , r , ran , rather , rd , re , readily , really , reasonably , recent , recently , ref , refs , regarding , regardless , regards , related , relatively , research , reserved , respectively , resulted , resulting , results , right , ring , ro , room , rooms , round , ru , run , rw , s , sa , said , same , saw , say , saying , says , sb , sc , sd , se , sec , second , secondly , seconds , section , see , seeing , seem , seemed , seeming , seems , seen , sees , self , selves , sensible , sent , serious , seriously , seven , seventy , several , sg , sh , shall , shan’t , shant , she , she’d , she’ll , she’s , shed , shell , shes , should , should’ve , shouldn , shouldn’t , shouldnt , show , showed , showing , shown , showns , shows , si , side , sides , significant , significantly , similar , similarly , since , sincere , site , six , sixty , sj , sk , sl , slightly , sm , small , smaller , smallest , sn , so , some , somebody , someday , somehow , someone , somethan , something , 
sometime , sometimes , somewhat , somewhere , soon , sorry , specifically , specified , specify , specifying , sr , st , state , states , still , stop , strongly , su , sub , substantially , successfully , such , sufficiently , suggest , sup , sure , sv , sy , system , sz , t , t’s , take , taken , taking , tc , td , tell , ten , tends , test , text , tf , tg , th , than , thank , thanks , thanx , that , that’ll , that’s , that’ve , thatll , thats , thatve , the , their , theirs , them , themselves , then , thence , there , there’d , there’ll , there’re , there’s , there’ve , thereafter , thereby , thered , therefore , therein , therell , thereof , therere , theres , thereto , thereupon , thereve , these , they , they’d , they’ll , they’re , they’ve , theyd , theyll , theyre , theyve , thick , thin , thing , things , think , thinks , third , thirty , this , thorough , thoroughly , those , thou , though , thoughh , thought , thoughts , thousand , three , throug , through , throughout , thru , thus , til , till , tip , tis , tj , tk , tm , tn , to , today , together , too , took , top , toward , towards , tp , tr , tried , tries , trillion , truly , try , trying , ts , tt , turn , turned , turning , turns , tv , tw , twas , twelve , twenty , twice , two , tz , u , ua , ug , uk , um , un , under , underneath , undoing , unfortunately , unless , unlike , unlikely , until , unto , up , upon , ups , upwards , us , use , used , useful , usefully , usefulness , uses , using , usually , uucp , uy , uz , v , va , value , various , vc , ve , versus , very , vg , vi , via , viz , vn , vol , vols , vs , vu , w , want , wanted , wanting , wants , was , wasn , wasn’t , wasnt , way , ways , we , we’d , we’ll , we’re , we’ve , web , webpage , website , wed , welcome , well , wells , went , were , weren , weren’t , werent , weve , wf , what , what’d , what’ll , what’s , what’ve , whatever , whatll , whats , whatve , when , when’d , when’ll , when’s , whence , whenever , where , 
where’d , where’ll , where’s , whereafter , whereas , whereby , wherein , wheres , whereupon , wherever , whether , which , whichever , while , whilst , whim , whither , who , who’d , who’ll , who’s , whod , whoever , whole , wholl , whom , whomever , whos , whose , why , why’d , why’ll , why’s , widely , width , will , willing , wish , with , within , without , won , won’t , wonder , wont , words , work , worked , working , works , world , would , would’ve , wouldn , wouldn’t , wouldnt , ws , www , x , y , ye , year , years , yes , yet , you , you’d , you’ll , you’re , you’ve , youd , youll , young , younger , youngest , your , youre , yours , yourself , yourselves , youve , yt , yu , z , za , zero , zm , zr 1298

en_df2 <- fst_prepare(data = english_sample_survey,
                      question = 'text',
                      id = 'id',
                      model = 'english-ewt',
                      stopword_list = 'smart', 
                      language = 'en')

knitr::kable(head(en_df2, 5))
doc_id paragraph_id sentence_id sentence token_id token lemma upos xpos feats head_token_id dep_rel deps misc
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 1 joe joe PROPN NNP Number=Sing 3 nsubj NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 3 talk talk VERB VB VerbForm=Inf 0 root NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 6 doctor doctor NOUN NN Number=Sing 3 obl NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 10 nurse nurse NOUN NN Number=Sing 8 obj NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 13 doctor doctor NOUN NN Number=Sing 14 nsubj NA NA
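
Before settling on a value for stopword_list, it can help to check which stopword sources actually cover your language. The following is a minimal sketch that calls the stopwords package directly, assuming it is installed; the variable names lang, sources, covers and available are illustrative and are not part of finnsurveytext.

```r
# Sketch: find out which stopword sources offer a list for a given language.
# Assumes the 'stopwords' package is installed; 'lang' is an illustrative name.
lang <- "en"

if (requireNamespace("stopwords", quietly = TRUE)) {
  sources <- stopwords::stopwords_getsources()

  # TRUE for each source that provides a list for 'lang'
  covers <- vapply(
    sources,
    function(src) lang %in% stopwords::stopwords_getlanguages(src),
    logical(1)
  )
  available <- sources[covers]
  print(available)

  # Peek at the first few entries of the 'smart' list used above
  print(head(stopwords::stopwords(language = lang, source = "smart")))
}
```

If your language does not appear under any source, the manual list option is the fallback.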

2b. Optional: Provide your own list of stopwords

If a stopword list is not available for your language, or you would like to provide your own, you can use the manual_list option within fst_prepare() (or fst_rm_stop_punct()). Make sure to also set either manual = TRUE or stopword_list = "manual".

You can also choose not to remove stopwords, but removing them usually gives more meaningful results.

If you provide a manual list, you can leave language at its default value.

Demonstration
# Example of providing a manual list
# Option 1: a character vector of stopwords
manualList <- c('and', 'the', 'of', 'you', 'me', 'ours', 'mine', 'them', 'theirs')
# Option 2: a single comma-separated string
manualList2 <- "to, the, I"

df1 <- fst_prepare(data = english_sample_survey,
                  question = 'text',
                  id = 'id',
                  model = 'english-ewt',
                  manual_list = manualList,
                  stopword_list = 'manual'
                  )

knitr::kable(head(df1, 5))
doc_id paragraph_id sentence_id sentence token_id token lemma upos xpos feats head_token_id dep_rel deps misc
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 1 joe joe PROPN NNP Number=Sing 3 nsubj NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 2 should should AUX MD VerbForm=Fin 3 aux NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 3 talk talk VERB VB VerbForm=Inf 0 root NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 4 to to ADP IN NA 6 case NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 6 doctor doctor NOUN NN Number=Sing 3 obl NA NA

df2 <- fst_prepare(data = english_sample_survey,
                  question = 'text',
                  id = 'id',
                  model = 'english-ewt',
                  manual = TRUE,
                  manual_list = manualList2
                  )

knitr::kable(head(df2, 5))
doc_id paragraph_id sentence_id sentence token_id token lemma upos xpos feats head_token_id dep_rel deps misc
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 1 joe joe PROPN NNP Number=Sing 3 nsubj NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 2 should should AUX MD VerbForm=Fin 3 aux NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 3 talk talk VERB VB VerbForm=Inf 0 root NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 6 doctor doctor NOUN NN Number=Sing 3 obl NA NA
1 1 1 Joe should talk to the doctor or tell the nurse that the doctor said he has to come back in two weeks. 7 or or CCONJ CC NA 8 cc NA NA
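
The two demonstrations above pass the manual list in two shapes: a character vector and a single comma-separated string. The base R sketch below turns such a string into the equivalent vector, which can be handy for inspecting or extending a list before passing it on. This vignette does not show how fst_prepare() parses the string internally, so split_stopwords is a hypothetical helper for illustration only. Lowercasing is an assumption based on the lowercased tokens in the output tables.

```r
# Hypothetical helper (not part of finnsurveytext): split a comma-separated
# string of stopwords into a clean character vector.
split_stopwords <- function(x) {
  words <- unlist(strsplit(x, ",", fixed = TRUE))
  words <- tolower(trimws(words))  # trim spaces; lowercase (assumption, see above)
  unique(words[nzchar(words)])     # drop empty entries and duplicates
}

manualList2 <- "to, the, I"
manual_vec <- split_stopwords(manualList2)
print(manual_vec)

# Extend the list with a few more words, keeping entries unique
extended <- union(manual_vec, c("and", "of"))
print(extended)
```

Keeping the list as a vector makes it easy to combine several sources of stopwords with union() before handing the result to manual_list.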

The remainder of the package works the same way regardless of the language of the survey responses.