build_features.Rd
Builds the feature matrix from a vector of texts.
build_features(x, term_count_min = 1, mdl = NULL, parallel = TRUE, quiet = FALSE)
Argument | Description |
---|---|
x | a vector of text |
term_count_min | a number passed to |
mdl | a list of existing model data (containing the vectorizer, the tfidf, and the lsa object); defaults to NULL, in which case the model is rebuilt |
parallel | TRUE/FALSE, whether the task should be executed in parallel; defaults to TRUE |
quiet | TRUE/FALSE, whether the function should remain silent; defaults to FALSE |
A list of two elements: model_matrix, a dgCMatrix that contains the features (columns) for each text (row), and mdl, a list of the model objects that can be passed as mdl in a later call.
    text <- c(
      "This is a first text that describes something",
      "A second Text That USES A LOT of CAPITALS",
      "Lastly MANY!!!! (like, really a lot!) punctuations!!!"
    )
    build_features(text)
    #> Calculating Features...
    #> Create DTM...
    #> Finished in 5.12 seconds
    #> $model_matrix
    #> 3 x 21 sparse Matrix of class "dgCMatrix"
    #>
    #>
    #> 1 45  1 0.02222222 . . .  . 8 . . 1 1 1 . . . . . . . .
    #> 2 41 19 0.46341463 . . .  . 9 . . . . . 1 1 1 . . . . .
    #> 3 53  5 0.09433962 . 8 . 11 7 . . . . . . . . 1 1 1 1 1
    #>
    #> $mdl
    #> $mdl$vectorizer
    #> function (iterator, grow_dtm, skip_grams_window_context, window_size,
    #>     weights)
    #> {
    #>     vocab_corpus_ptr = cpp_vocabulary_corpus_create(vocabulary$term,
    #>         attr(vocabulary, "ngram")[[1]], attr(vocabulary, "ngram")[[2]],
    #>         attr(vocabulary, "stopwords"), attr(vocabulary, "sep_ngram"))
    #>     setattr(vocab_corpus_ptr, "ids", character(0))
    #>     setattr(vocab_corpus_ptr, "class", "VocabCorpus")
    #>     corpus_insert(vocab_corpus_ptr, iterator, grow_dtm, skip_grams_window_context,
    #>         window_size, weights)
    #> }
    #> <environment: 0x000000000652caf0>
    #>
    #>

    # a second example
    train <- c("Banking is finance", "flowers are not houses",
               "finance is power", "houses are build")
    test <- c("finance is greed", "flowers belong in the garbage",
              "houses are build")

    a1 <- build_features(test)
    #> Calculating Features...
    #> Create DTM...
    #> Finished in 3.38 seconds

    a12 <- build_features(test, mdl = a1$mdl)
    #> Calculating Features...
    #> Create DTM...
    #> Finished in 2.99 seconds

    a2 <- build_features(train, mdl = a1$mdl)
    #> Calculating Features...
    #> Create DTM...
    #> Finished in 3.07 seconds

    a2$model_matrix %>% as.matrix()
    #>   length ncap   ncap_len nsen nexcl nquest npunct nword nsymb nsmile greed
    #> 1     18    1 0.05555556    0     0      0      0     3     0      0     0
    #> 2     22    0 0.00000000    0     0      0      0     4     0      0     0
    #> 3     16    0 0.00000000    0     0      0      0     3     0      0     0
    #> 4     16    0 0.00000000    0     0      0      0     3     0      0     0
    #>   financ belong garbag flower build hous
    #> 1      1      0      0      0     0    0
    #> 2      0      0      0      1     0    1
    #> 3      1      0      0      0     0    0
    #> 4      0      0      0      0     1    1
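
The first columns of the feature matrix (length, ncap, ncap_len) appear to be simple surface statistics of each text: the number of characters, the number of capital letters, and their ratio. The base-R sketch below recomputes these three values for the first example. The helper surface_features() is hypothetical and not part of the package; how build_features() derives these columns internally is an assumption inferred from the printed output.

```r
# Hypothetical helper, NOT part of the package: recomputes three surface
# features using base R only, assuming they are defined as
#   length   = number of characters
#   ncap     = number of upper-case letters
#   ncap_len = ncap / length
surface_features <- function(x) {
  len <- nchar(x)
  # Count upper-case letters by deleting every other character
  ncap <- nchar(gsub("[^[:upper:]]", "", x))
  data.frame(length = len, ncap = ncap, ncap_len = ncap / len)
}

text <- c(
  "This is a first text that describes something",
  "A second Text That USES A LOT of CAPITALS",
  "Lastly MANY!!!! (like, really a lot!) punctuations!!!"
)
surface_features(text)
```

For these three texts the sketch gives lengths of 45, 41, and 53 and capital counts of 1, 19, and 5, which agree with the first three columns printed in the first example.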