Sentiment Classification of Bank Clients’ Reviews Written in the Polish Language

Title: Sentiment Classification of Bank Clients’ Reviews Written in the Polish Language
Analiza sentymentu na podstawie polskojęzycznych recenzji klientów banku
Authors: Idczak, Adam Piotr
Subjects: sentiment analysis
opinion mining
document classification
text classification
text mining
logistic regression
naive Bayes classifier
Publication date: 2021-06-30
Publisher: Uniwersytet Łódzki. Wydawnictwo Uniwersytetu Łódzkiego
Language: English
Rights: CC BY: Creative Commons Attribution 4.0
Source: Acta Universitatis Lodziensis. Folia Oeconomica; 2021, 2, 353; 43-56
0208-6018
2353-7663
Content provider: Biblioteka Nauki
Type: Article


It is estimated that approximately 80% of all data gathered by companies are text documents. This article is devoted to one of the most common problems in text mining, namely text classification in sentiment analysis, which focuses on determining the sentiment of a document. The lack of a defined structure in text makes this problem more challenging and has led to the development of various techniques for determining the sentiment of a document. This paper presents a comparative analysis of two sentiment classification methods: a naive Bayes classifier and logistic regression. The analysed texts are written in Polish and come from banks. The classification was conducted using a bag‑of‑n‑grams approach, in which a text document is represented as a set of terms, each consisting of n words. The results show that logistic regression performed better.
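The bag‑of‑n‑grams representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the whitespace tokenisation, the function names `word_ngrams` and `bag_of_ngrams`, the choice of n = 1 and n = 2, and the example review are all assumptions made for illustration. The sketch also keeps term counts, whereas the abstract speaks only of a set of terms.

```python
from collections import Counter


def word_ngrams(text: str, n: int) -> list[str]:
    """Return the word n-grams of `text` as space-joined strings."""
    # Assumption: lower-casing and whitespace splitting stand in for real tokenisation.
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text: str, n_values=(1, 2)) -> Counter:
    """Count every n-gram of the requested sizes; word order beyond n is discarded."""
    bag = Counter()
    for n in n_values:
        bag.update(word_ngrams(text, n))
    return bag


# Made-up example review, not taken from the paper's data.
review = "bardzo dobra obsługa klienta"
bag = bag_of_ngrams(review)
print(sorted(bag))
```

With n = 2 each term is a pair of adjacent words, so a phrase such as "dobra obsługa" becomes a single feature instead of two unrelated words.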

It is estimated that about 80% of all data collected and stored in companies' information systems takes the form of text documents. The article is devoted to one of the basic problems of text mining, namely text classification in sentiment analysis, understood as the study of the tone of a text. The lack of a defined structure in text documents is an obstacle to this task. This state of affairs has forced the development of many different techniques for determining document sentiment. The article presents a comparative analysis of two sentiment analysis methods: the naive Bayes classifier and logistic regression. The analysed texts are written in Polish, come from banks and are of a marketing nature. The classification was carried out using the bag‑of‑n‑grams approach, in which a text document is expressed as subsequences consisting of a given number n of words. The results showed that logistic regression performed better.
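One of the two compared methods, the naive Bayes classifier, can be sketched over n-gram features as follows. This is a hedged illustration only: the abstract does not say which naive Bayes variant was used, so the multinomial form with add-one smoothing is an assumption, as are the helper names `ngrams`, `train_naive_bayes` and `predict`, the labels `pos` and `neg`, and the four made-up training reviews.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_values=(1, 2)):
    """Word unigrams and bigrams of `text` (whitespace tokenisation is an assumption)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in n_values
            for i in range(len(tokens) - n + 1)]


def train_naive_bayes(docs, labels):
    """Collect per-class document counts, per-class term counts and the vocabulary."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        term_counts[label].update(ngrams(text))
    vocab = {term for counts in term_counts.values() for term in counts}
    return class_docs, term_counts, vocab


def predict(model, text):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total_terms = sum(term_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for term in ngrams(text):
            if term in vocab:  # terms never seen in training are skipped
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((term_counts[label][term] + 1)
                                  / (total_terms + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up toy reviews and labels, not the paper's data.
docs = ["bardzo dobra obsługa", "miła i szybka obsługa",
        "fatalna obsługa klienta", "bardzo długi czas oczekiwania"]
labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(docs, labels)
print(predict(model, "szybka obsługa"))
```

Logistic regression would consume the same n-gram counts as input features but learn one weight per term instead of estimating per-class term probabilities.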
