MURAL - Maynooth University Research Archive Library



    Machine Learning Techniques for the Detection of Inappropriate Erotic Content in Text


    Parnell, Andrew, González-Castro, Víctor, Alaiz-Rodríguez, Rocío and Barrientos, Gonzalo Molpeceres (2020) Machine Learning Techniques for the Detection of Inappropriate Erotic Content in Text. International Journal of Computational Intelligence Systems, 13 (1). p. 591. ISSN 1875-6883


    Abstract

    Nowadays, children access the Internet on a regular basis. Just like the real world, the Internet has many unsafe locations where children may be exposed to inappropriate content in the form of obscene, aggressive, erotic or rude comments. In this work, we address the problem of detecting erotic/sexual content in text documents using Natural Language Processing (NLP) techniques. Following a Machine Learning approach, we assessed twelve models resulting from the combination of three text encoders (Bag of Words, Term Frequency-Inverse Document Frequency (TF-IDF) and Word2vec) with four classifiers (Support Vector Machines (SVMs), Logistic Regression, k-Nearest Neighbours and Random Forests). We evaluated these alternatives on a newly created dataset extracted from public data on the Reddit website. The best performance was achieved by combining the TF-IDF text encoder with a linear-kernel SVM classifier, which reached an accuracy of 0.97 and an F-score of 0.96 (precision 0.96, recall 0.95). This study demonstrates that it is possible to detect erotic content in text documents and therefore to develop filters for minors or filters tailored to users' preferences.
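
The experimental design in the abstract (three encoders crossed with four classifiers) and the TF-IDF encoding can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the authors' code: the function names `model_grid` and `tfidf`, the toy corpus, and the particular weighting (tf = count / document length, idf = log(N / df)) are assumptions; the abstract does not state which TF-IDF variant or tokeniser was used.

```python
import math
from collections import Counter
from itertools import product

# Encoder and classifier names as listed in the abstract.
ENCODERS = ["Bag of Words", "TF-IDF", "Word2vec"]
CLASSIFIERS = ["SVM", "Logistic Regression", "k-Nearest Neighbours", "Random Forest"]


def model_grid():
    """Return every (encoder, classifier) pairing: 3 x 4 = 12 models."""
    return list(product(ENCODERS, CLASSIFIERS))


def tfidf(docs):
    """Encode tokenised documents as sparse {term: weight} dicts.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Other variants (smoothing, normalisation) are equally common.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    # Hypothetical toy corpus, whitespace-tokenised.
    corpus = [
        "the weather is nice today".split(),
        "the match was great today".split(),
        "the recipe needs more salt".split(),
    ]
    print(len(model_grid()))
    for vec in tfidf(corpus):
        print(vec)
```

Under this weighting, a term that occurs in every document (here "the") gets a weight of zero, while terms confined to one document receive the largest weights; those weighted vectors are what a classifier such as a linear SVM would then be trained on.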
    Item Type: Article
    Additional Information: Cite as: Hernández, A., Martin-Puertas, C., Moffa-Sánchez, P., Moreno-Chamarro, E., Ortega, P., Blockley, S., Cobb, K.M., Comas-Bru, L., Giralt, S., Goosse, H., Luterbacher, J., Martrat, B., Muscheler, R., Parnell, A., Pla-Rabes, S., Sjolte, J., Scaife, A.A., Swingedouw, D., Wise, E. & Xu, G. 2020, "Modes of climate variability: Synthesis and review of proxy-based reconstructions through the Holocene", Earth-science reviews, vol. 209, pp. 103286. Copyright: This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).
    Keywords: Inappropriate content; Machine learning; Text classification; Natural language processing; Text encoders
    Academic Unit: Faculty of Science and Engineering > Mathematics and Statistics
    Faculty of Science and Engineering > Research Institutes > Hamilton Institute
    Faculty of Social Sciences > Research Institutes > Irish Climate Analysis and Research Units, ICARUS
    Item ID: 16230
    Identification Number: 10.2991/ijcis.d.200519.003
    Depositing User: Andrew Parnell
    Date Deposited: 05 Jul 2022 14:20
    Journal or Publication Title: International Journal of Computational Intelligence Systems
    Publisher: Atlantis Press
    Refereed: Yes
    URI: https://mu.eprints-hosting.org/id/eprint/16230
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
