Turbocharged: automating quality analysis in trust & safety
Author
Alexandra Bianca Tîrnăcop
Corresponding Author
Affiliation: Bucharest University of Economic Studies, Bucharest, Romania
How to Cite
Tîrnăcop, A. B. (2025). Turbocharged: automating quality analysis in trust & safety. CACTUS - Journal of Tourism Business, Management and Economics, 32 (1). doi.org/10.24818/CTS/7/2025/2.11
© 2025 The Author(s);
Licensed under CC BY-NC 4.0
Abstract
Trust and Safety (T&S) is a key framework for online platforms, aiming to protect users from harms such as misinformation, harassment, and exploitation while also supporting free expression. Although policies, AI tools, and cross-platform collaboration (e.g., GIFCT, StopNCII.org) enhance moderation, significant challenges remain. This study uses a demo dataset of 15 social media posts, reviewed by 9 moderators and checked by a single quality analyst; each ticket was rated by three reviewers so that inter-rater agreement could be assessed. The model achieved a precision, recall, and F1 score of 70.37% each, with an overall accuracy of 64.44%. Automation improves efficiency but requires bias mitigation, transparency, and human intervention for challenging content. However, outsourcing and underinvestment in moderators raise ethical concerns, as human reviewers face psychological risks without adequate support. To address these issues, this paper proposes a decision matrix for use both in machine-learning training and in the training of moderators and quality analysts.
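The four evaluation metrics reported above all derive from a binary confusion matrix (violating vs. non-violating decisions). As a minimal sketch, the snippet below uses hypothetical counts (TP=19, FP=8, FN=8, TN=10 over 45 rated items) that are not taken from the paper; they are assumed only because they are one set of counts consistent with the reported figures:

```python
# Illustrative confusion-matrix metrics for a binary moderation decision.
# The counts are hypothetical (chosen to be consistent with the abstract's
# reported precision/recall/F1 of 70.37% and accuracy of 64.44%).
TP, FP, FN, TN = 19, 8, 8, 10  # assumed counts, not from the paper

precision = TP / (TP + FP)                      # of flagged items, how many were truly violating
recall = TP / (TP + FN)                         # of violating items, how many were flagged
f1 = 2 * precision * recall / (precision + recall)
accuracy = (TP + TN) / (TP + FP + FN + TN)      # correct decisions over all 45 items

print(f"precision = {precision:.2%}")  # 70.37%
print(f"recall    = {recall:.2%}")     # 70.37%
print(f"F1        = {f1:.2%}")         # 70.37%
print(f"accuracy  = {accuracy:.2%}")   # 64.44%
```

Note that when TP + FP equals TP + FN (false positives and false negatives balance), precision, recall, and F1 coincide, which matches the single 70.37% figure reported in the abstract.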
Keywords
JEL Classification
References
Ahmed, A., & Khan, M. N. (2024). AI and content moderation: Legal and ethical approaches to protecting free speech and privacy [Manuscript]. ResearchGate. https://www.researchgate.net/publication/383661951_AI_and_Content_Moderation_Legal_and_Ethical_Approaches_to_Protecting_Free_Speech_and_Privacy
Business & Human Rights Resource Centre. (2021). Santa Clara Principles present standards for tech platforms to provide transparency and accountability in content moderation. https://www.business-humanrights.org/en/latest-news/the-santa-clara-principles-on-transparency-and-accountability-in-content-moderation/
Cyberhaven. (2015). What are false positives? https://www.cyberhaven.com/infosec-essentials/what-are-false-positives
Digital Trust & Safety Partnership. (2024a). Trust & safety best practices framework [PDF]. https://dtspartnership.org/wp-content/uploads/2021/04/DTSP_Best_Practices.pdf
Digital Trust & Safety Partnership. (2024b). Best practices for AI and automation in trust & safety [PDF]. https://dtspartnership.org/wp-content/uploads/2024/09/DTSP_Best-Practices-for-AI-Automation-in-Trust-Safety.pdf
Eissfeldt, J., & Mukherjee, S. (2023). Evaluating the forces shaping the trust & safety industry. Tech Policy Press. https://www.techpolicy.press/evaluating-the-forces-shaping-the-trust-safety-industry/
Global Internet Forum to Counter Terrorism. (2022). HSDB taxonomy – For publication (Dec 2022) [PDF]. https://gifct.org/wp-content/uploads/2022/12/HSDB-Taxonomy-FOR-PUBLICATION-Dec-2022-1.pdf
Global Internet Forum to Counter Terrorism. (2024). GIFCT’s hash-sharing database. https://gifct.org/hsdb/
Google for Developers. (2025). Classification: Accuracy, recall, precision, and related metrics. https://developers.google.com/machine-learning/crashcourse/classification/accuracy-precision-recall
Habibi, M., Hovy, D., & Schwartz, C. (2025). The content moderator’s dilemma: Removal of toxic content and distortions to online discourse (arXiv:2412.16114). arXiv. https://doi.org/10.48550/arXiv.2412.16114
Horatio. (2025). What is content moderation? Pros, cons, and best practices. https://www.hirehoratio.com/blog/what-is-content-moderation
Institute for Human Rights and Business. (2025). Content moderation is a new factory floor of exploitation – Labour protections must catch up. https://www.ihrb.org/latest/content-moderation-is-a-new-factory-floor-of-exploitation-labour-protections-must-catch-up
INTERPOL. (2024a). Crimes against children. https://www.interpol.int/en/Crimes/Crimes-against-children
INTERPOL. (2024b). International child sexual exploitation database. https://www.interpol.int/en/Crimes/Crimes-against-children/International-Child-Sexual-Exploitation-database
Juba, B., & Le, H. S. (2019). Precision-recall versus accuracy and the role of large data sets. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 4039–4048. https://doi.org/10.1609/aaai.v33i01.33014039
Listen Data. (2024). How to calculate confusion matrix in Excel. https://www.listendata.com/2024/06/confusion-matrix-in-excel.html
Microsoft. (2025). AND function. https://support.microsoft.com/en-us/office/and-function-5f19b2e8-e1df-4408-897a-ce285a19e9d9
Mollas, I., Chrysopoulou, Z., Karlos, S., & Tsoumakas, G. (2021). ETHOS: An online hate speech detection dataset (arXiv:2006.08328). arXiv. https://arxiv.org/pdf/2006.08328
Oversight Board. (2025). Content moderation in a new era for AI and automation. https://www.oversightboard.com/news/content-moderation-in-a-new-era-for-ai-and-automation/
Reelmind. (2025). Ametures gone wild: AI content moderation challenges. https://reelmind.ai/blog/ametures-gone-wild-ai-content-moderation-challenges
Ricknell, E. (2020). Freedom of expression and alternatives for internet governance: Prospects and pitfalls. Media and Communication, 8(4), 110–120. https://doi.org/10.17645/mac.v8i4.3299
Santa Clara Principles. (2021a). SCP 2.0 toolkit for companies. https://santaclaraprinciples.org/toolkit-companies/
Santa Clara Principles. (2021b). Santa Clara Principles 2.0 open consultation report. https://santaclaraprinciples.org/open-consultation/
Shulruff, T. (2024). Trust and safety work: Internal governance of technology risks and harms. Journal of Integrated Global STEM, 1(2), 95–105. https://doi.org/10.1515/jigs-2024-0003
Shweta, R. C., Bajpai, R. C., & Chaturvedi, H. K. (2015). Evaluation of inter-rater agreement and inter-rater reliability for observational data: An overview of concepts and methods. Journal of the Indian Academy of Applied Psychology, 41(3), 20–27.
Siapera, E. (2021). AI content moderation, racism and (de)coloniality. International Journal of Bullying Prevention, 4, 55–65. https://doi.org/10.1007/s42380-021-00105-7
StopNCII.org. (2025). How StopNCII.org works. https://stopncii.org/chi-siamo/
Tremau. (2025). Content moderation: Key practices & challenges. https://tremau.com/resources/content-moderation-key-practices-challenges/
TSPA. (2025). Content moderation quality assurance. https://www.tspa.org/curriculum/ts-fundamentals/content-moderation-and-operations/content-moderation-quality-assurance/
Vargas Penagos, E. (2025). Platforms on the hook? EU and human rights requirements for human involvement in content moderation. Cambridge Forum on AI: Law and Governance, 1, e23. https://doi.org/10.1017/cfl.2025.3
Walker, A. R. (2025, April 11). Legal Defense Fund exits Meta civil rights advisory group over DEI changes. The Guardian. https://www.theguardian.com/technology/2025/apr/11/meta-ldf-dei-policy
Weigl, L., & Bodó, B. (2025). Trust and safety in the age of AI – The economics and practice of the platform-based discourse apparatus (Amsterdam Law School Legal Studies & Institute for Information Law Research Paper No. 2025-1). SSRN. https://doi.org/10.2139/ssrn.5116478
Woods, J. (2022). Bias in AI program: Showing businesses how to reduce bias and mitigate risk. Vector Institute. https://vectorinstitute.ai/bias-in-ai-program-showing-businesses-how-to-reduce-bias-and-mitigate-risk/
Zeng, J., & Kaye, D. B. V. (2022). From content moderation to visibility moderation: A case study of platform governance on TikTok. Policy & Internet, 14, 79–95. https://doi.org/10.1002/poi3.287