Turbocharged: automating quality analysis in trust & safety

Author

Alexandra Bianca Tîrnăcop

Corresponding Author

Affiliation: Bucharest University of Economic Studies, Bucharest, Romania

Email: tirnacopalexandra21@stud.ase.ro

https://orcid.org/0009-0004-2945-0643

Published: December 15, 2025

How to Cite

Tîrnăcop, A. B. (2025). Turbocharged: automating quality analysis in trust & safety. CACTUS - Journal of Tourism Business, Management and Economics, 32(1). https://doi.org/10.24818/CTS/7/2025/2.11


© 2025 The Author(s);

Licensed under CC BY-NC 4.0

Abstract

Trust and Safety (T&S) is a key framework for online platforms, aiming to protect users from harms such as misinformation, harassment, and exploitation while also supporting free expression. Although policies, AI tools, and cross-platform collaboration (e.g., GIFCT, StopNCII.org) enhance moderation, significant challenges remain. This study uses a demo dataset of 15 social media posts, reviewed by 9 moderators and checked by a single analyst; each ticket was reviewed by three raters to ensure agreement. The model achieved precision, recall, and F1 scores of 70.37%, with an overall accuracy of 64.44%. Automation improves efficiency but requires bias mitigation, transparency, and human intervention to handle challenging content. At the same time, outsourcing and underinvestment in moderation raise ethical concerns, as human reviewers face psychological risks without adequate support. To address these issues, this paper proposes a decision matrix for use both in machine-learning training and in the training of moderators and quality analysts.
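Since the abstract reports precision, recall, F1, and accuracy, the sketch below shows how these metrics follow from a binary confusion matrix. The TP/FP/FN/TN counts are hypothetical: they are one combination that is arithmetically consistent with the reported percentages over 15 posts × 3 raters = 45 judgments, not the study's actual tallies.

```python
# Hypothetical confusion-matrix counts (NOT the study's data):
# one combination consistent with the reported 70.37% / 64.44% figures.
tp, fp, fn, tn = 19, 8, 8, 10  # 45 judgments in total

precision = tp / (tp + fp)                        # 19 / 27
recall = tp / (tp + fn)                           # 19 / 27
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + fn + tn)        # 29 / 45

print(f"precision={precision:.2%} recall={recall:.2%} "
      f"f1={f1:.2%} accuracy={accuracy:.2%}")
# precision=70.37% recall=70.37% f1=70.37% accuracy=64.44%
```

Note that whenever FP = FN, precision, recall, and F1 coincide exactly, which is why a single value of 70.37% can describe all three metrics at once.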

Keywords

artificial intelligence; key performance indicators; machine learning

JEL Classification

M11, O22

References

Ahmed, A., & Khan, M. N. (2024). AI and content moderation: Legal and ethical approaches to protecting free speech and privacy [Manuscript]. ResearchGate. https://www.researchgate.net/publication/383661951_AI_and_Content_Moderation_Legal_and_Ethical_Approaches_to_Protecting_Free_Speech_and_Privacy

Business & Human Rights Resource Centre. (2021). Santa Clara Principles present standards for tech platforms to provide transparency and accountability in content moderation. https://www.business-humanrights.org/en/latest-news/the-santa-clara-principles-on-transparency-and-accountability-in-content-moderation/

Cyberhaven. (2015). What are false positives? https://www.cyberhaven.com/infosec-essentials/what-are-false-positives

Digital Trust & Safety Partnership. (2024a). Trust & safety best practices framework [PDF]. https://dtspartnership.org/wp-content/uploads/2021/04/DTSP_Best_Practices.pdf

Digital Trust & Safety Partnership. (2024b). Best practices for AI and automation in trust & safety [PDF]. https://dtspartnership.org/wp-content/uploads/2024/09/DTSP_Best-Practices-for-AI-Automation-in-Trust-Safety.pdf

Eissfeldt, J., & Mukherjee, S. (2023). Evaluating the forces shaping the trust & safety industry. Tech Policy Press. https://www.techpolicy.press/evaluating-the-forces-shaping-the-trust-safety-industry/

Global Internet Forum to Counter Terrorism. (2022). HSDB taxonomy – For publication (Dec 2022) [PDF]. https://gifct.org/wp-content/uploads/2022/12/HSDB-Taxonomy-FOR-PUBLICATION-Dec-2022-1.pdf

Global Internet Forum to Counter Terrorism. (2024). GIFCT’s hash-sharing database. https://gifct.org/hsdb/

Google for Developers. (2025). Classification: Accuracy, recall, precision, and related metrics. https://developers.google.com/machine-learning/crash-course/classification/accuracy-precision-recall

Habibi, M., Hovy, D., & Schwartz, C. (2025). The content moderator’s dilemma: Removal of toxic content and distortions to online discourse (arXiv:2412.16114). arXiv. https://doi.org/10.48550/arXiv.2412.16114

Horatio. (2025). What is content moderation? Pros, cons, and best practices. https://www.hirehoratio.com/blog/what-is-content-moderation

Institute for Human Rights and Business. (2025). Content moderation is a new factory floor of exploitation – Labour protections must catch up. https://www.ihrb.org/latest/content-moderation-is-a-new-factory-floor-of-exploitation-labour-protections-must-catch-up

INTERPOL. (2024a). Crimes against children. https://www.interpol.int/en/Crimes/Crimes-against-children

INTERPOL. (2024b). International child sexual exploitation database. https://www.interpol.int/en/Crimes/Crimes-against-children/International-Child-Sexual-Exploitation-database

Juba, B., & Le, H. S. (2019). Precision-recall versus accuracy and the role of large data sets. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 4039–4048. https://doi.org/10.1609/aaai.v33i01.33014039

Listen Data. (2024). How to calculate confusion matrix in Excel. https://www.listendata.com/2024/06/confusion-matrix-in-excel.html

Microsoft. (2025). AND function. https://support.microsoft.com/en-us/office/and-function-5f19b2e8-e1df-4408-897a-ce285a19e9d9

Mollas, I., Chrysopoulou, Z., Karlos, S., & Tsoumakas, G. (2021). Ethos: An online hate speech detection dataset (arXiv:2006.08328). arXiv. https://arxiv.org/pdf/2006.08328

Oversight Board. (2025). Content moderation in a new era for AI and automation. https://www.oversightboard.com/news/content-moderation-in-a-new-era-for-ai-and-automation/

Reelmind. (2025). Ametures gone wild: AI content moderation challenges. https://reelmind.ai/blog/ametures-gone-wild-ai-content-moderation-challenges

Ricknell, E. (2020). Freedom of expression and alternatives for internet governance: Prospects and pitfalls. Media and Communication, 8(4), 110–120. https://doi.org/10.17645/mac.v8i4.3299

Santa Clara Principles. (2021a). SCP 2.0 toolkit for companies. https://santaclaraprinciples.org/toolkit-companies/

Santa Clara Principles. (2021b). Santa Clara Principles 2.0 open consultation report. https://santaclaraprinciples.org/open-consultation/

Shulruff, T. (2024). Trust and safety work: Internal governance of technology risks and harms. Journal of Integrated Global STEM, 1(2), 95–105. https://doi.org/10.1515/jigs-2024-0003

Shweta, R. C., Bajpai, R. C., & Chaturvedi, H. K. (2015). Evaluation of inter-rater agreement and inter-rater reliability for observational data: An overview of concepts and methods. Journal of the Indian Academy of Applied Psychology, 41(3), 20–27.

Siapera, E. (2021). AI content moderation, racism and (de)coloniality. International Journal of Bullying Prevention, 4, 55–65. https://doi.org/10.1007/s42380-021-00105-7

StopNCII.org. (2025). How StopNCII.org works. https://stopncii.org/chi-siamo/

Tremau. (2025). Content moderation: Key practices & challenges. https://tremau.com/resources/content-moderation-key-practices-challenges/

TSPA. (2025). Content moderation quality assurance. https://www.tspa.org/curriculum/ts-fundamentals/content-moderation-and-operations/content-moderation-quality-assurance/

Vargas Penagos, E. (2025). Platforms on the hook? EU and human rights requirements for human involvement in content moderation. Cambridge Forum on AI: Law and Governance, 1, e23. https://doi.org/10.1017/cfl.2025.3

Walker, A. R. (2025, April 11). Legal Defense Fund exits Meta civil rights advisory group over DEI changes. The Guardian. https://www.theguardian.com/technology/2025/apr/11/meta-ldf-dei-policy

Weigl, L., & Bodó, B. (2025). Trust and safety in the age of AI – The economics and practice of the platform-based discourse apparatus (Amsterdam Law School Legal Studies & Institute for Information Law Research Paper No. 2025-1). SSRN. https://doi.org/10.2139/ssrn.5116478

Woods, J. (2022). Bias in AI program: Showing businesses how to reduce bias and mitigate risk. Vector Institute. https://vectorinstitute.ai/bias-in-ai-program-showing-businesses-how-to-reduce-bias-and-mitigate-risk/

Zeng, J., & Kaye, D. B. V. (2022). From content moderation to visibility moderation: A case study of platform governance on TikTok. Policy & Internet, 14, 79–95. https://doi.org/10.1002/poi3.287