Monitoring hate speech and divisive gendered language, including through the use of AI

This activity outlines the establishment of technological systems to assess the online information environment to understand the use of hate speech and other divisive gendered language. These systems rely upon various forms of artificial intelligence (AI) to discover, categorize and quantify concerning content.

ACTIVITY

DESCRIPTION

In the contemporary political landscape, the rise of social media and digital communication has led to increased concerns regarding the spread of hate speech and divisive gendered language, particularly during election periods. Such toxic discourse threatens to undermine democratic processes, exacerbate social divisions and endanger the safety and dignity of individuals, especially those from marginalized groups. Considerable efforts have been made to increase the participation of women in public life, and there are credible concerns that hateful online attacks will undermine this work.

Elections are pivotal moments for any society, and ensuring a fair and respectful dialogue is crucial. Hence, building robust systems to monitor and mitigate hate speech and gendered divisiveness becomes not just a technical challenge but also a societal imperative.

Hate speech, including speech targeting people on the basis of race, ethnicity, gender, religion or sexual orientation, can distort public opinion, incite violence and intimidate voters. Similarly, divisive gendered language perpetuates stereotypes and biases, marginalizing voices and discouraging the political participation of women and gender minorities. The unchecked proliferation of such harmful language online can skew election outcomes and destabilize communities.

This activity therefore looks towards the development of systems to monitor and address these issues effectively, which is essential to upholding the integrity of elections and protecting democratic values.

The role of AI in monitoring hate speech

Artificial intelligence offers powerful tools to tackle the complex challenge of monitoring hate speech and gendered language. AI-driven systems can process vast amounts of data in real time, identify patterns and detect harmful content that might elude human moderators – vital when working with large bodies of data such as social media posts. Machine learning algorithms, natural language processing (NLP) and sentiment analysis are key technologies in this endeavor. By training AI models on diverse datasets, these systems can learn to recognize nuanced forms of hate speech and gendered language across different contexts and platforms. Additionally, AI can be used to predict potential outbreaks of harmful speech, enabling preemptive actions to mitigate its impact.
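
As a simple illustration of how such classification might look in practice, the sketch below scores a handful of posts with a pretrained text-classification model using the Hugging Face transformers library. The model identifier is a hypothetical placeholder; a real deployment would select or fine-tune a model for the local languages and for gendered hate speech specifically.

```python
# Minimal sketch: scoring posts with a pretrained text-classification model.
# The model identifier below is a hypothetical placeholder; a real deployment
# would use a model trained or fine-tuned for the local language(s) and for
# hate speech and gendered abuse specifically.
from transformers import pipeline

classifier = pipeline("text-classification", model="example-org/hate-speech-detector")

posts = [
    "Looking forward to voting on Saturday!",
    "An example of an abusive post targeting a female candidate...",
]

for post, result in zip(posts, classifier(posts)):
    # Each result is a dict such as {"label": "toxic", "score": 0.97};
    # the label set depends on the chosen model.
    print(f"{result['label']:<12} {result['score']:.2f}  {post}")
```

How flagged content is then handled, for example the score threshold used or whether items go straight to a dashboard or to human reviewers first, is a design decision rather than a property of the model.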

Effective monitoring systems

Building an effective AI-based monitoring system can be a time-consuming and resource-intensive exercise. While a bespoke system may be a worthwhile undertaking, it may be simpler to use an existing, properly maintained technological solution and tailor it to the country context.

Some key steps to building out an AI system are:

  1. Data collection and annotation: Gathering a large, representative dataset of speech that includes both harmful and benign content. This dataset may need to be annotated so that the AI model can learn to distinguish between the nuanced and local variations of hate speech and divisive language.
  2. Model training and validation: Using advanced machine learning techniques to train the models with the dataset. These models are validated to ensure their accuracy and reliability in real-world scenarios.
  3. Data integration: The system needs access to live data so that current content can be analysed. While some platforms provide data access via APIs, this has become increasingly complicated and such access has various limitations. Private companies also sell API data feeds, although these come with additional costs. Web scrapers can also be used to collect data, but these impose additional technical demands and have their own limitations. (A minimal sketch covering steps 3 to 5 follows this list.)
  4. Workflows: Depending on the system’s logic and goals, workflows are likely required within the software, for example for humans to review and classify content or to direct concerning issues to the appropriate channels.
  5. Visualization and reporting: The outputs of the analysis need to be accessible and understandable. For this, dashboards are often required to provide a way to interpret the findings.
  6. Continuous learning and adaptation: Forms of hate speech and divisive language evolve over time. Thus, monitoring systems must be continuously updated with new data and feedback to adapt to emerging trends and linguistic innovations.
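
To make steps 3 to 5 more concrete, the sketch below pulls recent posts from a data feed, scores them with a placeholder function and writes the results to a CSV file that a dashboard or a human-review workflow could consume. The feed URL, token, response shape and keyword list are all illustrative assumptions, not a real platform API or a production-ready model.

```python
# Minimal sketch of steps 3 to 5: fetch posts from a data feed, score them and
# write the results to a file that a dashboard or review workflow can use.
# The feed URL, token, response shape and keyword list are illustrative
# placeholders, not a real platform API or a trained model.
import csv
import requests

FEED_URL = "https://example.org/api/v1/posts"   # hypothetical data feed
API_TOKEN = "replace-with-real-credentials"

ABUSIVE_TERMS = {"slur1", "slur2"}  # stand-in for a trained model or curated lexicon


def score_post(text: str) -> float:
    """Placeholder scoring function: fraction of words matching the lexicon."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word in ABUSIVE_TERMS for word in words) / len(words)


def run_once() -> None:
    response = requests.get(
        FEED_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        params={"since": "2024-01-01T00:00:00Z"},
        timeout=30,
    )
    response.raise_for_status()
    posts = response.json()["posts"]  # assumed response shape

    with open("scored_posts.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["post_id", "score", "needs_human_review"])
        for post in posts:
            score = score_post(post["text"])
            # Route higher-scoring items into a human review queue (step 4);
            # the CSV output feeds a dashboard for visualization (step 5).
            writer.writerow([post["id"], f"{score:.2f}", score >= 0.5])


if __name__ == "__main__":
    run_once()
```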


IMPLEMENTATION CONSIDERATIONS

1. What are important considerations prior to initiating the activity?

Before initiating the development of AI-based monitoring systems for hate speech during elections, several critical factors must be addressed:

  • Goals: The intended goals of the activity are key to determining if it is the correct intervention. Specifically, this activity is suited to providing greater understanding of the information landscape. Activities that are more focused on regulatory monitoring may use some aspects of this activity but are also more strongly related to fact-checking exercises.
  • Legal and ethical frameworks: Ensure compliance with legal standards and ethical guidelines regarding electoral campaigns, data privacy, freedom of expression and human rights.
  • Stakeholder engagement: Involve key stakeholders, including government bodies, civil society organizations and technology experts, from the outset to ensure diverse perspectives and buy-in. Considering what impact the undertaking is supposed to have will help to determine the best partnerships to establish.
  • Resource allocation: Secure adequate funding and technical resources for data collection, model development and ongoing maintenance. Careful budgeting is required if the scope of development is to be significant.
  • Risk assessment: Conduct a thorough risk assessment to identify potential challenges, such as false positives/negatives, bias in AI models and public backlash.

2. Who is best placed to implement the activity?

There is a wide range of potential implementers; however, the intended impact should guide the final decision. Because the activity provides broad insights rather than attempting to identify every case of unlawful behaviour, it is better suited to building understanding and supporting advocacy than to serving as a regulatory tool. Accordingly, it may be of particular value to civil society organizations or international organizations.

3. How to ensure context specificity and sensitivity?

To ensure the monitoring system is context-specific and sensitive, some considerations should be taken into account:

  • Localized datasets: The activity’s success will rest in no small part upon the appropriateness of the training data. Collect and annotate data that reflect the linguistic, cultural and social nuances of the target population (see the annotation sketch after this list).
  • Adaptive algorithms: Develop AI models that can be continuously updated with new data and feedback to remain relevant and effective in different contexts.
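
As one way of thinking about localized annotation, the sketch below shows a possible structure for an annotated record. The field names and label values are illustrative assumptions, not a standard schema.

```python
# Minimal sketch of a localized annotation record; field names and label
# values are illustrative assumptions, not a standard schema.
from dataclasses import dataclass

@dataclass
class AnnotatedPost:
    text: str           # original post, kept in the local language
    language: str       # ISO 639-1 code, e.g. "sw" for Swahili
    label: str          # e.g. "hate_speech", "gendered_abuse", "benign"
    target_group: str   # who the content targets, e.g. "women candidates"
    annotator_id: str   # supports inter-annotator agreement checks

example = AnnotatedPost(
    text="<abusive post in the local language>",
    language="sw",
    label="gendered_abuse",
    target_group="women candidates",
    annotator_id="annotator-07",
)
```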

4. How to involve youth?

Engaging youth in the development and implementation process is crucial for both innovation and relevance:

  • Youth advisory panels: Establish panels of young people to provide insights and feedback on the system’s development and implementation.
  • Hackathons and competitions: Organize events that encourage young developers and data scientists to contribute to building and improving the monitoring system.
  • Educational outreach: Partner with educational institutions to involve students in relevant projects and research, fostering a sense of ownership and responsibility.

5. How to ensure gender sensitivity/inclusive programming?

  • Gender-disaggregated data: Collect and analyse data separately for different genders to understand specific issues and patterns (see the sketch after this list).
  • Inclusive datasets: Ensure datasets and their curators include diverse voices, particularly those of women and gender minorities.
  • Consultation with gender experts: Regularly consult with gender experts and organizations to ensure the system addresses emerging gender-specific issues.
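
As a simple illustration of gender-disaggregated analysis, the sketch below summarizes scored posts by the gender of the person targeted. The column names and values are illustrative assumptions about the scored dataset.

```python
# Minimal sketch of gender-disaggregated analysis; column names and values
# are illustrative assumptions about the scored dataset.
import pandas as pd

scored_posts = pd.DataFrame({
    "target_gender": ["woman", "man", "woman", "woman", "man"],
    "toxicity_score": [0.91, 0.20, 0.75, 0.88, 0.15],
})

# Compare the volume and severity of abusive content by the gender of the target.
summary = scored_posts.groupby("target_gender")["toxicity_score"].agg(["count", "mean"])
print(summary)
```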

6. How to communicate about these activities?

  • Reports: Regularly publish reports on the system’s findings and identified trends.
  • Public awareness campaigns: Use various media channels to inform the public about the findings of the activity.

7. How to coordinate with other actors/Which other stakeholders to involve?

An Electoral Management Body’s mandate is typically limited, so coordination with other actors is important:

  • Law enforcement: Where there are content items or users of significant concern, the information could be passed to appropriate agencies to address identified threats and ensure public safety.
  • Social media platforms: Collaborate with and advocate to major social media companies to monitor and manage harmful content more effectively.
  • International organizations: Engage with bodies like the United Nations and the European Union for best practices and support.
  • Civil society and advocacy groups: Partner with these groups for grassroots-level insights and community-based interventions.

8. How to ensure sustainability?

  • Build for sustainability: Make design and technology choices that fit the longer-term funding streams; for example, consider the potential for future software development, vendor lock-in, subscriptions or hosting costs.
  • Ongoing funding: Secure continuous funding from government sources, international grants and private-sector partnerships.
  • Capacity-building: Train local experts and stakeholders to maintain and update the system.
  • Scalable infrastructure: Develop scalable and modular systems that can adapt to increasing data volumes and evolving threats.

COST CENTRES

  • Data acquisition and annotation: Costs for collecting training datasets and manually labeling data, including hiring annotators or using crowdsourcing platforms.
  • Technology and infrastructure: Expenses for high-performance servers, cloud computing services, storage solutions and software licenses for development tools.
  • Development and maintenance: Costs for data scientists, machine learning engineers and ongoing costs for updating and retraining AI models.
  • Data feeds: Costs for acquiring and maintaining access to online data for analysis.
  • Public engagement and communication: Budget for public awareness campaigns, community engagement, transparency reports and handling public feedback.
  • Monitoring and evaluation: Expenses for tools and services to monitor system performance, conduct impact assessments and make necessary adjustments.

LIMITATIONS AND CHALLENGES

  • Outcomes: The importance of using data to inform understanding and allow for evidence-based interventions is undeniable. However, the direct impact of a dashboard alone is limited, and it should be combined with other actions, such as communication or advocacy.
  • Data scarcity: There may be limited availability of high-quality, annotated datasets specific to gendered hate speech, which can lead to inaccurate AI models. This may be particularly challenging based on the specific country and language.
  • Bias: There are strong concerns about AI systems’ potential to create or magnify harmful biases against women stakeholders. Safeguards against this are required; the owners of the system also need to consider how potential biases are measured and weighted in relation to the intended outcomes.
  • Contextual understanding: AI systems struggle with interpreting context, such as sarcasm and nuanced language, leading to potential misclassifications. Users of harmful language are often aware of these weaknesses and exploit them, for example through coded language or deliberate misspellings, specifically to evade detection and platform monitoring.
  • Privacy and legal issues: Balancing effective monitoring with user privacy and avoiding infringements on freedom of speech presents significant ethical and legal challenges. There are also data processing and cybersecurity considerations when systems start to store troves of personal data.
  • Technical limitations: Real-time processing and scalability require substantial computational resources and efficient algorithms, which are difficult to implement, maintain and fund.
  • Continuous adaptation: Hate speech evolves rapidly, necessitating ongoing updates and retraining of AI systems to maintain their effectiveness. This creates an ongoing burden for the implementer.
  • Public trust and acceptance: Gaining trust and acceptance from the public and stakeholders is challenging, particularly concerning privacy, bias and potential misuse of data.
  • Resource constraints: Developing and maintaining these systems requires significant financial resources and technical expertise, which may be limited, especially in non-profit or public-sector contexts.

RESOURCES

EXAMPLES

IMPLEMENTATION PROCESS

COUNTRY DEPLOYMENTS

ADDITIONAL INFORMATION

