When people call out the AI

First algorithmic bias bounty challenge

Many people think we should only start worrying about AI’s harmful “intentions” once a Terminator moment arrives.

Illustration borrowed from the movies The Terminator (left) and The Social Dilemma (right)

In reality, our society is already merging with AI in a far less discernible fashion. Humans currently have an alarmingly unbalanced relationship with AI, which shows itself in recommendation algorithms, search engines, route planners, medical outcome predictors and so on, owned by companies such as Facebook, Google, Twitter, Spotify, Apple, Amazon, Microsoft and a tremendous number of AI startups. We feed these algorithms our behavioural data (clicks, view times, time spent watching a post or video, likes, scroll speed and so on), and they shape our tastes in return. This is a surreptitious feedback loop we suddenly find ourselves in, and too few of us are aware of it.
I touched on what we as users can do about it individually in a previous post. Now, let’s look at the responsibility of those who hold some power over these algorithms.

Some academic research has addressed AI Fairness issues, such as gender and racial bias in facial recognition, Google image search and language modelling. Treating this as a solely academic issue is, however, beating around the bush, since the AI algorithms which interact with billions of people on a daily basis are developed and operated by tech companies.

Company market incentives are built into the aims of these algorithms in the form of keeping our attention on their platforms. This can have various side effects beyond the obvious addictiveness, such as the indications that YouTube’s recommendation system drives people towards ever more radical content. There are already several startups that use AI to predict beauty scores for photos of faces, based on data which overrepresent white or Chinese faces. They then recommend beauty products and plastic surgery to fix the “flaws”. A study showed that after hearing the AI-predicted score, people gave ratings closer to the algorithmically generated result than they had before. Recommendation algorithms are changing our preferences, which can be continuously narrowed this way.
(Here, I want to point out that you are also reading this on a commercial social media platform, called Medium, with its own business model.)

In the past few years, Responsible AI teams have emerged at all of the mentioned big companies. I assume one reason for this is that regulators have also shown signs of awareness of issues around AI Risks and Fairness. Although it is still in its infancy, this could become a form of checks and balances that incentivises these companies to deal with the potential harms their technology entails, even if not without bumps.

Twitter’s first algorithmic bias bounty challenge

Last year, users identified racial bias in Twitter’s image cropping algorithm for thumbnail generation. The company reacted quickly and announced a new thumbnail cropping method which gives users more visibility and control over what their images will look like.

This spring, Twitter formed a promising team working on Responsible AI, called META (which stands for ML Ethics, Transparency and Accountability).

On the 30th of July, they announced the first algorithmic bias bounty challenge, with the mission to show potential harms such an image cropping algorithm may introduce.
They shared their code and paper on the mentioned fiasco. People from around the world could test Twitter’s model with all sorts of adversarial examples that might reveal more harmful side effects.

I also participated in the challenge, which was great fun. Although I didn’t win an award, I learned a lot from others’ work. I decided to share my summary of the winning submissions presented at the DEF CON AI Village.

Each highlighted a different type of harm:

How to become more salient?

The 1st place went to a project which successfully tapped into the feedback-loop nature of social media and the human psyche, something both the judges and I are quite excited about. They generated faces and modified them to increase the cropping algorithm’s internal score, and showed that making a face appear slimmer, younger, or of lighter or warmer skin colour, for example, makes it score higher. This is a beautiful showcase of how AI can exacerbate problems with beauty standards, ageism and racism.
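To make the idea concrete, below is a minimal, hypothetical sketch of such a score-maximising search. Both `saliency_score` (a simple brightness proxy) and `perturb` (random brightness and colour tweaks) are stand-ins of my own; the winning entry instead edited StyleGAN-generated faces and evaluated them against Twitter’s released cropping model.

```python
# Minimal sketch of a score-maximising search over simple image edits.
import numpy as np
from PIL import Image, ImageEnhance

def saliency_score(img: Image.Image) -> float:
    # Placeholder proxy: mean brightness. Replace with a call to the real
    # saliency / cropping model to reproduce the actual experiment.
    return float(np.asarray(img.convert("L")).mean())

def perturb(img: Image.Image, rng: np.random.Generator) -> Image.Image:
    # Small random brightness/colour tweak, standing in for an edit along a
    # semantic direction (slimmer, younger, lighter or warmer skin, ...).
    out = ImageEnhance.Brightness(img).enhance(1 + rng.uniform(-0.05, 0.05))
    out = ImageEnhance.Color(out).enhance(1 + rng.uniform(-0.05, 0.05))
    return out

def hill_climb(img: Image.Image, steps: int = 50, seed: int = 0) -> Image.Image:
    rng = np.random.default_rng(seed)
    best, best_score = img, saliency_score(img)
    for _ in range(steps):
        candidate = perturb(best, rng)
        score = saliency_score(candidate)
        if score > best_score:          # keep only edits that raise the score
            best, best_score = candidate, score
    return best
```

The point is only the loop structure: keep any edit that raises the model’s score, and the image gradually drifts towards whatever the model happens to find “salient”.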

Ageism, ableism and racism in group photos

The winner of the 2nd prize analysed group photos and found bias against individuals with grey hair, dark skin and disabilities. They also used modified photos, colouring some people’s hair white to demonstrate the effect.

Focus point on the original image

Focus point after turning the middle girl’s hair white.

“Gazing at the mother tongue” on bilingual memes

The 3rd winning project showed linguistic bias using bilingual memes. They found that the algorithm favours Latin script over Arabic, which can indicate harms in terms of linguistic diversity and representation online.

Veterans, religions, spoilers

Three further projects received honorable mentions.

The first one showed bias against camouflage outfits. This may count as working *as expected*; however, the authors wanted to highlight that “working as expected” does not always mean working well for people, underscoring that saliency isn’t always a good proxy for relevance.

The second project showed that image cropping has an overwhelming bias against headpieces associated with certain religious communities.

The last one showed that the algorithm violates the social contract of not sharing spoilers by consistently highlighting the last frame of comics.

Emojis and hacking borders

The prize for the most innovative submission went to work showing bias towards light skin tone emojis over darker ones.

The creators of the most generalizable submission inserted a small border into the image, nearly invisible to the human eye but enough to make the machine crop to what’s inside the box. This is a completely different approach from the previous ones, as it shows a type of bias in the algorithm that doesn’t stem from human bias in the data. Instead, it points out the significant differences between human and machine vision and thinking. This alerts developers to think about elusive, machine-specific weaknesses as well.
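As a rough illustration of the trick, here is a short sketch under my own assumptions: the `offset` of a few intensity levels, the box coordinates and the commented-out usage are all illustrative, and the real submission targeted Twitter’s released cropping model rather than any placeholder shown here.

```python
from PIL import Image, ImageDraw

def add_faint_border(img: Image.Image, box: tuple, offset: int = 3) -> Image.Image:
    # Draw a rectangle whose colour differs from the local background by only
    # a few intensity levels: hard for a human to notice, but potentially
    # enough to pull a machine saliency model's focus inside the box.
    out = img.convert("RGB")                     # convert() returns a copy
    r, g, b = out.getpixel((box[0], box[1]))     # colour near the box corner
    faint = (min(r + offset, 255), min(g + offset, 255), min(b + offset, 255))
    ImageDraw.Draw(out).rectangle(box, outline=faint, width=2)
    return out

# Hypothetical usage: re-run the cropping / saliency model on `modified`
# and compare its focus point with that of the original image.
# img = Image.open("photo.jpg")
# modified = add_faint_border(img, box=(50, 50, 300, 300))
```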

The Rich, the Poor and the Attractive

Finally, let me share my findings on global class bias, based on income (GitHub code).

I learned about the week-long challenge three days in, but was lucky enough to get hold of some very interesting data provided by the creators of Dollar Street.
I tested the algorithm on images of cheap versus expensive objects, using the income labels of Dollar Street as a proxy and pairing a cheap and an expensive example side by side (a sketch of this setup follows the example images below). I found that the algorithm is biased towards cheap rooms and spaces, and towards expensive objects. This has the potential to reinforce stereotypes about how people live in different social classes and could amplify higher-income-class narratives.

Left: cheap, Right: expensive washing detergents.

Left: cheap, Right: expensive spices.

Left: cheap, Right: expensive “wall inside”.
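For anyone who wants to try a similar setup, here is a minimal sketch of the pairing experiment under simplifying assumptions: `crop_center` is a placeholder (it just returns the image centre) standing in for the focus point predicted by Twitter’s released cropping model, and the file paths and tallying scheme are illustrative only.

```python
from PIL import Image

def crop_center(img: Image.Image) -> tuple:
    # Placeholder focus point: the image centre. In the real experiment this
    # would be the maximum-saliency point predicted by the cropping model.
    return (img.width // 2, img.height // 2)

def side_by_side(left: Image.Image, right: Image.Image):
    # Scale both photos to the same height, paste them onto one canvas and
    # return the canvas together with the x-coordinate where the halves meet.
    h = min(left.height, right.height)
    left = left.resize((max(1, int(left.width * h / left.height)), h))
    right = right.resize((max(1, int(right.width * h / right.height)), h))
    canvas = Image.new("RGB", (left.width + right.width, h))
    canvas.paste(left, (0, 0))
    canvas.paste(right, (left.width, 0))
    return canvas, left.width

def tally(pairs):
    # pairs: iterable of (cheap_image_path, expensive_image_path) tuples
    counts = {"cheap": 0, "expensive": 0}
    for cheap_path, expensive_path in pairs:
        combined, split = side_by_side(Image.open(cheap_path),
                                       Image.open(expensive_path))
        x, _ = crop_center(combined)
        counts["cheap" if x < split else "expensive"] += 1
    return counts
```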

Where to go from here?

The format of this bounty challenge opened a new way for everyday AI users to interact with this piece of AI technology and call out its biases.

During the conference, the Twitter META judges highlighted the crucial aspect of the diversity of ideas. They reported confronting their own biases as they learned about ideas submitted by groups and individuals from five continents. I think this may be one of the most important take-home messages of this event.

As of today, I see the desirable direction of the relationship between Human and AI the following way:

  1. Regulations are probably needed to balance the market incentives of tech companies.

  2. As a result, companies are hopefully going to further invest in their Responsible AI research teams.

  3. AI models should be tested not only with the standard software tests run by company developers and testers.
    There should be a requirement (perhaps in the form of a regulation) for independent Responsibility and Transparency testing.

  4. Responsibility and Transparency testing should also be carried out by independent bodies, which would incentivise within-company testing too.

  5. The diversity of testing bodies should be a high priority.

  6. Education and communication about ethics, social issues and technology should be encouraged within society, to make individual users more aware of their everyday interactions with AI algorithms.

  7. Ethics has always been both a part of philosophy and a practical topic when it comes to law and policy making. Even in the latter case, change took decades or generations. In the era of AI, many questions, such as “What is beauty?”, “What is truth?”, “Whose narrative should prevail?”, have suddenly become urgent practical questions, which evolve by the minute. Therefore, I believe that today a societal conversation about basic ethical questions has unprecedented relevance.

Challenges such as the one discussed here are a good start, in my opinion. I would be glad to see more of the kind from other companies too.
But I also think that this is only the beginning of building up structures and institutions from scratch to handle the fast-paced evolution of our merging with AI.


Please comment and like under the original post on Medium.
