Description of Algorithmic Audit

Olay Skin Advisor

September 16th, 2021

This report describes ORCAA’s algorithmic audit of Olay’s Skin Advisor system. We summarize the audit process, recap key findings, and describe the steps Olay is taking to address these findings.

About ORCAA

O’Neil Risk Consulting and Algorithmic Auditing (ORCAA) is a consultancy that helps organizations identify and manage risks arising from the use of predictive models, AI, and related technologies. Our algorithmic audits ask what it means for a specific model/AI to succeed and how it could fail, with a focus on fairness and bias issues. We engage stakeholders directly to elicit their concerns about the model/AI, quantify those concerns through data analysis, and suggest remediation steps. We have completed audits in industries including hiring, insurance, and housing/hospitality, as well as with municipal and public agencies. ORCAA is led by Cathy O’Neil, author of Weapons of Math Destruction, and Jacob Appel, coauthor of More Than Good Intentions and Failing in the Field.

In this audit we were proud to partner with Joy Buolamwini, the founder of Algorithmic Justice League (AJL), a nonprofit organization that combines art and research to illuminate the social implications and harms of artificial intelligence. She contributed recommendations to the internal and external audit reports. ORCAA’s audit team included AJL members Cathy O’Neil and Meredith Broussard, author of Artificial Unintelligence.

Recap of the audit

In April and May of 2021, ORCAA performed an algorithmic audit of Olay’s Skin Advisor system. Skin Advisor analyzes pictures of faces to provide personalized product recommendations, but it does not perform facial verification or facial identification. It does not identify individuals based on their face, nor does it match faces to a database. The audit was commissioned in connection with Olay’s Decode The Bias campaign. It aimed to evaluate concerns around fairness and bias, and to identify any aspects of Skin Advisor that could have unintended consequences or be misleading to consumers. The goal was to evaluate how well Skin Advisor works for users, taking into consideration skin tone, age, or other characteristics.

The algorithmic audit can be seen as an extended conversation that ORCAA helped to facilitate and document. The conversation included five parts:

  1. Preparation // Defining the use case, reviewing background documentation provided by Olay.
  2. Stakeholder discussions // Interviewing stakeholders of the algorithm to elicit their concerns. Essentially this means asking, “How could this fail, and what would that mean for you?” Stakeholders included teams within Olay (product, data- and lab-science, IT, marketing, communications); a diverse group of Skin Advisor users; and Joy Buolamwini to provide an algorithmic justice lens. During this step we also learned the history of Skin Advisor’s development from Olay stakeholders.
  3. The Ethical Matrix // Mapping stakeholders and their concerns onto a grid, which ORCAA and Olay reviewed to prioritize the most pressing concerns. The key question was, “Which concerns, if realized, would be an existential threat to the system working, or a ‘dealbreaker’ for some stakeholder?”
  4. Validation through data analysis // At ORCAA’s request Olay conducted a review of historical Skin Advisor data to see whether the algorithm performed differently for different groups of users. This analysis measured differential performance directly, which helped confirm the extent to which key concerns were being realized.
  5. Developing guidance // Coming up with ways to further investigate and address the priority concerns. The result of this phase was a set of recommended next steps for Olay, corresponding to the most urgent concerns.
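
The grid in step 3 can be pictured as a small lookup table. The sketch below is illustrative only: the severity scale, the stakeholder and concern labels, and the `prioritize` helper are assumptions made for this example, not ORCAA's actual matrix or ratings.

```python
# Hypothetical severity scale; the report does not say how cells were rated.
SEVERITY = {"low": 1, "medium": 2, "dealbreaker": 3}

# Rows are stakeholders, columns are concerns. Entries are made up for illustration.
matrix = {
    "Skin Advisor users": {
        "accuracy across skin tones": "dealbreaker",
        "data consent": "medium",
    },
    "Olay product team": {
        "accuracy across skin tones": "dealbreaker",
        "data consent": "low",
    },
}

def prioritize(matrix):
    """Rank concerns by the worst severity any stakeholder assigned, highest first."""
    worst = {}
    for concerns in matrix.values():
        for concern, level in concerns.items():
            worst[concern] = max(worst.get(concern, 0), SEVERITY[level])
    # Sort by descending severity, then alphabetically for a stable order.
    return sorted(worst, key=lambda c: (-worst[c], c))

ranked = prioritize(matrix)
print(ranked)
```

Ranking by the worst rating in each column mirrors the "dealbreaker for some stakeholder" question: a concern rises to the top if even one stakeholder group rates it as critical.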

Key Audit Findings and Action Steps

A key point of context is that typical users see Skin Advisor as a fun and mostly low-stakes tool -- closer to a beauty quiz than to advice from a clinician -- so the potential for serious harm to individuals is limited. That said, Skin Advisor is presented to users as scientific, and the concerns raised in the audit intersect with larger issues of equity and inclusivity in technological tools. The main findings of the audit focused on ways that Skin Advisor could become more robust in its treatment of skin tone, age, and gender. ORCAA led the audit process and found that Skin Advisor, like many algorithmic systems, could be more inclusive. Olay’s choice to audit and proactively address the issues is consistent with the goals of its Decode The Bias campaign. Below, we list each finding with Olay’s response, which indicates the extent of its commitment to driving more equity and inclusion with the Olay Skin Advisor tool.

  1. Skin Advisor’s skin age estimates are more accurate for some users than others.
    A key concern of Olay’s was that Skin Advisor should work well for all users, regardless of age, skin tone, or other characteristics. Analysis of historical data showed some performance gaps in skin age estimates:
    • Different accuracy by user age. Skin Advisor’s skin age estimates were most accurate for users aged 30-39: the median error in this group was less than 1 year, compared with less than 3 years for the 20-29 and 40-59 age groups. Estimates were least accurate for the youngest and oldest users, with median errors of ~7 years for users under 20 or over 60.
    • Better accuracy for lighter skin tones. Skin Advisor’s skin age estimates were slightly less accurate for users with darker skin tones. Though the differences are relatively small, mean error increased steadily with darkness of skin tone: ~1 year for the lightest-skinned users, 1.3 years for users with medium-dark skin, and 1.6 years for the darkest-skinned users.
    These performance gaps are likely due to the demographics and phenotypic representation of the data used to train the algorithm. If Skin Advisor was originally trained on fewer images of darker-skinned and/or older/younger women, that could explain why it is less accurate for these groups.

    Olay’s Response:

    OLAY is committed to updating the training dataset for the algorithm using consented data. The accuracy of skin age estimates should become more uniform across groups once the training data becomes more balanced on these dimensions. Olay is reviewing the existing training data to assess where more age and skin tone diversity is needed. Once the data has been expanded, the Skin Advisor algorithms will be retrained. This short-term fix is anticipated by December 2021.
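
The age-band and skin-tone comparisons in this finding amount to a disaggregated error summary. A minimal sketch follows, assuming that "error" means the absolute difference between the estimated skin age and the user's actual age (the report does not define it). The field names, the `error_by_group` helper, and the sample records are hypothetical and are not Skin Advisor data.

```python
from collections import defaultdict
from statistics import mean, median

def error_by_group(records, group_key):
    """Return {group: (count, median abs error, mean abs error)} for one grouping field."""
    errors = defaultdict(list)
    for rec in records:
        # Assumption: error = |estimated skin age - actual age|, in years.
        errors[rec[group_key]].append(abs(rec["estimated_age"] - rec["actual_age"]))
    return {group: (len(errs), median(errs), mean(errs)) for group, errs in errors.items()}

# Made-up illustrative records, NOT Skin Advisor data.
records = [
    {"age_band": "30-39", "actual_age": 34, "estimated_age": 35},
    {"age_band": "30-39", "actual_age": 38, "estimated_age": 38},
    {"age_band": "under 20", "actual_age": 18, "estimated_age": 25},
    {"age_band": "under 20", "actual_age": 19, "estimated_age": 27},
]

summary = error_by_group(records, "age_band")
for band, (n, med, avg) in summary.items():
    print(f"{band}: n={n}, median |error|={med}, mean |error|={avg}")
```

The same function can be pointed at a skin tone field instead of `age_band`. Reporting the group size alongside the error makes it easier to see whether a gap coincides with thin representation in the data.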

  2. “Best” and “improvement” zones are less personalized than intended.
    Analysis of historical Skin Advisor data showed that in practice a user’s “best zone” was always either their crow’s feet or forehead, and their “improvement zone” was always either their mouth, under eye, or cheek. Such limited variation in results suggests Skin Advisor may not be picking up on individual differences as intended. This was found across all users, not just those from certain demographic or phenotypic groups.
    Olay’s Response:

    OLAY has already begun reviewing and rebuilding this component to upgrade Skin Advisor to an even more personalized tool. Behind the scenes, this involves both data science to tune the models and consumer research to validate how the model’s outputs align with consumers’ own judgment of their best and improvement zones. This medium-term fix is anticipated by March 2022.
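
The limited variation described in this finding is the kind of pattern a simple frequency count can surface. The sketch below assumes the five zones named in the finding are the full set of possible outputs. The field names, the `zone_shares` helper, and the sample results are hypothetical and are not Skin Advisor data.

```python
from collections import Counter

# Assumption: the five zones named in the finding are the full set of outputs.
ALL_ZONES = {"forehead", "crow's feet", "under eye", "cheek", "mouth"}

def zone_shares(results, field):
    """Return {zone: share of results} for one output field."""
    counts = Counter(r[field] for r in results)
    total = sum(counts.values())
    return {zone: n / total for zone, n in counts.items()}

# Made-up illustrative outputs, NOT Skin Advisor data.
results = [
    {"best_zone": "forehead", "improvement_zone": "cheek"},
    {"best_zone": "forehead", "improvement_zone": "mouth"},
    {"best_zone": "crow's feet", "improvement_zone": "under eye"},
    {"best_zone": "forehead", "improvement_zone": "cheek"},
]

best = zone_shares(results, "best_zone")
# Zones that never appear as anyone's best zone.
never_best = ALL_ZONES - set(best)
print(best)
print(sorted(never_best))
```

A zone that never appears for any user, across a large and varied sample, is a sign that the output is driven more by the model's structure than by individual differences.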

  3. The way skin tone, race/ethnicity and nationality are handled in the lab- and data-science fields needs a broad update and more explanation.
    One part of this involves the whole dermatological field; another part concerns Olay explaining its practices.
    • Existing skin classification scales are inadequate for darker-skinned individuals. Two dominant clinical scales -- Fitzpatrick Skin Type and ITA (Individual Typology Angle) -- are oriented towards light skin. The Fitzpatrick scale was originally developed as a numerical classification of human skin color to estimate how different types of skin respond to UV light. ITA is typically divided into 6 categories ranging from light to dark, and the categories allow for subtler distinctions among lighter skin tones than among darker ones. In short, these scales were built with a focus on lighter skin. To be clear, this concern is not limited to Olay; it is an issue with dermatological science in general.
    Olay’s Response:

    There are additional opportunities to share visual understanding of Skin Tone from mobile skin images. Understanding of skin tones via machine learning techniques is an evolving research topic. Current state-of-the-art research includes problematic confusion between ethnicity, skin tone and geography when assigning Skin Tone from a picture alone by machine learning techniques. This problem gets compounded with image capture condition variation (e.g. lighting environment). OLAY has significant expertise in visual understanding of skin tone from images and can help the AI community in advancing machine understanding of skin tone from images collected for research and product development purposes. This insight can extend beyond facial images to inform ongoing work regarding machine learning and potentially dermatological classifications.

  4. Skin Advisor was designed to offer women personalized product recommendations -- not for other audiences or purposes -- yet it is being used by other groups of users and in ways beyond the intended scope.
    Performance for these other groups and tasks is generally untested.
    • Built for women, but used more widely. Training data only included images of individuals labelled as women. It did not include images of individuals self-labelled as men, non-binary, gender non-conforming, transgender, or other gender identities. Furthermore, the training data did not include images of people of any gender with visible facial hair. Consequently, inferior performance can be expected for these groups of users. Even so, some 10% of Skin Advisor users self-identify as men.
    • Not built to track a given user’s skin over time. The conditions of a given selfie (e.g., lighting, camera angle) influence Skin Advisor’s estimates, and consistency between selfies cannot be guaranteed when users take them “in the wild” under varying conditions. Therefore users cannot meaningfully compare Skin Advisor outputs from different times, e.g., to see if their skincare regimen is working. Users we interviewed said they might like to use Skin Advisor in this way.
    Olay’s Response:

    Expand Skin Advisor training data to incorporate users of any gender, including those with facial hair. This will require expanding the training data to be more inclusive with regard to gender, as well as age and skin tone. Additionally, research shows that demographic and phenotypic categories are less reliably assessed from images alone. OLAY will review Skin Advisor usage datasets, understand the skin advice needs of users of any gender, and explore the evolution of the Skin Advisor tool.

  5. Data practices are sufficient but could be even stronger.
    Skin Advisor now includes a consent prompt whereby users must grant Olay the right to use their age, race, skin tone, image, likeness and/or pictures of them for internal research purposes. The concern is whether this system of disclosure, consent, and retention-in-practice aligns with just and ethical data collection practices. For instance, some stakeholders believe people should be allowed to use Skin Advisor without granting these rights. Shifting to consented, self-identified demographic labels will also serve as a best-practice model across the beauty industry and other industries. Inspired by OLAY’s Skin Promise, a commitment to zero skin retouching in all advertising that was also applied to the Decode The Bias (DTB) campaign, DTB spokesperson Joy Buolamwini recommends the creation of a Consented Data Promise that can become an industry model for handling consumer data for commercial AI systems.
    Olay’s Response:

    Establish a Consented Data Promise. OLAY is committing to continually assessing our data practices and will hold ourselves to the highest standards for data practices. We will review our online consents regularly. OLAY continues to support strong commitments that consider the well-being of consumers and transparency in its practices. This long-term commitment will be established in 2022.

Conclusion

Olay has taken an important and public step towards algorithmic accountability. It has made a long-term commitment to continual improvement, both in its data practices and in growing the field of skin analysis, and has thereby set a standard for other companies to follow.