Programmatically Identifying Political Media (Posted on November 2nd, 2020)

With the US election coming up, the amount of political media in my face has reached an all-time high. In fact, this is a common complaint I hear from people I talk to, even those outside the US. So I thought to myself: what if there was a way to identify political media before I ever got to the content? Keep that thought in mind as you read through this post, and think about how you or someone else could use this technology. What follows was an exploration for me into the current state of open source facial and text recognition libraries, and a look at just how crazy good they are.

* Note: The content that follows isn't meant to represent any political opinion; it's simply used as sample content.

Finding Text on an Image

Oftentimes political media will just be a tweet, or a tweet and a witty reply. This type of content is perfect for optical character recognition (OCR) due to the level of contrast between the background and the text. Based on my tests, Google Cloud and Microsoft Azure rule the closed source, proprietary world. However, the focus of this project will be strictly open source projects. Trying to find open source OCR software that isn't named Tesseract is actually quite the challenge. It's been around since the late 80s/early 90s, was open sourced in 2005, and from 2006 its development was sponsored by Google.

Running the Tesseract v5 alpha on this photo:

Yields the following text:

tl #TakeBackTheSouth @ & @ Retweeted Mark Hamill @ @HamillHimself - 4h She says watching @JoeBiden feels like she's watching Mr. Rogers* as if being intelligent, thoughtful, decent & inclusive is a BAD thing. | like presidents who aren't unhinged, conspiracy-minded, white- nationalist fans. *misspells his name, of course @£ Mercedes Schlapp @ @mer... -10h Well @JoeBiden @ABCPolitics townhall feels like | am watching an episode of Mister Rodgers Neighborhood. OTE WW AOKY/ \ORERI it,

Overall not too bad. The important part is that the text with political context came through cleanly. We'll see how we can use this to our advantage later.

Recognizing Who a Person is in an Image

I thought this would be the hardest part of the project. I assumed I'd need tons of images of everyone I wanted to identify, which would severely limit the number of people I could recognize. On top of that, a lot of content doesn't have the person looking directly at the camera, and nobody would mistake the pictures for good quality. This is where the state of open source really surprised me. I originally started with Face API, but wasn't able to get it to do exactly what I wanted. From there I turned my gaze to the language of machine learning, Python. I landed on Face Recognition, which had everything I needed: great examples, Windows support, and tons of info.

If you're curious about how the Face Recognition software works at a high level I highly recommend the blog post from the creator. The concepts behind it are also how Face API and many other facial recognition libraries work.

The cool part about all this is that you can take a single training image like this:

and come out with a model that can recognize Donald Trump like this:

[Image: identifytrump]

Recognizing Text and People in Videos

Understanding context in animated content is much harder than in static images. However, the easiest way to solve a problem is to break it down into smaller, more solvable problems. The way to do this with videos and GIFs is to think of what they really are: static images played one after another at some specified rate. Thus, we can apply our static image technique to a video or animated GIF and just repeat it multiple times. The downside is that we end up doing a lot of analysis on a video even if no new people or text make their way onto the screen for quite a while. We'll look at some ways to speed this up later on. If you're not interested in the code you can skip ahead to the Results section.

Identifying if an Image is Political

To simplify things, we'll say that if an image contains Donald Trump or Joe Biden it is considered political. However, we could add any number of people to this algorithm and it would work just fine. If you want to increase the likelihood of identifying a particular person you can load multiple images of them and give them the same label in "known_face_names". A lot of the code is inspired by the examples in the Face Recognition repo; I highly recommend checking out that section. To start with our example we'll use these 3 images:

We'll recognize them using the Face Recognition library. To build encodings for each face in our known set we'll load the images locally:

import face_recognition

biden_image = face_recognition.load_image_file('biden.jpg')
trump_image = face_recognition.load_image_file('trump.jpg')
unknown_image = face_recognition.load_image_file('obamabiden.jpg')

# Our known images only have one face so just grab the first index
biden_face_encoding = face_recognition.face_encodings(biden_image)[0]
trump_face_encoding = face_recognition.face_encodings(trump_image)[0]

known_face_encodings = [
    biden_face_encoding,
    trump_face_encoding
]

known_face_names = [
    'Biden',
    'Trump'
]
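Here's a minimal sketch of the "multiple images, same label" idea mentioned above. The helper name build_known_faces, the encode callback, and any extra file names are my own assumptions for illustration; they aren't part of the Face Recognition library. It also skips images where no face was found, which avoids an IndexError from grabbing index 0 of an empty list.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def build_known_faces(
    images_by_name: Dict[str, Sequence[str]],
    encode: Callable[[str], list],
) -> Tuple[list, List[str]]:
    """Return parallel lists of encodings and labels.

    `encode` is a hypothetical callback that takes an image path and
    returns the list of face encodings found in it (possibly empty).
    """
    known_face_encodings = []
    known_face_names = []
    for name, paths in images_by_name.items():
        for path in paths:
            encodings = encode(path)
            if not encodings:
                # No face detected in this image, so skip it
                continue
            # Reference images are assumed to contain one face
            known_face_encodings.append(encodings[0])
            known_face_names.append(name)
    return known_face_encodings, known_face_names
```

With Face Recognition installed, the callback could be something like lambda path: face_recognition.face_encodings(face_recognition.load_image_file(path)), and the input could map 'Biden' to several photos of him. Every photo of the same person then contributes its own encoding under the same name.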

If you wanted to load it remotely you could use the following code:

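import io
import numpy as np
import requests
from PIL import Image

# `url` is the address of the remote image
response = requests.get(url)
img = Image.open(io.BytesIO(response.content))

# face_recognition works on RGB numpy arrays rather than PIL images
remote_image = np.array(img.convert('RGB'))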

From there we need to identify all the faces in our unknown image. Once we've identified all our faces we need to compare each one to the ones that we know about. If one looks similar enough to a known person then we'll mark the face as such. Face Recognition measures similarity as a distance: 0 is an exact match, and the library's recommended tolerance is a distance of 0.6 or less. The closer the distance is to 0, the more likely you've found a match. This is how we'll do the matching:

import numpy as np

tolerance = 0.6

# Locate the faces in our unknown image
face_locations = face_recognition.face_locations(unknown_image)
face_encodings = face_recognition.face_encodings(unknown_image, face_locations)

# Try to identify each found face based on our list of known faces
for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
    name = 'Unknown'

    # Find the closest known match to our unknown face. 0 represents an exact match
    face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
    best_match_index = np.argmin(face_distances)
    if face_distances[best_match_index] <= tolerance:
        name = known_face_names[best_match_index]
    print(name)
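To make the distance and tolerance idea a bit more concrete, here's a rough sketch of the same matching step in plain NumPy. As far as I can tell, face_distance boils down to a Euclidean distance between encodings, so that's what this uses. The function name closest_match and the tiny 3-value vectors are made up for illustration; real encodings from the library have 128 values.

```python
import numpy as np


def closest_match(known_encodings, known_names, face_encoding, tolerance=0.6):
    """Return (name, distance) for the nearest known face.

    The name is 'Unknown' when even the nearest face is farther than `tolerance`.
    """
    known = np.asarray(known_encodings, dtype=float)
    # Euclidean distance from the unknown face to every known face
    distances = np.linalg.norm(known - np.asarray(face_encoding, dtype=float), axis=1)
    best = int(np.argmin(distances))
    name = known_names[best] if distances[best] <= tolerance else 'Unknown'
    return name, float(distances[best])


# Toy "encodings" standing in for real 128-value ones
known = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
names = ['Biden', 'Trump']
print(closest_match(known, names, [0.1, 0.0, 0.0]))  # very close to the first vector
print(closest_match(known, names, [0.5, 0.5, 0.5]))  # too far from both, so 'Unknown'
```

Lowering the tolerance makes the match stricter, which trades missed detections for fewer false positives.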

The above will tell us if we've found a known match or not. We can bring in PIL to draw our fancy boxes around each face if desired. However, if we're just trying to find out whether something is political, we can simply return after finding the first face whose distance is within our tolerance, without looking at every face. To show how to draw the boxes though, our final code for facial recognition would look like this:

import face_recognition
from PIL import Image, ImageDraw
import numpy as np

biden_image = face_recognition.load_image_file('biden.jpg')
trump_image = face_recognition.load_image_file('trump.jpg')
unknown_image = face_recognition.load_image_file('obamabiden.jpg')

# Our known images only have one face so just grab the first index
biden_face_encoding = face_recognition.face_encodings(biden_image)[0]
trump_face_encoding = face_recognition.face_encodings(trump_image)[0]

known_face_encodings = [
    biden_face_encoding,
    trump_face_encoding
]

known_face_names = [
    'Biden',
    'Trump'
]

tolerance = 0.6

# Locate the faces in our unknown image
face_locations = face_recognition.face_locations(unknown_image)
face_encodings = face_recognition.face_encodings(unknown_image, face_locations)

pil_image = Image.fromarray(unknown_image)
draw = ImageDraw.Draw(pil_image)

# Try to identify each found face based on our list of known faces
for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
    name = 'Unknown'

    # Find the closest known match to our unknown face. 0 represents an exact match
    face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
    best_match_index = np.argmin(face_distances)
    if face_distances[best_match_index] <= tolerance:
        name = known_face_names[best_match_index]

    # Draw a box around the face using the Pillow module
    draw.rectangle(((left, top), (right, bottom)), outline=(0, 0, 255))

    # Draw a label with a name below the face
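    # ImageDraw.textsize() was removed in Pillow 10, so measure the label with textbbox()
    bbox_left, bbox_top, bbox_right, bbox_bottom = draw.textbbox((0, 0), name)
    text_height = bbox_bottom - bbox_top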
    draw.rectangle(((left, bottom - text_height - 10), (right, bottom)), fill=(0, 0, 255), outline=(0, 0, 255))
    draw.text((left + 6, bottom - text_height - 5), name, fill=(255, 255, 255, 255))

# Display the resulting image
pil_image.show()

The result is pretty cool. We're able to detect Biden among a large number of people and properly label everyone else as unknown.

[Image: obamabiden2]
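If all we want is a yes/no answer, the early return mentioned before the full listing might look something like the sketch below. The function name contains_known_face is my own, and it uses a plain Euclidean distance in NumPy as a stand-in for the library's distance call, so treat it as an approximation of the idea rather than the exact code I ran.

```python
import numpy as np


def contains_known_face(face_encodings, known_face_encodings, tolerance=0.6):
    """Return True as soon as any face is within `tolerance` of a known face."""
    known = np.asarray(known_face_encodings, dtype=float)
    if known.size == 0:
        return False
    for face_encoding in face_encodings:
        distances = np.linalg.norm(known - np.asarray(face_encoding, dtype=float), axis=1)
        if distances.min() <= tolerance:
            # One match is enough, so skip the remaining faces
            return True
    return False
```

On a crowded photo this can save a fair amount of comparison work when a known face shows up early in the list.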

To recognize text in our image we can use Tesseract to translate our image to text as shown earlier. To demo this we'll use the Mark Hamill tweet from earlier. The code for reading the text on an image is as follows:

from PIL import Image
import pytesseract
import re

words = ['biden', 'trump']
regex = re.compile('|'.join(words), re.IGNORECASE)

text = pytesseract.image_to_string(Image.open('bidentext.jpg'))
is_political = bool(regex.search(text))
print(is_political)

The code above uses a simple algorithm to determine if the text in an image is political or not. We're simply using Tesseract to find all the text in the image and then searching that text for the string "biden" or "trump" in a case-insensitive manner. From my experience, trying to find context in shorter phrases such as a tweet is very challenging. If I was looking at this from a news article perspective I may have recommended a different approach. The main downside of the current text match is that we'd label a string such as "Sally trumps Jimbo at table tennis" as political. We'd also match on people talking about video game streamer TrumpSC, who commonly goes by Trump. For my purposes though I am OK with the false positives.
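If those false positives did bother you, one possible tweak is to match whole words only. Here's a small sketch of that idea. The handle list is just an example I picked, and the name is_political_text is hypothetical. Handles need their own pattern because in "@JoeBiden" there is no word boundary right before "Biden".

```python
import re

# Whole words only, so "trumps" and "TrumpSC" no longer match
words = ['biden', 'trump']
word_regex = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b', re.IGNORECASE)

# Example handles to look for explicitly (an illustrative list, not exhaustive)
handles = ['@JoeBiden', '@realDonaldTrump']
handle_regex = re.compile('|'.join(map(re.escape, handles)), re.IGNORECASE)


def is_political_text(text):
    """Return True if the text mentions a listed name or handle."""
    return bool(word_regex.search(text) or handle_regex.search(text))


print(is_political_text('Sally trumps Jimbo at table tennis'))
print(is_political_text('She says watching @JoeBiden feels like Mr. Rogers'))
```

The trade-off is that stricter matching also misses things like the plural "Bidens", and someone who is only ever called "Trump" (like the streamer) will still match, so it's a judgment call which kind of error you'd rather live with.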

If we combine the above code with the facial recognition code we now have a pretty solid way to detect if a static image is political. Let's take a quick look at how we can apply the same concepts to animated images such as GIFs and videos.

Identifying if a GIF/Video is Political

Let's start with this short video clip that has both political text and a political person as defined by our previous examples. You can download it here. Loading a video is a little different than loading an image, but the principle of what we're trying to achieve is the same. To load the video we'll use OpenCV. Here's how to get a frame as an image for use with our previous code:

import cv2
import face_recognition

# cv2.VideoCapture also accepts the URL of a video stream, so remote files can be read directly
input_movie = cv2.VideoCapture("trumprally.mp4")

while True:
    ret, frame = input_movie.read()

    if not ret:
        break

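    # Convert the image from BGR color (which OpenCV uses) to RGB color (which face_recognition uses).
    # cvtColor returns a contiguous array; newer dlib builds can reject the view made by frame[:, :, ::-1]
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)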

    # Find all the faces and face encodings in the current frame of video
    face_locations = face_recognition.face_locations(rgb_frame)
    face_encodings = face_recognition.face_encodings(rgb_frame, face_locations)

From there you know how to do the rest since it is the same as a static image. Once again, you can think of a video as a series of static images played one after another at some frame rate. You may notice that videos are much slower to process than static images though. There are two simple ways to improve this. The first is to use Nvidia's CUDA library if you have an Nvidia GPU that supports it. Offloading some of the math required to detect faces to your GPU, which is likely faster at the operation, makes for significant speedups. Don't worry if you don't have a supported CUDA GPU; you can still test the code outlined in this post, as a run only takes a few seconds. The second way to speed up your computation is to not look at every frame. You can skip frames by doing something like this:

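fps = int(input_movie.get(cv2.CAP_PROP_FPS))
frame_number = 0

while True:
    ret, frame = input_movie.read()

    # Stop when the video ends, before deciding whether to skip
    if not ret:
        break

    frame_number += 1

    # Only analyze the first frame and then one frame per second of video
    if frame_number % fps != 0 and frame_number != 1:
        continue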
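To see which frames survive that modulo check, here's a tiny standalone sketch. The helper name frames_to_check is mine, and the guard against a frame rate of 0 is an extra assumption on my part, since a 0 would otherwise cause a division by zero.

```python
def frames_to_check(total_frames, fps):
    """List the 1-based frame numbers kept by the skip rule above."""
    step = max(int(fps), 1)  # guard against a reported frame rate of 0
    return [n for n in range(1, total_frames + 1) if n == 1 or n % step == 0]


print(frames_to_check(90, 30))  # [1, 30, 60, 90]
```

So a 3 second clip at 30 frames per second drops from 90 analyzed frames to 4, at the cost of possibly missing a face that's only on screen for a fraction of a second.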

A full scale implementation, complete with label boxes, might look something like this:

import face_recognition
import cv2
import pytesseract
import numpy as np

# Open the movie and get some data about it
input_movie = cv2.VideoCapture("trumprally.mp4")
length = int(input_movie.get(cv2.CAP_PROP_FRAME_COUNT))
width = int(input_movie.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(input_movie.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(input_movie.get(cv2.CAP_PROP_FPS))

# Create an output movie file
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
output_movie = cv2.VideoWriter('output.mp4', fourcc, fps, (width, height))

biden_image = face_recognition.load_image_file('biden.jpg')
trump_image = face_recognition.load_image_file('trump.jpg')

biden_face_encoding = face_recognition.face_encodings(biden_image)[0]
trump_face_encoding = face_recognition.face_encodings(trump_image)[0]

known_face_encodings = [
    biden_face_encoding,
    trump_face_encoding
]

known_face_names = [
    'Biden',
    'Trump'
]

tolerance = 0.6
frame_number = 0

while True:
    # Grab a single frame of video
    ret, frame = input_movie.read()
    frame_number += 1

    # Speed up the computation
    # if frame_number % fps != 0 and frame_number != 1:
    #     continue

    # Quit when the input video file ends
    if not ret:
        break

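    # Convert the image from BGR color (which OpenCV uses) to RGB color (which face_recognition uses).
    # cvtColor returns a contiguous array; newer dlib builds can reject the view made by frame[:, :, ::-1]
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)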

    face_locations = face_recognition.face_locations(rgb_frame)
    face_encodings = face_recognition.face_encodings(rgb_frame, face_locations)

    face_names = []
    for face_encoding in face_encodings:
        face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
        best_match_index = np.argmin(face_distances)

        # Check if video likely contains one of our political people
        name = None
        if face_distances[best_match_index] <= tolerance:
            name = known_face_names[best_match_index]

        # Always append, even for unknown faces, so names stay lined up with face_locations
        face_names.append(name)

    # Label the results
    for (top, right, bottom, left), name in zip(face_locations, face_names):
        if not name:
            continue

        # Draw a box around the face
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)

        # Draw a label with a name below the face
        cv2.rectangle(frame, (left, bottom - 25), (right, bottom), (0, 0, 255), cv2.FILLED)
        font = cv2.FONT_HERSHEY_DUPLEX
        cv2.putText(frame, name, (left + 6, bottom - 6), font, 0.5, (255, 255, 255), 1)

    # Write the resulting image to the output video file
    print("Writing frame {} / {}".format(frame_number, length))
    output_movie.write(frame)

    # Text analysis
    # text = pytesseract.image_to_string(rgb_frame)
    # print(text)

# Release both files so the output video is finalized properly
input_movie.release()
output_movie.release()
cv2.destroyAllWindows()

Our output for the above code looks like this:

For tackling the GIF case we'll use this GIF version of the previous video: https://i.ibb.co/mDv4WFd/trump.gif. We can download the GIF remotely and then loop through every frame and run our political detection code on it. To simplify things we'll just do the Tesseract bit for OCR.

import io
import requests
from PIL import Image
import pytesseract
import re

words = ['biden', 'trump']
regex = re.compile('|'.join(words), re.IGNORECASE)


def process_image(url):
    # Download the GIF into memory
    response = requests.get(url)
    img = Image.open(io.BytesIO(response.content))

    # Run OCR on each frame and stop at the first political match
    for frame in range(img.n_frames):
        img.seek(frame)
        frame_image = img.convert('RGBA')
        text = pytesseract.image_to_string(frame_image)

        if regex.search(text):
            return True

    return False

print(process_image('https://i.ibb.co/mDv4WFd/trump.gif'))

You'll notice this runs a bit slow. To speed things up, as in our video example, we can check only every 30th frame. GIFs don't have frame rates in the same way that videos do. Each GIF frame can stay on screen for a configurable amount of time that is encoded into the GIF itself. A more complex example could take this into account and try to aim for a frame every 30 seconds. I'll leave that as an exercise for you though :)
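For anyone who wants a head start on that exercise, here's a rough sketch of the frame-picking part on its own. It takes a list of per-frame display times in milliseconds and returns which frame indices to run OCR on. The name pick_frames and the default interval are my own choices. With Pillow, the durations could be collected by seeking to each frame and reading img.info.get('duration', 100), where the fallback of 100 is an assumption for frames that don't specify one.

```python
def pick_frames(durations_ms, interval_ms=1000):
    """Return indices of frames to analyze, roughly one per `interval_ms` of playback."""
    if interval_ms <= 0:
        raise ValueError('interval_ms must be positive')

    picked = []
    elapsed = 0      # playback time at the start of the current frame
    next_sample = 0  # playback time at which we want the next sample
    for index, duration in enumerate(durations_ms):
        # Treat missing or zero durations as 1 ms so every frame covers some time
        frame_end = elapsed + max(duration, 1)
        if next_sample < frame_end:
            # The sample time falls while this frame is on screen
            picked.append(index)
            # Move the sampling clock past this frame so it is only picked once
            while next_sample < frame_end:
                next_sample += interval_ms
        elapsed = frame_end
    return picked


# 25 frames shown for 100 ms each, sampled about once per second
print(pick_frames([100] * 25))  # [0, 10, 20]
```

The loop in process_image could then iterate over these indices instead of every frame. A frame that stays on screen for a long time is still picked once, since the sample time lands inside it.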

Results

Here are a few more test images and their results:

The above is Alec Baldwin and Jim Carrey impersonating Donald Trump and Joe Biden respectively in an SNL skit. Maybe Jim Carrey has a shot at the Joe Biden documentary.

[Image: obamabiden2]

In this photo we're able to detect Biden in the background of the image even though his face is slightly blurred.

[Image: biden1]

Here we're able to successfully detect Biden and label his colleagues as unknown.

The above correctly detects Trump's face and labels Obama unknown as expected, but has the slight downside of over detecting faces. A more advanced algorithm could likely do better on the depth perception and realism bit.

This video shows us being able to track Trump's face while he moves it left, right, and center.

Last but not least is a 10x speed version of a Biden townhall clip where we are able to track him as his face moves across the screen.

As you can see, the algorithm we've come up with works pretty well and is very simple to implement. You don't need to know much about programming or machine learning to implement it either. That raises the question: could there be people out there who are using simple tech like this to try to programmatically influence people's opinion of a candidate in an election?

Combined with sentiment analysis to determine if the text is positive or negative about a candidate, you can really do a lot with a project like this. Oftentimes people attribute such things to state actors, but really it could be anyone on the internet with a passing interest in an election. Additionally, plenty of social platforms out there order content by likes rather than purely chronologically. Some platforms will even hide content if it gets too many reports. If someone sees a candidate in a local race with more likes than another, they may be more inclined to vote for said candidate. While it may not be a good reason to vote for someone, a good reason isn't necessarily required to vote for someone.

I'm curious to hear your thoughts on the matter and the state of facial and character recognition. Join in on the discussion at one of the below

Cheers!

Tags: Python, Machine Learning