Audio-visual sentiment analysis for learning emotional arcs in movies and predicting audience engagement
An overview article by McKinsey on “AI in storytelling: Machines as cocreators”.
ICDM 2017 short paper: PDF
ICCV 2017 workshop paper (slightly longer and more detailed): PDF
Eric Chu and Deb Roy. "Audio-visual Sentiment Analysis for Learning Emotional Arcs in Movies". In 2017 IEEE 17th International Conference on Data Mining (ICDM). IEEE, 2017.
We provide two datasets for public use.
The Spotify dataset contains audio features such as valence, energy, and speechiness for over 600,000 songs. The full list of features and their descriptions can be found on the Spotify developer site. Links to the 30-second song samples are provided and can be retrieved using the Spotify API. These songs can also be cross-referenced with the Million Song Dataset (MSD) for further features.
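Below is a minimal sketch of loading the Spotify dataset and pulling one track's canonical audio features via the Spotify Web API (here using the spotipy client library). The CSV filename and the column names ("track_id", "valence", "energy", "speechiness") are assumptions; check the headers of the actual files before running.

```python
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Load the dataset CSV (assumed filename) and summarize a few features.
songs = pd.read_csv("spotify_songs.csv")
print(songs[["valence", "energy", "speechiness"]].describe())

# Fetch the same audio features for one track directly from the API.
sp = spotipy.Spotify(
    auth_manager=SpotifyClientCredentials(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
    )
)
features = sp.audio_features([songs.loc[0, "track_id"]])[0]
print(features["valence"], features["energy"])
```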
The Movie clip dataset contains emotional labels for ~1,000 30-second clips taken from ~175 movies. Download links for the movie clips are provided in the CSV files. Each clip is annotated by three crowdsourced workers, who answer the following questions (a sketch of aggregating these annotations follows the list):
1. How positive or negative is this video clip? (1 being most negative, 7 being most positive)
2. How confident are you in your previous answer? (1 being least confident, 10 being most confident)
3. Which emotion(s) does this video clip contain or convey? (check all that apply or none of the above)
- Options: anger, anticipation, disgust, fear, joy, sadness, surprise, trust, none of the above
4. Which of the following contributed to your decisions? (check all that apply)
- Options: audio, dialogue, visual (actions, scene, setting)
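The following is a minimal sketch of aggregating the three per-clip annotations into a confidence-weighted valence score and a majority-vote emotion set. The filename and column names ("clip_id", "valence", "confidence", "emotions") are assumptions, as is the comma-separated encoding of the emotion field; adapt them to the actual CSV schema.

```python
import pandas as pd

ann = pd.read_csv("movie_clip_annotations.csv")  # assumed filename

# Confidence-weighted mean valence per clip (valence on a 1-7 scale,
# confidence on a 1-10 scale).
valence = ann.groupby("clip_id").apply(
    lambda g: (g["valence"] * g["confidence"]).sum() / g["confidence"].sum()
)

# Majority vote over the multi-label emotion field, assumed to be a
# comma-separated string like "joy, surprise". An emotion is kept if at
# least `threshold` of the three annotators selected it.
def majority_emotions(group, threshold=2):
    votes = group["emotions"].str.split(",").explode().str.strip()
    counts = votes.value_counts()
    return counts[counts >= threshold].index.tolist()

emotions = ann.groupby("clip_id").apply(majority_emotions)

print(valence.head())
print(emotions.head())
```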
Variety: A Team of MIT Scientists Taught an AI to Get Emotional Over Movies
TechRadar: How artificial intelligence is creating new ways of storytelling
New York Times: The Remote Control, Out of Control: Why à la Carte TV Is Too Much for a Trekkie
Have a question or comment? Please contact us at echu [at] mit [dot] edu