We are a team of music lovers. Beyond music, we are particularly passionate about understanding and
visualizing how recommendation algorithms work. This project brings these two passions together: it
sits in the area of recommendation systems and algorithms. We want our users to be able to find song
recommendations and to understand what makes the songs similar, so that they can make on-the-spot
judgments and link songs creatively throughout their work.
Our tool has two primary use cases: playlist development for hobbyists or producers, and live DJing
for hobbyist or professional DJs. Producers of playlists or albums need to create a track ordering
that tells a story or creates a seamless flow. To do this, they need to understand how similar the
songs are and what drives that similarity. A DJ, by contrast, has to make these judgments on the fly.
Without an AI tool or automation, a DJ must rely on their own awareness of the available songs and on
a remembered understanding of their dynamics. With our visualization tool, a DJ would be able to link
songs together over time, make snap decisions about which song to play next, and see the trajectory
the performance has taken.
Background
The original dataset was collected from the Free Music Archive ("FMA"). FMA offers free access to openly
licensed,
original music by independent artists around the world. Our tool visualizes song tracks and
recommended songs based on the similarity of attributes such as valence, tempo, and danceability. We
used Python to clean and prepare our data. To measure the similarity between songs, we computed
pairwise Euclidean distances with Scikit-Learn, which gave us an adjacency matrix over all nodes
(songs). We then found the top 6 matches for each song and converted the data to a JSON file.
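The similarity step can be sketched roughly as follows. This is a minimal illustration and not the
project's actual code: the function name top_matches, the feature values, and the track ids are
hypothetical, and standardizing the features before measuring distance is an assumption added here
(tempo is measured in beats per minute, while valence and danceability lie between 0 and 1, so an
unscaled distance would be dominated by tempo).

```python
import json

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.preprocessing import StandardScaler


def top_matches(features, track_ids, k=6, scale=True):
    """Return the k closest tracks (by Euclidean distance) for every track."""
    data = np.asarray(features, dtype=float)
    if scale:
        # Assumption: put all attributes on a comparable scale first.
        data = StandardScaler().fit_transform(data)

    # Full pairwise distance matrix; entry [i, j] is the distance
    # between track i and track j.
    distances = euclidean_distances(data)

    # A track should never be recommended as its own match.
    np.fill_diagonal(distances, np.inf)

    k = min(k, len(track_ids) - 1)
    nearest = np.argsort(distances, axis=1)[:, :k]

    return {
        track_ids[i]: [
            {"id": track_ids[j], "distance": float(distances[i, j])}
            for j in nearest[i]
        ]
        for i in range(len(track_ids))
    }


# Made-up example: columns are valence, tempo (BPM), danceability.
features = [
    [0.90, 120.0, 0.80],
    [0.85, 122.0, 0.75],
    [0.10, 70.0, 0.20],
    [0.15, 72.0, 0.25],
]
track_ids = ["a", "b", "c", "d"]

matches = top_matches(features, track_ids, k=2)
matches_json = json.dumps(matches, indent=2)  # text ready to write to a .json file
```

With k=6 this would yield six matches per song, as described above. Whether the distances should be
computed on scaled or raw attributes is a design choice that the scale flag leaves open.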
Our dataset has several potential issues. The sample may not be representative of the wider population
of music: it contains only independent artists who are willing to upload their music for free, so
music that is not freely available is absent. There are also data quality issues: some tracks have no
genre associated with them, and multiple tracks share the same name. The CSV files also contained
some erroneous lines, which we deleted manually. Finally, the data contains outliers: six songs are
shorter than two seconds.
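Checks like these can be automated. The sketch below uses only the Python standard library to flag
the three kinds of problem rows mentioned above. The column names (track_id, title, genre, duration)
and the sample rows are hypothetical and do not reflect the real FMA file layout; the function only
reports rows and leaves the decision to keep or drop them to the reader.

```python
import csv
import io
from collections import Counter


def flag_quality_issues(rows, min_duration=2.0):
    """Group track ids by the data quality issue they exhibit."""
    # Count titles case-insensitively so "Intro" and "intro" collide.
    title_counts = Counter(row["title"].strip().lower() for row in rows)

    issues = {"missing_genre": [], "duplicate_title": [], "too_short": []}
    for row in rows:
        track_id = row["track_id"]
        if not row["genre"].strip():
            issues["missing_genre"].append(track_id)
        if title_counts[row["title"].strip().lower()] > 1:
            issues["duplicate_title"].append(track_id)
        if float(row["duration"]) < min_duration:  # duration in seconds
            issues["too_short"].append(track_id)
    return issues


# Made-up sample standing in for the real CSV file.
sample = io.StringIO(
    "track_id,title,genre,duration\n"
    "1,Intro,Rock,1.5\n"
    "2,Sunrise,,210\n"
    "3,Intro,Jazz,185\n"
)
rows = list(csv.DictReader(sample))
issues = flag_quality_issues(rows)
```

In this sample, track 2 has no genre, tracks 1 and 3 share a title, and track 1 is shorter than two
seconds.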