Goal:
Build a recommendation system for the GroupLens MovieLens dataset (explicit ratings) and for the Audioscrobbler music dataset (implicit ratings), and understand how recommendation engines work.
Technologies Used: PySpark
Description:
Two recommendation models were built using collaborative filtering: one relies on explicit user ratings and the other on implicit ones. We used the Alternating Least Squares (ALS) algorithm to build both models. Internally, ALS models user preferences from product ratings in terms of a small number of hidden (latent) factors that it learns from the data set. Details of the implementation are abstracted from the user.
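To make the idea of hidden factors more concrete, the following is a minimal sketch of alternating least squares in plain Python. It is not the Spark implementation: it assumes a single hidden factor per user and per item (rank 1) so that each least-squares step has a closed form, and the toy ratings, the function names and the parameter names are made up for illustration.

```python
def als_rank1(ratings, num_users, num_items, iterations=10, lam=0.01):
    """Fit one hidden factor per user and per item by alternating updates.

    ratings: list of (user, item, rating) tuples with 0-based ids.
    A rating is approximated by u[user] * v[item].
    """
    u = [1.0] * num_users  # user factors
    v = [1.0] * num_items  # item factors
    for _ in range(iterations):
        # Step 1: hold item factors fixed, solve for every user factor.
        num = [0.0] * num_users
        den = [lam] * num_users
        for user, item, r in ratings:
            num[user] += r * v[item]
            den[user] += v[item] ** 2
        u = [n / d for n, d in zip(num, den)]
        # Step 2: hold user factors fixed, solve for every item factor.
        num = [0.0] * num_items
        den = [lam] * num_items
        for user, item, r in ratings:
            num[item] += r * u[user]
            den[item] += u[user] ** 2
        v = [n / d for n, d in zip(num, den)]
    return u, v


def squared_error(ratings, u, v):
    """Sum of squared differences between observed and predicted ratings."""
    return sum((r - u[user] * v[item]) ** 2 for user, item, r in ratings)


# Made-up (user, item, rating) triples, for illustration only.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 1.0)]
u, v = als_rank1(ratings, num_users=3, num_items=3)
print("squared error after training:", squared_error(ratings, u, v))
```

With more than one hidden factor, each step becomes a small regularized least-squares problem per user or per item instead of a single division, but the alternation between the two sides stays the same.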
The Alternating Least Squares algorithm requires the following inputs to come up with recommendations:
a) A set of Rating objects, each with a user ID, a product ID and the rating (implicit or explicit) as attributes
b) Number of hidden factors to be considered (the rank)
c) Number of iterations to be made over the dataset while training the model
d) Number of recommendations to be made (supplied when the trained model is queried)
e) A regularization parameter λ; when using a dataset with implicit ratings to make recommendations, a λ value of 0.01 is usually considered
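The roles of the number of hidden factors and the number of recommendations can be illustrated with a small sketch. It assumes that a trained model holds one factor vector per user and per product and that a predicted score is the dot product of the two; the factor values, product IDs and the helper name `recommend_top_n` are hypothetical, and the code only mirrors conceptually what a call such as `model.recommendProducts(userID, 10)` returns.

```python
import heapq


def recommend_top_n(user_factors, item_factors, n):
    """Score every item by a dot product and keep the n highest scores."""
    scored = (
        (sum(uf * vf for uf, vf in zip(user_factors, factors)), item_id)
        for item_id, factors in item_factors.items()
    )
    return [(item_id, score) for score, item_id in heapq.nlargest(n, scored)]


# Hypothetical factor vectors with 2 hidden factors (rank = 2).
user_factors = [1.0, 0.5]
item_factors = {
    101: [4.0, 0.0],
    102: [1.0, 2.0],
    103: [3.0, 1.0],
    104: [0.0, 1.0],
}

top2 = recommend_top_n(user_factors, item_factors, 2)
print(top2)
```

Here the rank is the length of each factor vector, and the number of recommendations is only used at query time; it does not affect training.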
The models that we built using ALS are briefly explained below.
Explicit ratings:
The user ratings in the GroupLens MovieLens dataset range from 1 to 5.
Code:
from pyspark import SparkConf, SparkContext
from pyspark.mllib.recommendation import ALS, Rating


def loadMovieNames():
    """Map each movie ID to its title using the pipe-delimited u.item file."""
    movieNames = {}
    # u.item contains non-UTF-8 characters, so the encoding is set explicitly.
    with open("ml-100k/u.item", encoding="ISO-8859-1") as f:
        for line in f:
            fields = line.split('|')
            movieNames[int(fields[0])] = fields[1]
    return movieNames


conf = SparkConf().setMaster("local[*]").setAppName("MovieRecommendationsALS")
sc = SparkContext(conf=conf)
sc.setCheckpointDir('checkpoint')

print("\nLoading movie names...")
nameDict = loadMovieNames()

# The first three fields of each u.data line are user ID, movie ID and rating.
data = sc.textFile("ml-100k/u.data")
ratings = (data.map(lambda l: l.split())
           .map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))
           .cache())

# Build the recommendation model using Alternating Least Squares
print("\nTraining recommendation model...")
rank = 10
# Low numIterations to make sure the code runs on systems with a lower configuration
numIterations = 6
model = ALS.train(ratings, rank, numIterations)

userID = 7  # Any user ID
print("\nRatings for user ID " + str(userID) + ":")
userRatings = ratings.filter(lambda l: l[0] == userID)
for rating in userRatings.collect():
    print(nameDict[int(rating[1])] + ": " + str(rating[2]))

print("\nTop 10 recommendations:")
recommendations = model.recommendProducts(userID, 10)
for recommendation in recommendations:
    print(nameDict[int(recommendation[1])] + " score " + str(recommendation[2]))
Movie Recommendations Output:
Implicit ratings:
The Audioscrobbler music dataset was used to build a system that recommends music artists to users based on implicit ratings, namely the number of times each user has listened to songs by an artist.
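A play count is not a rating in the usual sense, so implicit-feedback ALS treats it differently from an explicit score. The sketch below follows the common formulation from the implicit-feedback literature, in which each count is split into a binary preference and a confidence weight of 1 + α × count. The value of α, the sample records and the function name are assumptions made for illustration; they are not taken from the dataset or from the Spark source.

```python
def to_preference_and_confidence(play_count, alpha=0.01):
    """Turn a raw play count into (preference, confidence).

    preference: 1.0 if the user played the artist at all, else 0.0.
    confidence: grows linearly with the play count.
    """
    preference = 1.0 if play_count > 0 else 0.0
    confidence = 1.0 + alpha * play_count
    return preference, confidence


# Hypothetical (user ID, artist ID, play count) records.
plays = [(1, 10, 0), (1, 11, 20), (2, 10, 300)]

weighted = [(user, artist) + to_preference_and_confidence(count)
            for user, artist, count in plays]
for row in weighted:
    print(row)
```

Under this view, an artist played 300 times and one played 20 times are both "liked", but the model is penalized more for mispredicting the former.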
Code:
from pyspark import SparkConf, SparkContext
from pyspark.mllib.recommendation import Rating, ALS

conf = SparkConf().setMaster("local").setAppName("RecommendationSystem")
sc = SparkContext(conf=conf)

# Adjacent string literals are joined, so the path contains no stray spaces.
uadatapath = ("C:/Users/Yash/Documents/Projects/"
              "Spark/RecommendationSystem/user_artist_data.txt")
rawUserArtistData = sc.textFile(uadatapath)
# Summary statistics of the play counts (third field of each line)
print(rawUserArtistData.map(lambda x: float(x.split(" ")[2])).stats())

# Keep user-artist pairs with at least 20 plays and cast the fields
# to numeric types before building Rating objects.
uaData = rawUserArtistData\
    .map(lambda x: x.split(" "))\
    .filter(lambda x: float(x[2]) >= 20)\
    .map(lambda x: Rating(int(x[0]), int(x[1]), float(x[2])))
uaData.persist()
print(uaData.take(10))

# Arguments: ratings, rank = 10, iterations = 5, lambda = 0.01
model = ALS.trainImplicit(uaData, 10, 5, 0.01)
user = 1000002  # Give the user ID
recommendations = model.recommendProducts(user, 5)
print(recommendations)

artistsPath = ("C:/Users/Yash/Documents"
               "/Projects/Spark/RecommendationSystem/artist_data.txt")
# Tab-separated artist ID and artist name; IDs are kept as strings for lookup()
artistLookup = sc.textFile(artistsPath).map(lambda x: x.split("\t"))
artistLookup.persist()

# Artists that the chosen user has played more than 50 times
userArtists = rawUserArtistData\
    .map(lambda x: x.split(" "))\
    .filter(lambda x: int(x[0]) == user and int(x[2]) > 50)\
    .map(lambda x: x[1]).collect()

print("-------------Artists that the user is interested in------------------")
for artist in userArtists:
    print(artistLookup.lookup(artist))

print("---------------------Recommendations based on the user's interest-------------------")
for rating in recommendations:
    print(artistLookup.lookup(str(rating.product)))
Artist recommendations for user id: 1000002
Conclusion:
In this report, we explained the implementation of two recommendation systems, one based on explicit user ratings and the other on implicit user ratings.
The same approach can be adapted not just for music but also for other products such as movies, airlines and rental accommodation.