Pyreclab: Recommendation lab for Python
Pyreclab is a recommendation library for training recommendation models through a friendly, easy-to-use interface while keeping memory and CPU usage low.
To achieve this, Pyreclab is exposed as a Python module that gives convenient access to its algorithms, while the library itself is written entirely in C++ to avoid the performance penalty of interpreted languages.
At this moment, the following recommendation algorithms are supported:
Rating Prediction
User Average
Item Average
Slope One
User Based KNN
Item Based KNN
Funk's SVD
Item Recommendation
Although Pyreclab can be compiled on most popular operating systems, it has been tested on Ubuntu Linux 16.04.
1.- Before starting, verify that libboost-dev is installed on your system. If it is not, install it using your favorite package manager.
$ sudo apt-get install libboost-dev
2.- Clone the source code of Pyreclab in a local directory.
$ git clone https://github.com/gasevi/pyreclab.git
3.- Build the Python module (default: Python 2.7).
$ cd pyreclab
$ cmake .
$ make
By default, the module is compiled for Python 2.7. To build it for Python 3, run the following steps instead:
$ cd pyreclab
$ cmake -DPY_MAJOR_VERSION=3 .
$ make
4.- Install Pyreclab.
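Once the module has been built and installed, a quick way to confirm that the interpreter can find it is to look it up without importing it. The sketch below is not part of Pyreclab; the helper name module_available is hypothetical, and it assumes a Python 3 build, since importlib.util.find_spec is not available on Python 2.7.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "pyreclab" is the module name used by the import examples in this README.
    if module_available("pyreclab"):
        print("pyreclab is importable")
    else:
        print("pyreclab not found; check the build and install steps")
```

If the module is reported as missing, check that it was built for the same Python major version as the interpreter running the check.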
Pyreclab provides one class for each of the recommendation algorithms currently supported. To use one of them, import it from the module:

    >>> from pyreclab import <RecAlg>

or import the entire module, as you prefer.
After that, to create an instance of any of these classes, you must provide a dataset file with the training data, which must contain the fields user_id, item_id and rating.
The following example shows the generic form for creating one of these instances.

    >>> obj = pyreclab.RecAlg( dataset = filename,
                               dlmchar = b'\t',
                               header = False,
                               usercol = 0,
                               itemcol = 1,
                               ratingcol = 2 )

Here RecAlg stands for the recommendation algorithm chosen from the previous list, and its parameters are described in the following table.

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| dataset | mandatory | N.A. | Dataset filename with fields: userid, itemid and rating |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Whether the dataset file contains a header line to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |

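To make the expected file layout concrete, the following sketch writes a small tab-delimited training file with the user, item and rating columns in positions 0, 1 and 2, matching the default parameter values. It uses only the standard library and assumes Python 3; the helper write_dataset, the file name train.tsv and the sample identifiers are hypothetical and not part of Pyreclab.

```python
import csv
import os
import tempfile

# Hypothetical sample data: (user_id, item_id, rating).
ratings = [
    (1, 10, 4.0),
    (1, 20, 3.5),
    (2, 10, 5.0),
]


def write_dataset(path, rows, delimiter="\t", header=False):
    """Write one rating per line, with columns ordered user, item, rating."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter=delimiter, lineterminator="\n")
        if header:
            # Only useful if the file is later loaded with header = True.
            writer.writerow(["user_id", "item_id", "rating"])
        writer.writerows(rows)


path = os.path.join(tempfile.mkdtemp(), "train.tsv")
write_dataset(path, ratings)

with open(path) as fh:
    first_line = fh.readline().rstrip("\n")
print(first_line.split("\t"))
```

A file written this way would be passed as the dataset argument with the defaults left untouched. A comma-separated file with a header line would instead need dlmchar and header set accordingly.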
Because the algorithms differ in nature, their train methods can take different parameters. For this reason, the methods are described separately for each class below.

    >>> prediction = obj.predict( userId, itemId )

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| userId | mandatory | N.A. | User identifier |
| itemId | mandatory | N.A. | Item identifier |

    >>> predictionList, mae, rmse = obj.test( input_file = testset,
                                              dlmchar = b'\t',
                                              header = False,
                                              usercol = 0,
                                              itemcol = 1,
                                              ratingcol = 2,
                                              output_file = 'predictions.csv' )

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Whether the testset file contains a header line to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| output_file | optional | N.A. | Output file to write predictions |

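The test method returns two error figures, mae and rmse. The source does not spell out how they are computed; assuming they follow the usual definitions of mean absolute error and root mean squared error, the sketch below shows those definitions in plain Python. The function names and sample values are illustrative only.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average of |prediction - true rating| over all test pairs."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def root_mean_squared_error(predicted, actual):
    """Square root of the average squared difference; penalizes large misses more."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical predictions and true ratings for three (user, item) pairs.
predicted = [3.0, 4.0, 5.0]
actual = [4.0, 4.0, 3.0]

print(mean_absolute_error(predicted, actual))
print(root_mean_squared_error(predicted, actual))
```

Both metrics are in rating units, and lower is better. RMSE is never smaller than MAE on the same data.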

    >>> prediction = obj.predict( userId, itemId )

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| userId | mandatory | N.A. | User identifier |
| itemId | mandatory | N.A. | Item identifier |

    >>> predictionList, mae, rmse = obj.test( input_file = testset,
                                              dlmchar = b'\t',
                                              header = False,
                                              usercol = 0,
                                              itemcol = 1,
                                              ratingcol = 2,
                                              output_file = 'predictions.csv' )

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Whether the testset file contains a header line to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| output_file | optional | N.A. | Output file to write predictions |


    >>> prediction = obj.predict( userId, itemId )

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| userId | mandatory | N.A. | User identifier |
| itemId | mandatory | N.A. | Item identifier |

    >>> predictionList, mae, rmse = obj.test( input_file = testset,
                                              dlmchar = b'\t',
                                              header = False,
                                              usercol = 0,
                                              itemcol = 1,
                                              ratingcol = 2,
                                              output_file = 'predictions.csv' )

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Whether the testset file contains a header line to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| output_file | optional | N.A. | Output file to write predictions |


| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| knn | optional | 10 | Number of nearest neighbors (K) |

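To illustrate what the knn parameter controls, here is a minimal sketch of a user-based nearest-neighbor prediction. It is not Pyreclab's implementation: the cosine similarity, the weighted average and the sample ratings are all assumptions chosen for illustration, and only the parameter name knn and its default come from the table above.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "a": {"x": 5.0, "y": 3.0, "z": 4.0},
    "b": {"x": 5.0, "y": 3.0, "w": 2.0},
    "c": {"x": 1.0, "y": 5.0, "w": 5.0},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item, knn=10):
    """Similarity-weighted mean of the ratings given to `item` by the
    `knn` users most similar to `user`; None if nobody rated the item."""
    candidates = [
        (cosine(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    neighbours = sorted(candidates, reverse=True)[:knn]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None
    return sum(sim * r for sim, r in neighbours) / total


print(predict("a", "w", knn=1))  # uses only the single most similar user
print(predict("a", "w", knn=2))  # blends both users who rated "w"
```

A small knn makes the prediction follow the closest neighbors tightly, while a larger value averages over more users and smooths the result.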

    >>> prediction = obj.predict( userId, itemId )

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| userId | mandatory | N.A. | User identifier |
| itemId | mandatory | N.A. | Item identifier |

    >>> predictionList, mae, rmse = obj.test( input_file = testset,
                                              dlmchar = b'\t',
                                              header = False,
                                              usercol = 0,
                                              itemcol = 1,
                                              ratingcol = 2,
                                              output_file = 'predictions.csv' )

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Whether the testset file contains a header line to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| output_file | optional | N.A. | Output file to write predictions |


| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| knn | optional | 10 | Number of nearest neighbors (K) |


    >>> prediction = obj.predict( userId, itemId )

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| userId | mandatory | N.A. | User identifier |
| itemId | mandatory | N.A. | Item identifier |

    >>> predictionList, mae, rmse = obj.test( input_file = testset,
                                              dlmchar = b'\t',
                                              header = False,
                                              usercol = 0,
                                              itemcol = 1,
                                              ratingcol = 2,
                                              output_file = 'predictions.csv' )

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Whether the testset file contains a header line to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| output_file | optional | N.A. | Output file to write predictions |


    >>> obj.train( factors = 1000, maxiter = 100, lr = 0.01, lamb = 0.1 )

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| factors | optional | 1000 | Number of latent factors in matrix factorization |
| maxiter | optional | 100 | Maximum number of iterations to run if convergence is not reached earlier |
| lr | optional | 0.01 | Learning rate |
| lamb | optional | 0.1 | Regularization parameter |

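The roles of factors, maxiter, lr and lamb can be seen in a bare-bones matrix factorization trained with stochastic gradient descent. The sketch below is a simplified illustration in pure Python, assuming the common formulation without bias terms; it is not Pyreclab's C++ implementation, and train_svd, squared_error and the sample ratings are hypothetical.

```python
import random


def train_svd(ratings, factors=2, maxiter=100, lr=0.01, lamb=0.1, seed=0):
    """Learn user vectors P and item vectors Q so that dot(P[u], Q[i])
    approximates each observed rating, using regularized SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(0.0, 0.1) for _ in range(factors)] for u in users}
    Q = {i: [rng.uniform(0.0, 0.1) for _ in range(factors)] for i in items}
    for _ in range(maxiter):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(factors):
                pu, qi = P[u][f], Q[i][f]
                # Step along the error gradient; lamb shrinks the factors.
                P[u][f] += lr * (err * qi - lamb * pu)
                Q[i][f] += lr * (err * pu - lamb * qi)
    return P, Q


def squared_error(ratings, P, Q):
    """Sum of squared differences between ratings and their reconstruction."""
    return sum(
        (r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))) ** 2
        for u, i, r in ratings
    )


# Hypothetical training triples: (user, item, rating).
ratings = [("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u2", "i1", 4.0), ("u2", "i2", 2.0)]

P0, Q0 = train_svd(ratings, factors=2, maxiter=0)  # initial factors only
P, Q = train_svd(ratings, factors=2, maxiter=200, lr=0.05, lamb=0.1)
print(squared_error(ratings, P0, Q0), squared_error(ratings, P, Q))
```

More factors give the model more capacity, lr sets the step size of each update, and lamb trades fit on the training data for smaller factor values. This sketch always runs maxiter passes and does not include a convergence check.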

    >>> prediction = obj.predict( userId, itemId )

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| userId | mandatory | N.A. | User identifier |
| itemId | mandatory | N.A. | Item identifier |

    >>> predictionList, mae, rmse = obj.test( input_file = testset,
                                              dlmchar = b'\t',
                                              header = False,
                                              usercol = 0,
                                              itemcol = 1,
                                              ratingcol = 2,
                                              output_file = 'predictions.csv' )

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Whether the testset file contains a header line to skip |
| usercol | optional | 0 | User column position in dataset file |
| itemcol | optional | 1 | Item column position in dataset file |
| ratingcol | optional | 2 | Rating column position in dataset file |
| output_file | optional | N.A. | Output file to write predictions |


| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| topN | optional | 10 | Top N items to recommend |


    >>> ranking = obj.recommend( userId )

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| userId | mandatory | N.A. | User identifier |

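As an illustration of how a top-N ranking can be produced for a user, the sketch below ranks items by how many ratings they received and skips items the user has already rated. The source does not describe the ranking rule behind recommend, so the popularity count, the exclusion of seen items and the sample data are assumptions made for this example.

```python
from collections import Counter

# Hypothetical training triples: (user, item, rating).
ratings = [
    ("u1", "a", 5), ("u2", "a", 4), ("u3", "a", 3),
    ("u1", "b", 4), ("u2", "b", 2),
    ("u3", "c", 5),
]


def recommend(user, ratings, topN=10):
    """Return up to topN unseen items, most frequently rated first."""
    counts = Counter(item for _, item, _ in ratings)
    seen = {item for u, item, _ in ratings if u == user}
    unseen = [item for item in counts if item not in seen]
    # Sort by descending count; break ties alphabetically for stable output.
    ranked = sorted(unseen, key=lambda item: (-counts[item], item))
    return ranked[:topN]


print(recommend("u3", ratings))
print(recommend("new_user", ratings, topN=2))
```

A user with no history receives the globally most popular items, which is why this kind of ranking is often used as a baseline.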

    >>> predictionList, mae, rmse = obj.test( input_file = testset,
                                              dlmchar = b'\t',
                                              header = False,
                                              usercol = 0,
                                              output_file = 'predictions.csv' )

| Parameter | Type | Default value | Description |
|-----------|------|---------------|-------------|
| input_file | mandatory | N.A. | Testset filename |
| dlmchar | optional | tab | Delimiter character between fields (userid, itemid, rating) |
| header | optional | False | Whether the testset file contains a header line to skip |
| usercol | optional | 0 | User column position in dataset file |
| output_file | optional | N.A. | Output file to write predictions |

Extend item recommendation methods to rating prediction algorithms.
Add ranking evaluation metrics.
Extend support to other operating systems such as Mac OS X and Windows.