MohtaMayank edited this page Oct 6, 2013 · 4 revisions

Scalable Recommender Systems in GraphChi


Team Members:

  1. Mayank Mohta (mmohta@andrew.cmu.edu)
  2. Shu Hao (shuhaoy@andrew.cmu.edu)




Abstract: Recommender systems are an active research topic in both industry and academia, and many different approaches and algorithms exist for building recommender systems for different use cases. GraphChi [1] is part of the GraphLab [2] project at CMU; it is a system for performing computation on very large graphs using just a single computer. GraphChi is under active development by Aapo Kyrola (a PhD student at CMU working with Guy Blelloch and Carlos Guestrin) and is the topic of his PhD research [3]. In this project I plan to use GraphChi to build a Java toolkit comprising different recommender system algorithms. The goal is a single system that lets users try out many of the available recommendation algorithms and determine which one is most suitable for their use case.

Method of attack: As a first step, I plan to read and understand the existing research on different recommendation algorithms. Based on this background study, I plan to implement a few of these algorithms for the graphchi-java system. Danny Bickson (a former project scientist at CMU) has implemented many of these algorithms in the GraphLab toolkit as well as for the C++ version of GraphChi [4] [5]. I believe building a similar toolkit for graphchi-java is a good tangible goal for the first half of the project. It will give us experience with the GraphChi system and contribute to graphchi-java's recommender system toolkit. Once the toolkit has a good set of algorithms implemented, the project can move on to optimizing the algorithms and making it easy to create ensembles.

Expected Results:

  1. A graphchi-java toolkit for recommender system algorithms.
  2. Possible extensions, such as optimizing for efficient creation of ensembles, supporting dynamic vertex data in the underlying GraphChi platform, and implementing other graph / machine learning algorithms in graphchi-java.

Rough Timeline: The following is the rough timeline I envision for the first half of the project. It may change depending on the progress we make in understanding the research papers and the GraphChi system.

Week 1:

  1. Read research papers and go through existing implementations of recommender system algorithms. Decide on 3 algorithms each to work on. (Mayank to work on ALS, SVD++ and Bayesian Probabilistic Matrix Factorization; Shu Hao to work on Weighted ALS, bias-SGD and Restricted Boltzmann Machines.)
  2. Work with the professor and other students in VLIS to form a team and better define the project.
  3. Write up a rough vision document.
  4. Set up a GitHub repository and grant suitable permissions. (Forked the original graphchi-java and created a repository here: https://github.com/MohtaMayank/graphchi-java )
  5. Discuss with the other team working on building a cloud-based platform for GraphChi.
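Bias-SGD, one of the algorithms chosen in Week 1, predicts a rating as the global mean plus a user bias, an item bias and the dot product of a user factor vector and an item factor vector, and fits these parameters by stochastic gradient descent over the observed ratings. The following is a minimal, self-contained sketch of that update rule using plain in-memory arrays. It does not use the graphchi-java API, and the class and method names (`BiasSgdSketch`, `train`, `update`), the initialization and the hyperparameters are hypothetical choices for illustration. In a GraphChi version, each rating would presumably be an edge between a user vertex and an item vertex, with the per-rating update running in the vertex update function.

```java
import java.util.Random;

/**
 * Hypothetical bias-SGD matrix factorization sketch (not the graphchi-java API).
 * Prediction: r_ui ~ globalMean + userBias[u] + itemBias[i] + dot(userFactors[u], itemFactors[i]).
 */
class BiasSgdSketch {
    final int numFactors;
    final double[][] userFactors, itemFactors;
    final double[] userBias, itemBias;
    double globalMean;

    BiasSgdSketch(int numUsers, int numItems, int numFactors, long seed) {
        this.numFactors = numFactors;
        userFactors = new double[numUsers][numFactors];
        itemFactors = new double[numItems][numFactors];
        userBias = new double[numUsers];
        itemBias = new double[numItems];
        // Small random factors break symmetry; biases start at zero.
        Random rnd = new Random(seed);
        for (double[] row : userFactors) for (int k = 0; k < numFactors; k++) row[k] = 0.1 * rnd.nextGaussian();
        for (double[] row : itemFactors) for (int k = 0; k < numFactors; k++) row[k] = 0.1 * rnd.nextGaussian();
    }

    double predict(int u, int i) {
        double dot = 0;
        for (int k = 0; k < numFactors; k++) dot += userFactors[u][k] * itemFactors[i][k];
        return globalMean + userBias[u] + itemBias[i] + dot;
    }

    /** One SGD step on a single observed rating (one user-item edge), with L2 regularization. */
    void update(int u, int i, double rating, double lr, double reg) {
        double err = rating - predict(u, i);
        userBias[u] += lr * (err - reg * userBias[u]);
        itemBias[i] += lr * (err - reg * itemBias[i]);
        for (int k = 0; k < numFactors; k++) {
            double pu = userFactors[u][k], qi = itemFactors[i][k]; // use pre-update values
            userFactors[u][k] += lr * (err * qi - reg * pu);
            itemFactors[i][k] += lr * (err * pu - reg * qi);
        }
    }

    /** Sets the global mean, runs several epochs over {user, item} pairs, returns training RMSE. */
    double train(int[][] edges, double[] ratings, int epochs, double lr, double reg) {
        double sum = 0;
        for (double r : ratings) sum += r;
        globalMean = sum / ratings.length;
        for (int e = 0; e < epochs; e++)
            for (int n = 0; n < ratings.length; n++)
                update(edges[n][0], edges[n][1], ratings[n], lr, reg);
        return rmse(edges, ratings);
    }

    double rmse(int[][] edges, double[] ratings) {
        double se = 0;
        for (int n = 0; n < ratings.length; n++) {
            double d = ratings[n] - predict(edges[n][0], edges[n][1]);
            se += d * d;
        }
        return Math.sqrt(se / ratings.length);
    }

    public static void main(String[] args) {
        // Toy data: 3 users x 3 items, 6 observed ratings.
        int[][] edges = {{0, 0}, {0, 1}, {1, 0}, {1, 2}, {2, 1}, {2, 2}};
        double[] ratings = {5, 3, 4, 1, 2, 4};
        BiasSgdSketch model = new BiasSgdSketch(3, 3, 2, 42L);
        System.out.printf("train RMSE = %.3f%n", model.train(edges, ratings, 200, 0.05, 0.01));
    }
}
```

Processing one rating at a time is what makes SGD a natural fit for an edge-centric system, whereas ALS instead solves a small least-squares problem per user or item vertex.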

Weeks 2-3:

  1. Get familiar with the GraphChi system.
  2. Implement the chosen algorithms in graphchi-java.
  3. Collect different datasets and test the implemented algorithms.
  4. Review the code and iterate on it to improve its quality.

Weeks 4-5:

  1. Make sure the implemented algorithms work on large datasets.
  2. Based on the experience gained from implementing the algorithms, hold a design discussion on how to unify the input data format and make it more generic. Refactor the implemented algorithms accordingly.
  3. Pick up 2-3 more algorithms. Understand how they work and start implementing them under the new design.
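One possible shape for a unified input format is to treat every observed rating as a (user, item, rating) edge record that all algorithms consume. The sketch below reads such records from a Matrix Market coordinate file; Matrix Market, and the names `MatrixMarketRatings` and `RatingEdge`, are assumptions chosen for illustration only, not the project's actual design or existing graphchi-java classes. It assumes a `coordinate real general` file with 1-based indices (rows are users, columns are items) and does not handle other Matrix Market variants.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader producing a unified list of rating edges (assumes "coordinate real general"). */
class MatrixMarketRatings {
    /** Hypothetical unified input record: one observed rating = one user -> item edge. */
    static final class RatingEdge {
        final int user;     // 0-based user id
        final int item;     // 0-based item id
        final float rating;
        RatingEdge(int user, int item, float rating) { this.user = user; this.item = item; this.rating = rating; }
    }

    int numUsers, numItems;
    final List<RatingEdge> edges = new ArrayList<>();

    static MatrixMarketRatings parse(BufferedReader in) throws IOException {
        MatrixMarketRatings result = new MatrixMarketRatings();
        long declaredNnz = -1; // -1 until the size line has been read
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("%")) continue; // banner and comment lines
            String[] tok = line.split("\\s+");
            if (declaredNnz < 0) { // first data line: "rows cols nnz"
                result.numUsers = Integer.parseInt(tok[0]);
                result.numItems = Integer.parseInt(tok[1]);
                declaredNnz = Long.parseLong(tok[2]);
                continue;
            }
            int user = Integer.parseInt(tok[0]) - 1; // Matrix Market indices are 1-based
            int item = Integer.parseInt(tok[1]) - 1;
            result.edges.add(new RatingEdge(user, item, Float.parseFloat(tok[2])));
        }
        if (result.edges.size() != declaredNnz)
            throw new IOException("expected " + declaredNnz + " entries, found " + result.edges.size());
        return result;
    }

    public static void main(String[] args) throws IOException {
        String mm = "%%MatrixMarket matrix coordinate real general\n"
                + "% toy ratings\n"
                + "3 3 4\n"
                + "1 1 5\n"
                + "1 2 3\n"
                + "2 1 4\n"
                + "3 3 2\n";
        MatrixMarketRatings r = parse(new BufferedReader(new StringReader(mm)));
        System.out.println(r.numUsers + " users, " + r.numItems + " items, " + r.edges.size() + " ratings");
    }
}
```

Keeping the record format separate from the algorithms means ALS, SVD++, bias-SGD and the others could all read the same edge list, which is the kind of refactoring the Weeks 4-5 plan describes.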

Week 6:

  1. Review progress on the graphchi-java toolkit and the project.
  2. Midterm presentation.
  3. Decide on the deliverables for the second half of the project.

The timeline for the second half of the project will be decided based on the progress made and the difficulties faced during the first half.

References:
[1] https://github.com/GraphChi
[2] http://graphlab.org/
[3] http://www.cs.cmu.edu/~akyrola/files/kyrola_thesisproposal.pdf
[4] http://graphlab.org/toolkits/collaborative-filtering/
[5] http://bickson.blogspot.com/2012/12/collaborative-filtering-with-graphchi.html
