Progress Document

Weekly Progress

Week 1 (September 9 - September 15)
1. Talk to people about the project idea. Meet with interested folks (Mayank)
2. Set up GitHub repository.
3. Set up development environment.
  (No Meeting)
Week 2 (September 16 - September 22)
1. Background work related to GraphChi and GraphLab. See sample code.
2. Background work related to recommendation systems.
3. Read research papers (SVD++, ALS, LibFM)
4. Divide algorithms to be implemented by Shu Hao and Mayank in the next 2 weeks.
5. Meeting Notes (September 22)
  - Set up GitHub repository.
  - Decide which algorithms we are going to implement. Update each other about the background work done so far.
  - Meet with the other potential sub-team (Yuchen, Chun Chen) about a project on a web-based framework for creating ensembles and performing evaluation using cloud infrastructure.
Week 3 (September 23 - September 30)
1. Start implementation of SVD++ using stochastic gradient descent (Mayank)
2. Start implementation of Bayesian Probabilistic Matrix Factorization (Mayank)
3. Collect test datasets. (Mayank)
4. Write vision document.
5. Meeting Notes (September 30)
  - Instructions for setting up the project in Eclipse/Maven.
  - Discussed whether there is a better way to measure metrics, and how to evaluate the performance of our system.
  - Discuss bugs relating to MCMC method.
Week 4 (October 1 - October 7)
1. Resolve certain bugs with SVD++ and BPMF implementation (Mayank)
2. Explored Coda Hale's Metrics library to see if it can be used. (Mayank)
3. Read about LibFM's MCMC implementation and started implementing it. (Mayank)
4. Started this wiki documentation to better document and track progress. (Mayank and Shu Hao)
5. Meeting Notes 10/6 (Sun) 1pm
  - Parameter placement: Where should we put the parameters in SVD? The C++ version keeps them in global memory, while we are considering storing them in the vertex data type. Decided to contact Aapo for a better understanding.
  - Race condition: Discussed how GraphLab provides different consistency models. GraphChi, however, has only one consistency model, in which changing the parameters of adjacent vertices might result in a race condition.
  - Stopping condition: In the C++ version, the user has to specify the number of iterations. Can we use some criterion to decide after every round whether to stop?
  - Serializing a model: Discussed how we can serialize a model. For fixed point estimates (MAP), it is easy to serialize the model parameters. For sampling-based methods, however, this might be difficult or expensive. Will email Danny for his views.
  - Discussion about wiki page
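
The stopping-condition question from the meeting notes above could be approached along these lines. This is only a minimal sketch, not the project's code: the function name `should_stop`, the use of training RMSE as the monitored quantity, and the `tol` and `max_iters` defaults are all assumptions made for illustration.

```python
def should_stop(rmse_history, tol=1e-4, max_iters=50):
    """Decide after each round whether training should stop.

    rmse_history: RMSE recorded after every completed iteration (assumed metric).
    tol: hypothetical threshold on the relative improvement between rounds.
    max_iters: hard cap, mirroring the user-specified iteration count.
    """
    n = len(rmse_history)
    if n >= max_iters:
        return True
    if n < 2:
        return False  # nothing to compare against yet
    prev, curr = rmse_history[-2], rmse_history[-1]
    # Stop when the error improved by less than tol (relative), or got worse.
    return (prev - curr) <= tol * prev


# Illustrative run with made-up RMSE values.
history = []
for rmse in [1.20, 1.05, 0.98, 0.97995, 0.9799]:
    history.append(rmse)
    if should_stop(history):
        break
```

A fixed iteration cap is kept as a fallback so the behaviour of the C++ version remains available when the criterion never triggers.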
Week 5 (October 8 - October 14)
1. Skype meeting with Aapo to clarify a few things. He suggested refactoring the code to use memory-efficient data structures and a generic way to define parameters.
2. Refactored code as suggested for ALS, BPMF and SVD++.
3. Made some progress on LibFM, still incomplete as of Oct. 13.
4. Meeting Notes
  - Missed the formal meeting on Sunday as team members were travelling for interviews.
  - Held the meeting on Tuesday (10/15) instead. Shu Hao clarified some queries about the Bias SGD implementation.
  - Mayank explained some parts of the LibFM MCMC implementation (later realized to be completely wrong).
  - Decided to meet more often to write the plan.
Week 6 (October 15 - October 22)
1. Implemented LibFM SGD and had some discussions about LibFM MCMC.
2. Discussed data representation.
3. Started writing plan documentation.
Week 7 (October 23 - October 30)
1. Plan and Vision Document completed. Midterm Presentation.
2. Designed a standardized framework for modelling data. Started implementation.
3. Decided upon AggregateRecommender model to run multiple algorithms in parallel based on available memory.
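
The idea of grouping algorithms by available memory can be illustrated with a small sketch. It is not the AggregateRecommender implementation: `plan_batches`, the first-fit-decreasing strategy, and every memory figure below are hypothetical and chosen only to show the principle.

```python
def plan_batches(estimates_mb, budget_mb):
    """Group algorithms into batches whose estimated memory fits the budget.

    estimates_mb: dict of algorithm name -> estimated memory need in MB
                  (hypothetical numbers, not measurements).
    budget_mb: memory available to one run.
    Uses first-fit decreasing: largest algorithms are placed first.
    """
    batches = []  # each entry is [remaining_mb, [algorithm names]]
    for name, need in sorted(estimates_mb.items(), key=lambda kv: -kv[1]):
        if need > budget_mb:
            raise ValueError(f"{name} needs {need} MB, budget is {budget_mb} MB")
        for batch in batches:
            if batch[0] >= need:
                batch[0] -= need
                batch[1].append(name)
                break
        else:
            # No existing batch has room, so open a new one.
            batches.append([budget_mb - need, [name]])
    return [names for _, names in batches]


# Made-up estimates for illustration only.
estimates = {"ALS": 600, "SVDPP": 900, "LibFM_SGD": 700, "Bias_SGD": 300}
batches = plan_batches(estimates, budget_mb=1500)
# batches == [["SVDPP", "ALS"], ["LibFM_SGD", "Bias_SGD"]]
```

Each resulting batch would run together in one pass over the data, and batches would run one after another.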
Week 8 (November 1 - November 7)
1. Completed Data API.
2. Completed work on how to represent the model and initialize different programs using JSON files.
3. Completed implementation of Aggregate Recommender which enables running multiple algorithms.
4. Ported ALS, SVDPP, LibFM_SGD and Bias_SGD to the new framework. With the new framework in place, all of these can run in a single GraphChi instance.
5. Started exploring Apache YARN (MapReduce 2.0) to run multiple versions of GraphChi.
6. Set up a single-node YARN cluster.
7. Worked on serialization and deserialization of the GraphChi model.
Week 9 (November 8 - November 14)
1. Understood how YARN works.
2. Initial implementation of YARN's application master and client.
3. YARN multi-node cluster setup on AWS.
4. Automated the process of setting up the cluster on AWS (Python script using the boto and fabric libraries).
5. Worked on code reviews.
Week 10 (November 15 - November 21)
1. Started memory analysis of GraphChi to ensure better scheduling of the program. Used Memory Analyzer initially for profiling.
2. Got access to the YourKit Java profiler, which is much better. Learned how to use it and started analyzing heap dumps on large pieces of data.
3. Profiling required understanding the internals of GraphChi. Discussed with Aapo how the graph is represented in compressed form in GraphChi.
4. Implemented the logic for the Recommender Pool, which allows pipelining different runs of algorithms and thus saves IO costs.
5. Merged serialization and deserialization code.
Week 11 (November 22 - November 29)
1. Created AWS AMI for easier YARN deployment.
2. Code for scheduling with better memory estimates.
3. Code cleanup.
