forked from GraphChi/graphchi-java
Progress Document
MohtaMayank edited this page Nov 25, 2013 · 14 revisions
Weekly Progress
-
Week 1 (September 9 - September 15)
- Talk to people about the project idea. Meet with interested folks (Mayank)
- Set up GitHub repository.
- Set up development environment.
(No Meeting)
-
Week 2 (September 16 - September 22)
- Background work related to GraphChi and GraphLab. See sample code.
- Background work related to recommendation systems.
- Read research papers (SVD++, ALS, LibFM)
- Divide algorithms to be implemented by Shu Hao and Mayank in the next 2 weeks.
-
Meeting Notes (September 22)
- Set up GitHub repository.
- Decide which algorithms we are going to implement. Update each other about the background work done so far.
- Meet with the other potential sub-team (Yuchen, Chun Chen) about a project on a web-based framework for creating ensembles and performing evaluation using cloud infrastructure.
-
Week 3 (September 23 - September 30)
- Start implementation of SVD++ using stochastic gradient descent (Mayank)
- Start implementation of Bayesian Probabilistic Matrix Factorization (Mayank)
- Collect test datasets. (Mayank)
- Write vision document.
-
Meeting Notes (September 30)
- Wrote instructions for setting up the project in Eclipse/Maven.
- Discussed whether there is a better way to measure metrics, and how to evaluate the performance of our system.
- Discussed bugs relating to the MCMC method.
-
Week 4 (October 1 - October 7)
- Resolve certain bugs with SVD++ and BPMF implementation (Mayank)
- Explore Coda Hale's Metrics library to see if it can be used. (Mayank)
- Read about LibFM's MCMC implementation and started implementing it. (Mayank)
- Started this wiki documentation to better document and track progress. (Mayank and Shu Hao)
-
Meeting Notes 10/6 (Sun) 1pm
- Where should we put parameters in SVD? The C++ version keeps them in global memory, while we are considering storing them in the vertex data type. Decided to contact Aapo to get a better understanding.
- Race condition: Discussed how GraphLab provides several consistency models, whereas GraphChi has only one, under which changing the parameters of adjacent vertices might result in a race condition.
- Stopping condition: In the C++ version, the user has to specify the number of iterations. Can we instead use some criterion to decide after every round whether to stop?
- Serializing a model: Discussed how we can serialize a model. For fixed point estimates (MAP), it is easy to serialize the model parameters. For sampling-based methods, however, this might be difficult or expensive. Will send an email to Danny to ask for his views.
- Discussion about wiki page
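
The stopping-condition question above could be answered in several ways. One possibility, sketched below, is to stop when the relative improvement in training RMSE between two rounds drops below a tolerance, while keeping the user-specified iteration count as a hard cap. The class name `ConvergenceMonitor`, its parameters, and the choice of RMSE as the criterion are assumptions made for illustration. They are not part of GraphChi or of the project code.

```java
/**
 * Hypothetical helper (not part of GraphChi): decides after each round
 * whether training should stop.
 */
public class ConvergenceMonitor {
    private final double tolerance;   // minimum relative RMSE improvement needed to continue
    private final int maxIterations;  // hard cap, mirrors the user-specified iteration count
    private double previousRmse = Double.NaN;
    private int iterations = 0;

    public ConvergenceMonitor(double tolerance, int maxIterations) {
        this.tolerance = tolerance;
        this.maxIterations = maxIterations;
    }

    /**
     * Records the RMSE of the round that just finished.
     * Returns true if training should stop. Note that a round whose RMSE
     * got worse has a negative improvement and therefore also stops training.
     */
    public boolean shouldStop(double rmse) {
        iterations++;
        boolean converged = false;
        if (!Double.isNaN(previousRmse) && previousRmse > 0.0) {
            double relativeImprovement = (previousRmse - rmse) / previousRmse;
            converged = relativeImprovement < tolerance;
        }
        previousRmse = rmse;
        return converged || iterations >= maxIterations;
    }

    public int getIterations() {
        return iterations;
    }
}
```

A driver would call `shouldStop(rmse)` once at the end of every round and break out of the training loop when it returns true.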
-
Week 5 (October 8 - October 14)
- Skype meeting with Aapo to clarify a few things. He suggested refactoring the code to use memory-efficient data structures and a generic way to define parameters.
- Refactored code as suggested for ALS, BPMF and SVD++.
- Made some progress on LibFM, still incomplete as of Oct. 13.
-
Meeting Notes
- Missed the formal meeting on Sunday as team members were travelling for interviews.
- Held the meeting on Tuesday (10/15) instead. Shu Hao clarified some queries about the Bias SGD implementation.
- Mayank explained some parts of the LibFM MCMC implementation (an explanation we later realized was completely wrong).
- Decided to meet more often to write the plan.
-
Week 6 (October 15 to October 22)
- Implemented LibFM SGD and had some discussions about LibFM MCMC.
- Discussed data representation.
- Started writing plan documentation.
-
Week 7 (October 23 to October 30)
- Plan and Vision Document completed. Midterm Presentation.
- Designed a standardized framework for modelling data. Started implementation.
- Decided upon AggregateRecommender model to run multiple algorithms in parallel based on available memory.
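
To illustrate the idea of running several algorithms in parallel subject to available memory, the sketch below groups algorithms into batches whose estimated memory use fits within a budget. It is only a minimal illustration and not the project's actual AggregateRecommender logic. The class `MemoryBatcher`, the method `planBatches`, and the per-algorithm memory estimates in the usage note are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: greedily packs algorithms into batches so that the
 * summed memory estimate of each batch stays within the budget.
 */
public class MemoryBatcher {

    /**
     * @param estimatedBytes algorithm name mapped to its estimated memory need,
     *                       iterated in the map's own order
     * @param budgetBytes    memory available for one batch
     * @return batches of algorithm names; each batch could share one run
     */
    public static List<List<String>> planBatches(Map<String, Long> estimatedBytes,
                                                 long budgetBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long used = 0;
        for (Map.Entry<String, Long> entry : estimatedBytes.entrySet()) {
            long need = entry.getValue();
            // Close the current batch when the next algorithm would not fit.
            // An algorithm larger than the whole budget still gets a batch of its own.
            if (!current.isEmpty() && used + need > budgetBytes) {
                batches.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(entry.getKey());
            used += need;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

For example, with made-up estimates of 400, 500 and 300 units for three algorithms (inserted in that order into a `LinkedHashMap`) and a budget of 1000, the first two share a batch and the third runs in a second batch. A first-fit pass like this is simple but not optimal. Sorting by size first would usually pack more tightly.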
-
Week 8 (November 1 to November 7)
- Completed Data API.
- Completed work on how to represent the model and initialize different programs using JSON files.
- Completed implementation of Aggregate Recommender which enables running multiple algorithms.
- Ported ALS, SVDPP, LibFM_SGD and Bias_SGD to the new framework. With the new framework in place, all of these can run in a single GraphChi instance.
- Started exploring Apache YARN (MapReduce 2.0) to run multiple instances of GraphChi.
- Set up a single-node YARN cluster.
- Worked on serialization and deserialization of the GraphChi model.
-
Week 9 (November 8 to November 14)
- Understood how YARN works.
- Initial implementation of YARN's application master and client.
- Set up a multi-node YARN cluster on AWS.
- Automated the process of setting up the cluster on AWS (Python script using the boto and fabric libraries).
- Worked on code reviews.
-
Week 10 (November 15 to November 21)
- Started memory analysis of GraphChi to enable better scheduling of the programs. Used Memory Analyzer initially for profiling.
- Got access to the YourKit Java profiler, which is much better. Learned how to use it and started analyzing heap dumps on large pieces of data.
- Profiling required understanding the internals of GraphChi. Discussed with Aapo how the graph is represented in a compressed form in GraphChi.
- Implemented the logic for Recommender Pool, which allows pipelining different runs of algorithms and thus saves I/O costs.
- Merged Serialization and Deserialization code.
-
Week 11 (November 22 to November 29)
- Created an AWS AMI for easier YARN deployment.
- Code for scheduling with better memory estimates.
- Code cleanup.