Large-Scale Graph Processing Resources

Large-Scale Graph Analytics Resources

Slides:

  • Large-Scale Graph Processing: PDF, by Amir Payberah, SICS
  • Brief Intro to Spark and Scala (tutorial): PDF, by Amir Payberah, SICS
  • Large Scale Graph Processing with Apache Giraph: PDF, by Sebastian Schelter

Papers:

General distributed graph processing platforms

  • Challenges in Parallel Graph Processing, 2007: PDF
  • GraphLab A New Framework For Parallel Machine Learning, 2010: PDF
  • Pregel: A System for Large-Scale Graph Processing, 2010: PDF
  • Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, 2012: PDF
  • PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs, 2012: PDF
  • GraphX: Unifying Data-Parallel and Graph-Paralle Analytics, 2014: PDF
  • From “Think Like a Vertex” to “Think Like a Graph”, by Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, Shirish Tatikonda, John McPherson, VLDB’13: PDF [Giraph++]
  • Processing large-scale graph data: A guide to current technology by Sherif Sakr: PDF [overview article by IBM Developer Works]

Recent research on graph processing platforms

  • Giraph Unchained: Barrierless Asynchronous Parallel Execution in Pregel-like Graph Processing Systems by Minyang Han (University of Waterloo), Khuzaima Daudjee (University of Waterloo), VLDB’15: PDF
  • One Trillion Edges: Graph Processing at Facebook-Scale by Avery Ching (Facebook), Dionysios Logothetis (Facebook), Sergey Edunov (Facebook), Maja Kabiljo (Facebook), Sambavi Muthukrishnan (Facebook), VLDB’15: PDF
  • MOCgraph: Scalable Distributed Graph Processing Using Message Online Computing, Chang Zhou, Jun Gao, Binbin Sun, Jeffrey Xu Yu, VLDB’15: PDF [MOCgraph: message online computing (MOC) to improve memory footprint by processing incoming messages in a streaming manner]
  • Blogel: A Block-Centric Framework for Distributed Computation on Real-World Graphs by Da Yan (HKUST), James Cheng (CUHK), Yi Lu (CUHK), Wilfred Ng, The Hong Kong University of Science and Technology), VLDB’15: PDF [Block-oriented distributed graph processing framework to deal with skewed degree distribution, large diameter, and high density of real world graphs, such as social networks; supports flexible partitioning based on “blocks” as opposed to vertices and/or edges]
  • GraphTwist: Fast Iterative Graph Computation
    with Two-tier Optimizations, Zhou, Lee, Liu, and Zhang, VLDB’15: PDF [improvements to handle vertex-centric system bottlenecks]
  • A Scalable Distributed Graph Partitioner, Margo and Seltzer, VLDB’15: PDF [Scalable Host-tree Embeddings for Efficient Partitioning
    (Sheep): scalable general-purpose graph partitioning algorithm for distributed/parallel frameworks]
  • Pregelix: Big(ger) Graph Analytics on A Dataflow Engine, Bu, Borkar, Jia, Carey, Condie, VLDB’15: PDF [reducing memory load for distributed graph processing systems]
  • Large-Scale Distributed Graph Computing Systems: An Experimental Evaluation, Lu, Cheng, Yang, and Wu, VLDB’15: PDF [extensive comparative performance study of the mainstream distributed graph processing frameworks]
  • Fast Failure Recovery in Distributed Graph Processing
    Systems, Shen, Chen, Jagadish, Lu, Ooi, Tudor: PDF [failure recovery by repartitioning vertices]

Probabilistic Analysis

  • “Balls into Bins” — A Simple and Tight Analysis, Raab and Steger: PDF

Problems on graphs

Community detection
  • Defining and Evaluating Communities Based on Ground Truth, Yang and Leskovec, ICDM’12: PDF [Examines and compares 13 commonly used structural definitions of network communities]
  • Community Detection in Graphs, Fortunato, 2010: PDF [survey on community detection algorithms]
  • Communities from Seed Sets, Andersen and Lang, WWW’06: PDF [local methods]
  • Local Community Identification in Social Networks, Chen, Zaiane, Goebel, ASONAM’09: PDF [local methods]
  • Local Graph Partitioning Using PageRank Vectors, Andersen, Chung, Lang, FOCS’06: PDF [local methods]
  • Near linear time algorithm to detect community structures in large-scale networks, Ragahavan, Albert, Kumara: PDF
  • Evaluating Local Community Methods in Networks, James Bagrow: PDF [accuracy benchmarking]
  • DENGRAPH: A Density-based Community Detection Algorithm, Falkowski, Bart, Spiliopoulou, ICWI’07: PDF
  • Finding community structure in very large networks, Clauset, Newman, Moore, Physics Review, 2004: PDF [classic paper, global algorithm]
  • Community Structure in Social and Biological Networks, Girvan and Newman, PNAS 99 (12), 2001: PDF
  • Finding and evaluating community structure in networks, Newman and Girvan, PDF
  • A Local Method for Detecting Communities, Bagrow and Bollt, Physics Review 2005, PDF
  • Finding local community structure in networks, Clauset, Physics Review 2005, PDF
  • Stochastic Local Clustering for Massive Graphs, Schaeffer, PAKDD’05: PDF
  • Influential Community Search in Large Networks, by Rong-Hua LI (CUHK), Lu Qin (University of Technology (Sydney), Jeffrey Xu Yu (The Chinese University of Hong Kong), Rui Mao (Shenzhen University): PDF [non-distributed, stream], VLDB’15
  • Robust Local Community Detection: On Free Rider Effect and Its Elimination by Yubao Wu (Case Western Reserve University), Ruoming Jin (Kent State University), Jing Li (Case Western Reserve University), Xiang Zhang (Case Western Reserve University): PDF, [systematic study of the existing goodness metrics and “free riders” phenomenon], VLDB’15: PDF
  • Community Detection in Social Networks: An In-depth Benchmarking Study with a Procedure-Oriented Framework by Meng Wang (Tsinghua University), Chaokun Wang (Tsinghua University), Jeffrey Xu Yu (The Chinese University of Hong Kong), Jun Zhang (Tsinghua University), [generalized community detection procedure and procedure-oriented framework for benchmarking; allows comparison of the existing community detection approaches and pinpointing their deficiencies], VLDB’15: PDF
Clustering, Hubs and Outliers
  • SCAN: a structural clustering algorithm for networks, Xu, Yuruk, Feng and Schweiger, KDD’07: PDF [not distributed, but apparently widely-used algortighm]

  • SCAN++: Efficient Algorithm for Finding Clusters, Hubs and Outliers on Large-scale Graphs by Hiroaki Shiokawa (NTT), Yasuhiro Fujiwara (NTT), Makoto Onizuka (Osaka University): VLDB’15: PDF [improved scalability for SCAN; still not distributed]

Link Prediction and Friend Recommendation in Social Networks
  • Link Prediction in Social Networks:
    the State-of-the-Art, Wang, Xu, Wu, Zhou, Science China: Information Sciences: PDF [comprehensive survey]
  • LINKREC: A Unified Framework for Link Recommendation
    with User Attributes and Graph Structure, Yin, Gupta, Weninger, Han, WWW’10: PDF [uses RW-based technique]
  • Link prediction based on local information considering
    preferential attachment, Shan Zeng, Physica A: PDF [exploits neighborhood information to produce link recommendation]
PageRank
  • FrogWild! — Fast PageRank Approximations on Graph Engines by Ioannis Mitliagkas (UT Austin), Michael Borokhovich (UT Austin), Alexandros Dimakis (UT Austin), Constantine Caramanis (UT Austin), VLDB’15: PDF [speeding up PageRank on GraphLab by relaxing synchronisation; non-distributed?]
Graph connectivity problems (connected components, strongly/weakly connected components, bi-connected components)
  • Pregel Algorithms for Graph Connectivity Problems with Performance Guarantees by Da Yan (HKUST), James Cheng (CUHK), Kai Xing (HKUST), Yi Lu (CUHK), Wilfred Ng, Yingyi Bu (UC Irvine), VLDB’15: PDF [practical Pregel algorithms for various graph connectivity problems]
  • Finding connected components in map-reduce in logarithmic rounds, Laukik Chitnis, Anish Das Sarma, Ashwin Machanavajjhala, and Vibhor Rastog, ICDE’13: PDF [HashMin algorithm for identifying connected components]

BFS, DFS, etc.
  • The More the Merrier: Efficient Multi-Source Graph Traversal by Manuel Then, Moritz Kaufmann, Fernando Chirigati, Tuan-Anh Hoang-Vu, Kien Pham, Alfons Kemper, Thomas Neumann, Huy Vo, VLDB’15: PDF [Multiple-Source BFS (MS-BFS). non-distributed, also looking into applications to Closeness Centrality (see below)]
Centrality
  • U. Brandes. A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology, 25:163–177, 2001: PDF [Betweenness Centrality: Wikipedia article, also in Newman’s and Rajaraman et al books]
  • P. Olsen, A. Labouseur, and J.-H. Hwang. Efficient Top-K Closeness Centrality Search. In ICDE ’14, pages 196–207, March 2014: PDF [closeness centrality]
Small world (degree of separation)
  • Lars Backstrom, Paolo Boldi, Marco Rosa, Johan Ugander, and Sebastiano Vigna. Four degrees of separation. In ACM Web Science 2012: Conference Proceedings, pages 45−54. ACM Press, 2012: PDF
Graph Streams
  • Graph Stream Algorithms: A Survey by Andrew McGregor, Database Principles Column: PDF, Slides: PDF [queries over dynamically changing graphs represented as a stream of edges; mostly theoretical treatment]
SimRank
  • SimRank: a measure of structural-context similarity by Jeh and Widdom, KDD’02: PDF [seminal paper that introduced the metric; similarity is asserted through structural context in a graph: two vertices are similar if they are referenced by two vertices which are also similar; applies in many contexts: recommender systems, spam filtering, sponsored search, malware detection, etc.; typically top-k simrank scores are of interest]
Sampling
  • Sampling from Large Graphs by Leskovec & Faloutsos, KDD’06: PDF [survey of graph sampling techniques; not necessarily distributed]
  • Walk, Not Wait: Faster Sampling Over Online Social Networks by Azade Nazi (University of Texas at Arlington), Zhuojie Zhou (George Washington University), Saravanan Thirumuruganathan (University of Texas at Arlingt), Nan Zhang (George Washington University), Gautam Das (University of Texas at Arlington), VLDB’15: PDF [improved graph sampling technique compared to random walk, non-distributed]

Books:

Platforms:

Datasets