MGTCOM

1. Evaluation

2. Preliminaries

2.1. Heterogeneous Graph

2.1.1. G = (V, E, A, R)

2.1.2. A is the set of node types, R the set of relation types

2.1.3. Node-type mapping V -> A and edge-type mapping E -> R
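
A minimal sketch of this formalism in Python; the node and relation types are illustrative placeholders, not taken from the paper:

```python
# Heterogeneous graph G = (V, E, A, R) with type mappings
# tau: V -> A and phi: E -> R. All names here are illustrative.
A = {"author", "paper"}                       # node types
R = {"writes", "cites"}                       # relation (edge) types
V = ["a1", "a2", "p1", "p2"]                  # nodes
E = [("a1", "p1"), ("a2", "p2"), ("p1", "p2")]
tau = {"a1": "author", "a2": "author", "p1": "paper", "p2": "paper"}
phi = {("a1", "p1"): "writes", ("a2", "p2"): "writes", ("p1", "p2"): "cites"}
```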

2.2. Context Path

2.2.1. Difference vs Metapath

2.2.1.1. Metapaths are bound by type semantics

2.2.1.2. Type sequences are manually specified

2.2.1.3. Number of combinations explodes in highly heterogeneous graphs

2.2.2. Relaxes the restrictions of metapaths while preserving relations at a higher level

2.2.3. path length can be determined empirically

2.3. DPGMM

2.3.1. Part of the family of Bayesian nonparametric mixture models, which find a clustering solution when K is unknown

2.3.2. Define cluster centers as Multivariate Normal Distributions

2.3.3. DPGMM is based on the notion of infinitely many Gaussians

2.3.3.1. https://i.imgur.com/6AdQO2x.png

2.3.3.2. N(...) is a Gaussian pdf

2.3.3.3. \mu is the mean, \Sigma is the covariance matrix

2.3.3.3.1. Assumed to be i.i.d.

2.3.3.3.2. and drawn from a prior distribution (Normal-Wishart)

2.3.4. We adopt the Chang and Fisher III DPM sampler, consisting of a split/merge framework

2.3.4.1. For each cluster two auxiliary subclusters are introduced

2.3.4.2. These form a 2-component GMM

2.3.4.3. Splits and merges are proposed and accepted using the Metropolis-Hastings framework

2.3.4.4. Suitable candidates are proposed as splits given the auxiliary variables, or by merging k neighboring clusters
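
A hedged sketch of DPGMM clustering using scikit-learn's truncated variational implementation. This is not the Chang and Fisher III split/merge sampler adopted above, only an illustration of how a DPGMM infers an effective K from data:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Toy data: three well-separated Gaussian blobs in 2D.
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in (-2, 0, 2)])

dpgmm = BayesianGaussianMixture(
    n_components=10,                                # truncation level, not K
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
).fit(X)
labels = dpgmm.predict(X)
print("effective K:", len(np.unique(labels)))       # typically ~3 here
```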

2.4. Problem definition

2.4.1. Given a heterogeneous graph G

2.4.2. Find node embedding / mapping function V -> Z

2.4.3. Such that these embeddings can be used to group nodes into communities

2.4.4. Find community / cluster parameters \theta which split nodes into K groups with similar patterns

2.4.5. K is also part of learned parameters

2.4.6. We define a community embedding as a multivariate Gaussian distribution in low-dimensional space

2.4.6.1. Described in terms of N(\mu, \Sigma)

2.4.7. Output

2.4.7.1. Node embeddings

2.4.7.2. community parameters \theta_clus

2.4.7.2.1. Community membership pi

2.5. Maximize node co-occurrence probability (objective function)

2.5.1. \sum_{v_j \in N(v_i)} p(v_i | v_j)

2.5.1.1. https://i.imgur.com/tnyWshy.png

2.5.1.1.1. Corpus C aka paths

2.5.1.1.2. V is vocab aka nodes

2.5.1.1.3. Center word w_k

2.5.1.1.4. f(.) consumes a sequence and outputs a scalar

2.5.1.1.5. Trained to produce higher score for positive window vs negative window

2.5.2. For each neighbor given path

2.5.3. For each path between nodes

2.5.4. For each node in graph

2.5.5. optimize parameters \theta_{net|topo|tempo}
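
A minimal PyTorch sketch of this co-occurrence objective with negative sampling; the dot-product scorer and the sampling scheme are illustrative assumptions, not the paper's exact heads:

```python
import torch
import torch.nn.functional as F

num_nodes, dim = 1000, 32
emb = torch.nn.Embedding(num_nodes, dim)

def sgns_loss(center, context, num_neg=5):
    """Score positive (center, context) pairs above random negatives."""
    pos = (emb(center) * emb(context)).sum(-1)           # f(.) as dot product
    neg_nodes = torch.randint(num_nodes, (center.size(0), num_neg))
    neg = torch.einsum("bd,bkd->bk", emb(center), emb(neg_nodes))
    # softplus(-x) == -log sigmoid(x): the standard negative-sampling loss
    return F.softplus(-pos).mean() + F.softplus(neg).mean()

# One toy step over (center, context) pairs drawn from context paths.
loss = sgns_loss(torch.tensor([0, 1]), torch.tensor([2, 3]))
loss.backward()
```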

2.6. Hinge Loss?

3. Approach

3.1. Overview

3.1.1. Goals

3.1.1.1. Capability to learn on heterogeneous networks

3.1.1.2. Ability to incorporate meta-topological, content and temporal features

3.1.1.3. Dynamically infer number of communities

3.1.1.4. Ability to handle nodes with no features

3.1.1.5. Inference of node representations for unseen nodes / incomplete networks

3.1.2. We achieve this by

3.1.2.1. Combining existing inductive approaches such as GraphSAGE and HGT

3.1.2.2. With context-based unsupervised learning approaches

3.1.2.3. Combine sampling techniques in an efficient way

3.1.2.4. Introduce multiple heads learning on different tasks (topo, tempo)

3.1.2.5. Combine the learned embeddings

3.1.2.6. Devise a way to combine hierarchical DPM techniques with embedding representations to infer the number of communities K

3.1.3. Four major components

3.1.3.1. Feature Embedding Layer

3.1.3.1.1. For transforming node features into node representation

3.1.3.2. Topo / Tempo Layer

3.1.3.2.1. For embedding high-order topological and temporal relationships of each node into a representation vector

3.1.3.3. Community Detection module

3.1.3.3.1. Responsible for finding community parameters over node representation space

3.2. Feature learning

3.2.1. Custom Graph

3.2.1.1. Node features either come from embeddings or the graph

3.2.2. Used to transform node features into node embeddings

3.2.3. Finding all paths is infeasible (their number grows exponentially)

3.2.3.1. We adopt graph convolutional neural nets

3.2.3.2. Specifically the Heterogeneous Graph Transformer

3.2.3.3. Approximates the message passing algorithm by sampling a subgraph for each node

3.2.4. Architecture

3.2.4.1. Feature Transformation

3.2.4.1.1. Transform node features into dense representation vectors

3.2.4.1.2. Embed nodes without features

3.2.4.1.3. Augment unseen nodes with missing features using a zero vector

3.2.4.2. Heterogeneous Graph Transformer

3.2.4.2.1. Used to capture higher-order relations within the graph (meta-topology)

3.2.4.2.2. Used to capture topological information from the neighbors

3.2.4.2.3. Relation Attention

3.2.4.2.4. https://i.imgur.com/AZv5tq4.png

3.2.4.2.5. https://i.imgur.com/cuPnYxs.png

3.2.4.2.6. https://i.imgur.com/ZTQ1yo9.png

3.2.4.3. Sampling

3.2.4.3.1. We first collect context paths

3.2.4.3.2. All nodes in said context paths are used in message passing

3.2.4.3.3. We set the budget dynamically, proportional to the number of nodes present

3.2.4.3.4. Heterogeneous relations are captured by adjacency matrix reconstruction on the subgraph
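
A hedged sketch of the feature-embedding plus HGT stage using PyTorch Geometric's HGTConv; the node/edge types and dimensions are placeholders, not the paper's architecture:

```python
import torch
from torch_geometric.nn import HGTConv

# Illustrative metadata: node types and typed relations (incl. reverse).
metadata = (
    ["user", "post"],
    [("user", "writes", "post"), ("post", "rev_writes", "user")],
)
conv = HGTConv(in_channels=32, out_channels=32, metadata=metadata, heads=2)

x_dict = {                                    # dense features per node type;
    "user": torch.randn(4, 32),              # featureless nodes would get a
    "post": torch.randn(6, 32),              # learned or zero embedding here
}
edge_index_dict = {
    ("user", "writes", "post"): torch.tensor([[0, 1, 2], [0, 3, 5]]),
    ("post", "rev_writes", "user"): torch.tensor([[0, 3, 5], [0, 1, 2]]),
}
out_dict = conv(x_dict, edge_index_dict)      # message passing on the subgraph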

3.3. Topo Emb / Tempo Emb

3.3.1. Context path sampling

3.3.1.1. Define context path by temporal sampling (ballroom)

3.3.1.1.1. Algorithm

3.3.1.1.2. A way for unbiased sampling of temporal context for nodes with and without timestamps

3.3.1.1.3. As the number of nodes grows, so does the number of nodes without a timestamp; this biases regular temporal samplers towards non-timestamped pairs

3.3.1.2. Define context path by random walks (node2vec)
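
A minimal sketch of uniform random-walk context-path sampling (the node2vec-style variant above); the temporal "ballroom" sampler is analogous but would restrict successors by timestamp:

```python
import random

def random_walk(adj, start, length):
    """adj: dict mapping node -> list of neighbors."""
    walk = [start]
    for _ in range(length - 1):
        nbrs = adj[walk[-1]]
        if not nbrs:
            break
        walk.append(random.choice(nbrs))
    return walk

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(random_walk(adj, start=0, length=5))   # one context path
```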

3.3.2. Explain transformation heads

3.4. Community Detection

3.4.1. Given embeddings, it is straightforward to find clusters

3.4.1.1. This pipeline approach lacks a unified objective

3.4.2. Single objective based on GMM

3.4.2.1. Assume each node embedding is generated by a multivariate Gaussian

3.4.2.1.1. Likelihood function (sketched below)

3.4.2.1.2. p(v_i | ...) is a multivariate Gaussian

3.4.2.1.3. \pi and \theta_clus are optimized
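
A hedged sketch of this likelihood, assuming the mixture form above; parameter values are toy placeholders:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(Z, pi, mus, sigmas):
    """sum_i log sum_k pi_k * N(z_i | mu_k, Sigma_k)"""
    dens = np.stack([
        pi_k * multivariate_normal.pdf(Z, mean=mu_k, cov=s_k)
        for pi_k, mu_k, s_k in zip(pi, mus, sigmas)
    ], axis=1)                                     # shape (N, K)
    return np.log(dens.sum(axis=1)).sum()

Z = np.random.randn(20, 2)                         # toy node embeddings
print(log_likelihood(Z, [0.5, 0.5],
                     [np.zeros(2), np.ones(2)],
                     [np.eye(2), np.eye(2)]))
```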

3.4.3. We adopt the hierarchical DPM model proposed by Chang and Fisher III

3.4.3.1. For each cluster we have a subcluster model with 2 components

3.4.3.2. We use Metropolis-Hastings MCMC to transition between cluster parameter states

3.4.3.3. As there are unmanageably many possible states

3.4.3.4. We generate feasible candidate states by considering subcluster models as split candidates and the k nearest clusters as merge candidates

3.5. End-to-End

3.5.1. Use a methodology similar to Cavallari et al. to introduce a feedback loop between community detection and embedding

3.5.2. Optimization enforces node embeddings within the same community to move closer to its center

3.5.3. Nodes with similar embeddings are more likely to be in the same community

3.5.4. Combining topology (1st and 2nd order), meta-topology, temporality, and content enforces community-aware higher-order proximity

3.5.4.1. Embeddings are combined by reusing the feature embedding layer when training topo and tempo heads

3.5.5. Use two-step constrained optimization

3.5.5.1. Optimize embeddings given cluster params

3.5.5.2. Optimize cluster params given embeddings

3.5.5.3. Standard expectation-maximization approach

3.5.5.3.1. GMM / DPM can work by incrementally updating parameters
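
A hedged sketch of this alternating scheme, with a plain GMM standing in for the DPM and a squared distance to soft community centers standing in for the paper's exact loss:

```python
import torch
from sklearn.mixture import GaussianMixture

Z = torch.randn(100, 16, requires_grad=True)        # toy node embeddings
opt = torch.optim.Adam([Z], lr=1e-2)

for _ in range(10):
    # Step 1: optimize cluster params given (frozen) embeddings.
    gmm = GaussianMixture(n_components=4).fit(Z.detach().numpy())
    mu = torch.as_tensor(gmm.means_, dtype=torch.float32)
    resp = torch.as_tensor(gmm.predict_proba(Z.detach().numpy()),
                           dtype=torch.float32)
    # Step 2: optimize embeddings given (frozen) cluster params:
    # pull each embedding toward its soft community center.
    opt.zero_grad()
    loss = ((Z - resp @ mu) ** 2).sum(dim=1).mean()
    loss.backward()
    opt.step()
```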

4. Introduction

4.1. Community Detection

4.1.1. Identifying groups of nodes with common properties

4.2. Link-based methods

4.2.1. Modularity optimization

4.2.2. Clique identification

4.2.3. Spectral optimization

4.3. Representation-based methods

4.3.1. Autoencoder based

4.3.1.1. Take inspiration from spectral methods

4.3.2. Rely on context based techniques

4.3.2.1. Node2vec, Line, Deepwalk

4.3.3. Incorporate multiobjective learning

4.3.3.1. Community awareness

4.3.4. Other advantages

4.3.4.1. Node classification, link prediction, etc.

4.4. Real-world networks

4.4.1. Multimodality

4.4.1.1. Temporality

4.4.1.2. Meta-Topology

4.4.1.3. Node Features

4.4.1.3.1. Text

4.4.1.3.2. Image

4.4.1.4. Heterophily

4.4.1.5. Causal links may exist

4.4.1.6. None of the existing methods covers all of them

4.4.1.6.1. Losslessness

4.4.2. Information Variance

4.4.2.1. Heterogeneous networks

4.4.2.2. Feature Subsets

4.4.2.3. Dimensionality

4.4.3. Incompleteness

4.4.3.1. Topologically incomplete

4.4.3.2. Other kinds of incompleteness

4.4.3.2.1. Missing features

4.4.3.2.2. Missing timestamps

4.4.4. Scalability

4.4.4.1. Web-scale networks

4.4.4.1.1. Billions of nodes

4.4.4.1.2. Can't use all nodes for training

4.4.4.1.3. On-demand inference

4.5. In this paper

4.5.1. Introduce a community detection framework

4.5.2. Incorporate

4.5.2.1. Temporal

4.5.2.2. Topological

4.5.2.3. Meta-topological

4.5.2.4. Feature

4.5.3. Address the issues of

4.5.3.1. Information variance

4.5.3.2. Incomplete networks

4.5.3.3. Show how the scaling issue can be addressed

4.5.3.3.1. Training on a sample

4.5.3.3.2. Mixture models (including DPMM) operate on sufficient statistics (mean and covariance)

5. Related Work

5.1. Graph Representation Learning

5.1.1. Introduction

5.1.1.1. There is an increasing amount of graph data

5.1.1.1.1. also a demand for efficient representation for retrieval and analytics

5.1.1.2. Focuses on representing graph entities as low-dimensional vectors

5.1.1.3. Approaches stem from the field of computational linguistics, which relies heavily on the notion of distributional semantics (proposal)

5.1.1.4. Approaches

5.1.1.4.1. Shallow Architecture (direct embedding)

5.1.1.4.2. Deep Architecture (more inductive representations)

5.1.1.5. Link a few surveys

5.1.2. Unsupervised Methods (Second-Order Proximity)

5.1.2.1. Inherit a lot from NLP

5.1.2.1.1. Skipgram

5.1.2.2. Deepwalk

5.1.2.2.1. Introduces a random-walk approach to contextualize the non-linear node neighborhood into a learnable representation

5.1.2.3. Line, SDNE

5.1.2.3.1. Aim to preserve 1st and 2nd order proximity

5.1.2.3.2. Higher complexity

5.1.2.4. Node2Vec

5.1.2.4.1. Provides tradeoff between 1st and 2nd order proximity

5.1.2.4.2. Modifies deepwalk process by introducing bias parameters to the sampling process

5.1.2.4.3. Bias for revisiting previous nodes (p)

5.1.2.4.4. Bias for leaving a neighborhood (q) - DFS vs BFS tradeoff

5.1.2.4.5. BFS favors structural similarity; DFS favors communities
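
A minimal sketch of node2vec's second-order transition biases, for reference:

```python
# Unnormalized transition weights from curr given the previous node:
# p controls returning to the previous node, q the DFS/BFS trade-off.
def transition_weights(adj, prev, curr, p, q):
    weights = {}
    for nxt in adj[curr]:
        if nxt == prev:                 # return to where we came from
            weights[nxt] = 1.0 / p
        elif nxt in adj[prev]:          # stays near prev (BFS-like)
            weights[nxt] = 1.0
        else:                           # moves outward (DFS-like)
            weights[nxt] = 1.0 / q
    return weights

adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(transition_weights(adj, prev=0, curr=1, p=0.5, q=2.0))
```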

5.1.2.5. Metapath2Vec

5.1.3. Inductive / Convolutional Methods (First-Order Proximity)

5.1.3.1. Hamilton et al. - GraphSAGE

5.1.3.1.1. Introduces neighborhood sampling and efficient aggregation to infer node representation

5.1.3.1.2. Allows (requires) nodes to have a feature vector

5.1.3.2. Hu et al - HGT

5.1.3.2.1. Heterogeneous Graph Transformer

5.1.3.2.2. Improves efficiency of sampling process by introducing per node budget

5.1.3.2.3. Introduces a weight-efficient architecture to model different node types and the relations between them

5.1.3.3. Focus on node representation (not learning strategy)

5.1.4. Autoencoder Based

5.1.4.1. Graph reconstruction using normalized similarity Matrix

5.1.4.1.1. Very expensive, roughly O(|V|^3)

5.1.4.1.2. SAE - Fei Tian

5.1.5. Heterogeneous

5.1.5.1. Survey - yangHeterogeneousNetworkRepresentation2020

5.1.5.2. TransE

5.1.5.2.1. Represents relations as translations in the embedding space

5.1.5.3. Metapath2vec

5.1.5.3.1. Defines metapaths as sequences of node types

5.1.5.3.2. Utilizes the node2vec architecture to learn over these type sequences

5.1.5.3.3. Similarly hin2vec

5.1.5.4. HGT

5.1.5.4.1. Uses transformers for attention between neighbors

5.1.5.4.2. Defines transformation matrices for different relations

5.1.5.4.3. Defines a custom sampling algorithm

5.1.6. Temporal

5.1.6.1. Inductive

5.1.6.1.1. wuSageDyNovelSampling2021

5.1.6.1.2. huHeterogeneousGraphTransformer2020a

5.1.6.1.3. zhaoLargeScaleEvolving2019

5.1.6.2. Spectral

5.1.6.2.1. parejaEvolveGCNEvolvingGraph2020

5.1.6.3. Shallow

5.1.6.3.1. dasguptaHyTEHyperplanebasedTemporally2018

5.1.6.3.2. nguyenContinuousTimeDynamicNetwork2018

5.2. Community Detection (repr based)

5.2.1. Introduction

5.2.1.1. Link a few surveys

5.2.1.1.1. There is an abundance of work

5.2.1.2. Reiterate definition

5.2.1.2.1. No Concrete definition for a community

5.2.1.2.2. Similarly, the measures for community quality are contested

5.2.1.2.3. peelGroundTruthMetadata2017

5.2.1.3. Homophily vs Heterophily

5.2.1.3.1. Important notion

5.2.1.3.2. mcphersonBirdsFeatherHomophily2001

5.2.1.3.3. zhuHomophilyGraphNeural

5.2.1.4. Recent CD methods start to exploit rich node interactions

5.2.1.4.1. With the popularity of social networks

5.2.1.4.2. Content

5.2.1.4.3. Attributes

5.2.1.4.4. Newer methods often combine feature learning with deep clustering algorithms

5.2.2. Basic Approaches

5.2.2.1. Learn Node Representation and apply clustering

5.2.2.2. tianLearningDeepRepresentations2014

5.2.2.2.1. Learn node representations and apply spectral clustering

5.2.2.3. kozdobaCommunityDetectionMeasure2015

5.2.2.3.1. node repr + kmeans

5.2.2.4. Wang - N-MNF

5.2.2.4.1. Aims to represent each community in embedding space

5.2.2.4.2. Using global information through AE

5.2.2.4.3. Uses kmeans

5.2.2.5. Liang Yang - Modularity Based Community Detection with Deep Learning

5.2.2.5.1. Uses the modularity matrix instead to obtain appropriate embeddings

5.2.2.5.2. Autoencoders are employed with a reconstruction loss

5.2.3. Community Embedding Methods

5.2.3.1. Rozemberski - GEMSEC

5.2.3.1.1. Multiobjective (Embedding, Clustering)

5.2.3.2. Cavallari et al. - ComE

5.2.3.2.1. Combines representation learning (node2vec)

5.2.3.2.2. With Gaussian Mixture Models

5.2.3.2.3. Alternates between the two

5.2.3.3. Jia et al. - CommunityGAN

5.2.3.3.1. Introduce a novel N-clique / motif sampling technique as a way to contextualize node "topology"

5.2.3.3.2. Modify affiliation graph models to learn community-aware node representations

5.2.3.3.3. Use a GAN process to learn in an unsupervised way

5.2.4. Multimodal Methods

5.2.4.1. Metapath

5.2.4.1.1. Requires too much domain knowledge

5.2.4.2. Scraped

5.2.4.2.1. liCommunityDetectionAttributed2018

5.2.4.3. liuCommunityDetectionBased2014

5.2.4.3.1. LDA Based

5.2.4.3.2. Community merging / splitting

5.2.4.4. yangGraphClusteringDynamic2017 - GRACE

5.2.4.4.1. Use node features for node representation

5.2.4.4.2. Define a clustering module

5.2.4.4.3. K is predefined

5.2.4.5. caoIncorporatingNetworkStructure2018

5.2.4.5.1. AE

5.2.4.5.2. Learn Content and Topology separately

5.2.4.6. faniUserCommunityDetection2020

5.2.4.6.1. Combine Content, Topology and Temporality

5.2.4.6.2. Model users as temporal topic histograms

5.2.4.6.3. Use histogram along with graph information for community detection

5.2.5. Heterogeneous

5.2.5.1. huangInformationFusionOriented2022

5.2.5.1.1. Heterogeneous

5.2.5.1.2. LDA based

5.2.5.1.3. Focuses on friend recommendation

5.2.5.2. sunRelationStrengthAwareClustering2012

5.2.5.2.1. Incorporates content info

5.2.5.3. luoDetectingCommunitiesHeterogeneous2021 - CP-GNN

5.2.5.3.1. Define a context path based GNN

5.2.5.3.2. Attention based - Transformer like

5.2.5.3.3. Define context path based aggregation

5.2.5.3.4. Uses HGT, k-means, and its own sampling

5.3. Deep Clustering

5.3.1. Ester et al. - DBSCAN

5.3.1.1. No assumptions about the number of clusters

5.3.2. Chang & Fisher III - DPMM Subcluster

5.3.2.1. Modifies Dirichlet Process Mixture Model

5.3.2.2. Introduces subclustering for parallel splitting

5.3.2.3. Describes the clustering process as a Markov chain Monte Carlo process

5.3.2.3.1. Split or merge is a transition to another state

5.3.2.3.2. There are a LOT of states

5.3.2.3.3. We pick the best candidates via subcluster splits and nearest-cluster merges

5.3.3. Overview of CD

5.3.3.1. k-means

5.3.3.1.1. luoDetectingCommunitiesHeterogeneous2021

5.3.3.1.2. caoIncorporatingNetworkStructure2018

5.3.3.1.3. tianLearningDeepRepresentations2014

5.3.3.1.4. kozdobaCommunityDetectionMeasure2015

5.3.3.1.5. yangModularityBasedCommunity2016

5.3.3.2. Mixture models

5.3.3.2.1. cavallariLearningCommunityEmbedding2017

5.3.3.2.2. choongLearningCommunityStructure2018

5.3.3.3. Deepcluster

5.3.3.3.1. yangGraphClusteringDynamic2017

5.3.3.4. Alternative

5.3.3.4.1. faniUserCommunityDetection2020 - louvain

5.3.3.4.2. jiaCommunityGANCommunityDetection2019 - AGM

5.3.3.4.3. rozemberczkiGEMSECGraphEmbedding2019 - mean optimization

5.3.3.4.4. wangCombiningGraphConvolutional2021 - label propagation

6. Experiments

6.1. Experimental setup

6.1.1. The experiments are designed to

6.1.1.1. investigate the effectiveness of the proposed framework

6.1.1.2. using a wide range of real world graphs

6.1.1.3. on various tasks related to multimodal network learning

6.1.2. Link splitting

6.1.2.1. We split links into 80% training, 10% validation, and 10% test

6.1.3. Node splitting

6.1.3.1. Only for evaluation on inference tasks

6.1.4. Datasets

6.1.4.1. We use four* widely used real-world datasets

6.1.4.1.1. Three of which are heterogeneous

6.1.4.1.2. We preprocess them

6.1.4.2. Hetero

6.1.4.2.1. ICEWS

6.1.4.2.2. DBLP

6.1.4.2.3. IMDB

6.1.4.3. Homogenous

6.1.4.3.1. Cora

6.2. Evaluation metrics

6.2.1. There are currently no measures able to assess the quality of communities based on both time and content

6.2.1.1. Therefore we evaluate our model in parts

6.2.1.2. Given specific coherent tasks

6.2.1.3. By combining results on multiple tasks given a dataset, we can assess the expected total performance of our model

6.2.2. Classification

6.2.2.1. We define three groups of labels

6.2.2.1.1. Ground truth communities

6.2.2.1.2. Node timestamps

6.2.2.1.3. Louvain communities

6.2.2.2. We train a logistic regression on node embeddings

6.2.2.2.1. Train 3 times and take the average

6.2.2.2.2. We use accuracy and micro/macro F1 to compute the score
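
A minimal sketch of this protocol with scikit-learn, on toy data in place of the actual embeddings and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

Z = np.random.randn(200, 32)                      # toy node embeddings
y = np.random.randint(0, 4, size=200)             # toy community labels
Ztr, Zte, ytr, yte = train_test_split(Z, y, test_size=0.2)
pred = LogisticRegression(max_iter=1000).fit(Ztr, ytr).predict(Zte)
print(accuracy_score(yte, pred),
      f1_score(yte, pred, average="micro"),
      f1_score(yte, pred, average="macro"))
```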

6.2.3. Link prediction

6.2.3.1. Captures the topology prediction capability of node embedding

6.2.3.2. We train a logistic regression on node embeddings

6.2.3.2.1. We use 50% true and 50% false samples for evaluations

6.2.3.2.2. The true edges were not included in training data

6.2.3.2.3. As features, we use the computed similarity (dot product) between the embeddings

6.2.3.3. We compute

6.2.3.3.1. Accuracy

6.2.3.3.2. ROC AUC

6.2.3.3.3. F1
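
A minimal sketch of this protocol, showing the dot-product features and balanced positives/negatives; toy data stands in for the held-out edges:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

Z = np.random.randn(100, 32)                       # toy node embeddings
pos = np.random.randint(0, 100, (50, 2))           # held-out true edges
neg = np.random.randint(0, 100, (50, 2))           # sampled non-edges
pairs = np.vstack([pos, neg])
# Feature: dot product between the two endpoint embeddings.
X = (Z[pairs[:, 0]] * Z[pairs[:, 1]]).sum(axis=1, keepdims=True)
y = np.r_[np.ones(50), np.zeros(50)]               # 50% true, 50% false

clf = LogisticRegression().fit(X, y)
prob = clf.predict_proba(X)[:, 1]
print(accuracy_score(y, clf.predict(X)),
      roc_auc_score(y, prob),
      f1_score(y, clf.predict(X)))
```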

6.2.4. Cluster quality

6.2.4.1. Captures cluster quality in respect to node embeddings

6.2.4.2. We use it to evaluate how well-structured node embeddings are given a label

6.2.4.3. Computes

6.2.4.3.1. Silhouette coefficient

6.2.4.3.2. Davies-Bouldin index
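
Both measures are available in scikit-learn; a minimal sketch on toy data:

```python
import numpy as np
from sklearn.metrics import silhouette_score, davies_bouldin_score

Z = np.random.randn(100, 32)                  # node embeddings
labels = np.random.randint(0, 5, 100)         # community assignments
print(silhouette_score(Z, labels))            # higher is better, in [-1, 1]
print(davies_bouldin_score(Z, labels))        # lower is better
```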

6.2.5. Community quality

6.2.5.1. Multiple measures have been proposed to evaluate cluster quality

6.2.5.2. We adopt the modularity measure proposed by Newman and Girvan to quantify the topological quality of a community result

6.2.5.3. Similarly, ground-truth communities may capture different semantics

6.2.5.3.1. Given ground-truth communities, we compute NMI and ARI scores

6.2.5.3.2. These measure how much the found communities overlap with the ground truth or share the same information
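
A minimal sketch using networkx for Newman-Girvan modularity and scikit-learn for NMI/ARI, on a toy graph and partition:

```python
import networkx as nx
from networkx.algorithms.community import modularity
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

G = nx.karate_club_graph()
found = [set(range(0, 17)), set(range(17, 34))]    # toy partition
print(modularity(G, found))

truth = [0] * 17 + [1] * 17                        # per-node labels
pred = [0] * 20 + [1] * 14
print(normalized_mutual_info_score(truth, pred),
      adjusted_rand_score(truth, pred))
```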

6.3. Baselines

6.3.1. To evaluate the effectiveness of our approach

6.3.1.1. Compare our results with a list of state-of-the-art unsupervised methods

6.3.1.2. We compare with both community detection methods as well as with representation learning methods

6.3.1.2.1. At the end of repr learning we run k-means

6.3.1.3. The following baselines are used:

6.3.2. Methods

6.3.2.1. Node2vec

6.3.2.2. Metapath2vec

6.3.2.3. CP-GNN

6.3.2.4. ComE

6.3.2.5. GEMSEC

6.3.2.6. GraphSAGE

6.3.2.7. SageDy?

6.3.3. Hyperparameters

6.3.3.1. For RW based methods we set

6.3.3.1.1. walk length to 80

6.3.3.1.2. Context length to 10

6.3.3.1.3. p=1 and q=1 if applicable

6.3.3.2. For a fair comparison

6.3.3.3. Feature dimension is 32

6.4. Performance comparison

6.4.1. In this experiment we learn graph embeddings and detect communities on the resulting embeddings

6.4.2. For methods such as Node2Vec which do not have a community detection component, we apply k-means to the results

6.4.2.1. Similarly, as some baseline methods require a predefined K, we set it to 20

6.4.3. We calculate classification, prediction, clustering, and community detection metrics on the results

6.4.4. As our method also considers temporal data, we evaluate the following variants

6.4.4.1. MGTCOM trained only on topological data

6.4.4.2. MGTCOM trained only on temporal data

6.4.4.3. MGTCOM trained on both

6.4.4.4. All MGTCOM variants

6.4.4.4.1. consider feature data if available

6.4.4.4.2. Otherwise, embeddings are instantiated

6.4.5. Metrics are calculated on the test set exclusively

6.4.5.1. Test samples are excluded from training

6.5. Parameter analysis and Ablation study

6.5.1. Investigate the sensitivity of the parameters and report results on the DBLP dataset