Block-wise Dynamic Sparseness


1. Introduction

1.1. Problem

1.1.1. Most prior work focuses on redesigning the neural network architecture

1.2. This Paper

1.2.1. Network pruning

1.3. Pruning

1.3.1. Static pruning

1.3.2. Dynamic pruning

1.3.2.1. depending on the input

1.1.2.1.1. To avoid the inefficiency that would result from a fully flexible sparsity pattern for each input instance, we choose block-wise pruning

1.4. Advantage of block pruning

1.4.1. Block-wise pruning can achieve a computational cost similar to static pruning while largely keeping the expressiveness of the non-sparse case, in the sense that overall the same number of network parameters can still be tuned (see the sketch below)
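The following minimal NumPy sketch (not the authors' code; the block size, matrix shape, and random gate pattern are assumptions) illustrates this point: whole blocks of the weight matrix are skipped for a given input, so the cost scales with the fraction of open blocks as in static pruning, yet every weight entry remains trainable because the open blocks change from input to input.

```python
import numpy as np

d_out, d_in, b = 8, 8, 4            # hypothetical layer size and block size
W = np.random.randn(d_out, d_in)    # full weight matrix: every entry stays trainable
h = np.random.randn(d_in)           # input vector

# One gate per (b x b) block; which blocks are open may change per input.
gates = np.random.rand(d_out // b, d_in // b) > 0.5

y = np.zeros(d_out)
for i in range(d_out // b):
    for j in range(d_in // b):
        if gates[i, j]:             # closed blocks are skipped entirely
            y[i*b:(i+1)*b] += W[i*b:(i+1)*b, j*b:(j+1)*b] @ h[j*b:(j+1)*b]

# Cost is roughly (fraction of open blocks) x the dense cost, like static
# pruning, while over many inputs all of W can receive gradient updates.
```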

2. Related Work

2.1. Early Study

2.1.1. sensitivity

2.1.1.1. rely on estimating the influence of a specific node or weight

2.1.2. penalty-term

2.1.2.1. modify the objective function to force neural networks to remove redundant weights during training.

2.2. Pruning Algorithms

2.2.1. weight pruning

2.2.1.1. applied to individual entries of weight matrices

2.2.2. neuron pruning

2.2.2.1. remove entire neurons

2.3. Recent Studies

2.3.1. structured

2.3.1.1. filters, channels, layers

2.3.2. unstructured

2.3.3. conditional computing

2.3.3.1. dynamic execution

2.3.3.2. runtime pruning

3. Dynamic Sparse Linear Layer

3.1. Gating

3.1.1. Continuous

3.1.1.1. a closed gate corresponds to a very small value

3.1.2. Discrete

3.1.2.1. a closed gate is exactly zero

3.2. Proposed gating mechanism

3.2.1. combines the advantages of both

3.2.2. To construct a gating mask that can be applied to the weights W based on the vector h, we need a decision function
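Below is a minimal PyTorch sketch of such a layer, not the paper's exact formulation: a small decision function scores the blocks of W from the input h, keeps only the top-k blocks (discrete: pruned blocks contribute exactly zero), and lets gradients flow through the continuous scores via a straight-through estimator, combining the advantages of both gate types. The block size, top-k value, and gating network are hypothetical choices.

```python
import torch
import torch.nn as nn

class BlockwiseDynamicLinear(nn.Module):
    def __init__(self, d_in, d_out, block=64, k=2):
        super().__init__()
        assert d_in % block == 0 and d_out % block == 0
        self.block, self.k = block, k
        self.n_in, self.n_out = d_in // block, d_out // block
        self.weight = nn.Parameter(torch.randn(d_out, d_in) / d_in ** 0.5)
        # Decision function: maps the input vector h to one score per weight block.
        self.gate = nn.Linear(d_in, self.n_out * self.n_in)

    def forward(self, h):                      # h: (batch, d_in)
        scores = torch.sigmoid(self.gate(h))   # continuous gate values in (0, 1)
        topk = scores.topk(self.k, dim=-1)
        hard = torch.zeros_like(scores).scatter_(-1, topk.indices, 1.0)
        # Straight-through: forward uses the hard 0/1 mask, backward the soft scores.
        mask = hard + scores - scores.detach()
        mask = mask.view(-1, self.n_out, self.n_in)
        # Expand block gates to a per-entry mask and apply it to W. Dense compute
        # is used here for clarity; an efficient kernel would skip closed blocks.
        full = mask.repeat_interleave(self.block, 1).repeat_interleave(self.block, 2)
        return torch.einsum('boi,bi->bo', full * self.weight, h)

layer = BlockwiseDynamicLinear(d_in=128, d_out=128, block=64, k=2)
out = layer(torch.randn(3, 128))   # (3, 128); the open-block pattern depends on each input
```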