Neural Models in Dialogue Systems

Convolutional Neural Network

The sliding window lets convolution layers capture local features, while pooling layers compose them into hierarchical features. Together, these two mechanisms give CNNs both local and global perception, helping them capture specific internal structures of the data.

CNNs are good textual feature extractors, but they may not be ideal sequential encoders.

The more advanced models therefore tend not to use a convolutional model directly as the encoder, but rather as a hierarchical feature extractor. The main reason is that convolutions cannot continuously and flexibly extract information across the time steps of a sequence.
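As a concrete illustration of this extractor role, here is a minimal PyTorch sketch (the class name `TextCNNExtractor` and all layer sizes are illustrative, not taken from any particular dialogue system):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNNExtractor(nn.Module):
    """Kim-style text CNN: turns a token sequence into a fixed-size feature vector."""

    def __init__(self, vocab_size, embed_dim=128, num_filters=100, kernel_sizes=(3, 4, 5)):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One Conv1d per window size: the sliding window captures local n-gram features.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes]
        )

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)        # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                # Conv1d expects (batch, channels, seq_len)
        # Max-over-time pooling keeps only the strongest activation per filter,
        # producing a position-independent (global) summary of the local features.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return torch.cat(pooled, dim=1)      # (batch, num_filters * len(kernel_sizes))
```

The max-over-time pooling discards word order beyond each window, which is exactly why a module like this serves better as a feature extractor than as a sequential encoder.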

Recurrent Neural Networks and Vanilla Sequence-to-sequence Models

Types of RNNs

Graphical models of two basic types of RNNs

$$ \begin{aligned} x_t &: \text{input at time step } t \\ h_t &: \text{hidden state at time step } t \\ y_t &: \text{output at time step } t \\ W_h, W_y, U_h &: \text{weight matrices} \end{aligned} $$

The difference between the two lies in how the hidden state is computed, i.e., which signal is fed back as input, as the equations below show.
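Assuming the two basic types in the figure are the Elman and Jordan networks (the usual textbook pair), their updates in the notation above (biases omitted) are:

$$ \begin{aligned} \text{Elman:}\quad & h_t = \sigma(W_h x_t + U_h h_{t-1}) \\ \text{Jordan:}\quad & h_t = \sigma(W_h x_t + U_h y_{t-1}) \\ \text{both:}\quad & y_t = \sigma(W_y h_t) \end{aligned} $$

The Elman network feeds back the previous hidden state $h_{t-1}$, while the Jordan network feeds back the previous output $y_{t-1}$.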

Long Short-Term Memory (LSTM)

Introduces the gate mechanism → mitigates gradient vanishing

The LSTM introduces long-term and short-term memory vectors to encode sequential data, and then uses the gate mechanism to control the information flow, as sketched below.
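A minimal sketch of one LSTM time step, hand-rolled in PyTorch for clarity rather than using `nn.LSTM` (the function name and weight shapes are illustrative):

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. c is the long-term memory (cell state), h the short-term memory.

    Shapes: W (4*hidden, input), U (4*hidden, hidden), b (4*hidden,).
    """
    gates = W @ x_t + U @ h_prev + b
    i, f, o, g = gates.chunk(4)               # input, forget, output gates + candidate
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    # The gates control the information flow: f decides what to keep in long-term
    # memory, i what to write into it, and o what to expose as the short-term state.
    c_t = f * c_prev + i * g
    h_t = o * torch.tanh(c_t)
    return h_t, c_t
```

The additive update of the cell state `c_t` gives gradients a path without repeated squashing, which is how the gate mechanism mitigates gradient vanishing.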

Gated Recurrent Unit (GRU)

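For completeness, the standard GRU update equations in the notation above: the GRU merges the LSTM's long- and short-term memories into a single hidden state, controlled by an update gate $z_t$ and a reset gate $r_t$:

$$ \begin{aligned} z_t &= \sigma(W_z x_t + U_z h_{t-1}) \\ r_t &= \sigma(W_r x_t + U_r h_{t-1}) \\ \tilde{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1})) \\ h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t \end{aligned} $$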