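Introduction:
The figure is drawn incorrectly: O1 cannot feed into O2. Only the white blocks in the middle are connected to each other.
The second formula says that the hidden state of layer j at the current time step is computed from two things: the hidden state of layer j at the previous time step, and the hidden state of the layer below (layer j-1) at the current time step.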
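To make the two dependencies concrete, here is a minimal sketch of that recurrence in plain Python. It is not the d2l or PyTorch implementation: the names `matvec`, `rand_matrix`, `deep_rnn_forward`, `W_in`, `W_rec` and `W_out` are made up for illustration, the activation is assumed to be tanh, and biases are left out. Note that each output is read only from the top hidden layer, so one output never feeds the next.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    """Small random weight matrix, for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def deep_rnn_forward(xs, W_in, W_rec, W_out):
    """Run a stacked RNN over the input sequence xs.

    W_in[l]  : weights from the layer below (or the input) into layer l
    W_rec[l] : weights from layer l at step t-1 into layer l at step t
    W_out    : weights from the top hidden layer to the output
    """
    num_layers = len(W_in)
    hidden_size = len(W_rec[0])
    h = [[0.0] * hidden_size for _ in range(num_layers)]  # states at t-1
    outputs = []
    for x in xs:
        below = x            # input to layer 0 is the raw input
        new_h = []
        for l in range(num_layers):
            from_below = matvec(W_in[l], below)   # same step, layer l-1
            from_past = matvec(W_rec[l], h[l])    # previous step, layer l
            h_l = [math.tanh(a + b) for a, b in zip(from_below, from_past)]
            new_h.append(h_l)
            below = h_l
        h = new_h
        # The output depends only on the top hidden layer, not on earlier outputs
        outputs.append(matvec(W_out, h[-1]))
    return outputs, h

rng = random.Random(0)
input_size, hidden_size, output_size, num_layers = 3, 4, 2, 2
W_in = [rand_matrix(hidden_size, input_size if l == 0 else hidden_size, rng)
        for l in range(num_layers)]
W_rec = [rand_matrix(hidden_size, hidden_size, rng) for _ in range(num_layers)]
W_out = rand_matrix(output_size, hidden_size, rng)
xs = [[rng.uniform(-1, 1) for _ in range(input_size)] for _ in range(5)]

outputs, h = deep_rnn_forward(xs, W_in, W_rec, W_out)
print(len(outputs), len(outputs[0]), len(h), len(h[0]))  # 5 2 2 4
```

Swapping `W_out` for a different matrix changes the outputs but leaves every hidden state untouched, which is another way of seeing that there is no O1 to O2 connection.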
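The parallelism here works like an instruction pipeline in computer organization, where multiple instructions are issued in order (this matters for training time).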
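The pipeline analogy can be sketched from the dependencies alone. The cell at time step t and layer l needs only cell (t-1, l) and cell (t, l-1), so all cells with the same value of t + l are independent of each other and could run in the same stage. The function name `wavefront_schedule` is hypothetical, and this is only a dependency analysis: it does not claim to describe how PyTorch or any GPU library actually schedules the work.

```python
from collections import defaultdict

def wavefront_schedule(num_steps, num_layers):
    """Group cells (t, l) by the earliest stage in which they can run.

    Cell (t, l) depends on (t-1, l) and (t, l-1), both of which sit in
    stage t + l - 1, so stage t + l is the earliest possible slot.
    """
    stages = defaultdict(list)
    for t in range(num_steps):
        for l in range(num_layers):
            stages[t + l].append((t, l))
    return [stages[s] for s in sorted(stages)]

schedule = wavefront_schedule(num_steps=4, num_layers=2)
for stage, cells in enumerate(schedule):
    print(stage, cells)
# 0 [(0, 0)]
# 1 [(0, 1), (1, 0)]
# 2 [(1, 1), (2, 0)]
# 3 [(2, 1), (3, 0)]
# 4 [(3, 1)]
```

With 4 time steps and 2 layers there are 8 cells but only 4 + 2 - 1 = 5 stages, just as a pipeline keeps several instructions in flight at once.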
Implementation
Using nn.LSTM as an example: a 2-layer LSTM
    import torch
    from torch import nn
    from d2l import torch as d2l

    # Load the Time Machine dataset
    batch_size, num_steps = 32, 35
    train_iter, vocab = d2l.load_data_time_machine(batch_size, num_steps)

    # num_layers = 2 stacks two LSTM layers
    vocab_size, num_hiddens, num_layers = len(vocab), 256, 2
    num_inputs = vocab_size
    device = d2l.try_gpu()
    lstm_layer = nn.LSTM(num_inputs, num_hiddens, num_layers)
    model = d2l.RNNModel(lstm_layer, len(vocab))
    model = model.to(device)

    # Train the model
    num_epochs, lr = 500, 2
    d2l.train_ch8(model, train_iter, vocab, lr*1.0, num_epochs, device)

The number of hidden layers is set through the value of num_layers.
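A quick way to see what num_layers changes is to look at tensor shapes. The sketch below uses only `nn.LSTM` with its default (sequence, batch, feature) layout. The input size of 28 is an assumed stand-in for the vocabulary size, and the all-zero input is a dummy.

```python
import torch
from torch import nn

# 28 is an assumed stand-in for len(vocab); the other sizes match the notes above
num_inputs, num_hiddens, num_layers = 28, 256, 2
num_steps, batch_size = 35, 32

lstm_layer = nn.LSTM(num_inputs, num_hiddens, num_layers)
X = torch.zeros(num_steps, batch_size, num_inputs)  # dummy input

# Y holds the top layer's hidden state at every time step;
# H and C hold the final hidden and cell state of every layer
Y, (H, C) = lstm_layer(X)
print(Y.shape)  # torch.Size([35, 32, 256])
print(H.shape)  # torch.Size([2, 32, 256])
print(C.shape)  # torch.Size([2, 32, 256])
```

The first dimension of H and C equals num_layers, while Y has no layer dimension because only the top layer's states are returned as output.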