All Categories
-
[PyTorch] Why is the Weight Matrix in nn.Linear Transposed? | Uncategorized | 2023. 3. 26. 19:50
Why does the Linear module seem to do unnecessary transposing? I was looking at the code for torch.nn.Linear(in_features, out_features, bias=True) and it seems that it stores the matrix one way but then decides that to compute stuff it's necessary to transpose (though the transposing seems like it could have been avoided)... (discuss.pytorch.org) The official PyTorch documentation states that nn.Linear computes $y = xA^T + b$. $x$..
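To make the preview concrete, here is a minimal sketch of what the post and the linked thread are discussing: nn.Linear stores its weight with shape (out_features, in_features) and its forward pass multiplies by the transpose, matching $y = xA^T + b$. The sizes below are made-up values for illustration, not taken from the post.

```python
import torch
import torch.nn as nn

# Illustrative sizes (not from the post).
in_features, out_features = 4, 3
linear = nn.Linear(in_features, out_features)

# The weight is stored as (out_features, in_features), i.e. already "transposed"
# relative to how y = xA^T + b is usually written.
print(linear.weight.shape)  # torch.Size([3, 4])

x = torch.randn(2, in_features)

# nn.Linear's forward is equivalent to multiplying by the transpose of the stored weight.
y_module = linear(x)
y_manual = x @ linear.weight.T + linear.bias

print(torch.allclose(y_module, y_manual))  # True
```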
-
[Paper Review] Attention Is All You Need | Paper Review | 2023. 3. 20. 18:54
Attention Is All You Need: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new... (arxiv.org) Abstract: Sequence transduction models have mostly been complex RNNs or CNNs in an encoder-decoder configuration, and the best-performing ones make use of the attention mechani..
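The preview stops mid-sentence; as a quick reference for the mechanism the reviewed paper is built around, here is a minimal sketch of scaled dot-product attention, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^T/\sqrt{d_k})V$. The tensor shapes below are illustrative assumptions, not values from the post.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k)
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

# Illustrative shapes: batch=2, seq_len=5, d_k=8.
q = torch.randn(2, 5, 8)
k = torch.randn(2, 5, 8)
v = torch.randn(2, 5, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 5, 8])
```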