All Categories
-
[PyTorch] Why is the Weight Matrix in nn.Linear Transposed? | Uncategorized | 2023. 3. 26. 19:50
Why does the Linear module seem to do unnecessary transposing? I was looking at the code for torch.nn.Linear(in_features, out_features, bias=True) and it seems that it stores the matrix one way but then decides that to compute stuff it's necessary to transpose (though the transposing seems like it could have been avoided)... (discuss.pytorch.org) The official PyTorch documentation states that nn.Linear computes $y = xA^T + b$. $x$..
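To make the preview concrete, here is a minimal sketch of what the post and the linked thread are discussing: nn.Linear stores its weight with shape (out_features, in_features) and its forward pass multiplies by the transpose, matching $y = xA^T + b$. The sizes below are made-up values for illustration, not taken from the post.

```python
import torch
import torch.nn as nn

# Illustrative sizes (not from the post).
in_features, out_features = 4, 3
linear = nn.Linear(in_features, out_features)

# The weight is stored as (out_features, in_features), i.e. already "transposed"
# relative to how y = xA^T + b is usually written.
print(linear.weight.shape)  # torch.Size([3, 4])

x = torch.randn(2, in_features)

# nn.Linear's forward is equivalent to multiplying by the transpose of the stored weight.
y_module = linear(x)
y_manual = x @ linear.weight.T + linear.bias

print(torch.allclose(y_module, y_manual))  # True
```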
-
[Paper Review] Attention Is All You Need | Paper Review | 2023. 3. 20. 18:54
Attention Is All You Need: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new... (arxiv.org) Abstract: Sequence transduction models have mostly been complex RNNs or CNNs in an encoder-decoder configuration, and the best-performing ones make use of the attention mechani..
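The preview stops mid-sentence; as a quick reference for the mechanism the reviewed paper is built around, here is a minimal sketch of scaled dot-product attention, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^T/\sqrt{d_k})V$. The tensor shapes below are illustrative assumptions, not values from the post.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k)
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

# Illustrative shapes: batch=2, seq_len=5, d_k=8.
q = torch.randn(2, 5, 8)
k = torch.randn(2, 5, 8)
v = torch.randn(2, 5, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 5, 8])
```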