PyTorch: Storage, Shape, and Axis-Moving
A reference for the operations that change a tensor’s shape or axis order: what copies, what doesn’t, and the precise semantics of each.
The three things that define a tensor
A torch.Tensor is not just a multi-dim array of values. It is three things wrapping a flat storage:
- Storage — a flat 1D block of memory holding the actual numbers.
- Shape — how to interpret those numbers as a multi-dim grid.
- Strides — how many storage elements to skip when each dim index increments by 1.
Multiple tensors can share one storage with different shapes and strides — that is what views are.
Row-major (C-order) layout
PyTorch and NumPy default to row-major layout: the last dim is the “fast” one, and rows are contiguous in storage.
For X = torch.arange(6).reshape(2, 3):
X = [[0, 1, 2],
[3, 4, 5]]
Storage: [0, 1, 2, 3, 4, 5]
Shape: (2, 3)
Strides: (3, 1)
Strides = (3, 1) says: to move one row, advance 3 in storage; to move one column, advance 1.
A tensor is contiguous when its strides match what you would expect for a freshly-allocated row-major layout. Operations that change axis order break contiguity by leaving strides that no longer match storage order.
The reshape rule
reshape(...)walks the logical tensor in row-major order, then lays out the new shape filling row-major.
The key word is logical. Reshape iterates through the tensor as if it had been freshly materialised into a contiguous buffer, and uses that walk to populate the new shape.
For a contiguous tensor, the logical row-major walk matches storage order, so reshape is just metadata — no copy.
For a non-contiguous tensor (typically after a transpose / permute), the logical row-major walk does not correspond to a stride pattern over the original storage, so PyTorch materialises a copy into a fresh contiguous buffer first, then reshapes.
view vs reshape
view(...): requires the tensor’s strides to permit the new shape without a copy. Errors otherwise. Never copies.reshape(...): permissive. Returns a view if it can; copies if it cannot. Convenient but hides the cost.
X = torch.arange(6).reshape(2, 3) # contiguous
X.view(3, 2) # works — view
X.reshape(3, 2) # works — view (no copy needed)
X.T.view(-1) # RuntimeError — strides incompatible
X.T.contiguous().view(-1) # works — explicit copy via contiguous()
X.T.reshape(-1) # works — silent copy
For hot-path code where allocations matter, prefer view and call .contiguous() explicitly so copies are visible. For prototyping, reshape is fine.
Axis-moving operations (the catalogue)
All of these are metadata changes — they swap shape and strides without moving data. They typically produce non-contiguous tensors.
| op | what it does | example |
|---|---|---|
.T |
2D transpose; reverses dims for >2D (deprecated for >2D) | X.T |
.mT |
matrix transpose — swap the last two dims | X.mT |
.t() |
2D transpose, method form | X.t() |
.transpose(d0, d1) |
swap two specific dims | X.transpose(0, 2) |
.swapaxes(d0, d1) / .swapdims(d0, d1) |
aliases for transpose | X.swapaxes(0, 1) |
.permute(*dims) |
reorder all dims at once | X.permute(2, 0, 1) |
.movedim(src, dst) |
move dim(s) from src to dst, slide rest | X.movedim(0, -1) |
.unsqueeze(dim) |
insert a length-1 dim | X.unsqueeze(0) |
.squeeze(dim) |
remove a length-1 dim | X.squeeze(0) |
.flatten(start, end) |
collapse contiguous dim range | X.flatten(1, 3) |
Notes:
permutetakes a tuple listing the source dim for each new position:permute(2, 0, 1)means “new dim 0 is old dim 2; new dim 1 is old dim 0; …”.movedimis the ergonomic “take this one dim and put it over there” form:movedim(0, -1)slides dim 0 to the last position. Also accepts tuples:movedim((0, 1), (-2, -1)).squeeze/unsqueezepreserve contiguity (they only insert / remove length-1 dims).flattenon a non-contiguous tensor will copy, like reshape.
transpose vs reshape to the same shape
These produce same-shape but completely different tensors:
X = torch.arange(120).reshape(2, 3, 4, 5)
Y = X.transpose(0, 2) # shape (4, 3, 2, 5)
Z = X.reshape(4, 3, 2, 5) # shape (4, 3, 2, 5)
The mapping is governed by completely different rules:
- Transpose preserves data semantics.
Y[c, b, a, d] == X[a, b, c, d]— same element, relabelled coordinates. - Reshape preserves the flat byte stream.
Z[c, b, a, d] == X.flatten()[c*30 + b*10 + a*5 + d]— whichever element falls at that flat index under the new shape’s row-major traversal.
Spot check:
X[1, 2, 3, 4] # = 119
Y[3, 2, 1, 4] # = X[1, 2, 3, 4] = 119
Z[3, 2, 1, 4] # = X.flatten()[119] = 119
Y[3, 2, 0, 4] # = X[0, 2, 3, 4] = 59
Z[3, 2, 0, 4] # = X.flatten()[114] = 114
Same coordinate, different elements.
Why this matters
If X is a rollout buffer of shape (T, N, obs_dim):
X.transpose(0, 1)→(N, T, obs_dim), data preserved, semantically still “obs of env n at time t.”X.reshape(N, T, obs_dim)→(N, T, obs_dim), data scrambled, the new dim 0 is no longer “env index.”
If you write the reshape when you meant the transpose, your training loop runs without error and produces garbage.
Which op to use
- Want to relabel dims while keeping the same data at each coordinate? →
transpose/permute/movedim. - Want to reinterpret the flat byte stream under a new shape? →
view/reshape. - Want to do both (e.g., transpose then flatten)? → transpose first, then
reshape. The reshape will copy under the hood, producing a contiguous buffer in the new logical order.
When does a copy happen?
| op | copies? |
|---|---|
transpose, permute, swapdims, movedim |
no (metadata only) |
unsqueeze, squeeze |
no |
view on a compatible-strided tensor |
no |
view on an incompatible-strided tensor |
errors (no copy attempted) |
reshape when strides permit |
no |
reshape when strides forbid (e.g., after permute) |
yes, silently |
flatten on contiguous tensor |
no |
flatten on non-contiguous tensor |
yes |
.contiguous() on contiguous tensor |
no |
.contiguous() on non-contiguous tensor |
yes, explicitly |
.clone() |
yes, always |
.to(other_device) (different device) |
yes |
.to(other_dtype) (different dtype) |
yes |
The rule: a copy happens when the operation needs a new storage layout. Pure axis relabelling does not. Anything that materialises a fresh row-major buffer (contiguous, clone, reshape-after-permute, dtype/device change) does.
Practical rules
- Default to
viewwhen you know the tensor is contiguous and you only want to merge / split dims. - Reach for
.contiguous().view(...)when you want the copy to be visible in code. - Use
reshapefor prototyping; replace with.contiguous().view(...)when reviewing for hot paths. - For complex reorderings, use
einops.rearrange— it makes the intent legible:
from einops import rearrange
# Instead of: x.permute(0, 2, 1, 3).contiguous().view(B, T, H * D)
x = rearrange(x, 'b h t d -> b t (h d)')
Same underlying ops.
- For any shape change after a transpose / permute / slice, assume a copy will happen unless you have specifically reasoned about stride compatibility.
The numpy footnote
NumPy’s reshape accepts an order= parameter ('C' for row-major, 'F' for column-major) that controls the traversal direction. PyTorch does not — it only supports row-major. F-order behaviour can be emulated in PyTorch by transposing before and after the reshape (reshaping to the reversed target shape in between).
tl;dr
- A tensor is storage + shape + strides. Axis-moving ops change shape and strides; storage stays.
reshapewalks the logical tensor row-major and lays out the new shape row-major. For contiguous tensors this is a view; for non-contiguous tensors it copies.viewisreshapewithout the copy option. Errors if a copy would be needed.- Transpose preserves data semantics; reshape reinterprets the flat logical element stream. They produce different tensors even at the same shape.
- The only way a copy happens is when an op needs a new storage layout (
reshapeafter a permute,contiguous, dtype/device change,clone,flattenof non-contiguous). - Use
view(or.contiguous().view()) when allocations should be visible; usereshapewhen convenience matters; useeinops.rearrangewhen the reorder is complex enough to be unreadable.