
PyTorch: Storage, Shape, and Axis-Moving

Matthew Willetts, with assistance from Codex and Claude ·v0.3 ·June 2026

A reference for the operations that change a tensor’s shape or axis order: what copies, what doesn’t, and the precise semantics of each.

The three things that define a tensor

A torch.Tensor is not just a multi-dim array of values. It is a flat storage plus two pieces of metadata that say how to read it:

  1. Storage — a flat 1D block of memory holding the actual numbers.
  2. Shape — how to interpret those numbers as a multi-dim grid.
  3. Strides — how many storage elements to skip when each dim index increments by 1.

Multiple tensors can share one storage with different shapes and strides — that is what views are.
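A minimal sketch of two views sharing one storage. The variable names are illustrative, and comparing data_ptr() is only a quick check that works here because both views start at the first element of the storage.

```python
import torch

base = torch.arange(6)          # one flat storage of 6 elements
a = base.view(2, 3)             # shape (2, 3), strides (3, 1)
b = base.view(3, 2)             # shape (3, 2), strides (2, 1)

# Both views start at the same memory address as the base tensor.
same_memory = a.data_ptr() == b.data_ptr() == base.data_ptr()

# Writing through one view is visible through the other.
a[0, 0] = 100
print(a.stride(), b.stride(), same_memory, b[0, 0].item())
```

No data moves when a or b is created; only the shape and strides differ.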

Row-major (C-order) layout

PyTorch and NumPy default to row-major layout: the last dim is the “fast” one, and rows are contiguous in storage.

For X = torch.arange(6).reshape(2, 3):

X = [[0, 1, 2],
     [3, 4, 5]]
Storage:  [0, 1, 2, 3, 4, 5]
Shape:    (2, 3)
Strides:  (3, 1)

Strides = (3, 1) says: to move one row, advance 3 in storage; to move one column, advance 1.
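The stride arithmetic can be checked directly. This sketch uses a hypothetical helper, storage_offset_of, and assumes the tensor starts at offset 0 in its storage, which holds for a freshly created X.

```python
import torch

X = torch.arange(6).reshape(2, 3)
flat = X.reshape(-1)            # X is contiguous, so this matches storage order

def storage_offset_of(index, strides):
    """Flat storage position of a multi-dim index (hypothetical helper)."""
    return sum(i * s for i, s in zip(index, strides))

offset = storage_offset_of((1, 2), X.stride())   # 1*3 + 2*1
value_via_strides = flat[offset].item()
value_via_index = X[1, 2].item()
print(offset, value_via_strides, value_via_index)
```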

A tensor is contiguous when its strides match what you would expect for a freshly-allocated row-major layout. Operations that change axis order break contiguity by leaving strides that no longer match storage order.
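As a rough illustration of that definition, the sketch below computes the strides a fresh row-major tensor would have and compares them with the actual ones. row_major_strides is a hypothetical helper; PyTorch's own is_contiguous() is more lenient about length-1 dims, so treat the helper as an approximation of the rule rather than a replacement for it.

```python
import torch

def row_major_strides(shape):
    """Strides a freshly allocated row-major tensor of this shape would have."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))

X = torch.arange(6).reshape(2, 3)
Xt = X.T                        # shape (3, 2), same storage as X

print(X.stride(), row_major_strides(X.shape), X.is_contiguous())
print(Xt.stride(), row_major_strides(Xt.shape), Xt.is_contiguous())
```

The transpose keeps the storage untouched, so its strides (1, 3) no longer match the (2, 1) that a fresh (3, 2) tensor would have.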

The reshape rule

reshape(...) reads the elements of the logical tensor in row-major order, then fills the new shape with them in row-major order.

The key word is logical. Reshape iterates through the tensor as if it had been freshly materialised into a contiguous buffer, and uses that walk to populate the new shape.

For a contiguous tensor, the logical row-major walk matches storage order, so reshape is just metadata — no copy.

For a non-contiguous tensor (typically after a transpose / permute), the logical row-major walk does not correspond to a stride pattern over the original storage, so PyTorch materialises a copy into a fresh contiguous buffer first, then reshapes.
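Both cases can be observed on a small tensor. This is a sketch: comparing data_ptr() is a simple way to tell a view from a copy when the view starts at the first element, as it does here.

```python
import torch

X = torch.arange(6).reshape(2, 3)

view_case = X.reshape(3, 2)     # contiguous input: metadata only
copy_case = X.T.reshape(-1)     # transposed input: needs a fresh buffer

shares_memory = view_case.data_ptr() == X.data_ptr()
copied = copy_case.data_ptr() != X.data_ptr()

# The copy holds the logical row-major walk of X.T = [[0, 3], [1, 4], [2, 5]].
walk = copy_case.tolist()
print(shares_memory, copied, walk)
```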

view vs reshape

import torch

X = torch.arange(6).reshape(2, 3)         # contiguous

X.view(3, 2)                              # works: view
X.reshape(3, 2)                           # works: view (no copy needed)

X.T.view(-1)                              # RuntimeError: strides incompatible
X.T.contiguous().view(-1)                 # works: explicit copy via contiguous()
X.T.reshape(-1)                           # works: silent copy

For hot-path code where allocations matter, prefer view and call .contiguous() explicitly so copies are visible. For prototyping, reshape is fine.
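One way to make that advice concrete is sketched below: view fails loudly on a transposed tensor, and the copy only happens where .contiguous() is written out. The flag name view_failed is illustrative.

```python
import torch

X = torch.arange(6).reshape(2, 3)

try:
    X.T.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True          # view never copies, so it raises instead

flat = X.T.contiguous().view(-1)    # the copy is now visible in the code
print(view_failed, flat.tolist())
```

With reshape in place of the last line the result would be the same, but nothing in the source would signal the allocation.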

Axis-moving operations (the catalogue)

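Almost all of these are metadata changes: they rewrite shape and strides without moving data. The axis-reordering ops (.T through .movedim) typically produce non-contiguous tensors. unsqueeze and squeeze preserve contiguity, and flatten copies when the strides do not allow a view.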

op                                       what it does                                                example
.T                                       2D transpose; reverses dims for >2D (deprecated for >2D)    X.T
.mT                                      matrix transpose: swap the last two dims                    X.mT
.t()                                     2D transpose, method form                                   X.t()
.transpose(d0, d1)                       swap two specific dims                                      X.transpose(0, 2)
.swapaxes(d0, d1) / .swapdims(d0, d1)    aliases for transpose                                       X.swapaxes(0, 1)
.permute(*dims)                          reorder all dims at once                                    X.permute(2, 0, 1)
.movedim(src, dst)                       move dim(s) from src to dst, slide rest                     X.movedim(0, -1)
.unsqueeze(dim)                          insert a length-1 dim                                       X.unsqueeze(0)
.squeeze(dim)                            remove a length-1 dim                                       X.squeeze(0)
.flatten(start, end)                     collapse contiguous dim range                               X.flatten(1, 3)
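To see what several of these ops do to shape and strides, the sketch below applies them to one small 3D tensor. The tensor size and the results dictionary are illustrative choices; the flatten example uses dims (1, 2) because this tensor has only three dims.

```python
import torch

X = torch.arange(24).reshape(2, 3, 4)       # strides (12, 4, 1)

results = {
    "mT": X.mT,                              # (2, 4, 3)
    "transpose(0, 2)": X.transpose(0, 2),    # (4, 3, 2)
    "swapaxes(0, 1)": X.swapaxes(0, 1),      # (3, 2, 4)
    "permute(2, 0, 1)": X.permute(2, 0, 1),  # (4, 2, 3)
    "movedim(0, -1)": X.movedim(0, -1),      # (3, 4, 2)
    "unsqueeze(0)": X.unsqueeze(0),          # (1, 2, 3, 4)
    "flatten(1, 2)": X.flatten(1, 2),        # (2, 12)
}
for name, t in results.items():
    print(f"{name:18} shape={tuple(t.shape)} strides={t.stride()} "
          f"contiguous={t.is_contiguous()}")
```

Every result here starts at the same memory address as X; only the axis-reordering ones report contiguous=False.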

Notes:

transpose vs reshape to the same shape

These produce tensors with the same shape but different element arrangements:

X = torch.arange(120).reshape(2, 3, 4, 5)

Y = X.transpose(0, 2)        # shape (4, 3, 2, 5)
Z = X.reshape(4, 3, 2, 5)    # shape (4, 3, 2, 5)

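The two mappings follow different rules:

transpose:  Y[i, j, k, l] = X[k, j, i, l]                        (indices 0 and 2 swap)
reshape:    Z[i, j, k, l] = X.flatten()[30*i + 10*j + 5*k + l]   (flat row-major position is preserved)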

Spot check:

X[1, 2, 3, 4]   # = 119
Y[3, 2, 1, 4]   # = X[1, 2, 3, 4] = 119
Z[3, 2, 1, 4]   # = X.flatten()[119] = 119

Y[3, 2, 0, 4]   # = X[0, 2, 3, 4] = 59
Z[3, 2, 0, 4]   # = X.flatten()[114] = 114

Same coordinate, different elements.
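The spot check generalises to every coordinate. The sketch below tests two rules exhaustively: transpose swaps index positions 0 and 2, while reshape keeps each element at its flat row-major position (the row-major strides of shape (4, 3, 2, 5) are (30, 10, 5, 1)). The flag names are illustrative.

```python
import itertools
import torch

X = torch.arange(120).reshape(2, 3, 4, 5)
Y = X.transpose(0, 2)
Z = X.reshape(4, 3, 2, 5)
flat = X.flatten()

transpose_rule_holds = True
reshape_rule_holds = True
for i, j, k, l in itertools.product(range(4), range(3), range(2), range(5)):
    # transpose: index positions 0 and 2 trade places
    transpose_rule_holds &= bool(Y[i, j, k, l] == X[k, j, i, l])
    # reshape: the flat row-major position is unchanged
    reshape_rule_holds &= bool(Z[i, j, k, l] == flat[30 * i + 10 * j + 5 * k + l])

differing = int((Y != Z).sum())   # number of coordinates where Y and Z disagree
print(transpose_rule_holds, reshape_rule_holds, differing)
```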

Why this matters

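Suppose X is a rollout buffer of shape (T, N, obs_dim) and you want it as (N, T, obs_dim), one trajectory per row. X.transpose(0, 1) gives exactly that. X.reshape(N, T, obs_dim) has the same shape, but its rows interleave timesteps and environments.

If you write the reshape when you meant the transpose, your training loop runs without error and produces garbage.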
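A small sketch of the failure mode, assuming T indexes timesteps and N indexes parallel environments (the sizes and the value encoding are hypothetical). Each element stores 10*t + n, so its origin can be read off the printed values.

```python
import torch

T, N, obs_dim = 4, 3, 2       # hypothetical sizes
# Encode (t, n) into the values: value = 10*t + n.
t_idx = torch.arange(T).view(T, 1, 1)
n_idx = torch.arange(N).view(1, N, 1)
X = (10 * t_idx + n_idx).expand(T, N, obs_dim).clone()

per_env = X.transpose(0, 1)             # (N, T, obs_dim): row n is env n over time
scrambled = X.reshape(N, T, obs_dim)    # same shape, rows mix envs and timesteps

print(per_env[0, :, 0].tolist())     # [0, 10, 20, 30]: env 0 at t = 0..3
print(scrambled[0, :, 0].tolist())   # [0, 1, 2, 10]: envs 0, 1, 2 at t=0, then env 0 at t=1
```

Both results have shape (N, T, obs_dim), so no shape check downstream will catch the mistake.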

Which op to use

When does a copy happen?

op                                                   copies?
transpose, permute, swapdims, movedim                no (metadata only)
unsqueeze, squeeze                                   no
view on a compatible-strided tensor                  no
view on an incompatible-strided tensor               errors (no copy attempted)
reshape when strides permit                          no
reshape when strides forbid (e.g., after permute)    yes, silently
flatten on contiguous tensor                         no
flatten on non-contiguous tensor                     usually yes (same rule as reshape)
.contiguous() on contiguous tensor                   no
.contiguous() on non-contiguous tensor               yes, explicitly
.clone()                                             yes, always
.to(other_device) (different device)                 yes
.to(other_dtype) (different dtype)                   yes

The rule: a copy happens when the operation needs a new storage layout. Pure axis relabelling does not. Anything that materialises a fresh buffer (contiguous on a non-contiguous tensor, clone, reshape-after-permute, dtype/device change) does.
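Several of these cases can be checked empirically. The sketch below uses a hypothetical helper, shares_start, which compares starting addresses; that is enough here because every view in the example begins at the first element of X.

```python
import torch

X = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)

def shares_start(a, b):
    """True if both tensors begin at the same memory address (hypothetical helper)."""
    return a.data_ptr() == b.data_ptr()

checks = {
    "permute": shares_start(X.permute(2, 0, 1), X),
    "unsqueeze": shares_start(X.unsqueeze(0), X),
    "contiguous (already contiguous)": shares_start(X.contiguous(), X),
    "reshape after permute": shares_start(X.permute(2, 0, 1).reshape(-1), X),
    "clone": shares_start(X.clone(), X),
    "to(float64)": shares_start(X.to(torch.float64), X),
}
for name, shared in checks.items():
    print(f"{name:32} {'no copy' if shared else 'copy'}")
```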

Practical rules

from einops import rearrange

# Instead of: x.permute(0, 2, 1, 3).contiguous().view(B, T, H * D)
x = rearrange(x, 'b h t d -> b t (h d)')

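The rearrange call runs the same underlying permute-and-reshape operations, with the axis names spelled out in the pattern.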
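The plain-PyTorch side of that comparison can be checked without einops. This sketch uses hypothetical sizes and verifies where each element lands: the pattern 'b h t d -> b t (h d)' places element (b, h, t, d) at (b, t, h*D + d).

```python
import torch

B, H, T, D = 2, 3, 4, 5        # hypothetical sizes
x = torch.arange(B * H * T * D).reshape(B, H, T, D)

out = x.permute(0, 2, 1, 3).contiguous().view(B, T, H * D)
same_as_reshape = torch.equal(out, x.permute(0, 2, 1, 3).reshape(B, T, H * D))

# Element (b, h, t, d) of x lands at (b, t, h * D + d) of out.
b, h, t, d = 1, 2, 3, 4
lands_correctly = out[b, t, h * D + d].item() == x[b, h, t, d].item()
print(tuple(out.shape), same_as_reshape, lands_correctly)
```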

The numpy footnote

NumPy’s reshape accepts an order= parameter ('C' for row-major, 'F' for column-major) that controls the traversal direction. PyTorch does not: it only supports row-major. F-order behaviour can be emulated in PyTorch by reversing all axes, reshaping to the reversed target shape, and reversing all axes again. For tensors with more than two dims, use permute for the reversal, since .T is deprecated there.
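A minimal sketch of that emulation. reshape_fortran is a hypothetical helper name, not a PyTorch function; the expected result in the comment was worked out by hand for this 2x3 input.

```python
import torch

def reshape_fortran(x, shape):
    """Column-major reshape emulated with row-major ops (hypothetical helper)."""
    reversed_axes = tuple(reversed(range(x.dim())))
    # Reverse the axes, then do a row-major reshape to the reversed target shape.
    y = x.permute(reversed_axes).reshape(tuple(reversed(shape)))
    # Reverse the axes of the result to get the requested shape.
    return y.permute(tuple(reversed(range(len(shape)))))

a = torch.arange(6).reshape(2, 3)     # [[0, 1, 2], [3, 4, 5]]
f = reshape_fortran(a, (3, 2))
print(f.tolist())                     # [[0, 4], [3, 2], [1, 5]]
```

The column-major walk of a is 0, 3, 1, 4, 2, 5, and those values fill the (3, 2) result column by column.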

tl;dr
