Skip to content

Fix Net.forward_backward_all losing entries when outputs and inputs count differ#7099

Open
Chessing234 wants to merge 1 commit intoBVLC:masterfrom
Chessing234:fix/net-forward-backward-all-zip-mismatch
Open

Fix Net.forward_backward_all losing entries when outputs and inputs count differ#7099
Chessing234 wants to merge 1 commit intoBVLC:masterfrom
Chessing234:fix/net-forward-backward-all-zip-mismatch

Conversation

@Chessing234
Copy link
Copy Markdown

Bug

`Net.forward_backward_all` (attached from `_Net_forward_backward_all` in `python/caffe/pycaffe.py`) silently leaves some entries as un-converted Python lists, potentially still padded with zero rows, whenever the net has a different number of outputs+extra blobs than inputs+extra diffs.

Root cause

After collecting per-batch forward and backward results into two separate dicts (one keyed by `self.outputs ∪ blobs`, the other by `self.inputs ∪ diffs`), the function wraps up with:

```python

Package in ndarray.

for out, diff in zip(all_outs, all_diffs):
all_outs[out] = np.asarray(all_outs[out])
all_diffs[diff] = np.asarray(all_diffs[diff])

Discard padding at the end and package in ndarray.

pad = len(six.next(six.itervalues(all_outs))) - len(six.next(six.itervalues(kwargs)))
if pad:
for out, diff in zip(all_outs, all_diffs):
all_outs[out] = all_outs[out][:-pad]
all_diffs[diff] = all_diffs[diff][:-pad]
```

`all_outs` and `all_diffs` are independent dicts with independent key sets. `zip(all_outs, all_diffs)` stops at the shorter of the two, so any excess entries in the longer dict:

  • are never turned into ndarrays by `np.asarray` — they stay as Python lists of numpy arrays, and
  • are never trimmed to the real batch size, so they keep the zero-padding appended at the end of the last partial batch in `_Net_batch`.

This bites every net with a different number of outputs than inputs, which is extremely common (e.g. one-input-multiple-heads classification+regression nets). Only the `len(all_outs) == len(all_diffs)` case (1-input, 1-output, no extras) happens to work.

Fix

Walk each dict independently instead of zipping them together:

```diff
# Package in ndarray.

  • for out, diff in zip(all_outs, all_diffs):
  •    all_outs[out] = np.asarray(all_outs[out])
    
  •    all_diffs[diff] = np.asarray(all_diffs[diff])
    
  • for d in (all_outs, all_diffs):
  •    for k in d:
    
  •        d[k] = np.asarray(d[k])
    

    Discard padding at the end and package in ndarray.

    pad = len(six.next(six.itervalues(all_outs))) - len(six.next(six.itervalues(kwargs)))
    if pad:
  •    for out, diff in zip(all_outs, all_diffs):
    
  •        all_outs[out] = all_outs[out][:-pad]
    
  •        all_diffs[diff] = all_diffs[diff][:-pad]
    
  •    for d in (all_outs, all_diffs):
    
  •        for k in d:
    
  •            d[k] = d[k][:-pad]
    

```

No behavioral change for nets where `len(all_outs) == len(all_diffs)` — the prior `zip` already covered every entry in that case; every other shape is now correct too.

…count differ

`Net.forward_backward_all` packages both the collected forward blobs
(`all_outs`, one entry per net output and per requested `blobs=`)
and the collected backward diffs (`all_diffs`, one entry per net
input and per requested `diffs=`). Two loops finish up by turning
each entry into an ndarray and trimming the end-of-batch padding:

    for out, diff in zip(all_outs, all_diffs):
        all_outs[out] = np.asarray(all_outs[out])
        all_diffs[diff] = np.asarray(all_diffs[diff])

    if pad:
        for out, diff in zip(all_outs, all_diffs):
            all_outs[out] = all_outs[out][:-pad]
            all_diffs[diff] = all_diffs[diff][:-pad]

`all_outs` and `all_diffs` are separate dicts keyed independently
(outputs ∪ extra `blobs` vs. inputs ∪ extra `diffs`), so in the
common multi-task case — e.g. a net with a single input and multiple
output heads — they have different lengths. `zip` stops at the
shorter dict and the remaining entries in the longer dict are never
converted to ndarrays and never have padding trimmed, so the
returned dicts silently contain a mixture of ndarrays and unconverted
Python lists, with some still padded with zero rows at the end.

Walk each dict independently instead of zipping them together, so
every entry in both dicts is converted and trimmed regardless of
their relative lengths. No behavioral change for nets where
`len(all_outs) == len(all_diffs)` (the prior zip covered everything
in that case).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant