Skip to content

Invalid/obsolete serialized Data should raise an exception #760

@nevans

Description

@nevans

As a user, if my serialized data doesn't match the current definition, I'd much rather see a crash than have it continue with no exception but corrupted values.

#749 changed this so that missing keys will be converted to nil and extra keys will be ignored. Probably some tests need to be added for the following failure cases (I've included Marshal for comparison).

With the following serializations (same for any recent version of psych):

$ irb -ryaml
irb(main):001> D = Data.define(:foo, :bar)
=> D
irb(main):002> d = D["foo", "bar"]
=> #<data D foo="foo", bar="bar">
irb(main):003> d.to_yaml
=> "--- !ruby/data:D\nfoo: foo\nbar: bar\n"
irb(main):004> Marshal.dump d
=> "\x04\bS:\x06D\a:\bfooI\"\bfoo\x06:\x06ET:\bbarI\"\bbar\x06;\aT"
irb(main):005> 

I see the following behavior in psych 5.3.0:

$ irb -ryaml
irb(main):001> D = Data.define(:a, :b)
=> D
irb(main):002> YAML.unsafe_load "--- !ruby/data:D\nfoo: foo\nbar: bar\n"
=> #<data D a=nil, b=nil>
irb(main):003> 

I believe that is invalid behavior. It should have crashed. Here's how Marshal and psych 5.2.6 handle an invalid serialization:

$ irb -ryaml
irb(main):001> D = Data.define(:a, :b)
=> D
irb(main):002> Marshal.load "\x04\bS:\x06D\a:\bfooI\"\bfoo\x06:\x06ET:\bbarI\"\bbar\x06;\aT"
<internal:marshal>:34:in 'Marshal.load': struct D not compatible (:foo for :a) (TypeError)
	from (irb):2:in '<main>'
	from <internal:kernel>:168:in 'Kernel#loop'
	from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/irb-1.15.3/exe/irb:9:in '<top (required)>'
	from /home/nick/.local/share/rbenv/versions/3.4.5/bin/irb:25:in 'Kernel#load'
	from /home/nick/.local/share/rbenv/versions/3.4.5/bin/irb:25:in '<main>'
irb(main):003> YAML.unsafe_load "--- !ruby/data:D\nfoo: foo\nbar: bar\n"
/home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:222:in 'Psych::Visitors::ToRuby#init_struct': unknown keywords: foo, bar (ArgumentError)

          init_struct(data, **members)
                      ^^^^^^^^^^^^^^^
	from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:222:in 'Psych::Visitors::ToRuby#visit_Psych_Nodes_Mapping'
	from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:30:in 'Psych::Visitors::Visitor#visit'
	from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:6:in 'Psych::Visitors::Visitor#accept'
	from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:35:in 'Psych::Visitors::ToRuby#accept'
	from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:345:in 'Psych::Visitors::ToRuby#visit_Psych_Nodes_Document'
	from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:30:in 'Psych::Visitors::Visitor#visit'
	from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/visitor.rb:6:in 'Psych::Visitors::Visitor#accept'
	from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/visitors/to_ruby.rb:35:in 'Psych::Visitors::ToRuby#accept'
	from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych/nodes/node.rb:49:in 'Psych::Nodes::Node#to_ruby'
	from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/psych-5.2.6/lib/psych.rb:275:in 'Psych.unsafe_load'
	from (irb):3:in '<main>'
	from <internal:kernel>:168:in 'Kernel#loop'
	from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/irb-1.15.3/exe/irb:9:in '<top (required)>'
	from /home/nick/.local/share/rbenv/versions/3.4.5/bin/irb:25:in 'Kernel#load'
	from /home/nick/.local/share/rbenv/versions/3.4.5/bin/irb:25:in '<main>'
irb(main):004>

For what it's worth, I don't think the scenario where the parameter order changed is important. Although Data.define does provide an explicit ordering, which can be used with ::new or ::[], Data#initialize always uses keyword args which never care about order. I believe Psych handles this correctly and maybe Marshal should be updated to not care about Data member order.

$ ruby --version
ruby 3.4.5 (2025-07-16 revision 20cda200d3) +PRISM [x86_64-linux]
$ irb -ryaml
irb(main):001> D = Data.define(:bar, :foo)
=> D
irb(main):002> Marshal.load "\x04\bS:\x06D\a:\bfooI\"\bfoo\x06:\x06ET:\bbarI\"\bbar\x06;\aT"
<internal:marshal>:34:in 'Marshal.load': struct D not compatible (:foo for :bar) (TypeError)
	from (irb):2:in '<main>'
	from <internal:kernel>:168:in 'Kernel#loop'
	from /home/nick/.local/share/rubies/3.4.5/lib/ruby/gems/3.4.0/gems/irb-1.15.3/exe/irb:9:in '<top (required)>'
	from /home/nick/.local/share/rbenv/versions/3.4.5/bin/irb:25:in 'Kernel#load'
	from /home/nick/.local/share/rbenv/versions/3.4.5/bin/irb:25:in '<main>'
irb(main):003> YAML.unsafe_load "--- !ruby/data:D\nfoo: foo\nbar: bar\n"
=> #<data D bar="bar", foo="foo">
irb(main):004> 

Originally posted by @nevans in #749 (comment)

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions