Conversation
| def verify_gz(entry) # :nodoc: | ||
| Zlib::GzipReader.wrap entry do |gzio| | ||
| # TODO: read into a buffer once zlib supports it | ||
| gzio.read 16_384 until gzio.eof? # gzip checksum verification |
There was a problem hiding this comment.
It seems that we read until EOF because that's when the GZipReader will verify the gzip checksum header against the decompressed data. I read that from the doc https://docs.ruby-lang.org/en/3.4/Zlib/GzipReader.html
When an reading request is received beyond the end of file (the end of compressed data). That is, when Zlib::GzipReader#read [...] reading returns nil.
Are we still reading at eof anywhere else so that the checksum verification is still performed?
There was a problem hiding this comment.
rubygems.org currently depends on this code to validate gems being pushed. Talking with Arron this morning, it should be fairly trivial 🤞🏻 to move, but I wanted to highlight this so any changes can be coordinated between the two projects.
|
Ah, just as a FYI since I'm seeing this (trying to fix something in the same areas), when we swap the "spec" object (from an EndpointSpecification to the actual Specification read from the gemspec), we perform the verification and if there is an issue we remove the rubygems/bundler/lib/bundler/source/rubygems.rb Lines 193 to 205 in 9711b0c I think now we'll no longer remove the |
This PR removes the `#verify_gz` method because it is redunant and unnecessary. Previously the `data.tar.gz` would get read twice for every file - once in `verify_gz` and once in `extract_files`. The `extract_files` method verifies the `data.tar.gz` when it reads it, and raises an error if unzipping it fails. The `verify_gz` code can be seen in some profiles as a hotspot - although not major - as it accounts for between 9% and 17% of time, but only when the installation thread doesn't have native extensions or plugins.
9e0e706 to
737c829
Compare
tenderlove
left a comment
There was a problem hiding this comment.
I think this is good and we should merge it.
We merged this PR which ensures that gems are written atomically when they're downloaded. Since they're written atomically, we'll never try to extract a corrupted gem so there's no need to check. As @colby-swandale mentioned, this code was being used by RubyGems.org on upload to verify integrity of the gem, but we've removed that in this PR.
@hsbt you tagged this as a "breaking change". I don't think it breaks anything, but I could be wrong. Do you know what it breaks?
|
I've labeled it this way because it means we're no longer doing what we were doing before. It's just a backport and documentation usage, not a merge that won't happen until Version 5. Shall we change it to performance label? |
|
I agree with changing it to performance label 👍🏻 |
This PR removes the
#verify_gzmethod because it is redunant and unnecessary.Previously the
data.tar.gzwould get read twice for every file - once inverify_gzand once inextract_files. Theextract_filesmethod verifies thedata.tar.gzwhen it reads it, and raises an error if unzipping it fails.The
verify_gzcode can be seen in some profiles as a hotspot - although not major - as it accounts for between 9% and 17% of time, but only when the installation thread doesn't have native extensions or plugins.Note: to create this profile I actually split the fetching and installing into two steps so I could profile just installing all the gems. I then profiled just the install code.
What was the end-user or developer problem that led to this PR?
I was looking at profiles for bundler and noticed we spend time in
verify_gzbut that method is redundant.What is your fix for the problem, implemented in this PR?
See commit message
Make sure the following tasks are checked
cc/ @tenderlove @Edouard-chin