Yesterday I published zip 3.0.0. In this post I discuss some of the coolest new features in zip 3.0.0. See the change log for the complete list of changes.
Remote unzip
zip_list() and unzip() can now work directly with HTTP(S) URLs.
zip_list() downloads only the central directory (the index of entries)
of the zip file, and unzip() downloads only the central directory and the
requested entries. Listing the files and extracting a few entries does
not download the whole file.
For this to work, the web server must support HTTP range requests. Most
web servers do, but not all of them. Notably, when downloading the contents
of a GitHub repository as a zip file, the web server does not support
range requests, so zip_list() and unzip() will always fall back to
downloading the whole file.
zip needs the curl package to be installed for HTTP(S) URLs to work.
This was requested in issue #39.
Password support
zip now supports passwords, both when compressing and uncompressing.
It supports the (insecure) PKWARE ZipCrypto stream cipher and two (secure)
AES ciphers. See the password argument of unzip() and zip().
This was requested in issue #38.
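A password round trip might look like the sketch below. It assumes that the password argument takes a plain character string in both zip() and unzip(). It does not show which of the three ciphers is the default or how to select another one, so check the manual pages for that. The file names and the password are made up for the example.

```r
library(zip)

# Scratch directory with one file to protect (hypothetical example data).
src <- tempfile("src-")
dir.create(src)
writeLines("top secret", file.path(src, "secret.txt"))

zipfile <- tempfile(fileext = ".zip")

# Assumption: `password` is a character string; see ?zip::zip for how the
# cipher (ZipCrypto or AES) is chosen.
zip(zipfile, "secret.txt", root = src, password = "correct horse")

# Extract with the same password into a fresh directory.
out <- tempfile("out-")
dir.create(out)
unzip(zipfile, exdir = out, password = "correct horse")

readLines(file.path(out, "secret.txt"))
```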
Vectorized, concurrent unzip()
unzip() can now handle a vector of zip files to uncompress. Moreover,
unzip() will use a pool of threads to uncompress the files concurrently.
You can set the zip_threads option or the ZIP_THREADS environment
variable to control the size of the thread pool.
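The sketch below extracts two archives with a single call and caps the thread pool. The release notes only say that unzip() accepts a vector of zip files. This example assumes that one exdir is shared by every archive in the vector, so the two archives use distinct entry names to avoid overwriting each other.

```r
library(zip)

# Build two small archives to extract (hypothetical example data).
src <- tempfile("src-")
dir.create(src)
writeLines("a", file.path(src, "a.txt"))
writeLines("b", file.path(src, "b.txt"))
z1 <- tempfile(fileext = ".zip")
z2 <- tempfile(fileext = ".zip")
zip(z1, "a.txt", root = src)
zip(z2, "b.txt", root = src)

# Limit the thread pool to two threads; Sys.setenv(ZIP_THREADS = 2)
# is the environment-variable alternative.
options(zip_threads = 2)

out <- tempfile("out-")
dir.create(out)

# Assumption: a single `exdir` is used for all archives in the vector.
res <- unzip(c(z1, z2), exdir = out)
```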
Progress bars
unzip() and zip() can now show progress bars when the cli package
is installed. For zip() the progress bar is byte-level, so zipping a
large file produces a smooth progress bar. For unzip() the progress
bar only counts the extracted entries.
Progress bars are currently opt-in: set the ZIP_PROGRESS=true environment
variable or the zip.progress option to turn them on. I did this to avoid
unexpected progress bars when other packages use zip. E.g. pak has its own
progress bars, and zip’s new progress bars could garble them when pak
calls zip to uncompress R package files.
This was requested in issue #48.
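To turn progress bars on for every session, you can set the option in a startup file such as ~/.Rprofile. The snippet below assumes that zip.progress takes a logical TRUE. The environment-variable form is the one named above.

```r
# In ~/.Rprofile: opt in to zip's progress bars (needs the cli package).
# Assumption: the option takes a logical value.
options(zip.progress = TRUE)

# Alternatively, in ~/.Renviron:
# ZIP_PROGRESS=true
```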
unzip_process fallback
zip includes two small command line executables (cmdzip and cmdunzip)
that are lightweight versions of zip() and unzip() and run independently
of R. The main motivation for this is that pak
uses zip to install (uncompress, really) binary packages on Windows.
Starting a cmdunzip process is very fast compared to starting a new R
process, and pak starts a number of concurrent cmdunzip processes to
install many binary packages quickly.
This usually works great, but sometimes the cmdunzip process is blocked
by system policies. It is quite reasonable to block executables that are
included in R packages. Currently pak just fails in this case, and the only
workaround is to avoid pak or to whitelist the cmdunzip process.
zip 3.0.0 includes a fallback mechanism for this: if cmdunzip cannot
run, zip uses unzip() in an R subprocess instead. The next pak release
will use zip 3.0.0.
This was requested in issue #135.
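From R, background extraction goes through unzip_process(), which returns a processx-based process class. The sketch below assumes that the processx package is installed. It uses the $new(zipfile, exdir) constructor together with the standard processx wait() and get_exit_status() methods. Going by the release notes, the fallback to an R subprocess sits behind this same interface, so calling code like this should not need to change.

```r
library(zip)

# A small archive to extract in the background (hypothetical example data).
src <- tempfile("src-")
dir.create(src)
writeLines("hello", file.path(src, "hello.txt"))
zipfile <- tempfile(fileext = ".zip")
zip(zipfile, "hello.txt", root = src)

out <- tempfile("out-")
dir.create(out)

# Start extraction in a separate process (cmdunzip, or the R subprocess
# fallback when cmdunzip is blocked).
up <- unzip_process()$new(zipfile, exdir = out)

# The R session could do other work here; then block until it finishes.
up$wait()
status <- up$get_exit_status()
```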
Other notable changes
- unzip() now returns a data frame with data about the uncompressed files, in the same format as zip_list(). (Issue #35.)
- zip_list() and unzip() now do a much better job with file names in non-UTF-8 encodings. (Issue #101.)
- zip_append() and zipr_append() now replace existing entries when appending a file whose archive path already exists in the zip file, instead of creating duplicate entries. (Issue #111.)
- New keys argument to zip(), zipr() and zip_append() lets you specify custom paths for entries inside the archive. (Issue #50.)
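The sketch below combines three of these changes: a custom archive path via keys, an append that replaces the existing entry, and unzip() returning a data frame. It assumes that keys takes one archive path per element of files and that zip_append() accepts the same root argument as zip(). The file and archive names are hypothetical.

```r
library(zip)

src <- tempfile("src-")
dir.create(src)
writeLines("v1", file.path(src, "data.csv"))
zipfile <- tempfile(fileext = ".zip")

# Store data.csv under raw/data.csv inside the archive.
# Assumption: one key per file.
zip(zipfile, "data.csv", root = src, keys = "raw/data.csv")

# Append a newer version under the same archive path; in zip 3.0.0 this
# replaces the entry instead of adding a duplicate.
writeLines("v2", file.path(src, "data.csv"))
zip_append(zipfile, "data.csv", root = src, keys = "raw/data.csv")

entries <- zip_list(zipfile)

# unzip() returns a data frame in the same format as zip_list().
out <- tempfile("out-")
dir.create(out)
res <- unzip(zipfile, exdir = out)
```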
Thank you!
All the new features in zip 3.0.0 were requested by people in the community. I thank all contributors to zip so far, for opening issues, submitting pull requests, and providing feedback: @8qube, @alliesaizan, @AndreM84, @ArtemSokolov, @AshesITR, @babayoshihiko, @bart1, @batpigandme, @bersbersbers, @cboettig, @chainsawriot, @cimentadaj, @context-dependent, @cstepper, @daattali, @davidgohel, @dhersz, @dovydas88, @dracodoc, @egillax, @emmamendelsohn, @enricoschumann, @fproske, @FrancoisR95, @fsteinhi, @gacolitti, @hhmacedo, @jbfagotfede39, @jeanchristophechiem, @jefferis, @jennybc, @jeroen, @jeroenjanssens, @jimhester, @jwijffels, @k5cents, @lz100, @m-muecke, @madihahamza786-debug, @md0u80c9, @MichaelChirico, @Minhoux, @mirickmi, @MislavSag, @Moohan, @msgoussi, @nymphs97, @philipp-baumann, @QuLogic, @RadioPete24, @rosinnia, @schuemie, @scott-uses-git, @scschwa, @sda030, @sebffischer, @sjentsch, @skeydan, @stefanoborini, @tradeli, @Triamus, @weshinsley, @wibeasley, @WilDoane, @xhdong-umd, @yusuzech, and @zeehio.