5 zip File Extraction

David Herman

 (require file/unzip) package: base
The file/unzip library provides functions to extract items from a zip archive.

procedure

(unzip in    
  [entry-reader    
  #:must-unzip? must-unzip?    
  #:preserve-attributes? preserve-attributes?    
  #:preserve-timestamps? preserve-timestamps?    
  #:utc-timestamps? utc-timestamps?])  void?
  in : (or/c path-string? input-port?)
  entry-reader : 
(cond
  [preserve-attributes?
   (bytes? boolean? input-port? (and/c hash? immutable?)
           . -> . (or/c #f (-> any)))]
  [preserve-timestamps?
   (bytes? boolean? input-port? (or/c #f exact-integer?)
           . -> . any)]
  [else
   (bytes? boolean? input-port? . -> . any)])
   = (make-filesystem-entry-reader)
  must-unzip? : any/c = #t
  preserve-attributes? : any/c = #f
  preserve-timestamps? : any/c = #f
  utc-timestamps? : any/c = #f
Unzips an entire zip archive from in. If in does not start with zip-archive magic bytes, an error is reported only if must-unzip? is true; otherwise, the result is (void) with no bytes consumed from in. If in is an input port and preserve-attributes? is a true value, in must support position setting via file-position.

For each entry in the archive, the entry-reader procedure is called with three or four arguments: the byte string representing the entry name, a boolean flag indicating whether the entry represents a directory, an input port containing the inflated contents of the entry, and either (if preserve-attributes?) a hash table or (if preserve-timestamps? and not preserve-attributes?) #f or a timestamp. The default entry-reader unpacks entries to the filesystem; call make-filesystem-entry-reader to configure aspects of the unpacking, such as the destination directory.

When preserve-attributes? is true, the hash table passed to entry-reader provides additional file attributes, and entry-reader must produce either #f or a post-action thunk. All post-action thunks are run in order after the last call to entry-reader; these actions are useful for setting permissions on a directory after all contained files are written, for example. Attributes are mapped in the hash table by key, one for a timestamp and one for permissions, and either key may be absent.

Although preserve-attributes? and preserve-timestamps? provide extra information to entry-reader, unpacking entries and preserving attributes and timestamps is up to entry-reader. The reader produced by make-filesystem-entry-reader preserves whatever information it is given, except for directories on Windows or directories that already exist, and it returns a post-action thunk only when given a directory plus a timestamp and/or permission attribute.

For timestamps, zip archives normally record modification dates in local time, but if utc-timestamps? is true, then the time in the archive is interpreted as UTC.

When preserve-attributes? is #f, in is read in a single pass as long as file entries are found. Beware that if the input represents an archive that has file entries not referenced by the “central directory” in the archive, the corresponding files are unpacked anyway.
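
As an illustration of the entry-reader protocol described above, the following sketch passes a custom three-argument reader to unzip so that entries are collected in memory instead of being written to the filesystem. The sample archive and the names work-dir, archive, and seen are assumptions made for this example; file/zip is used only to build something to unpack.

```racket
#lang racket/base
(require file/unzip file/zip racket/file racket/port)

;; Build a small throwaway archive (hypothetical names) to unpack.
(define work-dir (make-temporary-file "unzip-demo~a" 'directory))
(define archive (build-path work-dir "demo.zip"))
(parameterize ([current-directory work-dir])
  (make-directory "docs")
  (display-to-file "hello" (build-path "docs" "readme.txt"))
  (zip "demo.zip" "docs"))

;; Collect (name dir? content) triples instead of writing files.
;; With both preserve flags left at #f, the reader gets three arguments.
(define seen '())
(unzip archive
       (lambda (name dir? in)
         (set! seen
               (cons (list name dir? (if dir? #f (port->string in)))
                     seen))))

;; Print the entries in the order they were encountered.
(for ([entry (in-list (reverse seen))])
  (printf "~s\n" entry))

(delete-directory/files work-dir)
```

Because the reader decides what to do with each entry, nothing is created on disk here apart from the sample archive itself.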

Changed in version 6.0.0.3 of package base: Added the #:preserve-timestamps? argument.
Changed in version 6.0.1.12: Added the #:utc-timestamps? argument.
Changed in version 8.0.0.10: Added the #:must-unzip? argument.
Changed in version 8.2.0.7: Changed the #:must-unzip? default to #t.
Changed in version 8.7.0.9: Added the #:preserve-attributes? argument.

procedure

(call-with-unzip in    
  proc    
  [#:must-unzip? must-unzip?])  any
  in : (or/c path-string? input-port?)
  proc : (-> path-string? any)
  must-unzip? : any/c = #t
Unpacks in to a temporary directory, calls proc on the temporary directory’s path, and then deletes the temporary directory while returning the result of proc.

As with unzip, no error is reported when in is not a zip archive unless must-unzip? is true.
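
A minimal sketch of call-with-unzip, assuming a small sample archive built with file/zip; the names work-dir, archive, and content are hypothetical. It reads one unpacked file while the temporary directory still exists and returns the file's text.

```racket
#lang racket/base
(require file/unzip file/zip racket/file)

;; Build a small throwaway archive (hypothetical names) to unpack.
(define work-dir (make-temporary-file "unzip-demo~a" 'directory))
(define archive (build-path work-dir "demo.zip"))
(parameterize ([current-directory work-dir])
  (make-directory "docs")
  (display-to-file "hello" (build-path "docs" "readme.txt"))
  (zip "demo.zip" "docs"))

;; The temporary directory is only valid inside the callback, so any
;; data needed afterwards must be returned from it.
(define content
  (call-with-unzip archive
    (lambda (tmp-dir)
      (file->string (build-path tmp-dir "docs" "readme.txt")))))

(printf "~s\n" content)

(delete-directory/files work-dir)
```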

Added in version 6.0.1.6 of package base.
Changed in version 8.0.0.10: Added the #:must-unzip? argument.
Changed in version 8.2.0.7: Changed the #:must-unzip? default to #t.

procedure

(make-filesystem-entry-reader [#:dest dest-path 
  #:strip-count strip-count 
  #:permissive? permissive? 
  #:exists exists]) 
  
((bytes? boolean? input-port?) ((or/c hash? #f exact-integer?))
 . ->* . (or/c void? #f (-> void?)))
  dest-path : (or/c path-string? #f) = #f
  strip-count : exact-nonnegative-integer? = 0
  permissive? : any/c = #f
  exists : 
(or/c 'skip 'error 'replace 'truncate
      'truncate/replace 'append 'update
      'can-update 'must-truncate)
 = 'error
Creates a zip entry reader that can be used with either unzip or unzip-entry and whose behavior is to save entries to the local filesystem. Intermediate directories are always created if necessary before creating files. Directory entries are created as directories in the filesystem, and their entry contents are ignored.

If dest-path is not #f, every path in the archive is prefixed with dest-path to determine the destination path of the extracted entry.

If strip-count is positive, then strip-count path elements are removed from the entry path from the archive (before prefixing the path with dest-path); if the item’s path contains strip-count elements, then it is not extracted.

Unless permissive? is true, entries with paths containing an up-directory indicator are disallowed, and a link entry whose target is an absolute path or contains an up-directory indicator is also disallowed. Absolute paths are always disallowed. A disallowed path triggers an exception.

If exists is 'skip and the file for an entry already exists, then the entry is skipped. Otherwise, exists is passed on to open-output-file for writing the entry’s inflated content.

When the returned procedure is called, it will produce (void) unless it is given a hash table as a fourth argument. When given a hash table, the result is either #f or a thunk. A thunk is returned on Unix and Mac OS when arguments refer to a directory that does not already exist and either a timestamp attribute, permission attribute, or both are provided.
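
The options above can be combined as in the following sketch, which unpacks a sample archive into a chosen directory while dropping the leading path element of each entry. The archive layout and the names work-dir, archive, out-dir, and extracted-content are assumptions made for illustration.

```racket
#lang racket/base
(require file/unzip file/zip racket/file)

;; Build a small throwaway archive (hypothetical names) to unpack.
(define work-dir (make-temporary-file "unzip-demo~a" 'directory))
(define archive (build-path work-dir "demo.zip"))
(parameterize ([current-directory work-dir])
  (make-directory "docs")
  (display-to-file "hello" (build-path "docs" "readme.txt"))
  (zip "demo.zip" "docs"))

;; Unpack into out-dir, removing the leading "docs" element from each
;; entry path and overwriting any file that is already there.
(define out-dir (build-path work-dir "out"))
(make-directory out-dir)
(unzip archive
       (make-filesystem-entry-reader #:dest out-dir
                                     #:strip-count 1
                                     #:exists 'replace))

;; docs/readme.txt in the archive should now be out/readme.txt on disk.
(define extracted-content (file->string (build-path out-dir "readme.txt")))
(printf "~s\n" extracted-content)

(delete-directory/files work-dir)
```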

Changed in version 6.0.0.3 of package base: Added support for the optional timestamp argument in the result function.
Changed in version 6.3: Added the #:permissive? argument.
Changed in version 8.7.0.9: Added support for an optional attributes hash-table argument in the result function.

procedure

(read-zip-directory in)  zip-directory?

  in : (or/c path-string? input-port?)
Reads the central directory of a zip file and generates a zip directory representing the zip file’s contents. If in is an input port, it must support position setting via file-position.

This procedure performs limited I/O: it reads the list of entries from the zip file, but it does not inflate any of their contents.

procedure

(zip-directory? v)  boolean?

  v : any/c
Returns #t if v is a zip directory, #f otherwise.

procedure

(zip-directory-entries zipdir)  (listof bytes?)

  zipdir : zip-directory?
Extracts the list of entries for a zip archive.

procedure

(zip-directory-contains? zipdir name)  boolean?

  zipdir : zip-directory?
  name : (or/c bytes? path-string?)
Determines whether the given entry name occurs in the given zip directory. If name is not a byte string, it is converted using path->zip-path.

Directory entries match with or without trailing slashes.

procedure

(zip-directory-includes-directory? zipdir    
  name)  boolean?
  zipdir : zip-directory?
  name : (or/c bytes? path-string?)
Determines whether the given name is included anywhere in the given zip directory as a filesystem directory, either as an entry itself or as the containing directory of other entries. If name is not a byte string, it is converted using path->zip-path.
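
The zip-directory operations can be tried together as in this sketch, which inspects a sample archive without inflating any entry. The archive contents and the names work-dir, archive, zipdir, and the result variables are hypothetical.

```racket
#lang racket/base
(require file/unzip file/zip racket/file)

;; Build a small throwaway archive (hypothetical names) to inspect.
(define work-dir (make-temporary-file "unzip-demo~a" 'directory))
(define archive (build-path work-dir "demo.zip"))
(parameterize ([current-directory work-dir])
  (make-directory "docs")
  (display-to-file "hello" (build-path "docs" "readme.txt"))
  (zip "demo.zip" "docs"))

;; Read only the central directory; entry contents are not inflated.
(define zipdir (read-zip-directory archive))
(define entries (zip-directory-entries zipdir))

;; Membership tests accept byte strings or path strings.
(define has-readme? (zip-directory-contains? zipdir #"docs/readme.txt"))
(define has-other? (zip-directory-contains? zipdir "other.txt"))
(define has-docs-dir? (zip-directory-includes-directory? zipdir "docs"))

(printf "entries: ~s\n" entries)
(printf "readme? ~a, other? ~a, docs dir? ~a\n"
        has-readme? has-other? has-docs-dir?)

(delete-directory/files work-dir)
```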

procedure

(unzip-entry in 
  zipdir 
  entry 
  [entry-reader 
  #:preserve-attributes? preserve-attributes? 
  #:preserve-timestamps? preserve-timestamps? 
  #:utc-timestamps? utc-timestamps?]) 
  (if preserve-attributes? (or/c #f (-> any)) void?)
  in : (or/c path-string? input-port?)
  zipdir : zip-directory?
  entry : (or/c bytes? path-string?)
  entry-reader : 
(cond
  [preserve-attributes?
   (bytes? boolean? input-port? (and/c hash? immutable?)
           . -> . any)]
  [preserve-timestamps?
   (bytes? boolean? input-port? (or/c #f exact-integer?)
           . -> . any)]
  [else
   (bytes? boolean? input-port? . -> . any)])
   = (make-filesystem-entry-reader)
  preserve-attributes? : any/c = #f
  preserve-timestamps? : any/c = #f
  utc-timestamps? : any/c = #f
Unzips a single entry from a zip archive based on a previously read zip directory, zipdir, from read-zip-directory. If in is an input port, it must support position setting via file-position.

The entry argument is a byte string naming an entry that must be found in the zip file’s central directory. If entry is not a byte string, it is converted using path->zip-path.

The entry-reader argument is used to read the contents of the zip entry in the same way as for unzip. When preserve-attributes? is a true value, the result of entry-reader is returned by unzip-entry, and it will be either #f or a post-action thunk. The returned post-action thunks should all be called after extracting from in is complete.

If entry is not in zipdir, an exn:fail:unzip:no-such-entry exception is raised.
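
A minimal sketch of extracting one entry, assuming a sample archive built with file/zip; the names work-dir, archive, zipdir, and content are hypothetical. A custom three-argument reader captures the entry's text rather than writing it to disk.

```racket
#lang racket/base
(require file/unzip file/zip racket/file racket/port)

;; Build a small throwaway archive (hypothetical names) to read from.
(define work-dir (make-temporary-file "unzip-demo~a" 'directory))
(define archive (build-path work-dir "demo.zip"))
(parameterize ([current-directory work-dir])
  (make-directory "docs")
  (display-to-file "hello" (build-path "docs" "readme.txt"))
  (zip "demo.zip" "docs"))

;; Read the central directory once, then pull out a single entry.
(define zipdir (read-zip-directory archive))
(define content #f)
(unzip-entry archive zipdir #"docs/readme.txt"
             (lambda (name dir? in)
               (set! content (port->string in))))

(printf "~s\n" content)

(delete-directory/files work-dir)
```

Reusing zipdir across several unzip-entry calls avoids re-reading the central directory for each entry.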

Changed in version 6.0.0.3 of package base: Added the #:preserve-timestamps? argument.
Changed in version 6.0.1.12: Added the #:utc-timestamps? argument.
Changed in version 8.7.0.9: Added the #:preserve-attributes? argument.

procedure

(call-with-unzip-entry in entry proc)  any

  in : (or/c path-string? input-port?)
  entry : path-string?
  proc : (-> path-string? any)
Unpacks entry within in to a temporary directory, calls proc on the unpacked file’s path, and then deletes the temporary directory while returning the result of proc.
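
As a sketch of the behavior just described, the following reads a single entry's text through call-with-unzip-entry. The sample archive and the names work-dir, archive, and content are assumptions for this example.

```racket
#lang racket/base
(require file/unzip file/zip racket/file)

;; Build a small throwaway archive (hypothetical names) to read from.
(define work-dir (make-temporary-file "unzip-demo~a" 'directory))
(define archive (build-path work-dir "demo.zip"))
(parameterize ([current-directory work-dir])
  (make-directory "docs")
  (display-to-file "hello" (build-path "docs" "readme.txt"))
  (zip "demo.zip" "docs"))

;; file->string receives the path of the unpacked file; the temporary
;; directory holding it is removed once the call returns.
(define content
  (call-with-unzip-entry archive "docs/readme.txt" file->string))

(printf "~s\n" content)

(delete-directory/files work-dir)
```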

Added in version 6.0.1.6 of package base.

procedure

(path->zip-path path)  bytes?

  path : path-string?
Converts a file name potentially containing path separators in the current platform’s format to use path separators recognized by the zip file format: /.
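
For instance, a path assembled with build-path uses the current platform's separator, and path->zip-path turns it into an entry name suitable for functions such as zip-directory-contains?. The path components and variable names below are hypothetical.

```racket
#lang racket/base
(require file/unzip)

;; build-path joins elements with the current platform's separator.
(define local-path (build-path "docs" "img" "logo.png"))

;; The result is a byte string whose elements are separated by "/".
(define zip-name (path->zip-path local-path))

(printf "~s\n" zip-name)
```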

struct

(struct exn:fail:unzip:no-such-entry exn:fail (entry)
    #:extra-constructor-name make-exn:fail:unzip:no-such-entry)
  entry : bytes?
Raised when a requested entry cannot be found in a zip archive. The entry field is a byte string representing the requested entry name.
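
A sketch of handling this exception, assuming a sample archive that lacks the requested entry; the names work-dir, archive, zipdir, and missing are hypothetical. The handler recovers the requested name from the entry field.

```racket
#lang racket/base
(require file/unzip file/zip racket/file)

;; Build a small throwaway archive (hypothetical names) to query.
(define work-dir (make-temporary-file "unzip-demo~a" 'directory))
(define archive (build-path work-dir "demo.zip"))
(parameterize ([current-directory work-dir])
  (make-directory "docs")
  (display-to-file "hello" (build-path "docs" "readme.txt"))
  (zip "demo.zip" "docs"))

(define zipdir (read-zip-directory archive))

;; Ask for an entry that is not in the archive and report its name
;; from the exception instead of letting the error propagate.
(define missing
  (with-handlers ([exn:fail:unzip:no-such-entry?
                   (lambda (e) (exn:fail:unzip:no-such-entry-entry e))])
    (unzip-entry archive zipdir #"missing.txt")
    #f))

(printf "missing entry: ~s\n" missing)

(delete-directory/files work-dir)
```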