Clipfile zip archives
From PTAGISWiki
Problem:
Attempts to download and open clipfiles with the built-in Windows unzip tool or WinZip result in this error:
Zip file corrupt. Possible cause -- file transfer error.
Attempts to download and open clipfiles on Mac OS X with StuffIt Expander fail in a similar way.
The zip archives are constructed with Java's java.util.zip.* library. The downloaded archives have leading characters that cause WinZip to think the file is corrupt. Unix unzip notices the extra characters, but unzips the file anyway:
[rday@snapper downloads]$ unzip -l clipfiles.zip
Archive: clipfiles.zip
warning [clipfiles.zip]: 14 extra bytes at beginning or within zipfile
(attempting to process anyway)
Length Date Time Name
-------- ---- ---- ----
2250 03-30-07 11:05 C501529.txt
-------- -------
2250 1 file
Here are the leading characters in the clipfiles.zip made with java.util.zip:
[rday@snapper downloads]$ od -c first.zip | head
0000000  \r  \n  \r  \n  \r  \n  \r  \n  \r  \n  \r  \n  \r  \n   P   K
0000020 003 004 024  \0  \b  \0  \b  \0 254   X   ~   6  \0  \0  \0  \0
0000040  \0  \0  \0  \0  \0  \0  \0  \0  \v  \0  \0  \0   C   5   0   1
And here is a zip archive made with Unix zip:
[rday@snapper downloads]$ od -c first.zip | head
0000000   P   K 003 004 024  \0  \b  \0  \b  \0 254   X   ~   6  \0  \0
0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \v  \0  \0  \0   C   5
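The same check that od makes visible can be sketched in Java. This is only an illustration: the class name ZipPrefixCheck and the method leadingBytes are hypothetical and are not part of ClipFiles.java. It assumes the archive holds at least one entry, so the file should begin with the local file header signature P K 003 004.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: counts the bytes that precede the first zip local file header. */
class ZipPrefixCheck {

    // A zip entry begins with the local file header signature "PK\003\004".
    private static final byte[] SIGNATURE = {0x50, 0x4B, 0x03, 0x04};

    /** Returns the offset of the first signature, or -1 if none is found. */
    static int leadingBytes(byte[] data) {
        for (int i = 0; i + SIGNATURE.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < SIGNATURE.length; j++) {
                if (data[i + j] != SIGNATURE[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZipPrefixCheck.java <archive>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int offset = leadingBytes(data);
        if (offset < 0) {
            System.out.println("No local file header signature found.");
        } else {
            System.out.println(offset + " extra bytes before the first zip header.");
        }
    }
}
```

An offset of 0 means the file starts with the signature, as the archive made with Unix zip does. A positive offset is the number of stray bytes in front of it.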
The source code that generates the zip archives of clipfiles is here:
/global/ds1/pitweb/ptagis-1.0/src/db/ClipFiles.java
That Java class is called by this JSP page: ptagis/services/download.jsp
Javadocs for the java.util.zip.ZipOutputStream object are here: http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipOutputStream.html
The source can be exercised in isolation by running it from the command line on sebastes like this:
bash-2.05# pwd
/global/ds1/pitweb/ptagis-1.0
bash-2.05# bin/run.sh db.ClipFiles web/ptagis/WEB-INF/db.properties packageBagsInZipStream testclips.txt C501999
Initializing the ClipFiles class from "web/ptagis/WEB-INF/db.properties".
Using the driver "ca.edbc.jdbc.EdbcDriver".
Writing zip file to testclips.txt.
args[3]=C501999
sBagIds=1
vBagIds=[C501999]
Zipping C501999.txt...
Getting C501999.txt...
Got it.
Zipping done.
Done.
This results in a file called testclips.txt that does NOT have the leading carriage return and newline characters:
bash-2.05# od -c testclips.txt | head
0000000   P   K 003 004 024  \0  \b  \0  \b  \0 344   R 202   6  \0  \0
0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 013  \0  \0  \0   C   5
0000040   0   1   9   9   9   .   t   x   t   U 326   I 222 004   !  \b
bash-2.05# unzip -l testclips.txt
Archive: testclips.txt
Length Date Time Name
------ ---- ---- ----
2250 04-02-07 10:23 C501999.txt
------ -------
2250 1 file
WORKS
Running the Java class from the command line causes this code to be executed:
System.out.println("Writing zip file to "+args[2]+".");
FileOutputStream oStream= new FileOutputStream(args[2]);
System.out.println("args[3]="+args[3]);
String [] sBagIds= args[3].split(",");
System.out.println("sBagIds="+sBagIds.length);
Vector vBagIds= new Vector();
for (int i=0; i<sBagIds.length; i++){
    vBagIds.add(sBagIds[i]);
}
System.out.println("vBagIds="+vBagIds);
oClipFiles.packageBagsInZipStream(oStream,vBagIds);
DOESN'T WORK
The download.jsp page uses the following code to invoke the packageBagsInZipStream method:
<%
String sFileName= "clipfiles.zip";

/// retrieves the input arguments
String [] asClipFiles= request.getParameterValues("clipFiles");
Vector vBagIds= new Vector();
if (asClipFiles.length==0){
    /// this should not happen
    out.println("No clip files have been specified for download.");
}else{

    for (int i=0; i<asClipFiles.length; i++){
        vBagIds.add(asClipFiles[i]);
    }

    response.setContentType("application/x-zip-compressed");
    // this header makes the file offer a download dialog with the given name
    response.addHeader("Content-Disposition", "attachment; filename="+sFileName+";");

    oClipFiles.packageBagsInZipStream(response.getOutputStream(),vBagIds);
}
%>
Workarounds
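This Perl command strips the leading carriage returns and newlines, if they are present. It reads the whole file at once (-0777) and anchors the match at the very start of the file (\A), so newline bytes that occur inside the compressed data are left alone.
perl -0777 -p -i -e 's/\A[\r\n]+//' clipfiles.zip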
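Where Perl is not available, the same repair can be sketched in Java. This is a minimal illustration under the assumption that the only damage is a run of carriage return and newline bytes at the very start of the file. The class name StripLeadingNewlines and its strip method are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical repair tool: removes CR/LF bytes from the start of a file only. */
class StripLeadingNewlines {

    /** Returns a copy of data without its leading '\r' and '\n' bytes. */
    static byte[] strip(byte[] data) {
        int start = 0;
        while (start < data.length && (data[start] == '\r' || data[start] == '\n')) {
            start++;
        }
        // Bytes after the first non-newline byte are copied untouched.
        return Arrays.copyOfRange(data, start, data.length);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StripLeadingNewlines.java <archive>");
            return;
        }
        Path path = Paths.get(args[0]);
        byte[] original = Files.readAllBytes(path);
        byte[] repaired = strip(original);
        if (repaired.length != original.length) {
            Files.write(path, repaired);
            System.out.println("Removed " + (original.length - repaired.length) + " leading bytes.");
        } else {
            System.out.println("No leading newline bytes found.");
        }
    }
}
```

The sketch overwrites the file in place, so it is worth keeping a copy of the original download until the repaired archive has been opened successfully.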
Other Windows zip utilities (Freezip, TotalCommander) are able to read the archives.
Freezip download page:
Solution
The problem was caused by whitespace in the download.jsp file. In a JSP file, everything inside angle-bracket tags such as <% ... %> is processed on the server as Java servlet code, and everything outside those tags, including line breaks between them, is emitted as HTML or, in this case, delivered to the browser in the download stream.
The extraneous characters can be seen above in the od octal dump of the resulting zip file: 14 characters of carriage returns and newlines. Those characters were present in the download.jsp file *outside of the angle brackets*.
One way to solve the problem is to write download.jsp with no whitespace between the tags, like this:
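The effect can be reproduced outside the web server. The following is a sketch only: a ByteArrayOutputStream stands in for the servlet response, the class LeadingWhitespaceDemo and its methods are hypothetical, and the entry contents are made up. It writes some template text to the stream first, as the JSP container does with text outside the tags, and then writes a zip archive with java.util.zip.ZipOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical demo: text written before the archive ends up in front of the zip header. */
class LeadingWhitespaceDemo {

    /** Builds a one-entry archive in memory, preceded by the given template text. */
    static byte[] buildArchive(String templateText) {
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        try {
            // Stand-in for the JSP container emitting text found outside the tags.
            response.write(templateText.getBytes(StandardCharsets.US_ASCII));
            try (ZipOutputStream zip = new ZipOutputStream(response)) {
                zip.putNextEntry(new ZipEntry("C501529.txt"));
                zip.write("example contents".getBytes(StandardCharsets.US_ASCII));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return response.toByteArray();
    }

    /** True if the data begins with the local file header signature "PK\003\004". */
    static boolean startsWithZipSignature(byte[] data) {
        return data.length >= 4
                && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
    }

    public static void main(String[] args) {
        byte[] clean = buildArchive("");
        byte[] padded = buildArchive("\r\n\r\n\r\n\r\n\r\n\r\n\r\n");
        System.out.println("no template text: starts with PK = " + startsWithZipSignature(clean));
        System.out.println("7 CR/LF pairs:    starts with PK = " + startsWithZipSignature(padded));
        System.out.println("extra bytes: " + (padded.length - clean.length));
    }
}
```

Seven carriage return and newline pairs account for the 14 extra bytes that unzip reported.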
<%@ page import="db.DBConnection" %><%@ page import="db.ClipFiles" %><jsp:useBean id="oPTAGISProperties" scope="application" class="util.PTAGISProperties"/><jsp:useBean id="oDBConnection" scope="application" class="db.DBConnection"/><jsp:useBean id="oClipFiles" scope="request" class="db.ClipFiles"><% oClipFiles.setConnection(oDBConnection); %></jsp:useBean><%
String sFileName= "clipfiles.zip";
/// retrieves the input arguments
String [] asClipFiles= request.getParameterValues("clipFiles");
Vector vBagIds= new Vector();
if (asClipFiles.length==0){
/// this should not happen
///out.println("No clip files have been specified for download.");
}else{
for (int i=0; i<asClipFiles.length; i++){
vBagIds.add(asClipFiles[i]);
}
response.setContentType("application/x-zip-compressed");
// this header makes the file offer a download dialog with the given name
response.addHeader("Content-Disposition", "attachment; filename="+sFileName+";");
oClipFiles.packageBagsInZipStream(response.getOutputStream(),vBagIds);
}
%>
