The gzip module provides the GzipFile class, as well as the open(), compress() and decompress() convenience functions. The GzipFile class reads and writes gzip-format files, automatically compressing or decompressing the data so that it looks like an ordinary file object.
Note that additional file formats which can be decompressed by the gzip and gunzip programs, such as those produced by compress and pack, are not supported by this module.
When fileobj is not None, the filename argument is only used to be included in the gzip file header, which may include the original filename of the uncompressed file. It defaults to the filename of fileobj, if discernible; otherwise, it defaults to the empty string, and in this case the original filename is not included in the header.
All gzip compressed streams are required to contain this timestamp field. Some programs, such as gunzip, make use of the timestamp. The format is the same as the return value of time.time() and the st_mtime attribute of the object returned by os.stat().
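The timestamp field described above can be set and read from Python. A minimal sketch (the value 1234567890 is an arbitrary example timestamp):

```python
import gzip
import io

# Write a gzip stream with an explicit timestamp in the header
# (1234567890 is an arbitrary example value).
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb", mtime=1234567890) as f:
    f.write(b"payload")

# When reading, GzipFile exposes the header timestamp as .mtime
# once the header has been parsed (i.e. after the first read).
buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode="rb") as f:
    data = f.read()
    header_mtime = f.mtime

print(header_mtime)  # 1234567890
```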
Decompress the data, returning a bytes object containing the uncompressed data. This function is capable of decompressing multi-member gzip data (multiple gzip blocks concatenated together). When the data is certain to contain only one member, the zlib.decompress() function with wbits set to 31 is faster.
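The multi-member behaviour described above is easy to demonstrate:

```python
import gzip
import zlib

# Two gzip members concatenated, as produced by e.g. `cat a.gz b.gz`.
data = gzip.compress(b"hello ") + gzip.compress(b"world")

# gzip.decompress() walks through every member...
print(gzip.decompress(data))            # b'hello world'

# ...while zlib.decompress() with wbits=31 stops after the first one.
print(zlib.decompress(data, wbits=31))  # b'hello '
```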
First I gzip-compress the string 'hello' + '\r\n' + 'world' using the gzip module in Python, and then base64-encode the compressed bytes in Python. The output I get for this is H4sIAM7yqVcC/8tIzcnJ5+Uqzy/KSQEAQmZWMAwAAAA=
Then I use the encoded compressed string from Python in Java to gzip-decompress it. For that I first perform a base64 decode on the string in Java using DatatypeConverter.parseBase64Binary, which gives a byte array, and then I perform gzip decompression on that byte array using GZIPInputStream. But the decompressed output in Java is shown as helloworld.
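The round trip can be checked entirely on the Python side. The sketch below shows that the compressed bytes decode back exactly, carriage return and newline included, which suggests any difference observed in Java lies in how the bytes are displayed rather than in the compression itself:

```python
import base64
import gzip

original = "hello" + "\r\n" + "world"

# Compress and base64-encode, mirroring the Python side of the question.
encoded = base64.b64encode(gzip.compress(original.encode("utf-8")))

# Decoding and decompressing recovers the exact bytes, \r\n included.
decoded = gzip.decompress(base64.b64decode(encoded)).decode("utf-8")
print(repr(decoded))  # 'hello\r\nworld'
```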
If a file is compressed using the gzip command, the suffix .gz is added to the file name. Hence, while uncompressing this file, we can pass as an argument either the original file name, as shown in Example 1, or the file name with the .gz suffix, as shown in Example 2.
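The same either-name convenience can be mimicked in Python; `gunzip` below is a hypothetical helper, not part of the gzip module:

```python
import gzip

def gunzip(name):
    """Read a gzip-compressed file whose name may be given with or
    without the .gz suffix (hypothetical helper mirroring gzip -d)."""
    path = name if name.endswith(".gz") else name + ".gz"
    with gzip.open(path, "rb") as f:
        return f.read()
```

Either gunzip("notes.txt") or gunzip("notes.txt.gz") then reads notes.txt.gz.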
GNU zip (gzip) is a compression utility that reduces the size of selected files. Files compressed with gzip frequently have the .gz file extension. The gzip command has several options. These command options are described in the following table.
Here, FunctionResult is still the gzipped content. We would assume PlayFab decompresses the content it receives from Azure, builds the final JSON, and gzips it again when returning to the client. The major problem here is that the returned data size can suddenly increase dramatically when switching from CloudScript to Azure, resulting in multiple failed functions because the data returned from Azure to PlayFab is too large.
P.s.: The same issue happens when using ExecuteFunction.cs for local Azure debugging. It is clearly visible in the code that the data returned from Azure is never unzipped, but just written into the FunctionResults property, and the entire JSON is then compressed again if accept-encoding: gzip is set.
Azure does compress the result to gzip by default; however, when it reaches PlayFab it gets unzipped. By default, PlayFab supports a response size of 65536 bytes from Azure, so if the unzipped result is too large (larger than 65536 bytes), the error CloudScriptAzureFunctionsReturnSizeExceeded will occur.
Based on the current situation, most likely the return data got gzipped twice. You can try unzipping the function result returned by PlayFab again and see if it's the correct data returned by the Azure Function.
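That check can be sketched in Python. This assumes the function result arrives as a base64 string; the helper name and payload shape are hypothetical, not part of the PlayFab SDK:

```python
import base64
import gzip

def unwrap_twice(function_result_b64):
    # Hypothetical helper: if the payload was gzipped twice,
    # two decompression passes recover the original JSON text.
    once = gzip.decompress(base64.b64decode(function_result_b64))
    return gzip.decompress(once).decode("utf-8")
```

If the second pass yields valid JSON, the payload was indeed double-compressed.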
This is what we tried already, but for some reason the compressed FunctionResult is always corrupt, though only in the second byte: the gzip magic bytes 31, 139 always come back as 31, 239. The remaining data is correct.
But, besides the corruption, why would we receive the "return data too large" error in this case? We don't get this issue if we compress ourselves. If Azure compresses by default, we shouldn't get this error for a 150 KB return array which gzip compresses to 6 KB, correct?
Sometimes we would like to compress one or several files into one archive, or decompress an archive. It is very common to use tools such as gzip, zip, or 7zip to create or decompress .gz, .zip, and .7z files, respectively. However, none of these tools on Linux uses multiple cores or threads during compression and decompression. When the number of files is large or the files themselves are large, compression and decompression with a single thread can take a lot of time.
Pigz is a parallel implementation of gzip and zip. Using pigz can greatly reduce the time spent on compression and decompression. In this blog post, I would like to briefly discuss how to use pigz.
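One way to benefit from pigz in a Python workflow is to call it as a subprocess when it is installed. A sketch under that assumption, falling back to the gzip module otherwise:

```python
import gzip
import shutil
import subprocess

def decompress_file(path):
    """Decompress a .gz file, using pigz when it is installed
    and falling back to Python's gzip module otherwise."""
    if shutil.which("pigz"):
        # pigz -dc writes the decompressed bytes to stdout.
        return subprocess.run(
            ["pigz", "-dc", path], check=True, capture_output=True
        ).stdout
    with gzip.open(path, "rb") as f:
        return f.read()
```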
In this essay I'll describe how I improved chemfp's gzip read performance by about 15% by replacing Python's built-in gzip module with a ctypes interface to libz. If you need faster gzip read performance, you might consider using zcat or a similar tool in a subprocess - if so, look at the xopen module.
Not bad, but as I pointed out in my previous essay, it takes about twice as long for chemfp to read a gzip-compressed FPS file than an uncompressed one. (And that's with the faster gzio reader I'm about to discuss.)
Python's gzip package, like many gzip implementations, builds on the zlib C library to handle compression and decompression. That zlib library is available on most systems as a shared library. Python's ctypes module gives Python code a way to talk directly to shared libraries like zlib. This mechanism is often called a foreign function interface.
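As a minimal illustration of that mechanism (this is not chemfp's actual gzio code), here is ctypes calling zlib's crc32 function directly from the shared library:

```python
import ctypes
import ctypes.util

# Load the zlib shared library (libz.so on Linux, libz.dylib on macOS).
libz = ctypes.CDLL(ctypes.util.find_library("z"))

# Declare the C signature: uLong crc32(uLong crc, const Bytef *buf, uInt len)
libz.crc32.argtypes = [ctypes.c_ulong, ctypes.c_char_p, ctypes.c_uint]
libz.crc32.restype = ctypes.c_ulong

data = b"hello world"
checksum = libz.crc32(0, data, len(data))
print(hex(checksum))  # same value as Python's zlib.crc32(data)
```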
The zcat program (gzcat on a Mac; use gzip -dc to be portable) decompresses a gzip-compressed file and writes the results to stdout. Presumably this is well optimized and should let us know how much performance we can expect.

% time gzip -dc Compound_000000001_000500000.sdf.gz > /dev/null
real	0m10.905s
user	0m10.824s
sys	0m0.080s

It's very peculiar that zcat is slower than Python. I did the above timings with the system version, gzip 1.6. I also installed gzip 1.10, but the timings were about the same. In case it helps, /etc/debian_version says it's buster/sid.
I installed pigz 2.4, which was significantly faster:

% time pigz -dc Compound_000000001_000500000.sdf.gz > /dev/null
real	0m4.813s
user	0m7.515s
sys	0m0.282s

(The user time is higher than the real time, likely because of multithreading. You'll note that the overall user+sys is still a couple of seconds faster than gzip's real time.)
I primarily work on a Mac, but I don't tend to do timings on it because the background system activity (like web pages and media playing) can have a big effect on the timing. That's why I used a Debian machine for the above timings. I tried the system version of gzcat ("Apple gzip 272.250.1") and installed GNU gzip 1.10; the data file isn't quite the same size, but close enough:

% gzcat --version
Apple gzip 272.250.1
% gzcat Compound_000000001_000500000.sdf.gz | wc -c
2738222236
% time gzcat Compound_000000001_000500000.sdf.gz > /dev/null
4.179u 0.154s 0:04.67 92.5%	0+0k 0+0io 0pf+0w
% /local/bin/zcat --version
zcat (gzip) 1.10
Copyright (C) 2007, 2011-2018 Free Software Foundation, Inc.
This is free software. You may redistribute copies of it under the terms of the GNU General Public License.
There is NO WARRANTY, to the extent permitted by law.
Written by Paul Eggert.
% time /local/bin/zcat Compound_000000001_000500000.sdf.gz > /dev/null
4.193u 0.157s 0:04.69 92.5%	0+0k 0+0io 0pf+0w

That's twice as fast as the Debian machine, for reasons I still don't understand.
Now back to the benchmark code using Python's gzip module and my gzio module:

% python time_gz.py
dt: 9.05 sec gzin: 35.3 MiB/sec out: 288.5 MiB/sec (2738222236 bytes)
% python time_gzio.py
dt: 7.02 sec gzin: 45.6 MiB/sec out: 372.0 MiB/sec (2738222236 bytes)

That means my gzio package is about 20% faster than Python's gzip, but not quite half the performance of Apple gzip or GNU gzip.
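The benchmark scripts themselves aren't shown here; a minimal version of the gzip-module timing might look like the following (the function name, chunk size, and report format are my assumptions, not the actual time_gz.py):

```python
import gzip
import time

def time_gzip_read(path, blocksize=1024 * 1024):
    """Read a .gz file in blocks through the gzip module and
    report decompressed throughput (hypothetical benchmark sketch)."""
    nbytes = 0
    t0 = time.perf_counter()
    with gzip.open(path, "rb") as f:
        while block := f.read(blocksize):
            nbytes += len(block)
    dt = time.perf_counter() - t0
    print(f"dt: {dt:.2f} sec out: {nbytes / dt / 2**20:.1f} MiB/sec ({nbytes} bytes)")
    return nbytes
```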
I am far from the first to point out that it's faster to use zcat than Python's gzip library. The xopen module, for example, can use the command-line pigz or gzip programs as a subprocess to decompress a file, then read from the program's stdout via a pipe. This approach provides a basic form of parallelization, as the decompression is in a different process than the parser for the file contents.
It's not hard to roll your own using the subprocess module, but there are a few annoying details to get right. For example, what if the gzip process fails because the file isn't found, or because the file isn't in gzip format? The process will start successfully and then quickly exit. So, when is that error reported?
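One way to handle that detail is to drain the subprocess output first and only then check the exit status. A sketch, assuming a gzip executable is on PATH (with a gzip-module fallback so the code still works without one); this is not chemfp's actual implementation:

```python
import gzip
import shutil
import subprocess

def open_gzip_stream(path):
    """Yield decompressed chunks from `path` via an external gzip
    process, raising on failure (hypothetical sketch)."""
    if shutil.which("gzip") is None:
        # No gzip executable available: fall back to the gzip module.
        with gzip.open(path, "rb") as f:
            yield from iter(lambda: f.read(65536), b"")
        return
    proc = subprocess.Popen(
        ["gzip", "-dc", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    yield from iter(lambda: proc.stdout.read(65536), b"")
    # A missing file or bad format only surfaces when the process
    # exits, so check the return code after the output is drained.
    stderr = proc.stderr.read()
    if proc.wait() != 0:
        raise IOError(stderr.decode(errors="replace"))
```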
By default chemfp uses my gzio wrapper to libz. It can be configured to use Python's gzip library, or to use a subprocess. It does not use xopen - I rolled my own version using subprocess - though after looking at the xopen code I'm reconsidering that decision.
These timings are not that precise because of background activity on my laptop, but the ranking is generally the same. It's definitely enough to show there are gzip reading options which are faster than Python's built-in module.
This method exists even if the optional gzip feature is not enabled. This can be used to ensure a Client doesn't use gzip decompression even if another dependency were to enable the optional gzip feature.
An initramfs is a small filesystem which gets loaded just after your OS boots. It contains drivers and configuration data necessary to get your computer to the point where it can mount the real filesystem in all its glory. It also happens to be (on my machine) gzip compressed.
The most popular compression format on the web is gzip. We put a great deal of effort into improving the performance of the gzip compression, so we can perform compression on the fly with fewer CPU cycles. Recently a potential replacement for gzip, called Brotli, was announced by Google. Being early adopters for many technologies, we at CloudFlare want to see for ourselves if it is as good as claimed.