Skip to content Skip to sidebar Skip to footer

How To Parse A Zipped File Completely From RAM?

Background I need to parse some zip files of various types (getting some inner files content for one purpose or another, including getting their names). Some of the files are not

Solution 1:

I took a look at the JNI code you posted and made a couple of changes. Mostly it is defining the size argument for NewDirectByteBuffer and using malloc().

Here is the output of the log after allocating 800mb:

D/AppLog: total:1.57 GB free:1.03 GB used:541 MB (34%)
D/AppLog: total:1.57 GB free:247 MB used:1.32 GB (84%)

And the following is what the buffer looks like after the allocation. As you can see, the debugger is reporting a limit of 800mb which is what we expect.

enter image description here My C is very rusty, so I am sure that there is some work to be done. I have updated the code to be a little more robust and to allow for the freeing of memory.

native-lib.cpp

extern "C" {
static jbyteArray *_holdBuffer = NULL;
static jobject _directBuffer = NULL;
/*
    This routine is not re-entrant and can handle only one buffer at a time. If a buffer is
    allocated then it must be released before the next one is allocated.
 */
JNIEXPORT
jobject JNICALL Java_com_example_zipfileinmemoryjni_JniByteArrayHolder_allocate(
        JNIEnv *env, jobject obj, jlong size) {
    if (_holdBuffer != NULL || _directBuffer != NULL) {
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Call to JNI allocate() before freeBuffer()");
        return NULL;
    }

    // Max size for a direct buffer is the max of a jint even though NewDirectByteBuffer takes a
    // long. Clamp max size as follows:
    if (size > SIZE_T_MAX || size > INT_MAX || size <= 0) {
        jlong maxSize = SIZE_T_MAX < INT_MAX ? SIZE_T_MAX : INT_MAX;
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Native memory allocation request must be >0 and <= %lld but was %lld.\n",
                            maxSize, size);
        return NULL;
    }

    jbyteArray *array = (jbyteArray *) malloc(static_cast<size_t>(size));
    if (array == NULL) {
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Failed to allocate %lld bytes of native memory.\n",
                            size);
        return NULL;
    }

    jobject directBuffer = env->NewDirectByteBuffer(array, size);
    if (directBuffer == NULL) {
        free(array);
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Failed to create direct buffer of size %lld.\n",
                            size);
        return NULL;
    }
    // memset() is not really needed but we call it here to force Android to count
    // the consumed memory in the stats since it only seems to "count" dirty pages. (?)
    memset(array, 0xFF, static_cast<size_t>(size));
    _holdBuffer = array;

    // Get a global reference to the direct buffer so Java isn't tempted to GC it.
    _directBuffer = env->NewGlobalRef(directBuffer);
    return directBuffer;
}

JNIEXPORT void JNICALL Java_com_example_zipfileinmemoryjni_JniByteArrayHolder_freeBuffer(
        JNIEnv *env, jobject obj, jobject directBuffer) {

    if (_directBuffer == NULL || _holdBuffer == NULL) {
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Attempt to free unallocated buffer.");
        return;
    }

    jbyteArray *bufferLoc = (jbyteArray *) env->GetDirectBufferAddress(directBuffer);
    if (bufferLoc == NULL) {
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "Failed to retrieve direct buffer location associated with ByteBuffer.");
        return;
    }

    if (bufferLoc != _holdBuffer) {
        __android_log_print(ANDROID_LOG_ERROR, "JNI Routine",
                            "DirectBuffer does not match that allocated.");
        return;
    }

    // Free the malloc'ed buffer and the global reference. Java can not GC the direct buffer.
    free(bufferLoc);
    env->DeleteGlobalRef(_directBuffer);
    _holdBuffer = NULL;
    _directBuffer = NULL;
}
}

I also updated the array holder:

class JniByteArrayHolder {
    external fun allocate(size: Long): ByteBuffer
    external fun freeBuffer(byteBuffer: ByteBuffer)

    companion object {
        init {
            System.loadLibrary("native-lib")
        }
    }
}

I can confirm that this code along with the ByteBufferChannel class provided by Botje here works for Android versions before API 24. The SeekableByteChannel interface was introduced in API 24 and is needed by the ZipFile utility.

The maximum buffer size that can be allocated is the size of a jint and is due to the limitation of JNI. Larger data can be accommodated (if available) but would require multiple buffers and a way to handle them.

Here is the main activity for the sample app. An earlier version always assumed the the InputStream read buffer was was always filled and errored out when trying to put it to the ByteBuffer. This was fixed.

MainActivity.kt

class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
    }

    fun onClick(view: View) {
        button.isEnabled = false
        status.text = getString(R.string.running)

        thread {
            printMemStats("Before buffer allocation:")
            var bufferSize = 0L
            // testzipfile.zip is not part of the project but any zip can be uploaded through the
            // device file manager or adb to test.
            val fileToRead = "$filesDir/testzipfile.zip"
            val inStream =
                if (File(fileToRead).exists()) {
                    FileInputStream(fileToRead).apply {
                        bufferSize = getFileSize(this)
                        close()
                    }
                    FileInputStream(fileToRead)
                } else {
                    // If testzipfile.zip doesn't exist, we will just look at this one which
                    // is part of the APK.
                    resources.openRawResource(R.raw.appapk).apply {
                        bufferSize = getFileSize(this)
                        close()
                    }
                    resources.openRawResource(R.raw.appapk)
                }
            // Allocate the buffer in native memory (off-heap).
            val jniByteArrayHolder = JniByteArrayHolder()
            val byteBuffer =
                if (bufferSize != 0L) {
                    jniByteArrayHolder.allocate(bufferSize)?.apply {
                        printMemStats("After buffer allocation")
                    }
                } else {
                    null
                }

            if (byteBuffer == null) {
                Log.d("Applog", "Failed to allocate $bufferSize bytes of native memory.")
            } else {
                Log.d("Applog", "Allocated ${Formatter.formatFileSize(this, bufferSize)} buffer.")
                val inBytes = ByteArray(4096)
                Log.d("Applog", "Starting buffered read...")
                while (inStream.available() > 0) {
                    byteBuffer.put(inBytes, 0, inStream.read(inBytes))
                }
                inStream.close()
                byteBuffer.flip()
                ZipFile(ByteBufferChannel(byteBuffer)).use {
                    Log.d("Applog", "Starting Zip file name dump...")
                    for (entry in it.entries) {
                        Log.d("Applog", "Zip name: ${entry.name}")
                        val zis = it.getInputStream(entry)
                        while (zis.available() > 0) {
                            zis.read(inBytes)
                        }
                    }
                }
                printMemStats("Before buffer release:")
                jniByteArrayHolder.freeBuffer(byteBuffer)
                printMemStats("After buffer release:")
            }
            runOnUiThread {
                status.text = getString(R.string.idle)
                button.isEnabled = true
                Log.d("Applog", "Done!")
            }
        }
    }

    /*
        This function is a little misleading since it does not reflect the true status of memory.
        After native buffer allocation, it waits until the memory is used before counting is as
        used. After release, it doesn't seem to count the memory as released until garbage
        collection. (My observations only.) Also, see the comment for memset() in native-lib.cpp
        which is a member of this project.
    */
    private fun printMemStats(desc: String? = null) {
        val memoryInfo = ActivityManager.MemoryInfo()
        (getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager).getMemoryInfo(memoryInfo)
        val nativeHeapSize = memoryInfo.totalMem
        val nativeHeapFreeSize = memoryInfo.availMem
        val usedMemInBytes = nativeHeapSize - nativeHeapFreeSize
        val usedMemInPercentage = usedMemInBytes * 100 / nativeHeapSize
        val sDesc = desc?.run { "$this:\n" }
        Log.d(
            "AppLog", "$sDesc total:${Formatter.formatFileSize(this, nativeHeapSize)} " +
                    "free:${Formatter.formatFileSize(this, nativeHeapFreeSize)} " +
                    "used:${Formatter.formatFileSize(this, usedMemInBytes)} ($usedMemInPercentage%)"
        )
    }

    // Not a great way to do this but not the object of the demo.
    private fun getFileSize(inStream: InputStream): Long {
        var bufferSize = 0L
        while (inStream.available() > 0) {
            val toSkip = inStream.available().toLong()
            inStream.skip(toSkip)
            bufferSize += toSkip
        }
        return bufferSize
    }
}

A sample GitHub repository is here.


Solution 2:

You can steal LWJGL's native memory management functions. It is BSD3 licensed, so you only have to mention somewhere that you are using code from it.

Step 1: given an InputStream is and a file size ZIP_SIZE, slurp the stream into a direct byte buffer created by LWJGL's org.lwjgl.system.MemoryUtil helper class:

ByteBuffer bb = MemoryUtil.memAlloc(ZIP_SIZE);
byte[] buf = new byte[4096]; // Play with the buffer size to see what works best
int read = 0;
while ((read = is.read(buf)) != -1) {
  bb.put(buf, 0, read);
}

Step 2: wrap the ByteBuffer in a ByteChannel. Taken from this gist. You possibly want to strip the writing parts out.

package io.github.ncruces.utils;

import java.nio.ByteBuffer;
import java.nio.channels.NonWritableChannelException;
import java.nio.channels.SeekableByteChannel;

import static java.lang.Math.min;

public final class ByteBufferChannel implements SeekableByteChannel {
    private final ByteBuffer buf;

    public ByteBufferChannel(ByteBuffer buffer) {
        if (buffer == null) throw new NullPointerException();
        buf = buffer;
    }

    @Override
    public synchronized int read(ByteBuffer dst) {
        if (buf.remaining() == 0) return -1;

        int count = min(dst.remaining(), buf.remaining());
        if (count > 0) {
            ByteBuffer tmp = buf.slice();
            tmp.limit(count);
            dst.put(tmp);
            buf.position(buf.position() + count);
        }
        return count;
    }

    @Override
    public synchronized int write(ByteBuffer src) {
        if (buf.isReadOnly()) throw new NonWritableChannelException();

        int count = min(src.remaining(), buf.remaining());
        if (count > 0) {
            ByteBuffer tmp = src.slice();
            tmp.limit(count);
            buf.put(tmp);
            src.position(src.position() + count);
        }
        return count;
    }

    @Override
    public synchronized long position() {
        return buf.position();
    }

    @Override
    public synchronized ByteBufferChannel position(long newPosition) {
        if ((newPosition | Integer.MAX_VALUE - newPosition) < 0) throw new IllegalArgumentException();
        buf.position((int)newPosition);
        return this;
    }

    @Override
    public synchronized long size() { return buf.limit(); }

    @Override
    public synchronized ByteBufferChannel truncate(long size) {
        if ((size | Integer.MAX_VALUE - size) < 0) throw new IllegalArgumentException();
        int limit = buf.limit();
        if (limit > size) buf.limit((int)size);
        return this;
    }

    @Override
    public boolean isOpen() { return true; }

    @Override
    public void close() {}
}

Step 3: Use ZipFile as before:

ZipFile zf = new ZipFile(ByteBufferChannel(bb);
for (ZipEntry ze : zf) {
    ...
}

Step 4: Manually release the native buffer (preferably in a finally block):

MemoryUtil.memFree(bb);

Post a Comment for "How To Parse A Zipped File Completely From RAM?"