Segmented downloading of a file, sometimes corrupted? - Java

So I'm currently trying to increase the download speed of an application, and to do so I tried to implement segmented downloading.
The code works like a charm, but in 2 out of 5 test runs the resulting file could not be opened or looked broken.
import java.nio.ByteBuffer
import java.nio.channels.ClosedChannelException
import java.nio.channels.FileChannel

//https://www.fareway.com/stores/ia/cresco/112-south-elm-street/ad/weekly/download
class Playground {

    final int SEGMENT_SIZE = 8192 * 8
    final int THREAD_COUNT = 8
    final int UPDATE_INTERVAL = 2
    boolean running = false
    int target_size

    static void main(String[] args) {
        new Playground()
    }

    Playground() {
        running = true
        long time_started_execution = System.currentTimeMillis()

        URL testurl = new URL("https://www.fareway.com/stores/ia/cresco/112-south-elm-street/ad/weekly/download")
        target_size = testurl.openConnection().getContentLength()

        URL url = new URL("https://www.fareway.com/stores/ia/cresco/112-south-elm-street/ad/weekly/download")
        File outFile = new File("testfile.pdf")
        if (outFile.exists()) outFile.delete()

        RandomAccessFile raf = new RandomAccessFile(outFile, "rw")
        FileChannel channel = raf.getChannel()

        def thread_list = []
        THREAD_COUNT.times { int n ->
            def t = Thread.start {
                int pos = n * SEGMENT_SIZE
                while (downloadSegment(url, pos, channel)) {
                    pos += THREAD_COUNT * SEGMENT_SIZE
                }
            }
            t.setName("Download Thread T$n")
            thread_list << t
        }

        def monitor_thread = Thread.start {
            while (running) {
                try {
                    int before = channel.size()
                    sleep(UPDATE_INTERVAL * 1000)
                    double speed = (channel.size() - before) / UPDATE_INTERVAL
                    print("\rDownloading with ${speed / 1024} kb/s. (${channel.size() / 1024} / ${(target_size / 1024) as int} kb)")
                } catch (ClosedChannelException cce) {
                    running = false
                }
            }
        }

        thread_list.each {
            it.join()
        }
        running = false

        println("\n(${channel.size()} / ${(target_size)} bytes)")
        raf.close()
        println("Download finished in ${(System.currentTimeMillis() - time_started_execution) / 1000} s!")
    }

    boolean downloadSegment(URL url, int position, FileChannel channel) {
        if (position > target_size) {
            return false
        }
        int endPos = position + SEGMENT_SIZE - 1
        if (endPos > target_size) {
            endPos = target_size
        }

        HttpURLConnection conn = url.openConnection() as HttpURLConnection
        conn.setRequestProperty("Range", "bytes=$position-$endPos")
        //println("${Thread.currentThread().getName()}: bytes=$position-${position+SEGMENT_SIZE-1}")
        conn.connect()

        int current_segment_size = conn.getContentLength()
        if (conn.getResponseCode() != 200 && conn.getResponseCode() != 206) {
            //println("\n${Thread.currentThread().getName()}: No more data!")
            return false
        }

        byte[] buffer = conn.getInputStream().getBytes()
        ByteBuffer bf = ByteBuffer.wrap(buffer)

        //Set Channel Position to write at the correct place.
        channel.position(position)
        while (bf.hasRemaining()) {
            channel.write(bf)
            channel.force(false)
        }
        bf.clear()

        return current_segment_size >= SEGMENT_SIZE
    }
}
So does anyone have a clue what I can try to avoid corrupted files?
I did try lowering THREAD_COUNT and SEGMENT_SIZE; with lower values the file is less likely to be corrupted, but that also drops the download speed from around 1.5 MB/s to 400 kB/s... and I want to get the maximum speed here.
Thanks in advance for any help.
P.S.: I know that there are missing try-catch blocks and other things, but this was just a Groovy playground to test this feature in a safe environment.
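For reference, here is a minimal standalone Java sketch of the plain HTTP Range request the code relies on (the URL and byte range are just placeholders). It only checks that the server answers with 206 Partial Content and a matching Content-Range header and counts the bytes delivered, which helps rule the server out as the source of the corruption.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RangeProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; use the file you are actually downloading.
        URL url = new URL("https://example.com/somefile.pdf");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", "bytes=0-65535"); // first 64 KiB

        // 206 means the server honoured the range; 200 means it ignored it and sent the whole file.
        System.out.println("Status: " + conn.getResponseCode());
        System.out.println("Content-Range: " + conn.getHeaderField("Content-Range"));

        // Count the bytes actually delivered for this segment.
        long received = 0;
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                received += n;
            }
        }
        System.out.println("Bytes received: " + received);
    }
}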

Related

Calculate memory taken by a process(job) using OSHI lib

In my application I get the estimated memory taken by a process from another application, but I am looking to get the exact memory the process actually requires to run.
While searching online for how to get the correct memory required by a process, I found that the OSHI lib does that, but I didn't find a way to implement the solution. Can anyone please help me?
OSHI lib: https://github.com/oshi/oshi
FYI: We use OSHI lib to get the systemInfo, hardware, os, centralProcessor and global memory. Below is the code snippet.
oshi.SystemInfo systemInfo = new oshi.SystemInfo();
this.hal = systemInfo.getHardware();
this.os = systemInfo.getOperatingSystem();
this.centralProcessor = this.hal.getProcessor();
this.globalMemory = this.hal.getMemory();
Perhaps retrieve the memory usage from the OSProcess class:
OSProcess process = new SystemInfo().getOperatingSystem().getProcess(myPid);
process.getVirtualSize();
process.getResidentSetSize();
public static void memoryUtilizationPerProcess(int pid) {
    /**
     * Resident Size: how much memory is allocated to that process and is in RAM
     */
    OSProcess process;
    SystemInfo si = new SystemInfo();
    OperatingSystem os = si.getOperatingSystem();
    process = os.getProcess(pid);
    oshi.hardware.GlobalMemory globalMemory = si.getHardware().getMemory();
    long usedRamProcess = process.getResidentSetSize();
    long totalRam = globalMemory.getTotal();
    // Use floating-point division so the percentage keeps its decimals.
    double res1 = (double) usedRamProcess * 100 / totalRam;
    System.out.println("\nMemory Usage :");
    System.out.println("Memory(Ram Used/Total Mem)=" + res1 + "%");
    System.out.println("Resident Size: " + humanReadableByteCountBin(usedRamProcess));
    System.out.println("Total Size: " + humanReadableByteCountBin(totalRam));
}

public static String humanReadableByteCountBin(long bytes) {
    long absB = bytes == Long.MIN_VALUE ? Long.MAX_VALUE : Math.abs(bytes);
    if (absB < 1024) {
        return bytes + " B";
    }
    long value = absB;
    CharacterIterator ci = new StringCharacterIterator("KMGTPE");
    for (int i = 40; i >= 0 && absB > 0xfffccccccccccccL >> i; i -= 10) {
        value >>= 10;
        ci.next();
    }
    value *= Long.signum(bytes);
    return String.format("%.1f %ciB", value / 1024.0, ci.current());
}
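For completeness, here is a small usage sketch; it assumes both methods above sit in the same class and uses ProcessHandle (Java 9+) to get the current JVM's PID, but any PID visible to the OS can be passed instead.

public static void main(String[] args) {
    // PID of the current JVM (Java 9+); replace with any other process id you want to inspect.
    int pid = (int) ProcessHandle.current().pid();
    memoryUtilizationPerProcess(pid);
}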

How can I check the Payload size when using Azure Eventhubs and avoid a PayloadSizeExceededException?

So I am getting this exception:
com.microsoft.azure.servicebus.PayloadSizeExceededException: Size of the payload exceeded Maximum message size: 256 kb
I believe the exception is self-explanatory; however, I am not sure what to do about it.
private int MAXBYTES = (int) ((1024 * 256) * .8);

for (EHubMessage message : payloads) {
    byte[] payloadBytes = message.getPayload().getBytes(StandardCharsets.UTF_8);
    EventData sendEvent = new EventData(payloadBytes);
    events.add(sendEvent);
    byteCount += payloadBytes.length;
    if (byteCount > this.MAXBYTES) {
        calls.add(ehc.sendASync(events));
        logs.append("[Size:").append(events.size()).append(" - ").append(byteCount / 1024).append("kb] ");
        events = new LinkedList<EventData>();
        byteCount = 0;
        pushes++;
    }
}
I am counting the bytes and such. I have thought through the UTF-8 issue, but I believe it should not matter: a UTF-8 character can take more than one byte, but getBytes() counts those bytes correctly.
I could not find a reliable way to get the byte size of a string, and I am not even sure how Azure counts the bytes. "Payload" is a broad term; it could include the boilerplate stuff and such.
Any Ideas? It would be great if there was a
EventHubClient.checkPayload(list);
method but there doesn't seem to be. How do you guys check the Payload Size?
Based on my experience, I think you need to check whether the current byte count plus the size of the new payload would exceed the limit before you add the new payload to events, as below.
private final int MAXBYTES = 1024 * 256; // not necessary to multiply by .8

for (EHubMessage message : payloads) {
    byte[] payloadBytes = message.getPayload().getBytes(StandardCharsets.UTF_8);
    if (byteCount + payloadBytes.length > this.MAXBYTES) {
        calls.add(ehc.sendASync(events));
        logs.append("[Size:").append(events.size()).append(" - ").append(byteCount / 1024).append("kb] ");
        events = new LinkedList<EventData>();
        byteCount = 0;
        pushes++;
    }
    EventData sendEvent = new EventData(payloadBytes);
    events.add(sendEvent);
    byteCount += payloadBytes.length; // keep the running total up to date after adding the event
}
If you add the new event data first and only count the payload size afterwards, it is too late: the batch of events about to be sent might already exceed the payload limit.
Well, I should have added more of the actual code than I did in the original post. Here is what I came up with:
private int MAXBYTES = (int) ((1024 * 256) * .9);

for (EHubMessage message : payloads) {
    byte[] payloadBytes = message.getPayload().getBytes(StandardCharsets.UTF_8);
    int propsSize = message.getProps() == null ? 0 : message.getProps().toString().getBytes().length;
    int messageSize = payloadBytes.length + propsSize;
    if (byteCount + messageSize > this.MAXBYTES) {
        calls.add(ehc.sendASync(events));
        logs.append("[Size:").append(events.size()).append(" - ").append(byteCount / 1024).append("kb] ");
        events = new LinkedList<EventData>();
        byteCount = 0;
        pushes++;
    }
    byteCount += messageSize;
    EventData sendEvent = new EventData(payloadBytes);
    sendEvent.getProperties().putAll(message.getProps());
    events.add(sendEvent);
}
if (!events.isEmpty()) {
    calls.add(ehc.sendASync(events));
    logs.append("[Size:").append(events.size()).append(" - ").append(byteCount / 1024).append("kb]");
    pushes++;
}
// let's wait till they are done.
CompletableFuture.allOf(calls.toArray(new CompletableFuture[0])).join();
If you notice, I was adding Properties to the EventData but not counting the bytes. The toString() for a Map returns something like:
{markiscool=markiscool}
Again, I am not sure of the boilerplate characters that the Azure API is adding, but I am sure it is not much. Notice I still back off MAXBYTES a bit just in case.
It would still be good to get a "payload size checker" method in the API, but I would imagine it would have to build the payload first to give it back to you. I experimented with having my EHubMessage object figure this out for me, but getBytes() on a String actually does some conversion that I don't want to do twice.
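As a rough illustration only (there is no official size-checker API that I know of), a helper along these lines could estimate an event's size from its body bytes plus the UTF-8 size of its property keys and values; the per-event overhead constant is a guess to cover whatever boilerplate the service adds, not something documented.

import java.nio.charset.StandardCharsets;
import java.util.Map;

public final class PayloadSizeEstimator {

    // Rough allowance per event for headers/system properties added during serialization (a guess).
    private static final int PER_EVENT_OVERHEAD_BYTES = 256;

    /** Estimates the on-the-wire size of one event: body bytes + UTF-8 size of its properties. */
    public static int estimateEventSize(byte[] body, Map<String, String> props) {
        int size = body.length + PER_EVENT_OVERHEAD_BYTES;
        if (props != null) {
            for (Map.Entry<String, String> e : props.entrySet()) {
                size += e.getKey().getBytes(StandardCharsets.UTF_8).length;
                size += String.valueOf(e.getValue()).getBytes(StandardCharsets.UTF_8).length;
            }
        }
        return size;
    }
}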

JnetPcap: reading from offline file very slow

I'm building a sort of custom version of Wireshark with jnetpcap v1.4r1425. I just want to open offline pcap files and display them in my TableView, which works great except for the speed.
The files I open are around 100 MB with 700k packets.
public ObservableList<Frame> readOfflineFiles1(int numFrames) {
    ObservableList<Frame> frameData = FXCollections.observableArrayList();
    if (numFrames == 0) {
        numFrames = Pcap.LOOP_INFINITE;
    }
    final StringBuilder errbuf = new StringBuilder();
    final Pcap pcap = Pcap.openOffline(FileAddress, errbuf);
    if (pcap == null) {
        System.err.println(errbuf); // Error is stored in errbuf if any
        return null;
    }
    JPacketHandler<StringBuilder> packetHandler = new JPacketHandler<StringBuilder>() {
        public void nextPacket(JPacket packet, StringBuilder errbuf) {
            if (packet.hasHeader(ip)) {
                sourceIpRaw = ip.source();
                destinationIpRaw = ip.destination();
                sourceIp = org.jnetpcap.packet.format.FormatUtils.ip(sourceIpRaw);
                destinationIp = org.jnetpcap.packet.format.FormatUtils.ip(destinationIpRaw);
            }
            if (packet.hasHeader(tcp)) {
                protocol = tcp.getName();
                length = tcp.size();
                int payloadOffset = tcp.getOffset() + tcp.size();
                int payloadLength = tcp.getPayloadLength();
                buffer.peer(packet, payloadOffset, payloadLength); // No copies, by native reference
                info = buffer.toHexdump();
            } else if (packet.hasHeader(udp)) {
                protocol = udp.getName();
                length = udp.size();
                int payloadOffset = udp.getOffset() + udp.size();
                int payloadLength = udp.getPayloadLength();
                buffer.peer(packet, payloadOffset, payloadLength); // No copies, by native reference
                info = buffer.toHexdump();
            }
            if (packet.hasHeader(payload)) {
                infoRaw = payload.getPayload();
                length = payload.size();
            }
            frameData.add(new Frame(packet.getCaptureHeader().timestampInMillis(), sourceIp, destinationIp, protocol, length, info));
            //System.out.print(i+"\n");
            //i=i+1;
        }
    };
    pcap.loop(numFrames, packetHandler, errbuf);
    pcap.close();
    return frameData;
}
This code is very fast for maybe the first 400k packets, but after that it slows down a lot. It needs around 1 minute for the first 400k packets and around 10 minutes for the rest. What is the issue here?
It's not that the list is getting too time-consuming to work with, is it? The list's add method is O(1), isn't it?
I asked about this on the official jnetpcap forums too, but they're not very active.
Edit:
It turns out it slows down massively because of the heap usage. Is there a way to reduce this?
As the profiler showed you, you're running low on memory and it starts to slow down.
Either give more memory with -Xmx or don't load all the packets into memory at once.
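If you go the second route, a minimal sketch is below; it assumes the per-packet hexdump Strings kept in every Frame are what fills the heap (the profiler should confirm that), and the cap value is just a placeholder. The first route is simply a JVM flag such as -Xmx2g when launching the application.

// Sketch: keep only a short preview of the hexdump per row instead of the full String,
// so several hundred thousand frames don't pin the complete dump text in the heap.
private static final int MAX_INFO_CHARS = 512; // placeholder cap; tune as needed

private static String preview(String hexdump) {
    return hexdump.length() > MAX_INFO_CHARS
            ? hexdump.substring(0, MAX_INFO_CHARS) + " ..."
            : hexdump;
}

// Then, inside nextPacket(...):
//     info = preview(buffer.toHexdump());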

A simple duplicate block finding algorithm performs worse when using BloomFilter for lookups

I have concatenated two ISO files into one file. Both individual ISO files are Linux distros of the same vendor but of different versions. In the program I have written (shown below), the concatenated file is read in blocks of 512 bytes and the MD5 sum of each block is computed. The MD5 sum is stored in a HashSet<String>. If a block with the same signature is found via a HashSet lookup, this is recorded.
The exact same algorithm is also run using a BloomFilter before the actual lookup in the HashSet. Since a BloomFilter only gives guarantees about "non-containment" and can yield false positives on containment, I also look up the HashSet if the BloomFilter reports that a key might already be present.
The concatenated file is > 1 GB in size, so the number of 512-byte block signatures exceeds 1.77 million. The run time of the approach using the BloomFilter is consistently about six times that of the first approach.
Any reason why this might be the case? Have I done something wrong here?
import com.google.common.base.Charsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnel;
import com.google.common.hash.PrimitiveSink;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.HashSet;
import java.util.concurrent.TimeUnit;
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.commons.lang3.time.StopWatch;

public class SimpleDedupTrial {
    public static void main(String[] args) throws IOException {
        int blockSize = 512;
        HashSet<String> signatureSet = new HashSet<>();
        File f = new File(
            "D:\\keshav\\per\\projects\\work-area\\dedup-temp\\merged-iso"
        );
        FileInputStream fis = new FileInputStream(f);
        long length = f.length();
        long sizeSaved = 0l;
        StopWatch sw = new StopWatch();
        int len;
        byte[] buffer = new byte[blockSize];
        while ((len = fis.read(buffer)) != -1) {
            String md5Hex = DigestUtils.md5Hex(buffer);
            if (sw.isStopped()) {
                sw.start();
            }
            if (sw.isSuspended()) {
                sw.resume();
            }
            if (signatureSet.contains(md5Hex)) {
                sizeSaved += len;
            } else {
                signatureSet.add(md5Hex);
            }
            sw.suspend();
        }
        sw.stop();
        fis.close();
        System.out.println("Time: " + sw.getTime(TimeUnit.MILLISECONDS));
        System.out.println("File size in MB: " + convertToMB(length));
        System.out.println("Size saved in MB: " + convertToMB(sizeSaved));
        System.out.println("Signature set size: " + signatureSet.size());
        System.out.println("Duplicate ratio: " + ((double) sizeSaved * 100 / length));
        System.out.println("With Blooom:");
        useBloomFilter();
    }

    private static long convertToMB(long sizeInBytes) {
        return sizeInBytes / (1024 * 1024);
    }

    private static void useBloomFilter() throws IOException {
        int blockSize = 512;
        Funnel<String> strFunnel = (String t, PrimitiveSink ps) -> {
            ps.putString(t, Charsets.US_ASCII);
        };
        HashSet<String> signatureSet = new HashSet<>();
        File f = new File(
            "D:\\keshav\\per\\projects\\work-area\\dedup-temp\\merged-iso"
        );
        FileInputStream fis = new FileInputStream(f);
        long length = f.length();
        long sizeSaved = 0l;
        BloomFilter<String> signatureBloomFilter = BloomFilter.create(
            strFunnel, (length / blockSize)
        );
        StopWatch sw = new StopWatch();
        int len;
        byte[] buffer = new byte[blockSize];
        while ((len = fis.read(buffer)) != -1) {
            String md5Hex = DigestUtils.md5Hex(buffer);
            if (sw.isStopped()) {
                sw.start();
            }
            if (sw.isSuspended()) {
                sw.resume();
            }
            if (signatureBloomFilter.mightContain(md5Hex)) {
                if (!signatureSet.contains(md5Hex)) {
                    signatureBloomFilter.put(md5Hex);
                    signatureSet.add(md5Hex);
                } else {
                    sizeSaved += len;
                }
            } else {
                signatureBloomFilter.put(md5Hex);
                signatureSet.add(md5Hex);
            }
            sw.suspend();
        }
        sw.stop();
        fis.close();
        System.out.println("Time: " + sw.getTime(TimeUnit.MILLISECONDS));
        System.out.println("File size in MB: " + convertToMB(length));
        System.out.println("Size saved in MB: " + convertToMB(sizeSaved));
        System.out.println("Signature set size: " + signatureSet.size());
        System.out.println("Duplicate ratio: " + ((double) sizeSaved * 100 / length));
    }
}
Sample output:
Time: 819
File size in MB: 1071
Size saved in MB: 205
Signature set size: 1774107
Duplicate ratio: 19.183032558071734
With Blooom:
Time: 4539
File size in MB: 1071
Size saved in MB: 205
Signature set size: 1774107
Duplicate ratio: 19.183032558071734
It looks like you missed the point of a Bloom filter a bit. We use one when we can't afford the memory and are willing to lose some accuracy; for example, we might accept delivering two push notifications to (or skipping) 1 in 100 users in order to avoid storing the full collection of users who have already received the notification.
With the HashSet you already have an expected access time of O(1), so the Bloom filter can't speed up the process, and as you can see it slows it down. On the other hand, it uses very little memory, which is not significant enough to show up in your statistics.
It slows things down because a "not present" answer takes roughly as long as the HashSet lookup itself, while a "might be present" answer costs even more, since the HashSet still has to be consulted afterwards.
You can read more here.
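To make the point concrete, here is a sketch of the situation where a Bloom filter does pay off: the authoritative lookup is expensive (disk, network, or a set too large for RAM), so the filter cheaply rejects keys that were definitely never added, and only the "maybe" answers fall through to the slow path. The 1% false-positive rate and the slowLookup placeholder are assumptions for the example.

import java.nio.charset.StandardCharsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class BloomGuardSketch {

    private final BloomFilter<CharSequence> seen = BloomFilter.create(
            Funnels.stringFunnel(StandardCharsets.US_ASCII),
            2_000_000,  // expected insertions
            0.01);      // acceptable false-positive rate (assumption)

    /** Returns true if the key is definitely new; falls back to the slow store only on "maybe". */
    public boolean isNew(String md5Hex) {
        if (!seen.mightContain(md5Hex)) {
            seen.put(md5Hex);
            return true; // cheap, in-memory answer for the common case
        }
        boolean duplicate = slowLookup(md5Hex); // expensive authoritative check
        if (!duplicate) {
            seen.put(md5Hex);
        }
        return !duplicate;
    }

    private boolean slowLookup(String key) {
        // Placeholder for a database / on-disk index lookup.
        return false;
    }
}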

Reverse a video in android using MediaCodec, MediaExtractor, MediaMuxer etc.

Requirement: I want to reverse a video file and save it as a new video file in Android, i.e. the final output file should play the video in reverse.
What I tried: I've used the code below (which I got from AOSP: https://android.googlesource.com/platform/cts/+/kitkat-release/tests/tests/media/src/android/media/cts/MediaMuxerTest.java) with a small modification.
File file = new File(srcMedia.getPath());
MediaExtractor extractor = new MediaExtractor();
extractor.setDataSource(file.getPath());
int trackCount = extractor.getTrackCount();

// Set up MediaMuxer for the destination.
MediaMuxer muxer;
muxer = new MediaMuxer(dstMediaPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);

// Set up the tracks.
HashMap<Integer, Integer> indexMap = new HashMap<Integer, Integer>(trackCount);
for (int i = 0; i < trackCount; i++) {
    extractor.selectTrack(i);
    MediaFormat format = extractor.getTrackFormat(i);
    int dstIndex = muxer.addTrack(format);
    indexMap.put(i, dstIndex);
}

// Copy the samples from MediaExtractor to MediaMuxer.
boolean sawEOS = false;
int bufferSize = MAX_SAMPLE_SIZE;
int frameCount = 0;
int offset = 100;
long totalTime = mTotalVideoDurationInMicroSeconds;
ByteBuffer dstBuf = ByteBuffer.allocate(bufferSize);
MediaCodec.BufferInfo bufferInfo = new MediaCodec.BufferInfo();

if (degrees >= 0) {
    muxer.setOrientationHint(degrees);
}

muxer.start();
while (!sawEOS) {
    bufferInfo.offset = offset;
    bufferInfo.size = extractor.readSampleData(dstBuf, offset);
    if (bufferInfo.size < 0) {
        if (VERBOSE) {
            Log.d(TAG, "saw input EOS.");
        }
        sawEOS = true;
        bufferInfo.size = 0;
    } else {
        bufferInfo.presentationTimeUs = totalTime - extractor.getSampleTime();
        //noinspection WrongConstant
        bufferInfo.flags = extractor.getSampleFlags();
        int trackIndex = extractor.getSampleTrackIndex();
        muxer.writeSampleData(indexMap.get(trackIndex), dstBuf, bufferInfo);
        extractor.advance();
        frameCount++;
        if (VERBOSE) {
            Log.d(TAG, "Frame (" + frameCount + ") " +
                    "PresentationTimeUs:" + bufferInfo.presentationTimeUs +
                    " Flags:" + bufferInfo.flags +
                    " TrackIndex:" + trackIndex +
                    " Size(KB) " + bufferInfo.size / 1024);
        }
    }
}
muxer.stop();
muxer.release();
The main change I made is in this line:
bufferInfo.presentationTimeUs = totalTime - extractor.getSampleTime();
This was done in the expectation that the video frames would be written to the output file in reverse order, but the result was the same as the original video (not reversed).
I feel that what I tried here doesn't really make sense. Basically, I don't have much understanding of video formats, codecs, byte buffers, etc.
I've also tried JavaCV, which is a good Java wrapper over OpenCV, FFmpeg, etc., and I got it working with that library. But the encoding process takes a long time, and the APK size became large because of the library.
With Android's built-in MediaCodec APIs I expect things to be faster and more lightweight, but I can also accept other solutions if they offer the same.
It would be greatly appreciated if someone could offer any help on how this can be done in Android. Also, if you have good articles that can help me learn the specifics/basics about video, codecs, video processing, etc., that would help as well.
