In the memory-based computing model, the only running-time calculations that need to be done can be done abstractly, by considering the data structure.
However, there isn't a lot of documentation on high-performance disk I/O algorithms. Thus I ask the following set of questions:
1) How can we estimate the running time of disk I/O operations? I assume there is a simple set of constants which we might add for looking up a value on disk, rather than in memory...
2) And more specifically, what is the difference in performance for accessing a specific index in a file? Is this a constant time operation? Or does it depend on how "far down" the index is?
3) Finally... how does the JVM optimize access of indexed portions of a file?
And... as far as resources -- in general... Are there any good idioms or libraries for on disk data structure implementations?
1) how can we estimate the running time of disk I/O operations? I assume there is a simple set of constants which we might add for looking up a value on disk, rather than in memory...
In chapter 6 of Computer Systems: A Programmer's Perspective they give a pretty practical mathematical model for how long it takes to read some data from a typical magnetic disk.
To quote the last page in the linked pdf:
Putting it all together, the total estimated access time is
Taccess = Tavg seek + Tavg rotation + Tavg transfer
= 9 ms + 4 ms + 0.02 ms
= 13.02 ms
This example illustrates some important points:
• The time to access the 512 bytes in a disk sector is dominated by the seek time and the rotational
latency. Accessing the first byte in the sector takes a long time, but the remaining bytes are essentially
free.
• Since the seek time and rotational latency are roughly the same, twice the seek time is a simple and
reasonable rule for estimating disk access time.
*note: the linked PDF is from the author's website, so no piracy involved.
Of course, if the data being accessed was recently accessed, there's a decent chance it's cached somewhere in the memory hierarchy, in which case the access time is extremely small (practically "near instant" compared to disk access time).
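If you want to play with the book's model, here is a quick sketch. The numbers are the illustrative ones from the book's example (9 ms average seek, 7200 RPM, 400 sectors per track), not universal constants:

```java
// Back-of-envelope disk access model from CS:APP chapter 6.
// Taccess = Tavg seek + Tavg rotation + Tavg transfer
public class DiskAccessEstimate {
    static double estimateMs(double avgSeekMs, double rpm, double sectorsPerTrack) {
        double avgRotationMs = 0.5 * (60_000.0 / rpm);           // half a revolution, on average
        double transferMs = (60_000.0 / rpm) / sectorsPerTrack;  // time for one sector to pass
        return avgSeekMs + avgRotationMs + transferMs;
    }

    public static void main(String[] args) {
        // Close to the book's 13.02 ms figure (the book rounds rotation to 4 ms).
        System.out.printf("estimated access time: %.2f ms%n", estimateMs(9.0, 7200, 400));
    }
}
```

Note how the seek and rotation terms dwarf the transfer term, which is the book's main point.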
2) And more specifically, what is the difference in performance for accessing a specific index in a file? Is this a constant time operation? Or does it depend on how "far down" the index is?
Another seek + rotation amount of time may occur if the location you seek to isn't stored sequentially nearby. It depends on where in the file you're seeking, and where that data is physically stored on the disk. For example, fragmented files are guaranteed to cause disk seeks when reading the entire file.
Something to keep in mind is that even though you may only request to read a few bytes, the physical reads tend to occur in multiples of fixed-size chunks (the sector size), which end up in cache. So you may later seek to some nearby location in the file and get lucky that it's already in cache for you.
Btw- The full chapter in that book on the memory hierarchy is pure gold, if you're interested in the subject.
1) If you need to compare the speed of various IO functions, you have to just run it a thousand times and record how long it takes.
2) That depends on how you plan to get to this index. An index to the beginning of a file is exactly the same as an index to the middle of a file. It just points to a section of memory on the disk. If you get to this index by starting at the beginning and progressing there, then yes it will take longer.
3/4) No these are managed by the operating system itself. Java isn't low level enough to handle these kinds of operations.
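Point 1 ("just run it a thousand times") can be sketched like this. The file, sizes, and iteration count are illustrative, and reads served from the OS page cache will be far faster than true disk reads:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class ReadTiming {
    // Time N positioned 512-byte reads and report the average in microseconds.
    static double avgReadMicros(Path file, int iterations) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(512);
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (int i = 0; i < iterations; i++) {
                buf.clear();
                long pos = (i * 4096L) % Math.max(1, size - 512); // hop around the file
                long t0 = System.nanoTime();
                ch.read(buf, pos);                                // positioned read
                total += System.nanoTime() - t0;
            }
        }
        return total / 1000.0 / iterations;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("timing", ".bin");
        Files.write(tmp, new byte[1 << 20]); // 1 MB of zeroes to read back
        System.out.printf("avg positioned read: %.1f us%n", avgReadMicros(tmp, 1000));
        Files.delete(tmp);
    }
}
```

A freshly written temp file like this will almost certainly be served from cache, so expect sub-microsecond averages; to measure the disk itself you would need a file larger than RAM or a dropped cache.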
high performance disk I/O algorithms.
The performance of your hardware is usually so important that what you do in software doesn't matter so much. You should first consider buying the right hardware for the job.
how can we estimate the running time of disk I/O operations? I assume there is a simple set of constants which we might add for looking up a value on disk, rather than in memory...
It's simple to time them, as they are always going to take many microseconds each. For example, a HDD can perform 80-120 IOPS and an SSD can perform 80K to 230K IOPS. You can usually get within half of what the manufacturer specifies fairly easily; getting to 100% is where you might need tricks in software. Nevertheless, you will never get a HDD to perform like an SSD unless you have lots of memory and only ever read the data, in which case the OS will do all the work for you.
You can buy hybrid drives which give you the capacity of a HDD but performance close to that of an SSD. For commercial production use you may be willing to spend the money on a disk subsystem with multiple drives. This can increase the performance to, say, 500 IOPS, but the cost increases significantly. You usually buy a disk subsystem because you need the capacity and redundancy it provides, but you usually get a performance boost as well from having more spindles working together. Although this link on disk subsystem performance is old (2004), they haven't changed that much since then.
And more specifically, what is the difference in performance for accessing a specific index in a file? Is this a constant time operation? Or does it depend on how "far down" the index is?
It depends on whether it is in memory or not. If it is very close to data you recently read, it is quite likely cached; if it is far away, it depends on what accesses you have done in the past and how much memory you have free to cache disk accesses.
The typical latency for a HDD is ~8 ms per access (i.e. if you have 10 random reads queued it can take 80 ms). The typical latency of an SSD is 25 to 100 µs. It is far less likely that reads will already be queued, as it is much faster to start with.
how does the JVM optimize access of indexed portions of a file?
Assuming you are using sensible buffer sizes, there is little you can do generically in software. What can be done is done by the OS.
are there any good idioms or libraries for on disk data structure implementations?
Use a sensible buffer size like 512 bytes to 64 KB.
Much more importantly, buy the right hardware for your requirements.
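As a sketch of the buffer-size advice, here is a plain buffered read using a 64 KB buffer. The file name, file size, and buffer size are all illustrative:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferedRead {
    // Stream a file through a buffer in the suggested 512 B - 64 KB range.
    static long countBytes(Path file, int bufferSize) throws IOException {
        long total = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file), bufferSize)) {
            byte[] chunk = new byte[bufferSize];
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n; // each underlying syscall reads up to bufferSize bytes
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("buffered", ".bin");
        Files.write(tmp, new byte[123_456]);
        System.out.println(countBytes(tmp, 64 * 1024)); // prints 123456
        Files.delete(tmp);
    }
}
```

The point is simply that a larger buffer amortizes the per-call overhead over more bytes; the OS still decides how the physical reads are scheduled.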
1) how can we estimate the running time of disk I/O operations? I assume there is a simple set of constants which we might add for looking up a value on disk, rather than in memory...
There are no such universal constants. In fact, performance models of physical disk I/O, file systems and operating systems are too complicated to be able to make accurate predictions for specific operations.
2) And more specifically, what is the difference in performance for accessing a specific index in a file? Is this a constant time operation? Or does it depend on how "far down" the index is?
It is too complicated to predict. For instance, it depends on how much file buffering the OS does, physical disk parameters (e.g. seek times) and how effectively the OS can schedule disk activity ... across all applications.
3) Finally... how does the JVM optimize access of indexed portions of a file?
It doesn't. It is an operating system level thing.
4) are there any good idioms or libraries for on disk data structure implementations?
That is difficult to answer without more details of your actual requirements. But the best idea is not to try to implement this kind of thing yourself. Find an existing library that is a good fit for your requirements.
Also note that Linux systems, at least, allow different file systems. Depending on the application, one might be a better fit than the others. http://en.wikipedia.org/wiki/File_system#Linux
In numerous articles, YouTube videos, etc., I have seen Java's volatile keyword explained as a problem of cache memory, where declaring a variable volatile ensures that reads/writes are forced to main memory, and not cache memory.
It has always been my understanding that in modern CPUs, cache memory implements coherency protocols that guarantee that reads/writes are seen equally by all processors, cores, hardware threads, etc, at all levels of the cache architecture. Am I wrong?
jls-8.3.1.4 simply states
A field may be declared volatile, in which case the Java Memory Model ensures that all threads see a consistent value for the variable (§17.4).
It says nothing about caching. As long as cache coherency is in effect, volatile variables simply need to be written to a memory address, as opposed to being stored locally in a CPU register, for example. There may also be other optimizations that need to be avoided to guarantee the contract of volatile variables, and thread visibility.
I am simply amazed at the number of people who imply that CPUs do not implement cache coherency, such that I have been forced to go to StackOverflow because I am doubting my sanity. These people go to a great deal of effort, with diagrams, animated diagrams, etc. to imply that cache memory is not coherent.
jls-8.3.1.4 really is all that needs to be said, but if people are going to explain things in more depth, wouldn't it make more sense to talk about CPU Registers (and other optimizations) than blame CPU Cache Memory?
CPUs are very, very fast. Main memory is physically a few centimeters away. Let's say 15 centimeters.
The speed of light is 300,000 kilometers per second, give or take. That's 30,000,000,000 centimeters every second. The speed of light in a medium is not as fast as in a vacuum, but it's close, so let's ignore that part. That means that sending a single signal from the CPU to the memory and back, even if the CPU and memory could both process it instantly, is already limiting you to 1,000,000,000 round trips per second, or 1 GHz (you need to cover 30 centimeters to get from the core to the memory and back, so you can do that 1,000,000,000 times every second; if you can do it any faster, you're travelling backwards in time. Or some such. You get a Nobel prize if you figure out how to manage that one).
Processors are about that fast! We measure core speeds in GHz these days, as in: in the time it takes the signal to travel, the CPU's clock has already ticked. In practice, of course, the memory controller is not instantaneous either, nor is the CPU pipelining system.
Thus:
It has always been my understanding that in modern CPUs, cache memory implements coherency protocols that guarantee that reads/writes are seen equally by all processors, cores, hardware threads, etc, at all levels of the cache architecture. Am I wrong?
Yes, you are wrong. QED.
I don't know why you think that or where you read that. You misremember, or you misunderstood what was written, or whatever was written was very very wrong.
In actual fact, an update to 'main memory' takes on the order of a thousand cycles! The CPU just sits there, twiddling its thumbs, doing nothing, in a time window where it could roll through a thousand, on some cores multiple thousands, of instructions. Memory is that slow. Molasses-level slow.
The fix is not registers; you are missing about 20 years of CPU improvement. There aren't 2 layers (registers, then main memory). No, there are more like 5: registers, on-die cache in multiple hierarchical levels, and then, eventually, main memory. To make it all very, very fast, these things are very, very close to the core. So close, in fact, that each core has its own, and, drumroll here: modern CPUs cannot read main memory. At all. They are entirely incapable of it.
Instead, what happens is that the CPU sees you read or write main memory and translates that, as it can't actually do any of that directly, by figuring out which 'page' of memory you are trying to read/write (each chunk of e.g. 64k worth of memory is a page; the actual page size depends on the hardware). The CPU then checks whether any of the pages loaded into its on-die cache is that page. If yes, great, and it's all mapped to that. Which does mean that, if 2 cores both have that page loaded, they both have their own copy, and obviously anything that one core does to its copy is entirely invisible to the other core.
If the CPU does -not- find this page in its own on-die cache you get what's called a cache miss, and the CPU will then check which of its loaded pages is least used, and will purge this page. Purging is 'free' if the CPU hasn't modified it, but if that page is 'dirty', it will first send a ping to the memory controller followed by blasting the entire sequence of 64k bytes into it (because sending a burst is way, way faster than waiting for the signal to bounce back and forth or to try to figure out which part of that 64k block is dirty), and the memory controller will take care of it. Then, that same CPU pings the controller to blast the correct page to it and overwrites the space that was just purged out. Now the CPU 'retries' the instruction, and this time it does work, as that page IS now in 'memory', in the sense that the part of the CPU that translates the memory location to cachepage+offset now no longer throws a CacheMiss.
And whilst all of that is going on, THOUSANDS of cycles can pass, because it's all very very slow. Cache misses suck.
This explains many things:
It explains why volatile is slow and synchronized is slower. Dog slow. In general if you want big speed, you want processes that run [A] independent (do not need to share memory between cores, except at the very start and very end perhaps to load in the data needed to operate on, and to send out the result of the complex operation), and [B] fit all memory needs to perform the calculation in 64k or so, depending on CPU cache sizes and how many pages of L1 cache it has.
It explains why one thread can observe a field having value A and another thread can observe the same field having a different value for DAYS on end if you're unlucky. If the cores aren't doing all that much, and the threads checking the values of those fields do so often enough, that page is never purged, and the 2 cores go on their merry way with their own local value for days. A CPU doesn't sync pages for funsies. It only does this if the page is the 'loser' and gets purged.
It explains why Spectre happened.
It explains why LinkedList is slower than ArrayList even in cases where basic fundamental informatics says it should be faster (big-O notation, analysing computational complexity). Because as long as the arraylist's stuff is limited to a single page you can more or less consider it all virtually instant - it takes about the same order of magnitude to fly through an entire page of on-die cache as it takes for that same CPU to wait around for a single cache miss. And LinkedList is horrible on this front: Every .add on it creates a tracker object (the linkedlist has to store the 'next' and 'prev' pointers somewhere!) so for every item in the linked list you have to read 2 objects (the tracker and the actual object), instead of just the one (as the arraylist's array is in contiguous memory, that page is worst-case scenario read into on-die once and remains active for your entire loop), and it's very easy to end up with the tracker object and the actual object being on different pages.
It explains the Java Memory Model rules: any line of code may or may not observe the effect of any other line of code on the value of any field, unless you have established a happens-before/happens-after relationship using any of the many rules set out in the JMM. That's to give the JVM the freedom to, you know, not run literally 1000x slower than necessary, because guaranteeing consistent reads/writes could only be done by flushing memory on every read, and that is 1000x slower than not doing that.
NB: I have massively oversimplified things. I do not have the skill to fully explain ~20 years of CPU improvements in a mere SO answer. However, it should explain a few things, and it is a marvellous thing to keep in mind as you try to analyse what happens when multiple java threads try to write/read to the same field and you haven't gone out of your way to make very very sure you have an HB/HA relationship between the relevant lines. If you're scared now, good. You shouldn't be attempting to communicate between 2 threads often, or even via fields, unless you really, really know what you are doing. Toss it through a message bus, use designs where the data flow is bounded to the start and end of the entire thread's process (make a job, initialize the job with the right data, toss it in an ExecutorPool queue, set up that you get notified when its done, read out the result, don't ever share anything whatsoever with the actual thread that runs it), or talk to each other via the database.
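The stale-value scenario described above is the classic motivation for volatile. Here is a small sketch; whether the non-volatile variant actually spins forever depends on the JIT and hardware, so treat it as illustrative rather than a guaranteed reproduction:

```java
// Visibility demo: without 'volatile', the reader thread may keep spinning on a
// stale cached value of 'stop'; with 'volatile', the write is guaranteed visible.
public class StopFlag {
    static volatile boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            while (!stop) { /* spin until the write becomes visible */ }
        });
        t.start();
        Thread.sleep(100);
        stop = true;   // this volatile write happens-before the reader's next read
        t.join(5000);  // with volatile, this returns promptly
        System.out.println(t.isAlive() ? "still spinning" : "stopped");
    }
}
```

Try removing `volatile` under a server JIT: the loop may be compiled into an infinite spin on a stale value, which is exactly the "days on end" case described above.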
I am comparing a Trie with a HashMap storing English words, over 1 million. After the data is loaded, only lookup is performed. I am writing code to test both speed and memory. The speed seems easy to measure, simply recording the system time before and after the testing code.
What's the way to measure the memory usage of an object? In this case, it's either a Trie or a HashMap. I watched the system performance monitor and tested in Eclipse. The OS performance monitor shows over 1 GB of memory used after my testing program is launched. I doubt that storing the data really needs so much memory.
Also, on my Windows machine, it shows that memory usage keeps rising throughout the testing time. This shouldn't happen, since the initial loading time of the data is short, and after that, during the lookup phase, there shouldn't be any additional memory consumption, since no new objects are created. On Linux, the memory usage seems more stable, though it also increased somewhat.
Would you please share some thoughts on this? Thanks a lot.
The short answer is: you can't.
The long answer is: you can estimate the size of objects in memory by repeating a differential memory analysis, calling GC multiple times before and after the tests. But even then, only a very large number of rounds can approximate the real size. You need a warmup phase first, and even if it all seems to work smoothly, you can get tripped up by the JIT and other optimizations you were not aware of.
In general, it's a good rule of thumb to count the number of objects you use.
If your trie implementation uses an object per node to represent the structure of the data, it is quite possible that your memory consumption is high compared to a map.
If you have a vast amount of data, a map might become slow because of collisions.
A common approach is to optimize later in case optimization is needed.
Did you try the "jps" tool, which is provided by Oracle in the Java SDK? You can find it in the JavaSDK/bin folder. It's a great tool for performance checking and even memory usage.
I am a student in Computer Science and I am hearing the word "overhead" a lot when it comes to programs and sorts. What does this mean exactly?
It's the resources required to set up an operation. It might seem unrelated to the task itself, but it's necessary.
It's like when you need to go somewhere, you might need a car. But, it would be a lot of overhead to get a car to drive down the street, so you might want to walk. However, the overhead would be worth it if you were going across the country.
In computer science, sometimes we use cars to go down the street because we don't have a better way, or it's not worth our time to "learn how to walk".
The meaning of the word can differ a lot with context. In general, it's resources (most often memory and CPU time) that are used, which do not contribute directly to the intended result, but are required by the technology or method that is being used. Examples:
Protocol overhead: Ethernet frames, IP packets and TCP segments all have headers, and TCP connections require handshake packets. Thus, you cannot use the entire bandwidth the hardware is capable of for your actual data. You can reduce the overhead by using larger packet sizes; UDP has a smaller header and no handshake.
Data structure memory overhead: A linked list requires at least one pointer for each element it contains. If the elements are the same size as a pointer, this means a 50% memory overhead, whereas an array can potentially have 0% overhead.
Method call overhead: A well-designed program is broken down into lots of short methods. But each method call requires setting up a stack frame, copying parameters and a return address. This represents CPU overhead compared to a program that does everything in a single monolithic function. Of course, the added maintainability makes it very much worth it, but in some cases, excessive method calls can have a significant performance impact.
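For the protocol case, the arithmetic is easy to sketch. The header sizes below assume plain Ethernet + IPv4 + TCP with no options:

```java
// Back-of-envelope protocol overhead for a full-sized Ethernet/IPv4/TCP frame.
public class ProtocolOverhead {
    static double headerPercent(int payloadBytes, int headerBytes) {
        return 100.0 * headerBytes / (payloadBytes + headerBytes);
    }

    public static void main(String[] args) {
        int headers = 14 + 20 + 20; // Ethernet + IPv4 + TCP, no options
        int payload = 1460;         // typical TCP MSS over Ethernet
        System.out.printf("header overhead: %.2f%%%n", headerPercent(payload, headers));
    }
}
```

So roughly 3.6% of every full-sized frame is headers; for tiny payloads the same 54 bytes of headers can dominate, which is why larger packets reduce overhead.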
You're tired and can't do any more work. You eat food. The energy spent looking for food, getting it and actually eating it is overhead!
Overhead is something wasted in order to accomplish a task. The goal is to make overhead very very small.
In computer science, let's say you want to print a number; that's your task. But storing the number, setting up the display to print it, calling routines to print it, and then accessing the number from a variable are all overhead.
Wikipedia has us covered:
In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal. It is a special case of engineering overhead.
Overhead typically refers to the amount of extra resources (memory, processor, time, etc.) that different programming algorithms take.
For example, the overhead of inserting into a balanced binary tree could be much larger than the same insert into a simple linked list (the insert will take longer and use more processing power to balance the tree, which results in a longer perceived operation time for the user).
For a programmer, overhead refers to those system resources which are consumed by your code when it's running on a given platform on a given set of input data. Usually the term is used in the context of comparing different implementations or possible implementations.
For example, we might say that a particular approach might incur considerable CPU overhead, while another might incur more memory overhead, and yet another might be weighted toward network overhead (and entail an external dependency, for example).
Let's give a specific example: Compute the average (arithmetic mean) of a set of numbers.
The obvious approach is to loop over the inputs, keeping a running total and a count. When the last number is encountered (signaled by "end of file" EOF, or some sentinel value, or some GUI button, whatever) we simply divide the total by the number of inputs and we're done.
This approach incurs almost no overhead in terms of CPU, memory or other resources. (It's a trivial task).
Another possible approach is to "slurp" the input into a list, iterate over the list to calculate the sum, then divide that by the number of valid items in the list.
By comparison this approach might incur arbitrary amounts of memory overhead.
In a particularly bad implementation, we might perform the sum operation using recursion but without tail-call elimination. Now, in addition to the memory overhead of our list, we're also introducing stack overhead (which is a different sort of memory, and often a more limited resource than other forms of memory).
Yet another (arguably more absurd) approach would be to post all of the inputs to some SQL table in an RDBMS, then simply call the SQL SUM function on that column of that table. This shifts our local memory overhead to some other server, and incurs network overhead and external dependencies on our execution. (Note that the remote server may or may not have any particular memory overhead associated with this task; it might shove all the values immediately out to storage, for example.)
Hypothetically we might consider an implementation over some sort of cluster (possibly to make the averaging of trillions of values feasible). In this case any necessary encoding and distribution of the values (mapping them out to the nodes) and the collection/collation of the results (reduction) would count as overhead.
We can also talk about the overhead incurred by factors beyond the programmer's own code. For example, compiling some code for 32- or 64-bit processors might entail greater overhead than one would see for an old 8-bit or 16-bit architecture. This might involve larger memory overhead (alignment issues), or CPU overhead (where the CPU is forced to adjust bit ordering or use non-aligned instructions, etc.), or both.
Note that the disk space taken up by your code and its libraries, etc. is not usually referred to as "overhead" but rather is called "footprint." Also, the base memory your program consumes (without regard to any data set that it's processing) is called its "footprint" as well.
Overhead is simply the extra time consumed in program execution. For example, when we call a function, control is passed to where it is defined, its body is executed, and control is then passed back to the former position. This makes the CPU run through a longer process, which costs performance time, hence overhead. One way to reduce this overhead is to declare the function inline, which copies the body of the function to the call site; then we don't pass control to some other location but continue the program in a straight line, hence "inline".
You could use a dictionary. The definition is the same. But to save you time, Overhead is work required to do the productive work. For instance, an algorithm runs and does useful work, but requires memory to do its work. This memory allocation takes time, and is not directly related to the work being done, therefore is overhead.
You can check Wikipedia, but mainly it's when more actions or resources are used. For example, if you are familiar with .NET, there you have value types and reference types. Reference types have memory overhead, as they require more memory than value types.
A concrete example of overhead is the difference between a "local" procedure call and a "remote" procedure call.
For example, with classic RPC (and many other remote frameworks, like EJB), a function or method call looks the same to a coder whether it's a local, in-memory call or a distributed, network call.
For example:
service.function(param1, param2);
Is that a normal method, or a remote method? From what you see here you can't tell.
But you can imagine that the difference in execution times between the two calls are dramatic.
So, while the core implementation will "cost the same", the "overhead" involved is quite different.
Think of the overhead as the time required to manage the threads and coordinate among them. It is a burden if the thread does not have enough work to do. In such a case, the overhead cost outweighs the time saved by using threading, and the code takes more time than the sequential version.
To answer you, I would give you an analogy of cooking Rice, for example.
Ideally, when we want to cook, we want everything to be available: the pots already clean, and rice available in sufficient quantity. If this is true, then we take less time to cook our rice (less overhead).
On the other hand, let's say you don't have clean water available immediately and you don't have rice, so you need to buy it from the shops first, and you also need to get clean water from the tap outside your house. These extra tasks are not part of the standard process; to cook rice you shouldn't have to spend so much time gathering your ingredients. Ideally, your ingredients would be present at the moment you want to cook.
So the time spent going to buy your rice from the shops and fetching water from the tap is overhead to cooking rice. It is a cost we can avoid or minimize, compared to the standard way of cooking rice (everything is around you; you don't have to waste time gathering your ingredients).
The time wasted in collecting ingredients is what we call the overhead.
In computer science, for example in multithreading, communication overhead between threads happens when threads have to take turns giving each other access to a certain resource, or when they pass information or data to each other. Overhead happens due to context switching. Even though this coordination is crucial, it is a waste of time (CPU cycles) compared to traditional single-threaded programming, where no time is spent on communication. A single-threaded program does the work straight away.
It's anything other than the data itself, i.e. TCP flags, headers, CRC, FCS, etc.
My Java application deals with large binary data files using memory mapped file (MappedByteBuffer, FileChannel and RandomAccessFile). It often needs to grow the binary file - my current approach is to re-map the file with a larger region.
It works, however there are two problems
Growing takes more and more time as the file becomes larger.
If growing is performed very rapidly (e.g. in a while(true) loop), the JVM will hang forever after the re-map operation has been performed about 30,000+ times.
What are the alternative approaches, and what is the best way to do this?
Also I cannot figure out why the second problem occurs. Please also suggest your opinion on that problem.
Thank you!
Current code for growing a file, if it helps:
(set! data (.map ^FileChannel data-fc FileChannel$MapMode/READ_WRITE
0 (+ (.limit ^MappedByteBuffer data) (+ DOC-HDR room))))
You probably want to grow your file in larger chunks. Double the size each time you remap, like a dynamic array, so that the cost of growing is an amortized constant.
I don't know why the remap hangs after 30,000 times, that seems odd. But you should be able to get away with a lot less than 30,000 remaps if you use the scheme I suggest.
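A minimal sketch of that doubling scheme (the class and method names are my own invention, not from the question's code):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Amortized-constant growth: double the mapped size on each remap, so N appends
// trigger only O(log N) remaps instead of O(N).
public class GrowableMap {
    private final FileChannel ch;
    private MappedByteBuffer buf;
    private long capacity;

    public GrowableMap(RandomAccessFile raf, long initialCapacity) throws IOException {
        this.ch = raf.getChannel();
        this.capacity = initialCapacity;
        this.buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
    }

    public void ensureCapacity(long needed) throws IOException {
        if (needed <= capacity) return;
        while (capacity < needed) capacity *= 2; // geometric growth keeps remaps rare
        // Note: the old mapping stays alive until it is garbage collected.
        buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
    }

    public long capacity() { return capacity; }
    public MappedByteBuffer buffer() { return buf; }
}
```

Mapping with READ_WRITE beyond the current file length extends the file, and pages only consume memory and disk once touched, so over-mapping is cheap.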
The JVM doesn't clean up memory mappings even if you call the cleaner explicitly. Thank you @EJP for the correction.
If you create 32,000 of these, they could all be in existence at once. BTW, I suspect you might be hitting some 15-bit limit.
The only solution for this is: don't create so many mappings. You can map an entire 4 TB disk with fewer than 4K mappings.
I wouldn't create a mapping smaller than 16 to 128 MB if you know the usage will grow, and I would consider up to 1 GB per mapping. The reason you can do this at little cost is that main memory and disk space are not allocated until you actually use the pages, i.e. the main memory usage grows 4 KB at a time.
The only reason I wouldn't create a 2 GB mapping is Java doesn't support this due to an Integer.MAX_VALUE size limit :( If you have 2 GB or more you have to create multiple mappings.
Unless you can afford an exponential growth on the file like doubling, or any other constant multiplier, you need to consider whether you really need a MappedByteBuffer at all, considering their limitations (unable to grow the file, no GC, etc). I personally would either be reviewing the problem or else using a RandomAccessFile in "rw" mode, probably with a virtual-array layer over the top of it.
I am implementing a program that has about 2,000,000 (2 million) arrays each of size 16,512 (128 x 129) of integers. I only need to call 200 arrays at a time (that is 3.3 MB), but I wonder if I can expand the program to have more than 2 million (say 200 million) but I still only need to call 200 arrays at a time. So what is the limit of making more and more arrays while I don't use more than 200 arrays at a time?
I highly doubt that, unless you're running on a 64-bit machine with a lot of RAM and a very generous heap.
Let's calculate the memory you'll need for your data, at 4 bytes per int:
2,000,000 * 128 * 129 * 4 / 1024 / 1024 / 1024 ≈ 123 GB.
You'll need additional RAM for the JVM, the rest of your program, and the operating system.
Sounds like a poorly conceived solution to me.
If you mean "I only have 200 arrays in memory at a time" you can certainly do that, but you'll have to move the rest out to secondary storage or a relational database. Query for them, use them, GC them. It might not be the best solution, but it's hard to tell based on the little you've posted.
Update:
Does "trigger" mean "database trigger"?
Yes, you can store them on the disk. I can't guarantee that it'll perform, but your hard drive can certainly handle that much data, and if it's large enough, it's feasible that it'll accommodate the expanded version too.
Just remember that you have to think about how you'll manage RAM. GC thrashing might be a problem. A good caching solution might be your friend here. Don't write one yourself.
What happens if that hard drive fails and you lose all that data? Do you back it up? Can your app afford to be down if the disk fails? Think about those scenarios, too. Good luck.
As long as you increase the max heap size to make sure your application doesn't run out of memory, you should be fine.
As long as you don't keep references to arrays you no longer need, there is no hard limit. Old arrays will automatically get garbage collected, so you can keep allocating and abandoning arrays pretty much ad infinitum.
There is, of course, a limit on how many arrays you can keep around at any given time. This is limited by the amount of memory available to the JVM.
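One way to enforce "no more than 200 arrays at a time" mechanically is a small LRU cache. This sketch uses the JDK's LinkedHashMap eviction hook; as noted above, a production caching library is still the better choice, so treat this only as an illustration of the idea:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounds how many arrays stay reachable at once; evicted entries become
// garbage-collectable. Sizes (200 arrays of 128x129 ints) mirror the question.
public class ArrayCache extends LinkedHashMap<Integer, int[][]> {
    private final int maxEntries;

    public ArrayCache(int maxEntries) {
        super(16, 0.75f, true); // access-order: least-recently-used entry is eldest
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, int[][]> eldest) {
        return size() > maxEntries; // evict once we exceed the cap
    }

    public static void main(String[] args) {
        ArrayCache cache = new ArrayCache(200);
        for (int i = 0; i < 1000; i++) {
            cache.put(i, new int[128][129]); // older arrays become unreachable
        }
        System.out.println(cache.size()); // prints 200
    }
}
```

On a cache miss you would load the array from secondary storage before putting it in, which keeps resident memory around 200 × 128 × 129 × 4 bytes ≈ 13 MB regardless of how many arrays exist on disk.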