standardising an array of vectors

standardising an array of vectors - java

I have an ArrayList of lists of vectors.
Each list of vectors has three elements with an undefined amount of vectors i.e
x,y,z
x1,y1,z1
....N....
xN,yN,zN
But each list I have varies in length 'N' I was just wondering what would be a standard for making them all the same length 'N'. Initially I tried a sparse sampling approach but that didnt work as I miss alot of data and I need to keep as much data as possible. Are there any other methods?

Related

How to implement a HashMap data structure in Java?

I want to implement a HashMap data structure, but I can't quite figure out what to do with underlying array structure.
If I'm not mistaken, in HashMap each key is hashed and converted into an integer which is used to refer to the array index. Search time is O(1) because of direct referring.
Let's say K is key and V is value. We can create an array of size n of V type and it will reside in the index produced by hash(K) function. However, hash(K) doesn't produce consecutive indices and Java's arrays aren't sparse and to solve this we can create a very large array to hold elements, but it won't be efficient, it will hold a lot of NULL elements.
One solution is to store elements in consecutive order, and for finding an element we have to search the entire array and check elements' keys but it will be linear time. I want to achieve direct access.
Thanks, beforehand.

Borrowed from the Wikipedia article on hash tables, you can use a smaller array for underlying storage by taking the hash modulo the array size like so:
hash = hashfunc(key)
index = hash % array_size
This is a good solution because you can keep the underlying array relatively dense, you don't have to modify the hash funciton, and it does not affect the time complexity.

You can look at the source code for all your problems.
For the sake of not going through a pretty large code. The solution to your problem is to use modulo. You can choose n=64, then store an element x with h(x) mod 64 = 2 in the second position of the array, ...
If two elements have the same hash modulo n, then you store them next to each other (usually done in a tree map). Another solution would be to increase n.

Difference between RowMatrix and Matrix in Apache Spark?

I want to know the basic difference between RowMatrix and Matrix class available in Apache Spark.

A little bit more precise question here would be what is a difference between mllib.linalg.Matrix and mllib.linalg.distributed.DistributedMatrix.
Matrix is a trait which represents local matrices which reside in a memory of a single machine. For now there are two basic implementations: DenseMatrix and SparseMatrix.
DistributedMatrix is a trait which represents distributed matrices build on top of RDD. RowMatrix is a subclass of a DistributedMatrix which stores data in a row-wise manner without meaningful row ordering. There are other implementations of DistributedMatrix (like IndexedRowMatrix, CoordinateMatrix and BlockMatrix) each with its own storage strategy and specific set of methods. See for example Matrix Multiplication in Apache Spark

This is going to come down a little to the idioms of the language / framework / discipline you're using, but in computer science, an array is a one dimensional "list" of "things" that can be referenced by their position in the list. One of the things that can be in the list is another array which let you make arrays of arrays (of arrays of arrays ...) giving you a data set arbitrarily large dimension.
A matrix comes from linear algebra and is a two dimensional representation of data (which can be represented by an array of arrays) that comes with a powerful set of mathematical operations that allows you to manipulate the data in interesting ways. While arrays can vary in size, the width and height of a matrix is generally know based on the specific type of operations you're going to perform.
Matrixes are used extensively in 3d graphics and physics engines because they are a fast, convenient way of representing transformation and acceleration data for objects in three dimensions.
Array : Collection of homogeneous elements.
Matrix : A simple row and column thing.
Both are different things in different spaces.
But in computer programming, a collection of single dimensions array can be termed as matrix.
You can represent an 2d Array(i.e, collection of single dimension arrays) in matrix form.
Example
A[2][3] : This means A is a collection of 2 single dimension arrays
each of size 3.
A[1,1] A[1,2] A[1,3] //This is a single dimensional
array
A[2,1] A[2,2] A[2,3] //This is another single dimensional array
//The collection is a multi-dimensional or 2d Array.

Implementing efficient data structure using Arrays only

As part of my programming course I was given an exercise to implement my own String collection. I was planning on using ArrayList collection or similar but one of the constraints is that we are not allowed to use any Java API to implement it, so only arrays are allowed. I could have implemented this using arrays however efficiency is very important as well as the amount of data that this code will be tested with. I was suggested to use hash tables or ordered tress as they are more efficient than arrays. After doing some research I decided to go with hash tables because they seemed easy to understand and implement but once I started writing code I realised it is not as straight forward as I thought.
So here are the problems I have come up with and would like some advice on what is the best approach to solve them again with efficiency in mind:
ACTUAL SIZE: If I understood it correctly hash tables are not ordered (indexed) so that means that there are going to be gaps in between items because hash function gives different indices. So how do I know when array is full and I need to resize it?
RESIZE: One of the difficulties that I need to create a dynamic data structure using arrays. So if I have an array String[100] once it gets full I will need to resize it by some factor I decided to increase it by 100 each time so once I would do that I would need to change positions of all existing values since their hash keys will be different as the key is calculated:
int position = "orange".hashCode() % currentArraySize;
So if I try to find a certain value its hash key will be different from what it was when array was smaller.
HASH FUNCTION: I was also wondering if built-in hashCode() method in String class is efficient and suitable for what I am trying to implement or is it better to create my own one.
DEALING WITH MULTIPLE OCCURRENCES: one of the requirements is to be able to add multiple words that are the same, because I need to be able to count how many times the word is stored in my collection. Since they are going to have the same hash code I was planning to add the next occurrence at the next index hoping that there will be a gap. I don't know if it is the best solution but here how I implemented it:
public int count(String word) {
int count = 0;
while (collection[(word.hashCode() % size) + count] != null && collection[(word.hashCode() % size) + count].equals(word))
count++;
return count;
}
Thank you in advance for you advice. Please ask anything needs to be clarified.
P.S. The length of words is not fixed and varies greatly.
UPDATE Thank you for your advice, I know I did do few stupid mistakes there I will try better. So I took all your suggestions and quickly came up with the following structure, it is not elegant but I hope it is what you roughly what you meant. I did have to make few judgements such as bucket size, for now I halve the size of elements, but is there a way to calculate or some general value? Another uncertainty was as to by what factor to increase my array, should I multiply by some n number or adding fixed number is also applicable? Also I was wondering about general efficiency because I am actually creating instances of classes, but String is a class to so I am guessing the difference in performance should not be too big?

ACTUAL SIZE: The built-in Java HashMap just resizes when the total number of elements exceeds the number of buckets multiplied by a number called the load factor, which is by default 0.75. It does not take into account how many buckets are actually full. You don't have to, either.
RESIZE: Yes, you'll have to rehash everything when the table is resized, which does include recomputing its hash.
So if I try to find a certain value it's hash key will be different from what it was when array was smaller.
Yup.
HASH FUNCTION: Yes, you should use the built in hashCode() function. It's good enough for basic purposes.
DEALING WITH MULTIPLE OCCURRENCES: This is complicated. One simple solution would just be to have the hash entry for a given string also keep count of how many occurrences of that string are present. That is, instead of keeping multiple copies of the same string in your hash table, keep an int along with each String counting its occurrences.

So how do I know when array is full and I need to resize it?
You keep track of the size and HashMap does. When the size used > capacity * load factor you grow the underlying array, either as a whole or in part.
int position = "orange".hashCode() % currentArraySize;
Some things to consider.
The % of a negative value is a negative value.
Math.abs can return a negative value.
Using & with a bit mask is faster however you need a size which is a power of 2.
I was also wondering if built-in hashCode() method in String class is efficient and suitable for what I am trying to implement or is it better to create my own one.
The built in hashCode is cached, so it is fast. However it is not a great hashCode and has poor randomness for lower bit, and higher bit for short strings. You might want to implement your own hashing strategy, possibly a 64-bit one.
DEALING WITH MULTIPLE OCCURRENCES:
This is usually done with a counter for each key. This way you can have say 32767 duplicates (if you use short) or 2 billion (if you use int) duplicates of the same key/element.

Best way to store and retrieve the distance between cities?

Hi I am new to Java so please using basic and simple Java methods that will help me quickly understand your idea.
Problem: I have n cities (each city has a unique names) and they are all connected to each other so that there is a distance between any 2 cities.
What is the best way to store those distances so later if I use name of 2 cities (since name is unique) I can retrieve distance between them?
I was thinking about using 2-D array but it doesn't seem like a good idea (possible duplication distance between A - B and B - A, also can't using city names) does it?
Why did somebody give thumb-downs to this question?

Two possibilities to add to your own Idea
HashMap of HashMap - heavier than 2D arrays, but provide ease of use, by city names directly.
Alternatively, make an enum with the names of the cities, and use the enum to index into the 2D array.
2DArray of variable dimension (non rectangular) - Each row can have a different size, store only half of the full matrix and derive the other half as needed. eg of creating a non rectangular array

If the distance between two cities is just the geometric distance, you only need to store the coordinates of each city.
Otherwise, store the distances in a N*N matrix, and keep an String[N] with the names.

Look in to the concept of collections in java. This should help you implement what you are looking for:
http://docs.oracle.com/javase/tutorial/collections/index.html

Storing and Retrieving points (related x and y) quickly in Java. List vs Array

I am attempting to recreate a board game in Java which involves me storing a set of valid places pieces can be placed (for the AI). I thought that perhaps instead of storing as a list of Points, it would be run-time faster if I had an array/list/dictionary of the X coordinates in which there was an array/list of the y coordinates, so once you found the x coordinate you would only have to check its Ys not all the remaining points'.
The trouble I have is that i must change the valid points often. I came up with some possible solutions but have difficulty picking/implementing them:
HashMap < Integer, ArrayList > with X as an integer key and the Ys as an ArrayList.
Problem: I would have to create a new ArrayList every time I add an X.
Also I am unsure about runtime performance of HashMap.
int[X][Y] array initialized to the board size with each point set to its relative location (point 2,3 sets[2][3]) unset point being an invalid integer.
Problem: I would have to iterate through all the points and check every point.
List of Points This would simply be a Linked/Array List of Points.
Problem: Lists are slower than arrays.
How would using a Linked list of Points compare to checking the whole array like above?
Perhaps I should use a 2d linked list? What would be the fastest runtime way to do this?

You're worrying about the wrong things. Accessing collection/map/array items is extremely fast. The graphical part will be way more performance-sensitive. Just use whatever data structure is most natural. It's unlikely that you're going to be storing enough items to really matter anyway. Build it first, then figure out where your performance problems really are.

if you use an ArrayList of Points you have nearly the same performance as with an array (in Java)
and I think this is the fastest solution, because as you already mentioned you have to iterate through the complete int-array and a HashMap and the relying ArrayLists have to be changed depending on changing/adding coordinates

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.