I have an input text file that contains different commands that I need to do,
the commands have to be executed one by one, and I don't know how.
I thought of just reading the text file, putting the current line in a string, and then comparing it against all the commands, which is very inefficient
thanks
There are many ways to read commands from a file.
A lot depends on the format of commands and if they have parameters or not.
Here are possible solutions.
One command per row in a text file
Save the sequence of commands in the file, one per row. Read the file row by row and check each row against a list of known commands.
Pros:
Easy to implement
Cons:
Not easy to handle parameters
Difficult to handle blocks of commands
Difficult to handle jumps between commands
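A minimal sketch of this first approach in Java (the command names `HELLO`/`BYE` are invented for illustration; in a real program the reader would wrap a FileReader over your input file):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class CommandRunner {
    // Dispatch a single command line; the command names are just examples.
    static String execute(String line) {
        switch (line.trim()) {
            case "HELLO": return "hello executed";
            case "BYE":   return "bye executed";
            default:      return "unknown command: " + line;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stands in for a FileReader over the real command file.
        String fileContent = "HELLO\nBYE\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(fileContent))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(execute(line));
            }
        }
    }
}
```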
Commands saved as json objects
Hold the file as a text file containing a single JSON array where each item holds a command, possibly with parameters.
Pros:
Quite easy to implement using a library to parse JSON
Easy to handle parameters
Cons:
A list of commands as a JSON array is less readable than a structured programming language
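For example, such a command file might look like this (the command names and parameters here are invented for illustration):

```json
[
  { "command": "move",  "x": 10, "y": 20 },
  { "command": "wait",  "seconds": 5 },
  { "command": "print", "message": "done" }
]
```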
Create a parser and your own programming language
You can create your own programming language having only the details that you need.
Pros:
This solution fits virtually any need you might have
Easy to read, because you can choose whatever structure you like best
Fast execution
It is possible to handle typical programming constructs like loops, conditional statements, blocks of code...
Cons:
Very hard to implement: you need to define your own language and build a custom parser for it (for example, using ANTLR4)
Related
I currently have a large amount of information sorted into table form on Google Docs; an example can be seen below:
I would like to transfer all of this information into Google Spreadsheet form. With lines 1-5 going across columns B-F, respectively, and the information going underneath each respective column.
Would I need to use a script to accomplish this task? If so, what type of script should I use, and where can I access such a script (i.e. potentially find a freelance programmer who can write it for me, if necessary)? Are there any other ways this task could be accomplished? All of the information in the Google Docs file is very standardized, so there is no variation that could complicate a script. If a script could transfer one set of 5, it could work on all of the sets.
Thank you, let me know if you need any more information.
This can be done with a lot of different languages. I would approach it using Java, just because I am most familiar with it. I would start by downloading the Google Doc as plain text (.txt), then run through it line by line, parsing it into .csv format. From there you can import it directly into Google Sheets.
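A rough sketch of that line-by-line conversion, assuming each record is a run of non-blank lines separated by blank lines (adjust the record-splitting rule to your actual export):

```java
import java.util.ArrayList;
import java.util.List;

public class DocToCsv {
    // Assumes each record is a run of consecutive non-blank lines,
    // with records separated by blank lines (an assumption about the export).
    static List<String> toCsvRows(List<String> lines) {
        List<String> rows = new ArrayList<>();
        List<String> fields = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // Blank line ends the current record.
                if (!fields.isEmpty()) {
                    rows.add(String.join(",", fields));
                    fields.clear();
                }
            } else {
                fields.add(line.trim());
            }
        }
        if (!fields.isEmpty()) rows.add(String.join(",", fields)); // last record
        return rows;
    }
}
```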
You can do this with Notepad++ or an equivalent editor, using the Find and Replace tool in extended search mode.
For example, to replace a line break, search for \r\n and replace it with whatever you need.
If you can place a \t (tab) between fields, you can simply paste them into the sheet and they will align into columns.
So here you can first replace double line breaks with some placeholder symbol, then replace single line breaks with \t, and finally replace the placeholder with a single line break. That gives you all the data in column structure.
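The same three-step substitution can be sketched in Java, if you'd rather script it than do it in the editor (the placeholder character is an assumption; pick anything that cannot occur in your data):

```java
public class BreaksToTabs {
    // Mirrors the Notepad++ steps: protect double line breaks with a
    // placeholder, turn remaining single breaks into tabs, then restore
    // the protected breaks. "\u0001" is an arbitrary placeholder assumed
    // not to occur in the data.
    static String convert(String text) {
        return text.replace("\r\n\r\n", "\u0001")
                   .replace("\r\n", "\t")
                   .replace("\u0001", "\r\n");
    }
}
```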
So I am working on a GAE project. I need to look up cities, country names and country codes for sign-ups, LBS, etc.
Now I figured that putting all the information in the Datastore is rather wasteful, as it will be read quite frequently and would eat into my Datastore quota for no reason, especially since these lists aren't going to change, so it's pointless to put them in the Datastore.
Now that leaves me with a few options:
API - no budget for paid services, and the free ones are not exactly reliable.
Upload Parse-able file - Favorable option as I like the certainty that the data will always be there.
So I got the files needed from GeoNames (link has source files for all countries in case someone needs it). The file for each country is a regular UTF-8 tab delimited file which is great.
However, now that I have the option to choose how to format and access the data, the question is:
What is the best way to format and systematically retrieve data from a static file in a Java servlet container?
The best way being the fastest and least resource-hungry method.
Valid options:
TXT file, tab delimited
Static XML file
Java Class with Tons of enums
I know that importing the country files as Java enums and iterating over their values will be very fast, but do you think this is going to affect memory beyond reasonable limits? On the other hand, if I read the file line by line every time I need a record, the loop will go through a few thousand lines until it finds the required record: no memory issues, but incredibly slow. I have had some experience with parsing an Excel file in a Java servlet, and it took something like 20 seconds just to parse 250 records; at large scale, the response WILL time out (no doubt about it). So is XML anything like Excel in that respect?
Thank you very much guys !! Please provide opinions, all and anything is appreciated !
The easiest and fastest way would be to keep the file as a static web resource under the WEB-INF folder and, on application startup, have a context listener load the file into memory.
In memory it should be a Map, keyed by whatever you want to search by. This gives you constant access time.
Memory consumption only matters if the data is really big. A hundred thousand records, for example, are not worth optimizing when you access them that many times.
The static file should be plain text or CSV; those are read and parsed most efficiently. There is no need for XML formatting, as parsing it would be slow.
If the list is really big, you can break it up into multiple smaller files, and only parse those when they are required. A reasonable, easy partitioning would be to break it up by country, but any other partitioning would work (for example, by the first few characters of the name).
You could also consider building this Map in memory once, serializing it to a binary file, and including that binary file as a static resource; then you would only have to deserialize the Map, with no need to parse a text file and build the objects yourself.
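A sketch of the Map-building part (the `code,name` CSV layout is an assumption; in the real app the lines would come from the WEB-INF resource, loaded once in a ServletContextListener):

```java
import java.util.HashMap;
import java.util.Map;

public class CountryLookup {
    // In a real app you would read WEB-INF/countries.csv (an assumed file
    // name) in a ServletContextListener's contextInitialized() and keep
    // the resulting Map in the ServletContext or a static field.
    static Map<String, String> load(Iterable<String> csvLines) {
        Map<String, String> byCode = new HashMap<>();
        for (String line : csvLines) {
            String[] parts = line.split(",", 2); // assumed layout: code,name
            if (parts.length == 2) {
                byCode.put(parts[0].trim(), parts[1].trim());
            }
        }
        return byCode; // constant-time lookups by country code
    }
}
```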
Improvements on the data file
An alternative to having the static resource file as a text/CSV file or a serialized Map data file would be to have it as a binary data file in your own custom file format.
Using DataOutputStream you can write data to a binary file in a very compact and efficient way. Then you could use DataInputStream to load data from this custom file.
This solution has the advantage that the file can be much smaller (compared to plain text / CSV / a serialized Map), and loading it would be much faster (because DataInputStream doesn't parse numbers from text, for example; it reads the bytes of a number directly).
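A small sketch of such a custom binary format using DataOutputStream/DataInputStream (the record layout is invented for illustration, and byte-array streams stand in for the real file streams):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class BinaryCountryFile {
    // Writes (code, name) pairs in a compact custom binary format.
    // In the real app the streams would wrap FileOutputStream/FileInputStream.
    static byte[] write(String[][] records) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(records.length);   // record-count header
            for (String[] r : records) {
                out.writeUTF(r[0]);         // country code
                out.writeUTF(r[1]);         // country name
            }
        }
        return buf.toByteArray();
    }

    static String[][] read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String[][] records = new String[in.readInt()][2];
            for (String[] r : records) {
                r[0] = in.readUTF();
                r[1] = in.readUTF();
            }
            return records;
        }
    }
}
```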
Hold the data in source form as XML. At start of day, or when it changes, read it into memory: that's the only time you incur the parsing cost. There are then two main options:
(a) your in-memory form is still an XML tree, and you use XPath/XQuery to query it.
(b) your in-memory form is something like a Java HashMap
If the data is very simple then (b) is probably best, but it only allows you to do one kind of query, which is hard-coded. If the data is more complex or you have a variety of possible queries, then (a) is more flexible.
This is my second post and I am getting used to the function of things on here now!
This is more of a theory question for computer science, but what does the following mean?
'Parsing a text file or data stream'
This is an assignment, and the books and web sources I have consulted are either old or vague. I have implemented the Serializable interface on a SinglyLinkedList, which saves/loads the list to/from disk so it can be transferred, edited and accessed later on. Does this qualify as a sufficient achievement of the rather vague requirement?
things to note when considering this question:
this requirement is one of many for a project I am doing
the Singly Linked List I am using is custom made - I know, the premade Java one is better, but I must show my skills
all the methods work - I have tested them - it's just a matter of documentation
I am using ObjectOutputStream, FileOutputStream, ObjectInputStream and FileInputStream and the respective methods to read/write the Singly linked list object
I would appreciate any feedback
The process of "parsing" can be described as reading in a data stream of some sort and building an in-memory model or representation of the semantic content of that data, in order to facilitate performing some kind of transformation on the data.
Some examples:
A compiler parses your source code to (usually) build an abstract syntax tree of the code, with the objective of generating object- (or byte-) code for execution by a machine.
An interpreter does the same thing but the syntax tree is then directly used to control execution (some interpreters are a mashup of byte-code generators and virtual machines and may generate intermediate byte-code).
A CSV parser reads a stream structured according to the rules of CSV (commas, quoting, etc) to extract the data items represented by each line in the file.
A JSON or XML parser does a similar operation for JSON- or XML-encoded data, building an in-memory representation of the semantic values of the data items and their hierarchical inter-relationships.
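As a tiny illustration of the CSV case, a naive field splitter might look like this (a sketch only; real CSV parsers also handle escaped quotes, embedded newlines, and other edge cases):

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleCsv {
    // A minimal CSV line parser handling commas and double-quoted fields.
    // Deliberately naive: no escaped quotes, no multi-line records.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;          // toggle quoted state
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());         // last field
        return fields;
    }
}
```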
I need to parse complex (non-fixed-length) CSV files into Java objects in order to compare their values.
I first tried the Flatform Parsing Framework; I liked the approach of describing the values in an extra (XML) document. Maybe it's the right tool for simple CSV (and other flat) files. Nevertheless, my CSV files contain lines that vary in the number of fields, and sometimes they span multiple lines. There are also dependencies among those fields.
Here's a little sample (each type has a certain number of extra parameters):
; <COMMENTS (to be ignored)>
<NAME>,<TYPE_A>,<DESCRIPTION>,<PARAMETER>
<NAME>,<TYPE_B>,<DESCRIPTION>,<PARAMETER>,<PARAMETER>
<NAME>,<TYPE_C>,<DESCRIPTION>,<PARAMETER>,<PARAMETER>,<PARAMETER>,<PARAMETER>
<NAME>,<TYPE_D>,<DESCRIPTION>,<PARAMETER>,<PARAMETER>,<PARAMETER>,<PARAMETER>, -
<PARAMETER>,<PARAMETER>, -
<PARAMETER>,<PARAMETER>
<NAME>,<TYPE_B>,<DESCRIPTION>,<PARAMETER>,<PARAMETER>
<NAME>,<TYPE_A>,<DESCRIPTION>,<PARAMETER>
So I need something to describe and parse the CSV file in a more complex manner. I'm new to this; I've heard about parser generators - is that what I need?
Try OpenCSV (see http://opencsv.sourceforge.net/#what-features). It handles embedded carriage returns just fine.
One option is to use the Scanner class, or you might want to check out Spring Batch. I've never actually used it, but given that batch jobs often read from simple text files, I believe it caters for this, including all sorts of object mapping.
You may also try japaki
For my project, I need to store info about protocols (the data sent, most likely integers, and the order it's sent in), as well as info that might be formatted something like this:
'ID' 'STRING' 'ADDITIONAL INTEGER DATA'
This info will be read by a Java program and stored in memory for processing, but I don't know what would be the most sensible format to store this data in?
EDIT: Here's some extra information:
1) I will be using this data in a game server.
2) Since it is a game server, speed is not the primary concern; this data will primarily be read and utilized during startup, which shouldn't occur very often.
3) Memory consumption I would like to keep at a minimum, however.
4) The second data "example" will be used as a "dictionary" to look up the names of specific in-game items, their stats and other integer data (and might therefore become very large, unlike the first data set containing the protocol information, where each file will only describe small protocol bits, like a login protocol for instance).
5) And yes, I would like the data to be "human-editable".
EDIT 2: Here's the choices that I've made:
JSON - For the protocol descriptions
CSV - For the dictionaries
There are many factors that could come into play; here are things that might help you figure this out:
1) Speed/memory usage: If the data needs to load very quickly or is very large, you'll probably want to consider rolling your own binary format.
2) Portability/compatibility: Balanced against #1 is the consideration that you might want to use the data elsewhere, with programs that won't read a custom binary format. In this case, your heavy hitters are probably going to be CSV, dBase, XML, and my personal favorite, JSON.
3) Simplicity: Delimited formats like CSV are easy to read, write, and edit by hand. Either use double-quoting with proper escaping or choose a delimiter that will not appear in the data.
If you could post more info about your situation and how important these factors are, we might be able to guide you further.
How about XML, JSON or CSV?
I've written a similar protocol-specification using XML. (Available here.)
I think it is a good match, since it captures the hierarchical nature of specifying messages / network packets / fields etc. The order of fields is well defined, and so on.
I even wrote a code generator, in XSLT, that generated the message sending/receiving classes with methods for each message type.
The only drawback, as I see it, is the verbosity. If the structure of your specification is really simple, I would suggest using a simple home-brewed format and writing a parser for it with a parser generator of your choice.
In addition to the formats suggested by others here (CSV, XML, JSON, etc.) you might consider storing the info in a Java properties file. (See the java.util.Properties class.) The code is already there for you, so all you have to figure out is the properties names (or name prefixes) you want to use.
The Properties class also provides for storing/loading properties in a simple XML format.
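A minimal sketch of the properties-file approach (the `item.<id>.<field>` key naming scheme is just one possible convention, not anything prescribed by the class):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ItemProps {
    // Loads properties from text; in a real app you would pass a
    // FileReader or an InputStream from the classpath instead.
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical key scheme: item.<id>.name / item.<id>.damage
        String file = "item.1.name=Sword\nitem.1.damage=7\n";
        Properties props = load(file);
        System.out.println(props.getProperty("item.1.name"));   // Sword
        System.out.println(props.getProperty("item.1.damage")); // 7
    }
}
```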