Interested in Elasticsearch and working with txt files, not JSON. Can Elasticsearch support plain text files? If yes, is there any Java API I can use? (I tested CRUD operations with Postman on a JSON document and it works fine.)
Thanks for the help.
No, the Elasticsearch document API supports only JSON.
But there is a workaround for this problem using ingest pipelines running on ingest nodes in your cluster: https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html. By default, each Elasticsearch server instance is an ingest node.
Please have a look at this very well described approach for CSV https://www.elastic.co/de/blog/indexing-csv-elasticsearch-ingest-node which is easily adaptable to flat files.
Another option is to use a second tool like Filebeat or Logstash for file ingestion. Have a look here: https://www.elastic.co/products/beats or here https://www.elastic.co/products/logstash
Having a Filebeat in place will solve many problems with minimal effort. Give it a chance ;)
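To illustrate the pipeline idea, here is a minimal sketch using the 5.x-style low-level Java REST client; the pipeline name, index name and grok pattern are just assumptions you would adapt to your files:

```java
import org.apache.http.HttpHost;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.elasticsearch.client.RestClient;

import java.util.Collections;

public class PlainTextIndexer {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {

            // 1) Create an ingest pipeline (hypothetical name "txt-pipeline") that parses
            //    the raw line with grok - adapt the pattern to your file layout.
            String pipeline = "{ \"description\": \"parse flat file lines\","
                    + "  \"processors\": [ { \"grok\": {"
                    + "      \"field\": \"message\","
                    + "      \"patterns\": [\"%{WORD:user} %{NUMBER:duration:int}\"] } } ] }";
            client.performRequest("PUT", "/_ingest/pipeline/txt-pipeline",
                    Collections.emptyMap(),
                    new NStringEntity(pipeline, ContentType.APPLICATION_JSON));

            // 2) Wrap each text line in a tiny JSON envelope and index it through the pipeline.
            //    (Real code should JSON-escape the line before embedding it.)
            String line = "alice 42";  // one line read from your txt file
            String doc = "{ \"message\": \"" + line + "\" }";
            client.performRequest("POST", "/my-txt-index/doc?pipeline=txt-pipeline",
                    Collections.emptyMap(),
                    new NStringEntity(doc, ContentType.APPLICATION_JSON));
        }
    }
}
```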
Background
We have a web server written in Java that communicates with thousands of mobile apps via HTTPS REST APIs.
For investigation purposes we have to log all API calls - currently this is implemented as an @Aspect (AOP), and for each API call we save an api_call_log object into a MySQL table with the following attributes:
tenant_id
username
device_uuid
api_method
api_version
api_start_time
api_processing_duration
request_parameters
full_request (JSON)
full_response (JSON)
response_code
Problem
As you can imagine, after reaching a certain throughput this solution doesn't scale well, and querying this table is very slow even with the right MySQL indices.
Approach
That's why we want to use the Elastic Stack to re-implement this solution, however I am a bit stuck at the moment.
Question
I haven't found any Logstash plugin yet that suits my needs - should I write this api_call_log object to a log file instead and use Logstash to parse, filter and transform that file?
That is exactly what I would do in this case. Write your log to a file using a framework like Logback and rotate it. If you want easy parsing, use JSON as the logging format (also available in Logback). Then use Filebeat to ingest the logfile as it gets written. If you need to transform/parse the messages, do that on Elasticsearch ingest nodes using pipelines.
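To make the "one JSON object per line" idea concrete, here is a minimal sketch of the logging side, assuming SLF4J/Logback plus Jackson; the ApiCallLog class is hypothetical and just mirrors the attributes listed in the question:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ApiCallLogger {
    // Route this logger to a dedicated, rotated file appender in logback.xml.
    private static final Logger API_LOG = LoggerFactory.getLogger("api_call_log");
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public void log(ApiCallLog entry) {
        try {
            // One JSON object per line - easy for Filebeat to ship and for an
            // ingest pipeline (json processor) to parse on the Elasticsearch side.
            API_LOG.info(MAPPER.writeValueAsString(entry));
        } catch (Exception e) {
            // Never let logging break the actual API call.
        }
    }

    // Plain POJO with the attributes from the question (getters/setters omitted).
    public static class ApiCallLog {
        public String tenantId;
        public String username;
        public String deviceUuid;
        public String apiMethod;
        public String apiVersion;
        public long apiStartTime;
        public long apiProcessingDuration;
        public String requestParameters;
        public String fullRequest;
        public String fullResponse;
        public int responseCode;
    }
}
```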
Consider tagging/enriching the log entries read by Filebeat with machine- or environment-specific information so you can filter on them in your visualizations, reports, etc.
The Filebeat-to-Elasticsearch approach is the simplest one. Try this first. If you can't get your parsing done in Elasticsearch pipelines, put Logstash in between.
Using Filebeat you get a lot for free, such as backpressure handling and daily indices, which comes in very handy in the logging scenario we are discussing here.
When you need a visualization or search UI, have a look at Kibana or Grafana.
And if you have more questions, raise a new question here.
Have Fun!
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-installation.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html
I am working on a project which uses Elasticsearch 5.4 as a datastore. I need to update a field in all the documents in an index. We are using the Java RestClient for the connection between Java and Elasticsearch, and I am not able to find any documentation or resources for the Update API in RestClient 5.4.3.
Can someone please guide me on how to proceed from here?
Note : I cannot use Transport client.
Thanks
Did you try performing a POST request on the _update_by_query endpoint?
Please take a look at the Update By Query API:
The simplest usage of _update_by_query just performs an update on every document in the index without changing the source.
and
So far we’ve only been updating documents without changing their source. That is genuinely useful for things like picking up new properties but it’s only half the fun. _update_by_query supports a script object to update the document.
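For illustration, a minimal sketch of such a POST using the 5.4 low-level RestClient; the index name, field and script are made-up examples, and note that 5.x expects the script under "inline" (newer versions use "source"):

```java
import org.apache.http.HttpHost;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

import java.util.Collections;

public class UpdateByQueryExample {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Set (or overwrite) the field "status" on every document of the index.
            String body = "{"
                    + "  \"script\": { \"inline\": \"ctx._source.status = 'archived'\", \"lang\": \"painless\" },"
                    + "  \"query\": { \"match_all\": {} }"
                    + "}";
            Response response = client.performRequest(
                    "POST",
                    "/my-index/_update_by_query",
                    Collections.emptyMap(),
                    new NStringEntity(body, ContentType.APPLICATION_JSON));
            System.out.println(response.getStatusLine());
        }
    }
}
```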
We have a 3GB XML file which we have to validate and then flatten. We are expected to use Spark (Java) for both the validation and the flattening. The flattened data will be ingested into a Hive table.
Also, the validation should flag the bad records in the XML (so that we can write them back to a Kafka topic to make the source system aware of them), and bad records shouldn't be stored in the Hive table.
Flattening based on com.databricks.spark.xml is not recommended by the client.
Kindly help. If not code, an algorithm would also help.
You can use javax.xml.validation.Validator. This API will help you validate the XML.
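A minimal sketch of that approach, assuming you have an XSD for the records; wiring it into Spark (e.g. validating XML fragments per partition and routing failures to Kafka) is left out here:

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import java.io.File;
import java.io.StringReader;

public class XmlValidation {

    private final Schema schema;

    public XmlValidation(File xsdFile) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        schema = factory.newSchema(xsdFile);
    }

    /** Returns true if the XML fragment/record is valid, false for a bad record. */
    public boolean isValid(String xmlRecord) {
        try {
            // Validator is not thread-safe, so create one per call (or per Spark partition).
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xmlRecord)));
            return true;
        } catch (Exception e) {
            // SAXException -> schema violation: route this record to the Kafka "bad record" topic.
            return false;
        }
    }
}
```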
I have a Java class that sends HTTP POST requests to a Solr instance to index JSON files. It is implemented in a multithreaded manner. However, I have realized that sending so many HTTP requests (close to 20,000) is causing the network to be a bottleneck. I read online that I can do batch indexing, but I can't find any clear examples. Is there any advice on how to batch index in Solr?
Thank you.
For generic JSON, you must have a configuration somewhere in solrconfig.xml that defines how it is treated.
One of the parameters is split. You might be able to use it to combine your JSON documents into one bigger document that Solr would split and process separately. Note that the specific format may be a little different for different Solr versions, so get the correct version of the downloadable reference guide PDF if something is not working.
Or, if you can generate it, use the JSON format Solr understands directly, which has full support for multiple documents.
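As a sketch of that second option, you could combine many documents into a single JSON array and send one POST per batch to the /update handler; the core name and fields below are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class SolrBatchIndexer {

    private static final String UPDATE_URL =
            "http://localhost:8983/solr/mycore/update?commit=true";

    public static void main(String[] args) throws Exception {
        // Each element is one Solr document in Solr's native JSON format.
        List<String> docs = List.of(
                "{\"id\":\"1\",\"title\":\"first\"}",
                "{\"id\":\"2\",\"title\":\"second\"}");

        // Combine the documents into a single JSON array and send one request per batch
        // instead of one request per document.
        String batch = "[" + String.join(",", docs) + "]";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(UPDATE_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(batch))
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

If you are open to SolrJ, batching with client.add(Collection<SolrInputDocument>) followed by a commit achieves the same effect without hand-building JSON.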
Is there any way to insert XML file data into Cassandra?
Say the XML file has a lot of data, with a file size of about 5MB.
Is there an existing utility, or do I have to write some kind of Java parser?
FYI, I am using Cassandra 2.1.
What options have I got?
Either you can use the COPY...FROM command (Simple data importing and exporting with Cassandra) or the Cassandra Bulk Loader (Using the Cassandra Bulk Loader).
For the second one you need to write something like a Java parser.
For a large data file, I suggest you go with the Cassandra bulk loader.
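For the "Java parser" part, a minimal sketch using the standard StAX API that streams a hypothetical <record><id>...</id><name>...</name></record> layout into CSV lines you can feed to COPY ... FROM (or reuse the parsed values with the bulk loader):

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.PrintWriter;

public class XmlToCsv {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try (FileInputStream in = new FileInputStream("data.xml");
             PrintWriter out = new PrintWriter(new FileWriter("data.csv"))) {

            XMLStreamReader reader = factory.createXMLStreamReader(in);
            String currentElement = null;
            String id = null;
            String name = null;

            // Stream through the file so a 5MB (or much larger) XML never has to fit
            // into a DOM tree.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    currentElement = reader.getLocalName();
                } else if (event == XMLStreamConstants.CHARACTERS && currentElement != null) {
                    String text = reader.getText().trim();
                    if (text.isEmpty()) continue;
                    if ("id".equals(currentElement)) id = text;
                    if ("name".equals(currentElement)) name = text;
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if ("record".equals(reader.getLocalName())) {
                        out.println(id + "," + name);  // one CSV row per <record>
                        id = null;
                        name = null;
                    }
                    currentElement = null;
                }
            }
        }
    }
}
```

The resulting file can then be loaded in cqlsh with something like COPY mykeyspace.mytable (id, name) FROM 'data.csv';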