libis/es_loader

Es-loader

Prepare records and load them into Elasticsearch

ruby ./src/create_bulk.rb

-c specify a config file to use (default is config.yml)

-d specify a list of directories containing the JSON records to upload to Elasticsearch

or inside the config file with parameter record_dirs_to_load

-p file pattern for the filenames of the JSON records to upload to Elasticsearch

or inside the config file with parameter record_pattern

-l filename of the log file

or inside the config file with parameter log_file

-t <load_type> action to execute (update, reload or reindex)

or inside the config file with parameter load_type

-e filename of the log file of the Elasticsearch client (default is /logs/es_client.log)

-u <LAST_RUN> time the command was last run; only records with a modification date after this time are loaded
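The way command-line flags and config-file parameters could combine is sketched below. This is a minimal illustration and not the project's actual code: it assumes command-line values take precedence over the config file, that -d takes a comma-separated list, and that -u maps to last_run_updates. The key es_client_log and the helper names parse_cli and effective_settings are hypothetical.

```ruby
require 'optparse'
require 'yaml'

# Defaults taken from the option list above; the key :es_client_log is hypothetical.
DEFAULTS = {
  config: 'config.yml',
  es_client_log: '/logs/es_client.log'
}.freeze

# Parse the documented flags from an argv array into a hash.
# Assumption: -d accepts a comma-separated list of directories.
def parse_cli(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.on('-c FILE')      { |v| options[:config] = v }
    opts.on('-d DIRS')      { |v| options[:record_dirs_to_load] = v.split(',') }
    opts.on('-p PATTERN')   { |v| options[:record_pattern] = v }
    opts.on('-l FILE')      { |v| options[:log_file] = v }
    opts.on('-t LOAD_TYPE') { |v| options[:load_type] = v }
    opts.on('-e FILE')      { |v| options[:es_client_log] = v }
    opts.on('-u LAST_RUN')  { |v| options[:last_run_updates] = v }
  end
  parser.parse(argv)
  options
end

# Merge defaults, config-file values and command-line values.
# Assumption: the command line wins over the config file.
def effective_settings(argv, config_yaml)
  cli = parse_cli(argv)
  config = YAML.safe_load(config_yaml, symbolize_names: true) || {}
  DEFAULTS.merge(config).merge(cli)
end
```

For example, `effective_settings(['-t', 'update'], "load_type: reindex\n")` yields a hash whose `:load_type` is `'update'`, because in this sketch the flag overrides the config value.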

The configuration determines

=> Elastic connection, cluster and index

=> mappings, settings, ...

The config file also contains the load_type

update, reload and reindex are the possible values

update:

loads all records matching :record_dirs_to_load and :record_pattern that have a modification time (File.mtime) later than :last_run_updates
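The selection rule for update can be sketched as follows. This is an illustration under assumptions, not the loader's implementation: it assumes directories are searched recursively and that :record_pattern is a regular expression matched against the file name (the examples further down pass values such as '.*00.json'). The helper name records_to_update is hypothetical.

```ruby
require 'time'

# Return the files under record_dirs whose name matches pattern and whose
# modification time (File.mtime) is later than last_run.
# Assumptions: recursive search, pattern is a regular expression.
def records_to_update(record_dirs, pattern, last_run)
  cutoff = last_run.is_a?(Time) ? last_run : Time.parse(last_run)
  regex = Regexp.new(pattern)

  record_dirs
    .flat_map { |dir| Dir.glob(File.join(dir, '**', '*')) }
    .select { |path| File.file?(path) }
    .select { |path| File.basename(path).match?(regex) }
    .select { |path| File.mtime(path) > cutoff }
    .sort
end
```

A call such as `records_to_update(['Twitter'], '.*00.json', '2021-03-15 16:00')` would return only the matching files modified after that timestamp.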

reload:

creates a new index and loads it with the records based on :record_dirs_to_load, :record_pattern and :last_run_updates

after loading and indexing, the alias is set and the old index is deleted
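The alias switch at the end of a reload can be pictured as a single request body for Elasticsearch's `POST /_aliases` endpoint, which applies its actions atomically. The sketch below only builds that body; whether the loader does it this way is an assumption, and the helper name alias_swap_body and the index names in the usage note are hypothetical. Deleting the old index would be a separate call afterwards.

```ruby
require 'json'

# Build the body for POST /_aliases that moves alias_name from old_index
# to new_index. old_index may be nil when the alias does not exist yet.
def alias_swap_body(alias_name, old_index, new_index)
  actions = []
  actions << { remove: { index: old_index, alias: alias_name } } if old_index
  actions << { add: { index: new_index, alias: alias_name } }
  { actions: actions }
end
```

`JSON.generate(alias_swap_body('records', 'records_v1', 'records_v2'))` produces the JSON payload that could be sent with any HTTP or Elasticsearch client.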

reindex:

creates a new index (settings, mappings, ...) and, after indexing, sets the alias to the es_index value.

?????? Delete by query

?????? direct_load ??????

Examples

=> ruby ./src/load_to_es.rb -c config.yml -d Twitter

=> ruby ./src/load_to_es.rb -c config.yml -t reindex

=> ruby ./src/load_to_es.rb -c config.yml -t update -d "GoPress/2021**" -u '2021-03-15 16:00' -p "demorgen"

=> /usr/local/bin/docker-compose -f docker-compose-opendistro.yml run elastic_loader ruby /app/src/load_to_es.rb -c es_loader_config_opendistro.yml -t update -d Twitter/twitter_user_query_00003/2020 -u '2021-04-01 12:00'

=> docker-compose run --rm es_loader_dev bash -c "cd /app/src/; ruby compare_indices.rb"

=> docker-compose run --rm es_loader bash -c "ruby /app/src/load_to_es.rb -c config.yml -d Twitter"

=> docker-compose run --rm es_loader_dev bash -c "cd /app/src/; ruby load_enrichments_to_es.rb -c config_google_ai_translation_sv_en.yml -t enrichtment -u 01-01-2024 -l ${logfile}"

ruby reindex_subset.rb -c reindex_config.yml

ruby /app/src/load_to_es.rb -c config_google_ai.yml -u 2020-01-01

=> local docker-compose run es_loader_dev bash -c "ruby /app/src/load_to_es.rb -c icandid_test.yml -t update -d scopeArchiv/kadoc_ead_query_0000001/ -p '.*00.json' -u '2000-01-01 12:00'"

docker-compose run es_loader_dev bash -c "ruby /app/src/load_to_es.rb -c config_google_ai_vision_api.yml -u '2000-01-01 12:00'"

docker-compose run es_loader_dev bash -c "ruby /app/src/extract_datamodel_from_mapping.rb -c config.yml"

ruby /app/src/load_to_es.rb -c tiktok_config.yml -d /records/tiktok/tiktok_query_0000002/new/

ruby /app/src/load_to_es.rb -c config_google_ai_video_intelligence_api_ena.yml -t enrichtment -u '2000-01-01 12:00'

ruby /app/src/load_to_es.rb -c config_whisper_turbo_ena.yml -t enrichtment -u '2000-01-01 12:00'

ruby /app/src/build_model.rb -c icandid_mapping.yml

Tests for iCANDID search API

rake api_tests API_KEY=************* INDEX=any,title,author --trace

Run all api tests (in parallel)

rake api_tests_parallel

About

load, update, insert, reindex, process
