diff -r 05a4f2c07b84 -r e7d62e94392f src/source/plugins.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/src/source/plugins.rst Wed May 16 14:03:44 2018 +0200
@@ -0,0 +1,209 @@
+.. _plugins:
+
+PyAMS additional features and services
+======================================
+
+
+Elasticsearch
++++++++++++++
+
+First, you need to install Elasticsearch (ES); PyAMS is currently compatible with version 5.4. The Ingest attachment
+plug-in is also required to handle attachments correctly.
+
+Visit https://www.elastic.co/ to learn how to install the Elasticsearch server and the `ingest-attachment` plug-in.
+
+
+.. tip:: Documentation for installing Elasticsearch 5.4
+
+    - https://www.elastic.co/guide/en/elasticsearch/reference/5.4/gs-installation.html
+    - https://www.elastic.co/guide/en/elasticsearch/plugins/5.4/ingest-attachment.html
+
+
+Once Elasticsearch is installed, the following steps describe how to configure it with PyAMS.
+
+
+Initializing Elasticsearch index
+--------------------------------
+
+If you want to use an Elasticsearch index, you have to initialize index settings and mappings;
+Elasticsearch integration is defined through the *PyAMS_content_es* package.
+
+
+1. Enable service
+'''''''''''''''''
+
+In Pyramid INI application files (*etc/development.ini* and *etc/production.ini*):
+
+.. code-block:: ini
+
+    # Elasticsearch server settings
+    elastic.server = http://127.0.0.1:9200
+    elastic.index = pyams
+
+Where:
+  - **elastic.server**: address of the Elasticsearch server; you can include authentication arguments in the form
+    *http://login:password@w.x.y.z:9200*
+  - **elastic.index**: name of the Elasticsearch index.
+
+
+On startup, the main PyAMS application process can start an *indexer* process which handles indexing requests
+asynchronously; this process's settings are defined like this:
+
+.. code-block:: ini
+
+    # PyAMS content Elasticsearch indexer process settings
+    pyams_content.es.tcp_handler = 127.0.0.1:5557
+    pyams_content.es.start_handler = false
+    pyams_content.es.allow_auth = admin:admin
+    pyams_content.es.allow_clients = 127.0.0.1
+
+Where:
+  - **pyams_content.es.tcp_handler**: IP address and listening port of the PyAMS indexer process
+  - **pyams_content.es.start_handler**: if *true*, the indexer process is started on PyAMS startup; otherwise (typically
+    in a cluster configuration), the process is supposed to be started from another *master* server
+  - **pyams_content.es.allow_auth**: login and password used to connect to the indexer process (settings are defined
+    in the same way on the indexer process and on all its clients)
+  - **pyams_content.es.allow_clients**: list of IP addresses allowed to connect to the indexer process.
+
+
+2. Initialize Elasticsearch database
+''''''''''''''''''''''''''''''''''''
+
+Configuration files for the attachment pipeline, index settings and mappings are available in the `pyams_content_es`
+source package or in the PyAMS installation folder:
+
+
+.. code-block:: bash
+
+    (env) $ cd docs/elasticsearch
+    (env) $ curl --noproxy localhost -XPUT http://localhost:9200/_ingest/pipeline/attachment -d @attachment-pipeline.json
+
+
+With ``elastic.index = pyams`` as the Elasticsearch index name, the index URL is *http://localhost:9200/pyams*:
+
+.. code-block:: shell
+
+    (env) $ curl -XDELETE http://localhost:9200/pyams
+
+    (env) $ curl -XPUT http://localhost:9200/pyams -d @index-settings.json
+
+    (env) $ curl -XPUT http://localhost:9200/pyams/WfTopic/_mapping -d @mappings/WfTopic.json
+    (env) $ curl -XPUT http://localhost:9200/pyams/WfNewsEvent/_mapping -d @mappings/WfNewsEvent.json
+    (env) $ curl -XPUT http://localhost:9200/pyams/WfBlogPost/_mapping -d @mappings/WfBlogPost.json
+
+
+*Troubleshooting*: if you get a 406 error, try adding ``-H 'Content-Type: application/json'`` to the curl command lines.
+
+
+3. Update index contents
+''''''''''''''''''''''''
+
+If your ZODB database already stores contents, you can update the Elasticsearch index with all these contents using the
+``pyams_es_index`` command line script. From a shell:
+
+.. code-block:: bash
+
+    (env) $ ./bin/pyams_es_index ../etc/development.ini
+
+
+
+Natural Language Toolkit - NLTK
++++++++++++++++++++++++++++++++
+
+PyAMS uses NLTK features through the *PyAMS_catalog* package.
+
+.. seealso::
+
+    Visit https://www.nltk.org/ to learn more about NLTK
+
+
+Initializing NLTK (Natural Language ToolKit)
+--------------------------------------------
+
+Some NLTK collections like **tokenizers** and **stopwords** utilities are used to index full-text content
+elements. You can extend NLTK indexing according to your own needs. This package requires downloading and
+configuring several elements, which is done as follows:
+
+
+*1. Run the Python shell in the PyAMS environment:*
+
+.. code-block:: bash
+
+    (env) $ ./bin/py
+
+
+*2. In the Python shell:*
+
+.. code-block:: pycon
+
+    >>> import nltk
+    >>> nltk.download()
+
+
+*3. Configure the installation directory:*
+
+.. tip::
+
+    On Debian GNU/Linux, you can choose any directory among '*~/nltk_data*' (where '~' is the home directory of the user
+    running the Pyramid application), '*/usr/share/nltk_data*', '*/usr/local/share/nltk_data*', '*/usr/lib/nltk_data*' and
+    '*/usr/local/lib/nltk_data*'
+
+    Please check that you have permission to write to this directory!
+
+
+.. code-block:: shell
+
+    NLTK Downloader
+    ---------------------------------------------------------------------------
+        d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
+    ---------------------------------------------------------------------------
+    Downloader> c
+
+    Data Server:
+      - URL:
+      - 6 Package Collections Available
+      - 107 Individual Packages Available
+
+    Local Machine:
+      - Data directory: /home/tflorac/nltk_data
+
+    Config> d
+    New directory> /usr/local/lib/nltk_data
+
+
+*4. Return to the main menu:*
+
+.. code-block:: shell
+
+    ---------------------------------------------------------------------------
+        s) Show Config   u) Set Server URL   d) Set Data Dir   m) Main Menu
+    ---------------------------------------------------------------------------
+    Config> m
+
+
+*5. Download utilities:*
+
+    punkt
+        Punkt Tokenizer Models
+    stopwords
+        Stopwords Corpus
+
+
+.. code-block:: shell
+
+    ---------------------------------------------------------------------------
+        d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
+    ---------------------------------------------------------------------------
+    Downloader> d
+    Download which package (l=list; x=cancel)?
+    Identifier> punkt
+    Downloading package punkt to /usr/local/lib/nltk_data...
+    Downloader> d
+    Download which package (l=list; x=cancel)?
+    Identifier> stopwords
+    Downloading package stopwords to /usr/local/lib/nltk_data...
+
+
+.. tip::
+
+    The full list of NLTK collections can be displayed with the ``l) List`` option.