src/source/plugins.rst
changeset 99 b2be9a32f3fc
child 104 942151432421
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/src/source/plugins.rst	Thu Dec 06 08:24:10 2018 +0100
@@ -0,0 +1,209 @@
+.. _plugins:
+
+PyAMS additional features and services
+======================================
+
+
+Elasticsearch
++++++++++++++
+
+First, install Elasticsearch (ES); PyAMS is currently compatible with version 5.4. The Ingest attachment
+plug-in is also required to handle attachments correctly.
+
+Visit https://www.elastic.co/ to learn how to install the Elasticsearch server and the `ingest-attachment` plug-in.
+
+
+.. tip:: Documentation for installing Elasticsearch 5.4
+
+    - https://www.elastic.co/guide/en/elasticsearch/reference/5.4/gs-installation.html
+    - https://www.elastic.co/guide/en/elasticsearch/plugins/5.4/ingest-attachment.html
+
+
+Once Elasticsearch is installed, the following steps describe how to configure it for PyAMS.
+
+
+Initializing Elasticsearch index
+--------------------------------
+
+If you want to use an Elasticsearch index, you have to initialize index settings and mappings;
+Elasticsearch integration is defined through the *PyAMS_content_es* package.
+
+
+1. Enable service
+'''''''''''''''''
+
+In Pyramid INI application files (*etc/development.ini* and *etc/production.ini*):
+
+.. code-block:: ini
+
+    # Elasticsearch server settings
+    elastic.server = http://127.0.0.1:9200
+    elastic.index = pyams
+
+Where:
+ - **elastic.server**: address of Elasticsearch server; you can include authentication arguments in the form
+   *http://login:password@w.x.y.z:9200*
+ - **elastic.index**: name of Elasticsearch index.
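+
+As an illustration only, the two settings above can be read back and split into their components with the Python
+standard library. This is a minimal sketch, not PyAMS code: the ``[app:main]`` section name, the sample credentials
+and the ``read_elastic_settings`` helper are assumptions made for the example.
+

```python
import configparser
from urllib.parse import urlsplit

# Hypothetical INI fragment; the [app:main] section name is an assumption.
SAMPLE_INI = """
[app:main]
elastic.server = http://login:password@127.0.0.1:9200
elastic.index = pyams
"""


def read_elastic_settings(ini_text, section="app:main"):
    """Return the Elasticsearch settings of an INI document as a plain dict."""
    # interpolation=None keeps literal '%' characters in values untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    server = urlsplit(parser[section]["elastic.server"])
    return {
        "host": server.hostname,
        "port": server.port,
        "login": server.username,      # None when no credentials are given
        "password": server.password,   # None when no credentials are given
        "index": parser[section]["elastic.index"],
    }


settings = read_elastic_settings(SAMPLE_INI)
print(settings)
```

+
+Checking the parsed values this way can help catch a malformed server URL before the application starts.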
+
+
+On startup, the main PyAMS application process can start an *indexer* process which handles indexing requests
+asynchronously; this process is configured as follows:
+
+.. code-block:: ini
+
+    # PyAMS content Elasticsearch indexer process settings
+    pyams_content.es.tcp_handler = 127.0.0.1:5557
+    pyams_content.es.start_handler = false
+    pyams_content.es.allow_auth = admin:admin
+    pyams_content.es.allow_clients = 127.0.0.1
+
+Where:
+ - **pyams_content.es.tcp_handler**: IP address and listening port of PyAMS indexer process
+ - **pyams_content.es.start_handler**: if *true*, the indexer process is started on PyAMS startup; otherwise (typically
+   in a cluster configuration), the process is supposed to be started from another *master* server
+ - **pyams_content.es.allow_auth**: login and password to be used to connect to indexer process (settings are defined
+   in the same way on indexer process and on all its clients)
+ - **pyams_content.es.allow_clients**: list of IP addresses allowed to connect to indexer process.
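+
+To make the format of these values concrete, here is a hedged sketch of how they could be decomposed. The
+``parse_indexer_settings`` helper is hypothetical, and both the boolean handling and the separator used for several
+client addresses (whitespace or commas) are assumptions; PyAMS may parse these settings differently.
+

```python
import re

# Values copied from the INI example above, as a plain settings mapping.
SETTINGS = {
    "pyams_content.es.tcp_handler": "127.0.0.1:5557",
    "pyams_content.es.start_handler": "false",
    "pyams_content.es.allow_auth": "admin:admin",
    "pyams_content.es.allow_clients": "127.0.0.1",
}


def parse_indexer_settings(settings):
    """Split the indexer settings into typed components (illustrative only)."""
    prefix = "pyams_content.es."
    host, _, port = settings[prefix + "tcp_handler"].rpartition(":")
    login, _, password = settings[prefix + "allow_auth"].partition(":")
    # Assumption: several client addresses are separated by whitespace or commas
    clients = [c for c in re.split(r"[\s,]+", settings[prefix + "allow_clients"]) if c]
    return {
        "host": host,
        "port": int(port),
        # Assumption: only the literal string "true" enables the handler
        "start_handler": settings[prefix + "start_handler"].strip().lower() == "true",
        "login": login,
        "password": password,
        "allow_clients": clients,
    }


indexer = parse_indexer_settings(SETTINGS)
print(indexer)
```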
+
+
+2. Initialize Elasticsearch database
+''''''''''''''''''''''''''''''''''''
+
+Configuration files for the attachment pipeline, index settings and mappings are available in the
+`pyams_content_es` source package or in the PyAMS installation folder:
+
+
+.. code-block:: bash
+
+    (env) $ cd docs/elasticsearch
+    (env) $ curl --noproxy localhost -XPUT http://localhost:9200/_ingest/pipeline/attachment -d @attachment-pipeline.json
+
+
+Then, with ``elastic.index = pyams`` as the Elasticsearch index name, the index URL is *"http://localhost:9200/pyams"*:
+
+.. code-block:: shell
+
+    (env) $ curl -XDELETE http://localhost:9200/pyams
+
+    (env) $ curl -XPUT http://localhost:9200/pyams -d @index-settings.json
+
+    (env) $ curl -XPUT http://localhost:9200/pyams/WfTopic/_mapping  -d @mappings/WfTopic.json
+    (env) $ curl -XPUT http://localhost:9200/pyams/WfNewsEvent/_mapping -d @mappings/WfNewsEvent.json
+    (env) $ curl -XPUT http://localhost:9200/pyams/WfBlogPost/_mapping -d @mappings/WfBlogPost.json
+
+
+*Troubleshooting*: if you get a 406 error, try adding ``-H 'Content-Type: application/json'`` to the curl command lines.
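+
+The same mapping requests can also be prepared from Python, always setting the ``Content-Type`` header. The sketch
+below only builds the requests and does not send them; the ``build_put_request`` and ``mapping_requests`` helpers
+are hypothetical, and the empty mapping bodies are placeholders for the real ``mappings/*.json`` files.
+

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"
INDEX = "pyams"
CONTENT_TYPES = ("WfTopic", "WfNewsEvent", "WfBlogPost")


def build_put_request(url, body):
    """Build (but do not send) a PUT request carrying a JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def mapping_requests(mappings):
    """Yield one PUT request per content type, mirroring the curl calls."""
    for name in CONTENT_TYPES:
        yield build_put_request(f"{BASE_URL}/{INDEX}/{name}/_mapping", mappings[name])


# Placeholder bodies; the real ones come from the mappings/*.json files.
placeholder_mappings = {name: {"properties": {}} for name in CONTENT_TYPES}
requests = list(mapping_requests(placeholder_mappings))
for request in requests:
    print(request.get_method(), request.full_url)

# To actually send a request against a running server:
# urllib.request.urlopen(request)
```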
+
+
+3. Update index contents
+''''''''''''''''''''''''
+
+If your ZODB database already stores contents, you can update the Elasticsearch index with all of them using the
+``pyams_es_index`` command line script. From a shell:
+
+.. code-block:: bash
+
+    (env) $ ./bin/pyams_es_index ../etc/development.ini
+
+
+
+Natural Language Toolkit - NLTK
++++++++++++++++++++++++++++++++
+
+PyAMS uses NLTK features through the *PyAMS_catalog* package.
+
+.. seealso::
+
+    Visit https://www.nltk.org/ to learn more about NLTK
+
+
+Initializing NLTK (Natural Language ToolKit)
+--------------------------------------------
+
+Some NLTK collections, like the **tokenizers** and **stopwords** utilities, are used to index full-text content
+elements. You can extend NLTK indexing according to your own needs. This package requires several elements to be
+downloaded and configured, which is done as follows:
+
+
+*1. Run the Python shell in the PyAMS environment:*
+
+.. code-block:: bash
+
+    (env) $ ./bin/py
+
+
+*2. In the Python shell:*
+
+.. code-block:: pycon
+
+    >>> import nltk
+    >>> nltk.download()
+
+
+*3. Configure the installation directory:*
+
+.. tip::
+
+    On Debian GNU/Linux, you can choose any directory among '*~/nltk_data*' (where '~' is the home directory of the
+    user running the Pyramid application), '*/usr/share/nltk_data*', '*/usr/local/share/nltk_data*',
+    '*/usr/lib/nltk_data*' and '*/usr/local/lib/nltk_data*'
+
+    Please check if you have permission to write to this directory!
+
+
+.. code-block:: shell
+
+    NLTK Downloader
+    ---------------------------------------------------------------------------
+        d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
+    ---------------------------------------------------------------------------
+    Downloader> c
+
+    Data Server:
+      - URL: <https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml>
+      - 6 Package Collections Available
+      - 107 Individual Packages Available
+
+    Local Machine:
+      - Data directory: /home/tflorac/nltk_data
+
+    Config> d
+      New directory> /usr/local/lib/nltk_data
+
+
+*4. Return to the main menu:*
+
+.. code-block:: shell
+
+        ---------------------------------------------------------------------------
+            s) Show Config   u) Set Server URL   d) Set Data Dir   m) Main Menu
+        ---------------------------------------------------------------------------
+        Config> m
+
+
+*5. Download utilities:*
+
+    punkt
+        Punkt Tokenizer Models
+    stopwords
+        Stopwords Corpus
+
+
+.. code-block:: shell
+
+        ---------------------------------------------------------------------------
+            d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
+        ---------------------------------------------------------------------------
+        Downloader> d
+        Download which package (l=list; x=cancel)?
+          Identifier> punkt
+            Downloading package punkt to /usr/local/lib/nltk_data...
+        Downloader> d
+        Download which package (l=list; x=cancel)?
+          Identifier> stopwords
+            Downloading package stopwords to /usr/local/lib/nltk_data...
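+
+The interactive session above can also be scripted, since ``nltk.download()`` accepts a package identifier and a
+``download_dir`` argument. The following is a minimal sketch under that assumption; the ``download_nltk_packages``
+helper and its ``downloader`` parameter are hypothetical and only exist so that the loop can be tried without
+network access.
+

```python
def download_nltk_packages(packages, download_dir, downloader=None):
    """Download each NLTK package into download_dir; return a status per package."""
    if downloader is None:
        # Imported lazily so the helper can be exercised without NLTK installed
        import nltk
        downloader = nltk.download
    return {name: bool(downloader(name, download_dir=download_dir)) for name in packages}


# Real usage (requires NLTK, network access and write permission on the directory):
# download_nltk_packages(("punkt", "stopwords"), "/usr/local/lib/nltk_data")
```

+
+As with the interactive downloader, the target directory must be writable by the user running the script.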
+
+
+.. tip::
+
+    The full list of NLTK collections can be displayed with the ``l) List`` option.