.. _plugins:

PyAMS additional features and services
======================================


Elasticsearch
+++++++++++++

First, you need to install Elasticsearch (ES); PyAMS is currently compatible with version 5.4. The Ingest attachment
plug-in is also required to handle attachments correctly.

Visit https://www.elastic.co/ to learn how to install the Elasticsearch server and the ``ingest-attachment`` plug-in.


.. tip:: Documentation for installing Elasticsearch 5.4

    - https://www.elastic.co/guide/en/elasticsearch/reference/5.4/gs-installation.html
    - https://www.elastic.co/guide/en/elasticsearch/plugins/5.4/ingest-attachment.html


Once Elasticsearch is installed, the following steps describe how to configure it for PyAMS.

       
Initializing Elasticsearch index
--------------------------------

If you want to use an Elasticsearch index, you have to initialize the index settings and mappings;
Elasticsearch integration is provided by the *PyAMS_content_es* package.


1. Enable service
'''''''''''''''''

In the Pyramid application INI files (*etc/development.ini* and *etc/production.ini*):

.. code-block:: ini

    # Elasticsearch server settings
    elastic.server = http://127.0.0.1:9200
    elastic.index = pyams

Where:

- **elastic.server**: address of the Elasticsearch server; you can include authentication credentials in the form
  *http://login:password@w.x.y.z:9200*
- **elastic.index**: name of the Elasticsearch index.

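
To illustrate the expected format, the following sketch reads both settings from an INI file with the Python standard library. It also separates the optional credentials from ``elastic.server``. It assumes the settings live in the ``[app:main]`` section (the usual Pyramid layout). ``elastic_settings()`` is a hypothetical helper. PyAMS parses these settings with its own code, which may differ.

.. code-block:: python

    from configparser import ConfigParser
    from urllib.parse import urlsplit


    def elastic_settings(ini_text, section="app:main"):
        """Return the Elasticsearch settings found in Pyramid INI content.

        Hypothetical helper: assumes both settings are defined in *section*.
        """
        # Disable interpolation so that values such as "%(here)s" are kept as-is
        parser = ConfigParser(interpolation=None)
        parser.read_string(ini_text)
        server = urlsplit(parser.get(section, "elastic.server"))
        # Rebuild the server URL without the optional "login:password@" part
        netloc = server.hostname + (f":{server.port}" if server.port else "")
        return {
            "url": f"{server.scheme}://{netloc}",
            "username": server.username,
            "password": server.password,
            "index": parser.get(section, "elastic.index"),
        }


    if __name__ == "__main__":
        with open("etc/development.ini", encoding="utf-8") as ini_file:
            settings = elastic_settings(ini_file.read())
        print(settings["url"], settings["index"])
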
       

On startup, the main PyAMS application process can start an *indexer* process, which handles indexing requests
asynchronously; this process is configured with the following settings:

.. code-block:: ini

    # PyAMS content Elasticsearch indexer process settings
    pyams_content.es.tcp_handler = 127.0.0.1:5557
    pyams_content.es.start_handler = false
    pyams_content.es.allow_auth = admin:admin
    pyams_content.es.allow_clients = 127.0.0.1

Where:

- **pyams_content.es.tcp_handler**: IP address and listening port of the PyAMS indexer process
- **pyams_content.es.start_handler**: if *true*, the indexer process is started on PyAMS startup; otherwise (typically
  in a cluster configuration), the process is expected to be started from another *master* server
- **pyams_content.es.allow_auth**: login and password used to connect to the indexer process (these settings are
  defined in the same way on the indexer process and on all its clients)
- **pyams_content.es.allow_clients**: list of IP addresses allowed to connect to the indexer process.

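
When ``pyams_content.es.start_handler`` is *false*, the indexer process is started elsewhere. A quick way to check that a client can reach it is to open a TCP connection to the ``tcp_handler`` address. The following sketch uses only the standard library, and ``parse_tcp_handler()`` and ``indexer_is_listening()`` are hypothetical helper names. It only checks that something is listening on that port. It does not authenticate with ``allow_auth`` and does not speak the indexer protocol.

.. code-block:: python

    import socket


    def parse_tcp_handler(value):
        """Split a "host:port" value such as pyams_content.es.tcp_handler."""
        host, _, port = value.strip().rpartition(":")
        return host, int(port)


    def indexer_is_listening(tcp_handler, timeout=2.0):
        """Return True if something accepts TCP connections on *tcp_handler*."""
        host, port = parse_tcp_handler(tcp_handler)
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:  # connection refused, unreachable host, timeout...
            return False


    if __name__ == "__main__":
        # Value copied from pyams_content.es.tcp_handler in the INI file
        print(indexer_is_listening("127.0.0.1:5557"))
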
       

2. Initialize Elasticsearch database
''''''''''''''''''''''''''''''''''''

Configuration files for the attachment pipeline, index settings and mappings are available in the ``pyams_content_es``
source package or in the PyAMS installation folder:


.. code-block:: bash

    (env) $ cd docs/elasticsearch
    (env) $ curl --noproxy localhost -XPUT http://localhost:9200/_ingest/pipeline/attachment -d @attachment-pipeline.json


Then, with ``elastic.index = pyams`` defined as the Elasticsearch index name (index URL: *http://localhost:9200/pyams*),
create the index and its mappings; note that the first command deletes any existing index with the same name:

.. code-block:: shell

    (env) $ curl -XDELETE http://localhost:9200/pyams

    (env) $ curl -XPUT http://localhost:9200/pyams -d @index-settings.json

    (env) $ curl -XPUT http://localhost:9200/pyams/WfTopic/_mapping -d @mappings/WfTopic.json
    (env) $ curl -XPUT http://localhost:9200/pyams/WfNewsEvent/_mapping -d @mappings/WfNewsEvent.json
    (env) $ curl -XPUT http://localhost:9200/pyams/WfBlogPost/_mapping -d @mappings/WfBlogPost.json


*Troubleshooting*: if you get a 406 error, try adding ``-H 'Content-Type: application/json'`` to the curl command lines.

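
If you prefer to script these calls, the following sketch replays the same requests with the Python standard library: pipeline, index deletion and creation, then mappings. It always sends the ``Content-Type: application/json`` header. It assumes the JSON files are read from the *docs/elasticsearch* folder used above and that ``elastic.server`` contains no credentials. ``plan()``, ``make_request()`` and ``run()`` are hypothetical helper names. Like the ``curl -XDELETE`` command, it drops any existing index first.

.. code-block:: python

    import urllib.error
    import urllib.request
    from pathlib import Path

    ES_SERVER = "http://localhost:9200"      # value of elastic.server (without credentials)
    ES_INDEX = "pyams"                       # value of elastic.index
    CONFIG_DIR = Path("docs/elasticsearch")  # folder holding the JSON configuration files
    MAPPINGS = ("WfTopic", "WfNewsEvent", "WfBlogPost")


    def plan(server=ES_SERVER, index=ES_INDEX):
        """Return the (method, URL, JSON file) calls matching the curl commands."""
        base = server.rstrip("/")
        steps = [
            ("PUT", f"{base}/_ingest/pipeline/attachment", "attachment-pipeline.json"),
            ("DELETE", f"{base}/{index}", None),
            ("PUT", f"{base}/{index}", "index-settings.json"),
        ]
        steps += [("PUT", f"{base}/{index}/{name}/_mapping", f"mappings/{name}.json")
                  for name in MAPPINGS]
        return steps


    def make_request(method, url, json_file=None, config_dir=CONFIG_DIR):
        """Build a request whose body is the content of *json_file*, if any."""
        data = (config_dir / json_file).read_bytes() if json_file else None
        # The explicit Content-Type header avoids the 406 error described above
        return urllib.request.Request(url, data=data, method=method,
                                      headers={"Content-Type": "application/json"})


    def run(server=ES_SERVER, index=ES_INDEX):
        for method, url, json_file in plan(server, index):
            try:
                with urllib.request.urlopen(make_request(method, url, json_file)) as response:
                    print(method, url, response.status)
            except urllib.error.HTTPError as error:
                # A 404 on DELETE only means that the index did not exist yet
                if not (method == "DELETE" and error.code == 404):
                    raise
                print(method, url, error.code)


    if __name__ == "__main__":
        run()
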
       

3. Update index contents
''''''''''''''''''''''''

If your ZODB database already stores contents, you can index all of them in Elasticsearch with the
``pyams_es_index`` command line script. From a shell:

.. code-block:: bash

    (env) $ ./bin/pyams_es_index ../etc/development.ini

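
To check that the contents were indexed, you can ask Elasticsearch how many documents the index holds with its ``_count`` API. The following sketch assumes the default settings shown earlier and a server URL without credentials. ``indexed_documents()`` and ``parse_count()`` are hypothetical helper names.

.. code-block:: python

    import json
    import urllib.request

    ES_SERVER = "http://127.0.0.1:9200"  # value of elastic.server (without credentials)
    ES_INDEX = "pyams"                   # value of elastic.index


    def parse_count(body):
        """Extract the document count from an Elasticsearch _count response."""
        return json.loads(body)["count"]


    def indexed_documents(server=ES_SERVER, index=ES_INDEX):
        """Return the number of documents stored in *index*."""
        url = f"{server.rstrip('/')}/{index}/_count"
        with urllib.request.urlopen(url) as response:
            return parse_count(response.read())


    if __name__ == "__main__":
        print(indexed_documents())
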
       


Natural Language Toolkit - NLTK
+++++++++++++++++++++++++++++++

PyAMS uses NLTK features through the *PyAMS_catalog* package.

.. seealso::

    Visit https://www.nltk.org/ to learn more about NLTK.

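
As a rough illustration of what the tokenizers and stopwords utilities are used for, the following sketch splits a text into words and drops its stop words before indexing. It is not the *PyAMS_catalog* implementation, and ``index_terms()`` is a hypothetical helper. Its default arguments call ``nltk.word_tokenize()`` and ``nltk.corpus.stopwords``, which need the ``punkt`` and ``stopwords`` data downloaded as described below.

.. code-block:: python

    def index_terms(text, language="english", tokenize=None, stop_words=None):
        """Return the lower-cased words of *text* which are not stop words.

        *tokenize* and *stop_words* default to NLTK's word tokenizer and
        stopwords corpus for *language* (e.g. "english" or "french").
        """
        if tokenize is None:
            from nltk import word_tokenize  # requires the "punkt" models

            def tokenize(value):
                return word_tokenize(value, language=language)
        if stop_words is None:
            from nltk.corpus import stopwords  # requires the "stopwords" corpus

            stop_words = set(stopwords.words(language))
        return [
            token.lower()
            for token in tokenize(text)
            # keep alphanumeric tokens only, dropping punctuation and stop words
            if token.isalnum() and token.lower() not in stop_words
        ]

Passing ``tokenize`` and ``stop_words`` explicitly makes it possible to try the filtering logic without any NLTK data installed.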
       

Initializing NLTK (Natural Language Toolkit)
--------------------------------------------

Some NLTK collections, like the **tokenizers** and **stopwords** utilities, are used to index fulltext content
elements; you can extend NLTK-based indexing according to your own needs. NLTK requires several elements to be
downloaded and configured, as follows:


*1. Run the Python shell in the PyAMS environment:*

.. code-block:: bash

    (env) $ ./bin/py


*2. In the Python shell:*

.. code-block:: pycon

    >>> import nltk
    >>> nltk.download()

       

*3. Configure the installation directory:*

.. tip::

    On Debian GNU/Linux, you can choose any directory among '*~/nltk_data*' (where '~' is the home directory of the
    user running the Pyramid application), '*/usr/share/nltk_data*', '*/usr/local/share/nltk_data*',
    '*/usr/lib/nltk_data*' and '*/usr/local/lib/nltk_data*'.

    Please check that you have write permission on this directory!


.. code-block:: shell

    NLTK Downloader
    ---------------------------------------------------------------------------
        d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
    ---------------------------------------------------------------------------
    Downloader> c

    Data Server:
      - URL: <https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml>
      - 6 Package Collections Available
      - 107 Individual Packages Available

    Local Machine:
      - Data directory: /home/tflorac/nltk_data

    Config> d
      New directory> /usr/local/lib/nltk_data

       
*4. Return to the main menu:*

.. code-block:: shell

    ---------------------------------------------------------------------------
        s) Show Config   u) Set Server URL   d) Set Data Dir   m) Main Menu
    ---------------------------------------------------------------------------
    Config> m

       
*5. Download utilities:*

``punkt``
    Punkt Tokenizer Models
``stopwords``
    Stopwords Corpus


.. code-block:: shell

    ---------------------------------------------------------------------------
        d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
    ---------------------------------------------------------------------------
    Downloader> d
    Download which package (l=list; x=cancel)?
      Identifier> punkt
        Downloading package punkt to /usr/local/lib/nltk_data...
    Downloader> d
    Download which package (l=list; x=cancel)?
      Identifier> stopwords
        Downloading package stopwords to /usr/local/lib/nltk_data...

       
.. tip::

    The full list of NLTK collections and packages can be displayed with the ``l) List`` option.

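
The same setup can also be scripted, for instance on a server without interactive access. The following sketch picks the first writable directory among those listed in the step 3 tip. It then downloads both packages with ``nltk.download(package, download_dir=..., quiet=True)``. Run it as the user running the Pyramid application, since ``~`` is expanded for the current user. ``first_writable_dir()`` and ``download()`` are hypothetical helper names.

.. code-block:: python

    import os
    from pathlib import Path

    # Candidate directories listed in the step 3 tip (Debian GNU/Linux)
    CANDIDATE_DIRS = (
        "~/nltk_data",
        "/usr/share/nltk_data",
        "/usr/local/share/nltk_data",
        "/usr/lib/nltk_data",
        "/usr/local/lib/nltk_data",
    )
    PACKAGES = ("punkt", "stopwords")


    def first_writable_dir(candidates=CANDIDATE_DIRS):
        """Return the first candidate which is, or can be created as, a writable directory."""
        for candidate in candidates:
            path = Path(candidate).expanduser()
            # An existing directory must be writable; a missing one needs a writable parent
            target = path if path.exists() else path.parent
            if target.is_dir() and os.access(target, os.W_OK):
                return path
        return None


    def download(packages=PACKAGES, candidates=CANDIDATE_DIRS):
        """Download *packages* into the first writable candidate directory."""
        import nltk  # imported here so that first_writable_dir() works without NLTK

        data_dir = first_writable_dir(candidates)
        if data_dir is None:
            raise PermissionError("no writable NLTK data directory found")
        for package in packages:
            # nltk.download() returns False when a package cannot be installed
            if not nltk.download(package, download_dir=str(data_dir), quiet=True):
                raise RuntimeError(f"could not download NLTK package {package!r}")
        return data_dir


    if __name__ == "__main__":
        print(download())
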