.. _plugins:

PyAMS additional features and services
======================================


Elasticsearch
+++++++++++++

First, install Elasticsearch (ES); PyAMS is currently compatible with version 5.4. The Ingest attachment
plug-in is also required to handle attachments correctly.

Visit https://www.elastic.co/ to learn how to install the Elasticsearch server and the `ingest-attachment` plug-in.


.. tip:: Documentation for installing Elasticsearch 5.4

    - https://www.elastic.co/guide/en/elasticsearch/reference/5.4/gs-installation.html
    - https://www.elastic.co/guide/en/elasticsearch/plugins/5.4/ingest-attachment.html


Once Elasticsearch is installed, the following steps describe how to configure ES with PyAMS.


Initializing Elasticsearch index
--------------------------------

To use an Elasticsearch index, you must first initialize its settings and mappings;
Elasticsearch integration is provided by the *PyAMS_content_es* package.


1. Enable service
'''''''''''''''''

In Pyramid INI application files (*etc/development.ini* and *etc/production.ini*):

.. code-block:: ini

    # Elasticsearch server settings
    elastic.server = http://127.0.0.1:9200
    elastic.index = pyams

Where:

- **elastic.server**: address of the Elasticsearch server; you can include authentication arguments in the form
  *http://login:password@w.x.y.z:9200*
- **elastic.index**: name of the Elasticsearch index.

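
As an illustration, the sketch below shows how these two settings could be read, and how credentials embedded in
``elastic.server`` can be separated from the host and port. It uses only the Python standard library; the
``[app:main]`` section name, the ``read_elastic_settings`` helper and the sample values are assumptions made for this
example and are not part of PyAMS.

.. code-block:: python

    from configparser import ConfigParser
    from urllib.parse import urlsplit

    # Hypothetical INI fragment; the section name is an assumption
    SAMPLE_INI = """
    [app:main]
    elastic.server = http://login:password@127.0.0.1:9200
    elastic.index = pyams
    """


    def read_elastic_settings(ini_text, section="app:main"):
        """Return host, port, credentials and index name found in *ini_text*."""
        # Interpolation is disabled so that values such as %(here)s are read verbatim
        parser = ConfigParser(interpolation=None)
        parser.read_string(ini_text)
        url = urlsplit(parser.get(section, "elastic.server"))
        return {
            "scheme": url.scheme,
            "host": url.hostname,
            "port": url.port,
            "login": url.username,      # None when the URL carries no credentials
            "password": url.password,   # None when the URL carries no credentials
            "index": parser.get(section, "elastic.index"),
        }


    settings = read_elastic_settings(SAMPLE_INI)
    print(settings["host"], settings["port"], settings["index"])

Keeping the credentials in the URL, as the setting allows, means the whole connection is described by a single value;
the drawback is that the password is then stored in clear text in the INI file.
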

On startup, the main PyAMS application process can start an *indexer* process which handles indexing requests
asynchronously; this process is configured with the following settings:

.. code-block:: ini

    # PyAMS content Elasticsearch indexer process settings
    pyams_content.es.tcp_handler = 127.0.0.1:5557
    pyams_content.es.start_handler = false
    pyams_content.es.allow_auth = admin:admin
    pyams_content.es.allow_clients = 127.0.0.1

Where:

- **pyams_content.es.tcp_handler**: IP address and listening port of the PyAMS indexer process
- **pyams_content.es.start_handler**: if *true*, the indexer process is started on PyAMS startup; otherwise (typically
  in a cluster configuration), the process is expected to be started from another *master* server
- **pyams_content.es.allow_auth**: login and password used to connect to the indexer process (these settings are
  defined in the same way on the indexer process and on all its clients)
- **pyams_content.es.allow_clients**: list of IP addresses allowed to connect to the indexer process.

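
To make the shape of these values concrete, here is a minimal sketch that splits them into typed Python values. It is
not the parsing code used by PyAMS: the ``parse_indexer_settings`` helper is hypothetical, and the assumption that
several client addresses are separated by whitespace is not confirmed by this documentation.

.. code-block:: python

    from ipaddress import ip_address


    def parse_indexer_settings(settings):
        """Split raw string settings into typed values."""
        # "host:port" is split on the last colon
        host, _, port = settings["pyams_content.es.tcp_handler"].rpartition(":")
        # "login:password" is split on the first colon, so the password may contain colons
        login, _, password = settings["pyams_content.es.allow_auth"].partition(":")
        # Assumption: multiple client addresses are separated by whitespace
        clients = [ip_address(item)
                   for item in settings["pyams_content.es.allow_clients"].split()]
        start = settings["pyams_content.es.start_handler"].strip().lower() == "true"
        return {
            "address": (host, int(port)),
            "start_handler": start,
            "auth": (login, password),
            "clients": clients,
        }


    raw = {
        "pyams_content.es.tcp_handler": "127.0.0.1:5557",
        "pyams_content.es.start_handler": "false",
        "pyams_content.es.allow_auth": "admin:admin",
        "pyams_content.es.allow_clients": "127.0.0.1",
    }
    parsed = parse_indexer_settings(raw)
    print(parsed["address"], parsed["start_handler"])

Validating addresses with ``ipaddress.ip_address`` makes a malformed entry fail immediately with a ``ValueError``
instead of silently never matching a client.
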

2. Initialize Elasticsearch database
''''''''''''''''''''''''''''''''''''

Configuration files for the attachment pipeline, index settings and mappings are available in the `pyams_content_es`
source package or in the PyAMS installation folder:

.. code-block:: bash

    # Register the attachment ingest pipeline
    (env) $ cd docs/elasticsearch
    (env) $ curl --noproxy localhost -XPUT http://localhost:9200/_ingest/pipeline/attachment -d @attachment-pipeline.json

Then, with ``elastic.index = pyams`` defined as the Elasticsearch index name, the index URL is
*"http://localhost:9200/pyams"*:

.. code-block:: shell

    # Remove any existing index (this deletes all documents already indexed)
    (env) $ curl -XDELETE http://localhost:9200/pyams

    # Create the index with its settings
    (env) $ curl -XPUT http://localhost:9200/pyams -d @index-settings.json

    # Register one mapping per document type
    (env) $ curl -XPUT http://localhost:9200/pyams/WfTopic/_mapping -d @mappings/WfTopic.json
    (env) $ curl -XPUT http://localhost:9200/pyams/WfNewsEvent/_mapping -d @mappings/WfNewsEvent.json
    (env) $ curl -XPUT http://localhost:9200/pyams/WfBlogPost/_mapping -d @mappings/WfBlogPost.json

*Troubleshooting*: If you get a 406 error, try adding ``-H 'Content-Type: application/json'`` to the curl command lines.

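
The same kind of request can be prepared from Python. The sketch below uses only the standard library and builds,
without sending it, a ``PUT`` request that carries the ``Content-Type: application/json`` header mentioned in the
troubleshooting note. The ``json_put`` helper and the empty settings document are hypothetical; the real settings are
those stored in *index-settings.json*.

.. code-block:: python

    import json
    from urllib.request import Request

    BASE_URL = "http://localhost:9200"


    def json_put(path, body):
        """Build (without sending) a PUT request carrying a JSON body."""
        data = json.dumps(body).encode("utf-8")
        return Request(
            BASE_URL + path,
            data=data,
            method="PUT",
            headers={"Content-Type": "application/json"},
        )


    # Hypothetical placeholder document, standing in for index-settings.json
    request = json_put("/pyams", {"settings": {}})
    print(request.get_method(), request.full_url)

    # To actually send the request to a running server:
    #     from urllib.request import urlopen
    #     urlopen(request)

Setting the content type explicitly avoids depending on the default used by the HTTP client, which is the situation
the 406 error points to.
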

3. Update index contents
''''''''''''''''''''''''

If your ZODB database already stores contents, you can index all of them into Elasticsearch with the
``pyams_es_index`` command line script. From a shell:

.. code-block:: bash

    (env) $ ./bin/pyams_es_index ../etc/development.ini



Natural Language Toolkit - NLTK
+++++++++++++++++++++++++++++++

PyAMS uses NLTK features through the *PyAMS_catalog* package.

.. seealso::

    Visit https://www.nltk.org/ to learn more about NLTK


Initializing NLTK (Natural Language ToolKit)
--------------------------------------------

Some NLTK collections, such as the **tokenizers** and **stopwords** utilities, are used to index full-text content
elements. You can extend NLTK indexing according to your own needs. This package requires several elements to be
downloaded and configured, as follows:


*1. Run the Python shell in the PyAMS environment:*

.. code-block:: bash

    (env) $ ./bin/py


*2. In the Python shell:*

.. code-block:: pycon

    >>> import nltk
    >>> nltk.download()


*3. Configure the installation directory:*

.. tip::

    On Debian GNU/Linux, you can choose any directory among '*~/nltk_data*' (where '~' is the home directory of the
    user running the Pyramid application), '*/usr/share/nltk_data*', '*/usr/local/share/nltk_data*',
    '*/usr/lib/nltk_data*' and '*/usr/local/lib/nltk_data*'

    Please check that you have permission to write to this directory!


.. code-block:: shell

    NLTK Downloader
    ---------------------------------------------------------------------------
    d) Download l) List u) Update c) Config h) Help q) Quit
    ---------------------------------------------------------------------------
    Downloader> c

    Data Server:
      - URL: <https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml>
      - 6 Package Collections Available
      - 107 Individual Packages Available

    Local Machine:
      - Data directory: /home/tflorac/nltk_data

    Config> d
      New directory> /usr/local/lib/nltk_data


*4. Return to the main menu:*

.. code-block:: shell

    ---------------------------------------------------------------------------
    s) Show Config u) Set Server URL d) Set Data Dir m) Main Menu
    ---------------------------------------------------------------------------
    Config> m


*5. Download utilities:*

punkt
    Punkt Tokenizer Models
stopwords
    Stopwords Corpus


.. code-block:: shell

    ---------------------------------------------------------------------------
    d) Download l) List u) Update c) Config h) Help q) Quit
    ---------------------------------------------------------------------------
    Downloader> d
    Download which package (l=list; x=cancel)?
      Identifier> punkt
        Downloading package punkt to /usr/local/lib/nltk_data...
    Downloader> d
    Download which package (l=list; x=cancel)?
      Identifier> stopwords
        Downloading package stopwords to /usr/local/lib/nltk_data...

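
The interactive session can also be scripted, which is convenient for unattended installations. The sketch below is
only one possible approach: ``first_writable`` and ``download_packages`` are hypothetical helpers written for this
example. The first one picks a usable directory among the candidates listed in the tip of step 3; the second one calls
``nltk.download`` with its ``download_dir`` argument and is not executed here, because it needs NLTK and network
access.

.. code-block:: python

    import os

    # Candidate directories, as listed for Debian GNU/Linux
    CANDIDATES = [
        "~/nltk_data",
        "/usr/share/nltk_data",
        "/usr/local/share/nltk_data",
        "/usr/lib/nltk_data",
        "/usr/local/lib/nltk_data",
    ]


    def first_writable(candidates):
        """Return the first candidate that is a writable directory, or that
        could be created because its parent directory is writable; None otherwise."""
        for candidate in candidates:
            path = os.path.expanduser(candidate)
            target = path if os.path.isdir(path) else os.path.dirname(path)
            if os.path.isdir(target) and os.access(target, os.W_OK | os.X_OK):
                return path
        return None


    def download_packages(target_dir, packages=("punkt", "stopwords")):
        """Download NLTK packages into *target_dir* without the interactive menu."""
        import nltk  # imported lazily so the directory check works without NLTK
        for name in packages:
            nltk.download(name, download_dir=target_dir)


    print(first_writable(CANDIDATES))

A typical use would be ``download_packages(first_writable(CANDIDATES))``, run as the user that owns the selected
directory.
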

.. tip::

    The full list of NLTK collections can be displayed with the ``l) List`` option.