.. _plugins:

PyAMS additional features and services
++++++++++++++++++++++++++++++++++++++


Elasticsearch 5.4
=================

First, install Elasticsearch (ES). PyAMS is currently compatible with version 5.4, and the ``ingest-attachment``
plug-in is also required to handle attachments correctly.

Visit https://www.elastic.co/ to learn how to install the Elasticsearch server and the ``ingest-attachment`` plug-in.

.. tip:: Documentation for installing Elasticsearch 5.4

    - https://www.elastic.co/guide/en/elasticsearch/reference/5.4/gs-installation.html
    - https://www.elastic.co/guide/en/elasticsearch/plugins/5.4/ingest-attachment.html

Once Elasticsearch is installed, the following steps describe how to configure it for PyAMS.


Initializing Elasticsearch index
--------------------------------

To use an Elasticsearch index, you first have to initialize the index settings and mappings.
Elasticsearch integration is provided by the *PyAMS_content_es* package.


1. Enable Service:
''''''''''''''''''

In the Pyramid INI application file *(etc/development.ini)*:

.. code-block:: ini

    # ElasticSearch settings
    elastic.server = http://127.0.0.1:9200
    elastic.index = pyams

.. code-block:: ini

    # PyAMS content elasticsearch index settings
    pyams_content.es.tcp_handler = 127.0.0.1:5557
    pyams_content.es.start_handler = false
    pyams_content.es.allow_auth = admin:admin
    pyams_content.es.allow_clients = 127.0.0.1

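
As a quick sanity check, the settings above can be read back with Python's standard ``configparser`` module.
This is only a minimal sketch: it assumes the keys live in an ``[app:main]`` section (the usual Pyramid layout,
not confirmed here), and ``read_es_settings`` is a hypothetical helper name, not part of PyAMS.

.. code-block:: python

    import configparser

    # Sample content mirroring the settings shown above; the [app:main]
    # section name is an assumption about the layout of development.ini.
    SAMPLE_INI = """
    [app:main]
    elastic.server = http://127.0.0.1:9200
    elastic.index = pyams
    pyams_content.es.tcp_handler = 127.0.0.1:5557
    pyams_content.es.start_handler = false
    pyams_content.es.allow_auth = admin:admin
    pyams_content.es.allow_clients = 127.0.0.1
    """.replace("\n    ", "\n")  # drop the indentation used for display


    def read_es_settings(ini_text, section="app:main"):
        """Return the index URL and a few handler settings (hypothetical helper)."""
        # Interpolation is disabled so that '%' characters are read literally.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(ini_text)
        app = parser[section]
        return {
            "index_url": app["elastic.server"].rstrip("/") + "/" + app["elastic.index"],
            "start_handler": app.getboolean("pyams_content.es.start_handler"),
            "allow_clients": app["pyams_content.es.allow_clients"].split(),
        }


    settings = read_es_settings(SAMPLE_INI)
    print(settings["index_url"])

The resulting index URL is the one targeted by the ``curl`` commands in the next step.
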

2. Initialize Elasticsearch Database:
'''''''''''''''''''''''''''''''''''''

Configuration files for the attachment pipeline, index settings and mappings are available in the
``pyams_content_es`` package or in the PyAMS installation folder:

.. code-block:: bash

    (env) $ cd docs/elasticsearch
    # Register the attachment ingest pipeline
    (env) $ curl --noproxy localhost -XPUT http://localhost:9200/_ingest/pipeline/attachment -d @attachment-pipeline.json

With ``elastic.index = pyams`` defined as the Elasticsearch index name, the index URL is
*"http://localhost:9200/pyams"*:

.. code-block:: shell

    # Remove any existing index (this deletes all indexed data)
    (env) $ curl -XDELETE http://localhost:9200/pyams

    # Create the index with its settings
    (env) $ curl -XPUT http://localhost:9200/pyams -d @index-settings.json

    # Register one mapping per content type
    (env) $ curl -XPUT http://localhost:9200/pyams/WfTopic/_mapping -d @mappings/WfTopic.json
    (env) $ curl -XPUT http://localhost:9200/pyams/WfNewsEvent/_mapping -d @mappings/WfNewsEvent.json
    (env) $ curl -XPUT http://localhost:9200/pyams/WfBlogPost/_mapping -d @mappings/WfBlogPost.json

*Troubleshooting*: if you get a 406 error, try adding ``-H 'Content-Type: application/json'`` to the curl options.

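
The same requests can be prepared from Python with the standard ``urllib.request`` module, which makes the
``Content-Type`` header explicit. This is a sketch rather than a PyAMS tool: ``build_put_request`` and
``mapping_url`` are hypothetical helper names, and the ``{"properties": {}}`` body is a placeholder for the
content of a real mapping file such as ``mappings/WfTopic.json``.

.. code-block:: python

    import json
    import urllib.request

    ES_URL = "http://localhost:9200"  # value of elastic.server
    INDEX = "pyams"                   # value of elastic.index


    def mapping_url(doc_type):
        """URL of the mapping endpoint for one content type."""
        return f"{ES_URL}/{INDEX}/{doc_type}/_mapping"


    def build_put_request(url, body):
        """Build (without sending) a JSON PUT request, like ``curl -XPUT -d``."""
        data = json.dumps(body).encode("utf-8")
        return urllib.request.Request(
            url,
            data=data,
            method="PUT",
            headers={"Content-Type": "application/json"},
        )


    # Placeholder body; a real call would load the JSON mapping file instead.
    request = build_put_request(mapping_url("WfTopic"), {"properties": {}})
    print(request.get_method(), request.full_url)

    # Sending requires a running Elasticsearch server, so it is left commented out:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)

Building the request separately from sending it lets you inspect the method, URL and headers before touching
the server.
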

3. Create or update index:
''''''''''''''''''''''''''

PyAMS objects then have to be indexed into the ES database. From a shell:

.. code-block:: bash

    (env) $ ./bin/pyams_es_index ../etc/development.ini


-------------------------------


Natural Language Toolkit - NLTK
===============================

With the *PyAMS_nltk* package, PyAMS can use NLTK features.

.. seealso::

    Visit https://www.nltk.org/ to learn more about NLTK.


Initializing NLTK
-----------------

Some NLTK (Natural Language Toolkit) tokenizers and stopwords utilities are used to index full-text content elements.
This package requires several elements to be downloaded and configured, as follows:


*1. Run the Python shell with the PyAMS environment:*

.. code-block:: bash

    (env) $ ./bin/py

*2. In the Python shell:*

.. code-block:: python

    >>> import nltk
    >>> nltk.download()


.. code-block:: text

    NLTK Downloader
    ---------------------------------------------------------------------------
    d) Download l) List u) Update c) Config h) Help q) Quit
    ---------------------------------------------------------------------------
    Downloader> c

    Data Server:
    - URL: <https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml>
    - 6 Package Collections Available
    - 107 Individual Packages Available

    Local Machine:
    - Data directory: /home/tflorac/nltk_data
    ---------------------------------------------------------------------------
    s) Show Config u) Set Server URL d) Set Data Dir m) Main Menu
    ---------------------------------------------------------------------------
    Config> d
    New directory> /usr/local/lib/nltk_data


.. tip::

    On Debian GNU/Linux, you can choose any of the following directories: '*~/nltk_data*' (where '~' is the home
    directory of the user running the Pyramid application), '*/usr/share/nltk_data*', '*/usr/local/share/nltk_data*',
    '*/usr/lib/nltk_data*' or '*/usr/local/lib/nltk_data*'.


.. code-block:: text

    Config> m
    ---------------------------------------------------------------------------
    d) Download l) List u) Update c) Config h) Help q) Quit
    ---------------------------------------------------------------------------
    Downloader> d

    Download which package (l=list; x=cancel)?
    Identifier> punkt
    Downloading package punkt to /usr/local/lib/nltk_data...

    Downloader> d

    Download which package (l=list; x=cancel)?
    Identifier> stopwords
    Downloading package stopwords to /usr/local/lib/nltk_data...

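
The interactive session above can also be scripted. The following is a minimal sketch, assuming that
``nltk.download()`` is called with a package identifier and its ``download_dir`` argument. The
``download_nltk_packages`` helper and its ``downloader`` parameter are hypothetical names introduced only for
illustration and are not part of PyAMS or NLTK.

.. code-block:: python

    NLTK_DATA_DIR = "/usr/local/lib/nltk_data"  # directory chosen in the session above
    PACKAGES = ("punkt", "stopwords")           # packages downloaded in the session above


    def download_nltk_packages(packages, data_dir, downloader=None):
        """Download each package into data_dir and report success per package."""
        if downloader is None:
            # Imported lazily so the helper can be exercised without NLTK installed.
            import nltk
            downloader = nltk.download
        # The downloader's return value is interpreted as a success flag.
        return {name: bool(downloader(name, download_dir=data_dir)) for name in packages}

Calling ``download_nltk_packages(PACKAGES, NLTK_DATA_DIR)`` from the ``./bin/py`` shell would then fetch both
packages in one step. The user running it needs write access to the target directory.
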