3 changes: 1 addition & 2 deletions services/terms-tools/Dockerfile
@@ -20,7 +20,7 @@ RUN dvc pull -v
FROM cnrsinist/ezs-python-server:py3.9-no24-1.0.13


ENV TERMS_TOOLS_VERSION=2.0
ENV TERMS_TOOLS_VERSION=2.1
ENV GIT=https://github.com/stephane54/terms-tools.git
ENV DICO_PATH="/app/public/dictionary"

@@ -41,7 +41,6 @@ RUN pip install --no-cache-dir \
# allow the stanza resources to be installed on first run (mkdir as the daemon user)
RUN chmod 777 /usr/sbin/


WORKDIR /app/public

# If issues with bindings with version of node higher than 20, try adding this line
295 changes: 105 additions & 190 deletions services/terms-tools/README.md

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions services/terms-tools/data.dvc
@@ -1,6 +1,6 @@
outs:
- md5: 635dc2107a2cb98668005553867d258a.dir
size: 5493790
nfiles: 17
- md5: 3831a76c64ca2ee7f88223252e487215.dir
size: 5515439
nfiles: 33
hash: md5
path: data
5 changes: 3 additions & 2 deletions services/terms-tools/dictionary.dvc
@@ -1,5 +1,6 @@
outs:
- md5: 453b5bfcf03d58e12825f61ab080d9a3.dir
nfiles: 220
- md5: 284492df3237deb260840f0b3b66c2a9.dir
nfiles: 251
hash: md5
path: dictionary
size: 325672221
69 changes: 61 additions & 8 deletions services/terms-tools/examples.http
@@ -2,19 +2,72 @@
# They are important, because used to generate the tests.hurl file.

# Uncomment/comment the desired lines to test locally
@host=http://localhost:31976
# @host=https://nlp-tools2.services.istex.fr

###
# @name v1terms/dico_pos
# POS tagging and lemmatization of terms, Loterre display
POST {{host}}/v1/en/dico_pos/postag?input=terms HTTP/1.1
Content-Type: text/tab-separated-values
#@host=https://terms-tools.services.istex.fr
#@host=http://localhost:31976

id text
###
# @name v1EnDicoPos
# POS tagging and lemmatization of terms, Loterre format, in English
POST {{host}}/v1/en/dico-pos/postag?input=terms
content-type: text/tab-separated-values
```
id value
http://data.loterre.fr/ark:/67375/P66#xl_en_9278939f qualities
http://data.loterre.fr/ark:/67375/P66#xl_en_60f6687f quality
http://data.loterre.fr/ark:/67375/P66#xl_en_696ab94f material entities
http://data.loterre.fr/ark:/67375/P66#xl_en_c0a4dac9 material entity
http://data.loterre.fr/ark:/67375/P66#xl_en_ded9af98 processes
```

###
# @name v1FrDicoPos
# POS tagging and lemmatization of terms, Loterre format, in French
POST {{host}}/v1/fr/dico-pos/postag?input=terms
content-Type: text/tab-separated-values
```
id value
http://data.loterre.fr/ark:/67375/P66#xl_fr_f50b83a0 qualités
http://data.loterre.fr/ark:/67375/P66#xl_fr_34241fc9 qualité
http://data.loterre.fr/ark:/67375/P66#xl_fr_b417452d entités matérielles
http://data.loterre.fr/ark:/67375/P66#xl_fr_dc89e46c entité matérielle
http://data.loterre.fr/ark:/67375/P66#xl_fr_b3ab2f06 processus
```

###
# @name v1EnTermsMatcherJsonStandoff
# Term recognition (en), json-standoff format
POST {{host}}/v1/en/terms-matcher/annotate?format=json-standoff&loterreID=9SD
Content-Type: application/json
[
{"id":"1","value":"The United States of America (USA), also known as the United States (U.S.) or America, is a country primarily located in North America. It is a federal republic of 50 states and Washington, D.C. as its federal capital district."}
]

###
# @name v1FrTermsMatcherJsonStandoff
# Term recognition (fr), json-standoff format
POST {{host}}/v1/fr/terms-matcher/annotate?format=json-standoff&loterreID=P66
Content-Type: application/json
[
{"id":"1","value":"Les pertes de mémoire subjective appelés aussi troubles mnésiques correspondent à la difficulté à mémoriser un fait actuel à retrouver un souvenir."}
]

###
# @name v1EnTermMatcherJsonIndoc
# Term recognition (en), json-indoc format
POST {{host}}/v1/en/terms-matcher/annotate?format=json-indoc&loterreID=P66
Content-Type: application/json
[
{"id":"18","value":"The Mem-Pro-Clinic test is a clinical test to assess difficulties in event- and time-based prospective thoughts. This result implies that activated long-term memory provides a representational basis for semantic verbal short-term signal."},
{"id":"27","value":"A new method to implant false autobiographical books: Blind implantation call blind implantation methods."},
{"id":"35","value":"A guy with hypermnesia (Pathology) is capable of storing idea in an extraordinarily efficient manner."}
]

###
# @name v1EnTermMatcherXmlStandoff
# Term recognition (en), xml-standoff format
POST {{host}}/v1/en/terms-matcher/annotate?format=xml-standoff&loterreID=QX8
Content-Type: application/json
[
{"id":"1","value":"Sustainable agriculture is farming in sustainable ways meeting society's present food and textile needs, without compromising the ability for current or future generations to meet their needs.[1] It can be based on an understanding of ecosystem services. There are many methods to increase the sustainability of agriculture. When developing agriculture within the sustainable food systems, it is important to develop flexible business processes and farming practices.[2] Agriculture has an enormous environmental footprint, playing a significant role in causing climate change (food systems are responsible for one third of the anthropogenic greenhouse gas emissions),[3][4] water scarcity, water pollution, land degradation, deforestation and other processes;[5] it is simultaneously causing environmental changes and being impacted by these changes.[6] Sustainable agriculture consists of environment friendly methods of farming that allow the production of crops or livestock without causing damage to human or natural systems. It involves preventing adverse effects on soil, water, biodiversity, and surrounding or downstream resources, as well as to those working or living on the farm or in neighboring areas. Elements of sustainable agriculture can include permaculture, agroforestry, mixed farming, multiple cropping, and crop rotation"}
]
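
For readers who would rather script these calls than use an HTTP client file, the first request in `examples.http` can be sketched in Python. This is a minimal sketch and not part of the PR: the helper name `build_postag_request` is hypothetical, the tab separator between `id` and `value` is assumed from the `text/tab-separated-values` content type, and the host is the local one listed in `examples.http`. The request is only built here, not sent.

```python
from urllib.request import Request

# Assumption: the service runs locally on the port used in examples.http.
HOST = "http://localhost:31976"


def build_postag_request(rows, lang="en", host=HOST):
    """Build (but do not send) a POST request for /v1/<lang>/dico-pos/postag.

    `rows` is an iterable of (id, term) pairs. The body is a TSV document
    with an `id<TAB>value` header line, as assumed from the content type.
    """
    lines = ["id\tvalue"] + [f"{term_id}\t{term}" for term_id, term in rows]
    body = "\n".join(lines).encode("utf-8")
    return Request(
        url=f"{host}/v1/{lang}/dico-pos/postag?input=terms",
        data=body,
        headers={"Content-Type": "text/tab-separated-values"},
        method="POST",
    )


# Two of the Loterre terms from the English example above.
req = build_postag_request([
    ("http://data.loterre.fr/ark:/67375/P66#xl_en_9278939f", "qualities"),
    ("http://data.loterre.fr/ark:/67375/P66#xl_en_60f6687f", "quality"),
])
```

Passing `req` to `urllib.request.urlopen` would send it, provided the container is running locally.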
8 changes: 7 additions & 1 deletion services/terms-tools/package.json
@@ -2,7 +2,7 @@
"private": true,
"name": "ws-terms-tools",
"version": "2.1.0",
"description": "POStagging de termes et annotation",
"description": "POStagging de termes et annotation de textes par une ressource Loterre",
"repository": {
"type": "git",
"url": "git+https://github.com/Inist-CNRS/web-services.git"
@@ -32,5 +32,11 @@
"build": ". ./.env 2> /dev/null; DOCKER_BUILDKIT=1 docker build -t cnrsinist/${npm_package_name}:${npm_package_version} --secret id=webdav_login,env=WEBDAV_LOGIN --secret id=webdav_password,env=WEBDAV_PASSWORD --secret id=webdav_url,env=WEBDAV_URL .",
"start": "docker run --rm -p 31976:31976 cnrsinist/${npm_package_name}:${npm_package_version}",
"publish": "docker push cnrsinist/${npm_package_name}:${npm_package_version}"
},
"dependencies": {
"@ezs/analytics": "2.3.5",
"@ezs/basics": "2.9.2",
"@ezs/core": "4.0.2",
"@ezs/spawn": "1.4.9"
}
}
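
The `build`, `start` and `publish` scripts all reference the image as `cnrsinist/${npm_package_name}:${npm_package_version}`. The sketch below shows how that tag resolves for the values in this diff. The function name `image_tag` is hypothetical, and only the two fields that npm exposes as those variables are reproduced.

```python
import json

# Excerpt of package.json: only the two fields the npm scripts interpolate.
PACKAGE_JSON = '{"name": "ws-terms-tools", "version": "2.1.0"}'


def image_tag(package_json_text, registry_user="cnrsinist"):
    """Mirror the pattern cnrsinist/${npm_package_name}:${npm_package_version}."""
    pkg = json.loads(package_json_text)
    return f"{registry_user}/{pkg['name']}:{pkg['version']}"


tag = image_tag(PACKAGE_JSON)
print(tag)  # prints "cnrsinist/ws-terms-tools:2.1.0"
```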
96 changes: 21 additions & 75 deletions services/terms-tools/tests.hurl

Large diffs are not rendered by default.

@@ -3,29 +3,26 @@ plugin = basics
plugin = analytics
plugin = spawn

[env]
path = voc_loterre
value = env('vocab').prepend('_annotflash_').prepend(env('langue')).append('.tsv')

[JSONParse]
separator = *

# return only the 1st object (process just that one, for tests)
#[shift]
[env]
path = voc_loterre
value = env('loterreID').prepend('_annotflash_').prepend(env('langue')).append('.tsv')

# output format
[env]
path = mapping
path = mapping-format
value = fix({ "json-indoc": "doc", "xml-standoff": "json", "json-standoff":"json" })

[env]
path = param_format
value = env('mapping').get(env('format'))
value = env('mapping-format').get(env('format'))

[exec]
command = loterre_tag
# important: set to 1 or 2 in prod
concurrency = 1
concurrency = 2
args = fix('-lang')
args = env('langue')
args = fix('-d')
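
The two `[env]` computations in this hunk can be read as plain string and dictionary operations. The Python sketch below restates them under the assumption that `prepend` and `append` are simple string concatenations applied in the order written. The function names are hypothetical, and the sample values `en` and `P66` come from the requests in `examples.http`.

```python
# Copied from the `mapping-format` value in the hunk above.
MAPPING_FORMAT = {"json-indoc": "doc", "xml-standoff": "json", "json-standoff": "json"}


def voc_loterre(langue, loterre_id):
    """Restate env('loterreID').prepend('_annotflash_').prepend(env('langue')).append('.tsv')."""
    name = loterre_id
    name = "_annotflash_" + name   # first prepend
    name = langue + name           # second prepend
    return name + ".tsv"           # append


def param_format(fmt):
    """Restate env('mapping-format').get(env('format')); None for unknown formats."""
    return MAPPING_FORMAT.get(fmt)


print(voc_loterre("en", "P66"))       # prints "en_annotflash_P66.tsv"
print(param_format("xml-standoff"))   # prints "json"
```

Both standoff formats map to the same tool output (`json`); only `json-indoc` selects `doc`.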
48 changes: 48 additions & 0 deletions services/terms-tools/v1/analyzeMatcher.ini
@@ -0,0 +1,48 @@
[use]
plugin = basics
plugin = analytics
plugin = spawn

[JSONParse]
separator = *

[env]
path = voc_loterre
value = env('loterreID').prepend('_annot_').prepend(env('langue')).append('.jsonl')

[env]
path = config
value = env('langue').prepend('config_annot_').append('.ini')

# return only the 1st object (process just that one, for tests)
#[shift]

[env]
path = mapping-format
value = fix({ "json-indoc": "doc", "xml-standoff": "json", "json-standoff":"json" })

[env]
path = param_format
value = env('mapping-format').get(env('format'))


[exec]
command = terms_tools
# important: set to 1 or 2 in prod
concurrency = 2
args = termMatcher
args = fix('-lang')
args = env('langue')
args = fix('-d')
args = env('voc_loterre')
args = fix('-f')
args = fix('text')
args = fix('-o')
args = fix(env('param_format'))
args = fix('-ini_file')
args = env('config')
args = fix('-ezs')


[delegate]
file = env('format').prepend('./v1/').append('.cfg')
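
To make the `[exec]` and `[delegate]` sections of `analyzeMatcher.ini` easier to follow, here is a sketch of the command line and delegate path they appear to produce for one request. It assumes that the `args` entries are passed to `terms_tools` in the order listed and that `prepend`/`append` are plain string concatenations. The function names are hypothetical, and nothing is executed.

```python
# Copied from the `mapping-format` value in analyzeMatcher.ini.
MAPPING_FORMAT = {"json-indoc": "doc", "xml-standoff": "json", "json-standoff": "json"}


def terms_tools_argv(langue, loterre_id, fmt):
    """Assemble the argument list described by the [exec] section."""
    voc = f"{langue}_annot_{loterre_id}.jsonl"   # voc_loterre
    config = f"config_annot_{langue}.ini"        # config
    return [
        "terms_tools", "termMatcher",
        "-lang", langue,
        "-d", voc,
        "-f", "text",
        "-o", MAPPING_FORMAT[fmt],               # param_format
        "-ini_file", config,
        "-ezs",
    ]


def delegate_file(fmt):
    """Restate env('format').prepend('./v1/').append('.cfg')."""
    return f"./v1/{fmt}.cfg"


# Values taken from the French json-standoff request in examples.http.
argv = terms_tools_argv("fr", "P66", "json-standoff")
print(" ".join(argv))
print(delegate_file("json-standoff"))
```

Unlike the lookup in the ini file, `MAPPING_FORMAT[fmt]` raises `KeyError` on an unknown format; that stricter behavior is a choice of this sketch, not something the diff specifies.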
54 changes: 0 additions & 54 deletions services/terms-tools/v1/en/dico-annot/postag.ini

This file was deleted.

71 changes: 0 additions & 71 deletions services/terms-tools/v1/en/dico-pos/postag.ini

This file was deleted.
