NLP Python Linux Tool needed

Completed Posted 3 years ago Paid on delivery

I need a tool that works similarly to the [login to view URL] pipeline and supports the following languages: PL, DE, IT, EN, UA, FR, CZ.

The tool should process parallel texts in plain-text format, as found in the [login to view URL] repository (more precisely, in Moses format). The program should detect each file's language automatically from its file extension.
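A minimal sketch of the extension-based language detection could look like the following. The file names, the `detect_language` helper, and the mapping of "UA" and "CZ" to the ISO codes `uk` and `cs` are assumptions made for illustration; they are not specified in the posting.

```python
from pathlib import Path

# Languages listed in the posting, written as lowercase file suffixes.
# Assumption: Ukrainian files end in ".uk" and Czech files in ".cs",
# while ".ua" and ".cz" are accepted as aliases.
SUPPORTED = {"pl", "de", "it", "en", "uk", "fr", "cs"}
ALIASES = {"ua": "uk", "cz": "cs"}


def detect_language(path):
    """Return the language code taken from the file's final extension."""
    suffix = Path(path).suffix.lstrip(".").lower()
    code = ALIASES.get(suffix, suffix)
    if code not in SUPPORTED:
        raise ValueError(f"Unsupported or missing language extension: {path!r}")
    return code


if __name__ == "__main__":
    # Hypothetical Moses-style file pair: corpus.de-en.de / corpus.de-en.en
    for name in ("corpus.de-en.de", "corpus.de-en.en"):
        print(name, "->", detect_language(name))
```

Only the last extension is inspected, so a name such as `corpus.de-en.en` resolves to `en` regardless of the language-pair part in the middle.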

The tool should have the following capabilities, executed one after another in exactly that order:

Step 1: Convert the whole text to lowercase (optional, disabled by default).

Step 2: Pre-clean the text (optional, enabled by default). We want to use the [login to view URL] scripts, i.e. [login to view URL], [login to view URL], [login to view URL], [login to view URL]. Maybe you will find something else essential?

Step 3: Normalize punctuation marks (optional, enabled by default). We want to use the same tool as here: [login to view URL], i.e. [login to view URL].

Step 4: Tokenization. This should be performed with spaCy, and for Polish with SpaCy-pl [login to view URL].

Step 5: Truecasing (optional, enabled by default). You can use a fragment of [login to view URL], because the whole thing comes from [login to view URL] anyway.

I don't have any prepared models. I want the models to be trained on the input data and then applied to the same data, just as is done in Moses.

Step 6: Split words into subword units with the BPE algorithm [login to view URL] (optional, enabled by default with a 50,000-entry dictionary). It must be possible to adjust the size of the dictionary with a parameter.
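The defaults and ordering of the six steps above can be sketched as a command-line interface. This only models which steps run and in what order; it does not implement them. All flag names (`--lowercase`, `--no-preclean`, `--bpe-size`, and so on) are hypothetical, and treating tokenization as always on is a reading of the posting, which does not mark Step 4 as optional.

```python
import argparse


def build_parser():
    """Build a parser whose defaults mirror the posting (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Parallel corpus preprocessing (sketch)")
    # Step 1 is opt-in; steps 2, 3, 5 and 6 are opt-out.
    parser.add_argument("--lowercase", action="store_true", help="step 1, off by default")
    parser.add_argument("--no-preclean", dest="preclean", action="store_false", help="skip step 2")
    parser.add_argument("--no-normalize", dest="normalize", action="store_false", help="skip step 3")
    parser.add_argument("--no-truecase", dest="truecase", action="store_false", help="skip step 5")
    parser.add_argument("--no-bpe", dest="bpe", action="store_false", help="skip step 6")
    parser.add_argument("--bpe-size", type=int, default=50000, help="BPE dictionary size")
    return parser


def planned_steps(args):
    """Return the enabled steps in the fixed order required by the posting."""
    steps = []
    if args.lowercase:
        steps.append("lowercase")
    if args.preclean:
        steps.append("preclean")
    if args.normalize:
        steps.append("normalize-punctuation")
    steps.append("tokenize")  # assumed mandatory: the posting gives no switch for it
    if args.truecase:
        steps.append("truecase")
    if args.bpe:
        steps.append(f"bpe({args.bpe_size})")
    return steps


if __name__ == "__main__":
    # Print the plan so the user can see on the console what will run.
    for number, step in enumerate(planned_steps(build_parser().parse_args()), start=1):
        print(f"[{number}] {step}")
```

Keeping the order in one function makes it hard for a later change to run, say, truecasing before tokenization by accident.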

The result should be plain text encoded in UTF-8, in the same format as the input. The number of lines MUST MATCH, and the text must still be PARALLEL after processing. The program should print to the console what it is currently doing, run easily on Ubuntu Linux, and be easy to install. Ideally, it should come with an installation script. You will also need to write short documentation and a user manual with simple examples.
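One way to guard the line-count requirement is to compare the line counts of all files after every step and stop as soon as they diverge. The sketch below assumes UTF-8 files on disk; the helper names `count_lines` and `check_parallel` are hypothetical.

```python
import sys


def count_lines(path):
    """Count the lines of a UTF-8 text file without loading it into memory."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_parallel(paths):
    """Raise ValueError if the files differ in line count; return the shared count."""
    counts = {path: count_lines(path) for path in paths}
    if len(set(counts.values())) > 1:
        raise ValueError(f"Line counts differ, text is no longer parallel: {counts}")
    shared = next(iter(counts.values()), 0)
    # Progress goes to stderr so it never mixes with text written to stdout.
    print(f"[check] {len(counts)} file(s), {shared} line(s) each", file=sys.stderr)
    return shared


if __name__ == "__main__":
    # Usage: python check_parallel.py corpus.de-en.de corpus.de-en.en
    check_parallel(sys.argv[1:])
```

A matching line count does not prove the lines are still aligned, but a mismatch reliably shows that a step dropped or split a line.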

Python Perl Linux UNIX Programming

Project ID: #26553197

About the project

7 proposals Remote project Active 3 years ago

Awarded to:

computerroman

We have discussed the project in the chat, so I am just trying to put enough characters here to bid, because a bid needs more than 100 characters.

$150 USD in 7 days
(15 reviews)
4.2

7 freelancers are bidding an average of $151 for this job

Demenntor

Dear Employer, I have read the project details and am confident I can work on the NLP Python Linux tool. I have extensive knowledge of Perl, Python, Linux and UNIX. Kindly message me so that we can discuss the work further.

$200 USD in 2 days
(18 reviews)
4.3
engrfarooq04

Hi, good day. I read your project description very carefully. I have rich experience in Python, Linux and C programming, and excellent software architecture skills. I'm really confident about your project, and very ...

$200 USD in 7 days
(1 review)
2.6
rukshanlancer

Hi, thanks for contacting me. I've carefully checked your requirements and am really interested in this job. I'm a full-stack developer working on large-scale websites. I can complete your project on time an...

$100 USD in 2 days
(0 reviews)
0.0
utkarsh7238

Hey, I can help you with the NLP Python Linux tool. How soon do you want it completed? Let's talk about your project. Waiting for your response!

$30 USD in 1 day
(0 reviews)
0.0
oxanarvayva

Hi! Agnieszka K. I have read your job description and assure you that I am a perfect fit for the job. Available NOW and can start Immediately. Looking for soonest reply from you. Thanks

$150 USD in 3 days
(0 reviews)
0.0