Genre classification based on pure URL

$30-250 USD

Closed
Posted more than 2 years ago

Paid on delivery
The ultimate goal is to classify URLs based on the URL string alone, without using any other features. There are two tasks. The first is a URL classification project using PyTorch and RoBERTa, fine-tuned with descriptions from DMOZ, using the DMOZ dataset and 2 other datasets. The second uses sentence-transformers to predict a URL from its meta description, implementing MultipleNegativesRankingLoss. Extra information will be shared via email. [login to view URL]

Description

1. Datasets
a. Mainly the DMOZ dataset. [login to view URL]
b. Malicious URLs dataset. [login to view URL]
c. URL dataset (ISCX-URL2016). [login to view URL]
d. Detecting Malicious URLs. [login to view URL]
e. ANT Datasets. [login to view URL]
The bottom four datasets are for comparison in classification.

2. Experimental setting

Task 1. This is a simple genre classification of URLs.
a. Using the RoBERTa-base and RoBERTa-large models, run the genre or phishing classification.
b. Use only the URLs at first: split the URLs by '/', then by punctuation, then with a word segmenter in Python, and finally with Universal Word Segmentation ([login to view URL]). GitHub for Universal Word Segmentation: [login to view URL]
c. Fine-tune the RoBERTa models with the descriptions from DMOZ.
d. Run the models again.
e. Result tables and the implemented equations are required here.

Task 2. This is basically predicting URLs from descriptions.
a. Using sentence_transformers, embed the DMOZ descriptions with a model. For pre-trained models, use "all-mpnet-base-v2" and "all-MiniLM-L6-v2". For starting from scratch, which means building the model, use RoBERTa-base.
b. After embedding the descriptions with their matching URLs, run sentence-transformers (see the usage examples at the following link: [login to view URL]).
c. For the loss function, try BatchAllTripletLoss, BatchHardSoftMarginTripletLoss, MultipleNegativesRankingLoss, and TripletLoss.
d. A comparison table of each model and loss function is needed.

Task 3. This combines tasks 1 and 2. Given a random description, predict the URL and classify the URL's genre.

* Equations for the loss functions and a step-by-step explanation of the models are needed. For example, we can implement a fully connected dense layer with an activation after the pooling layer for sentence-transformers.

It has to be of master's level, have an abstract, use APA formatting, and include at least 20 references with proper in-text citations. It must also be written in US English.
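
For bidders who want to gauge the preprocessing in Task 1 step b, the first two splitting stages (by '/', then by punctuation) could look roughly like the sketch below. It uses only the Python standard library. The function name `split_url`, the decision to drop the scheme prefix, and the example URL are assumptions made for illustration, not part of the brief. The word-segmenter and Universal Word Segmentation stages are left out.

```python
import re


def split_url(url: str) -> list[str]:
    """Split a URL on '/' first, then on any remaining punctuation."""
    # Assumption: the scheme (e.g. "https://") carries no genre signal, so drop it.
    url = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)
    tokens = []
    for segment in url.split("/"):
        # Within each segment, split on runs of non-alphanumeric characters.
        tokens.extend(t for t in re.split(r"[^A-Za-z0-9]+", segment) if t)
    return tokens


print(split_url("https://www.example.com/sports/football-news/index.html"))
# ['www', 'example', 'com', 'sports', 'football', 'news', 'index', 'html']
```

A later word-segmentation stage would then handle tokens that this step cannot break apart, such as words concatenated without any separator.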
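
As a rough illustration of the kind of loss equation requested for Task 2, MultipleNegativesRankingLoss can be read as a cross-entropy over in-batch similarities: for a batch of N (description, URL) pairs, each description's own URL is the correct class and the other N-1 URLs act as negatives, giving L = -(1/N) * sum_i log( exp(s * sim(a_i, p_i)) / sum_j exp(s * sim(a_i, p_j)) ). The pure-Python sketch below computes that quantity on toy vectors. The function names, the toy embeddings, and the scale s = 20 with cosine similarity are assumptions for illustration. A real experiment would use the sentence-transformers class rather than this code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the match for anchors[i]
    and every other positives[j] in the batch serves as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, p) for p in positives]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_z - scores[i]
    return total / len(anchors)


# Toy embeddings: descriptions (anchors) and URLs (positives).
descriptions = [[1.0, 0.0], [0.0, 1.0]]
urls_aligned = [[1.0, 0.0], [0.0, 1.0]]
urls_swapped = [[0.0, 1.0], [1.0, 0.0]]

print(mnr_loss(descriptions, urls_aligned) < mnr_loss(descriptions, urls_swapped))
```

The loss is close to zero when each description is most similar to its own URL and grows when the pairs are mismatched, which is the behavior a comparison table across loss functions would be measuring.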
Project ID: 32963093

About the project

1 proposal
Remote project
Active 2 years ago

1 freelancer is bidding an average of $250 USD for this job
Hi, it's an easy task for us. We have experienced developers in PHP, C programming, and web scraping. We have been operating since 2012. Please come on chat to discuss the project in detail. Project milestones will be decided during chat. Thank you. Regards, Arpit Jain, Black Grapes Softech
$250 USD in 7 days
0.0 (0 reviews)

About this client

Islamabad, Pakistan
5.0
2
Payment method verified
Member since Sept 29, 2021
