Counting duplicates in big files

$10-30 USD

Closed
Posted almost 5 years ago

Paid on delivery
I have about 14 million rows of data with 6 columns in CSV format. I built a working solution in Power BI that does the job within 30 minutes, but Power BI limits the number of rows that can be exported for further processing and can only handle 2 files (and is sometimes buggy), whereas I need to process 6 files a day.

Target:
- A program, data manipulation software, or SQL code that returns, for each row, the number of rows that share entries with the current row, for every level of overlap from 1 entry up to all 6 columns/entries.
- The position of the column is not important in the check. For example, for a count of 5 similar entries, the following 2 rows (representative, not actual data) each get a result of 1 because they share 2,3,4,5,6:
  1,2,3,4,5,6 - 1
  2,3,4,5,6,7 - 1
- It should return the result fast: no more than 30 minutes (can be discussed), or a maximum of 4 hours for 6 files.

Note: Unfortunately, I cannot give a milestone payment for a program/solution that cannot meet the processing time.
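One possible reading of this requirement can be sketched in Python. This is only an illustration under assumptions that the posting does not confirm: values within a row are treated as a set (duplicates inside a row are collapsed), "count of m similar entries" is taken to mean the number of other rows sharing exactly m values, and all subset counts fit in memory, which may not hold for 14 million rows without sharding or an on-disk store. The function name `overlap_counts` is hypothetical. The sketch avoids comparing every pair of rows: it counts every value subset once, then uses binomial inversion to turn "rows containing this subset" into "rows sharing exactly m values".

```python
from collections import Counter
from itertools import combinations
from math import comb


def overlap_counts(rows):
    """For each row, return {m: number of OTHER rows sharing exactly m values}.

    Assumption: column position is ignored and each row is treated as a set.
    """
    # Normalize each row to a sorted tuple of distinct values so that
    # the same subset always produces the same dictionary key.
    sets = [tuple(sorted(set(r))) for r in rows]

    # Count how many rows contain each non-empty subset of values.
    subset_counts = Counter()
    for s in sets:
        for k in range(1, len(s) + 1):
            subset_counts.update(combinations(s, k))

    results = []
    for s in sets:
        n = len(s)
        # shared[k] = sum over other rows of C(overlap_size, k)
        shared = [0] * (n + 1)
        for k in range(1, n + 1):
            shared[k] = sum(subset_counts[t] - 1 for t in combinations(s, k))
        # Binomial inversion recovers the count of rows with overlap exactly m.
        exact = {
            m: sum((-1) ** (k - m) * comb(k, m) * shared[k]
                   for k in range(m, n + 1))
            for m in range(1, n + 1)
        }
        results.append(exact)
    return results
```

For the two representative rows in the description, `overlap_counts([(1, 2, 3, 4, 5, 6), (2, 3, 4, 5, 6, 7)])` gives each row a count of 1 at m = 5 and 0 at every other level. Each 6-value row contributes 63 subsets, so the work grows linearly with the number of rows rather than quadratically, at the cost of a large counting table.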
Project ID: 20307066

About the project

25 proposals
Remote project
Active 5 years ago

25 freelancers are bidding an average of $48 USD for this job

Hi, I'm a data engineer with over 5 years of industry experience on a wide array of tech stacks including databases, data warehouses, machine learning, big data/Hadoop. I'm currently pursuing my Master's in Data Science from Trinity College Dublin and would like to discuss the requirements with you. All my best, Tousif
$50 USD in 3 days
4.9 (53 reviews)
5.7

I can load the files into a SQL database using an SSIS ETL package that removes duplicate records efficiently. There is no restriction on the number of files: you can load any number of files in one go. Let me know if you are ready to go with this solution.
$30 USD in 1 day
5.0 (21 reviews)
4.7

Hi, I understand your problem well: processing a large amount of CSV data quickly and efficiently. Python is the right tool for this task; data processing is the type of problem Python solves best. I am an experienced and knowledgeable (yet friendly) Python programmer who has used Python in many areas and recently became a full-stack Python developer. Before bidding I ran a speed test: I created a CSV with 100,000 rows and 6 columns, then opened that file in a test script and applied some light processing to each row. It took 0.18 seconds. Multiplying by 140 (to scale up to 14 million rows) gives only 25.2 seconds. This is just a demonstration of how fast your work can be done, and without any bugs. You will simply get an executable that does your task. Can you please trust and hire me? And will you help me rise on this site? You: FOR SURE NOT :( And please at least don't let your project expire, as many others do. I shall be very thankful to you, Ishaq
$20 USD in 2 days
5.0 (23 reviews)
4.4

Hi, My name is Ali and I can work on the task with immediate availability. I can do duplication check in SQL Server. Let's have quick discussion so I can work on it.
$30 USD in 3 days
5.0 (29 reviews)
4.6

Hi. I can make a program that can solve your problem. I have enough experience to tackle the problem. Message me to discuss
$25 USD in 7 days
4.8 (14 reviews)
4.0

Hi. I can write this program in a native language (not C# or Python) and it will calculate very fast. See my reviews and completion rate on this site. Regards, Alex.
$250 USD in 3 days
5.0 (3 reviews)
4.0

Hey I have got your requirement and can deliver you a SQL script that will compute results within maximum 10 minutes. You can message me to get query and check if it is giving you result within time and then you can award project to me.
$35 USD in 1 day
5.0 (3 reviews)
3.6

I am Anil, with more than 10 years of experience in SQL Server database development and administration. I have worked with big databases for clients like Match.com and Nationstar Mortgage, and with TCS. I am also good at the BI side, including SSIS, SSRS and SSAS. I am looking for a full-time, part-time or hourly remote job. I am immediately available.
$35 USD in 1 day
5.0 (5 reviews)
3.3

Okay the program will process in your given time. But you need to discuss more over chat about job. Thanks
$30 USD in 2 days
5.0 (8 reviews)
2.9

Hi, I can manipulate your CSV file with Python in 1 day. Please send me a message so that we can discuss it further. To make sure that hiring me will truly serve your requirement, you can evaluate my skill by giving me part of your coding as a test before hiring. About me: I am keen to work for those who need an expert in SQL and database systems. I have more than 7 years of experience in many projects on SQL, non-SQL, and database management such as MS SQL, Oracle, MySQL, PostgreSQL, MS Access, MongoDB, Firebase and so on. I have developed many web applications using Angular, Bootstrap, Angular Material, PrimeNG, Ant Design, and Express/Node.js. I have 3 years of experience in data science and data mining projects using Python, pandas, NumPy, R, scikit-learn, XGBoost and LightGBM. I rank in the top 11% on Kaggle competitions. See my Kaggle profile: @l0ginx or "Pratya Thanwatthanakit". I like to face technical challenges and take pleasure in resolving them.
$30 USD in 1 day
5.0 (6 reviews)
2.9

Did you manage to make a decision to pick the freelancer? I have got the code ready and I will test it with the 14million rows of data if you can get me a sample CSV. It’s written in Python and is fairly looks for a set range of observations (columns). Ping me on chat. Take care!
$10 USD in 2 days
4.6 (1 review)
2.2

I see what you want, however it's not completely clear, so I would want to ask a few things first if we decide to work on it. It won't take more than 2 days to complete such a program; the 7 days I am proposing is just to be on the safe side. I understand that you want the program to run within a limited time, and I won't ask for any milestone payment if I am unable to deliver that. So no worries. Thanks, looking forward to working on this.
$25 USD in 7 days
4.8 (1 review)
1.0

Hi there! I am 4+ years experienced developer as Python, Django, RoR & ReactJS. Please open the chat box for further discussion. Regards,
$25 USD in 10 days
5.0 (1 review)
0.4

Thank you for your post, sir. I have a good chance of bidding your project. I want to share opinions about your project by chat. Collaboration with you will be a great boon to me. I put time and quality first in all my work. I have served on the web for many years and have superior knowledge and function in managing the database. So I want to get some issues from your project and to discuss that with you. If you hire me, I will do the best for you. I want to keep in touch with you in a short time. I will wait for your response. Best Regards. In addition, what files are your basic data in?
$20 USD in 7 days
0.0 (0 reviews)
0.0

Hi, We are a young startup based out of Bangalore specializing in the data analysis and data science domain. Recently, we completed a similar project in R language where we checked the words matching (instead of numbers in your example). Let us connect and discuss further to get this done asap. Thanks, Abhishek
$30 USD in 4 days
0.0 (0 reviews)
0.0

If it does not work in Python, then we can try to do it in Perl. If you have the data in a relational database, we can do something that includes both of them. I am very curious to understand... Singapore is one of the richest countries in the world and you want to pay only $30 for a task that even you cannot do? Kind Regards N.
$250 USD in 7 days
0.0 (0 reviews)
0.0

Hi! I can make an application for you on C#. It will be maximally fast and process files in minimum time. I can do that in 1-2 hours. Write me to discuss details. Thanks!
$30 USD in 1 day
0.0 (0 reviews)
0.0

Hi, I am an expert in java and python and I can complete this job within a day. I have read your requirements and look forward to working with you. Let's continue this in freelance chat
$40 USD in 1 day
0.0 (0 reviews)
0.0

Hello, Thanks for posting this job and giving us opportunity to apply on it. I have read project description and can assure you that I can handle this job. Please reply back to get into more details over chat board. Lets discuss complete plan and approach of mine for this job. Looking forward for your positive response! Regards, Page O.
$20 USD in 7 days
0.0 (0 reviews)
0.0

I can load the file into the free SQL Server Express edition using OPENQUERY/OPENROWSET. The CSV file is placed in a location on the system where SQL Express runs, and a T-SQL script then produces the desired result. The whole process of reading the data from the CSV and getting the desired output will take 10 minutes. I have worked on this solution.
$70 USD in 2 days
0.0 (0 reviews)
0.0

About this client

Singapore, Singapore
5.0
1
Payment method verified
Member since Jun 25, 2019

Client verification

Other jobs from this client

I need a coder
$10-30 USD
Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)