
Senior Machine Learning Engineer

Betterdata

Software Engineering
Singapore · Remote
Posted on Mar 21, 2024

Location: Singapore
Employment type: Full time
Work arrangement: Remote or on-site

Who are we looking for:

We are seeking a Senior ML Engineer to spearhead the conversion of academic research into scalable, robust solutions for synthetic data generation. In this role, you will ensure our platform meets the most rigorous standards for an enterprise data platform, with particular attention to scalability, data ingestion, error handling, database connectivity, and distributed computing. Your blend of machine learning expertise, data engineering skills, and a knack for moving projects forward quickly will be critical. Collaborating with our software and research teams, you will play a central role in pushing our initiatives beyond what enterprise applications demand.

Key Responsibilities:

Data Pipeline Architecture: Architect data processing pipelines capable of efficiently managing large-scale datasets, optimising for performance and error resilience.
Database Connectivity: Develop and optimise connectors for a variety of data sources (SQL and NoSQL databases, cloud storage), ensuring efficient data integration and flow.
Data Ingestion & Processing: Create data ingestion pipelines from diverse formats (Excel, CSV, Parquet) using big data technologies (Hadoop, Spark, Dask) for scalable data processing.
Error Handling & Data Quality: Implement comprehensive error and edge case handling mechanisms to ensure system reliability across processing pipelines.
Scalable ML Model Development: Construct scalable machine learning models for synthetic data generation, including leveraging GPU parallelization for deep learning models and LLMs.
Accelerate Research: Actively work to unblock the research team, facilitating scalable research and efficient deep learning model training. This involves close collaboration to translate research insights into scalable models.
Cross-functional Team Collaboration: Spearhead the swift integration of ML models and data engineering solutions within our enterprise data platform, ensuring effective and efficient collaboration with software development and AI research teams.

Essential Skills and Qualifications:

High priority
4+ years of relevant experience, with a focus on scalable machine learning solutions and large-scale data engineering.
Proficiency in Python and ML frameworks (PyTorch preferred over TensorFlow).
Deep expertise in big data technologies (Dask preferred over Spark) for efficient, scalable data processing.
Solid understanding of distributed systems, data storage (e.g. S3), and data formats (e.g. Parquet), with a focus on performance and scalability.
Expertise in GPU parallelization techniques for accelerating the development and training of deep learning models.
Good to have
Degree in Computer Science, Machine Learning, or related field.
Strong background in developing database connectors, with extensive experience in SQL, NoSQL, and cloud storage solutions.
Exceptional analytical and problem-solving skills, capable of navigating complex data integration and quality challenges in fast-paced settings.

Why Join Us:

Join a team poised to make a significant impact in the synthetic data technology space, where your contributions will shape the future of the field. Our start-up environment offers the excitement of innovation and problem-solving, along with competitive compensation, equity options, and the chance to challenge and expand your skills at the intersection of data privacy and utility.

Benefits

Flexible work schedule: complete autonomy, with no unnecessary meetings taking your time away from building
Flexible work arrangements - desk space at our office in One North Singapore or fully remote

How to apply

Does this role sound like a good fit to you?
Submit your application here
If the above does not work, you may email your CV (PDF format) to jobs@betterdata.ai
Include the title of the role in your subject line
Indicate your available start and end dates (DDMMYY to DDMMYY)
Send links or other supporting material that best showcase the relevant things you have built and done