Joseph Helbing
Machine Learning Engineer + Data Scientist
Contact Information
Skills & Expertise
Technical Skills
- Machine Learning (ML)
- High Performance Computing (HPC)
- Natural Language Processing (NLP)
- Reinforcement Learning (RL)
- Agent-Based Modeling
Tools & Technologies
- Python, R, SQL, HTML, CSS, HTMX
- Web Scraping
- AWS Textract, S3, GovCloud
- Data Collection, Analysis & Visualization
- Plotly, Pyvis
- Communicating Analysis to Non-Technical Audiences
Professional Experience
Imprivata
Data Scientist (2025 – Present)
- Rewrote a legacy EMR data processing pipeline using Polars lazy frames, cutting processing time from 5 hours to 2 minutes per ~30 GB file (150x speedup) across 1,000+ daily runs and saving ~$200K annually in compute costs
- Leading cross-team collaboration among Data Science, DevOps, and Implementation teams to migrate production systems to the new EMR pipeline while maintaining backward compatibility and addressing critical security vulnerabilities
- Developing an agentic RAG system using Amazon Bedrock and Pydantic AI to automate analysis of security datasets, triaging authentication failures and accelerating incident response
US Digital Corps at EEOC
Data Scientist (2024 – 2025)
- Assisted EEOC investigators on AI- and ML-related investigations with technical consulting, forensic source code review, and case-related testing
- Modernized analytics infrastructure with cloud computing and virtual machine (VM) systems for data science tasks
- Supported EEOC data analysts and statisticians on case investigations by training language-model-based classification systems and applying network analysis and other modern data science techniques
- Took ownership of existing internal analytics codebases, upgrading and refactoring them with modern best practices to achieve large speedups, reduce concurrency issues, and minimize resource footprints while adapting them to lower-cost platforms
- Co-led the US Digital Corps NLP working group, managing group administration, facilitating interagency knowledge sharing, and providing training on NLP methods
Highlighted Project: Language Model Case Analysis Support
- Fine-tuned bidirectional encoder models based on the BERT and ModernBERT architectures to classify free-form resume text, supporting applicant flow analysis in case investigations
- Predicted job titles for unhired applicants from unstructured application text, training on hired applicants' job titles with train/test splits and post-training accuracy analysis, to help statisticians identify group differences in discrimination investigations
Library of Congress – Federal Research Division
Data Analyst (2023 – 2024)
- Produced long-form research reports for military and federal clients, including literature reviews, statistical data analysis, and visualization
- Served as technical lead on a large-scale natural language processing (NLP) project combining AWS GovCloud with locally hosted open-weights foundation models in a pipeline for extracting data from legal document images
- Contributed to research reports on dual-use technologies drawing on Chinese-language source material for a US government client
Highlighted Project: Large-Scale Document Processing System
- Architected and implemented an end-to-end pipeline for extracting structured data from 500,000+ military court-martial documents across all US military branches
- Integrated AWS Textract, custom bounding-box algorithms, and LLM refinement to extract 60+ variables from heterogeneous military forms, with built-in quality assurance through a purpose-built GUI for sampling-based human verification
- Designed and implemented a SQL database architecture for efficient storage and retrieval of extracted information
- Led technical training sessions for the US Digital Corps NLP working group on the LoC standardized-form extraction methodology
University of Chicago Data Science Institute
Research Assistant (2022)
- Developed a web scraper for the Securities and Exchange Commission (SEC) EDGAR API to access corporate reports
- Built an information extraction pipeline using statistical text-matching techniques, XBRL scraping, and open-source pretrained machine learning models
Paratech Inc
Marketing Coordinator, Industrial Sales Manager, China Regional Sales Manager (2015 – 2021)
Marketing Coordinator (2019 – 2021)
- Oversaw the update of Paratech's corporate website, managed marketing materials, and fostered dealer partnerships, with a focus on technology integration and staff training
- Launched a new WordPress website, handling design, content, and CMS management
- Developed a YouTube webinar series, trained the sales team, and ran live training events in the field that were broadcast to customers
Industrial Sales Manager (2017 – 2019)
- Led development of a new industrial sales market, establishing partnerships with distributors and manufacturers' representatives
- Produced industrial, military, and maritime marketing materials, including brochures and videos, using tools such as InDesign, Photoshop, and DaVinci Resolve
- Worked on the production floor during peak periods, assembling products from engineering drawings to meet manufacturing targets
China Regional Sales Manager (2015 – 2017)
- Overhauled the strategy and organization of Chinese operations, strengthening the company's position and moving away from problematic relationships without harming client networks
- Transitioned local contacts to direct company relationships, retaining all dealers and clients
- Drafted and translated contracts between Chinese and English, and negotiated partnerships directly during multiple visits to China
SiMple International Inc
Founder (2014 – 2015)
- Founded a company selling international telecommunications services to exchange students through partner distribution channels
INTO University Partnerships
US Recruitment Manager (2013 – 2014)
- Represented six US universities, working with a network of education consultancies to recruit students into their degree programs
- Covered the Northern China recruitment territory from a base in Dalian, Liaoning, China; independently planned and executed travel and events, and carried out all job responsibilities exclusively in Mandarin Chinese
Education
Degrees
- University of Chicago – M.A. Computational Social Science (2023)
- Ohio State University – M.A. East Asian Languages & Literatures, Mandarin (2013)
- University of Illinois at Urbana-Champaign – B.A. Political Science & China Studies (2011)
Languages
- English – Native
- Mandarin – Professionally Fluent
Featured Projects
- RL-ABM Experiments: Reinforcement Learning Enhanced Schelling Segregation Model
- Cascade: Data flow and processing framework
- Document Extraction Gist: Advanced document processing techniques