Job Description:
• Lead the design, development, and deployment of ML and graph-based algorithms for international entity resolution, identity trust scoring, and anomaly detection across heterogeneous, country‑specific datasets.
• Architect reusable matching and linking frameworks that work across multiple ID schemes (e.g., national ID numbers, passports, voter IDs, mobile accounts, bank accounts) and local name/address conventions.
• Develop probabilistic and rule‑augmented models that handle noisy, sparse, or partially labeled international data while maintaining explainability and regulatory defensibility.
• Define and evolve the international extension of Socure’s identity graph: schema design, linkage strategies, quality tiers, and confidence scoring that can be leveraged by multiple products (Verify, KYC, watchlists, fraud).
• Design and implement robust data quality and monitoring frameworks for international identity data (coverage, stability, drift, regional bias, label quality) and integrate them into modeling and production monitoring workflows.
• Own experimentation strategy for major international eKYC initiatives:
  • Design offline evaluations and online A/B tests that reflect local ground truth constraints and data sparsity.
  • Define success metrics that balance approval rates, fraud capture, and regulatory/operational constraints per market.
  • Analyze lift, stability, and fairness trade‑offs and drive go/no‑go decisions with Product and Engineering.
• Contribute to model governance documentation and support responses to regulators and large enterprise customers regarding model logic, data provenance, fairness, and monitoring for international markets.
Requirements:
• Master’s or Ph.D. in Computer Science, Data Science, Machine Learning, Statistics, Mathematics, or a related field, or equivalent practical experience.
• 6+ years of hands-on applied ML / data science experience (4+ with Ph.D.), including owning production models and pipelines in high‑stakes domains (fraud, risk, identity, payments, credit, or similar).
• Significant prior work on international or multi‑region products is strongly preferred (e.g., cross‑country KYC, credit risk, payments, or compliance systems).
• Expert‑level proficiency in Python and SQL, with extensive experience in distributed data processing (Spark/PySpark, Databricks or similar) on very large datasets.
• Deep experience designing, training, and deploying models for classification, ranking, anomaly detection, and/or graph learning, including:
  • Feature engineering for noisy/heterogeneous identity data.
  • Robust evaluation under label sparsity and feedback delays.
  • Calibration and thresholding tailored to regional risk and regulatory constraints.
• Proven expertise with graph technologies (e.g., Neo4j, AWS Neptune, GraphFrames, DGL, PyTorch Geometric) and graph algorithms (entity resolution, link prediction, community detection, label propagation) at scale.
Benefits:
• Offers Equity
• Offers Bonus