Benjamin Chu Min Xian

Summary

I am a data scientist specialising in natural language processing and graph technology. I translate research into practical applications through reusable components, patents, and papers. Recently, I have been working with generative AI and retrieval-augmented generation (RAG) frameworks. Alongside hands-on development, I enjoy mentoring fellow data scientists.

Overview

17 years of professional experience
1 Certification

Work History

Senior Lead Specialist

Synapxe
03.2022 - Current
  • Visual Pillbox - Completed a generative AI proof of concept (PoC) with Khoo Teck Puat Hospital under the AI Trailblazer initiative, built on Google's Vertex AI and Retrieval-Augmented Generation (RAG). The system accepts one or more medication images, extracts their text with Google Vision OCR, and generates patient-centric summaries covering storage guidelines, administration instructions, and common side effects. It also produces a weekly visual schedule that arranges medication doses into morning, afternoon, and evening slots. Targeted context is retrieved from a vector database of medication leaflets, and Vertex AI's data source grounding is used to reduce hallucinations. LLM chaining with LangChain's map-reduce approach condenses large contexts into clear, concise summaries. The solution automates the preparation of visual medication handouts for pharmacists, supporting patient education and communication.
  • HealthKaki - A 14-week proof of concept developed jointly by the Ministry of Health's Public Health Division, Synapxe, and Temus to help residents manage their health through personalised recommendations. My key role in the completed PoC was to develop an agentic workflow that dynamically generates and refines exercise plans, using the stateful, graph-based mechanism of the LangGraph framework. I designed modular, adaptable components that use state management for real-time updates and condition-specific decisions, and that integrate Tier 1 health actions from Ministry of Health and Health Promotion Board (HPB) guidelines. The workflow produced personalised exercise recommendations by merging context from structured knowledge bases (e.g., the HPB MoveIt series) and open-source data, enriched with metadata on targeted muscle groups. I integrated Claude 3.5 on AWS Bedrock to create tailored exercise plans that account for user preferences, activity history, and health conditions, with safety as a key consideration. User settings are stored in AWS DynamoDB so that personalised exercise plans can evolve from previous configurations.
  • C3 Graph - Completed a proof-of-concept exploration with Tan Tock Seng Hospital (TTSH) on using graph analytics to optimise patient flow at triage, with the aim of improving bed utilisation and reducing congestion. Patient movement was modelled as a graph network to visualise admission flow and track multiple transfers between wards until discharge. Graph clustering and centrality analysis identified daily and hourly hotspots, highlighting the wards with the most incoming admissions. Predictive models using graph-derived features forecast the bed type required upon admission with a 1.5-hour lead time, enabling proactive resource management. This approach supported earlier discharge planning and reduced supply-demand mismatches, improving overall patient flow.
  • Patient Journey - Conducted exploratory work for HSA on applying graph analytics to patient-journey analysis. Developed patient-centric views with interactive widgets that summarise each patient's pathway, capturing visits, diagnoses, and prescriptions over time. Created parameterised sub-queries to generate detailed visualisations, including medication timelines, diagnosis-medication interaction diagrams, and prescription lists. Built a cohort-level dashboard that segments patients into groups and uses graph algorithms and traversals to identify trends in disease and treatment patterns. The dashboard surfaces insights into diseases and medications, helping users identify effective treatments and assess the efficacy of prescriptions and procedures following a diagnosis.

Senior Sales Engineer

TigerGraph
08.2021 - 03.2022

POV Scoping and Technical Implementation with TigerGraph
Led the scoping and solutioning of Proof-of-Value (POV) projects, defining technical implementations that apply TigerGraph functionality to client use cases. Assessed client data sources to design graph schemas and data mappings that transform raw data into TigerGraph's native representation for ingestion. Joined sales teams on customer calls to understand the problem context, then drafted detailed scopes, success criteria, and deliverables for POV plans. Developed graph logic and queries, including path traversals and graph algorithms, tailored to each client's use case.

Key POV Projects:

  • Singtel Network Data Connector (NDC): Enabled correlation of poorly performing cell towers with the users connected to them at specific locations and times, allowing detailed drill-downs into network performance issues. Implemented use cases that identify low-experience cells and the affected users, reporting the number of impacted users and their experience scores at specified hours.
  • GIC: Conducted deep link analysis to uncover hidden relationships between individuals and companies across multiple degrees of connection. Demonstrated the ability to trace connections between GIC personnel through activities, meetings, and shared links. Implemented recommendation systems based on graph algorithms that suggest meeting notes relevant to an individual's network or recommend additional meeting participants based on past interactions and similarity.

Senior Data Scientist

Refinitiv, Thomson Reuters
03.2018 - 07.2021
  • Project IRIS: Designed and implemented a risk propagation framework that uses centrality measures and propagation mechanisms to identify and score potentially suspicious actors in graph networks. Scored entities and subgraphs based on network density and structure, and built an anomaly detection module to flag unusual graph patterns. Developed a scalable graph analytics SDK and architected a graph processing platform integrated with AWS Neptune as a data lake, with custom adapters for AllegroGraph and Neo4j and NetworkX-based scoring pipelines.
  • Sherlock: Developed a hybrid, unsupervised approach to topic modelling that combines customised LDA with Gibbs sampling, graph centrality, and clustering. Applied this generalised method to financial equity reports, earnings transcripts, and Reuters News, revealing actionable topics for downstream applications.
  • Infura (Infrastructure 360°): Developed a graph-based methodology to integrate infrastructure project datasets covering financing, deal, and ESG data. The solution targets the infrastructure investment gap, in which limited data transparency deters private equity capital. I developed an implicit linking method combining NLP with Refinitiv PermID technology, covering more than 60,000 projects, 24,000 entities, 30,000 connected deals, and four million relationships across six linked databases, accessible via a RESTful API. Data exploration and visualisation were conducted in Stardog Studio, and RDF graphs were exported to AWS Neptune for efficient querying and integration. A study report for the Global Infrastructure team highlighted the limitations of the previous SQL-based approach, particularly its rigid schemas and interdependencies. The graph-based solution enabled faster insights and detailed analysis of projects by type, deal, transaction, and related dependencies. By linking unstructured data such as comments and free text through Named Entity Recognition (NER) and associating the extracted entities with unique PermIDs, the knowledge graph improves data access and transparency, making infrastructure data in existing datasets more visible.

Data Scientist

MediaCorp Pte Ltd
08.2017 - 02.2018
  • Toggle TV Graph: Designed and developed a graph-based search engine for Toggle, organising media content such as series, episodes, and clips into a graph structure. Implemented relevance, phonetic, and semantic search algorithms to enhance user search experiences.
  • Exploration Project with DBS: Built a linguistic analyser to classify verbatim feedback (short comments indicating praise or complaints) related to specific branches and services. Leveraged machine learning models to categorise verbatims into themes and applied a hybrid approach combining linguistic patterns and ML for sentiment classification.
  • Auto News Tagger for Channel NewsAsia: Developed an enhanced auto news tagger that analyses news articles and generates relevant topic-based keywords and named entities. Used DBpedia for name resolution and topic identification, and Interactive Advertising Bureau (IAB) standards to generate relevant keyword topics.
  • Project Greenlight: Designed and built a movie analyser, "Greenlight," to evaluate movie synopses. Integrated predictive models, natural language processing (NLP), and Dynamic Time Warping (DTW) to analyse sentiment shape sequences within storylines. Classified synopses into Blake Snyder’s story archetypes and IMDb categories. Employed Shapley attribution values to assess the influence of specific story archetypes on a movie's success, providing actionable insights into narrative-driven factors.

Lead Data Scientist

Jewel Paymentech
09.2016 - 07.2017
  • Marketplace AI: Designed, tested, and refined a predictive model for Marketplace AI on 11street, one of Malaysia's largest e-commerce platforms and a joint venture between Celcom Axiata Berhad and SK Planet. The model scanned product listings, including labels, descriptions, and OCR-extracted text, to identify and classify illegal or banned substances, counterfeit products, e-cigarettes, and other prohibited items, scoring each product against designated risk categories.
  • OneSentry AI: Developed and optimised a classifier model for OneSentry, a solution to monitor merchants for high-risk behaviours during onboarding and operations. Implemented a web heuristic module to assess merchant websites and social media platforms for traits indicative of risk, such as non-compliance with regulations on pharmaceuticals, gambling, and deceptive practices. Utilised sentiment analysis to detect negative customer sentiment, a key indicator of chargeback risks, enabling proactive compliance monitoring.
  • 11street POC: Used Neo4j to develop a methodology for detecting coupon fraud from network insights.
    Developed a spam detection system for the marketplace using Natural Language Processing (NLP) to extract linguistic patterns. Applied topic modelling and predictive algorithms to identify outliers and potential fraud.
  • FraudWall AI: Led the design of FraudWall AI, transforming transactional records into graph structures. Utilised graph analytics to identify anomalous patterns, enabling detection of suspicious transactions and strengthening fraud prevention rules.

Senior Researcher

MIMOS Berhad
08.2009 - 08.2016
  • Text-to-Graph Platform Development: Designed and developed a Natural Language Processing (NLP) platform for knowledge analytics and extraction from unstructured texts in English, Malay, and Mandarin. Implemented web-service-enabled components that identify entities and relationships, transforming text into knowledge graphs stored in AllegroGraph for querying, insight generation, and dashboard visualisation. Benchmarked the platform's components and published research papers evaluating them.
  • Algorithm Design & Complexity Analysis: Analysed module complexities and designed algorithms to optimise language processing workflows across multiple languages.
    Authored requirement and design documentation, conducted functional and performance testing using tools such as JMeter and SoapUI, and guided testers in preparing comprehensive test plans.
  • Leadership & Team Management: Led a multidisciplinary team in researching, developing, and testing individual modules of the NLP platform.
    Oversaw benchmarking and evaluation processes that led to research publications and patent filings for the platform's innovative components.
  • Academic & Research Collaborations: Directed collaboration with Universiti Malaya on an E-Science-funded project to develop a Malay Text Understanding system for processing unstructured Malay texts.
    Coordinated partnerships with local and international institutions, including Universiti Malaya (UM), Multimedia University (MMU), the German National Library of Economics (ZBW), and the Food and Agriculture Organisation (FAO), fostering knowledge exchange and research advancements.

Analyst Developer

Phillip Capital
06.2008 - 07.2009
  • Business Analysis: Functioned as a business analyst to evaluate client requirements, translating business needs and process flows into detailed requirement and design documents.
    Delivered documentation crucial for developers to build key modules for the FAME wealth management platform.
  • System Development: Contributed to developing new features and enhancements on the FAME platform, including writing SQL stored procedures for transactional modules.
    Customised and improved the platform interface, resolving issues and ensuring seamless functionality.

Education

Master of Science - Information Technology

Malaysia University of Science And Technology
Selangor, Malaysia
01.2008

Bachelor of Science - Information Technology

University of Malaya
Kuala Lumpur, Malaysia
01.2004

Skills

  • Knowledge Graph & Graph Databases
  • Retrieval Augmented Generation (RAG) Frameworks
  • Search Retrieval & Vector Databases
  • Multimodal Processing
  • Agentic Frameworks
  • LangChain & LlamaIndex
  • AWS Bedrock, Azure AI & Google Vertex AI

Languages

English
Bilingual or Proficient (C2)
Malay
Bilingual or Proficient (C2)

Certification

  • Multi AI Agent Systems with CrewAI (CrewAI, issued May 2024)
  • Diagnostic Network Optimization (World Health Organisation, issued May 2023)
  • Graph Algorithms for Machine Learning (TigerGraph, issued Aug 2022)
  • Neo4j Graph Data Science (Neo4j, issued Aug 2022)

Publications

  • Xian, B., Lubani, M., Liew, K., Bouzekri, K., Mahmud, R., & Lukose, D. (2016). Benchmarking Mi-POS: Malay Part-of-Speech Tagger. International Journal of Knowledge Engineering, 2(3), 115-121. https://doi.org/10.18178/ijke.2016.2.3.064


  • Chu, B., Zahari, F., & Lukose, D. (2012). Benchmarking T-ANNE: Text annotation system. In Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies (Article 6). Association for Computing Machinery. https://doi.org/10.1145/2362456.2362464


  • Xian, B. C. M., Zahari, F., & Lukose, D. (2011). Benchmarking ARS: Anaphora resolution system. In Proceedings of the 11th International Conference on Knowledge Management and Knowledge Technologies (Article 39). Association for Computing Machinery. https://doi.org/10.1145/2024288.2024334

Additional Information

Assignee: Refinitiv LLC MAY 2020

US20200160121A1: SYSTEMS AND METHOD FOR SCORING ENTITIES AND NETWORKS IN A KNOWLEDGE GRAPH

Published: May 22, 2020

URL: https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2020100108&_fid=US295247752

Systems and methods of improved network analytics are disclosed. A system may determine feature propagation in a network of nodes of a graph database. The system may compute, at scale, datasets having complex relationships using graph analysis to determine network effects of entities in a network of entities stored in a graph database. The system may identify entities of interest, which may be associated with a quantitative feature value. The system may compute paths from an entity to the entities of interest, centrality metrics for entities in each of the paths, and path lengths to determine network effects of the entities of interest on the entity. The system may use the computed network effects, taking into account types of relationships between entities in the paths, to determine scaled quantitative feature values for the entity that is subject to the network effects of the entities of interest.

Assignee: Jewel Paymentech Pte Ltd APR 2020

US20200175518A1: APPARATUS AND METHOD FOR REAL-TIME DETECTION OF FRAUDULENT DIGITAL TRANSACTIONS

Published: Jun 4, 2020

URL: https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2018164635

An apparatus (100) for real-time detection of fraudulent digital transactions is disclosed. The apparatus comprises: a transceiver module arranged to receive information data of a digital transaction; a model generator module (102) arranged to dynamically generate a predictive model for fraud detection based collectively on historical information data relating to identified fraudulent transactions and the received information data; and a fraud detection module (104) having a plurality of anomaly detection modules (1042, 1044, 1046) arranged to respectively process the received information data differently to generate a plurality of scores, which are aggregated to provide an aggregated score to enable real-time determination of whether the digital transaction is a fraudulent digital transaction. A first anomaly detection module (1042) is configured to process the received information data using the predictive model to generate a first score. A related method is also disclosed.

Assignee: MIMOS Berhad

WO2015080561 - A METHOD AND SYSTEM FOR AUTOMATED RELATION DISCOVERY FROM TEXTS

Published: Jun 4, 2015

URL: https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2015080561&_cid=P11-M5WK64-04236-2

The present invention provides a system (100) for discovering relations between texts in sentences of a machine-readable document. The system comprises a text preprocessor (101) and a relation discovery module (102). The text preprocessor (101) processes the documents to identify and extract entities, noun phrases, and verbs therefrom. The relation discovery module (102) discovers the relations through generic and semantic relation extraction for unstructured and structured texts to resolve intra-sentential and inter-sentential contexts.
