Process | Japan Oncology RWD Portal

Data flow — from hospital EMR to client deliverable

How hospital registry and EMR data become aggregate reports and de-identified client datasets.

Medical Institution: Cancer registry combined with EMR records.
▶ Sumitomo + RWD Partner: Build the hospital database, query target patients, and supplement missing fields.
▶ Sumitomo DB: Aggregate reports and research-ready raw data prepared.
▶ Central Server: Reports and curated data delivered through a controlled server.
▶ Pharma Client: Client receives agreed deliverables for analysis and regulatory use.

Pseudonymized raw patient-level data requires IRB protocol review at each participating institution before release.

Detailed source-deck data flow

EMR IDs support in-hospital review before de-identification or pseudonymization.

01	Import hospital-based cancer registry data.
02	Automatically import test and medication data for registry patients.
03	Import biomarker data for those patients using AI.
04	Build an in-hospital database with EMR IDs linked.
05	Query the in-hospital database to extract target patients.
06	Verify missing data in the EMR using the EMR IDs of target patients.
07	Add EMR information to resolve missing fields.
08	Query the Sumitomo DB and send aggregate reports to the central server.
09	Send raw data after replacing EMR ID with case ID where required.
10	Download aggregate reports and raw data from the central server.
11	Deliver aggregate reports and raw data to the client.
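
Step 09 (replacing the EMR ID with a case ID) could look roughly like the sketch below. It is an illustration only, not the portal's actual implementation: the field names `emr_id` and `case_id`, the function `pseudonymize`, and the random `CASE-...` format are all assumptions. The EMR ID to case ID mapping is assumed to stay inside the hospital.

```python
import secrets


def pseudonymize(records, id_map=None):
    """Replace each record's EMR ID with a case ID.

    `records` is a list of dicts with a hypothetical "emr_id" key.
    Returns (pseudonymized_records, id_map), where id_map is the
    EMR ID -> case ID lookup (assumed to remain in the hospital).
    """
    id_map = {} if id_map is None else id_map
    released = []
    for rec in records:
        emr_id = rec["emr_id"]
        if emr_id not in id_map:
            # Random case ID that cannot be derived from the EMR ID
            # (hypothetical format).
            id_map[emr_id] = "CASE-" + secrets.token_hex(8)
        # Copy every field except the EMR ID, then attach the case ID.
        clean = {k: v for k, v in rec.items() if k != "emr_id"}
        clean["case_id"] = id_map[emr_id]
        released.append(clean)
    return released, id_map


rows = [
    {"emr_id": "H001-12345", "cancer_type": "lung", "c_stage": "IV"},
    {"emr_id": "H001-12345", "cancer_type": "lung", "c_stage": "IV"},
    {"emr_id": "H001-67890", "cancer_type": "breast", "c_stage": "II"},
]
released, mapping = pseudonymize(rows)
```

The same patient always receives the same case ID within one mapping, so multiple rows for one patient stay linkable after release.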

Scope by step

Step 1 vs Step 2 — data scope and effort

Step 1 answers feasibility quickly. Step 2 unlocks deeper clinical and molecular curation.

Step 1

Reference catalog · No JRA required

Quick feasibility read

Patient counts by tumor type and stage
Biomarker test results obtained per population
Field confirmation for tests and assays
Sample sizes by participating hospital
Coverage completeness across treatment lines

Limited fields, sufficient to confirm fit before contracting.
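
A Step 1 feasibility read of this kind amounts to simple aggregate counts. The sketch below shows one possible shape, using made-up catalog rows; the keys `hospital`, `cancer_type`, `c_stage`, and `egfr` and the helper `count_by` are hypothetical and do not reflect the catalog's real schema.

```python
from collections import Counter


def count_by(patients, *keys):
    """Count patients grouped by the given keys (returns a Counter of tuples)."""
    return Counter(tuple(p[k] for k in keys) for p in patients)


# Made-up catalog rows for illustration only.
catalog = [
    {"hospital": "A", "cancer_type": "lung", "c_stage": "IV", "egfr": "positive"},
    {"hospital": "A", "cancer_type": "lung", "c_stage": "III", "egfr": None},
    {"hospital": "B", "cancer_type": "lung", "c_stage": "IV", "egfr": "negative"},
    {"hospital": "B", "cancer_type": "breast", "c_stage": "II", "egfr": None},
]

by_type_stage = count_by(catalog, "cancer_type", "c_stage")  # counts by tumor type and stage
by_hospital = count_by(catalog, "hospital")                  # sample size per hospital
egfr_tested = sum(1 for p in catalog if p["egfr"] is not None)  # patients with a result
```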

Step 2

Joint Research Agreement · Custom collection

Specific needs delivery

Demographics — gender, region, race, DOB, smoking, comorbidity, menopausal status
Family History
Diagnosis — cancer type, dates, stages, histology, grade, metastasis
Molecular Pathology — specimen, assay, biomarker, variant, clinical significance
Surgery
Medications — drug name, dose, cycles, status
Radiation
Labs
Adverse Events (CTCAE)
Outcomes — vital status, dates, disease status
Performance Status (ECOG)
Vitals

Delivered under a JRA among the hospital, Sumitomo, the RWD partner, and the pharma client.

Source-deck data detail

Reference catalog and custom collection fields

Step 1 uses limited reference fields. Step 2 adds deeper custom collection after a JRA.

Step 1 data items

Reference Japanese dataset · Patients and hospitals

Reference database for cancer patients, including biomarkers

Data on 300,000 cancer patients since 2018 across 25 Japanese hospitals
Clinical data for solid tumors treated with chemotherapy during initial treatment
Reference dataset continues to grow
Biomarker test results such as EGFR and HER2 are available for feasibility screening
Additional data can be collected through in-hospital EMR review

Reference fields

Age, gender, date of diagnosis, cancer type
Clinical stage and TNM classification
Date of initial chemotherapy
Drug data, laboratory test results, biomarker test results
Death date and survival confirmation fields

Step 2 data items

JRA-governed collection · Specific client needs

Custom clinical data collection by research question

Demographics: gender, address, race, date of birth, smoking status, comorbidity, menopausal status
Family history: family member, cancer type, date of cancer diagnosis
Diagnosis: cancer type, dates, clinical/pathologic stage, histology, grade
Metastasis: date of metastatic diagnosis, site of metastasis, recurrence date
Molecular pathology: specimen, dates, lab, methodology, biomarker, variant, significance
Treatment: surgery, radiation, medication, dose, cycles, administrations, line of therapy, status and reason
Evidence endpoints: labs, performance status, adverse events, outcomes, imaging evaluation, vitals, last follow-up, death date

Step 1 standard reference fields

Catalog fields from cancer registry data plus EMR-extracted medications, labs, and biomarkers.

Domain / file	Representative fields / examples	Source / route
Patient foundation	Patient ID, Episode ID, patient name, gender, date of birth	Medical record number, duplicate number, identity, gender, date of birth
Cancer type	Date of diagnosis, cancer type classification, primary site, detailed subsite, laterality	Date of diagnosis; primary site localization code; primary root site; laterality
Cancer type	Differentiation, histology, pathological diagnosis name	Pathology diagnosis morphological code and histological text
Stage	UICC version, cT/cN/cM/cStage, pT/pN/pM/pStage	UICC Ver. 8 and TNM classification fields
Treatment methods	Treatment start date, treatment methods, lines of treatment	Surgery, endoscopic treatment, radiotherapy, chemotherapy, endocrine therapy; the line of treatment is marked as first-line only if chemotherapy is included
Outcomes	Outcomes, last survival confirmation date, date of death	Survival status, last confirmed survival date, death date
Direct EMR extraction	Medication, laboratory test, biomarker test result data	Daily medication and lab updates; biomarker test result data
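
As an illustration of how the outcome fields in this table might be used, the sketch below derives a follow-up time and an event flag from the date of diagnosis, the last survival confirmation date, and the date of death. The function name `follow_up` and its return shape are assumptions made for this example; the catalog itself does not prescribe any analysis.

```python
from datetime import date


def follow_up(diagnosis, last_confirmed_alive, death=None):
    """Return (days, event) for one patient.

    event is True when a death date is recorded; otherwise the patient is
    treated as censored at the last survival confirmation date.
    """
    if death is not None:
        return (death - diagnosis).days, True
    return (last_confirmed_alive - diagnosis).days, False


# Patient still alive at the last confirmation: censored observation.
censored = follow_up(date(2019, 1, 1), date(2020, 1, 1))

# Patient with a recorded death date: observed event.
observed = follow_up(date(2019, 1, 1), date(2019, 6, 1), death=date(2019, 7, 1))
```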

Step 2 standard collection files

JRA-stage files collected and curated for the agreed research question.

Domain / file	Representative fields / examples	Source / route
cow_consent.csv	Case management number, consent type, consent status, consent date, representative	Consent and withdrawal records
cow_patient.csv	Case management number, gender, date of birth, age	Patient demographics
cow_patient_background.csv	Family history, ECOG PS, ECOG PS evaluation date, smoking status, smoking years, cigarettes per day, Brinkman Index	Manual curation and EMR review
cow_comorbidity.csv / cow_medical_history.csv	Disease name, age at onset, other disease name	Comorbidity and medical history review
cow_cancer.csv	Date of diagnosis, pathology report date, cancer type classification, OncoTree code and version	Cancer diagnosis and pathology records
cow_therapeutic.csv	Treatment content ID, start/end dates, treatment method, treatment details, best overall response	Treatment history curation
cow_therapeutic_medicine.csv	Brand name, generic name, drug code, YJ code, HOT9 code, NHI drug price standard code	Prescription and injection orders
cow_recist.csv	Outcome assessment ID, overall response, assessment date, sum of diameters, test type	Imaging and response assessment
cow_reaction_data.csv	Adverse event ID, observation/onset/resolution/action dates, causality, English and Japanese AE names	Adverse event curation
cow_laboratory.csv	Test date, original test item code, test item name, value and unit, including CYFRA, CEA, CA19-9, neutrophils, lymphocytes, AST, ALT, sodium, potassium, creatinine/eGFR, CK	Laboratory test records
cow_outcome.csv	Outcome, last survival confirmation date, date of death, cause of death	Outcome and survival follow-up
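
The Step 2 files share the case management number, so they can be joined per patient. The sketch below joins two small in-memory stand-ins for cow_patient.csv and cow_patient_background.csv and recomputes the Brinkman Index (cigarettes per day multiplied by years of smoking) as a consistency check. The column names `case_id`, `smoking_years`, and `cigarettes_per_day` are assumptions; the real file headers may differ.

```python
import csv
import io

# In-memory stand-ins for the real files; headers are hypothetical.
PATIENT_CSV = """case_id,gender,age
C001,F,62
C002,M,71
"""
BACKGROUND_CSV = """case_id,smoking_years,cigarettes_per_day
C001,0,0
C002,30,20
"""


def read_by_case(text):
    """Parse CSV text into a dict keyed by case management number."""
    return {row["case_id"]: row for row in csv.DictReader(io.StringIO(text))}


def brinkman_index(cigarettes_per_day, smoking_years):
    """Brinkman Index = cigarettes per day x years of smoking."""
    return cigarettes_per_day * smoking_years


patients = read_by_case(PATIENT_CSV)
background = read_by_case(BACKGROUND_CSV)

merged = {}
for case_id, patient in patients.items():
    row = dict(patient)
    bg = background.get(case_id)
    if bg is not None:
        row["brinkman_index"] = brinkman_index(
            int(bg["cigarettes_per_day"]), int(bg["smoking_years"])
        )
    merged[case_id] = row
```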

Governance

Japan research data framework

Both datasets use research-grade governance for regulatory use and patient privacy.

Research-Grade Collection

Approved research agreements support regulatory and external-control use.

Data Access Requires a JRA

Patient-level data requires a JRA under Japanese research governance.

Feasibility Derisks Investment

Catalog queries show population availability before custom curation.

De-Identified Patient Data

Raw data is de-identified or pseudonymized, with sharing governed by JRA and IRB review.

Note: Patient-level raw data requires a Joint Research Agreement among the institution, Sumitomo, the RWD partner, and the pharma client. Catalog feasibility does not require a JRA.
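
The access rules stated on this page can be summarized as a small check: catalog feasibility needs no JRA, while patient-level raw data needs a JRA plus IRB review at each participating institution. The sketch below encodes only those rules as written here; the `DataRequest` class, its field names, and `may_release` are hypothetical and not part of any real system.

```python
from dataclasses import dataclass


@dataclass
class DataRequest:
    data_level: str                         # "catalog" or "patient_level"
    jra_signed: bool = False                # Joint Research Agreement in place
    participating: frozenset = frozenset()  # institutions contributing data
    irb_approved: frozenset = frozenset()   # institutions whose IRB approved


def may_release(req):
    """Apply the release rules described on this page."""
    if req.data_level == "catalog":
        # Catalog feasibility does not require a JRA.
        return True
    if req.data_level == "patient_level":
        # Requires a JRA and IRB review at every participating institution.
        return (
            req.jra_signed
            and bool(req.participating)
            and req.participating <= req.irb_approved
        )
    raise ValueError(f"unknown data level: {req.data_level!r}")


partial = DataRequest(
    "patient_level",
    jra_signed=True,
    participating=frozenset({"Hospital A", "Hospital B"}),
    irb_approved=frozenset({"Hospital A"}),
)
complete = DataRequest(
    "patient_level",
    jra_signed=True,
    participating=frozenset({"Hospital A", "Hospital B"}),
    irb_approved=frozenset({"Hospital A", "Hospital B"}),
)
```

Here `may_release(partial)` is false because one institution's IRB has not approved, while `may_release(complete)` is true.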

Engagement Process

A 2-step path to Japan oncology data

Step 1: Refer to the dataset catalog

Step 2: Joint Research Agreement (JRA)

Ready to start with Step 1?