Engagement Process

A two-step path from catalog feasibility to governed custom oncology data delivery.

How it works

A two-step path to Japanese oncology data

Start with catalog-level feasibility, then move to a Joint Research Agreement once the indication and scope are clear.

01

Refer to the dataset catalog

Check patient counts, hospitals, and biomarker availability. No Joint Research Agreement is required at this stage.

02

Joint Research Agreement (JRA)

A four-party agreement among the institution, Sumitomo, the RWD partner, and the pharma client. It enables custom clinical and molecular data collection.

Data flow — from hospital EMR to client deliverable

How hospital registry and EMR data become aggregate reports and de-identified client datasets.

🏥

Medical Institution

Cancer registry combined with EMR records.

🔍

Sumitomo + RWD Partner

Build the hospital database, query target patients, and supplement missing fields.

📋

Sumitomo DB

Prepare aggregate reports and research-ready raw data.

📊

Central Server

Deliver reports and curated data through a controlled server.

Pharma Client

Client receives agreed deliverables for analysis and regulatory use.

Pseudonymized raw patient-level data requires IRB protocol review at each participating institution before release.

Detailed source-deck data flow

EMR IDs support in-hospital review before de-identification or pseudonymization.

01 Import hospital-based cancer registry data.
02 Automatically import test and medication data for registry patients.
03 Import biomarker data for those patients using AI.
04 Build an in-hospital database with EMR IDs linked.
05 Query the in-hospital database to extract target patients.
06 Verify missing data in the EMR using the EMR IDs of target patients.
07 Add EMR information to resolve missing fields.
08 Query the Sumitomo DB and send aggregate reports to the central server.
09 Send raw data after replacing EMR ID with case ID where required.
10 Download aggregate reports and raw data from the central server.
11 Deliver aggregate reports and raw data to the client.
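
The step that replaces the EMR ID with a case ID before raw data leaves the hospital can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration: the field names `emr_id` and `case_id`, the `CASE-` prefix, and the use of random tokens are hypothetical and do not describe the production pipeline.

```python
import secrets


def pseudonymize(records, id_field="emr_id"):
    """Replace each EMR ID with a random case ID.

    Returns (released_rows, linkage). The linkage table (EMR ID -> case ID)
    is assumed to stay inside the hospital; only released_rows move on.
    """
    linkage = {}
    used = set()
    released = []
    for row in records:
        emr_id = row[id_field]
        if emr_id not in linkage:
            case_id = "CASE-" + secrets.token_hex(4)
            while case_id in used:  # regenerate on the rare collision
                case_id = "CASE-" + secrets.token_hex(4)
            used.add(case_id)
            linkage[emr_id] = case_id
        # Copy every field except the EMR ID, then attach the case ID.
        out = {k: v for k, v in row.items() if k != id_field}
        out["case_id"] = linkage[emr_id]
        released.append(out)
    return released, linkage


# Hypothetical in-hospital rows (two patients, three records).
rows = [
    {"emr_id": "E-1001", "cancer_type": "lung", "stage": "IV"},
    {"emr_id": "E-1002", "cancer_type": "breast", "stage": "II"},
    {"emr_id": "E-1001", "cancer_type": "lung", "stage": "IV"},
]
released, linkage = pseudonymize(rows)
```

The same patient always maps to the same case ID, so longitudinal records stay linked after the EMR ID is removed.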

Scope by step

Step 1 vs Step 2 — data scope and effort

Step 1 answers feasibility quickly. Step 2 unlocks deeper clinical and molecular curation.

Step 1
Reference catalog · No JRA required

Quick feasibility read

  • Patient counts by tumor type and stage
  • Biomarker test results available per population
  • Confirmation of available test and assay fields
  • Sample sizes by participating hospital
  • Data completeness across treatment lines

Limited fields, sufficient to confirm fit before contracting.
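
As an illustration of what a Step 1 feasibility read computes, the sketch below counts matching patients by stage and by hospital. The rows, field names, and hospital codes are invented for the example and are not taken from the actual catalog.

```python
from collections import Counter

# Hypothetical catalog rows; field names and values are illustrative only.
CATALOG = [
    {"hospital": "H01", "cancer_type": "lung", "stage": "IV", "biomarkers": {"EGFR"}},
    {"hospital": "H01", "cancer_type": "lung", "stage": "III", "biomarkers": set()},
    {"hospital": "H02", "cancer_type": "lung", "stage": "IV", "biomarkers": {"EGFR"}},
    {"hospital": "H02", "cancer_type": "breast", "stage": "II", "biomarkers": {"HER2"}},
]


def feasibility_counts(rows, cancer_type, biomarker=None):
    """Count patients of one cancer type, optionally requiring a biomarker result."""
    matched = [
        r for r in rows
        if r["cancer_type"] == cancer_type
        and (biomarker is None or biomarker in r["biomarkers"])
    ]
    return {
        "total": len(matched),
        "by_stage": Counter(r["stage"] for r in matched),
        "by_hospital": Counter(r["hospital"] for r in matched),
    }


result = feasibility_counts(CATALOG, "lung", biomarker="EGFR")
```

Only aggregate counts come back from a query like this, which is consistent with the catalog stage not requiring a JRA.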

Step 2
Joint Research Agreement · Custom collection

Custom delivery for specific needs

  • Demographics — gender, region, race, DOB, smoking, comorbidity, menopausal status
  • Family History
  • Diagnosis — cancer type, dates, stages, histology, grade, metastasis
  • Molecular Pathology — specimen, assay, biomarker, variant, clinical significance
  • Surgery
  • Medications — drug name, dose, cycles, status
  • Radiation
  • Labs
  • Adverse Events (CTCAE)
  • Outcomes — vital status, dates, disease status
  • Performance Status (ECOG)
  • Vitals

Collected and delivered under a JRA among the hospital, Sumitomo, the RWD partner, and the pharma client.

Source-deck data detail

Reference catalog and custom collection fields

Step 1 uses limited reference fields. Step 2 adds deeper custom collection after a JRA.

Step 1 data items
Reference Japanese dataset · Patients and hospitals

Reference database for cancer patients, including biomarkers

  • Data on 300,000 cancer patients since 2018 across 25 Japanese hospitals
  • Clinical data for solid tumors treated with chemotherapy during initial treatment
  • Reference dataset continues to grow
  • Biomarker test results such as EGFR and HER2 are available for feasibility screening
  • Additional data can be collected through in-hospital EMR review

Reference fields

  • Age, gender, date of diagnosis, cancer type
  • Clinical stage and TNM classification
  • Date of initial chemotherapy
  • Drug data, laboratory test results, biomarker test results
  • Death date and survival confirmation fields

Step 2 data items
JRA-governed collection · Specific client needs

Custom clinical data collection by research question

  • Demographics: gender, address, race, date of birth, smoking status, comorbidity, menopausal status
  • Family history: family member, cancer type, date of cancer
  • Diagnosis: cancer type, dates, clinical/pathologic stage, histology, grade
  • Metastasis: date of metastatic diagnosis, site of metastasis, recurrence date
  • Molecular pathology: specimen, dates, lab, methodology, biomarker, variant, significance
  • Treatment: surgery, radiation, medication, dose, cycles, administrations, line of therapy, status and reason
  • Evidence endpoints: labs, performance status, adverse events, outcomes, imaging evaluation, vitals, last follow-up, death date

Step 1 standard reference fields

Catalog fields from cancer registry data plus EMR-extracted medications, labs, and biomarkers.

Domain / file | Representative fields / examples | Source / route
Patient foundation | Patient ID, Episode ID, patient name, gender, date of birth | Medical record number, duplicate number, identity, gender, date of birth
Cancer type | Date of diagnosis, cancer type classification, position, detailed part, laterality | Date of diagnosis; primary site localization code; primary root site; laterality
Cancer type | Differentiation, organization, pathological diagnosis name | Pathology diagnosis morphological code and histological text
Stage | UICC version, cT/cN/cM/cStage, pT/pN/pM/pStage | UICC Ver. 8 and TNM classification fields
Treatment methods | Date of start of treatment, treatment methods, lines of treatment | Surgery, endoscopic treatment, radiotherapy, chemotherapy, endocrine therapy; line is primary only if chemotherapy is included
Outcomes | Outcomes, last survival confirmation date, date of death | Survival status, last confirmed survival date, death date
Direct EMR extraction | Medication, laboratory test, biomarker test result data | Daily medication and lab updates; biomarker test result data
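
One way an analyst might use the date of diagnosis together with the outcome fields is to derive a follow-up time and an event flag for survival analysis. The sketch below is an assumption about downstream use, not part of the catalog itself, and the function and parameter names are hypothetical.

```python
from datetime import date


def follow_up(diagnosis_date, death_date=None, last_confirmed_alive=None):
    """Return (days_from_diagnosis, died) for one patient.

    A recorded death date yields an event; otherwise the patient is
    censored at the last survival confirmation date.
    """
    if death_date is not None:
        return (death_date - diagnosis_date).days, True
    if last_confirmed_alive is not None:
        return (last_confirmed_alive - diagnosis_date).days, False
    raise ValueError("need a death date or a last survival confirmation date")


died = follow_up(date(2020, 1, 1), death_date=date(2020, 1, 31))
censored = follow_up(date(2020, 1, 1), last_confirmed_alive=date(2021, 1, 1))
```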

Step 2 standard collection files

JRA-stage files collected and curated for the agreed research question.

Domain / file | Representative fields / examples | Source / route
cow_consent.csv | Case management number, consent type, consent status, consent date, representative | Consent and withdrawal records
cow_patient.csv | Case management number, gender, date of birth, age | Patient demographics
cow_patient_background.csv | Family history, ECOG PS, ECOG PS evaluation date, smoking status, smoking years, cigarettes per day, Brinkman Index | Manual curation and EMR review
cow_comorbidity.csv / cow_medical_history.csv | Disease name, age at onset, other disease name | Comorbidity and medical history review
cow_cancer.csv | Date of diagnosis, pathology report date, cancer type classification, OncoTree code and version | Cancer diagnosis and pathology records
cow_therapeutic.csv | Treatment content ID, start/end dates, treatment method, treatment details, best overall response | Treatment history curation
cow_therapeutic_medicine.csv | Brand name, generic name, drug code, YJ code, HOT9 code, NHI drug price standard code | Prescription and injection orders
cow_recist.csv | Outcome assessment ID, overall response, assessment date, sum of diameters, test type | Imaging and response assessment
cow_reaction_data.csv | Adverse event ID, observation/onset/resolution/action dates, causality, English and Japanese AE names | Adverse event curation
cow_laboratory.csv | Test date, original test item code, test item name, value and unit, including CYFRA, CEA, CA19-9, neutrophils, lymphocytes, AST, ALT, sodium, potassium, creatinine/eGFR, CK | Laboratory test records
cow_outcome.csv | Outcome, last survival confirmation date, date of death, cause of death | Outcome and survival follow-up
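
The files above could be combined on the case management number. The sketch below joins a patient file to a background file and recomputes the Brinkman Index (cigarettes per day multiplied by years of smoking) as a consistency check. The column headers, the sample rows, and the assumption that the background file carries the case management number are all hypothetical; the delivered files may use different names.

```python
import csv
import io

# Hypothetical file contents; real headers and values may differ.
PATIENT_CSV = """case_management_number,gender,date_of_birth,age
C001,F,1960-04-02,64
C002,M,1955-11-20,69
"""
BACKGROUND_CSV = """case_management_number,smoking_years,cigarettes_per_day
C001,0,0
C002,30,20
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def join_background(patients, backgrounds):
    """Attach a recomputed Brinkman Index to each patient row."""
    by_case = {b["case_management_number"]: b for b in backgrounds}
    joined = []
    for p in patients:
        b = by_case.get(p["case_management_number"], {})
        row = dict(p)
        years = b.get("smoking_years")
        per_day = b.get("cigarettes_per_day")
        # Missing or empty smoking fields leave the index undefined.
        row["brinkman_index"] = int(per_day) * int(years) if years and per_day else None
        joined.append(row)
    return joined


joined = join_background(read_rows(PATIENT_CSV), read_rows(BACKGROUND_CSV))
```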

Governance

Japan research data framework

Both the reference catalog and the custom collection follow research-grade governance that supports regulatory use and protects patient privacy.

Research-Grade Collection

Approved research agreements support regulatory and external-control use.

Patient-Level Data Access Requires a JRA

Patient-level data requires a JRA under Japanese research governance.

Feasibility Derisks Investment

Catalog queries show population availability before custom curation.

De-Identified Patient Data

Raw data is de-identified or pseudonymized, with sharing governed by JRA and IRB review.

Note: Patient-level raw data requires a Joint Research Agreement among the institution, Sumitomo, the RWD partner, and the pharma client. Catalog feasibility does not require a JRA.
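
The access rules stated on this page can be summarized as a simple gate. This is a rough sketch only: the labels `catalog_feasibility` and `patient_level` and the shape of `irb_approvals` are hypothetical, and the real approval workflow is defined by the agreements and review boards, not by code.

```python
def may_release(deliverable, jra_signed=False, irb_approvals=None):
    """Return True if a deliverable may be shared under the rules on this page.

    deliverable: "catalog_feasibility" or "patient_level" (hypothetical labels).
    irb_approvals: mapping of participating institution -> approved (bool).
    """
    if deliverable == "catalog_feasibility":
        return True  # catalog feasibility does not require a JRA
    if deliverable == "patient_level":
        approvals = irb_approvals or {}
        # Needs a signed JRA plus IRB review at every participating institution.
        return bool(jra_signed and approvals and all(approvals.values()))
    raise ValueError(f"unknown deliverable: {deliverable!r}")


ok = may_release("patient_level", jra_signed=True,
                 irb_approvals={"Hospital A": True, "Hospital B": True})
blocked = may_release("patient_level", jra_signed=True,
                      irb_approvals={"Hospital A": True, "Hospital B": False})
```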

Ready to start with Step 1?

Send the indication, biomarkers, and study question. We return patient counts, hospital coverage, and biomarker availability.

Request Feasibility Query