ThoughtSpot and Snowflake recently announced tight integration of their offerings. In the recently concluded Beyond 2019 (ThoughtSpot annual conference) in Dallas, ThoughtSpot showed how data stored in Snowflake can be analyzed in ThoughtSpot.

Since we have a deep partnership with both Snowflake and ThoughtSpot, we are obviously pretty excited with this. We have our own point of view on how to integrate the two products to make sure we are extracting the best from both of them. As a starter, this 2 blog series explains how to analyze publicly available clinical trial data using Snowflake and ThoughtSpot. The part1 of the blog focuses on ingesting this data (in XML format) into Snowflake and preparing it for analysis in ThoughtSpot. The part2 of the blog will focus on how to connect Snowflake to ThoughtSpot and how to analyze it.

Dataset used

For this demo, we have used Clinical Trials Data which is publically available on https://clinicaltrials.gov/ct2/search/advanced

Search term used – Heart Attack (no other filters used)

Found 1497 studies

Selected option of Downloading ZIP file for ALL studies

This creates a zip file and downloads it.

clinical data diagram

Data Ingestion

The Python code below explains various steps involved in loading the data into Snowflake and then preparing it for analyzing in ThoughtSpot. In the process, I also explain various constructs using which XML can we queried in Snowflake.

Read Snowflake Config Details and open Snowflake Connection

Just pull the credentials from the config file. This file is similar to typical properties file (key=value pair).

In [1]: from configparser import ConfigParser
config = ConfigParser()
config.read('config.properties')
# get config data
account = config.get('DatabaseSection', 'account')
user = config.get('DatabaseSection', 'user')
password = config.get('DatabaseSection', 'password')
data_path = config.get('DataSection', 'path')
Connect to Snowflake
In [58]: import snowflake.connector
from pandas import DataFrame

ctx = snowflake.connector.connect(
user=user,
password=password,
account=account
)

Some utility functions

In [30]: def execSQL(sql):
    ctx.cursor().execute(sql)
    
def execSQLAndGetDF(sql, max_rows = 10000):
    try:
        cur = ctx.cursor()
        cs = cur.execute(sql)
        rows = cs.fetchmany(max_rows)
    finally:
        cs.close()
    df = DataFrame(rows)
    df.columns = [col[0] for col in cur.description]
    return df

Setup database, schema, virtual warehouse to be used
In [60]: execSQL("USE DATABASE HealthCareSolution")
execSQL("USE SCHEMA ClinicalTrials")
execSQL("USE WAREHOUSE LoadWH")
execSQL("CREATE STAGE IF NOT EXISTS ClinicalTrialXMLStage")
SQL = "CREATE OR REPLACE FILE FORMAT ClinicalTrialXMLFileFormat 
       TYPE = 'XML' 
       COMPRESSION = 'AUTO' PRESERVE_SPACE = FALSE 
       STRIP_OUTER_ELEMENT = FALSE 
       DISABLE_SNOWFLAKE_DATA = FALSE 
       DISABLE_AUTO_CONVERT = FALSE 
       IGNORE_UTF8_ERRORS = TRUE;"
execSQL(SQL)
Push data to stage and copy into Snowflake table
In [12]: #Upload data to Stage
execSQL("PUT file://" + data_path  + 
        "* @ClinicalTrialXMLStage/heart_attack")
#create table
execSQL("CREATE TABLE IF NOT EXISTS ClinicalTrialsXML (X Variant);")
#Copy data from stage to the table
execSQL("COPY INTO ClinicalTrialsXML FROM 
        @ClinicalTrialXMLStage/heart_attack/  
        FILE_FORMAT = ClinicalTrialXMLFileFormat 
        on_error = continue;")
 

Basic verification using record count

In [20]: sql = "SELECT count(*) FROM ClinicalTrialsXML;"

try:
    cs = ctx.cursor().execute(sql)
    rows = cs.fetchone()
    print("Record Count = ", rows[0])
finally:
    cs.close()
 
Record Count =  1498

Query XML Nodes
The section below explains how to use XML constructs to find nodes like condition, minimum age, maximum age and show it

Use XMLGet to reach to a XML node.
In [40]: #Get min/max age. Note that 'condition' is an array and this query only brings in one of the conditions
sql = "SELECT 
GET(XMLGET(XMLGET(X, 'id_info', 0), 'nct_id', 0), '
  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

In [42]: #--To Get ALL conditions, not only first one, we need to use lateral flatten
sql = "SELECT 
GET(XMLGET(XMLGET(X, 'id_info', 0), 'nct_id', 0), '
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as nct_id, GET(XMLGET(X, ‘condition’, 0), ‘

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as condition_value, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘minimum_age’, 0), ‘

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as min_age, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘maximum_age’, 0), ‘

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as max_age FROM ClinicalTrialsXML limit 50;” df = execSQLAndGetDF(sql) df.head(3) Out[40]:

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as nct_id, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘minimum_age’, 0), ‘

  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as nct_id, GET(XMLGET(X, ‘condition’, 0), ‘

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as condition_value, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘minimum_age’, 0), ‘

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as min_age, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘maximum_age’, 0), ‘

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as max_age FROM ClinicalTrialsXML limit 50;” df = execSQLAndGetDF(sql) df.head(3) Out[40]:

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as min_age, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘maximum_age’, 0), ‘

  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as nct_id, GET(XMLGET(X, ‘condition’, 0), ‘

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as condition_value, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘minimum_age’, 0), ‘

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as min_age, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘maximum_age’, 0), ‘

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as max_age FROM ClinicalTrialsXML limit 50;” df = execSQLAndGetDF(sql) df.head(3) Out[40]:

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as max_age, GET(condition_array.value, ‘

  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as nct_id, GET(XMLGET(X, ‘condition’, 0), ‘

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as condition_value, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘minimum_age’, 0), ‘

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as min_age, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘maximum_age’, 0), ‘

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as max_age FROM ClinicalTrialsXML limit 50;” df = execSQLAndGetDF(sql) df.head(3) Out[40]:

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as condition_value FROM ClinicalTrialsXML, lateral flatten (ClinicalTrialsXML.X:$) condition_array where condition_array.value like ‘<condition>%’ limit 500;” df = execSQLAndGetDF(sql) df.head(5) Use of powerful lateral flatten function

  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as nct_id, GET(XMLGET(X, ‘condition’, 0), ‘

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as condition_value, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘minimum_age’, 0), ‘

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as min_age, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘maximum_age’, 0), ‘

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”

 

) as max_age FROM ClinicalTrialsXML limit 50;” df = execSQLAndGetDF(sql) df.head(3) Out[40]:

  NCT_ID CONDITION_VALUE MIN_AGE MAX_AGE
0 “NCT00000542” “Diabetes Mellitus” “55 Years” “100 Years”
1 “NCT00001629” “Myocardial Infarction” “N/A” “N/A”
2 “NCT00050765” “Myocardial Infarction” “18 Years” “80 Years”
 

Use of powerful lateral flatten function

 
  NCT_ID MIN_AGE MAX_AGE CONDITION_VALUE
0 “NCT00000542” “55 Years” “100 Years” “Cardiovascular Diseases”
1 “NCT00000542” “55 Years” “100 Years” “Coronary Disease”
2 “NCT00000542” “55 Years” “100 Years” “Diabetes Mellitus”
3 “NCT00000542” “55 Years” “100 Years” “Heart Diseases”
4 “NCT00000542” “55 Years” “100 Years” “Hypercholesterolemia”