ThoughtSpot and Snowflake recently announced tight integration of their offerings. In the recently concluded Beyond 2019 (ThoughtSpot annual conference) in Dallas, ThoughtSpot showed how data stored in Snowflake can be analyzed in ThoughtSpot.
Since we have a deep partnership with both Snowflake and ThoughtSpot, we are obviously pretty excited with this. We have our own point of view on how to integrate the two products to make sure we are extracting the best from both of them. As a starter, this 2 blog series explains how to analyze publicly available clinical trial data using Snowflake and ThoughtSpot. The part1 of the blog focuses on ingesting this data (in XML format) into Snowflake and preparing it for analysis in ThoughtSpot. The part2 of the blog will focus on how to connect Snowflake to ThoughtSpot and how to analyze it.
Dataset used
For this demo, we have used Clinical Trials Data which is publically available on https://clinicaltrials.gov/ct2/search/advanced
Search term used – Heart Attack (no other filters used)
Found 1497 studies
Selected option of Downloading ZIP file for ALL studies
This creates a zip file and downloads it.
Data Ingestion
The Python code below explains various steps involved in loading the data into Snowflake and then preparing it for analyzing in ThoughtSpot. In the process, I also explain various constructs using which XML can we queried in Snowflake.
Read Snowflake Config Details and open Snowflake Connection
Just pull the credentials from the config file. This file is similar to typical properties file (key=value pair).
In [1]: from configparser import ConfigParser
config = ConfigParser()
config.read('config.properties')
# get config data
account = config.get('DatabaseSection', 'account')
user = config.get('DatabaseSection', 'user')
password = config.get('DatabaseSection', 'password')
data_path = config.get('DataSection', 'path')
Connect to Snowflake
In [58]: import snowflake.connector
from pandas import DataFrame
ctx = snowflake.connector.connect(
user=user,
password=password,
account=account
)
Some utility functions
In [30]: def execSQL(sql): ctx.cursor().execute(sql) def execSQLAndGetDF(sql, max_rows = 10000): try: cur = ctx.cursor() cs = cur.execute(sql) rows = cs.fetchmany(max_rows) finally: cs.close() df = DataFrame(rows) df.columns = [col[0] for col in cur.description] return df Setup database, schema, virtual warehouse to be used
In [60]: execSQL("USE DATABASE HealthCareSolution") execSQL("USE SCHEMA ClinicalTrials") execSQL("USE WAREHOUSE LoadWH") execSQL("CREATE STAGE IF NOT EXISTS ClinicalTrialXMLStage") SQL = "CREATE OR REPLACE FILE FORMAT ClinicalTrialXMLFileFormat TYPE = 'XML' COMPRESSION = 'AUTO' PRESERVE_SPACE = FALSE STRIP_OUTER_ELEMENT = FALSE DISABLE_SNOWFLAKE_DATA = FALSE DISABLE_AUTO_CONVERT = FALSE IGNORE_UTF8_ERRORS = TRUE;" execSQL(SQL)
Push data to stage and copy into Snowflake table
In [12]: #Upload data to Stage execSQL("PUT file://" + data_path + "* @ClinicalTrialXMLStage/heart_attack") #create table execSQL("CREATE TABLE IF NOT EXISTS ClinicalTrialsXML (X Variant);") #Copy data from stage to the table execSQL("COPY INTO ClinicalTrialsXML FROM @ClinicalTrialXMLStage/heart_attack/ FILE_FORMAT = ClinicalTrialXMLFileFormat on_error = continue;")
Basic verification using record count
In [20]: sql = "SELECT count(*) FROM ClinicalTrialsXML;"
try:
cs = ctx.cursor().execute(sql)
rows = cs.fetchone()
print("Record Count = ", rows[0])
finally:
cs.close()
Use of powerful lateral flatten function
In [42]: #--To Get ALL conditions, not only first one, we need to use lateral flatten
sql = "SELECT
GET(XMLGET(XMLGET(X, 'id_info', 0), 'nct_id', 0), '
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as nct_id, GET(XMLGET(X, ‘condition’, 0), ‘
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as condition_value, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘minimum_age’, 0), ‘
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as min_age, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘maximum_age’, 0), ‘
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as max_age FROM ClinicalTrialsXML limit 50;” df = execSQLAndGetDF(sql) df.head(3) Out[40]:
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as nct_id, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘minimum_age’, 0), ‘
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as nct_id, GET(XMLGET(X, ‘condition’, 0), ‘
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as condition_value, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘minimum_age’, 0), ‘
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as min_age, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘maximum_age’, 0), ‘
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as max_age FROM ClinicalTrialsXML limit 50;” df = execSQLAndGetDF(sql) df.head(3) Out[40]:
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as min_age, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘maximum_age’, 0), ‘
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as nct_id, GET(XMLGET(X, ‘condition’, 0), ‘
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as condition_value, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘minimum_age’, 0), ‘
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as min_age, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘maximum_age’, 0), ‘
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as max_age FROM ClinicalTrialsXML limit 50;” df = execSQLAndGetDF(sql) df.head(3) Out[40]:
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as max_age, GET(condition_array.value, ‘
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as nct_id, GET(XMLGET(X, ‘condition’, 0), ‘
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as condition_value, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘minimum_age’, 0), ‘
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as min_age, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘maximum_age’, 0), ‘
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as max_age FROM ClinicalTrialsXML limit 50;” df = execSQLAndGetDF(sql) df.head(3) Out[40]:
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as condition_value FROM ClinicalTrialsXML, lateral flatten (ClinicalTrialsXML.X:“$“) condition_array where condition_array.value like ‘<condition>%’ limit 500;” df = execSQLAndGetDF(sql) df.head(5) Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as nct_id, GET(XMLGET(X, ‘condition’, 0), ‘
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as condition_value, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘minimum_age’, 0), ‘
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as min_age, GET(XMLGET(XMLGET(X, ‘eligibility’, 0), ‘maximum_age’, 0), ‘
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
) as max_age FROM ClinicalTrialsXML limit 50;” df = execSQLAndGetDF(sql) df.head(3) Out[40]:
NCT_ID | CONDITION_VALUE | MIN_AGE | MAX_AGE | |
---|---|---|---|---|
0 | “NCT00000542” | “Diabetes Mellitus” | “55 Years” | “100 Years” |
1 | “NCT00001629” | “Myocardial Infarction” | “N/A” | “N/A” |
2 | “NCT00050765” | “Myocardial Infarction” | “18 Years” | “80 Years” |
Use of powerful lateral flatten function
NCT_ID | MIN_AGE | MAX_AGE | CONDITION_VALUE | |
---|---|---|---|---|
0 | “NCT00000542” | “55 Years” | “100 Years” | “Cardiovascular Diseases” |
1 | “NCT00000542” | “55 Years” | “100 Years” | “Coronary Disease” |
2 | “NCT00000542” | “55 Years” | “100 Years” | “Diabetes Mellitus” |
3 | “NCT00000542” | “55 Years” | “100 Years” | “Heart Diseases” |
4 | “NCT00000542” | “55 Years” | “100 Years” | “Hypercholesterolemia” |
Find more content about
Big Data (18) Data analysis (4) Clinical Trial (2) Clinical Trial Data (2) Snowflake (2) ThoughtSpot (2)