Klaytn ETL and BigQuery Introduction

Klaytn ETL


Klaytn-ETL is the Python scripts for ETL (extract, transform and load) jobs for Klaytn blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, and internal transactions. The script is based on Ethereum ETL, and you can check the repository in Klaytn ETL. Stars and contributions are welcomed!

Klaytn ETL Schema (in BigQuery)


Blocks


Field Type Description
number INTEGER The block number
hash STRING Hash of the block
parent_hash STRING Hash of the parent block
logs_bloom STRING The bloom filter for the logs of the block
transaction_root STRING The root of the transaction trie of the block
state_root STRING The root of the final state trie of the block
receipts_root STRING The root of the receipts trie of the block
size STRING The size of this block in bytes
extra_data STRING The extra data field of this block
gas_used NUMERIC The total used gas by all transactions in this block
timestamp TIMESTAMP The UTC timestamp for when the block was collated
unix_timestamp FLOAT The unix timestamp for when the block was collated
transaction_count INTEGER The number of transactions in the block
block_score NUMERIC Former difficulty. Always 1 in the BFT consensus engine
total_block_score NUMERIC Integer of the total blockScore of the chain until this block
governance_data STRING RLP encoded governance configuration
vote_data STRING RLP encoded governance vote of the proposer
committee ARRAY[STRING] Array of addresses of committee members of this block. The committee is a subset of validators participated in the consensus protocol for this block
proposer STRING The address of the block proposer
reward_address STRING The address of the beneficiary to whom the block rewards were give
base_fee_per_gas NUMERIC The base fee per gas. This value is returned only when EthTxTypeCompatibleBlock is activated for that block number

Transactions


Field Type Description
hash STRING Hash of the transaction
nonce INTEGER The number of transactions made by the sender prior to this one
block_hash STRING Hash of the block
block_number INTEGER Block number corresponding
transaction_index INTEGER Integer of the transactions index position in the block
from_address STRING Address of the sender
to_address STRING Address of the receiver. null when its a contract creation transaction
value NUMERIC Value transferred in Wei
gas NUMERIC Gas provided by the sender
gas_price NUMERIC Gas price provided by the sender in Wei
input STRING The data sent along with the transaction
fee_payer STRING (optional) Address of the fee payer
fee_payer_signatures ARRAY[STRING, STRING, STRING] (optional) An array of fee payer’s signature objects. A signature object contains three fields (V, R, and S). V contains ECDSA recovery id. R contains ECDSA signature r while S contains ECDSA signature s
fee_ratio INTEGER (optional) Fee ratio of the fee payer. If it is 30, 30% of the fee will be paid by the fee payer. 70% will be paid by the sender
sender_tx_hash STRING from deserializer
signatures ARRAY[STRING, STRING, STRING] An array of signature objects. A signature object contains three fields (V, R, and S). V contains ECDSA recovery id. R contains ECDSA signature r while S contains ECDSA signature s
tx_type STRING A string representing the type of the transaction
tx_type_int INTEGER An integer representing the type of the transaction
block_timestamp TIMESTAMP The UTC timestamp for when the block was collated
block_unix_timestamp FLOAT The unix timestamp for when the block was collated
receipt_gas_used NUMERIC The amount of gas used by this specific transaction alone
receipt_contract_address STRING The contract address created, if the transaction was a contract creation, otherwise null
receipt_status INTEGER Either 1 (success) or 0 (failure) (post Byzantium)
dw_load_dt TIMESTAMP The UTC time when the data is inserted
max_priority_fee_per_gas NUMERIC A maximum amount to pay for the transaction to execute
max_fee_per_gas NUMERIC Gas tip cap for dynamic fee transaction in peb
access_list ARRAY[STRING, ARRAY[STRING]] An array of accessList

Logs


Field Type Description
block_number INTEGER Block number corresponding
block_hash STRING Hash of the block
block_timestamp TIMESTAMP The UTC timestamp for when the block was collated
block_unix_timestamp FLOAT The unix timestamp for when the block was collated
transaction_index INTEGER Integer of the transactions index position in the block
transaction_hash STRING Hash of the transactions
transaction_receipt_status INTEGER Either 1 (success) or 0 (failure) (post Byzantium)
log_index INTEGER Integer of the log index position in the block
address STRING Address from which this log originated
data STRING Contains one or more 32 Bytes non-indexed arguments of the log
topics ARRAY[STRING] Indexed log arguments (0 to 4 32-byte hex strings). (In solidity: The first topic is the hash of the signature of the event (e.g. Deposit(address,bytes32,uint256)), except you declared the event with the anonymous specifier.)

Token Transfers


Field Type Description
token_address STRING Token address
from_address STRING Address of the sender
to_address STRING Address of the receiver
value DECIMAL Amount of tokens transferred (ERC20) / id of the token transferred (ERC721). Use safe_cast for casting to NUMERIC or FLOAT64
block_hash STRING Hash of the block
block_number INTEGER Block number corresponding
block_timestamp TIMESTAMP The UTC timestamp for when the block was collated
block_unix_timestamp FLOAT The unix timestamp for when the block was collated
transaction_hash STRING Hash of the transactions
transaction_index INTEGER Integer of the transactions index position in the block
transaction_receipt_status INTEGER Either 1 (success) or 0 (failure) (post Byzantium)
log_index INTEGER Integer of the log index position in the block

Tokens

Field Type Description
address STRING The address of the token
symbol STRING The symbol of the token
name STRING The name of the token
decimals INTEGER The number of decimals the token uses. Use safe_cast for casting to NUMERIC or FLOAT64
total_supply NUMERIC The total token supply. Use safe_cast for casting to NUMERIC or FLOAT64
function_sighashes ARRAY[STRING] 4-byte function signature hashes
is_erc20 BOOLEAN Whether this contract is an ERC20 contract
is_erc721 BOOLEAN Whether this contract is an ERC721 contract
block_number INTEGER Block number corresponding
block_hash STRING Hash of the block
block_timestamp TIMESTAMP The UTC timestamp for when the block was collated
block_unix_timestamp FLOAT The unix timestamp for when the block was collated
transaction_hash STRING Hash of the transactions
transaction_index INTEGER Integer of the transactions index position in the block
transaction_receipt_status INTEGER Either 1 (success) or 0 (failure) (post Byzantium)
trace_index INTEGER Trace Index of the trace
trace_status INTEGER Either 1 (success) or 0 (failure, due to any operation that can cause the call itself or any top-level call to revert)
creator_address STRING Token creator address

Contracts

Field Type Description
address STRING Address of the contract
bytecode STRING Bytecode of the contract
function_sighashes ARRAY[STRING] 4-byte function signature hashes
is_erc20 BOOLEAN Whether this contract is an ERC20 contract
is_erc721 BOOLEAN Whether this contract is an ERC721 contract
block_number INTEGER Block number corresponding
block_hash STRING Hash of the block
block_timestamp TIMESTAMP The UTC timestamp for when the block was collated
block_unix_timestamp FLOAT The unix timestamp for when the block was collated
transaction_hash STRING Hash of the transactions
transaction_index INTEGER Integer of the transactions index position in the block
transaction_receipt_status INTEGER Either 1 (success) or 0 (failure) (post Byzantium)
trace_index INTEGER Trace Index of the trace
trace_status INTEGER Either 1 (success) or 0 (failure, due to any operation that can cause the call itself or any top-level call to revert)
creator_address STRING Token creator address

Traces


Field Type Description
block_number INTEGER Block number corresponding
block_hash STRING Hash of the block
block_timestamp TIMESTAMP The UTC timestamp for when the block was collated
block_unix_timestamp FLOAT The unix timestamp for when the block was collated
transaction_index INTEGER Integer of the transactions index position in the block
transaction_hash STRING Hash of the transactions
transaction_receipt_status INTEGER Either 1 (success) or 0 (failure) (post Byzantium)
from_address STRING Address of the sender, null when trace_type is reward
to_address STRING Address of the receiver if trace_type is call, address of new contract or null if trace_type is create, beneficiary address if trace_type is suicide, miner address if trace_type is reward
value NUMERIC Value transferred in Wei
input STRING The data sent along with the message call
output STRING The output of the message call, bytecode of contract when trace_type is create
trace_type STRING One of call, create, suicide, reward
call_type STRING One of call, callcode, delegatecall, staticcall
gas NUMERIC Gas provided with the message call
gas_used NUMERIC Gas used by the message call
subraces INTEGER Number of subtraces
trace_address ARRAY[STRING] Comma separated list of trace address in call tree
error STRING Error message
status INTEGER Either 1 (success) or 0 (failure, due to any operation that can cause the call itself or any top-level call to revert)
trace_index INTEGER Trace index of the trace

Receipts


Field Type Description
transaction_hash STRING Hash of the transactions
transaction_index INTEGER Integer of the transactions index position in the block
block_hash STRING Hash of the block
block_number INTEGER Block number corresponding
gas NUMERIC Gas provided by the sender
gas_price NUMERIC Gas price provided by the sender in Wei
gas_used NUMERIC The total used gas by all transactions in this block
effective_gas_price NUMERIC The actual value per gas deducted from the senders account
contract_address STRING The contract address created, if the transaction was a contract creation, otherwise null
logs_bloom STRING The bloom filter for the logs
nonce INTEGER The number of transactions made by the sender prior to this one
fee_payer STRING (optional) Address of the fee payer
fee_payer_signatures ARRAY[STRING, STRING, STRING] (optional) An array of fee payer’s signature objects. A signature object contains three fields (V, R, and S). V contains ECDSA recovery id. R contains ECDSA signature r while S contains ECDSA signature s
fee_ratio INTEGER (optional) Fee ratio of the fee payer. If it is 30, 30% of the fee will be paid by the fee payer. 70% will be paid by the sender
code_format STRING (optional) The code format of smart contract code
human_readable BOOLEAN (optional) true if the address is humanReadable, false if the address is not humanReadable
tx_error STRING (optional) detailed error code if status is equal to zero
key STRING (optional) Key of the newly created account
input STRING (optional) The data sent along with the transaction
from STRING Address of the sender
to STRING Address of the receiver. Null when it is a contract creation transaction
type_name STRING A string representing the type of the transaction
type_int INTEGER An integer representing the type of the transaction
sender_tx_hash STRING Hash of a transaction that is signed only by the sender
fee_payer_signatures ARRAY[STRING, STRING, STRING] An array of signature objects. A signature object contains three fields (V, R, and S). V contains ECDSA recovery id. R contains ECDSA signature r while S contains ECDSA signature s
status INTEGER Either 1 (success) or 0 (failure)
value NUMERIC (optional) Integer of the value sent with this transaction
input_json ARRAY (optional) The data sent along with the transaction
access_list ARRAY[STRING, ARRAY[STRING]] An array of accessList
chain_id INTEGER id of the chain
max_priority_fee_per_gas NUMERIC A maximum amount to pay for the transaction to execute
max_fee_per_gas NUMERIC Gas tip cap for dynamic fee transaction in peb

Klaytn ETL in Airflow


Klaytn runs enrich_block_group and enrich_trace_group CLIs in hourly batch using Airflow. The data will be exported in JSON.GZ format in GCS. From GCS, Klaytn create s external table in BigQuery.

Klaytn BigQuery Dataset


Currently 8 tables are available, containing data from 2022-09-14 to now.

  • data-test-361602.klaytn_data_test.cypress_blocks_table
  • data-test-361602.klaytn_data_test.cypress_transactions_table
  • data-test-361602.klaytn_data_test.cypress_traces_table
  • data-test-361602.klaytn_data_test.cypress_tokens_table
  • data-test-361602.klaytn_data_test.cypress_token_transfers_table
  • data-test-361602.klaytn_data_test.cypress_contracts_table
  • data-test-361602.klaytn_data_test.cypress_receipts_table
  • data-test-361602.klaytn_data_test.cypress_logs_table

You can query like following

### Daily Active Address Count
WITH TOTAL_ADDRESS AS (
SELECT 
  CAST(block_timestamp as DATE) as block_date,
  from_address as address
FROM
  data-test-361602.klaytn_data_test.cypress_transactions_table
WHERE 
  from_address IS NOT NULL
UNION ALL
select
    CAST(block_timestamp as DATE) as block_date,
    to_address as address
FROM
  data-test-361602.klaytn_data_test.cypress_transactions_table
WHERE 
  from_address IS NOT NULL)
SELECT 
  block_date, 
  COUNT(address) as active_address_count
FROM 
  TOTAL_ADDRESS
GROUP BY
  block_date
ORDER BY
  block_date

Since dataset is not optimized for now, query may take some time (~30s)

TODO

  • Support streaming
  • Make query faster with optimization
  • Add block date, block hour