    SELECT type, count(*) AS count
    FROM 'gharchive_gz/*.json.gz'
    GROUP BY type
    ORDER BY count DESC;

This query takes around 7.4s, not much more than the count(*) query. In other words, once the data has been decompressed and parsed, the analysis itself is very fast.

Unsurprisingly, the most common event type is PushEvent, which accounts for more than half of all events and represents people pushing their committed code to GitHub. The least common event type is GollumEvent, the creation or update of a wiki page, which accounts for less than 1% of all events.

If we want to analyze the same data multiple times, decompressing and parsing it every time is redundant. Instead, we can load it into a DuckDB table once and query the table from then on. The resulting table's "actor", "repo", and "org" columns have the following types:

    actor  STRUCT(id UBIGINT, login VARCHAR, display_login VARCHAR, gravatar_id VARCHAR, url VARCHAR, avatar_url VARCHAR)
    repo   STRUCT(id UBIGINT, name VARCHAR, url VARCHAR)
    org    STRUCT(id UBIGINT, login VARCHAR, gravatar_id VARCHAR, url VARCHAR, avatar_url VARCHAR)

As we can see, these fields, which are JSON objects in the source data, have been converted to DuckDB structs.
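To illustrate how a nested JSON object can map onto a struct type like the ones above, here is a toy type-inference function in plain Python. It is not DuckDB's algorithm: it inspects a single object rather than merging types across many records, it handles only booleans, numbers, strings, and nested objects, and the rule that non-negative integers become UBIGINT is an assumption chosen to match the types shown for these fields. The function name `infer_type` is hypothetical.

```python
import json


def infer_type(value):
    """Return a DuckDB-style type name for one parsed JSON value (toy rules only)."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        # Assumption: non-negative integers map to UBIGINT, negative ones to BIGINT.
        return "UBIGINT" if value >= 0 else "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "VARCHAR"
    if isinstance(value, dict):
        # A JSON object becomes a STRUCT whose fields keep the object's key order.
        fields = ", ".join(f"{key} {infer_type(val)}" for key, val in value.items())
        return f"STRUCT({fields})"
    # Arrays and nulls are deliberately out of scope for this sketch.
    raise TypeError(f"unsupported JSON value: {value!r}")


# A made-up "repo" object shaped like the ones in the event data.
repo = json.loads('{"id": 1, "name": "octocat/Hello-World", "url": "https://example.com/repo"}')
print(infer_type(repo))  # STRUCT(id UBIGINT, name VARCHAR, url VARCHAR)
```

Because each object's keys become named, typed fields, every row shares one fixed schema, which is what lets a database store these fields as structs instead of re-parsing JSON text at query time.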
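The timing argument in this section (decompressing and parsing dominate, while the `GROUP BY type` itself is cheap) is why loading the data into a table pays off: the parsing cost is paid once and then reused. The following plain-Python sketch only mirrors that idea; it is not how DuckDB works internally. It assumes the files in `gharchive_gz` contain newline-delimited JSON with one event object per line, and the helper names `load_events` and `count_by_type` are hypothetical.

```python
import glob
import gzip
import json
import os
from collections import Counter


def load_events(directory):
    """Decompress and parse every *.json.gz file in `directory` exactly once.

    Assumes newline-delimited JSON: one event object per line.
    """
    events = []
    for path in sorted(glob.glob(os.path.join(directory, "*.json.gz"))):
        # Text mode ("rt") decompresses and decodes in a single step.
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                if line.strip():  # skip blank lines
                    events.append(json.loads(line))
    return events


def count_by_type(events):
    """Plain-Python analogue of the GROUP BY type ... ORDER BY count DESC query."""
    # most_common() returns (type, count) pairs, highest count first.
    return Counter(event["type"] for event in events).most_common()


if __name__ == "__main__":
    events = load_events("gharchive_gz")  # pay the decompress/parse cost once
    print(count_by_type(events))  # later analyses reuse `events` without re-reading
```

Every further question (for example, filtering on `event["repo"]["name"]`) can run over the in-memory `events` list without touching the compressed files again, which is the same trade-off a DuckDB table makes, minus its compression and columnar storage.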