New ask Hacker News story: GitHub repositories with big number of commits per day

2 by valyala | 0 comments on Hacker News.
I'm developing VictoriaLogs [1], an open source, zero-config, schemaless database for logs. I was looking for a good dataset for tests and benchmarks of wide events [2] (a new name for structured logs with hundreds of log fields) and found GitHub archive [3]. It contains GitHub events with hundreds of fields, such as git push, project star, pull request, issue comment, etc. [4].

The GitHub archive events are published as hourly `json.gz` archives. For example, https://ift.tt/jta3ESV contains all the events for the [15:00, 16:00) time range on January 21, 2025. Every event is published as a JSON line containing various event fields. This format fits the VictoriaLogs data model [5] well, so GitHub events can be ingested into VictoriaLogs with the following command:

  curl -s https://ift.tt/jta3ESV \
    | curl -T - -X POST -H 'Content-Encoding: gzip' 'http://localhost:9428/insert/jsonline?_time_field=created_at&_stream_fields=type'

This command streams the `json.gz` event data directly into the VictoriaLogs data ingestion endpoint [6], without any intermediate transformations.

So I started using GitHub archive events as test data during VictoriaLogs development, and I regularly query this data for insights. Today I discovered some "interesting" repositories on GitHub that contain thousands of commits per day, generated by a single GitHub user. For example, the https://ift.tt/et90cy5 repository has more than 6 million commits, and this number increases by 35000 commits per day.
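The curl command shown earlier ingests one hourly archive, while a per-day view needs all 24 hourly files for that date. Below is a minimal sketch of how that loop might look. It is not taken from the post: the `data.gharchive.org` URL pattern with an unpadded hour is an assumption based on GH Archive's published file naming, and the names `gharchive_url`, `ingest_day`, `VLOGS_URL` and `DRY_RUN` are hypothetical helpers introduced only for illustration.

```shell
#!/bin/sh
# Sketch: ingest every hourly GH Archive file for one UTC day into VictoriaLogs.
# Assumptions (not from the post): the data.gharchive.org URL pattern with an
# unpadded hour, and the helper names gharchive_url / ingest_day.

# Ingestion endpoint, same parameters as the single-hour command in the post.
VLOGS_URL="${VLOGS_URL:-http://localhost:9428/insert/jsonline?_time_field=created_at&_stream_fields=type}"

# Print the archive URL for a date (YYYY-MM-DD) and an hour (0..23).
gharchive_url() {
  printf 'https://data.gharchive.org/%s-%s.json.gz\n' "$1" "$2"
}

# Stream all 24 hourly archives for the given date into VictoriaLogs.
# With DRY_RUN=1 only the URLs are printed and nothing is downloaded.
ingest_day() {
  hour=0
  while [ "$hour" -le 23 ]; do
    url=$(gharchive_url "$1" "$hour")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "$url"
    else
      # Same pipeline as in the post: the gzip stream is forwarded untouched.
      curl -s "$url" \
        | curl -T - -X POST -H 'Content-Encoding: gzip' "$VLOGS_URL"
    fi
    hour=$((hour + 1))
  done
}
```

Running `DRY_RUN=1 ingest_day 2025-01-21` prints the 24 URLs without touching the network, which is a cheap way to check the date before starting a real ingestion with `ingest_day 2025-01-21`.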
Below is the list of GitHub repositories that got more than 10K commits on a single day, January 21, 2025:

  pushes=28263 repo_url=https://ift.tt/et90cy5
  pushes=24714 repo_url=https://ift.tt/19NfpWb
  pushes=23598 repo_url=https://ift.tt/OQZ0J3k
  pushes=17815 repo_url=https://ift.tt/Wz1436k
  pushes=15854 repo_url=https://ift.tt/QjLg3hZ
  pushes=13000 repo_url=https://ift.tt/2qJ1Ute
  pushes=12364 repo_url=https://ift.tt/HrTq8Rf
  pushes=11670 repo_url=https://ift.tt/wLNZrRp
  pushes=11221 repo_url=https://ift.tt/RKhjV95

You can investigate the GitHub archive data yourself at the VictoriaLogs playground [7].

[1] https://ift.tt/bjYnQVz
[2] https://ift.tt/wHYNctQ
[3] https://ift.tt/AfWd4LF
[4] https://ift.tt/Rf0jUFg
[5] https://ift.tt/oU7izSt
[6] https://ift.tt/eXZorhs
[7] https://ift.tt/NgBVfU8
