New ask Hacker News story: Ask HN: Does anyone need data 'shuffled' in a large DB table?
2 by didgetmaster | 1 comments on Hacker News.
If you have relational data (stored in a table, a CSV file, or a JSON file), there might be a need to mix it all up for testing or anonymity purposes. So you might have a table with 20 columns (e.g. name, address, city, state, country, order_date, etc.) and a few million rows of customer data. Is there value in being able to mix up some or all of the data within a single column, multiple columns, or all the columns? The values would remain intact, but just be assigned to different rows in a random manner.

Doing this on a conventional row-oriented database could be very costly in time and I/O operations, but I have invented a new kind of data manager that uses key-value pairs to assign values to form relational tables. I just realized that I could implement a feature to shuffle the data very quickly (e.g. reorder all data in a million-row table in just a few seconds).

So a simple table with just 3 rows:

name|city|state
John|New York|New York
Bob|Miami|Florida
Jane|Dallas|Texas

might, after a shuffle, look like this instead:

name|city|state
Jane|New York|Florida
Bob|Dallas|New York
John|Miami|Texas

I don't want to implement this feature if no one sees a reasonable need to do something like this.
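To make the idea concrete, here is a minimal sketch of a per-column shuffle in plain Python. It is not how my data manager does it internally; it just works on an in-memory list of dicts. The function name `shuffle_columns` and its `columns` and `seed` parameters are made up for illustration.

```python
import random


def shuffle_columns(rows, columns=None, seed=None):
    """Return new rows with the values of each chosen column permuted.

    Hypothetical helper for illustration only. Each column is shuffled
    independently, so every value survives but lands on a random row.
    """
    if not rows:
        return []
    rng = random.Random(seed)
    # Default to shuffling every column found in the first row.
    columns = list(rows[0].keys()) if columns is None else list(columns)
    shuffled = [dict(row) for row in rows]  # copy so the input is untouched
    for col in columns:
        values = [row[col] for row in rows]
        rng.shuffle(values)  # in-place Fisher-Yates shuffle from the stdlib
        for row, value in zip(shuffled, values):
            row[col] = value
    return shuffled


rows = [
    {"name": "John", "city": "New York", "state": "New York"},
    {"name": "Bob", "city": "Miami", "state": "Florida"},
    {"name": "Jane", "city": "Dallas", "state": "Texas"},
]

# Shuffle every column, as in the example tables.
mixed_all = shuffle_columns(rows, seed=42)

# Shuffle only the name column, leaving city and state paired as they were.
mixed_names = shuffle_columns(rows, columns=["name"], seed=42)
```

Shuffling each column separately is what breaks the link between, say, a name and its city. If some columns need to stay together (city and state, for instance), they would have to be shuffled as a group with one shared permutation, which this sketch does not do.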