Implement file chunk-based reading (#45) #68

Open
tdenniston wants to merge 1 commit into wiseio:master from tdenniston:chunked-file-reading
@tdenniston

Hello,

We needed the ability to parse larger-than-memory CSV files, so this is my attempt at implementing that (issue #45). It's used something like this:

ParaText::CSV::ColBasedLoader loader;
ParaText::ParseParams params;
params.num_threads = 4;
params.chunked_file_reading = true;
params.file_chunk_size = 1024 * 1024; // Approximate number of bytes to read from the input file

loader.load(inputfile, params);
do {
  std::vector<float> col0vals;
  auto inserter = std::back_inserter(col0vals);
  loader.copy_column<decltype(inserter), size_t>(0, inserter);
} while (loader.load_next());

I'm grateful for any feedback on this, and I'd be happy to make any changes you guys may want.

This allows for reading through larger-than-memory CSV files.