This is a Chainer Iterator class that prefetches training data from slow storage (such as parallel file systems) into fast storage (such as SSDs) while generating mini-batches at the same time.
The aim of this study is to hide the time needed to stage training datasets into node-local storage on the compute nodes of HPC clusters (such as ABCI, TSUBAME, and Cygnus).
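
The overlap of staging and mini-batch generation can be sketched as follows. This is a minimal illustration in plain Python, not the actual class: the name `PrefetchBatchIterator` and its parameters (`src_paths`, `cache_dir`, `batch_size`) are hypothetical, and it does not use the Chainer Iterator interface. A background thread copies files into the fast directory, and the consumer receives a batch as soon as enough files have been staged.

```python
import queue
import shutil
import threading
from pathlib import Path


class PrefetchBatchIterator:
    """Hypothetical sketch: stage files in the background and batch them."""

    def __init__(self, src_paths, cache_dir, batch_size):
        self.src_paths = [Path(p) for p in src_paths]
        self.cache_dir = Path(cache_dir)
        self.batch_size = batch_size
        self._staged = queue.Queue()
        # Copying starts immediately and runs concurrently with iteration.
        self._worker = threading.Thread(target=self._stage_all, daemon=True)
        self._worker.start()

    def _stage_all(self):
        try:
            self.cache_dir.mkdir(parents=True, exist_ok=True)
            for src in self.src_paths:
                dst = self.cache_dir / src.name
                shutil.copyfile(src, dst)  # slow storage -> fast storage
                self._staged.put(dst)
        finally:
            # Sentinel so the consumer stops even if a copy fails.
            self._staged.put(None)

    def __iter__(self):
        # Single pass: yields lists of staged paths in staging order.
        batch = []
        while True:
            item = self._staged.get()  # blocks until a file is staged
            if item is None:
                break
            batch.append(item)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch:
            yield batch
```

In this sketch the first batch becomes available after only `batch_size` files have been copied, so the remaining staging time is hidden behind whatever work the consumer does with each batch. It covers a single pass over the data; later epochs could read directly from `cache_dir`.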
This is an ONNX runtime implementation, similar to onnxruntime or menoh.
For now, the Gemm, Conv, MaxPool, Relu, Softmax, Dropout, and Reshape operators are supported.
All backend code is my own CPU implementation; the current backend does not use optimized libraries such as BLAS or Intel MKL-DNN.
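
To illustrate what a backend without optimized libraries might look like, here is a minimal sketch of three of the supported operators written as plain loops over nested Python lists. It is not the project's code: the function names and signatures are assumptions, Gemm is simplified to Y = alpha * A * B + beta * C with no transpose flags and a 1-D bias C of length N, and Softmax is applied along the last axis of a 2-D input.

```python
import math


def gemm(a, b, c, alpha=1.0, beta=1.0):
    """Naive Gemm: alpha * (a @ b) + beta * c, with c a length-N bias."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):  # triple loop, no BLAS call
                acc += a[i][p] * b[p][j]
            out[i][j] = alpha * acc + beta * c[j]
    return out


def relu(x):
    """Element-wise max(0, v) over a 2-D list."""
    return [[max(0.0, v) for v in row] for row in x]


def softmax(x):
    """Row-wise softmax; subtracts the row maximum for numerical stability."""
    out = []
    for row in x:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out
```

A small fully connected layer can then be evaluated as `softmax(relu(gemm(x, w, bias)))`. The triple loop in `gemm` is the part an optimized library would normally replace.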