Projects

This is the reference implementation of my study “Accelerating Machine Learning I/O by Overlapping Data Staging and Mini-batch Generations”.
It is a Chainer Iterator class that prefetches training data from slow storage (such as parallel file systems) into fast storage (such as SSDs) while generating mini-batches at the same time.
The aim of this study is to hide the time needed to stage the training dataset into the node-local storage of compute nodes in HPC clusters (such as ABCI, TSUBAME, and Cygnus).
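
A minimal sketch of the overlapping idea, using only the Python standard library. It is not the Chainer Iterator from this project: the class name `StagingBatchIterator`, its parameters, and the fallback policy (read from slow storage until a file has been staged) are assumptions made for illustration.

```python
import shutil
import threading
from pathlib import Path


class StagingBatchIterator:
    """Yield mini-batches while a background thread stages files.

    Hypothetical sketch; not the Chainer Iterator from the project.
    """

    def __init__(self, slow_dir, fast_dir, batch_size):
        self.slow_dir = Path(slow_dir)
        self.fast_dir = Path(fast_dir)
        self.batch_size = batch_size
        self.names = sorted(p.name for p in self.slow_dir.iterdir() if p.is_file())
        self._staged = set()
        self._lock = threading.Lock()
        # Staging starts immediately and runs concurrently with iteration.
        self._thread = threading.Thread(target=self._stage_all, daemon=True)
        self._thread.start()

    def _stage_all(self):
        # Copy every file from slow storage to fast storage in the background.
        self.fast_dir.mkdir(parents=True, exist_ok=True)
        for name in self.names:
            partial = self.fast_dir / (name + ".part")
            shutil.copyfile(self.slow_dir / name, partial)
            partial.replace(self.fast_dir / name)  # publish only complete copies
            with self._lock:
                self._staged.add(name)

    def _read(self, name):
        # Prefer the staged copy; fall back to slow storage if not staged yet.
        with self._lock:
            staged = name in self._staged
        source = self.fast_dir if staged else self.slow_dir
        return (source / name).read_bytes()

    def __iter__(self):
        for start in range(0, len(self.names), self.batch_size):
            batch_names = self.names[start:start + self.batch_size]
            yield [self._read(name) for name in batch_names]

    def wait_for_staging(self):
        self._thread.join()
```

In this sketch the first epoch may read mostly from slow storage, while later epochs read from the staged copies, so training never blocks on the full stage-in.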

This is the reference implementation of my paper, “Automatic Optimization for Accelerating Training in Deep Neural Networks” (original title: 深層ニューラルネットワークにおける訓練高速化のための自動最適化).
It is a Chainer Trainer extension class that adaptively optimizes the mini-batch size when training a model on a single GPU.
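
One simple way to picture adaptive batch-size tuning is a throughput-driven search. The sketch below is an assumption for illustration only and is not the algorithm from the paper: the function `tune_batch_size`, the doubling strategy, and the `measure_throughput` callback (samples per second for a given batch size) are all hypothetical.

```python
def tune_batch_size(measure_throughput, start=32, max_size=4096):
    """Double the batch size while the measured throughput keeps improving.

    Hypothetical sketch; `measure_throughput(size)` is assumed to return
    samples per second observed when training with that batch size.
    """
    best_size = start
    best_throughput = measure_throughput(start)
    size = start * 2
    while size <= max_size:
        current = measure_throughput(size)
        if current <= best_throughput:
            break  # no further gain, e.g. the GPU is saturated
        best_size, best_throughput = size, current
        size *= 2
    return best_size
```

In a real extension the callback would time a few training iterations; here it is left abstract so the search logic stays visible.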

This is an ONNX runtime implementation, similar to onnxruntime or menoh.
Currently, the Gemm, Conv, MaxPool, Relu, Softmax, Dropout, and Reshape operators are supported.
The entire backend is my own CPU implementation; that is, the current backend does not use optimized matrix libraries such as BLAS or Intel MKL-DNN.
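
To illustrate what a library-free backend looks like, here is a small sketch of three of the listed operators written with plain loops and dispatched by operator type. It is not code from this project: the `OPS` table, the `run` helper, and the node tuple layout are hypothetical, and `gemm` is simplified (no transposes, `c` is a 1-D bias broadcast over rows).

```python
import math


def gemm(a, b, c=None, alpha=1.0, beta=1.0):
    """Y = alpha * A @ B + beta * C using plain loops (simplified Gemm)."""
    m, k, n = len(a), len(b), len(b[0])
    y = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            bias = beta * c[j] if c is not None else 0.0
            y[i][j] = alpha * acc + bias
    return y


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def softmax(x):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in x:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


# Hypothetical dispatch table keyed by ONNX operator type.
OPS = {"Gemm": gemm, "Relu": relu, "Softmax": softmax}


def run(nodes, tensors):
    """Execute nodes in order; each node is (op_type, input_names, output_name)."""
    for op_type, input_names, output_name in nodes:
        inputs = [tensors[name] for name in input_names]
        tensors[output_name] = OPS[op_type](*inputs)
    return tensors
```

A tiny graph can then be run as `run([("Gemm", ["x", "w", "b"], "h"), ("Relu", ["h"], "r"), ("Softmax", ["r"], "y")], tensors)`, where `tensors` maps names to nested lists.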

This is an experimental project to study optimizations of GEMM and GEMV.
This project includes dgemm and dgemv implementations that are optimized by successively applying loop interchange, loop unrolling, blocking, and padding.
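
Two of these transformations, loop interchange and blocking, can be sketched as follows. The sketch is written in Python for readability and is not the project's code: the function names and the square, row-major, flat-list layout are assumptions, and unrolling and padding are left out. Python will not show the cache effects; the point is only how the loop structure changes while the result stays the same.

```python
def dgemm_ijk(a, b, n):
    """Baseline: C = A * B for n x n row-major matrices stored as flat lists."""
    c = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i * n + k] * b[k * n + j]  # strided access to b
            c[i * n + j] = acc
    return c


def dgemm_ikj(a, b, n):
    """Loop interchange: the innermost loop now walks b and c contiguously."""
    c = [0.0] * (n * n)
    for i in range(n):
        for k in range(n):
            aik = a[i * n + k]
            for j in range(n):
                c[i * n + j] += aik * b[k * n + j]
    return c


def dgemm_blocked(a, b, n, block=2):
    """Blocking: work on block x block tiles so each tile can stay in cache."""
    c = [0.0] * (n * n)
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        aik = a[i * n + k]
                        for j in range(jj, min(jj + block, n)):
                            c[i * n + j] += aik * b[k * n + j]
    return c
```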

This is an experimental project to study convolution by iterating on direct and im2col-style implementations.
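
The difference between the two styles can be sketched for the simplest case: a single-channel 2-D input, stride 1, no padding, and no kernel flip (cross-correlation, as deep learning frameworks usually define it). The function names are hypothetical and this is not the project's code.

```python
def conv2d_direct(image, kernel):
    """Direct style: slide the kernel over the image with nested loops."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (w - kw + 1) for _ in range(h - kh + 1)]
    for y in range(h - kh + 1):
        for x in range(w - kw + 1):
            for i in range(kh):
                for j in range(kw):
                    out[y][x] += image[y + i][x + j] * kernel[i][j]
    return out


def im2col(image, kh, kw):
    """Unfold every kh x kw patch into one row of a matrix."""
    h, w = len(image), len(image[0])
    return [
        [image[y + i][x + j] for i in range(kh) for j in range(kw)]
        for y in range(h - kh + 1)
        for x in range(w - kw + 1)
    ]


def conv2d_im2col(image, kernel):
    """im2col style: unfold patches, then do one matrix-vector product."""
    kh, kw = len(kernel), len(kernel[0])
    out_w = len(image[0]) - kw + 1
    flat_kernel = [v for row in kernel for v in row]
    cols = im2col(image, kh, kw)
    flat = [sum(c * k for c, k in zip(col, flat_kernel)) for col in cols]
    # Fold the flat result back into rows of the output image.
    return [flat[r:r + out_w] for r in range(0, len(flat), out_w)]
```

The im2col form spends extra memory on the unfolded matrix but turns the convolution into a matrix product, which is why it pairs naturally with a GEMM routine.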

This is a CLI tool that converts a JSON config file into GNU-style command-line options (such as --key1 value1 --key2 value2).
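
The conversion can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool itself: the function name `json_to_cli_args` is hypothetical, and only a flat JSON object with string or numeric values is handled; how the actual tool treats booleans, lists, or nested objects is not covered here.

```python
import json
import shlex


def json_to_cli_args(text):
    """Turn a flat JSON object into a list like ['--key1', 'value1', ...]."""
    config = json.loads(text)
    args = []
    for key, value in config.items():
        args.append(f"--{key}")
        args.append(str(value))
    return args


def json_to_cli_string(text):
    """Join the options into one shell-safe string."""
    return shlex.join(json_to_cli_args(text))
```

For example, `json_to_cli_string('{"key1": "value1", "key2": "value2"}')` gives `--key1 value1 --key2 value2`.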