Sparse and dense linear algebra for machine learning on parallel-RDBMS using SQL
While computational modelling gets more complex and more accurate, its calculation costs have been increasing alike. However, working on big data environments usually involves several steps of massive unfiltered data transmission. In this paper, we continue our work on the PArADISE framework, which enables privacy aware distributed computation of big data scenarios, and present a study on how linear algebra operations can be calculated over parallel relational database systems using SQL. We investigate the ways to improve the computation performance of algebra operations over relational databases and show how using database techniques impacts the computation performance like the use of indexes, choice of schema, query formulation and others. We study the dense and sparse problems of linear algebra over relational databases and show that especially sparse problems can be efficiently computed using SQL. Furthermore, we present a simple but universal technique to improve intra-operator parallelism for linear algebra operations in order to support the parallel computation of big data.