JDBC batch insert performance
I need to insert a couple hundreds of millions of records into the mysql db. I'm batch inserting it 1 million at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?
try {
// Disable auto-commit
connection.setAutoCommit(false);
// Create a prepared statement
String sql = "INSERT INTO mytable (xxx), VALUES(?)";
PreparedStatement pstmt = connection.prepareStatement(sql);
Object[] vals=set.toArray();
for (int i=0; i<vals.length; i++) {
pstmt.setString(1, vals[i].toString());
pstmt.addBatch();
}
// Execute the batch
int [] updateCounts = pstmt.executeBatch();
System.out.append("inserted "+updateCounts.length);
I had a similar performance issue with mysql and solved it by setting the useServerPrepStmts and the rewriteBatchedStatements properties in the connection url.
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.
rewriteBatchedStatements=true
is the important parameter. useServerPrepStmts
is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.
Now I think is the time to write how rewriteBatchedStatements=true
improves the performance so dramatically. It does so by rewriting of prepared statements for INSERT into multi-value inserts when executeBatch()
(Source). That means that instead of sending the following n
INSERT statements to the mysql server each time executeBatch()
is called :
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
It would send a single INSERT statement :
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe it by toggling on the mysql logging (by SET global general_log = 1
) which would log into a file each statement sent to the mysql server.
You can insert multiple rows with one insert statement, doing a few thousands at a time can greatly speed things up, that is, instead of doing e.g. 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);
, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3);
(It might be JDBC .addBatch() does similar optimization now - though the mysql addBatch used to be entierly un-optimized and just issuing individual queries anyhow - I don't know if that's still the case with recent drivers)
If you really need speed, load your data from a comma separated file with LOAD DATA INFILE , we get around 7-8 times speedup doing that vs doing tens of millions of inserts.
If:
- It's a new table, or the amount to be inserted is greater then the already inserted data
- There are indexes on the table
- You do not need other access to the table during the insert
Then ALTER TABLE tbl_name DISABLE KEYS
can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS
to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
You may try using DDBulkLoad object.
// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName(“mytable”);
bulkLoad.load(“data.csv”);
try {
// Disable auto-commit
connection.setAutoCommit(false);
int maxInsertBatch = 10000;
// Create a prepared statement
String sql = "INSERT INTO mytable (xxx), VALUES(?)";
PreparedStatement pstmt = connection.prepareStatement(sql);
Object[] vals=set.toArray();
int count = 1;
for (int i=0; i<vals.length; i++) {
pstmt.setString(1, vals[i].toString());
pstmt.addBatch();
if(count%maxInsertBatch == 0){
pstmt.executeBatch();
}
count++;
}
// Execute the batch
pstmt.executeBatch();
System.out.append("inserted "+count);
참고URL : https://stackoverflow.com/questions/2993251/jdbc-batch-insert-performance
'Development Tip' 카테고리의 다른 글
ActionBar의 오버플로 메뉴에 아이콘을 표시하는 방법 (0) | 2020.11.25 |
---|---|
암호 강도 측정기 (0) | 2020.11.25 |
Swift iOS에서 기기 방향 설정 (0) | 2020.11.25 |
nan 값을 0으로 변환 (0) | 2020.11.25 |
주어진 값이 양수인지 음수인지 확인하는 방법은 무엇입니까? (0) | 2020.11.25 |