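Importing a large text file into the database
For 2.4 million rows, the text importer in PL/SQL Developer took an hour and a half, and some of the data went missing.
There are two ways to import the data into the database instead.
I. Using SQL*Loader
Create the table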
create table ext_gv_tmp_amazon_sku
(
skucode varchar2(255 char),
eancode varchar2(255 char)
);
Write the control file
more input.ctl
load data
infile input.txt
badfile t.bad
discardfile t.dsc
append into table ext_gv_tmp_amazon_sku -- target table
fields terminated by "|" -- field delimiter
trailing nullcols -- missing trailing fields are loaded as NULL
(skucode,eancode) -- columns to load
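Before running sqlldr it can help to look for malformed records. A minimal shell sketch, assuming every record in input.txt should carry exactly the two fields listed in the control file; the function name check_fields is hypothetical. A flagged record is not necessarily rejected (with TRAILING NULLCOLS a record holding only skucode loads with eancode NULL), but these records are the first place to look when rows go missing.

```shell
# Hypothetical helper: print every record whose number of "|"-separated
# fields differs from the expected count (default 2: skucode|eancode).
# Returns 1 if at least one such record was found, 0 otherwise.
check_fields() {
  local file="$1" expected="${2:-2}"
  awk -F'|' -v n="$expected" '
    NF != n { printf "line %d: %d fields: %s\n", NR, NF, $0; bad++ }
    END { exit (bad > 0) }
  ' "$file"
}
```

Usage: `check_fields input.txt` prints offending lines such as `line 2: 1 fields: C3`.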
Run the import
First set the client character set; otherwise Chinese text will be garbled:
export NLS_LANG=AMERICAN_AMERICA.UTF8
Also check that the operating system locale is set correctly:
[oracle@rac-test1 pandump]$ cat /etc/sysconfig/i18n
LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"
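NLS_LANG only tells the client how to interpret the file; if the file itself is not UTF-8, the text is still garbled. A minimal sketch for checking the file's encoding before loading, assuming the glibc iconv utility is installed; is_valid_utf8 is a hypothetical helper name.

```shell
# Hypothetical helper: returns 0 if the file decodes cleanly as UTF-8,
# non-zero otherwise (iconv stops on the first invalid byte sequence).
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

Usage: `is_valid_utf8 input.txt || echo "input.txt is not valid UTF-8"`.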
Finally, load the data:
[oracle@rac-test1 pandump]$ sqlldr panhf/oracle control=input.ctl
SQL> select count(*) from ext_gv_tmp_amazon_sku;
COUNT(*)
----------
2409530
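To tell whether any rows went missing, the count above can be compared with the input file. A rough sketch, assuming one record per line (so wc -l counts records, and a last line without a trailing newline is not counted) and the bad/discard file names from input.ctl; count_records and expected_loaded are hypothetical helpers. The sqlldr log (input.log by default) also reports how many rows were loaded, rejected and discarded.

```shell
# Hypothetical helper: number of lines in a file, or 0 if the file does
# not exist (the bad/discard files may be absent when nothing was
# rejected or discarded).
count_records() {
  if [ -f "$1" ]; then
    wc -l < "$1" | tr -d ' '
  else
    echo 0
  fi
}

# Hypothetical helper: rows expected in the table =
# input records - rejected records - discarded records.
expected_loaded() {
  local total bad dsc
  total=$(count_records "$1")
  bad=$(count_records "$2")
  dsc=$(count_records "$3")
  echo $(( total - bad - dsc ))
}
```

Usage: `expected_loaded input.txt t.bad t.dsc` should match the `count(*)` result.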
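II. Using an external table
Import the data through an external table.
1. Create an Oracle directory object and grant access on it to the relevant account: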
CREATE OR REPLACE DIRECTORY pandump
AS '/oradata/pandump';
grant read,write on DIRECTORY pandump to panhf;
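CREATE DIRECTORY does not verify that the operating system path exists, so a wrong path only surfaces when the external table is queried. A minimal sketch for checking the path from the database server's shell, assuming it is run as the OS user the database runs as (usually oracle); check_ext_dir is a hypothetical name. Write access matters because the log, bad and discard files are created in the same directory.

```shell
# Hypothetical helper: check that the OS directory behind the directory
# object exists, that the data file is readable, and that the directory
# is writable for the log/bad/discard files. Prints "ok" on success.
check_ext_dir() {
  local dir="$1" file="$2"
  [ -d "$dir" ] || { echo "missing directory: $dir"; return 1; }
  [ -r "$dir/$file" ] || { echo "cannot read: $dir/$file"; return 1; }
  [ -w "$dir" ] || { echo "not writable: $dir"; return 1; }
  echo ok
}
```

Usage: `check_ext_dir /oradata/pandump ext_ids_1.txt`.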
2. Create the external table. Note the characterset setting: if it is omitted, Chinese text cannot be read and rows are lost.
create table ext_gv_tmp_amazon_sku
(
skucode varchar2(255 char),
eancode varchar2(255 char)
)
organization external (
type oracle_loader
default directory pandump
access parameters
(
records delimited by newline
logfile pandump:'ext_gv_tmp_amazon_sku.log'
badfile pandump:'ext_gv_tmp_amazon_sku.bad'
discardfile pandump:'ext_gv_tmp_amazon_sku.disc'
characterset 'AL32UTF8'
fields terminated by "|" lrtrim
missing field values are null
(
skucode,
eancode
)
)
location ('ext_ids_1.txt')
)
reject limit unlimited
/
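To preview how a line will be split before querying the external table, the field rules can be approximated with awk. This is a rough illustration, not Oracle's parser: it splits on "|", strips leading and trailing spaces as LRTRIM does, and prints NULL for a missing trailing field (MISSING FIELD VALUES ARE NULL) or an empty one (Oracle treats an empty string as NULL). preview_fields is a hypothetical name.

```shell
# Hypothetical helper: show how each record would map onto
# (skucode, eancode) under the access parameters above, approximately.
preview_fields() {
  awk -F'|' '
    # Strip leading and trailing spaces (LRTRIM-like).
    function trim(s) { sub(/^ +/, "", s); sub(/ +$/, "", s); return s }
    # An empty value is displayed as NULL.
    function show(s) { return (s == "") ? "NULL" : s }
    {
      sku = (NF >= 1) ? trim($1) : ""
      ean = (NF >= 2) ? trim($2) : ""
      printf "skucode=[%s] eancode=[%s]\n", show(sku), show(ean)
    }
  ' "$1"
}
```

Usage: `head -5 /oradata/pandump/ext_ids_1.txt > /tmp/sample.txt && preview_fields /tmp/sample.txt`.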
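3. Copy the data into a regular table with CTAS (CREATE TABLE ... AS SELECT). The target table must not have the same name as the external table in the same schema.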
select count(*) from ext_gv_tmp_amazon_sku;
COUNT(*)
----------
2409530
create table panhf.ext_gv_tmp_amazon_sku as select * from ext_gv_tmp_amazon_sku;
Table created.
III. Exporting the data to a text file
cat exp.sql
set echo off
set feedback off
set newpage none
set verify off
set pagesize 0
set term off
set trims on
set linesize 600
set heading off
set timing off
set numwidth 40
spool /home/oracle/empInfor.txt
select skucode||'|'||eancode from panhf.ext_gv_tmp_amazon_sku;
spool off
Run the script in SQL*Plus:
@exp.sql
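A quick sanity check of the spooled file, assuming it should contain one "skucode|eancode" line per table row (2409530 here) and no trailing blanks, which is what set trims on (TRIMSPOOL) is for; check_export is a hypothetical name.

```shell
# Hypothetical helper: verify row count, absence of trailing blanks and
# exactly one "|" separator per line. Prints "ok" on success.
check_export() {
  local file="$1" expected_rows="$2" rows
  rows=$(wc -l < "$file" | tr -d ' ')
  if [ "$rows" -ne "$expected_rows" ]; then
    echo "row count $rows, expected $expected_rows"
    return 1
  fi
  if grep -q ' $' "$file"; then
    echo "trailing blanks found"
    return 1
  fi
  if ! awk -F'|' 'NF != 2 { exit 1 }' "$file"; then
    echo "record without exactly one | separator"
    return 1
  fi
  echo ok
}
```

Usage: `check_export /home/oracle/empInfor.txt 2409530`.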