Share knowledge, share happiness


Locate the Solr server's web application directory

find / -name WEB-INF

[root@bigdata-3 lib]# pwd
/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/solr/server/solr-webapp/webapp/WEB-INF/lib

The same directory is also reachable through the version-independent CDH parcel symlink:

/opt/cloudera/parcels/CDH/lib/solr/server/solr-webapp/webapp/WEB-INF/lib

Copy the IK Analyzer jar into this directory and set its permissions:

-rwxr-xr-x 1 root root 1184820 May  7 10:29 ik-analyzer-7.5.0.jar
[root@bigdata-3 lib]#

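Create a classes directory under WEB-INF. Copy IKAnalyzer.cfg.xml and stopword.dic into the conf directory of every core that should use the analyzer.

Put the five configuration files from the resources directory into the webapp/WEB-INF/classes/ directory of the Jetty or Tomcat instance that runs Solr: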

① IKAnalyzer.cfg.xml
② ext.dic
③ stopword.dic
④ ik.conf
⑤ dynamicdic.txt
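A minimal Python sketch that checks whether the five files above are in place. It assumes the classes directory sits next to WEB-INF/lib under the CDH symlink path shown earlier. `missing_ik_files` is a hypothetical helper name, not part of Solr or IK.

```python
from pathlib import Path

# The five IK Analyzer files listed above.
IK_FILES = ["IKAnalyzer.cfg.xml", "ext.dic", "stopword.dic", "ik.conf", "dynamicdic.txt"]

# Assumption: WEB-INF/classes under the CDH symlink path; adjust for your install.
DEFAULT_CLASSES_DIR = "/opt/cloudera/parcels/CDH/lib/solr/server/solr-webapp/webapp/WEB-INF/classes"


def missing_ik_files(classes_dir=DEFAULT_CLASSES_DIR):
    """Return the IK config files that are not present as regular files in classes_dir."""
    base = Path(classes_dir)
    return [name for name in IK_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_ik_files()
    print("all IK files present" if not missing else f"missing: {missing}")
```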

1. Generate a local configuration

solrctl instancedir --generate $HOME/test_collection_config

2. Upload it to ZooKeeper

solrctl instancedir --create test_collection_config $HOME/test_collection_config

3. Create the collection

solrctl collection --create test_collection -s 1 -c test_collection_config

4. Post the data

cd /opt/cloudera/parcels/CDH/share/doc/solr-doc*/example/exampledocs
java -Durl=http://bigdata-3.baofoo.cn:8983/solr/test_collection/update -jar post.jar *.xml
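To confirm that the documents were indexed, you can query the collection's standard /select handler. Below is a minimal Python sketch that uses only the standard library. The host and collection come from the post command above, and `select_url` and `count_docs` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Host, port and path taken from the post.jar command above.
SOLR_BASE = "http://bigdata-3.baofoo.cn:8983/solr"


def select_url(collection, query="*:*", rows=0, base=SOLR_BASE):
    """Build a /select URL; rows=0 asks Solr only for the hit count."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/{collection}/select?{params}"


def count_docs(collection, base=SOLR_BASE, timeout=10):
    """Return numFound for a match-all query against the collection."""
    with urlopen(select_url(collection, base=base), timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]["numFound"]
```

For example, `count_docs("test_collection")` returns the number of documents in the collection. It needs network access to the Solr node.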

5. Inspect the collection's HDFS directory

hadoop fs -ls -R /solr/test_co*/

drwxr-xr-x - solr solr 0 2019-05-08 22:07 /solr/test_collection/core_node2
drwxr-xr-x - solr solr 0 2019-05-08 22:07 /solr/test_collection/core_node2/data
drwxr-xr-x - solr solr 0 2019-05-08 22:12 /solr/test_collection/core_node2/data/index
-rwxr-xr-x 3 solr solr 82 2019-05-08 22:12 /solr/test_collection/core_node2/data/index/_0.dii

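The default character encoding configured for the Linux shell is UTF-8.

As a result, files in another encoding, such as a GBK-encoded CSV, show their Chinese text as garbled characters.

Converting the file with iconv fixes this: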

iconv -f GBK -t UTF-8 1.csv > 3.csv

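iconv usage:

iconv [OPTION...] [FILE...]

Available options:

Input/output format specification:
-f, --from-code=NAME encoding of the original text
-t, --to-code=NAME encoding for output

Information:
-l, --list list all known character sets

Output control:
-c omit invalid characters from output
-o, --output=FILE output file
-s, --silent suppress warnings
--verbose print progress information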
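For comparison, here is a minimal Python sketch of the same GBK to UTF-8 conversion as `iconv -f GBK -t UTF-8 1.csv > 3.csv`. It works on raw bytes so that line endings are kept exactly as in the source file. `reencode` and `convert_file` are hypothetical helper names, and `errors="ignore"` is only a loose analogue of `iconv -c`.

```python
from pathlib import Path


def reencode(data: bytes, from_code="gbk", to_code="utf-8", errors="strict") -> bytes:
    """Decode bytes with from_code and re-encode them with to_code (like iconv -f/-t).

    errors="ignore" drops undecodable bytes, loosely similar to iconv -c.
    """
    return data.decode(from_code, errors=errors).encode(to_code)


def convert_file(src, dst, from_code="gbk", to_code="utf-8", errors="strict"):
    """Rough equivalent of: iconv -f GBK -t UTF-8 src > dst (line endings preserved)."""
    Path(dst).write_bytes(reencode(Path(src).read_bytes(), from_code, to_code, errors))
```

For example, `convert_file("1.csv", "3.csv")` reproduces the iconv command above.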


Setting up JanusGraph Server with HBase + Solr

https://blog.csdn.net/goandozhf/article/details/80105895#2068-1524796329245

Create the janusgraph-hbase-solr-server.properties file:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.batch-loading=true
storage.hbase.table = janusgraph-test
storage.hostname=172.20.85.111
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
ids.block-size=100000000
storage.buffer-size=102400
storage.hbase.region-count = 15
index.search.backend=solr
index.search.solr.mode=http
index.search.solr.http-urls=http://172.20.85.111:8983/solr
index.search.hostname=172.20.85.111
index.search.index-name=janusgraph-test

Machine overview

  • Total memory: 256G
  • Allocatable memory: 256G * 0.75 = 192G
  • Total disk: 1.8T * 12 = 21.6T
  • Usable disk space: 21.6T * 0.85 = 18.36T

Memory planning

Disk / Java Heap Ratio

Disk / Java Heap Ratio = Disk Size / Java Heap = RegionSize / MemstoreSize * ReplicationFactor * HeapFractionForMemstore * 2
In other words, this is how many bytes of disk are best paired with each byte of Java heap on a RegionServer.

Derivation:

  • Number of regions, from the disk-capacity side: Disk Size / (RegionSize * ReplicationFactor)
  • Number of regions, from the Java-heap side: Java Heap * HeapFractionForMemstore / (MemstoreSize / 2)
  • In theory the two counts are equal: Disk Size / (RegionSize * ReplicationFactor) = Java Heap * HeapFractionForMemstore / (MemstoreSize / 2) => Disk Size / Java Heap = RegionSize / MemstoreSize * ReplicationFactor * HeapFractionForMemstore * 2

Default configuration:

  • RegionSize: hbase.hregion.max.filesize=10G
  • MemstoreSize: hbase.hregion.memstore.flush.size=128M
  • ReplicationFactor: dfs.replication=3
  • HeapFractionForMemstore: hbase.regionserver.global.memstore.size=0.4

Calculation: 10G / 128M * 3 * 0.4 * 2 = 192. On a RegionServer, each byte of Java heap is therefore best paired with 192 bytes of disk.
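You can check the formula numerically. Below is a minimal Python sketch with illustrative function names. It evaluates both region counts and the resulting ratio using the default values above.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def disk_heap_ratio(region_size, memstore_size, replication, heap_fraction):
    """Disk Size / Java Heap = RegionSize / MemstoreSize * ReplicationFactor * HeapFractionForMemstore * 2."""
    return region_size / memstore_size * replication * heap_fraction * 2


def region_count_by_disk(disk_size, region_size, replication):
    """Regions that fit on disk: Disk Size / (RegionSize * ReplicationFactor)."""
    return disk_size / (region_size * replication)


def region_count_by_heap(heap, heap_fraction, memstore_size):
    """Regions the memstore heap can serve: Java Heap * HeapFractionForMemstore / (MemstoreSize / 2)."""
    return heap * heap_fraction / (memstore_size / 2)


# Default values from the list above.
ratio = disk_heap_ratio(10 * GiB, 128 * MiB, 3, 0.4)
```

With these defaults, a RegionServer with a 32G heap would be paired with about 192 * 32G = 6T of disk. At that size both region counts come out equal.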


hbase pe

$ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10 --nomapred increment 10

hbase ltt

hbase org.apache.hadoop.hbase.util.LoadTestTool -compression NONE -write 8:8 -num_keys 1048576

hbase ltt -compression NONE -write 8:8 -num_keys 1048576

hbase canary

The Canary tool helps users "test" the health of an HBase cluster.

Test every column family of every region of every table:

hbase canary

Run the Canary test on every column family of every region of specific tables:

hbase canary test-01 test-02

#!/bin/bash
# Copy every Hive database named in db_name.txt (one name per line) to the new cluster
for DB in $(cat db_name.txt)
do
  hadoop distcp -D mapreduce.job.queuename=bf_yarn_pool.production -D ipc.client.fallback-to-simple-auth-allowed=true -i -overwrite "hdfs://192.168.81.30:8020/user/hive/warehouse/${DB}.db" "hdfs://172.20.85.39:8020/user/hive/warehouse/${DB}.db"
done
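#   -Dmapred.jobtracker.maxtasks.per.job=1800000  maximum number of map tasks per job (the data is split across many maps)
#   -Dmapred.job.max.map.running=4000            maximum number of concurrent maps
#   -Ddistcp.bandwidth=150000000                 bandwidth limit
#   -Ddfs.replication=2                          replication factor of the copy: two replicas
#   -Ddistcp.skip.dir=$skipPath                  directories to skip (not copied)
#   -Dmapred.map.max.attempts=9                  maximum attempts per task
#   -Dmapred.fairscheduler.pool=distcp           scheduler pool the job runs in
#   -pugp                                        preserve user, group and permissions
#   -i                                           ignore failed tasks
#   -update                                      DistCp accepts -skipcrccheck only together with -update
#   -skipcrccheck                                skip CRC checks (avoids failures when source and target HDFS versions differ)
#   last two arguments: source path, then target path (comments cannot follow a trailing backslash)
hadoop distcp \
  -Dmapred.jobtracker.maxtasks.per.job=1800000 \
  -Dmapred.job.max.map.running=4000 \
  -Ddistcp.bandwidth=150000000 \
  -Ddfs.replication=2 \
  -Ddistcp.skip.dir=$skipPath \
  -Dmapred.map.max.attempts=9 \
  -Dmapred.fairscheduler.pool=distcp \
  -pugp \
  -i \
  -update \
  -skipcrccheck \
  hdfs://clusterA:9000/AAA/data \
  hdfs://clusterB:9000/BBB/data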

hadoop distcp -D mapreduce.job.queuename=xy_yarn_pool.development -D ipc.client.fallback-to-simple-auth-allowed=true -i -overwrite hdfs://192.168.81.30:8020/user/xy_app_spark/tables/fi_gw_express_order_idcard1_encrypt/pk_year=2018/pk_month=2018-10 hdfs://172.20.85.39:8020/user/hive/warehouse/credit_mining.db/fi_gw_express_order_idcard1_encrypt/pk_year=2018/pk_month=2018-10

hadoop distcp -D mapreduce.job.queuename=xy_yarn_pool.development -D ipc.client.fallback-to-simple-auth-allowed=true -i -overwrite hdfs://192.168.81.30:8020/user/xy_app_spark/tables/fi_gw_express_order_idcard1_encrypt/pk_year=2018/pk_month=2018-11 hdfs://172.20.85.39:8020/user/hive/warehouse/credit_mining.db/fi_gw_express_order_idcard1_encrypt/pk_year=2018/pk_month=2018-11

hadoop distcp -D mapreduce.job.queuename=xy_yarn_pool.development -D ipc.client.fallback-to-simple-auth-allowed=true -i -overwrite hdfs://192.168.81.30:8020/user/xy_app_spark/tables/fo_payment_encrypt/pk_year=2018/pk_month=2018-10 hdfs://172.20.85.39:8020/user/hive/warehouse/credit_mining.db/fo_payment_encrypt/pk_year=2018/pk_month=2018-10

hadoop distcp -D mapreduce.job.queuename=xy_yarn_pool.development -D ipc.client.fallback-to-simple-auth-allowed=true -i -overwrite hdfs://192.168.81.30:8020/user/xy_app_spark/tables/fo_payment_encrypt/pk_year=2018/pk_month=2018-11 hdfs://172.20.85.39:8020/user/hive/warehouse/credit_mining.db/fo_payment_encrypt/pk_year=2018/pk_month=2018-11

hadoop distcp -D mapreduce.job.queuename=xy_yarn_pool.development -D ipc.client.fallback-to-simple-auth-allowed=true -i -overwrite hdfs://192.168.81.30:8020/user/hive/warehouse/xy_ods.db/t_serve_business_order_real_time_v2_encrypt hdfs://172.20.85.39:8020/user/hive/warehouse/xy_ods.db/t_serve_business_order_real_time_v2_encrypt

hadoop distcp -D mapreduce.job.queuename=xy_yarn_pool.development -D ipc.client.fallback-to-simple-auth-allowed=true -i hdfs://192.168.81.30:8020/user/hive/warehouse/xy_ods_db.db/credit_logprocessor_rocord hdfs://172.20.85.39:8020/user/hive/warehouse/xy_ods_db.db/credit_logprocessor_rocord

hadoop distcp -D mapreduce.job.queuename=xy_yarn_pool.development -D ipc.client.fallback-to-simple-auth-allowed=true -i -overwrite hdfs://192.168.81.30:8020/user/hive/warehouse/xy_ods_db.db/credit_logprocessor_rocord/pk_day=2018-11-11 hdfs://172.20.85.39:8020/user/hive/warehouse/xy_ods_db.db/credit_logprocessor_rocord/pk_day=2018-11-11

hadoop distcp -D mapreduce.job.queuename=xy_yarn_pool.development -D ipc.client.fallback-to-simple-auth-allowed=true -i -overwrite hdfs://192.168.81.30:8020/user/hive/warehouse/xy_ods.db/ods_verification_cardno_d_incr/pk_year=2017 hdfs://172.20.85.39:8020/user/hive/warehouse/xy_ods.db/ods_verification_cardno_d_incr/pk_year=2017
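The per-partition commands above differ only in table name and month, so they can be generated instead of typed by hand. Below is a minimal Python sketch that reuses the NameNode addresses, queue and path layout from those commands. `distcp_cmd` and `partition_cmds` are hypothetical helpers that only build and print the command lines; they do not run anything.

```python
import shlex

# NameNodes and YARN queue taken from the commands above.
SRC = "hdfs://192.168.81.30:8020"
DST = "hdfs://172.20.85.39:8020"
QUEUE = "xy_yarn_pool.development"


def distcp_cmd(src_path, dst_path, queue=QUEUE, overwrite=True):
    """Build the argv for one `hadoop distcp` call in the style used above."""
    cmd = ["hadoop", "distcp",
           "-D", f"mapreduce.job.queuename={queue}",
           "-D", "ipc.client.fallback-to-simple-auth-allowed=true",
           "-i"]
    if overwrite:
        cmd.append("-overwrite")
    cmd += [SRC + src_path, DST + dst_path]
    return cmd


def partition_cmds(table, months, src_root="/user/xy_app_spark/tables",
                   dst_root="/user/hive/warehouse/credit_mining.db"):
    """One command per pk_year/pk_month partition; months look like '2018-10'."""
    cmds = []
    for month in months:
        part = f"/{table}/pk_year={month[:4]}/pk_month={month}"
        cmds.append(distcp_cmd(src_root + part, dst_root + part))
    return cmds


if __name__ == "__main__":
    for table in ("fi_gw_express_order_idcard1_encrypt", "fo_payment_encrypt"):
        for cmd in partition_cmds(table, ["2018-10", "2018-11"]):
            print(shlex.join(cmd))
```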



[root@cdh85-42 ~]# free -h
total used free shared buff/cache available
Mem: 502G 34G 1.5G 4.2G 467G 461G
Swap: 4.0G 0B 4.0G
[root@cdh85-42 ~]#
[root@cdh85-42 ~]#
[root@cdh85-42 ~]# cat /proc/meminfo
MemTotal: 527318720 kB
MemFree: 1513916 kB
MemAvailable: 483579148 kB
Buffers: 28 kB
Cached: 472560904 kB
SwapCached: 0 kB
Active: 249755248 kB
Inactive: 254111684 kB
Active(anon): 25622464 kB
Inactive(anon): 10038512 kB
Active(file): 224132784 kB
Inactive(file): 244073172 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4194300 kB
SwapFree: 4194300 kB
Dirty: 2216 kB
Writeback: 64 kB
AnonPages: 31309320 kB
Mapped: 113004 kB
Shmem: 4352956 kB
Slab: 17466532 kB
SReclaimable: 15190848 kB
SUnreclaim: 2275684 kB
KernelStack: 38048 kB
PageTables: 574580 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 267853660 kB
Committed_AS: 313602836 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 1170752 kB
VmallocChunk: 34089979900 kB
HardwareCorrupted: 0 kB
AnonHugePages: 6144 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 372288 kB
DirectMap2M: 32319488 kB
DirectMap1G: 505413632 kB
[root@cdh85-42 ~]#
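The `free -h` columns above can be reproduced from /proc/meminfo. The numbers shown here fit buff/cache = Buffers + Cached + Slab (about 467G) and used = MemTotal - MemFree - buff/cache (about 34G). Some newer procps-ng releases count SReclaimable instead of Slab, so the slab field is a parameter. Below is a minimal Python sketch with hypothetical helper names `parse_meminfo` and `free_columns`.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB (or a plain count)}."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip shell prompts and blank lines
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def free_columns(info, slab_field="Slab"):
    """Approximate the columns printed by `free`, in kB.

    buff/cache = Buffers + Cached + <slab_field>; the output above matches "Slab",
    some newer procps-ng releases use "SReclaimable" instead.
    """
    buff_cache = info["Buffers"] + info["Cached"] + info[slab_field]
    used = info["MemTotal"] - info["MemFree"] - buff_cache
    return {"total": info["MemTotal"], "used": used, "free": info["MemFree"],
            "shared": info["Shmem"], "buff/cache": buff_cache,
            "available": info["MemAvailable"]}
```

On a live host, pass the text of /proc/meminfo to `parse_meminfo` and divide each column by 1024 ** 2 to get GiB.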

https://fivezh.github.io/2017/06/18/centos-7-memory-available/

https://blog.csdn.net/starshine/article/details/7434942

CentOS 6/7 Linux initialization script:

https://blog.51cto.com/12445535/2362407

Set min_free_kbytes to 50G (52428800 kB):


cat /proc/sys/vm/min_free_kbytes
vim /etc/sysctl.conf   # add: vm.min_free_kbytes = 52428800
sysctl -p
cat /proc/sys/vm/min_free_kbytes




[root@cdh85-42 ~]# vim /etc/sysctl.conf
## add
vm.min_free_kbytes = 52428800
## add


[root@cdh85-42 ~]# cat /proc/sys/vm/min_free_kbytes
90112

[root@cdh85-42 ~]# sysctl -p

net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.ip_local_port_range = 10000 65000
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_max_tw_buckets = 5000
fs.file-max = 655350
net.ipv4.route.gc_timeout = 100
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_synack_retries = 1
net.core.netdev_max_backlog = 16384
net.ipv4.tcp_max_orphans = 16384
net.ipv4.tcp_fin_timeout = 2
net.core.somaxconn = 32768
kernel.threads-max = 196605
kernel.pid_max = 196605
vm.max_map_count = 393210
vm.swappiness = 0
vm.min_free_kbytes = 52428800


[root@cdh85-42 ~]#
[root@cdh85-42 ~]#
[root@cdh85-42 ~]# free -m
total used free shared buff/cache available
Mem: 514959 34901 82648 4258 397410 324618
Swap: 4095 0 4095
[root@cdh85-42 ~]# cat /proc/sys/vm/min_free_kbytes
52428800
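vm.min_free_kbytes is expressed in kB, so the 50G target has to be converted first. Below is a small Python sketch of that arithmetic with illustrative helper names. It also shows what share of the MemTotal reported earlier (527318720 kB) the reserve takes up.

```python
KB_PER_GIB = 1024 * 1024


def gib_to_kb(gib):
    """vm.min_free_kbytes is in kB, so 50G means 50 * 1024 * 1024 kB."""
    return gib * KB_PER_GIB


def reserve_fraction(min_free_kbytes, mem_total_kb):
    """Share of physical memory set aside as the kernel's minimum free reserve."""
    return min_free_kbytes / mem_total_kb


value = gib_to_kb(50)                       # 52428800
share = reserve_fraction(value, 527318720)  # MemTotal from /proc/meminfo above
print(f"vm.min_free_kbytes = {value}  ({share:.1%} of MemTotal)")
```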

hbase-conf:

  • RS heap size: 32G
  • hbase.bucketcache.size=64 * 1024: off-heap cache size, in MB
  • dfs.replication=3: HDFS replication factor
  • hbase.hregion.max.filesize=20G: region size
  • hbase.hregion.memstore.flush.size=256M: memstore flush size
  • hbase.regionserver.global.memstore.upperLimit=0.55: maximum fraction of the RS heap used by all memstores

#- hbase.regionserver.global.memstore.lowerLimit=0.5: minimum fraction of the RS heap used by all memstores (default 0.95)

  • hbase.bucketcache.ioengine=offheap: use the off-heap cache

#- hbase.bucketcache.percentage.in.combinedcache=0.9: fraction of the read cache held off-heap; the rest is the on-heap metadata cache

  • hfile.block.cache.size=0.2: checked value; this plus upperLimit must stay below 0.8
  • hbase.master.handler.count=256: maximum number of Master threads handling client requests
  • hbase.regionserver.handler.count=256: maximum number of RS threads handling client requests
  • hbase.hstore.blockingStoreFiles=100: writes are blocked once a store has this many storefiles
  • hbase.hregion.memstore.block.multiplier=3: when a memstore reaches this multiple of the flush size, writes are blocked and a flush is forced
  • hbase.client.retries.number=3
  • hbase.rpc.timeout=5000 (ms)
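Below is a Python sketch of the heap rule in the list above (memstore upperLimit plus hfile.block.cache.size must stay below 0.8) and the resulting on-heap sizes for the 32G RS heap. It follows the rule as written here, not HBase's own startup check. `memory_split` is a hypothetical helper.

```python
def memory_split(heap_gb, memstore_upper=0.55, block_cache=0.2, limit=0.8):
    """Validate memstore + on-heap block cache fractions and size them for a given heap."""
    total = memstore_upper + block_cache
    if not total < limit:
        raise ValueError(
            f"memstore ({memstore_upper}) + block cache ({block_cache}) = {total:.2f}, "
            f"must stay below {limit}")
    return {"memstore_gb": heap_gb * memstore_upper,
            "block_cache_gb": heap_gb * block_cache}


split = memory_split(32)  # RS heap of 32G with the fractions listed above
```

With these settings, about 17.6G of the heap is reserved for memstores and 6.4G for the on-heap block cache. The 64G bucket cache lives off-heap on top of that.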

hbase-jvm:

HBASE_OFFHEAPSIZE=??G
HBASE_OPTS="-XX:MaxDirectMemorySize=??G -Xmx??G -Xms??G -Xmn1g -Xss256k -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m -XX:+UseParNewGC -XX:MaxTenuringThreshold=15 -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+CMSClassUnloadingEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSFullGCsBeforeCompaction=0 -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintTenuringDistribution -XX:SurvivorRatio=2 -XX:+UseConcMarkSweepGC -XX:-DisableExplicitGC $HBASE_OPTS"

-XX:+UseG1GC
-XX:InitiatingHeapOccupancyPercent=65
-XX:-ResizePLAB
-XX:MaxGCPauseMillis=90
-XX:+UnlockDiagnosticVMOptions
-XX:+G1SummarizeConcMark
-XX:+ParallelRefProcEnabled
-XX:G1HeapRegionSize=32m
-XX:G1HeapWastePercent=20
-XX:ConcGCThreads=4
-XX:ParallelGCThreads=16
-XX:MaxTenuringThreshold=1
-XX:G1MixedGCCountTarget=64
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=2
-XX:G1OldCSetRegionThresholdPercent=5

HDFS:


anaconda3-5.2.0

Official site:
https://www.anaconda.com/

Download locations for previous releases:

https://repo.continuum.io/archive/
https://repo.anaconda.com/archive/
https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=N&O=D

Installing on Windows 10:

https://blog.51cto.com/acevi/2103437

Installing Anaconda3 5.2.0 and TensorFlow 1.11.0 on Linux 7:

https://blog.51cto.com/moerjinrong/2155178

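Install:
chmod +x Anaconda3-5.2.0-Linux-x86_64.sh
./Anaconda3-5.2.0-Linux-x86_64.sh
During installation you press Enter repeatedly to read and accept the license. The install path defaults to the user's home directory, but you can choose another. At the end you confirm adding the install path to the user's .bashrc.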
In order to continue the installation process, please review the license
agreement.
Please, press ENTER to continue
>>> # To continue the installation, review the license agreement: press ENTER to continue

Then press Space to page through the license agreement.

Do you accept the license terms? [yes|no]
[no] >>> yes # accept the license? Answer yes

Anaconda3 will now be installed into this location:
/root/anaconda3

- Press ENTER to confirm the location
- Press CTRL-C to abort the installation
- Or specify a different location below

[/root/anaconda3] >>> # install into anaconda3 under the current home directory; press Enter to accept the default

Do you wish the installer to prepend the Anaconda3 install location
to PATH in your /root/.bashrc ? [yes|no]
[no] >>> yes # add the install location to PATH in /root/.bashrc? Answer yes
Reload the environment variables by running:

source ~/.bashrc
python -V
pip list
conda list

silent install
