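HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster consists mainly of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data. Clients contact the NameNode to read or modify file metadata, and perform the actual file I/O directly with the DataNodes.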

0x00 hdfs command-line syntax

All HDFS commands are invoked through the bin/hdfs script. Running the hdfs script without any arguments prints a description of every command.

Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  classpath            prints the classpath
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across
                       storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                       Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      list/get/set block storage policies
  version              print the version

Most commands print help when invoked w/o parameters.
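
The usage line above puts the generic options (--config, --loglevel) before COMMAND. A minimal sketch of that ordering follows; the helper name hdfs_with_conf and the configuration directory used in the usage comment are assumptions for illustration, not something the original text defines.

```shell
# Hypothetical helper: run any hdfs COMMAND against an explicit
# configuration directory with DEBUG logging enabled.
# The generic options must come before COMMAND and its arguments.
hdfs_with_conf() {
  conf_dir=$1   # directory holding core-site.xml / hdfs-site.xml
  shift         # the remaining arguments are COMMAND and its options
  hdfs --config "$conf_dir" --loglevel DEBUG "$@"
}

# Usage (needs a working Hadoop installation; the path is an assumption):
#   hdfs_with_conf /etc/hadoop/conf dfsadmin -report
```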

0x01 hdfs dfs

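Runs a filesystem command on the file systems supported by Hadoop. Besides HDFS itself, the Hadoop-compatible file systems currently include Amazon S3, Azure Blob Storage, and OpenStack Swift. The common operations work the same way as in hadoop fs, and using the hadoop fs command is also recommended.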

File System Shell Guide
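
As a rough illustration of the filesystem commands, the sketch below creates a directory, uploads a local file into it, and lists the result. The function name hdfs_upload_and_list and the paths in the usage comment are hypothetical; the subcommands used (-mkdir -p, -put -f, -ls) are standard File System Shell commands.

```shell
# Hypothetical helper: upload a local file into an HDFS directory and list it.
hdfs_upload_and_list() {
  src=$1        # local file to upload
  dest_dir=$2   # target directory on HDFS
  # -mkdir -p creates missing parent directories; -put -f overwrites
  # an existing destination file instead of failing.
  hdfs dfs -mkdir -p "$dest_dir" &&
    hdfs dfs -put -f "$src" "$dest_dir/" &&
    hdfs dfs -ls "$dest_dir"
}

# Usage (paths are assumptions):
#   hdfs_upload_and_list /tmp/sample.txt /user/demo
```

The same subcommands can be passed to hadoop fs in place of hdfs dfs.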

0x02 hdfs balancer

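HDFS data is not always placed evenly across the DataNodes. A common cause is the addition of new DataNodes to an existing cluster. HDFS gives administrators a tool, balancer, that analyzes block placement across the DataNodes and rebalances the data.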

hdfs balancer -policy datanode -threshold 20 -include -f /tmp/hdfs-blancer.txt
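
To make the -threshold value concrete: with the datanode policy, the balancer treats the cluster as balanced once every DataNode's utilization (used space over capacity, in percent) differs from the overall cluster utilization by no more than the threshold. The sketch below only reproduces that comparison with integer percentages; the function name within_threshold is hypothetical and this is not the balancer's own code.

```shell
# Hypothetical check: is a DataNode within the balancer threshold?
# Arguments are integer percentages: node utilization, cluster utilization,
# and the value passed to -threshold.
within_threshold() {
  node_pct=$1
  cluster_pct=$2
  threshold_pct=$3
  diff=$(( node_pct - cluster_pct ))
  # Take the absolute value of the difference.
  if [ "$diff" -lt 0 ]; then
    diff=$(( -diff ))
  fi
  [ "$diff" -le "$threshold_pct" ]
}

# Usage: exit status 0 means the node counts as balanced.
#   within_threshold 75 60 20 && echo "balanced"
```

Under these assumptions, with -threshold 20 and a cluster at 60% utilization, DataNodes between 40% and 80% count as balanced.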

0x03 hdfs dfsadmin

The dfsadmin command is used to administer an HDFS cluster; its subcommands are mostly run by administrators.

hdfs dfsadmin -report -live
hdfs dfsadmin -printTopology
hdfs dfsadmin -refreshNodes
hdfs dfsadmin -safemode get
hdfs dfsadmin -setBalancerBandwidth 6250000
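
Two of the commands above benefit from a small helper. -setBalancerBandwidth takes a value in bytes per second, so 6250000 corresponds to 50 Mbit/s (using 1 Mbit = 1,000,000 bits). The sketch below does that conversion and also checks the text printed by -safemode get. Both function names are hypothetical, and the "Safe mode is OFF" wording is an assumption about the command's output that may differ between versions or HA setups.

```shell
# Hypothetical helper: convert a link speed in Mbit/s to the
# bytes-per-second value expected by -setBalancerBandwidth.
mbit_to_bytes_per_sec() {
  echo $(( $1 * 1000000 / 8 ))
}

# Hypothetical helper: succeed only if the NameNode reports that safe mode
# is off. Assumes the output contains the text "Safe mode is OFF".
safemode_is_off() {
  hdfs dfsadmin -safemode get | grep -q "Safe mode is OFF"
}

# Usage (needs a running cluster):
#   safemode_is_off && hdfs dfsadmin -setBalancerBandwidth "$(mbit_to_bytes_per_sec 50)"
```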

References

HDFS Commands Guide
A Beginner's Handbook to Data Warehousing (数据仓库的初级手册)