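I am running CDH 5.6 (Hadoop 2.6, HBase 1.0.0) on a single machine. Only Hadoop and HBase are running. Hadoop is configured with replication factor 1, and HBase runs on top of HDFS, not in pseudo-distributed mode.
For the past three days I ran a simple program that inserts rows into HBase using 10 parallel threads.
Checking on it now, I found that HDFS had become corrupt and that all but one of the insert threads had failed.
Running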
hdfs fsck / | grep CORRUPT
I see that there are some corrupt blocks.
hbase hbck
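says that everything is fine.
After a restart, hdfs fsck suddenly reports HEALTHY again.
Starting the inserts again produces checksum errors in the region server log (shown below).
I finally ran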
hdfs fsck / -delete
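Only after restarting everything did the inserts work again.
Some details are below. The questions are:
How is it possible for HDFS to become corrupt on a single node that does nothing but insert data into HBase?
How can I prevent this from happening in the future?
If hbase hbck says everything is fine, does that mean no data was lost?
Details:
I checked the logs of the region server, datanode, namenode, HMaster, and ZooKeeper. There was no out-of-memory condition or anything similar. All processes stayed up and responsive until the end. There was no shortage of disk space.
The application log shows the following failure at around 05/16 08:31:32: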
java.lang.RuntimeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
java.net.SocketTimeoutException: callTimeout=60000, callDuration=64720: row 'a899ca26-45a4-4af6-af34-72c035b4b7da/f6f88f36-9263-4dbb-9588-aaf126be9117/9223370573623181733' on table 'tt_items' at region=tt_items,a899ca26-45a4-4af6-af34-72c035b4b7da/b50993c1-7ff4-4169-b58f-f53878697709/9223370573954024736,1462900815686.08255086d13380bd559a87dd93cc15ba., hostname=hb-desktop,16201,1463231294049, seqNum=51380656
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
java.net.SocketTimeoutException: callTimeout=60000, callDuration=64720: row 'a899ca26-45a4-4af6-af34-72c035b4b7da/f6f88f36-9263-4dbb-9588-aaf126be9117/9223370573623181733' on table 'tt_items' at region=tt_items,a899ca26-45a4-4af6-af34-72c035b4b7da/b50993c1-7ff4-4169-b58f-f53878697709/9223370573954024736,1462900815686.08255086d13380bd559a87dd93cc15ba., hostname=hb-desktop,16201,1463231294049, seqNum=51380656
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=64720: row 'a899ca26-45a4-4af6-af34-72c035b4b7da/f6f88f36-9263-4dbb-9588-aaf126be9117/9223370573623181733' on table 'tt_items' at region=tt_items,a899ca26-45a4-4af6-af34-72c035b4b7da/b50993c1-7ff4-4169-b58f-f53878697709/9223370573954024736,1462900815686.08255086d13380bd559a87dd93cc15ba., hostname=hb-desktop,16201,1463231294049, seqNum=51380656
Caused by: java.io.IOException: java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader reader=hdfs://localhost:9000/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/9bad8381ba89430daf008c28befedec3, compression=snappy, cacheConf=blockCache=LruBlockCache{blockCount=7248, currentSize=543459280, freeSize=15851184, maxSize=559310464, heapSize=543459280, minSize=531344928, minFactor=0.95, multiSize=265672464, multiFactor=0.5, singleSize=132836232, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, firstKey=a899ca26-45a4-4af6-af34-72c035b4b7da/b5099cdb-2148-4f95-866a-f6eba2813bab/9223370573623181721/d:co/1463268017037/Put, lastKey=a899ca26-45a4-4af6-af34-72c035b4b7da/fffffc52-8969-472a-a17b-65a4edd6fa66/9223370573623181744/d:r.tt.EPC/1463255430913/Put, avgKeyLen=118, avgValueLen=29, entries=20184807, length=465347831, cur=null] to key a899ca26-45a4-4af6-af34-72c035b4b7da/f6f88f36-9263-4dbb-9588-aaf126be9117/9223370573623181733/d:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/seqid=0
Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/9bad8381ba89430daf008c28befedec3 at 140550656 exp: -935597690 got: -1004657115
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader reader=hdfs://localhost:9000/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/9bad8381ba89430daf008c28befedec3, compression=snappy, cacheConf=blockCache=LruBlockCache{blockCount=7248, currentSize=543459280, freeSize=15851184, maxSize=559310464, heapSize=543459280, minSize=531344928, minFactor=0.95, multiSize=265672464, multiFactor=0.5, singleSize=132836232, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, firstKey=a899ca26-45a4-4af6-af34-72c035b4b7da/b5099cdb-2148-4f95-866a-f6eba2813bab/9223370573623181721/d:co/1463268017037/Put, lastKey=a899ca26-45a4-4af6-af34-72c035b4b7da/fffffc52-8969-472a-a17b-65a4edd6fa66/9223370573623181744/d:r.tt.EPC/1463255430913/Put, avgKeyLen=118, avgValueLen=29, entries=20184807, length=465347831, cur=null] to key a899ca26-45a4-4af6-af34-72c035b4b7da/f6f88f36-9263-4dbb-9588-aaf126be9117/9223370573623181733/d:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/seqid=0
At around the same time, the region server says:
2016-05-16 08:30:37,390 WARN [B.defaultRpcServer.handler=8,queue=2,port=16201] hfile.HFile: HBase checksum verification failed for file hdfs://localhost:9000/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/9bad8381ba89430daf008c28befedec3 at offset 408984102 filesize 465347831. Retrying read with HDFS checksums turned on...
2016-05-16 08:30:37,433 WARN [B.defaultRpcServer.handler=8,queue=2,port=16201] hdfs.DFSClient: Found Checksum error for BP-130837870-192.168.178.29-1462900512452:blk_1073746899_6086 from DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK] at 140550656
2016-05-16 08:30:37,434 INFO [B.defaultRpcServer.handler=8,queue=2,port=16201] hdfs.DFSClient: Could not obtain BP-130837870-192.168.178.29-1462900512452:blk_1073746899_6086 from any node: java.io.IOException: No live nodes contain block BP-130837870-192.168.178.29-1462900512452:blk_1073746899_6086 after checking nodes = [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK] Dead nodes: DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]. Will get new block locations from namenode and retry...
2016-05-16 08:30:37,434 WARN [B.defaultRpcServer.handler=8,queue=2,port=16201] hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 2146.2290363411184 msec.
.. some repetition...
2016-05-16 08:32:40,975 WARN [B.defaultRpcServer.handler=26,queue=2,port=16201] hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 2767.123497826818 msec.
2016-05-16 08:32:42,496 ERROR [regionserver/hb-desktop/192.168.178.29:16201-shortCompactions-1463232518856] regionserver.CompactSplitThread: Compaction failed Request = regionName=tt_items,a899ca26-45a4-4af6-af34-72c035b4b7da/b50993c1-7ff4-4169-b58f-f53878697709/9223370573954024736,1462900815686.08255086d13380bd559a87dd93cc15ba., storeName=d, fileCount=8, fileSize=1.0 G (443.8 M, 239.2 M, 124.8 M, 101.9 M, 78.5 M, 33.0 M, 32.9 M, 10.9 M), priority=2, time=152707213570012
java.io.IOException: Could not iterate StoreFileScanner[HFileScanner for reader reader=hdfs://localhost:9000/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/9bad8381ba89430daf008c28befedec3, compression=snappy, cacheConf=blockCache=LruBlockCache{blockCount=7130, currentSize=535445256, freeSize=23865208, maxSize=559310464, heapSize=535445256, minSize=531344928, minFactor=0.95, multiSize=265672464, multiFactor=0.5, singleSize=132836232, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, firstKey=a899ca26-45a4-4af6-af34-72c035b4b7da/b5099cdb-2148-4f95-866a-f6eba2813bab/9223370573623181721/d:co/1463268017037/Put, lastKey=a899ca26-45a4-4af6-af34-72c035b4b7da/fffffc52-8969-472a-a17b-65a4edd6fa66/9223370573623181744/d:r.tt.EPC/1463255430913/Put, avgKeyLen=118, avgValueLen=29, entries=20184807, length=465347831, cur=a899ca26-45a4-4af6-af34-72c035b4b7da/f6f881d7-076a-4aec-82f0-17e33d8f4f84/9223370573623181728/d:pi/1463267434824/Put/vlen=73/seqid=0]
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:146)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:596)
at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:273)
at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:105)
at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1233)
at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1770)
at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:520)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/9bad8381ba89430daf008c28befedec3 at 140550656 exp: -935597690 got: -1004657115
at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(Native Method)
at org.apache.hadoop.util.NativeCrc32.verifyChunkedSums(NativeCrc32.java:59)
at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:301)
at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:237)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:156)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:744)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:800)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:860)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:903)
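To see which blocks and byte offsets the checksum failures point at, the DFSClient warnings in the region server log can be tallied. The following is a minimal sketch, assuming the "Found Checksum error for ..." line keeps the format shown above; the function name `checksum_errors` and the regular expression are illustrative and not part of Hadoop or HBase.

```python
import re
from collections import Counter

# Matches the DFSClient warning format seen in the region server log above.
CHECKSUM_RE = re.compile(
    r"Found Checksum error for (?P<pool>BP-[\w.\-]+):"
    r"(?P<block>blk_\d+)_(?P<genstamp>\d+).* at (?P<offset>\d+)\s*$"
)


def checksum_errors(lines):
    """Count checksum-error warnings per (block ID, byte offset)."""
    hits = Counter()
    for line in lines:
        match = CHECKSUM_RE.search(line)
        if match:
            hits[(match.group("block"), int(match.group("offset")))] += 1
    return hits


# One warning line copied from the log excerpt above.
sample = [
    "2016-05-16 08:30:37,433 WARN [B.defaultRpcServer.handler=8,queue=2,port=16201] "
    "hdfs.DFSClient: Found Checksum error for "
    "BP-130837870-192.168.178.29-1462900512452:blk_1073746899_6086 from "
    "DatanodeInfoWithStorage[127.0.0.1:50010,"
    "DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK] at 140550656"
]

for (block, offset), count in checksum_errors(sample).items():
    print(block, offset, count)
```

If every warning names the same block and offset, as the excerpt suggests, the problem is confined to one stored replica rather than spread across the file system.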
The datanode says:
2016-05-16 08:26:01,750 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:46618, dest: /127.0.0.1:50010, bytes: 114196960, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747594_6781, duration: 14642101541
2016-05-16 08:26:01,750 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747594_6781, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2016-05-16 08:26:17,445 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow flushOrSync took 304ms (threshold=300ms), isSync:false, flushTotalNanos=304174121ns
2016-05-16 08:26:22,832 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-130837870-192.168.178.29-1462900512452:blk_1073747595_6782 src: /127.0.0.1:47502 dest: /127.0.0.1:50010
2016-05-16 08:26:23,385 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:432ms (threshold=300ms)
2016-05-16 08:26:24,684 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:47502, dest: /127.0.0.1:50010, bytes: 11485542, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747595_6782, duration: 1851995254
2016-05-16 08:26:24,685 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747595_6782, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2016-05-16 08:26:45,373 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:405ms (threshold=300ms)
2016-05-16 08:27:16,631 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-130837870-192.168.178.29-1462900512452:blk_1073747596_6783 src: /127.0.0.1:49517 dest: /127.0.0.1:50010
2016-05-16 08:27:17,729 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:49517, dest: /127.0.0.1:50010, bytes: 3455293, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747596_6783, duration: 1096622112
2016-05-16 08:27:17,729 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747596_6783, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
...some Slow BlockReceiver messages...
org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/home/hb/seritrack-mts/nosql/data/data, DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017): no suitable block pools found to scan. Waiting 1334756421 ms.
2016-05-16 08:30:07,254 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:375ms (threshold=300ms)
2016-05-16 08:31:32,604 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073747556_6743 file /home/hb/seritrack-mts/nosql/data/data/current/BP-130837870-192.168.178.29-1462900512452/current/finalized/subdir0/subdir22/blk_1073747556 for deletion
... some repetition...
2016-05-16 08:31:39,276 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-130837870-192.168.178.29-1462900512452:blk_1073747597_6784 src: /127.0.0.1:59092 dest: /127.0.0.1:50010
2016-05-16 08:31:40,380 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:59092, dest: /127.0.0.1:50010, bytes: 11480560, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747597_6784, duration: 1093600168
2016-05-16 08:31:40,380 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747597_6784, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2016-05-16 08:31:41,563 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-130837870-192.168.178.29-1462900512452:blk_1073747598_6785 src: /127.0.0.1:59147 dest: /127.0.0.1:50010
2016-05-16 08:31:44,761 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:408ms (threshold=300ms)
2016-05-16 08:31:48,238 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:411ms (threshold=300ms)
2016-05-16 08:31:51,398 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/home/hb/seritrack-mts/nosql/data/data, DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017): no suitable block pools found to scan. Waiting 1334619581 ms.
2016-05-16 08:31:51,529 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:389ms (threshold=300ms)
...some more of those ...
2016-05-16 08:31:57,547 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:59147, dest: /127.0.0.1:50010, bytes: 268435456, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747598_6785, duration: 15718340373
2016-05-16 08:31:57,547 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747598_6785, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2016-05-16 08:31:57,749 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-130837870-192.168.178.29-1462900512452:blk_1073747599_6786 src: /127.0.0.1:59440 dest: /127.0.0.1:50010
2016-05-16 08:31:58,657 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:343ms (threshold=300ms)
...some more of those ...
2016-05-16 08:32:13,559 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:59440, dest: /127.0.0.1:50010, bytes: 268435456, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747599_6786, duration: 15809336101
2016-05-16 08:32:13,559 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747599_6786, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2016-05-16 08:32:14,071 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-130837870-192.168.178.29-1462900512452:blk_1073747600_6787 src: /127.0.0.1:59678 dest: /127.0.0.1:50010
2016-05-16 08:32:16,251 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:493ms (threshold=300ms)
...some more of those ...
2016-05-16 08:32:29,087 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:59678, dest: /127.0.0.1:50010, bytes: 268435456, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747600_6787, duration: 15015849046
2016-05-16 08:32:29,087 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747600_6787, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2016-05-16 08:32:29,171 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-130837870-192.168.178.29-1462900512452:blk_1073747601_6788 src: /127.0.0.1:59907 dest: /127.0.0.1:50010
2016-05-16 08:32:30,420 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write data to disk cost:322ms (threshold=300ms)
...some more of those ...
2016-05-16 08:32:42,447 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:59907, dest: /127.0.0.1:50010, bytes: 205882206, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1749190019_1, offset: 0, srvID: 7bfbedce-3f6d-4f95-8eb4-d96438fedc36, blockid: BP-130837870-192.168.178.29-1462900512452:blk_1073747601_6788, duration: 13275544186
2016-05-16 08:32:42,447 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-130837870-192.168.178.29-1462900512452:blk_1073747601_6788, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2016-05-16 08:33:46,104 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Likely the client has stopped reading, disconnecting it (hb-desktop:50010:DataXceiver error processing READ_BLOCK operation src: /127.0.0.1:46586 dst: /127.0.0.1:50010); java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:50010 remote=/127.0.0.1:46586]
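The datanode excerpt contains many "Slow BlockReceiver write data to disk" warnings. To judge how far they exceed the threshold, the reported costs can be pulled out of the log. This is a rough sketch that assumes the warning text matches the lines above; `slow_write_costs` is a hypothetical helper name.

```python
import re

# Matches the datanode warning format seen in the log above.
SLOW_WRITE_RE = re.compile(
    r"Slow BlockReceiver write data to disk cost:(\d+)ms \(threshold=(\d+)ms\)"
)


def slow_write_costs(lines):
    """Return (cost_ms, threshold_ms) for every slow-write warning found."""
    costs = []
    for line in lines:
        match = SLOW_WRITE_RE.search(line)
        if match:
            costs.append((int(match.group(1)), int(match.group(2))))
    return costs


# Two warning lines copied from the datanode log excerpt above.
sample = [
    "2016-05-16 08:26:23,385 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: "
    "Slow BlockReceiver write data to disk cost:432ms (threshold=300ms)",
    "2016-05-16 08:26:45,373 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: "
    "Slow BlockReceiver write data to disk cost:405ms (threshold=300ms)",
]

costs = slow_write_costs(sample)
print("warnings:", len(costs), "worst cost (ms):", max(c for c, _ in costs))
```

Such warnings indicate a slow disk write path, which is a performance signal; on their own they do not show that data was written incorrectly.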
The namenode says:
2016-05-16 08:30:39,662 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
2016-05-16 08:30:39,662 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1 because client machine reported it
2016-05-16 08:30:41,844 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
2016-05-16 08:30:41,844 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1 because client machine reported it
2016-05-16 08:30:43,495 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
2016-05-16 08:30:43,495 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1 because client machine reported it
2016-05-16 08:30:47,435 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
2016-05-16 08:30:47,436 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1 because client machine reported it
2016-05-16 08:30:51,842 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
2016-05-16 08:30:51,842 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1 because client machine reported it
2016-05-16 08:30:57,476 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
2016-05-16 08:30:57,476 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1 because client machine reported it
2016-05-16 08:31:06,539 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2016-05-16 08:31:06,539 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2016-05-16 08:31:08,643 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
2016-05-16 08:31:08,643 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1 because client machine reported it
2016-05-16 08:31:21,129 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
2016-05-16 08:31:21,130 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1 because client machine reported it
2016-05-16 08:31:28,993 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073747591_6778 127.0.0.1:50010
2016-05-16 08:31:28,993 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 60 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 36 SyncTimes(ms): 11435
2016-05-16 08:31:29,186 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073747556_6743 127.0.0.1:50010
2016-05-16 08:31:29,321 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073747574_6761 127.0.0.1:50010
2016-05-16 08:31:29,674 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073747593_6780 127.0.0.1:50010
2016-05-16 08:31:29,713 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073747558_6745 127.0.0.1:50010
2016-05-16 08:31:29,796 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073747575_6762 127.0.0.1:50010
2016-05-16 08:31:30,237 INFO BlockStateChange: BLOCK* BlockManager: ask 127.0.0.1:50010 to delete [blk_1073747556_6743, blk_1073747558_6745, blk_1073747574_6761, blk_1073747575_6762, blk_1073747591_6778, blk_1073747593_6780]
2016-05-16 08:31:32,007 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
2016-05-16 08:31:32,007 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1 because client machine reported it
2016-05-16 08:31:36,540 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2016-05-16 08:31:36,540 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2016-05-16 08:31:38,849 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/.tmp/9652e091531943848ee523a60bc5baa5. BP-130837870-192.168.178.29-1462900512452 blk_1073747597_6784{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]}
2016-05-16 08:31:40,381 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_1073747597_6784{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]} size 0
2016-05-16 08:31:40,745 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/.tmp/9652e091531943848ee523a60bc5baa5 is closed by DFSClient_NONMAPREDUCE_-1749190019_1
2016-05-16 08:31:41,294 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/.tmp/fb79c8ab90a5498089841431191c03ca. BP-130837870-192.168.178.29-1462900512452 blk_1073747598_6785{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]}
2016-05-16 08:31:57,547 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_1073747598_6785{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]} size 0
2016-05-16 08:31:57,551 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/.tmp/fb79c8ab90a5498089841431191c03ca. BP-130837870-192.168.178.29-1462900512452 blk_1073747599_6786{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]}
2016-05-16 08:32:06,539 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2016-05-16 08:32:06,539 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2016-05-16 08:32:13,559 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_1073747599_6786{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]} size 0
2016-05-16 08:32:13,875 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/.tmp/fb79c8ab90a5498089841431191c03ca. BP-130837870-192.168.178.29-1462900512452 blk_1073747600_6787{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]}
2016-05-16 08:32:29,087 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_1073747600_6787{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]} size 0
2016-05-16 08:32:29,088 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/.tmp/fb79c8ab90a5498089841431191c03ca. BP-130837870-192.168.178.29-1462900512452 blk_1073747601_6788{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]}
2016-05-16 08:32:29,088 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 91 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 57 SyncTimes(ms): 14475
2016-05-16 08:32:36,539 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2016-05-16 08:32:36,539 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2016-05-16 08:32:42,043 INFO org.apache.hadoop.hdfs.StateChange: *DIR* reportBadBlocks
2016-05-16 08:32:42,044 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 to add as corrupt on 127.0.0.1:50010 by /127.0.0.1 because client machine reported it
2016-05-16 08:32:42,447 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_1073747601_6788{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017:NORMAL:127.0.0.1:50010|RBW]]} size 0
2016-05-16 08:32:42,495 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/.tmp/fb79c8ab90a5498089841431191c03ca is closed by DFSClient_NONMAPREDUCE_-1749190019_1
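The repeated reportBadBlocks entries can be grouped to check whether the client keeps reporting one replica or several different ones. A minimal sketch, assuming the addToCorruptReplicasMap message format shown above; `corrupt_reports` is a hypothetical helper, not a Hadoop API.

```python
import re
from collections import Counter

# Matches the namenode message format seen in the log above.
CORRUPT_RE = re.compile(
    r"addToCorruptReplicasMap: .*?(blk_\d+) to add as corrupt on (\S+)"
)


def corrupt_reports(lines):
    """Count corrupt-replica reports per (block ID, datanode address)."""
    reports = Counter()
    for line in lines:
        match = CORRUPT_RE.search(line)
        if match:
            reports[(match.group(1), match.group(2))] += 1
    return reports


# Two entries copied from the namenode log excerpt above.
sample = [
    "2016-05-16 08:30:39,662 INFO BlockStateChange: BLOCK "
    "NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 "
    "to add as corrupt on 127.0.0.1:50010 by /127.0.0.1 because client machine reported it",
    "2016-05-16 08:30:41,844 INFO BlockStateChange: BLOCK "
    "NameSystem.addToCorruptReplicasMap: duplicate requested for blk_1073746899 "
    "to add as corrupt on 127.0.0.1:50010 by /127.0.0.1 because client machine reported it",
]

for (block, node), count in corrupt_reports(sample).items():
    print(block, node, count)
```

With replication factor 1 there is no second replica to recover from, so a single block reported repeatedly matches the "No live nodes contain block" message in the region server log.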
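Best answer
Since there has been no further response, and since HDFS-8809 does indeed seem to be the root of the problem (which makes this more a source of confusion than a real problem), I suggest this as the answer.
Source: the Stack Overflow question "Worried: Corrupt HDFS on single node - how to resolve" (tagged hadoop): https://stackoverflow.com/questions/37251422/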