I am new to Cloudera and Hadoop, and the output of the Cloudera WordCount 1.0 example (part-00000) is empty. The steps and files I used are here. I am happy to provide any job log information that would help, and likewise version details - I just need some guidance on where to find them. Below are the job output and the source. Of the other parts written (part-00001 through part-00011), the non-empty ones are part-00001 (Bye 1), part-00002 (Hadoop 2), part-00004 (Goodbye 1), part-00005 (World 2), and part-00009 (Hello 2). Any help would be great.
The commands and output are as follows:
[me@server ~]$ hadoop fs -cat /user/me/wordcount/input/file0
Hello World Bye World
[me@server ~]$ hadoop fs -cat /user/me/wordcount/input/file1
Hello Hadoop Goodbye Hadoop
[me@server ~]$ hadoop jar wordcount.jar org.myorg.WordCount /user/me/wordcount/input /user/me/wordcount/output
13/11/12 10:39:41 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/11/12 10:39:41 INFO mapred.FileInputFormat: Total input paths to process : 2
13/11/12 10:39:42 INFO mapred.JobClient: Running job: job_201311051201_0014
13/11/12 10:39:43 INFO mapred.JobClient: map 0% reduce 0%
13/11/12 10:39:49 INFO mapred.JobClient: map 33% reduce 0%
13/11/12 10:39:52 INFO mapred.JobClient: map 67% reduce 0%
13/11/12 10:39:53 INFO mapred.JobClient: map 100% reduce 0%
13/11/12 10:39:58 INFO mapred.JobClient: map 100% reduce 25%
13/11/12 10:40:01 INFO mapred.JobClient: map 100% reduce 100%
13/11/12 10:40:04 INFO mapred.JobClient: Job complete: job_201311051201_0014
13/11/12 10:40:04 INFO mapred.JobClient: Counters: 33
13/11/12 10:40:04 INFO mapred.JobClient: File System Counters
13/11/12 10:40:04 INFO mapred.JobClient: FILE: Number of bytes read=313
13/11/12 10:40:04 INFO mapred.JobClient: FILE: Number of bytes written=2695420
13/11/12 10:40:04 INFO mapred.JobClient: FILE: Number of read operations=0
13/11/12 10:40:04 INFO mapred.JobClient: FILE: Number of large read operations=0
13/11/12 10:40:04 INFO mapred.JobClient: FILE: Number of write operations=0
13/11/12 10:40:04 INFO mapred.JobClient: HDFS: Number of bytes read=410
13/11/12 10:40:04 INFO mapred.JobClient: HDFS: Number of bytes written=41
13/11/12 10:40:04 INFO mapred.JobClient: HDFS: Number of read operations=18
13/11/12 10:40:04 INFO mapred.JobClient: HDFS: Number of large read operations=0
13/11/12 10:40:04 INFO mapred.JobClient: HDFS: Number of write operations=24
13/11/12 10:40:04 INFO mapred.JobClient: Job Counters
13/11/12 10:40:04 INFO mapred.JobClient: Launched map tasks=3
13/11/12 10:40:04 INFO mapred.JobClient: Launched reduce tasks=12
13/11/12 10:40:04 INFO mapred.JobClient: Data-local map tasks=3
13/11/12 10:40:04 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=16392
13/11/12 10:40:04 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=61486
13/11/12 10:40:04 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/11/12 10:40:04 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/11/12 10:40:04 INFO mapred.JobClient: Map-Reduce Framework
13/11/12 10:40:04 INFO mapred.JobClient: Map input records=2
13/11/12 10:40:04 INFO mapred.JobClient: Map output records=8
13/11/12 10:40:04 INFO mapred.JobClient: Map output bytes=82
13/11/12 10:40:04 INFO mapred.JobClient: Input split bytes=357
13/11/12 10:40:04 INFO mapred.JobClient: Combine input records=8
13/11/12 10:40:04 INFO mapred.JobClient: Combine output records=6
13/11/12 10:40:04 INFO mapred.JobClient: Reduce input groups=5
13/11/12 10:40:04 INFO mapred.JobClient: Reduce shuffle bytes=649
13/11/12 10:40:04 INFO mapred.JobClient: Reduce input records=6
13/11/12 10:40:04 INFO mapred.JobClient: Reduce output records=5
13/11/12 10:40:04 INFO mapred.JobClient: Spilled Records=12
13/11/12 10:40:04 INFO mapred.JobClient: CPU time spent (ms)=15650
13/11/12 10:40:04 INFO mapred.JobClient: Physical memory (bytes) snapshot=3594293248
13/11/12 10:40:04 INFO mapred.JobClient: Virtual memory (bytes) snapshot=18375352320
13/11/12 10:40:04 INFO mapred.JobClient: Total committed heap usage (bytes)=6497697792
13/11/12 10:40:04 INFO mapred.JobClient: org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/11/12 10:40:04 INFO mapred.JobClient: BYTES_READ=50
[me@server ~]$ hadoop fs -cat /user/me/wordcount/output/part-00000
[me@server ~]$ hdfs dfs -ls -R /user/me/wordcount/output
-rw-r--r-- 3 me me 0 2013-11-12 10:40 /user/me/wordcount/output/_SUCCESS
drwxr-xr-x - me me 0 2013-11-12 10:39 /user/me/wordcount/output/_logs
drwxr-xr-x - me me 0 2013-11-12 10:39 /user/me/wordcount/output/_logs/history
-rw-r--r-- 3 me me 67134 2013-11-12 10:40 /user/me/wordcount/output/_logs/history/job_201311051201_0014_1384270782432_me_wordcount
-rw-r--r-- 3 me me 81866 2013-11-12 10:39 /user/me/wordcount/output/_logs/history/job_201311051201_0014_conf.xml
-rw-r--r-- 3 me me 0 2013-11-12 10:39 /user/me/wordcount/output/part-00000
-rw-r--r-- 3 me me 6 2013-11-12 10:39 /user/me/wordcount/output/part-00001
-rw-r--r-- 3 me me 9 2013-11-12 10:39 /user/me/wordcount/output/part-00002
-rw-r--r-- 3 me me 0 2013-11-12 10:39 /user/me/wordcount/output/part-00003
-rw-r--r-- 3 me me 10 2013-11-12 10:39 /user/me/wordcount/output/part-00004
-rw-r--r-- 3 me me 8 2013-11-12 10:39 /user/me/wordcount/output/part-00005
-rw-r--r-- 3 me me 0 2013-11-12 10:39 /user/me/wordcount/output/part-00006
-rw-r--r-- 3 me me 0 2013-11-12 10:39 /user/me/wordcount/output/part-00007
-rw-r--r-- 3 me me 0 2013-11-12 10:39 /user/me/wordcount/output/part-00008
-rw-r--r-- 3 me me 8 2013-11-12 10:39 /user/me/wordcount/output/part-00009
-rw-r--r-- 3 me me 0 2013-11-12 10:39 /user/me/wordcount/output/part-00010
-rw-r--r-- 3 me me 0 2013-11-12 10:39 /user/me/wordcount/output/part-00011
[me@server ~]$
Here is the source:
package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

  public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}
Best answer
You are launching 12 reduce tasks (Launched reduce tasks=12) even though the mappers only produce five outputs: per the tutorial, you have five expected output lines. With the default hash partitioner, each of those five keys is sent to one of the 12 reducers, so most reducers, including the one that writes part-00000, receive no keys and emit empty part files. In CDH3 the number of reducers was set to the number of mapper outputs; most likely this behavior changed in CDH4. Check your configuration files to see whether you have something like mapred.reduce.tasks set, or a similar property.
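If you want all of the counts to land in a single part file, a minimal sketch (assuming the driver shown above; setNumReduceTasks is part of the old org.apache.hadoop.mapred.JobConf API) is to pin the reducer count in the job configuration:

  // Sketch: the same driver as above, with the reducer count fixed at 1
  // so every key is handled by a single reduce task and written to part-00000.
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    conf.setNumReduceTasks(1); // force a single reducer for this tiny input

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }

After rerunning the job (point it at a new output directory first, since the old one already exists), hadoop fs -cat /user/me/wordcount/output/part-00000 should show all five counts.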
Regarding "hadoop - No output in part-00000 when running the WordCount v1.0 example", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19934233/