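As part of my research, I am implementing the sum of totients (Euler's totient function) in different parallel computing languages, and honestly I am struggling with MapReduce. The main goal is to benchmark runtime, efficiency, and so on.
My code is running now and I get the correct output, but it is slow and I would like to know why.
Is it because of my implementation, or because Hadoop MapReduce is not designed for this purpose? I also implemented a combiner because, from what I have read, it should optimize the code, but it does not. Sorry if this question seems stupid, but I have not found anything on the internet and I am tired of trying things without any result.
My input file contains the values from 1 to 15000: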
1 2 3 4 5 6 ... 14998 14999 15000
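I am working on a cluster of 32 nodes. My goal is to have each node compute a part of my range (the combiner) and then, in the reducer, sum all of the partial sums produced by the combiners.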
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewTotient {

    public static long hcf(long x, long y)
    {
        long t;
        while (y != 0) {
            t = x % y;
            x = y;
            y = t;
        }
        return x;
    }

    public static boolean relprime(long x, long y)
    {
        return hcf(x, y) == 1;
    }

    public static long euler(long n)
    {
        long length, i;
        length = 0;
        for (i = 1; i < n; i++)
            if (relprime(n, i))
                length++;
        return length;
    }

    public static class TotientMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            for (String val : value.toString().split(" ")) {
                context.write(new Text(), new IntWritable(Integer.valueOf(val)));
            }
        }
    }

    public static class TotientCombiner extends Reducer<Text,IntWritable,Text,IntWritable> {
        //private IntWritable result = new IntWritable();

        protected void reduce(Text key, Iterable<IntWritable> values, Context context)throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += NewTotient.euler(val.get());
            }
        }
    }

    public static class TotientReducer extends Reducer<Text,IntWritable,Text,IntWritable> {
        //private IntWritable result = new IntWritable();

        protected void reduce(Text key, Iterable<IntWritable> values, Context context)throws IOException, InterruptedException {
            int sum = 1;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(null, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        System.out.println("\n\n__________________________________________________________\n"+"Starting Job\n"+"__________________________________________________________\n\n");
        final long startTime = System.currentTimeMillis();
        Job job = Job.getInstance(conf, "Sum of Totient");
        job.setJarByClass(NewTotient.class);
        job.setMapperClass(TotientMapper.class);
        job.setCombinerClass(TotientCombiner.class);
        job.setReducerClass(TotientReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        //job.setOutputKeyClass(Text.class);
        //job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
        final double duration = (System.currentTimeMillis() - startTime)/1000.0;
        System.out.println("\n\n__________________________________________________________\n"+"Job Finished in " + duration + " seconds\n"+"__________________________________________________________\n\n");
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
In case it helps, here is my output for a data set from 0 to 10 (so basically I am only computing the sum of the first 10 totients):
__________________________________________________________
Starting Job
__________________________________________________________
2018-04-02 06:09:27,583 INFO client.RMProxy: Connecting to ResourceManager at bwlf32/137.195.143.132:33312
2018-04-02 06:09:28,377 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2018-04-02 06:09:28,423 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/jo20/.staging/job_1522471222360_0016
2018-04-02 06:09:28,775 INFO input.FileInputFormat: Total input files to process : 1
2018-04-02 06:09:29,029 INFO mapreduce.JobSubmitter: number of splits:1
2018-04-02 06:09:29,101 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-02 06:09:29,288 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1522471222360_0016
2018-04-02 06:09:29,290 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-04-02 06:09:29,538 INFO conf.Configuration: resource-types.xml not found
2018-04-02 06:09:29,539 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-04-02 06:09:29,628 INFO impl.YarnClientImpl: Submitted application application_1522471222360_0016
2018-04-02 06:09:29,687 INFO mapreduce.Job: The url to track the job: http://bwlf32:33314/proxy/application_1522471222360_0016/
2018-04-02 06:09:29,688 INFO mapreduce.Job: Running job: job_1522471222360_0016
2018-04-02 06:09:37,849 INFO mapreduce.Job: Job job_1522471222360_0016 running in uber mode : false
2018-04-02 06:09:37,852 INFO mapreduce.Job: map 0% reduce 0%
2018-04-02 06:09:44,960 INFO mapreduce.Job: map 100% reduce 0%
2018-04-02 06:09:52,008 INFO mapreduce.Job: map 100% reduce 100%
2018-04-02 06:09:52,022 INFO mapreduce.Job: Job job_1522471222360_0016 completed successfully
2018-04-02 06:09:52,178 INFO mapreduce.Job: Counters: 53
File System Counters
FILE: Number of bytes read=6
FILE: Number of bytes written=414497
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=123
HDFS: Number of bytes written=0
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=9126
Total time spent by all reduces in occupied slots (ms)=9688
Total time spent by all map tasks (ms)=4563
Total time spent by all reduce tasks (ms)=4844
Total vcore-milliseconds taken by all map tasks=4563
Total vcore-milliseconds taken by all reduce tasks=4844
Total megabyte-milliseconds taken by all map tasks=1168128
Total megabyte-milliseconds taken by all reduce tasks=1240064
Map-Reduce Framework
Map input records=1
Map output records=10
Map output bytes=50
Map output materialized bytes=6
Input split bytes=102
Combine input records=10
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=6
Reduce input records=0
Reduce output records=0
Spilled Records=0
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=157
CPU time spent (ms)=2220
Physical memory (bytes) snapshot=507772928
Virtual memory (bytes) snapshot=3889602560
Total committed heap usage (bytes)=347078656
Peak Map Physical memory (bytes)=306073600
Peak Map Virtual memory (bytes)=1945808896
Peak Reduce Physical memory (bytes)=201699328
Peak Reduce Virtual memory (bytes)=1943793664
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=21
File Output Format Counters
Bytes Written=0
__________________________________________________________
Job Finished in 26.225 seconds
__________________________________________________________
2018-04-02 06:09:52,182 INFO mapreduce.Job: Running job: job_1522471222360_0016
2018-04-02 06:09:52,188 INFO mapreduce.Job: Job job_1522471222360_0016 running in uber mode : false
2018-04-02 06:09:52,188 INFO mapreduce.Job: map 100% reduce 100%
2018-04-02 06:09:52,193 INFO mapreduce.Job: Job job_1522471222360_0016 completed successfully
2018-04-02 06:09:52,201 INFO mapreduce.Job: Counters: 53
File System Counters
FILE: Number of bytes read=6
FILE: Number of bytes written=414497
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=123
HDFS: Number of bytes written=0
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=9126
Total time spent by all reduces in occupied slots (ms)=9688
Total time spent by all map tasks (ms)=4563
Total time spent by all reduce tasks (ms)=4844
Total vcore-milliseconds taken by all map tasks=4563
Total vcore-milliseconds taken by all reduce tasks=4844
Total megabyte-milliseconds taken by all map tasks=1168128
Total megabyte-milliseconds taken by all reduce tasks=1240064
Map-Reduce Framework
Map input records=1
Map output records=10
Map output bytes=50
Map output materialized bytes=6
Input split bytes=102
Combine input records=10
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=6
Reduce input records=0
Reduce output records=0
Spilled Records=0
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=157
CPU time spent (ms)=2220
Physical memory (bytes) snapshot=507772928
Virtual memory (bytes) snapshot=3889602560
Total committed heap usage (bytes)=347078656
Peak Map Physical memory (bytes)=306073600
Peak Map Virtual memory (bytes)=1945808896
Peak Reduce Physical memory (bytes)=201699328
Peak Reduce Virtual memory (bytes)=1943793664
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=21
File Output Format Counters
Bytes Written=0
My sequential code in Java is faster:
real 0m0.512s
user 0m0.279s
sys 0m0.142s
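To be clear, I have to use this method of computation because it is slow enough to give interesting comparisons between different systems, even though I know a smarter method exists. One idea is to find all the prime factors of n, count them and their multiples, and subtract that count from n to get the value of the totient function (prime factors and multiples of prime factors do not have a gcd of 1 with n).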
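For reference, the smarter method mentioned above can be sketched as follows. This is only an illustration, not part of the benchmark: it uses the standard product formula phi(n) = n * (1 - 1/p1) * (1 - 1/p2) * ..., which is the closed form of the "subtract the multiples of the prime factors" idea. The class name TotientCompare and the method names are made up for this sketch. Note that the naive loop from the question returns 0 for n = 1, while the product formula returns 1, so the two are compared from n = 2 upward.

```java
final class TotientCompare {
    // Euclid's algorithm, same as hcf() in the question.
    static long gcd(long x, long y) {
        while (y != 0) {
            long t = x % y;
            x = y;
            y = t;
        }
        return x;
    }

    // Naive count from the question: how many i in [1, n) have gcd(n, i) == 1.
    static long eulerNaive(long n) {
        long count = 0;
        for (long i = 1; i < n; i++) {
            if (gcd(n, i) == 1) {
                count++;
            }
        }
        return count;
    }

    // Product formula: for each distinct prime factor p of n,
    // multiply the running result by (1 - 1/p), done in integers.
    static long eulerByFactors(long n) {
        long result = n;
        long m = n;
        for (long p = 2; p * p <= m; p++) {
            if (m % p == 0) {
                while (m % p == 0) {
                    m /= p; // strip every power of p
                }
                result -= result / p;
            }
        }
        if (m > 1) {
            result -= result / m; // one prime factor larger than sqrt(n) remains
        }
        return result;
    }

    public static void main(String[] args) {
        long naive = 0;
        long fast = 0;
        for (long n = 2; n <= 1000; n++) {
            naive += eulerNaive(n);
            fast += eulerByFactors(n);
        }
        System.out.println("naive sum = " + naive + ", product-formula sum = " + fast);
    }
}
```

The naive version does on the order of n gcd computations per number, while the factor-based version only needs trial division up to sqrt(n), which is why the naive one is the more interesting workload for a benchmark.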
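Best answer
Here you are providing the input from the file on a single line. The mapper receives one record per line, so since there is only one line it is handled by a single map task and the input is not processed in parallel (the log in the question shows "number of splits:1" and "Map input records=1"). One thing you can do is put each input number on its own line instead of separating the numbers with spaces, and change the mapper accordingly. The combiner also does not make much sense here, because you are not using distinct keys in the map output.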
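A minimal sketch of the first suggestion, in plain Java without Hadoop: it rewrites the single-line input so that each number sits on its own line. The class name OneNumberPerLine and the command-line arguments are hypothetical. One caveat, as far as I understand FileInputFormat: splits are based on size, so a file this small may still end up in a single split even with one number per line. Hadoop's NLineInputFormat (with setNumLinesPerSplit) is one way to force several splits, but that part is not shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class OneNumberPerLine {
    // Split "1 2 3 ... N" on whitespace; each element becomes one output line.
    static List<String> splitNumbers(String singleLine) {
        return Arrays.stream(singleLine.trim().split("\\s+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: OneNumberPerLine <input file> <output file>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        String content = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        // Files.write puts a line separator after every element.
        Files.write(out, splitNumbers(content), StandardCharsets.UTF_8);
    }
}
```

With the input in this shape, each map call would see one number in its value, so the mapper could call euler on it directly and emit the result. The combiner and the reducer would then only add numbers up, which also stays correct if the framework runs the combiner zero times or more than once.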
Regarding java - Hadoop MapReduce - Euler's Totient / Sum of Totient (and other mathematical operations), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49605690/