我尝试使用 HtmlUnit 保存 google 页面。但是我无法获得正确的用户界面。当我检查保存的页面代码样式标签是空的。
我的代码在这里。
public static void main(String[] args) throws IOException {
FileUtils.cleanDirectory(new File("/home/user1/Documents/Aaa"));
WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setCssEnabled(true);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.waitForBackgroundJavaScriptStartingBefore(1000);
webClient.waitForBackgroundJavaScript(1000);
webClient.getOptions().setTimeout(5000);
System.out.println("******************loaded**********************************");
try {
HtmlPage page = webClient.getPage("https://www.google.com");
page.save(new File("/home/user1/Documents/Aaa/index.html"));
} catch (Exception e) {
System.out.println("******************catch***********************************");
e.printStackTrace();
}
webClient.close();
System.out.println("******************finished********************************");
}
我的页面看起来像
控制台日志
Dec 10, 2016 3:47:45 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Dec 10, 2016 3:47:46 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[TypeError: object is not iterable] sourceName=[https://www.google.co.in/xjs/_/js/k=xjs.s.en.igGBAtxEWN0.O/m=sx,c,sb,cdos,cr,elog,hsm,jsa,r,qsm,j,p,d,csi/am=AAiUPF6wAOL_ISBuIRxBasDAoA/rt=j/d=1/t=zcms/rs=ACT90oGjQTdwqicso-l4vNE-7GeAqTtjtw] line=[10] lineSource=[null] lineOffset=[0]
Dec 10, 2016 3:47:46 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'https://www.google.co.in/?gfe_rd=cr&ei=RtZLWL3cDsmL8QeW4Yb4AQ' [1:14018] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <URI>, <FUNCTION>, "progid:".)
Dec 10, 2016 3:47:46 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'https://www.google.co.in/?gfe_rd=cr&ei=RtZLWL3cDsmL8QeW4Yb4AQ' [1:14042] Error in expression. (Invalid token " ". Was expecting one of: <NUMBER>, "inherit", <IDENT>, <STRING>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <RESOLUTION_DPI>, <RESOLUTION_DPCM>, <PERCENTAGE>, <DIMENSION>, <URI>, <FUNCTION>, "progid:".)
Dec 10, 2016 3:47:46 PM com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine handleJavaScriptException
INFO: Caught script exception
======= EXCEPTION START ========
EcmaError: lineNumber=[551] column=[0] lineSource=[null] name=[TypeError] sourceName=[https://www.google.co.in/xjs/_/js/k=xjs.s.en.igGBAtxEWN0.O/m=sx,c,sb,cdos,cr,elog,hsm,jsa,r,qsm,j,p,d,csi/am=AAiUPF6wAOL_ISBuIRxBasDAoA/rt=j/d=1/t=zcms/rs=ACT90oGjQTdwqicso-l4vNE-7GeAqTtjtw] message=[TypeError: Cannot read property "Kf" from undefined (https://www.google.co.in/xjs/_/js/k=xjs.s.en.igGBAtxEWN0.O/m=sx,c,sb,cdos,cr,elog,hsm,jsa,r,qsm,j,p,d,csi/am=AAiUPF6wAOL_ISBuIRxBasDAoA/rt=j/d=1/t=zcms/rs=ACT90oGjQTdwqicso-l4vNE-7GeAqTtjtw#551)]
com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot read property "Kf" from undefined (https://www.google.co.in/xjs/_/js/k=xjs.s.en.igGBAtxEWN0.O/m=sx,c,sb,cdos,cr,elog,hsm,jsa,r,qsm,j,p,d,csi/am=AAiUPF6wAOL_ISBuIRxBasDAoA/rt=j/d=1/t=zcms/rs=ACT90oGjQTdwqicso-l4vNE-7GeAqTtjtw#551)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:921)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:852)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:824)
at com.gargoylesoftware.htmlunit.InteractivePage.executeJavaScriptFunctionIfPossible(InteractivePage.java:216)
at com.gargoylesoftware.htmlunit.javascript.host.event.EventListenersContainer.executeEventListeners(EventListenersContainer.java:258)
at com.gargoylesoftware.htmlunit.javascript.host.event.EventListenersContainer.executeBubblingListeners(EventListenersContainer.java:322)
at com.gargoylesoftware.htmlunit.javascript.host.event.EventTarget.fireEvent(EventTarget.java:206)
at com.gargoylesoftware.htmlunit.javascript.host.Window.dispatchEvent(Window.java:2033)
at com.gargoylesoftware.htmlunit.javascript.host.Window$2$1.run(Window.java:2119)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
at com.gargoylesoftware.htmlunit.javascript.host.Window$2.execute(Window.java:2124)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:966)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.access$500(JavaScriptEngine.java:101)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:916)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:852)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:824)
at com.gargoylesoftware.htmlunit.InteractivePage.executeJavaScriptFunctionIfPossible(InteractivePage.java:216)
at com.gargoylesoftware.htmlunit.javascript.host.event.EventListenersContainer.executeEventListeners(EventListenersContainer.java:258)
at com.gargoylesoftware.htmlunit.javascript.host.event.EventListenersContainer.executeBubblingListeners(EventListenersContainer.java:322)
at com.gargoylesoftware.htmlunit.javascript.host.event.EventTarget.fireEvent(EventTarget.java:191)
at com.gargoylesoftware.htmlunit.html.DomElement$2.run(DomElement.java:1190)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
at com.gargoylesoftware.htmlunit.html.DomElement.fireEvent(DomElement.java:1195)
at com.gargoylesoftware.htmlunit.html.DomElement.fireEvent(DomElement.java:1163)
at com.gargoylesoftware.htmlunit.InteractivePage.setFocusedElement(InteractivePage.java:100)
at com.gargoylesoftware.htmlunit.InteractivePage.setFocusedElement(InteractivePage.java:66)
at com.gargoylesoftware.htmlunit.html.HtmlElement.detach(HtmlElement.java:1254)
at com.gargoylesoftware.htmlunit.html.DomNode.remove(DomNode.java:1132)
at com.gargoylesoftware.htmlunit.html.DomNode.replace(DomNode.java:1210)
at com.gargoylesoftware.htmlunit.javascript.host.dom.Node.replaceChild(Node.java:444)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:153)
at net.sourceforge.htmlunit.corejs.javascript.FunctionObject.call(FunctionObject.java:448)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1540)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:800)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
at net.sourceforge.htmlunit.corejs.javascript.NativeArray.iterativeMethod(NativeArray.java:1694)
at net.sourceforge.htmlunit.corejs.javascript.NativeArray.execIdCall(NativeArray.java:405)
at net.sourceforge.htmlunit.corejs.javascript.IdFunctionObject.call(IdFunctionObject.java:94)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.applyOrCall(ScriptRuntime.java:2575)
at net.sourceforge.htmlunit.corejs.javascript.BaseFunction.execIdCall(BaseFunction.java:321)
at net.sourceforge.htmlunit.corejs.javascript.IdFunctionObject.call(IdFunctionObject.java:94)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1540)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:800)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:413)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:252)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3264)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:794)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:906)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:803)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:779)
at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:975)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:352)
at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute(HtmlScript.java:238)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:966)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.access$500(JavaScriptEngine.java:101)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:916)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:515)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:852)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:824)
at com.gargoylesoftware.htmlunit.InteractivePage.executeJavaScriptFunctionIfPossible(InteractivePage.java:216)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:52)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:102)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:426)
at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:157)
at java.lang.Thread.run(Thread.java:745)
Caused by: net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot read property "Kf" from undefined (https://www.google.co.in/xjs/_/js/k=xjs.s.en.igGBAtxEWN0.O/m=sx,c,sb,cdos,cr,elog,hsm,jsa,r,qsm,j,p,d,csi/am=AAiUPF6wAOL_ISBuIRxBasDAoA/rt=j/d=1/t=zcms/rs=ACT90oGjQTdwqicso-l4vNE-7GeAqTtjtw#551)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3915)
对这种情况有什么想法吗?
已编辑: 我认为错误是在解析 CSS3 过滤器属性时出现的。在这样的谷歌页面代码中。
filter:progid:DXImageTransform.Microsoft.gradient(startColorstr=#4387fd,endColorstr=#4683ea,GradientType=1)
在这个过滤器中有一个冒号,progrid 有一个冒号。所以这会导致 CSS 解析错误。有什么解决办法吗?
最佳答案
起初:
WARNING: CSS error: 'https://www.google.co.in/?gfe_rd=cr&ei=hyxFWPuPJMyL8QfigajQCw'
这是一个警告,或多或少意味着 css 解析器不喜欢给定的 css 规则之一。这并没有真正的影响,因为如果页面上的 js 代码要求样式的 html 元素,这可能会导致一些格式信息丢失。
第二个
webClient.waitForBackgroundJavaScriptStartingBefore(1000);
webClient.waitForBackgroundJavaScript(1000);
这两个调用不是配置选项——在给定的地方进行这些调用是没有意义的。
但两者都只是小事,你唯一真正的问题是
com.gargoylesoftware.htmlunit.ScriptException: Exception invoking constructor
异常。这通常会停止页面的 js 处理(但在您的情况下不会,因为您已将 ThrowExceptionOnScriptError 设置为 false)。
要分析问题,我们至少需要构造函数抛出的异常的堆栈跟踪。通常此堆栈跟踪也是日志的一部分。
但也许有更简单的解决方案。您尚未提供有关您正在使用的 HtmlUnit 版本的任何信息。我想建议使用最新可用的 SNAPSHOT 版本(您可以在 http://htmlunit.sourceforge.net/gettingLatestCode.html 找到一些如何获取的信息)。我用最新的 htmlunit 代码做了一个快速测试,并且能够毫无异常(exception)地获取页面。
public static void main(String[] args) throws IOException {
FileUtils.cleanDirectory(new File("c://temp/htmlunit/aa"));
WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setCssEnabled(true);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setTimeout(5000);
System.out.println("******************loaded**********************************");
try {
HtmlPage page = webClient.getPage("https://www.google.com");
webClient.waitForBackgroundJavaScriptStartingBefore(1000);
webClient.waitForBackgroundJavaScript(1000);
page.save(new File("c://temp/htmlunit/aa/index.html"));
} catch (Exception e) {
System.out.println("******************catch***********************************");
e.printStackTrace();
}
webClient.close();
System.out.println("******************finished********************************");
}
关于java - HtmlUnit css 未正确应用,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/40969472/
对于具有离线功能的智能手机应用程序,我正在为Xml文件创建单向文本同步。我希望我的服务器将增量/差异(例如GNU差异补丁)发送到目标设备。这是计划:Time=0Server:hasversion_1ofXmlfile(~800kiB)Client:hasversion_1ofXmlfile(~800kiB)Time=1Server:hasversion_1andversion_2ofXmlfile(each~800kiB)computesdeltaoftheseversions(=patch)(~10kiB)sendspatchtoClient(~10kiBtransferred)Cl
我构建了两个需要相互通信和发送文件的Rails应用程序。例如,一个Rails应用程序会发送请求以查看其他应用程序数据库中的表。然后另一个应用程序将呈现该表的json并将其发回。我还希望一个应用程序将存储在其公共(public)目录中的文本文件发送到另一个应用程序的公共(public)目录。我从来没有做过这样的事情,所以我什至不知道从哪里开始。任何帮助,将不胜感激。谢谢! 最佳答案 无论Rails是什么,几乎所有Web应用程序都有您的要求,大多数现代Web应用程序都需要相互通信。但是有一个小小的理解需要你坚持下去,网站不应直接访问彼此
我尝试运行2.x应用程序。我使用rvm并为此应用程序设置其他版本的ruby:$rvmuseree-1.8.7-head我尝试运行服务器,然后出现很多错误:$script/serverNOTE:Gem.source_indexisdeprecated,useSpecification.Itwillberemovedonorafter2011-11-01.Gem.source_indexcalledfrom/Users/serg/rails_projects_terminal/work_proj/spohelp/config/../vendor/rails/railties/lib/r
我正在查看instance_variable_set的文档并看到给出的示例代码是这样做的:obj.instance_variable_set(:@instnc_var,"valuefortheinstancevariable")然后允许您在类的任何实例方法中以@instnc_var的形式访问该变量。我想知道为什么在@instnc_var之前需要一个冒号:。冒号有什么作用? 最佳答案 我的第一直觉是告诉你不要使用instance_variable_set除非你真的知道你用它做什么。它本质上是一种元编程工具或绕过实例变量可见性的黑客攻击
刚入门rails,开始慢慢理解。有人可以解释或给我一些关于在application_controller中编码的好处或时间和原因的想法吗?有哪些用例。您如何为Rails应用程序使用应用程序Controller?我不想在那里放太多代码,因为据我了解,每个请求都会调用此Controller。这是真的? 最佳答案 ApplicationController实际上是您应用程序中的每个其他Controller都将从中继承的类(尽管这不是强制性的)。我同意不要用太多代码弄乱它并保持干净整洁的态度,尽管在某些情况下ApplicationContr
我真的很习惯使用Ruby编写以下代码:my_hash={}my_hash['test']=1Java中对应的数据结构是什么? 最佳答案 HashMapmap=newHashMap();map.put("test",1);我假设? 关于java-等价于Java中的RubyHash,我们在StackOverflow上找到一个类似的问题: https://stackoverflow.com/questions/22737685/
我是一个Rails初学者,但我想从我的RailsView(html.haml文件)中查看Ruby变量的内容。我试图在ruby中打印出变量(认为它会在终端中出现),但没有得到任何结果。有什么建议吗?我知道Rails调试器,但更喜欢使用inspect来打印我的变量。 最佳答案 您可以在View中使用puts方法将信息输出到服务器控制台。您应该能够在View中的任何位置使用Haml执行以下操作:-puts@my_variable.inspect 关于ruby-on-rails-如何在我的R
question的一些答案关于redirect_to让我想到了其他一些问题。基本上,我正在使用Rails2.1编写博客应用程序。我一直在尝试自己完成大部分工作(因为我对Rails有所了解),但在需要时会引用Internet上的教程和引用资料。我设法让一个简单的博客正常运行,然后我尝试添加评论。靠我自己,我设法让它进入了可以从script/console添加评论的阶段,但我无法让表单正常工作。我遵循的其中一个教程建议在帖子Controller中创建一个“评论”操作,以添加评论。我的问题是:这是“标准”方式吗?我的另一个问题的答案之一似乎暗示应该有一个CommentsController参
我喜欢使用Textile或Markdown为我的项目编写自述文件,但是当我生成RDoc时,自述文件被解释为RDoc并且看起来非常糟糕。有没有办法让RDoc通过RedCloth或BlueCloth而不是它自己的格式化程序运行文件?它可以配置为自动检测文件后缀的格式吗?(例如README.textile通过RedCloth运行,但README.mdown通过BlueCloth运行) 最佳答案 使用YARD直接代替RDoc将允许您包含Textile或Markdown文件,只要它们的文件后缀是合理的。我经常使用类似于以下Rake任务的东西:
我一直致力于让我们的Rails2.3.8应用程序在JRuby下正确运行。一切正常,直到我启用config.threadsafe!以实现JRuby提供的并发性。这导致lib/中的模块和类不再自动加载。使用config.threadsafe!启用:$rubyscript/runner-eproduction'pSim::Sim200Provisioner'/Users/amchale/.rvm/gems/jruby-1.5.1@web-services/gems/activesupport-2.3.8/lib/active_support/dependencies.rb:105:in`co