1 Jun 2011 20:43
Re: A sudden msg of "java.io.IOException: Server not running, aborting"
Oh I'm so dumb, I didn't see that this is coming from _another_ region server. What happens when splitting is that we try to update the .META. table which requires talking to another region server, except that in this case the RS was aborting meaning that the split transaction had to be rolled back. Looking at the logs of the RS that carried .META. at that time would give you more clues into what really happened. J-D On Tue, May 31, 2011 at 5:24 PM, bijieshan <bijieshan@...> wrote: > I paste the logs as following. To short of the logs, some similar and closer logs I deleted: > >> 2011-05-25 09:15:44,232 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running rollback of failed split of ufdr,0065286138106876#4228000,1306260358978.37875b35a870957da534ad29fd2944d5.; java.io.IOException: Server not running, aborting >> at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2352) >> at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1653) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >> at java.lang.reflect.Method.invoke(Method.java:597) >> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) >> at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) >> >>>>[description] >> Before that time, everything seemed ok(it received so many open-region requests, and handled them), except several Exception happened during the time. >> >>>>[logs] >> 2011-05-25 09:15:36,214 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: ufdr,110018,1305617626846.f22eafaf06c3be7f7e5a447cec40c7fa. >> 2011-05-25 09:15:36,214 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: ufdr,111336,1305617636659.b47f1cf09b10aecaee6805a985646b30. >> 2011-05-25 09:15:36,215 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: ufdr,280884,1305617952747.7cac8ccc9c9770ef0e48807e07bc8bec. >> 2011-05-25 09:15:36,215 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: ufdr,280446,1305617949439.b332cfff565d77b2e2d4f1a2c52494b0. >> 2011-05-25 09:15:36,216 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: ufdr,101974,1305617623149.b60c71565f2c73b81477fe25ba80378b. >> 2011-05-25 09:15:36,413 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x32ffbc7e6e00321 Attempting to transition node 11dc72d94c7a5a3d19b0c0c3c49624a5 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING >> 2011-05-25 09:15:36,423 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x32ffbc7e6e00321 Successfully transitioned node 11dc72d94c7a5a3d19b0c0c3c49624a5 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING >> 2011-05-25 09:15:37,645 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x32ffbc7e6e00321 Attempting to transition node c4c657deda864fa2018b38388ee928e6 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING >> 2011-05-25 09:15:37,653 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x32ffbc7e6e00321 Attempting to transition node d2546cde1d6546133af74f25252b652c from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING >> 2011-05-25 09:15:37,660 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x32ffbc7e6e00321 Successfully transitioned node c4c657deda864fa2018b38388ee928e6 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING >> 2011-05-25 09:15:37,662 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x32ffbc7e6e00321 Attempting to transition node bd07db88de68c1022c5bba9e5471c9f9 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING >> 2011-05-25 09:15:37,668 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x32ffbc7e6e00321 Successfully transitioned node d2546cde1d6546133af74f25252b652c from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING >> >>>>[description] >> From the code, I saw that , only the Server has aborted, the msg of "Server not running, aborting" could be shown. It determined by the param of HRegionServer#abortRequested, if it was true, and if somewhere called the method of HRegionServer#checkOpen, this msg would been printed. >> >> /** >> * Called to verify that this server is up and running. >> * >> * <at> throws IOException >> */ >> protected void checkOpen() throws IOException { >> if (this.stopped || this.abortRequested) { >> throw new IOException("Server not running" >> + (this.abortRequested ? ", aborting" : "")); >> } >> if (!fsOk) { >> throw new IOException("File system not available"); >> } >> } >> >> For I said there's no aborted info been found in Logs: >> >> public void abort(String reason, Throwable cause) { >> if (cause != null) { >> LOG.fatal("ABORTING region server " + this + ": " + reason, cause); >> } else { >> LOG.fatal("ABORTING region server " + this + ": " + reason); >> } >> this.abortRequested = true; >> this.reservedSpace.clear(); >> if (this.metrics != null) { >> LOG.info("Dump of metrics: " + this.metrics); >> } >> stop(reason); >> } >> >> For there's no prev-aborting found in the logs, how did the sudden msg of " java.io.IOException: Server not running, aborting " occurred? >> Thanks! >> >> Jieshan Bean. >> >> >> >> >> >
RSS Feed