Tiago de Paula Peixoto | 20 Sep 2011 21:09

Re: Fast method for duplicated edge removal

Hi Giuseppe,

On 09/20/2011 04:36 PM, Giuseppe Profiti wrote:
> Hello,
> I'm using a csv reader to parse a very big file and create an
> undirected graph accordingly.
> The file can contain duplicated edges (i.e. A B in one row, B A in
> another one), so I'm checking
>     if g.edge(v1, v2) is None:
>          e = g.add_edge(v1, v2)

> in order to discard them (v1 and v2 are vertices created from what
> it's read from the file).
> However, the graph contains a lot of edges (a few million) and vertices
> (many thousands), with a potentially high degree for the vertices, and
> it takes a long time to process the data.
> As far as I read in the source code, the Graph.edge() method checks
> all the outgoing edges of the source node, but even if I check which
> node has the highest degree, it takes a lot of time to build the
> graph.
> 
> Is there any other way to remove the duplicated edges? Maybe an edge
> filter based on some lambda wizardry?

The library includes a 'remove_parallel_edges()' function, which does
what you want and is fast. You can add all the edges without checking
and call it once after the graph is built.
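
If you would rather not create the duplicates in the first place, a
different approach (not a graph-tool feature, just plain Python) is to
remember the pairs you have already seen in a set, so that each check
is a hash lookup instead of a scan over a vertex's edges. A minimal
sketch, assuming the file has the two endpoint names in the first two
columns; the helper name unique_undirected_edges and the sample data
are made up for illustration:

```python
import csv
import io


def unique_undirected_edges(rows):
    """Yield each undirected edge once, treating (A, B) and (B, A) as equal."""
    seen = set()
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        a, b = row[0], row[1]
        # Normalise the pair so that the endpoint order does not matter.
        key = (a, b) if a <= b else (b, a)
        if key not in seen:
            seen.add(key)
            yield a, b


# Hypothetical sample standing in for the real file.
sample = io.StringIO("A,B\nB,A\nA,C\nC,A\nB,C\n")
edges = list(unique_undirected_edges(csv.reader(sample)))
print(edges)
```

Each pair that comes out of the generator can then be passed to
g.add_edge() without calling g.edge() first.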

If you only want to mask the parallel edges temporarily, you can use
filtering instead:

   l = label_parallel_edges(g)
   g = GraphView(g, efilt=lambda e: l[e] == 0)
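
To make the idea behind the filter concrete, here is a plain-Python
sketch of the labelling on an undirected edge list. It assumes that
the first edge between a given pair of vertices gets label 0 and
later copies get 1, 2, and so on; please check the documentation of
label_parallel_edges() for the exact behaviour. The function
label_parallel below is a made-up stand-in, not the library routine:

```python
from collections import defaultdict


def label_parallel(edge_list):
    """Return one label per edge: 0 for the first edge between a pair
    of vertices, then 1, 2, ... for each further copy (assumed scheme)."""
    counts = defaultdict(int)
    labels = []
    for u, v in edge_list:
        key = frozenset((u, v))  # undirected: (u, v) and (v, u) match
        labels.append(counts[key])
        counts[key] += 1
    return labels


edge_list = [(0, 1), (1, 0), (0, 2), (0, 1), (2, 2)]
labels = label_parallel(edge_list)
# Keeping only label 0 mirrors the efilt condition l[e] == 0.
kept = [e for e, lab in zip(edge_list, labels) if lab == 0]
print(labels, kept)
```

The filtered view works the same way: the duplicates stay in the
underlying graph and are only hidden.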

Cheers,
Tiago

-- 
Tiago de Paula Peixoto <tiago <at> skewed.de>

_______________________________________________
graph-tool mailing list
graph-tool <at> skewed.de
http://lists.skewed.de/mailman/listinfo/graph-tool
