From: kabum <uu.kabum <at> gmail.com>
Subject: Re: [OSM-dev] Google Summer of Code
Newsgroups: gmane.comp.gis.openstreetmap.devel
Date: Thursday 5th April 2012 10:34:58 UTC
On 3 April 2012 20:02, Paul Norman wrote:

> The problem with detecting when changesets are closed is that there is no
> way to determine exactly when they are closed short of an API query. You
> can fake it by assuming changesets are closed an hour after the last change
> to them and 24 hours after the first change to them.
>

Open: (http://www.openstreetmap.org/api/0.6/changeset/11187430)

Closed: (http://www.openstreetmap.org/api/0.6/changeset/11167430)
Or have I missed something?
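For reference, here is a minimal Python sketch of that API check. The OSM API 0.6 response for `/api/0.6/changeset/<id>` is an `<osm>` document whose `<changeset>` element carries an `open` attribute set to `"true"` or `"false"`. The function names are illustrative, and the HTTPS base URL is an assumption (the same host as the URLs above). Each lookup is still one API request per changeset, which is the cost Paul mentions.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: same API host as the URLs quoted above, over HTTPS.
API_BASE = "https://www.openstreetmap.org/api/0.6"


def changeset_is_open(xml_text: str) -> bool:
    """Return True if the <changeset> element of an API 0.6 response is open."""
    root = ET.fromstring(xml_text)           # the root element is <osm>
    changeset = root.find("changeset")
    if changeset is None:
        raise ValueError("no <changeset> element in response")
    return changeset.get("open") == "true"   # the API writes "true" or "false"


def fetch_changeset_xml(changeset_id: int) -> str:
    """Download the changeset metadata; this is one API request per call."""
    url = f"{API_BASE}/changeset/{changeset_id}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")
```

Usage would be `changeset_is_open(fetch_changeset_xml(11187430))`. Parsing is kept separate from fetching so the parsing step can be tested without network access.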



> It is better to detect problems when they occur, not up to 24 hours after
> they’ve occurred.
>

That's correct. Good practice would be to code it as abstractly as possible,
so that it only parses the modify/delete/create sets and ignores their origin
(minutely/hourly diff or changeset).
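Here is a minimal Python sketch of that source-agnostic idea. Replication diffs (minutely or hourly) and the changeset download (`/api/0.6/changeset/<id>/download`) all use the osmChange format, in which `<create>`, `<modify>` and `<delete>` blocks wrap `<node>`, `<way>` and `<relation>` elements. The `Change` record and the `parse_osmchange` name are hypothetical. Gzipped diff files would need `gzip.decompress` before parsing.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, Iterator

ACTIONS = ("create", "modify", "delete")


@dataclass
class Change:
    action: str   # "create", "modify" or "delete"
    kind: str     # "node", "way" or "relation"
    id: int
    version: int
    changeset: int
    tags: Dict[str, str] = field(default_factory=dict)


def parse_osmchange(xml_text: str) -> Iterator[Change]:
    """Yield one Change per element, regardless of what produced the document."""
    root = ET.fromstring(xml_text)  # <osmChange>
    for block in root:
        if block.tag not in ACTIONS:
            continue  # skip anything that is not an action block
        for elem in block:
            yield Change(
                action=block.tag,
                kind=elem.tag,
                id=int(elem.get("id")),
                version=int(elem.get("version")),
                changeset=int(elem.get("changeset")),
                tags={t.get("k"): t.get("v") for t in elem.findall("tag")},
            )
```

Because the detector only ever sees `Change` records, the same rating code works whether the input came from a diff or from a single changeset.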

I'll try to take this into account in my proposal.

Thanks for all of your ideas! It's time to finish my proposal :)

Regards,
Morris



> *From:* kabum
> *Sent:* Tuesday, April 03, 2012 2:20 AM
> *To:* Derick Rethans
> *Cc:* OpenStreetMap dev list
> *Subject:* Re: [OSM-dev] Google Summer of Code
>
> Hi,
>
> On 2 April 2012 22:20, Paul Norman wrote:
>
> A tool that operates on the changeset level is
> https://github.com/pnorman/osm-weirdness
>
> It detects changesets that have a high probability of being an import or
> mechanical edit. The detection is pretty crude, but it does find a fair
> number of undocumented imports, mechanical edits, and other weirdness. If
> you point it at an old state.txt file, it will start in the past and work
> up to the present.
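Here is a Python sketch of what "start in the past and work up to the present" involves. It assumes the usual replication layout: `state.txt` is a Java-properties-style file with `sequenceNumber` and `timestamp` keys (colons escaped as `\:`), and sequence number N is stored under a nine-digit path such as `000/123/456.osc.gz`. This is not the osm-weirdness code. The function names are hypothetical, and the base URL to prepend is deliberately left out (see the Planet.osm/diffs wiki page).

```python
from typing import Dict, Iterator


def parse_state(text: str) -> Dict[str, str]:
    """Parse a replication state.txt (Java properties style) into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment/date header
        key, _, value = line.partition("=")
        state[key.strip()] = value.strip().replace("\\:", ":")  # unescape "\:"
    return state


def diff_path(sequence: int) -> str:
    """Map a sequence number to its relative path, e.g. 000/123/456.osc.gz."""
    digits = f"{sequence:09d}"
    return f"{digits[0:3]}/{digits[3:6]}/{digits[6:9]}.osc.gz"


def diffs_to_fetch(old_state: str, current_state: str) -> Iterator[str]:
    """Yield the relative path of every diff after the old state, oldest first."""
    start = int(parse_state(old_state)["sequenceNumber"]) + 1
    end = int(parse_state(current_state)["sequenceNumber"])
    for seq in range(start, end + 1):
        yield diff_path(seq)
```

The old `state.txt` supplies the starting sequence number, and the server's current one supplies the end of the range.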
>
> I'll have a look at your script later today.
>
> When working with the minutely diffs there are some limitations:
>
> Limited knowledge of changesets. In practice, if you start your detection
> an hour in the past you can have a list of all open changesets, but it is
> not possible to know the tags of the changesets.
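Here is a hypothetical Python tracker for that list of open changesets. It combines the list with Paul's heuristic from the start of the thread: treat a changeset as closed an hour after its last change or 24 hours after its first. The class name and the choice to forget closed changesets are assumptions. The result is only a guess, because a user can close a changeset early and only an API query reveals that.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

IDLE_TIMEOUT = timedelta(hours=1)   # closed an hour after the last change...
MAX_LIFETIME = timedelta(hours=24)  # ...or 24 hours after the first change


class ChangesetTracker:
    """Track changesets seen in diffs and guess which ones are still open."""

    def __init__(self) -> None:
        # changeset id -> (time of first change seen, time of last change seen)
        self.seen: Dict[int, Tuple[datetime, datetime]] = {}

    def observe(self, changeset_id: int, timestamp: datetime) -> None:
        first, last = self.seen.get(changeset_id, (timestamp, timestamp))
        self.seen[changeset_id] = (min(first, timestamp), max(last, timestamp))

    def probably_closed(self, now: datetime) -> List[int]:
        """Return changesets the heuristic considers closed, and stop tracking them."""
        closed = [
            cs for cs, (first, last) in self.seen.items()
            if now - last >= IDLE_TIMEOUT or now - first >= MAX_LIFETIME
        ]
        for cs in closed:
            del self.seen[cs]
        return closed
```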
>
> No knowledge of the previous state of objects. You know where deleted
> objects were, but you can’t tell how far an object has moved or what its
> tags were before. To tell this you need to query a service with a full
> history DB, and handling full history files is difficult.
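Once the previous version of a node is available, comparing it with the new version is straightforward. This sketch assumes that old version comes from some full-history service, which is not shown and left open. It uses the haversine formula with a mean Earth radius, which is accurate enough for spotting large moves. The function names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def tag_diff(old: Dict[str, str],
             new: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (old_value, new_value)} for changed keys; None means absent."""
    return {
        k: (old.get(k), new.get(k))
        for k in old.keys() | new.keys()
        if old.get(k) != new.get(k)
    }
```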
>
> No knowledge of way geometry if using existing nodes. Iandees’
> https://github.com/pnorman/osm-weirdness/tree/way_check solves this by
> fetching, from jxapi, the nodes of a way that aren’t also in the
> changeset, and it can then detect bad geometry (e.g. ways that trace over
> themselves).
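Once every node position of a way is known, from the changeset or fetched as described above, one simple reading of "ways that trace over themselves" is a way that uses the same segment twice. This Python sketch illustrates that assumed definition; it is not the code on the way_check branch. It compares coordinates exactly, which works when the repeated segment reuses the same nodes.

```python
from typing import FrozenSet, List, Set, Tuple

Coord = Tuple[float, float]  # (lat, lon) of a node


def retraced_segments(coords: List[Coord]) -> Set[FrozenSet[Coord]]:
    """Return the segments (unordered point pairs) a way uses more than once."""
    seen: Set[FrozenSet[Coord]] = set()
    repeated: Set[FrozenSet[Coord]] = set()
    for a, b in zip(coords, coords[1:]):
        if a == b:
            continue  # zero-length segment: a different kind of error
        seg = frozenset((a, b))  # A-B and B-A count as the same segment
        if seg in seen:
            repeated.add(seg)
        seen.add(seg)
    return repeated
```

A closed way such as A-B-C-A is not flagged, while A-B-C-B (doubling back along B-C) is.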
>
> If you were to code a vandalism detection tool, I think it should work on
> the minutely replication diffs
> (http://wiki.openstreetmap.org/wiki/Planet.osm/diffs).
>
> I thought about analysing the data after the changeset is closed, but
> these diffs also sound good. I will look into this approach :) Thanks!
>
> On 3 April 2012 09:38, Derick Rethans wrote:
>
> On Mon, 2 Apr 2012, kabum wrote:
>
> > Result:
> > - each changeset has a total rating -> use a threshold value to divide them
> > into suspicious and not suspicious
>
> Instead of just using static thresholds, I think that something like SVM
> (http://en.wikipedia.org/wiki/Support_vector_machine) might be highly
> beneficial here; and it's another cool technology to play with. There is
> a cool library for this (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) and
> I know there is at least an extension to use it from PHP:
> http://phpir.com/support-vector-machines-in-php
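A static threshold flags a changeset when its summed rating exceeds a fixed value, whereas an SVM learns weights and an offset from labelled examples. libsvm, as suggested, is the practical choice. To keep this sketch self-contained, it instead trains a linear SVM with a Pegasos-style stochastic subgradient method (hinge loss plus an L2 penalty), visiting samples in a fixed order instead of at random. The two features and the six training points are invented toy values, not real changeset data.

```python
from typing import List, Sequence


def train_linear_svm(samples: Sequence[Sequence[float]], labels: Sequence[int],
                     lam: float = 0.01, epochs: int = 1000) -> List[float]:
    """Pegasos-style training: minimise lam/2 * |w|^2 + mean hinge loss.

    Labels must be +1 (suspicious) or -1 (not suspicious). A constant 1.0
    feature is appended so the last weight acts as a (regularised) bias.
    """
    w = [0.0] * (len(samples[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):  # fixed order keeps runs deterministic
            t += 1
            eta = 1.0 / (lam * t)          # Pegasos step size
            xb = list(x) + [1.0]
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regulariser (shrink) step
            if margin < 1.0:               # hinge loss active for this sample
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (suspicious) or -1 from the sign of the linear score."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy, made-up features per changeset, e.g. (share of deletions, share of untagged ways).
X = [[0.9, 0.8], [0.8, 0.9], [0.95, 0.7], [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
```

`predict(w, features)` then classifies a new changeset. With libsvm, a trained model plays this role, and non-linear kernels also become available.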
>
> Thanks for this suggestion ... it seems very suitable for our use case.
>
> I already have some years of experience with PHP, but I wouldn't choose it
> for this part of the project. I thought about Python (libsvm has native
> Python bindings ;))
>
> > Some questions came up during this preparation:
> > - Is there a preferred language? Does it have to be specified in the
> > proposal? (Language skill has to be rated, so I would decide this during
> > the project phase.)
>
> Not really any preferred language. What did you have in mind? For the
> front end I was thinking PHP, but for the engine, I wouldn't know. I think
> something high-performance (so C or C++) might be beneficial.
>
> My thinking was that Python is easy to set up, and it's easy to call from
> a terminal or to include in other Python scripts (e.g. a web frontend).
>
> If C++ is necessary because of its speed, then I think I could master it.
> Last semester I took part in a practical software engineering course at
> university (in a team of five fellow students), where we made extensive
> use of C++ (https://github.com/brainafk/Empire).
>
> > - I would also like to discuss which libraries and frameworks to use
> > during the project phase, or should I decide this in my proposal too?
> > - Should the frontend be integrated into the current website (Ruby on
> > Rails project), or should this just be an optional feature?
>
> I think it can easily live as its own website.
>
> OK :)
>
> > - How detailed should the proposal be? Is this draft enough?
>
> That's a tricky one; the more information you provide the better, I
> think, as it shows you have thought about it :-)
>
> I think it will grow a lot through this discussion, and I'll try to be as
> detailed as possible. :)
>
> Thanks for the response :)
>
> Regards,
> Morris