So I'm struggling with what I hoped would be a fairly straightforward select query and it's causing me a massive headache... anyone feel like they could be of a little assistance to me?
Cheers, I'll post it up here then. Using postgis to do this. I've got 3 tables of data: person (p), postcode (po) and roads (r). postcode and roads are point and line features from shapefiles so contain spatial information. person also contains postcode data so I can use a join on that. The following is my query which returns the information I want but I need to have it so that it orders by dist_m in ascending order. Code: SELECT DISTINCT ON (p.caseid) p.caseid, p.surname, p.postcode, r.dftnumber AS closest_a_road, ROUND((ST_Distance(r.geom, po.geom))::NUMERIC, 1) AS distance_m FROM env.roads r, env.postcode po INNER JOIN env.person p ON po.postcode = p.postcode WHERE r.classifica = 'A Road' ORDER BY p.caseid, ST_Distance(r.geom, po.geom); This is linking each record in the person table to the nearest A road from the roads table. If I don't use DISTINCT on the ID for each person it will return a record for each person to every A road, whereas I only want the closest A road.. so the shortest distance in the column distance_m. However using DISTINCT ON p.caseid means I can't order by the distance... any suggestions?
Its a little hard to tell without looking at the full schema and running some test queries, but would doing something like this get you the details you need? Code: SELECTp.caseid, MIN(ROUND((ST_Distance(r.geom, po.geom))::NUMERIC, 1)) FROM FROM env.roads r, env.postcode po INNER JOIN env.person p ON po.postcode = p.postcodep GROUP BY p.caseid
It's an awkward one, it should only return 11 results from the current data I have which it does from the original query. It's just the ordering by distance that doesn't play along and I can't seem to get the MIN aggregater to work. I could provide a backup of the pgadmin database plus query if you fancied a proper look at the data? Cheers for the response
Just replace your order by statement "ORDER BY p.caseid, ST_Distance(r.geom, po.geom)" with "order by 1, 5". which means order by the 1st column then the 5th. You can do this instead of using the column names.
Unfortunately that still orders the query by p.caseid first. I'm hoping once I've got the results (postcode for each caseid (11 results) and distance to closest A road only) to order the query only by the distance_m column (column 5).
Sorry, I thought that the query was failing to execute completely because of the order by statement. I think it may not be possible to do what you want with the query structured in its current form. If you look at the stack overflow post below you can see someone with the same problem as you. http://stackoverflow.com/questions/9795660/postgresql-distinct-on-with-different-order-by The "distinct on" operator requires that you order by what ever is used with "distinct on" (actually the left most term of what is used with distinct on), so if you use "distinct on" you will have to order by p.caseid, which is obviously not what you want. You are going to have to bust out the sub queries to get this one done I think. Is there any relationship between the roads table and the postcode table, do they have any columns in common?
I have almost 0 experiance with postgress sql but in MSSQL you can use partition by to achive your requirements, it may be that postgress has this functionality as well?
No there's no relationship between postcodes and road, they're pretty much just the tables from a shapefile .dbf, they both have geometry which is where the spatial part of the query comes in. That's an interesting post though, I'll have a crack at a sub query when I get back and see how that goes. Appreciate the help, I'm determined to solve it!
I have no idea really, I'm pretty new to sql in general. Normally use arcgis, but I'm using qgis for this in particular so it's all part of the package. Don't think I've come across partitioning before?
I think I've solved it. Albeit there's probably a more elegant solution but the following seems to give me the right answer. Code: SELECT * FROM ( SELECT DISTINCT ON (p.caseid) p.caseid, p.postcode, r.dftnumber AS "closestARoad", ROUND((ST_Distance(r.geom, po.geom))::NUMERIC, 1) AS distance_m FROM env.roads r, env.postcode po INNER JOIN env.person p ON po.postcode = p.postcode WHERE r.classifica = 'A Road' ORDER BY p.caseid, ST_Distance(r.geom, po.geom)) AS t ORDER BY t.distance_m; I've got a few more to do though so I might pop back in here in the near future to pick your brains again Really appreciate the help!
Well it aint pretty and its not exactly the sub-query I had in mind, but you can't argue with results. I think your table structures might not be the best, which could be part of the reason you are having trouble with the queries. For example (and without knowing the full context of the tables and what you are doing) surely each geometric point on a road would have an associated postcode. There's a whole process that goes into creating a set of tables that works well in a relational database, its more than just grouping like things together. If you run the ST_Distance query on its own Code: SELECT (ROUND((ST_Distance(r.geom, po.geom))::NUMERIC, 1) AS distance_m FROM env.roads r, env.postcode po WHERE r.classifica = 'A Road'; Are you getting more results back than you expect?