GeoPySpark中的地图代数

GeoPySpark中的地图代数

给定一组栅格图层,可能需要组合和过滤这些图层的内容。这是映射代数的函数。GeoPySpark提供了两类地图代数操作:localfocal操作。local操作考虑一个或多个栅格的像素或单元格,将函数应用于相应的单元格值。例如,添加两个栅格的像素值来形成一个新的图层是。

focal操作考虑一个输入栅格的每个像素周围的区域,并对每个区域应用操作。该操作的结果存储在输出栅格的相应像素中。例如,可以根据2d高斯分布对以像素为中心的5x5区域进行加权,以影响输入入栅格的模糊。有人可能会认为这相当于一个二维卷积运算。

注意: 映射代数操作只能在TiledRasterLayers上工作,如果一个local操作需要多个输入,那么这些输入必须具有相同的布局和投影。

在开始之前,本指南中的所有示例都需要导入一下包:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
import geopyspark as gps
import numpy as np

from pyspark import SparkContext
from shapely.geometry import Point, MultiPolygon, LineString, box

conf = gps.geopyspark_conf(master="local[*]", appName="map-algebra")
pysc = SparkContext(conf=conf)

# Setting up the data

cells = np.array([[[3, 4, 1, 1, 1],
[7, 4, 0, 1, 0],
[3, 3, 7, 7, 1],
[0, 7, 2, 0, 0],
[6, 6, 6, 5, 5]]], dtype='int32')

extent = gps.ProjectedExtent(extent = gps.Extent(0, 0, 5, 5), epsg=4326)

layer = [(extent, gps.Tile.from_numpy_array(numpy_array=cells))]

rdd = pysc.parallelize(layer)
raster_layer = gps.RasterLayer.from_numpy_rdd(gps.LayerType.SPATIAL, rdd)
tiled_layer = raster_layer.tile_to_layout(layout=gps.LocalLayout(tile_size=5))

Local Operations

TiledRasterLayers上的Local操作可以使用intfloat或其他TiledRasterLayers+-*/**abs都是当前支持的Local操作。

1
2
3
4
5
6
7
8
9
(tiled_layer + 1)

(2 - (tiled_layer * 3))

((tiled_layer + tiled_layer) / (tiled_layer + 1))

abs(tiled_layer)

2 ** tiled_layer

Pyramid也可以用于Local操作。可以在Pyramid的本地操作中使用的类型有:intfloatTiledRasterLayer和其他的Pyramid

注意: 与使用TiledRasterLayer一样,在多个PyramidTiledRasterLayer上执行计算意味着它们必须具有相同的布局和投影。

1
2
3
4
5
6
# Creating out Pyramid
pyramid = tiled_layer.pyramid()

pyramid + 1

(pyramid - tiled_layer) * 2

Focal Operations

在GeoPySpark中,Focal操作是通过对图层中每个Tile的邻域执行给定操作来执行的。可以从Neighborhood枚举类中选择要使用的邻域。同样,可以从operation枚举类中选择一个操作类型。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# This creates an instance of Square with an extent of 1. This means that
# each operation will be performed on a 3x3
# neighborhood.

'''
A square neighborhood with an extent of 1.
o = source cell
x = cells that fall within the neighbhorhood

x x x
x o x
x x x
'''

square = gps.Square(extent=1)

Mean

1
tiled_layer.focal(operation=gps.Operation.MEAN, neighborhood=square)

Median

1
tiled_layer.focal(operation=gps.Operation.MEDIAN, neighborhood=square)

Mode

1
tiled_layer.focal(operation=gps.Operation.MODE, neighborhood=square)

Sum

1
tiled_layer.focal(operation=gps.Operation.SUM, neighborhood=square)

Standard Diviation

1
tiled_layer.focal(operation=gps.Operation.STANDARD_DEVIATION, neighborhood=square)

Min

1
tiled_layer.focal(operation=gps.Operation.MIN, neighborhood=square)

Max

1
tiled_layer.focal(operation=gps.Operation.MAX, neighborhood=square)

Slope

1
tiled_layer.focal(operation=gps.Operation.SLOPE, neighborhood=square)

Aspect

1
tiled_layer.focal(operation=gps.Operation.ASPECT, neighborhood=square)

各种各样的栅格操作

还有其他方法可以从栅格中提取信息并创建需要显示的栅格。这些是polygonal summaries, cost distance, and rasterization

Polygonal Summary Methods

除了lcoal和focal操作外,还可以在TiledRasterLayer上执行多边形摘要。这些是在与给定几何图形和层相交的区域中执行的操作。

注意: 重要的是,给定的几何图形在同一投影层。如果不是,那么要么返回错误的结果,要么只返回部分结果。

1
tiled_layer.layer_metadata

Polygonal Min
1
2
poly_min = box(0.0, 0.0, 1.0, 1.0)
tiled_layer.polygonal_min(geometry=poly_min, data_type=int)
Polygonal Max
1
2
poly_max = box(1.0, 0.0, 2.0, 2.5)
tiled_layer.polygonal_min(geometry=poly_max, data_type=int)
Polygonal Sum
1
2
poly_sum = box(0.0, 0.0, 1.0, 1.0)
tiled_layer.polygonal_min(geometry=poly_sum, data_type=int)
Polygonal Mean
1
2
poly_max = box(1.0, 0.0, 2.0, 2.0)
tiled_layer.polygonal_min(geometry=poly_max, data_type=int)

Cost Distance

cost_distance() 是一种迭代方法,用于逼近从栅格单元到给定几何图形的加权距离。cost_distance函数包含一个几何图形和一个“范围图层”,它本质上描述了遍历每个栅格单元的困难程度。落在几何形状中的单元格的最终成本为零,而包含noData值的范围单元格将对应于最终结果中的noData值。所有其他单元格都有一个值,该值描述从该单元格到几何图形的最小遍历成本。如果范围图层是均匀的,这个函数近似于欧几里得距离,取某个标量值的模。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
cost_distance_cells = np.array([[[1.0, 1.0, 1.0, 1.0, 1.0],
[1.0, 1.0, 1.0, 1.0, 1.0],
[1.0, 1.0, 1.0, 1.0, 1.0],
[1.0, 1.0, 1.0, 1.0, 1.0],
[1.0, 1.0, 1.0, 1.0, 0.0]]])

tile = gps.Tile.from_numpy_array(numpy_array=cost_distance_cells, no_data_value=-1.0)
cost_distance_extent = gps.ProjectedExtent(extent=gps.Extent(xmin=0.0, ymin=0.0, xmax=5.0, ymax=5.0), epsg=4326)
cost_distance_layer = [(cost_distance_extent, tile)]

cost_distance_rdd = pysc.parallelize(cost_distance_layer)
cost_distance_raster_layer = gps.RasterLayer.from_numpy_rdd(gps.LayerType.SPATIAL, cost_distance_rdd)
cost_distance_tiled_layer = cost_distance_raster_layer.tile_to_layout(layout=gps.LocalLayout(tile_size=5))

gps.cost_distance(friction_layer=cost_distance_tiled_layer, geometries=[Point(0.0, 5.0)], max_distance=144000.0)

Rasterization

最好将矢量数据转换成栅格图层。因为我们提供了rasterize()函数,该函数确定每个向量元素所覆盖的像素值集,并将提供的值分配给目标栅格中的像素集。例如,如果有一组代表美国各县的多边形,以及每个县的收入中值,就可以制作一个栅格来表示这些数据。

GeoPySpark的rasterize功能可以采取[shapely.geometry], (shapely.geometry),或PythonRDD[shaply .geometry]。这些几何图形将转换为栅格,然后平铺到给定的布局,然后返回一个包含这些值的TiledRasterLayer

Rasterize MultiPolygons

1
2
3
4
5
raster_poly_1 = box(0.0, 0.0, 5.0, 10.0)
raster_poly_2 = box(3.0, 6.0, 15.0, 20.0)
raster_poly_3 = box(13.5, 17.0, 30.0, 20.0)

raster_multi_poly = MultiPolygon([raster_poly_1, raster_poly_2, raster_poly_3])
1
2
# Creates a TiledRasterLayer with a CRS of EPSG:4326 at zoom level 5.
gps.rasterize(geoms=[raster_multi_poly], crs=4326, zoom=5, fill_value=1)

Rasterize a PythonRDD of Polygons

1
2
3
4
poly_rdd = pysc.parallelize([raster_poly_1, raster_poly_2, raster_poly_3])

# Creates a TiledRasterLayer with a CRS of EPSG:3857 at zoom level 5.
gps.rasterize(geoms=poly_rdd, crs=3857, zoom=3, fill_value=10)

Rasterize LineStrings

1
2
3
line_1 = LineString(((0.0, 0.0), (0.0, 5.0)))
line_2 = LineString(((7.0, 5.0), (9.0, 12.0), (12.5, 15.0)))
line_3 = LineString(((12.0, 13.0), (14.5, 20.0)))
1
2
# Creates a TiledRasterLayer whose cells have a data type of int16.
gps.rasterize(geoms=[line_1, line_2, line_3], crs=4326, zoom=3, fill_value=2, cell_type=gps.CellType.INT16)

Rasterize Polygons and LineStrings

1
2
# Creates a TiledRasterLayer from both LineStrings and MultiPolygons
gps.rasterize(geoms=[line_1, line_2, line_3, raster_multi_poly], crs=4326, zoom=5, fill_value=2)