从 shapefile 加载 ROI

本示例展示了如何将 shapefile ( *.shp ) 作为 easyidp.ROI 对象打开。

荷花池

包和数据准备

使用以下代码加载 easyidp 包和示例数据集

[1]:
import easyidp as idp

test_data = idp.data.TestData()

如果您是第一次运行,它将自动从 Google Drive 下载大约 400MB 的数据集,请参考 Data 了解更多详情。

这里是本文件中使用的三个示例 shapefiles:

  • test_data.shp.lotus_shp

  • test_data.shp.utm53n_shp

  • test_data.shp.complex_shp

每个变量提供 *.shp 文件的路径:

[2]:
test_data.shp.lotus_shp
[2]:
PosixPath('/Users/hwang/Library/Application Support/easyidp.data/data_for_tests/shp_test/lotus_plots.shp')

处理 ESPG:4326(经度、纬度)

lotus_shp 使用 EPGS:4326 作为地理投影坐标

[3]:
lotus_roi = idp.ROI(test_data.shp.lotus_shp, name_field=0)
[shp][proj] Use projection [WGS 84] for loaded shapefile [lotus_plots.shp]
Read shapefile [lotus_plots.shp]: 100%|██████████| 112/112 [00:00<00:00, 1740.90it/s]

使用此方法检查该 shapefile 的 CRS(地理投影坐标):

[4]:
lotus_roi.crs
[4]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

我们可以通过以下方式检查地块多边形边界值:

[5]:
lotus_roi
[5]:
<easyidp.ROI> with 112 items
[0]     N1W1
array([[139.54052962,  35.73475194],
       [139.54055106,  35.73475596],
       [139.54055592,  35.73473843],
       [139.54053438,  35.73473446],
       [139.54052962,  35.73475194]])
[1]     N1W2
array([[139.54053488,  35.73473289],
       [139.54055632,  35.73473691],
       [139.54056118,  35.73471937],
       [139.54053963,  35.73471541],
       [139.54053488,  35.73473289]])
...
[110]   S4E6
array([[139.54090456,  35.73453742],
       [139.540926  ,  35.73454144],
       [139.54093086,  35.7345239 ],
       [139.54090932,  35.73451994],
       [139.54090456,  35.73453742]])
[111]   S4E7
array([[139.54090986,  35.73451856],
       [139.54093129,  35.73452258],
       [139.54093616,  35.73450504],
       [139.54091461,  35.73450107],
       [139.54090986,  35.73451856]])

注意

在 easyidp 包中,ROI 的顺序是 (经度,纬度), 而对于一些其他包如 pyproj, shapely, 可能使用 (纬度,经度) 顺序,在包之间转换时请注意这一点

处理 UTM/Zone 地理坐标

utm53_shp 使用另一种地理坐标,而不是 (经度, 纬度)

[6]:
utm_roi = idp.ROI(test_data.shp.utm53n_shp, name_field=0)
[shp][proj] Use projection [WGS 84 / UTM zone 53N] for loaded shapefile [lon_lat_utm53n.shp]
Read shapefile [lon_lat_utm53n.shp]: 100%|██████████| 120/120 [00:00<00:00, 2506.63it/s]

然后我们可以检查此文件的 CRS(地理坐标):

[7]:
utm_roi.crs
[7]:
<Derived Projected CRS: EPSG:32653>
Name: WGS 84 / UTM zone 53N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- undefined
Coordinate Operation:
- name: UTM zone 53N
- method: Transverse Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

以及地块多边形坐标值:

[8]:
utm_roi
[8]:
<easyidp.ROI> with 120 items
[0]     1_02
array([[ 484581.96481965, 3862282.54659697],
       [ 484581.71899749, 3862283.05047413],
       [ 484581.92015662, 3862283.14994052],
       [ 484582.17518015, 3862282.68486107],
       [ 484581.96481965, 3862282.54659697]])
[1]     1_03
array([[ 484583.12131417, 3862280.30678878],
       [ 484582.87549189, 3862280.81066591],
       [ 484583.07665105, 3862280.91013233],
       [ 484583.3316747 , 3862280.44505291],
       [ 484583.12131417, 3862280.30678878]])
...
[118]   6_35
array([[ 484589.18991156, 3862259.98283581],
       [ 484588.50691422, 3862261.2903283 ],
       [ 484588.97321497, 3862261.50689677],
       [ 484589.6197077 , 3862260.22275374],
       [ 484589.18991156, 3862259.98283581]])
[119]   6_36
array([[ 484590.32812571, 3862257.73529679],
       [ 484589.64512804, 3862259.04278919],
       [ 484590.11143074, 3862259.26046666],
       [ 484590.75792192, 3862257.97521477],
       [ 484590.32812571, 3862257.73529679]])

在这种坐标下,单位是米,X(第一列)是东西(水平)方向,而 Y(第二列)是南北(垂直)方向。轴顺序与 (经度,纬度) 相同

在 CRS 之间转换

在某些情况下,例如,不是同一个人准备 DOM 和 shapefile,它们不共享相同的坐标并且多边形边界值非常不同。因此它们不能直接放在一起,需要转换。

注意

虽然 EasyIDP 支持不同 CRS 之间的转换,但如果 roi 数量庞大,可能会有精度损失并需要一些计算时间。

建议在准备 shp 和 DOM/DSM/PCD 时确保它们共享相同的 CRS。

例如,在莲花案例中,它使用 EPSG:4326,而 DOM/DSM 使用 UTM/Zone。

[9]:
lotus_roi.crs
[9]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
[10]:
dsm = idp.GeoTiff(test_data.metashape.lotus_dsm)
dsm.crs
[10]:
<Derived Projected CRS: EPSG:32654>
Name: WGS 84 / UTM zone 54N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 138°E and 144°E, northern hemisphere between equator and 84°N, onshore and offshore. Japan. Russian Federation.
- bounds: (138.0, 0.0, 144.0, 84.0)
Coordinate Operation:
- name: UTM zone 54N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

因此需要进行 CRS 转换以匹配它们。更推荐转移 ROI,因为它只是坐标数字,比 GeoTiff 的像素矩阵更容易转换。

在转换之前,地块坐标看起来像这样:

[11]:
lotus_roi
[11]:
<easyidp.ROI> with 112 items
[0]     N1W1
array([[139.54052962,  35.73475194],
       [139.54055106,  35.73475596],
       [139.54055592,  35.73473843],
       [139.54053438,  35.73473446],
       [139.54052962,  35.73475194]])
[1]     N1W2
array([[139.54053488,  35.73473289],
       [139.54055632,  35.73473691],
       [139.54056118,  35.73471937],
       [139.54053963,  35.73471541],
       [139.54053488,  35.73473289]])
...
[110]   S4E6
array([[139.54090456,  35.73453742],
       [139.540926  ,  35.73454144],
       [139.54093086,  35.7345239 ],
       [139.54090932,  35.73451994],
       [139.54090456,  35.73453742]])
[111]   S4E7
array([[139.54090986,  35.73451856],
       [139.54093129,  35.73452258],
       [139.54093616,  35.73450504],
       [139.54091461,  35.73450107],
       [139.54090986,  35.73451856]])

并应用转换:

[12]:
lotus_roi.change_crs(dsm.crs)

现在 ROI 的 CRS 已经改变:

[13]:
lotus_roi.crs
[13]:
<Derived Projected CRS: EPSG:32654>
Name: WGS 84 / UTM zone 54N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 138°E and 144°E, northern hemisphere between equator and 84°N, onshore and offshore. Japan. Russian Federation.
- bounds: (138.0, 0.0, 144.0, 84.0)
Coordinate Operation:
- name: UTM zone 54N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

以及坐标值:

[14]:
lotus_roi
[14]:
<easyidp.ROI> with 112 items
[0]     N1W1
array([[ 368017.7565143 , 3955511.08102276],
       [ 368019.70190232, 3955511.49811902],
       [ 368020.11263046, 3955509.54636219],
       [ 368018.15769062, 3955509.13563382],
       [ 368017.7565143 , 3955511.08102276]])
[1]     N1W2
array([[ 368018.20042946, 3955508.96051697],
       [ 368020.14581791, 3955509.37761334],
       [ 368020.55654627, 3955507.42585654],
       [ 368018.601606  , 3955507.01512806],
       [ 368018.20042946, 3955508.96051697]])
...
[110]   S4E6
array([[ 368051.31139629, 3955486.78103425],
       [ 368053.25678767, 3955487.19813795],
       [ 368053.66752456, 3955485.24638299],
       [ 368051.71258131, 3955484.83564713],
       [ 368051.31139629, 3955486.78103425]])
[111]   S4E7
array([[ 368051.75902187, 3955484.68169527],
       [ 368053.70441367, 3955485.09879908],
       [ 368054.11515079, 3955483.14704415],
       [ 368052.16020711, 3955482.73630818],
       [ 368051.75902187, 3955484.68169527]])

更改 ROI 标签

您必须注意在打开 shapefile 时的 name_field

lotus_roi = idp.ROI(test_data.shp.lotus_shp, name_field=0)

在本节中,将介绍有关此参数和其他控制的更多详细信息。

对于某些 shapefile,它不是用 utf-8 编码的,默认加载可能会失败:

>>> idp.shp.show_shp_fields(test_data.shp.complex_shp)
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/Users/hwang/OneDrive/Program/GitHub/EasyIDP/docs/jupyter/load_roi.ipynb Cell 35 in <cell line: 1>()
----> 1 idp.shp.show_shp_fields(test_data.shp.complex_shp)

File ~/OneDrive/Program/GitHub/EasyIDP/easyidp/shp.py:114, in show_shp_fields(shp_path, encoding)
    111 head = ["[-1]"] + [f"[{v}] {k}" for k, v in shp_fields.items()]
    112 data = []
--> 114 row_num = len(shp.records())
    115 col_num = len(shp.records()[0])
    117 col_align = ["right"] + ["center"] * col_num

File ~/anaconda/miniconda3/envs/easyidp/lib/python3.8/site-packages/shapefile.py:1306, in Reader.records(self)
   1304 f.seek(self.__dbfHdrLength)
   1305 for i in range(self.numRecords):
-> 1306     r = self.__record(oid=i)
   1307     if r:
   1308         records.append(r)

File ~/anaconda/miniconda3/envs/easyidp/lib/python3.8/site-packages/shapefile.py:1281, in Reader.__record(self, oid)
   1278             value = None # unknown value is set to missing
   1279 else:
   1280     # anything else is forced to string/unicode
-> 1281     value = u(value, self.encoding, self.encodingErrors)
   1282     value = value.strip()
...
    105     elif isinstance(v, str):
    106         # Already str.
    107         return v

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 2: invalid continuation byte
[15]:
idp.shp.show_shp_fields(test_data.shp.complex_shp, encoding='gbk')
  [-1]            [0] ID                [1] MASSIFID       [2] CROPTYPE    [3] CROPDATE    [4] CROPAREA    [5] ATTID
------  ---------------------------  -------------------  --------------  --------------  --------------  -----------
     0  230104112201809010000000000  2301041120000000000       小麦         2018-09-01     61525.26302
     1  230104112201809010000000012  2301041120000000012       蔬菜         2018-09-01      2802.33512
     2  230104112201809010000000014  2301041120000000014       玉米         2018-09-01      6960.7745
   ...              ...                      ...               ...             ...             ...            ...
   320  230104112201809010000000583  2301041120000000583       大豆         2018-09-01      380.41704
   321  230104112201809010000000584  2301041120000000584       其它         2018-09-01      9133.25998
   322  230104112201809010000000585  2301041120000000585       其它         2018-09-01      1704.27193

默认情况下,如果未给出 name_field,将使用第一个编号 id 作为标签

[16]:
complex_roi = idp.ROI(test_data.shp.complex_shp, encoding='gbk')
/Users/hwang/OneDrive/Program/GitHub/EasyIDP/easyidp/shp.py:399: UserWarning: Not specifying parameter 'name_field', will using the row id (from 0 to end) as the index for each polygon.Please using idp.shp.show_shp_field(shp_path) to display the full available indexs
  warnings.warn(
[shp][proj] Use projection [WGS 84] for loaded shapefile [complex_shp_review.shp]
Read shapefile [complex_shp_review.shp]: 100%|██████████| 323/323 [00:00<00:00, 64383.62it/s]
[17]:
complex_roi
[17]:
<easyidp.ROI> with 323 items
[0]     0
array([[126.84383445,  45.83319255],
       [126.84212197,  45.83222256],
       [126.84142718,  45.83186291],
       ...,
       [126.84373784,  45.83328959],
       [126.84381378,  45.83321205],
       [126.84383445,  45.83319255]])
[1]     1
array([[126.85042747,  45.84588275],
       [126.85042684,  45.84570483],
       [126.8504088 ,  45.84570504],
       ...,
       [126.85006453,  45.84588545],
       [126.85033812,  45.84588368],
       [126.85042747,  45.84588275]])
...
[321]   321
array([[126.83385574,  45.84337042],
       [126.83385363,  45.84331501],
       [126.83384205,  45.8432624 ],
       ...,
       [126.83353615,  45.84359357],
       [126.83370583,  45.84350153],
       [126.83385574,  45.84337042]])
[322]   322
array([[126.83056335,  45.84335627],
       [126.83063002,  45.84332396],
       [126.83067953,  45.84328632],
       ...,
       [126.83025148,  45.84341107],
       [126.83039799,  45.84340243],
       [126.83056335,  45.84335627]])

We can specify the ‘MASSIFID’ as the label instead

[18]:
complex_roi = idp.ROI(test_data.shp.complex_shp, name_field='MASSIFID', encoding='gbk')
[shp][proj] Use projection [WGS 84] for loaded shapefile [complex_shp_review.shp]
Read shapefile [complex_shp_review.shp]: 100%|██████████| 323/323 [00:00<00:00, 333.24it/s]

或者通过列 id 达到相同效果:

complex_roi = idp.ROI(test_data.shp.complex_shp, name_field=1, encoding='gbk')

现在标签已经改变:

[19]:
complex_roi
[19]:
<easyidp.ROI> with 323 items
[0]     2301041120000000000
array([[126.84383445,  45.83319255],
       [126.84212197,  45.83222256],
       [126.84142718,  45.83186291],
       ...,
       [126.84373784,  45.83328959],
       [126.84381378,  45.83321205],
       [126.84383445,  45.83319255]])
[1]     2301041120000000012
array([[126.85042747,  45.84588275],
       [126.85042684,  45.84570483],
       [126.8504088 ,  45.84570504],
       ...,
       [126.85006453,  45.84588545],
       [126.85033812,  45.84588368],
       [126.85042747,  45.84588275]])
...
[321]   2301041120000000584
array([[126.83385574,  45.84337042],
       [126.83385363,  45.84331501],
       [126.83384205,  45.8432624 ],
       ...,
       [126.83353615,  45.84359357],
       [126.83370583,  45.84350153],
       [126.83385574,  45.84337042]])
[322]   2301041120000000585
array([[126.83056335,  45.84335627],
       [126.83063002,  45.84332396],
       [126.83067953,  45.84328632],
       ...,
       [126.83025148,  45.84341107],
       [126.83039799,  45.84340243],
       [126.83056335,  45.84335627]])

或者您可以通过提供列表 name_field=['CROPTYPE', 'MASSIFID'] 将几列组合在一起

[20]:
complex_roi = idp.ROI(test_data.shp.complex_shp, name_field=['CROPTYPE', 'MASSIFID'], encoding='gbk')
[shp][proj] Use projection [WGS 84] for loaded shapefile [complex_shp_review.shp]
Read shapefile [complex_shp_review.shp]: 100%|██████████| 323/323 [00:01<00:00, 163.54it/s]

或者通过列 id 列表 name_field=[2, 1] 达到相同效果

complex_roi = idp.ROI(test_data.shp.complex_shp, name_field=[2, 1], encoding='gbk')

现在标签已经改变:

[21]:
complex_roi
[21]:
<easyidp.ROI> with 323 items
[0]     小麦_2301041120000000000
array([[126.84383445,  45.83319255],
       [126.84212197,  45.83222256],
       [126.84142718,  45.83186291],
       ...,
       [126.84373784,  45.83328959],
       [126.84381378,  45.83321205],
       [126.84383445,  45.83319255]])
[1]     蔬菜_2301041120000000012
array([[126.85042747,  45.84588275],
       [126.85042684,  45.84570483],
       [126.8504088 ,  45.84570504],
       ...,
       [126.85006453,  45.84588545],
       [126.85033812,  45.84588368],
       [126.85042747,  45.84588275]])
...
[321]   其它_2301041120000000584
array([[126.83385574,  45.84337042],
       [126.83385363,  45.84331501],
       [126.83384205,  45.8432624 ],
       ...,
       [126.83353615,  45.84359357],
       [126.83370583,  45.84350153],
       [126.83385574,  45.84337042]])
[322]   其它_2301041120000000585
array([[126.83056335,  45.84335627],
       [126.83063002,  45.84332396],
       [126.83067953,  45.84328632],
       ...,
       [126.83025148,  45.84341107],
       [126.83039799,  45.84340243],
       [126.83056335,  45.84335627]])

或者通过提供 include_title=True 将列标题也添加到其中

[22]:
complex_roi = idp.ROI(test_data.shp.complex_shp, name_field=['CROPTYPE', 'MASSIFID'], include_title=True, encoding='gbk')
[shp][proj] Use projection [WGS 84] for loaded shapefile [complex_shp_review.shp]
Read shapefile [complex_shp_review.shp]: 100%|██████████| 323/323 [00:01<00:00, 171.82it/s]
[23]:
complex_roi
[23]:
<easyidp.ROI> with 323 items
[0]     CROPTYPE_小麦_MASSIFID_2301041120000000000
array([[126.84383445,  45.83319255],
       [126.84212197,  45.83222256],
       [126.84142718,  45.83186291],
       ...,
       [126.84373784,  45.83328959],
       [126.84381378,  45.83321205],
       [126.84383445,  45.83319255]])
[1]     CROPTYPE_蔬菜_MASSIFID_2301041120000000012
array([[126.85042747,  45.84588275],
       [126.85042684,  45.84570483],
       [126.8504088 ,  45.84570504],
       ...,
       [126.85006453,  45.84588545],
       [126.85033812,  45.84588368],
       [126.85042747,  45.84588275]])
...
[321]   CROPTYPE_其它_MASSIFID_2301041120000000584
array([[126.83385574,  45.84337042],
       [126.83385363,  45.84331501],
       [126.83384205,  45.8432624 ],
       ...,
       [126.83353615,  45.84359357],
       [126.83370583,  45.84350153],
       [126.83385574,  45.84337042]])
[322]   CROPTYPE_其它_MASSIFID_2301041120000000585
array([[126.83056335,  45.84335627],
       [126.83063002,  45.84332396],
       [126.83067953,  45.84328632],
       ...,
       [126.83025148,  45.84341107],
       [126.83039799,  45.84340243],
       [126.83056335,  45.84335627]])

注意

只有具有唯一值的列应作为 ROI 标签使用,否则存在后续重复标签覆盖之前标签的风险。easyidp 可以处理这种情况并抛出错误。

>>> complex_roi = idp.ROI(test_data.shp.complex_shp, name_field='CROPTYPE',encoding='gbk')
[shp][proj] Use projection [WGS 84] for loaded shapefile [complex_shp_review.shp]
Read shapefile [complex_shp_review.shp]:   1%|          | 4/323 [00:00<00:01, 221.13it/s]
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/Users/hwang/OneDrive/Program/GitHub/EasyIDP/docs/jupyter/load_roi.ipynb Cell 51 in <cell line: 1>()
----> 1 complex_roi = idp.ROI(test_data.shp.complex_shp, name_field='CROPTYPE',encoding='gbk')

File ~/OneDrive/Program/GitHub/EasyIDP/easyidp/roi.py:77, in ROI.__init__(self, target_path, **kwargs)
     74 self.source = target_path
     76 if target_path is not None:
---> 77     self.open(target_path, **kwargs)

File ~/OneDrive/Program/GitHub/EasyIDP/easyidp/roi.py:136, in ROI.open(self, target_path, **kwargs)
    134 ext = os.path.splitext(target_path)[-1]
    135 if ext == ".shp":
--> 136     self.read_shp(target_path, **kwargs)
    137 elif ext == ".json":
    138     self.read_labelme_json(target_path)

File ~/OneDrive/Program/GitHub/EasyIDP/easyidp/roi.py:215, in ROI.read_shp(self, shp_path, shp_proj, name_field, include_title, encoding)
    142 """read ROI from shp file
    143
    144 Parameters
   (...)
    211
    212 """
    213 # if geotiff_proj is not None and shp_proj is not None and shp_proj.name != geotiff_proj.name:
...
--> 323         raise KeyError(f"Meet with duplicated key [{plot_name}] for current shapefile, please specify another `name_field` from {shp_fields} or simple leave it blank `name_field=None`")
    325     shp_dict[plot_name] = coord_np
    327 if return_proj:

KeyError: "Meet with duplicated key [玉米] for current shapefile, please specify another `name_field` from {'ID': 0, 'MASSIFID': 1, 'CROPTYPE': 2, 'CROPDATE': 3, 'CROPAREA': 4, 'ATTID': 5} or simple leave it blank `name_field=None`"