Load ROI from shapefile#

This example shows how to open the shapefile ( *.shp ) as the easyidp.ROI objects.

lotus plot

Package and data prepare#

Using the following code to load easyidp package and demo dataset

[1]:
import easyidp as idp

test_data = idp.data.TestData()

If you run for the first time, it will download around 400MB dataset automatically from Google Drive, please refer to Data for more details.

Here are thee demo shapefiles used in this documents are:

  • test_data.shp.lotus_shp

  • test_data.shp.utm53n_shp

  • test_data.shp.complex_shp

Each variable provides the path to the *.shp file:

[2]:
test_data.shp.lotus_shp
[2]:
PosixPath('/Users/hwang/Library/Application Support/easyidp.data/data_for_tests/shp_test/lotus_plots.shp')

Deal with ESPG:4326 (longitude, latitude)#

The lotus_shp used the EPGS:4326 as the Geo-projection coordinates

[3]:
lotus_roi = idp.ROI(test_data.shp.lotus_shp, name_field=0)
[shp][proj] Use projection [WGS 84] for loaded shapefile [lotus_plots.shp]
Read shapefile [lotus_plots.shp]: 100%|██████████| 112/112 [00:00<00:00, 1740.90it/s]

Using this method to check the CRS (geo-projection coordinate) for that shapefile:

[4]:
lotus_roi.crs
[4]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

And we can check the plot polygon boundary values by:

[5]:
lotus_roi
[5]:
<easyidp.ROI> with 112 items
[0]     N1W1
array([[139.54052962,  35.73475194],
       [139.54055106,  35.73475596],
       [139.54055592,  35.73473843],
       [139.54053438,  35.73473446],
       [139.54052962,  35.73475194]])
[1]     N1W2
array([[139.54053488,  35.73473289],
       [139.54055632,  35.73473691],
       [139.54056118,  35.73471937],
       [139.54053963,  35.73471541],
       [139.54053488,  35.73473289]])
...
[110]   S4E6
array([[139.54090456,  35.73453742],
       [139.540926  ,  35.73454144],
       [139.54093086,  35.7345239 ],
       [139.54090932,  35.73451994],
       [139.54090456,  35.73453742]])
[111]   S4E7
array([[139.54090986,  35.73451856],
       [139.54093129,  35.73452258],
       [139.54093616,  35.73450504],
       [139.54091461,  35.73450107],
       [139.54090986,  35.73451856]])

Caution

In the easyidp package, the ROI order is (longitude, latitude), while for some other packages like pyproj, shapely, may using the (latitude, longitude) order, please pay attention to it when transfering between packages

Deal with UTM/Zone geo coordainte#

The utm53_shp using another geo coordinate other than (longitude, latitude)

[6]:
utm_roi = idp.ROI(test_data.shp.utm53n_shp, name_field=0)
[shp][proj] Use projection [WGS 84 / UTM zone 53N] for loaded shapefile [lon_lat_utm53n.shp]
Read shapefile [lon_lat_utm53n.shp]: 100%|██████████| 120/120 [00:00<00:00, 2506.63it/s]

Then we can check the CRS (geo-coordinate) of this file:

[7]:
utm_roi.crs
[7]:
<Derived Projected CRS: EPSG:32653>
Name: WGS 84 / UTM zone 53N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- undefined
Coordinate Operation:
- name: UTM zone 53N
- method: Transverse Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

And the plot polygon coordinate values:

[8]:
utm_roi
[8]:
<easyidp.ROI> with 120 items
[0]     1_02
array([[ 484581.96481965, 3862282.54659697],
       [ 484581.71899749, 3862283.05047413],
       [ 484581.92015662, 3862283.14994052],
       [ 484582.17518015, 3862282.68486107],
       [ 484581.96481965, 3862282.54659697]])
[1]     1_03
array([[ 484583.12131417, 3862280.30678878],
       [ 484582.87549189, 3862280.81066591],
       [ 484583.07665105, 3862280.91013233],
       [ 484583.3316747 , 3862280.44505291],
       [ 484583.12131417, 3862280.30678878]])
...
[118]   6_35
array([[ 484589.18991156, 3862259.98283581],
       [ 484588.50691422, 3862261.2903283 ],
       [ 484588.97321497, 3862261.50689677],
       [ 484589.6197077 , 3862260.22275374],
       [ 484589.18991156, 3862259.98283581]])
[119]   6_36
array([[ 484590.32812571, 3862257.73529679],
       [ 484589.64512804, 3862259.04278919],
       [ 484590.11143074, 3862259.26046666],
       [ 484590.75792192, 3862257.97521477],
       [ 484590.32812571, 3862257.73529679]])

Under this coordinate, the unit is meter, and the X (the first column) is the East-West (horizontal) direction, while the Y (the second column) is the North-Sourth (vertical) direction. The axis order is the same with (longitude, latitude)

Transform between CRS#

In some cases, for example, not the same person who prepare the DOM and shapefile, they do not share the same coordinate and have very different polygon boundary values. So they can not put together directly and need convertion.

Caution

Although the EasyIDP support the transformation between different CRS, it may have precision loss and require some computation time if the roi number is huge.

It is recommended to ensure the shp and DOM/DSM/PCD share the same CRS when preparing them.

For example, in the lotus case, it uses the EPSG:4326, while the DOM/DSM uses the UTM/Zone.

[9]:
lotus_roi.crs
[9]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
[10]:
dsm = idp.GeoTiff(test_data.metashape.lotus_dsm)
dsm.crs
[10]:
<Derived Projected CRS: EPSG:32654>
Name: WGS 84 / UTM zone 54N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 138°E and 144°E, northern hemisphere between equator and 84°N, onshore and offshore. Japan. Russian Federation.
- bounds: (138.0, 0.0, 144.0, 84.0)
Coordinate Operation:
- name: UTM zone 54N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

Hence need to do the CRS transformation to match them. It is more recommended to transfer ROI because it is only the coordinate numbers, much easier to transfer than the GeoTiff are pixel matrix.

Before transfer, the plot coordinate looks like this:

[11]:
lotus_roi
[11]:
<easyidp.ROI> with 112 items
[0]     N1W1
array([[139.54052962,  35.73475194],
       [139.54055106,  35.73475596],
       [139.54055592,  35.73473843],
       [139.54053438,  35.73473446],
       [139.54052962,  35.73475194]])
[1]     N1W2
array([[139.54053488,  35.73473289],
       [139.54055632,  35.73473691],
       [139.54056118,  35.73471937],
       [139.54053963,  35.73471541],
       [139.54053488,  35.73473289]])
...
[110]   S4E6
array([[139.54090456,  35.73453742],
       [139.540926  ,  35.73454144],
       [139.54093086,  35.7345239 ],
       [139.54090932,  35.73451994],
       [139.54090456,  35.73453742]])
[111]   S4E7
array([[139.54090986,  35.73451856],
       [139.54093129,  35.73452258],
       [139.54093616,  35.73450504],
       [139.54091461,  35.73450107],
       [139.54090986,  35.73451856]])

And apply the transfer:

[12]:
lotus_roi.change_crs(dsm.crs)

Now the CRS of ROI has been changed:

[13]:
lotus_roi.crs
[13]:
<Derived Projected CRS: EPSG:32654>
Name: WGS 84 / UTM zone 54N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 138°E and 144°E, northern hemisphere between equator and 84°N, onshore and offshore. Japan. Russian Federation.
- bounds: (138.0, 0.0, 144.0, 84.0)
Coordinate Operation:
- name: UTM zone 54N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

And also the coordinate values:

[14]:
lotus_roi
[14]:
<easyidp.ROI> with 112 items
[0]     N1W1
array([[ 368017.7565143 , 3955511.08102276],
       [ 368019.70190232, 3955511.49811902],
       [ 368020.11263046, 3955509.54636219],
       [ 368018.15769062, 3955509.13563382],
       [ 368017.7565143 , 3955511.08102276]])
[1]     N1W2
array([[ 368018.20042946, 3955508.96051697],
       [ 368020.14581791, 3955509.37761334],
       [ 368020.55654627, 3955507.42585654],
       [ 368018.601606  , 3955507.01512806],
       [ 368018.20042946, 3955508.96051697]])
...
[110]   S4E6
array([[ 368051.31139629, 3955486.78103425],
       [ 368053.25678767, 3955487.19813795],
       [ 368053.66752456, 3955485.24638299],
       [ 368051.71258131, 3955484.83564713],
       [ 368051.31139629, 3955486.78103425]])
[111]   S4E7
array([[ 368051.75902187, 3955484.68169527],
       [ 368053.70441367, 3955485.09879908],
       [ 368054.11515079, 3955483.14704415],
       [ 368052.16020711, 3955482.73630818],
       [ 368051.75902187, 3955484.68169527]])

Change the ROI labels#

You have must notice the name_field value when opening the shapefile

lotus_roi = idp.ROI(test_data.shp.lotus_shp, name_field=0)

In this section, more details about this parameter and other controls will be introduced.

For some shapefile, it is not encoded in utf-8, and the default loading may fail:

>>> idp.shp.show_shp_fields(test_data.shp.complex_shp)
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/Users/hwang/OneDrive/Program/GitHub/EasyIDP/docs/jupyter/load_roi.ipynb Cell 35 in <cell line: 1>()
----> 1 idp.shp.show_shp_fields(test_data.shp.complex_shp)

File ~/OneDrive/Program/GitHub/EasyIDP/easyidp/shp.py:114, in show_shp_fields(shp_path, encoding)
    111 head = ["[-1]"] + [f"[{v}] {k}" for k, v in shp_fields.items()]
    112 data = []
--> 114 row_num = len(shp.records())
    115 col_num = len(shp.records()[0])
    117 col_align = ["right"] + ["center"] * col_num

File ~/anaconda/miniconda3/envs/easyidp/lib/python3.8/site-packages/shapefile.py:1306, in Reader.records(self)
   1304 f.seek(self.__dbfHdrLength)
   1305 for i in range(self.numRecords):
-> 1306     r = self.__record(oid=i)
   1307     if r:
   1308         records.append(r)

File ~/anaconda/miniconda3/envs/easyidp/lib/python3.8/site-packages/shapefile.py:1281, in Reader.__record(self, oid)
   1278             value = None # unknown value is set to missing
   1279 else:
   1280     # anything else is forced to string/unicode
-> 1281     value = u(value, self.encoding, self.encodingErrors)
   1282     value = value.strip()
...
    105     elif isinstance(v, str):
    106         # Already str.
    107         return v

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 2: invalid continuation byte
[15]:
idp.shp.show_shp_fields(test_data.shp.complex_shp, encoding='gbk')
  [-1]            [0] ID                [1] MASSIFID       [2] CROPTYPE    [3] CROPDATE    [4] CROPAREA    [5] ATTID
------  ---------------------------  -------------------  --------------  --------------  --------------  -----------
     0  230104112201809010000000000  2301041120000000000       小麦         2018-09-01     61525.26302
     1  230104112201809010000000012  2301041120000000012       蔬菜         2018-09-01      2802.33512
     2  230104112201809010000000014  2301041120000000014       玉米         2018-09-01      6960.7745
   ...              ...                      ...               ...             ...             ...            ...
   320  230104112201809010000000583  2301041120000000583       大豆         2018-09-01      380.41704
   321  230104112201809010000000584  2301041120000000584       其它         2018-09-01      9133.25998
   322  230104112201809010000000585  2301041120000000585       其它         2018-09-01      1704.27193

By default, if name_field not given, it will using the first number id as label

[16]:
complex_roi = idp.ROI(test_data.shp.complex_shp, encoding='gbk')
/Users/hwang/OneDrive/Program/GitHub/EasyIDP/easyidp/shp.py:399: UserWarning: Not specifying parameter 'name_field', will using the row id (from 0 to end) as the index for each polygon.Please using idp.shp.show_shp_field(shp_path) to display the full available indexs
  warnings.warn(
[shp][proj] Use projection [WGS 84] for loaded shapefile [complex_shp_review.shp]
Read shapefile [complex_shp_review.shp]: 100%|██████████| 323/323 [00:00<00:00, 64383.62it/s]
[17]:
complex_roi
[17]:
<easyidp.ROI> with 323 items
[0]     0
array([[126.84383445,  45.83319255],
       [126.84212197,  45.83222256],
       [126.84142718,  45.83186291],
       ...,
       [126.84373784,  45.83328959],
       [126.84381378,  45.83321205],
       [126.84383445,  45.83319255]])
[1]     1
array([[126.85042747,  45.84588275],
       [126.85042684,  45.84570483],
       [126.8504088 ,  45.84570504],
       ...,
       [126.85006453,  45.84588545],
       [126.85033812,  45.84588368],
       [126.85042747,  45.84588275]])
...
[321]   321
array([[126.83385574,  45.84337042],
       [126.83385363,  45.84331501],
       [126.83384205,  45.8432624 ],
       ...,
       [126.83353615,  45.84359357],
       [126.83370583,  45.84350153],
       [126.83385574,  45.84337042]])
[322]   322
array([[126.83056335,  45.84335627],
       [126.83063002,  45.84332396],
       [126.83067953,  45.84328632],
       ...,
       [126.83025148,  45.84341107],
       [126.83039799,  45.84340243],
       [126.83056335,  45.84335627]])

We can specify the ‘MASSIFID’ as the label instead

[18]:
complex_roi = idp.ROI(test_data.shp.complex_shp, name_field='MASSIFID', encoding='gbk')
[shp][proj] Use projection [WGS 84] for loaded shapefile [complex_shp_review.shp]
Read shapefile [complex_shp_review.shp]: 100%|██████████| 323/323 [00:00<00:00, 333.24it/s]

Or the equal effects by colume id:

complex_roi = idp.ROI(test_data.shp.complex_shp, name_field=1, encoding='gbk')

Now the label has been changed:

[19]:
complex_roi
[19]:
<easyidp.ROI> with 323 items
[0]     2301041120000000000
array([[126.84383445,  45.83319255],
       [126.84212197,  45.83222256],
       [126.84142718,  45.83186291],
       ...,
       [126.84373784,  45.83328959],
       [126.84381378,  45.83321205],
       [126.84383445,  45.83319255]])
[1]     2301041120000000012
array([[126.85042747,  45.84588275],
       [126.85042684,  45.84570483],
       [126.8504088 ,  45.84570504],
       ...,
       [126.85006453,  45.84588545],
       [126.85033812,  45.84588368],
       [126.85042747,  45.84588275]])
...
[321]   2301041120000000584
array([[126.83385574,  45.84337042],
       [126.83385363,  45.84331501],
       [126.83384205,  45.8432624 ],
       ...,
       [126.83353615,  45.84359357],
       [126.83370583,  45.84350153],
       [126.83385574,  45.84337042]])
[322]   2301041120000000585
array([[126.83056335,  45.84335627],
       [126.83063002,  45.84332396],
       [126.83067953,  45.84328632],
       ...,
       [126.83025148,  45.84341107],
       [126.83039799,  45.84340243],
       [126.83056335,  45.84335627]])

Or you can combine several columns together by giving a list name_field=['CROPTYPE', 'MASSIFID']

[20]:
complex_roi = idp.ROI(test_data.shp.complex_shp, name_field=['CROPTYPE', 'MASSIFID'], encoding='gbk')
[shp][proj] Use projection [WGS 84] for loaded shapefile [complex_shp_review.shp]
Read shapefile [complex_shp_review.shp]: 100%|██████████| 323/323 [00:01<00:00, 163.54it/s]

Or the equal effects by a list of colume id name_field=[2, 1]

complex_roi = idp.ROI(test_data.shp.complex_shp, name_field=[2, 1], encoding='gbk')

Now the label has been changed:

[21]:
complex_roi
[21]:
<easyidp.ROI> with 323 items
[0]     小麦_2301041120000000000
array([[126.84383445,  45.83319255],
       [126.84212197,  45.83222256],
       [126.84142718,  45.83186291],
       ...,
       [126.84373784,  45.83328959],
       [126.84381378,  45.83321205],
       [126.84383445,  45.83319255]])
[1]     蔬菜_2301041120000000012
array([[126.85042747,  45.84588275],
       [126.85042684,  45.84570483],
       [126.8504088 ,  45.84570504],
       ...,
       [126.85006453,  45.84588545],
       [126.85033812,  45.84588368],
       [126.85042747,  45.84588275]])
...
[321]   其它_2301041120000000584
array([[126.83385574,  45.84337042],
       [126.83385363,  45.84331501],
       [126.83384205,  45.8432624 ],
       ...,
       [126.83353615,  45.84359357],
       [126.83370583,  45.84350153],
       [126.83385574,  45.84337042]])
[322]   其它_2301041120000000585
array([[126.83056335,  45.84335627],
       [126.83063002,  45.84332396],
       [126.83067953,  45.84328632],
       ...,
       [126.83025148,  45.84341107],
       [126.83039799,  45.84340243],
       [126.83056335,  45.84335627]])

Or even add the colume title into it by giving include_title=True

[22]:
complex_roi = idp.ROI(test_data.shp.complex_shp, name_field=['CROPTYPE', 'MASSIFID'], include_title=True, encoding='gbk')
[shp][proj] Use projection [WGS 84] for loaded shapefile [complex_shp_review.shp]
Read shapefile [complex_shp_review.shp]: 100%|██████████| 323/323 [00:01<00:00, 171.82it/s]
[23]:
complex_roi
[23]:
<easyidp.ROI> with 323 items
[0]     CROPTYPE_小麦_MASSIFID_2301041120000000000
array([[126.84383445,  45.83319255],
       [126.84212197,  45.83222256],
       [126.84142718,  45.83186291],
       ...,
       [126.84373784,  45.83328959],
       [126.84381378,  45.83321205],
       [126.84383445,  45.83319255]])
[1]     CROPTYPE_蔬菜_MASSIFID_2301041120000000012
array([[126.85042747,  45.84588275],
       [126.85042684,  45.84570483],
       [126.8504088 ,  45.84570504],
       ...,
       [126.85006453,  45.84588545],
       [126.85033812,  45.84588368],
       [126.85042747,  45.84588275]])
...
[321]   CROPTYPE_其它_MASSIFID_2301041120000000584
array([[126.83385574,  45.84337042],
       [126.83385363,  45.84331501],
       [126.83384205,  45.8432624 ],
       ...,
       [126.83353615,  45.84359357],
       [126.83370583,  45.84350153],
       [126.83385574,  45.84337042]])
[322]   CROPTYPE_其它_MASSIFID_2301041120000000585
array([[126.83056335,  45.84335627],
       [126.83063002,  45.84332396],
       [126.83067953,  45.84328632],
       ...,
       [126.83025148,  45.84341107],
       [126.83039799,  45.84340243],
       [126.83056335,  45.84335627]])

Caution

Only the colume with unique values should be used as the ROI label, otherwise has the risk that later duplicated label overwrites previous label. The easyidp could handle such case and raise an Error.

>>> complex_roi = idp.ROI(test_data.shp.complex_shp, name_field='CROPTYPE',encoding='gbk')
[shp][proj] Use projection [WGS 84] for loaded shapefile [complex_shp_review.shp]
Read shapefile [complex_shp_review.shp]:   1%|          | 4/323 [00:00<00:01, 221.13it/s]
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/Users/hwang/OneDrive/Program/GitHub/EasyIDP/docs/jupyter/load_roi.ipynb Cell 51 in <cell line: 1>()
----> 1 complex_roi = idp.ROI(test_data.shp.complex_shp, name_field='CROPTYPE',encoding='gbk')

File ~/OneDrive/Program/GitHub/EasyIDP/easyidp/roi.py:77, in ROI.__init__(self, target_path, **kwargs)
     74 self.source = target_path
     76 if target_path is not None:
---> 77     self.open(target_path, **kwargs)

File ~/OneDrive/Program/GitHub/EasyIDP/easyidp/roi.py:136, in ROI.open(self, target_path, **kwargs)
    134 ext = os.path.splitext(target_path)[-1]
    135 if ext == ".shp":
--> 136     self.read_shp(target_path, **kwargs)
    137 elif ext == ".json":
    138     self.read_labelme_json(target_path)

File ~/OneDrive/Program/GitHub/EasyIDP/easyidp/roi.py:215, in ROI.read_shp(self, shp_path, shp_proj, name_field, include_title, encoding)
    142 """read ROI from shp file
    143
    144 Parameters
   (...)
    211
    212 """
    213 # if geotiff_proj is not None and shp_proj is not None and shp_proj.name != geotiff_proj.name:
...
--> 323         raise KeyError(f"Meet with duplicated key [{plot_name}] for current shapefile, please specify another `name_field` from {shp_fields} or simple leave it blank `name_field=None`")
    325     shp_dict[plot_name] = coord_np
    327 if return_proj:

KeyError: "Meet with duplicated key [玉米] for current shapefile, please specify another `name_field` from {'ID': 0, 'MASSIFID': 1, 'CROPTYPE': 2, 'CROPDATE': 3, 'CROPAREA': 4, 'ATTID': 5} or simple leave it blank `name_field=None`"