Frame join¶
-
join
(self, right, left_on, right_on=None, how='inner', name=None)¶ [BETA] Join operation on one or two frames, creating a new frame.
Parameters: right : Frame
Another frame to join with
left_on : str
Name of the column in the left frame used to match up the two frames.
right_on : str (default=None)
Name of the column in the right frame used to match up the two frames. Default is the same as the left frame.
how : str (default=inner)
How to qualify the data to be joined together. Must be one of the following: ‘left’, ‘right’, ‘inner’, ‘outer’. Default is ‘inner’
name : str (default=None)
Name of the result grouped frame
Returns: : Frame
A new frame with the results of the join
Create a new frame from a SQL JOIN operation with another frame. The frame on the ‘left’ is the currently active frame. The frame on the ‘right’ is another frame. This method takes a column in the left frame and matches its values with a column in the right frame. Using the default ‘how’ option [‘inner’] will only allow data in the resultant frame if both the left and right frames have the same value in the matching column. Using the ‘left’ ‘how’ option will allow any data in the resultant frame if it exists in the left frame, but will allow any data from the right frame if it has a value in its column which matches the value in the left frame column. Using the ‘right’ option works similarly, except it keeps all the data from the right frame and only the data from the left frame when it matches. The ‘outer’ option provides a frame with data from both frames where the left and right frames did not have the same value in the matching column.
Notes
When a column is named the same in both frames, it will result in two columns in the new frame. The column from the left frame (originally the current frame) will be copied and the column name will have the string “_L” added to it. The same thing will happen with the column from the right frame, except its name has the string “_R” appended. The order of columns after this method is called is not guaranteed.
It is recommended that you rename the columns to meaningful terms prior to using the
join
method. Keep in mind that unicode in column names will likely cause the drop_frames() method (and others) to fail!Examples
Consider two frames: codes and colors
>>> codes.inspect() [#] numbers ============ [0] 1 [1] 3 [2] 1 [3] 0 [4] 2 [5] 1 [6] 5 [7] 3
>>> colors.inspect() [#] numbers color ==================== [0] 1 red [1] 2 yellow [2] 3 green [3] 4 blue
Join them on the ‘numbers’ column (‘inner’ join by default)
>>> j = codes.join(colors, 'numbers') [===Job Progress===]
>>> j.inspect() [#] numbers color ==================== [0] 1 red [1] 3 green [2] 1 red [3] 2 yellow [4] 1 red [5] 3 green
(The join adds an extra column *_R which is the join column from the right frame; it may be disregarded)
Try a ‘left’ join, which includes all the rows of the codes frame.
>>> j_left = codes.join(colors, 'numbers', how='left') [===Job Progress===]
>>> j_left.inspect() [#] numbers color ==================== [0] 1 red [1] 3 green [2] 1 red [3] 0 None [4] 2 yellow [5] 1 red [6] 5 None [7] 3 green
And an outer join:
>>> j_outer = codes.join(colors, 'numbers', how='outer') [===Job Progress===]
>>> j_outer.inspect() [#] numbers color ==================== [0] 0 None [1] 1 red [2] 1 red [3] 1 red [4] 2 yellow [5] 3 green [6] 3 green [7] 4 blue [8] 5 None
More examples can be found in the user manual.