@@ -308,8 +308,8 @@ Sorting in SAS is accomplished via ``PROC SORT``
308308String processing
309309-----------------
310310
311- Length
312- ~~~~~~
311+ Finding length of string
312+ ~~~~~~~~~~~~~~~~~~~~~~~~
313313
314314SAS determines the length of a character string with the
315315`LENGTHN <https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002284668.htm>`__
@@ -327,8 +327,8 @@ functions. ``LENGTHN`` excludes trailing blanks and ``LENGTHC`` includes trailin
327327 .. include:: includes/length.rst
328328
329329
330- Find
331- ~~~~
330+ Finding position of substring
331+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
332332
333333SAS determines the position of a character in a string with the
334334`FINDW <https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002978282.htm>`__ function.
@@ -342,19 +342,11 @@ you supply as the second argument.
342342 put(FINDW(sex,'ale'));
343343 run;
344344
345- Python determines the position of a character in a string with the
346- ``find`` function. ``find`` searches for the first position of the
347- substring. If the substring is found, the function returns its
348- position. Keep in mind that Python indexes are zero-based and
349- the function will return -1 if it fails to find the substring.
350-
351- .. ipython:: python
352-
353- tips["sex"].str.find("ale").head()
345+ .. include:: includes/find_substring.rst
354346
355347
356- Substring
357- ~~~~~~~~~
348+ Extracting substring by position
349+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
358350
359351SAS extracts a substring from a string based on its position with the
360352`SUBSTR <https://www2.sas.com/proceedings/sugi25/25/cc/25p088.pdf>`__ function.
@@ -366,17 +358,11 @@ SAS extracts a substring from a string based on its position with the
366358 put(substr(sex,1,1));
367359 run;
368360
369- With pandas you can use ``[]`` notation to extract a substring
370- from a string by position locations. Keep in mind that Python
371- indexes are zero-based.
361+ .. include:: includes/extract_substring.rst
372362
373- .. ipython:: python
374363
375- tips["sex"].str[0:1].head()
376-
377-
378- Scan
379- ~~~~
364+ Extracting nth word
365+ ~~~~~~~~~~~~~~~~~~~
380366
381367The SAS `SCAN <https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000214639.htm>`__
382368function returns the nth word from a string. The first argument is the string you want to parse and the
@@ -394,20 +380,11 @@ second argument specifies which word you want to extract.
394380 ;;;
395381 run;
396382
397- Python extracts a substring from a string based on its text
398- by using regular expressions. There are much more powerful
399- approaches, but this just shows a simple approach.
400-
401- .. ipython:: python
402-
403- firstlast = pd.DataFrame({"String": ["John Smith", "Jane Cook"]})
404- firstlast["First_Name"] = firstlast["String"].str.split(" ", expand=True)[0]
405- firstlast["Last_Name"] = firstlast["String"].str.rsplit(" ", expand=True)[0]
406- firstlast
383+ .. include:: includes/nth_word.rst
407384
408385
409- Upcase, lowcase, and propcase
410- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
386+ Changing case
387+ ~~~~~~~~~~~~~
411388
412389The SAS `UPCASE <https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245965.htm>`__
413390`LOWCASE <https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245912.htm>`__ and
@@ -427,27 +404,13 @@ functions change the case of the argument.
427404 ;;;
428405 run;
429406
430- The equivalent Python functions are ``upper``, ``lower``, and ``title``.
407+ .. include:: includes/case.rst
431408
432- .. ipython:: python
433-
434- firstlast = pd.DataFrame({"String": ["John Smith", "Jane Cook"]})
435- firstlast["string_up"] = firstlast["String"].str.upper()
436- firstlast["string_low"] = firstlast["String"].str.lower()
437- firstlast["string_prop"] = firstlast["String"].str.title()
438- firstlast
439409
440410Merging
441411-------
442412
443- The following tables will be used in the merge examples
444-
445- .. ipython:: python
446-
447- df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": np.random.randn(4)})
448- df1
449- df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": np.random.randn(4)})
450- df2
413+ .. include:: includes/merge_setup.rst
451414
452415In SAS, data must be explicitly sorted before merging. Different
453416types of joins are accomplished using the ``in=`` dummy
@@ -473,39 +436,13 @@ input frames.
473436 if a or b then output outer_join;
474437 run;
475438
476- pandas DataFrames have a :meth:`~DataFrame.merge` method, which provides
477- similar functionality. Note that the data does not have
478- to be sorted ahead of time, and different join
479- types are accomplished via the ``how`` keyword.
480-
481- .. ipython:: python
482-
483- inner_join = df1.merge(df2, on=["key"], how="inner")
484- inner_join
485-
486- left_join = df1.merge(df2, on=["key"], how="left")
487- left_join
488-
489- right_join = df1.merge(df2, on=["key"], how="right")
490- right_join
491-
492- outer_join = df1.merge(df2, on=["key"], how="outer")
493- outer_join
439+ .. include:: includes/merge.rst
494440
495441
496442Missing data
497443------------
498444
499- Like SAS, pandas has a representation for missing data - which is the
500- special float value ``NaN`` (not a number). Many of the semantics
501- are the same, for example missing data propagates through numeric
502- operations, and is ignored by default for aggregations.
503-
504- .. ipython:: python
505-
506- outer_join
507- outer_join["value_x"] + outer_join["value_y"]
508- outer_join["value_x"].sum()
445+ .. include:: includes/missing_intro.rst
509446
510447One difference is that missing data cannot be compared to its sentinel value.
511448For example, in SAS you could do this to filter missing values.
@@ -522,25 +459,7 @@ For example, in SAS you could do this to filter missing values.
522459 if value_x ^= .;
523460 run;
524461
525- Which doesn't work in pandas. Instead, the ``pd.isna`` or ``pd.notna`` functions
526- should be used for comparisons.
527-
528- .. ipython:: python
529-
530- outer_join[pd.isna(outer_join["value_x"])]
531- outer_join[pd.notna(outer_join["value_x"])]
532-
533- pandas also provides a variety of methods to work with missing data - some of
534- which would be challenging to express in SAS. For example, there are methods to
535- drop all rows with any missing values, replacing missing values with a specified
536- value, like the mean, or forward filling from previous rows. See the
537- :ref:`missing data documentation<missing_data>` for more.
538-
539- .. ipython:: python
540-
541- outer_join.dropna()
542- outer_join.fillna(method="ffill")
543- outer_join["value_x"].fillna(outer_join["value_x"].mean())
462+ .. include:: includes/missing.rst
544463
545464
546465GroupBy
@@ -549,7 +468,7 @@ GroupBy
549468Aggregation
550469~~~~~~~~~~~
551470
552- SAS's PROC SUMMARY can be used to group by one or
471+ SAS's ``PROC SUMMARY`` can be used to group by one or
553472more key variables and compute aggregations on
554473numeric columns.
555474
@@ -561,14 +480,7 @@ numeric columns.
561480 output out=tips_summed sum=;
562481 run;
563482
564- pandas provides a flexible ``groupby`` mechanism that
565- allows similar aggregations. See the :ref:`groupby documentation<groupby>`
566- for more details and examples.
567-
568- .. ipython:: python
569-
570- tips_summed = tips.groupby(["sex", "smoker"])[["total_bill", "tip"]].sum()
571- tips_summed.head()
483+ .. include:: includes/groupby.rst
572484
573485
574486Transformation
@@ -597,16 +509,7 @@ example, to subtract the mean for each observation by smoker group.
597509 if a and b;
598510 run;
599511
600-
601- pandas ``groupby`` provides a ``transform`` mechanism that allows
602- these type of operations to be succinctly expressed in one
603- operation.
604-
605- .. ipython:: python
606-
607- gb = tips.groupby("smoker")["total_bill"]
608- tips["adj_total_bill"] = tips["total_bill"] - gb.transform("mean")
609- tips.head()
512+ .. include:: includes/transform.rst
610513
611514
612515By group processing
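The ``find`` paragraph moved into ``includes/find_substring.rst`` notes that Python positions are zero-based and that a failed search returns -1. The sketch below is not part of this change. It illustrates that convention with Python's built-in ``str.find``, which pandas ``.str.find`` applies element-wise. The sample values and the ``one_based`` conversion are assumptions made for illustration only.

```python
# Hypothetical sample values standing in for a string column.
values = ["Female", "Male", "No"]

# str.find returns the zero-based index of the first match, or -1 if absent.
positions = [v.find("ale") for v in values]
print(positions)

# Shifting by one gives a 1-based position, with 0 meaning "not found".
one_based = [p + 1 for p in positions]
print(one_based)
```

When porting SAS logic that tests a position against 0, the shifted form may be easier to compare; otherwise test for ``-1`` directly.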