Education Python - Reading a CSV, reordering the cells into a new output

Discussion in 'General' started by DeadP1xels, 8 Apr 2020.

  1. DeadP1xels

    DeadP1xels Social distancing since 92

    Joined:
    30 Nov 2009
    Posts:
    5,843
    Likes Received:
    717
    Hey folks,

    Apologies if this is entirely the wrong place for this,

    Hoping someone might be able to help, I've been trying a bit of Python with all this spare time. I've done the basics so have a very loose grasp on the syntax.

    Basically I want to read in a CSV row by row, each row will relate to an imaginary manager and the staff working beneath them. I want break that row out into multiple rows where the amount of staff is not fixed (i.e. one manager might have 15 staff where as one has 1)


    For example:

    [​IMG]

    The top part of this CSV would be an example of an input with the bottom section an example of the desired output.

    I've been using the openpyxl library to begin, importing an existing workbook and doing general moving stuff between them but I'm struggling with how I would do the loop portion. I can logically talk my way through it but translating that to Python isn't so easy

    Heres where I've got to in my failed quest.

    Code:
    import openpyxl
    from openpyxl import Workbook
    import time
    
    #user prompt for file name
    print('Enter file name:')
    inputfile = input()
    
    #loads that workbook
    book = openpyxl.load_workbook(inputfile)
    HRsheet = book.active
    
    #Create blank workbook for output
    newbook = Workbook()
    newsheet = newbook.active
    
    #set up column headers for outputfile
    newsheet['A1']='Manager'
    newsheet['B1']='Age'
    newsheet['C1']='Sex'
    newsheet['D1']='Department'
    newsheet['E1']='Staff No'
    
    #Ask user how many managers there are/ how many time it needs to iterate through rows?
    print('Number of managers')
    managerno = input()
    
    #Where to start if the xlsx had headings.
    #print('Does your sheet have headings? Y/N')
    #headings = input()
    #if headings == 'Y'or'y':
    #    start = 'A2'
    #else:
    #    start = 'A1'
    
    #Takes values from HRSheet A2 stores them as a variable and puts them into specific cells within newsheet A3
    managername = HRsheet['A2']
    newsheet['A3']= (managername.value)
    
    #Give the file a name
    print('Give it a name:')
    outputfile = input()
    newbook.save(outputfile)#Save
    

    Any guidance appreciated! :)
     
  2. MLyons

    MLyons 70% Dev, 30% Doge. DevDoge Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    3 Mar 2017
    Posts:
    3,705
    Likes Received:
    1,894
    I'd highly recommend reading Automate the Boring Stuff with Python. The image you posted doesn't seem to work. Can you try to post it again and I'll have a look
     
    deathtaker27 likes this.
  3. DeadP1xels

    DeadP1xels Social distancing since 92

    Joined:
    30 Nov 2009
    Posts:
    5,843
    Likes Received:
    717
    Yeah I set the imgBB to delete after 5 minutes... smart ey!?

    I reloaded so should be up again.

    [​IMG]
     
  4. DeadP1xels

    DeadP1xels Social distancing since 92

    Joined:
    30 Nov 2009
    Posts:
    5,843
    Likes Received:
    717
    Sorry for the double folks,

    Been plugging away at this today and trying other solutions, I'm getting there slowly (but probably not efficiently)

    Below I'm reading the first row, after every count it will copy the cell value into a variable, it then copies that variable into a specific cell value of the new worksheet.

    I think I need a loop prior to this that will then loop rows so effectively

    1) Read row 1 - copy every cell value into variable and put that variable in the new worksheet
    2) When no more cell values are in row 1 the loop breaks
    3) returns to the first loop and row value +1
    4) Reads row 2 - '''' "" ""
    5) No more rows in file - breaks loop output saves new workbook!



    Code:
    import openpyxl
    from openpyxl import Workbook
    
    wb = openpyxl.load_workbook('GradeSample.xlsx')
    sheet = wb.active
    
    newwb = Workbook()
    newsheet = newwb.active
    
    newsheet['A1']='Name'
    newsheet['B1']='Section'
    newsheet['C1']='Writing Grade'
    newsheet['D1']='Speaking Grade'
    
    list(sheet.rows)[1] # Get the first row
    
    a = 0
    for cellObj in list(sheet.rows)[1]:
        if a == 0:
            pass
            Name=(cellObj.value)
            print(Name)
            newsheet.cell(row=2, column=1).value = Name
        if a == 1:
            pass
            Age=(cellObj.value)
            print(Age)
            newsheet.cell(row=2, column=2).value = Age
        if a == 2:
            pass
            Sex=(cellObj.value)
            print(Sex)
            newsheet.cell(row=2, column=3).value = Sex
        if a == 3:
            pass
            Department=(cellObj.value)
            print(Department)
            newsheet.cell(row=2, column=4).value = Department   
        a +=1
    
    
    
    
    print('Save file as:')
    newfile=input()
    newwb.save(newfile)#Save
    
    Am I getting closer...?
     
  5. MLyons

    MLyons 70% Dev, 30% Doge. DevDoge Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    3 Mar 2017
    Posts:
    3,705
    Likes Received:
    1,894
    Just having a look now. Are you after a finished working solution or would you prefer pointers?

    EDIT: I've got a highly commented working solution. It's about 50 lines including comments and runs instantly. It should also support if more departments are added in the future
     
    Last edited: 8 Apr 2020
    DeadP1xels likes this.
  6. DeadP1xels

    DeadP1xels Social distancing since 92

    Joined:
    30 Nov 2009
    Posts:
    5,843
    Likes Received:
    717
    I was more after pointers but if you do now have a solution that would be better, I can then reverse engineer it (or read the comments...) to understand how you got there :)
     
  7. MLyons

    MLyons 70% Dev, 30% Doge. DevDoge Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    3 Mar 2017
    Posts:
    3,705
    Likes Received:
    1,894
    OK ill post it in a bit. Be sure to go over that book I sent you further up. Its what I reccomend to anyone doing python
     
  8. DeadP1xels

    DeadP1xels Social distancing since 92

    Joined:
    30 Nov 2009
    Posts:
    5,843
    Likes Received:
    717
    Awesome cheers pal, yeah I've been plugging away at it, I'm going to buy a physical copy though. I just can't seem to absorb from the screen!
     
  9. GeorgeStorm

    GeorgeStorm Aggressive PC Builder

    Joined:
    16 Dec 2008
    Posts:
    6,759
    Likes Received:
    434
    Just thought I'd also mention pluralsight is free for april, never actually used any of their stuff but have been recommended it before and just signed up myself.
    Should be plenty of python stuff on there.
     
  10. MLyons

    MLyons 70% Dev, 30% Doge. DevDoge Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    3 Mar 2017
    Posts:
    3,705
    Likes Received:
    1,894
    Code:
    import openpyxl
    from openpyxl import Workbook
    
    sheet = openpyxl.load_workbook("GradeSample.xlsx").active
    
    newwb = Workbook()
    newsheet = newwb.active
    
    newsheet["A1"] = "Manager"
    newsheet["B1"] = "Age"
    newsheet["C1"] = "Sex"
    newsheet["D1"] = "Department"
    newsheet["E1"] = "Staff No"
    
    for index, row in enumerate(sheet.iter_rows()):
        # skip the first row
        if index <= 1:
            continue
        # remove set the value to a set variable and remove any whitespace surroundign the value
        managersName = row[0].value
        managersAge = row[1].value
        managersSex = row[2].value
        # stop the loop once we hit a blank cell
        if not managersName:
            break
    
        numberOfStaff = 0
        managers = []
        for cell in row[3:]:
            # if the cell isn't blank we increment the total of managers and add the manager to the list
            if cell.value != None:
                numberOfStaff += 1
                managers.append(
                    (
                        None,
                        None,
                        None,
                        sheet.cell(row=2, column=cell.column).value,
                        cell.value,
                    )
                )
        # Add the summary row to the sheet
        newsheet.append((managersName, managersAge, managersSex, None, numberOfStaff))
        # add each manager and department underneath
        for manager in managers:
            newsheet.append(manager)
    
    
    print("Save file as:")
    newfile = input()"
    newwb.save(newfile + ".xlsx")  # Save
    
    I was hoping that was going to have syntax highlighting but it seems it only supports HTML and PHP. You'll see the new filename has the extension automatically added.
     
  11. DeadP1xels

    DeadP1xels Social distancing since 92

    Joined:
    30 Nov 2009
    Posts:
    5,843
    Likes Received:
    717
    Thanks mate!

    I get an error....

    ---------------------------------------------------------------------------
    TypeError Traceback (most recent call last)
    <ipython-input-1-b7c9ad7abb48> in <module>()
    36 None,
    37 None,
    ---> 38 sheet.cell(row=2, column=cell.column).value,
    39 cell.value,
    40 )

    /anaconda3/lib/python3.7/site-packages/openpyxl/worksheet/worksheet.py in cell(self, row, column, value)
    293 """
    294
    --> 295 if row < 1 or column < 1:
    296 raise ValueError("Row or column values must be at least 1")
    297

    TypeError: '<' not supported between instances of 'str' and 'int'
     
  12. MLyons

    MLyons 70% Dev, 30% Doge. DevDoge Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    3 Mar 2017
    Posts:
    3,705
    Likes Received:
    1,894
    Try to remove the first blank line from the top of the Excel sheet
     
  13. DeadP1xels

    DeadP1xels Social distancing since 92

    Joined:
    30 Nov 2009
    Posts:
    5,843
    Likes Received:
    717
    No blank line at the top of the excel sheet :(

    When I commented out the line giving the issues it worked for the first two manager rows.
     
  14. MLyons

    MLyons 70% Dev, 30% Doge. DevDoge Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    3 Mar 2017
    Posts:
    3,705
    Likes Received:
    1,894
    odd. Here's the exact sheet I'm using.
    upload_2020-4-9_9-53-3.png
    Seems I did leave a row at the top. Which version of python are you using?
     
  15. DeadP1xels

    DeadP1xels Social distancing since 92

    Joined:
    30 Nov 2009
    Posts:
    5,843
    Likes Received:
    717
    So just using Jupiter notebooks, just says Python 3, the error sees to suggest Python 3.7

    If I comment out

    sheet.cell(row=2, column=cell.column).value

    I get almost the right thing, It however misses the first manager row and the names come under department.

    [​IMG]
     
    Last edited: 9 Apr 2020
  16. MLyons

    MLyons 70% Dev, 30% Doge. DevDoge Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    3 Mar 2017
    Posts:
    3,705
    Likes Received:
    1,894
    So to fix missing the first one edit the part i use to check for the first row. Can you try it in a normal python terminal.
     
  17. DeadP1xels

    DeadP1xels Social distancing since 92

    Joined:
    30 Nov 2009
    Posts:
    5,843
    Likes Received:
    717
    So changed the check row and that worked :)

    ran it here: https://repl.it/languages/python3

    and it works fine... It just throws and error in Jupyter Notebooks

    Really appreciate your help
     
  18. MLyons

    MLyons 70% Dev, 30% Doge. DevDoge Lover of bit-tech Administrator Super Moderator Moderator

    Joined:
    3 Mar 2017
    Posts:
    3,705
    Likes Received:
    1,894
    Hope the code makes sense. I tried to comment it as well as possible and format it with black but there still may be a few things you've not seen before.
     
  19. DeadP1xels

    DeadP1xels Social distancing since 92

    Joined:
    30 Nov 2009
    Posts:
    5,843
    Likes Received:
    717
    It does make sense! just trying to make sure I totally understand what's happening so I can expand on it further :)

    I think the difference between Jupyter Notebooks & the online editor is the version of openpyxl they're using
     

Share This Page